DFG FOR “Learning to Sense” – Repositories

L2S Repositories and Data Sets

The research unit “Learning to Sense” makes its data sets as well as its code repositories publicly available. If a project has either a data set alone or both a data set and a code repository, it is listed under DATA SETS. Projects with only a code repository can be found under CODE REPOSITORIES.

A) DATA SETS

SynthEllips

https://github.com/Muhammad-Kazim/qpi-deep-cwfs

SynthEllips is a synthetic dataset for coded wavefront sensing consisting of 20,000 data points created with this repository. For each data point, a refractive index volume composed of a random configuration of ellipsoids (refractive indices, positions, diameters, and rotations) is imaged with a wave-optical simulation of the coded wavefront sensing pipeline to create a reference-specimen speckle image pair. The amplitude and the scaled gradient vector field in the phase mask are also provided to support the supervised training of optical flow neural networks that estimate the quantitative phase from a reference-specimen speckle image pair. Networks trained on SynthEllips generalize well to real biological specimens recorded experimentally, as well as to optical systems with different diffusers/phase masks and microscopes; their QPI performance is quantitatively and qualitatively superior to that of classical phase retrieval methods.
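
As an illustration of how such data points can be generated, the sketch below rasterizes a random configuration of ellipsoids into a refractive index volume. It is a minimal Python example in the spirit of the description above; the function name, parameter ranges, and volume size are illustrative and do not reproduce the repository's code.

import numpy as np

rng = np.random.default_rng(0)

def random_ellipsoid_volume(shape=(64, 64, 64), n_ellipsoids=5,
                            n_medium=1.33, dn_max=0.04):
    """Rasterize randomly placed, rotated ellipsoids into an RI volume."""
    vol = np.full(shape, n_medium, dtype=np.float32)
    axes = [np.linspace(-1, 1, s) for s in shape]
    pts = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)  # (D, H, W, 3)
    for _ in range(n_ellipsoids):
        center = rng.uniform(-0.5, 0.5, size=3)         # random position
        radii = rng.uniform(0.1, 0.3, size=3)           # random semi-axes
        rot, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random rotation
        local = (pts - center) @ rot.T / radii
        inside = (local ** 2).sum(axis=-1) <= 1.0
        vol[inside] = n_medium + rng.uniform(0.5, 1.0) * dn_max  # random RI
    return vol

volume = random_ellipsoid_volume()  # input to the wave-optical simulation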

Synthetic Dataset:
https://doi.org/10.5281/zenodo.18983874

Further Reading:
Syed Muhammad Kazim, Patrick Mueller, Andrii Nehrych, Ivo Ihrke: Quantitative phase imaging with deep coded wavefront sensing. Preprint, Optica Open 2026.

RAWDet-7

https://github.com/Mishalfatima/RawDet-7

Most vision models are trained on RGB images processed through ISP pipelines optimized for human perception, which can discard sensor-level information useful for machine reasoning. RAW images preserve unprocessed scene data, enabling models to leverage richer cues for both object detection and object description, capturing fine-grained details, spatial relationships, and contextual information often lost in processed images. To support research in this domain, we introduce RAWDet-7, a large-scale dataset of ~25k training and 7.6k test RAW images collected across diverse cameras, lighting conditions, and environments, densely annotated for seven object categories following MS-COCO and LVIS conventions.
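
Since the annotations follow MS-COCO conventions, standard COCO tooling applies. A minimal parsing sketch, assuming the usual COCO JSON layout (the file name below is hypothetical):

import json
from collections import defaultdict

with open("rawdet7_train.json") as f:          # hypothetical file name
    coco = json.load(f)

categories = {c["id"]: c["name"] for c in coco["categories"]}
boxes_per_image = defaultdict(list)
for ann in coco["annotations"]:
    # COCO boxes are [x, y, width, height] in pixel coordinates.
    boxes_per_image[ann["image_id"]].append(
        (categories[ann["category_id"]], ann["bbox"]))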

Further reading:
Mishal Fatima*, Shashank Agnihotri*, Kanchana Vaishnavi Gandikota, Michael Moeller, Margret Keuper: RAWDet-7: A Multi-Scenario Benchmark for Object Detection and Description on Quantized RAW Images. arXiv preprint, arXiv:2602.03760 (2026).

RobustSpring

https://spring-benchmark.org/

Learning-based vision systems for scene understanding must not only be accurate, but also robust to degradations that arise in real-world sensing conditions, such as blur, noise, compression artifacts, and adverse weather. To support research in this direction, we introduce RobustSpring, a benchmark dataset for evaluating robustness to image corruptions in optical flow, scene flow, and stereo. Based on the Spring benchmark, RobustSpring provides 20 corrupted versions of stereo video data, with corruptions integrated consistently in time, stereo, and depth where applicable. In total, the dataset contains 40,000 frames, or 20,000 stereo frame pairs, and enables standardized robustness evaluation alongside accuracy on the same benchmark. This makes it possible to systematically study the stability and real-world applicability of dense matching models under challenging visual conditions.
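
A typical use of the dataset is a robustness sweep: the same model is evaluated on every corrupted variant and the resulting accuracies are compared. The sketch below illustrates this for optical flow with the mean endpoint error; the corruption names are a small subset of the 20, and the loader and model are placeholders rather than the benchmark's API.

import numpy as np

CORRUPTIONS = ["gaussian_noise", "motion_blur", "fog", "jpeg"]  # 4 of the 20

def mean_epe(flow_pred, flow_gt):
    """Average endpoint error between predicted and ground-truth flow fields."""
    return float(np.linalg.norm(flow_pred - flow_gt, axis=-1).mean())

def robustness_sweep(model, load_pairs):
    """load_pairs(corruption) yields (frame1, frame2, ground-truth flow)."""
    return {c: float(np.mean([mean_epe(model(f1, f2), gt)
                              for f1, f2, gt in load_pairs(c)]))
            for c in CORRUPTIONS}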

Dataset:
https://darus.uni-stuttgart.de/dataset.xhtml?persistentId=doi:10.18419/DARUS-5047

Further reading:
Victor Oei*, Jenny Schmalfuss*, Lukas Mehl, Madlen Bartsch, Shashank Agnihotri, Margret Keuper, Andreas Bulling, Andres Bruhn: RobustSpring: Benchmarking Robustness to Image Corruptions for Optical Flow, Scene Flow and Stereo. In The Fourteenth International Conference on Learning Representations (ICLR) (2026).

GeoDiv

https://abhipsabasu.github.io/geodiv/samples.html

Learning-based vision systems are increasingly used to generate visual content, yet their outputs often lack geographical diversity and can reinforce region-specific socio-economic stereotypes. In the spirit of Learning to Sense, this raises the broader question of how well modern generative models capture the diversity of the visual world across different countries and contexts, and how such behavior can be measured systematically. To support research in this direction, we introduce GeoDiv, a framework and dataset for measuring geographical diversity in text-to-image models. GeoDiv comprises 160,000 synthetic images generated by four open-source diffusion models across 10 common entities and 16 countries, together with structured annotations and interpretable diversity scores along socio-economic and visual dimensions. This enables systematic analysis of geographical bias and diversity in generative vision systems, and supports the study of how learning-based models represent the world beyond conventional performance metrics.
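
Conceptually, the image collection spans a grid of entity-country combinations. The sketch below builds such a prompt grid from small illustrative subsets of the 10 entities and 16 countries; the actual prompt template used by GeoDiv may differ.

from itertools import product

ENTITIES = ["house", "car", "kitchen"]       # illustrative subset of the 10
COUNTRIES = ["Nigeria", "India", "Germany"]  # illustrative subset of the 16

prompts = [f"a photo of a {entity} in {country}"
           for entity, country in product(ENTITIES, COUNTRIES)]
# Each prompt is sent to every diffusion model under test; the generated
# images are then annotated along socio-economic and visual dimensions.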

Dataset:
https://drive.google.com/drive/folders/1QtXcxCzPq8iteq1FehFDjNmLMKZhaLUE

Further reading:
Abhipsa Basu*, Mohana Singh*, Shashank Agnihotri, Margret Keuper, R. Venkatesh Babu: GeoDiv: Framework for Measuring Geographical Diversity in Text-to-Image Models. In The Fourteenth International Conference on Learning Representations (ICLR) (2026).

InverseTHzSim

https://github.com/memamsaleh/InverseThzSim

A differentiable electromagnetic wave simulator at mm-wave and terahertz frequencies for reconstruction-free parameter estimation.
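
The underlying idea is that a differentiable forward simulator lets object parameters be fitted directly to raw measurements by gradient descent, with no intermediate image reconstruction. The sketch below illustrates this with a deliberately smooth toy forward model standing in for the full wave simulator; it is not the repository's interface.

import torch

def simulate(thickness, freqs):
    """Toy smooth slab-transmission model standing in for the wave simulator."""
    return 1.0 / (1.0 + 0.5 * torch.sin(freqs * thickness) ** 2)

freqs = torch.linspace(1.0, 2.0, 128)          # normalized frequency axis
measured = simulate(torch.tensor(1.3), freqs)  # synthetic "raw data"

thickness = torch.tensor(1.0, requires_grad=True)  # initial parameter guess
opt = torch.optim.Adam([thickness], lr=1e-2)
for _ in range(400):
    opt.zero_grad()
    loss = ((simulate(thickness, freqs) - measured) ** 2).mean()
    loss.backward()
    opt.step()
print(thickness.item())  # should approach 1.3, with no image reconstruction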

Further reading:
Mohamed Saleh, Matthias Kahl, Peter Haring Bolívar, Andreas Kolb: Direct Detection of Object Parameters From Raw Data in MIMO-SAR Imaging at 235–270 GHz via Inverse Simulation. In IEEE Journal of Microwaves, vol. 6, no. 1, pp. 126-136 (Jan. 2026).

NAG

https://github.com/jp-schneider/nag

Learning editable high-resolution scene representations for dynamic scenes is an open problem with applications from autonomous driving to creative editing. The most successful approaches today trade off editability against supported scene complexity: neural atlases represent dynamic scenes as two deforming image layers, foreground and background, which are editable in 2D but break down when multiple objects occlude and interact. Scene graph models, in contrast, use annotated data such as masks and bounding boxes from autonomous-driving datasets to capture complex 3D spatial relationships, but their implicit volumetric node representations are challenging to edit view-consistently. We propose Neural Atlas Graphs (NAGs), a hybrid high-resolution scene representation in which every graph node is a view-dependent neural atlas, facilitating both 2D appearance editing and 3D ordering and positioning of scene elements. Fit at test time, NAGs achieve state-of-the-art quantitative results on the Waymo Open Dataset, improving on existing methods by 5 dB PSNR, and enable environmental editing in high resolution and visual quality, for example creating counterfactual driving scenarios with new backgrounds and edited vehicle appearance. The method also generalizes beyond driving scenes and compares favorably, by more than 7 dB PSNR, to recent matting and video editing baselines on the DAVIS video dataset with a diverse set of human- and animal-centric scenes.
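
To make the representation concrete, the sketch below shows one hypothetical way to organize such a graph in code: each node couples an editable 2D atlas with an explicit 3D pose and depth ordering. The field names are invented for illustration and do not mirror the repository's classes.

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class AtlasNode:
    atlas: Callable        # view-dependent neural atlas: (u, v, view) -> RGBA
    pose: List[float]      # 3D placement of the atlas in the scene
    depth_order: int       # resolves occlusion between overlapping nodes
    children: List["AtlasNode"] = field(default_factory=list)

# Editing a scene element then amounts to repainting its 2D atlas or changing
# its pose/ordering, while all other nodes remain untouched.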

Project Page:
https://princeton-computational-imaging.github.io/nag/

Further Reading:
Jan Philipp Schneider, Pratik Singh Bisht, Ilya Chugunov, Andreas Kolb, Michael Moeller, Felix Heide: Neural Atlas Graphs for Dynamic Scene Decomposition and Editing. arXiv preprint, arXiv:2509.16336, (2025).

LEECH

https://zenodo.org/records/17021089

A Hirudo T.S. (leech) sample was imaged using a Fourier Ptychographic Microscope (FPM) under green LED illumination. The optical system comprises a 10× objective lens with a numerical aperture of 0.3, which allows imaging a sufficiently wide field of view of the sample. Using the FPM setup, a sequence of low-resolution intensity images was captured by sequentially illuminating the sample from different angles with an LED array. For each illumination angle, multiple exposures were taken and merged to form High Dynamic Range (HDR) images, allowing better preservation of both dark and bright regions in the sample. The HDR dataset was captured and processed by John Meshreki [1] at the CSE Optics Lab, ZESS, University of Siegen.
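
For context, merging bracketed exposures into an HDR frame commonly follows the pattern sketched below, assuming a linear sensor response and intensities normalized to [0, 1]; the validity thresholds are illustrative and need not match the processing used for this dataset.

import numpy as np

def merge_hdr(images, exposure_times, lo=0.05, hi=0.95):
    """Average the well-exposed pixels of each exposure, scaled to radiance."""
    num = np.zeros(images[0].shape, dtype=np.float64)
    den = np.zeros(images[0].shape, dtype=np.float64)
    for img, t in zip(images, exposure_times):
        valid = ((img > lo) & (img < hi)).astype(np.float64)
        num += valid * img / t   # radiance estimate from this exposure
        den += valid
    return num / np.maximum(den, 1e-12)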

Further reading:
[1] John Meshreki, Syed Muhammad Kazim, Ivo Ihrke: Optical system characterization in Fourier ptychographic microscopy. In Opt. Continuum 3, pp. 2218-2231 (2024).

WEAR

https://mariusbock.github.io/wear/

Research has shown the complementarity of camera- and inertial-based data for modeling human activities, yet datasets with both egocentric video and inertial-based sensor data remain scarce. In this paper, we introduce WEAR, an outdoor sports dataset for both vision- and inertial-based human activity recognition (HAR). Data from 22 participants performing a total of 18 different workout activities was collected with synchronized inertial (acceleration) and camera (egocentric video) data recorded at 11 different outside locations. WEAR provides a challenging prediction scenario in changing outdoor environments, using a sensor placement in line with recent trends in real-world applications. Benchmark results show that, with our sensor placement, each modality offers complementary strengths and weaknesses in its prediction performance. Further, in light of the recent success of single-stage Temporal Action Localization (TAL) models, we demonstrate their versatility: they can be trained not only on visual data but also on raw inertial data, and can fuse both modalities by means of simple concatenation.
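
The fusion by simple concatenation mentioned above can be sketched in a few lines: per-frame visual and inertial feature vectors are stacked along the channel dimension before entering the TAL model. The feature dimensions below are illustrative.

import torch

T = 128                                  # temporal length of a clip
video_feats = torch.randn(1, T, 2048)    # e.g. per-frame video features
inertial_feats = torch.randn(1, T, 128)  # e.g. windowed accelerometer features

fused = torch.cat([video_feats, inertial_feats], dim=-1)  # (1, T, 2176)
# `fused` is consumed by the TAL backbone exactly like a single modality.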

Further reading:
Marius Bock, Hilde Kuehne, Kristof Van Laerhoven, Michael Moeller: WEAR: An Outdoor Sports Dataset for Wearable and Egocentric Activity Recognition. In Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 8, Issue 4, Article No.: 175, pp. 1-21 (2024).

OpticsBench

https://github.com/PatMue/classification_robustness
https://github.com/PatMue/opticsbench_generate

Deep neural networks (DNNs) are increasingly deployed in safety-critical computer vision tasks and therefore have to behave robustly to disturbances such as noise or blur. While seminal benchmarks exist to evaluate model robustness to diverse corruptions, blur is often modeled in an overly simplistic way as defocus, ignoring the different blur kernel shapes that result from real lenses. To bridge this gap, we introduce OpticsBench, which evaluates primary aberrations such as coma, defocus, and astigmatism, i.e., aberrations that can be represented by varying a single Zernike polynomial coefficient. In addition, the provided code allows users to create custom aberrations to investigate, e.g., model robustness to a specific lens. The accompanying OpticsAugment data augmentation effectively enhances model robustness to optical blur and common corruptions.
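
The following sketch illustrates the underlying principle: an aberrated point spread function is generated from a single Zernike term (here defocus) and convolved with an image. Grid sizes and the coefficient are arbitrary and do not reproduce the OpticsBench kernels.

import numpy as np
from scipy.signal import fftconvolve

N = 63
y, x = np.mgrid[-1:1:N * 1j, -1:1:N * 1j]
rho = np.hypot(x, y)
pupil = rho <= 1.0

coeff = 0.8                              # the single aberration parameter
defocus = np.sqrt(3) * (2 * rho**2 - 1)  # Zernike defocus polynomial (Z4)
field = pupil * np.exp(2j * np.pi * coeff * defocus)

psf = np.abs(np.fft.fftshift(np.fft.fft2(field, s=(4 * N, 4 * N)))) ** 2
psf /= psf.sum()

image = np.random.rand(256, 256)         # stand-in for a test image
blurred = fftconvolve(image, psf, mode="same")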

Further reading:
Patrick Müller, Alexander Braun, Margret Keuper: Examining the Impact of Optical Aberrations to Image Classification and Object Detection Models. In IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) Vol. 48, Issue 3, pp. 2139-2153 (2026).
Patrick Müller, Alexander Braun, Margret Keuper: Classification Robustness to Common Optical Aberrations. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, pp. 3632-3643 (2023).

AWESOME

https://github.com/jp-schneider/awesome

Image segmentation has greatly advanced over the past ten years. Yet, even the most recent techniques face difficulties producing good results in challenging situations, e.g., if training data are scarce, out-of-distribution examples need to be segmented, or if objects are occluded. In such situations, the inclusion of (geometric) constraints can improve the segmentation quality significantly. In this paper, we study the constraint that the segmented region be convex. Unlike prior work that encourages such a property with computationally expensive penalties on segmentation masks represented explicitly on a grid of pixels, our work is the first to consider an implicit representation. Specifically, we represent the segmentation as a parameterized function that maps spatial coordinates to the likelihood of a pixel belonging to the fore- or background. This enables us to provably ensure the convexity of the segmented regions with the help of input convex neural networks. Numerical experiments demonstrate how to encourage explicit and implicit representations to match in order to benefit from the convexity constraints in several challenging segmentation scenarios.
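
The key building block is an input convex neural network (ICNN): if the hidden-to-hidden weights are non-negative and the activations are convex and non-decreasing, the output is convex in the input coordinates, so every sublevel set, and hence the segmented region, is provably convex. A minimal sketch with illustrative sizes, not the repository's architecture:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNN(nn.Module):
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.Wx0 = nn.Linear(dim, hidden)                # first layer, unconstrained
        self.Wz = nn.Linear(hidden, hidden, bias=False)  # kept non-negative below
        self.Wx1 = nn.Linear(dim, hidden)                # skip connection from input
        self.out = nn.Linear(hidden, 1, bias=False)      # kept non-negative below

    def forward(self, xy):
        z = F.relu(self.Wx0(xy))
        # Clamping enforces non-negative hidden-to-hidden weights, which together
        # with the convex, non-decreasing ReLU makes the output convex in xy.
        z = F.relu(F.linear(z, self.Wz.weight.clamp(min=0)) + self.Wx1(xy))
        return F.linear(z, self.out.weight.clamp(min=0))

net = ICNN()
coords = torch.rand(1000, 2)                  # normalized pixel coordinates
foreground = net(coords).squeeze(-1) < 0.0    # sublevel set: a convex region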

Further Reading:
Jan Philipp Schneider, Mishal Fatima, Jovita Lukasik, Andreas Kolb, Margret Keuper, Michael Moeller: Implicit representations for constrained image segmentation. In International Conference on Machine Learning (ICML), pp. 43765-43790 (2024).
Jan Philipp Schneider, Mishal Fatima, Jovita Lukasik, Andreas Kolb, Margret Keuper, Michael Moeller: Implicit Representations for Image Segmentation. In UniReps Workshop: Unifying Representations in Neural Models. (2023).

DEFECT-GFRT

https://doi.org/10.5281/zenodo.18863663

This dataset contains experimental FMCW terahertz imaging measurements of glass fiber reinforced thermoplastic (GFRT) composites for non-destructive testing. It includes raw THz data and processing scripts for analyzing samples with and without defects such as delamination and consolidation variations. The dataset was first introduced in the following publication:

Further Reading:
A. Souliman, M. Kahl, D. Stock, M. Möller, B. Engel, P. H. Bolívar: Defect Detection in Bidirectional Glass Fabric Reinforced Thermoplastics Based on 3-D-THz Imaging. In IEEE Transactions on Terahertz Science and Technology, vol. 13, no. 3, pp. 209-220 (May 2023). doi: 10.1109/TTHZ.2023.3247609.
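
For context, FMCW measurements are typically converted into depth information by a windowed FFT over the beat signal of each chirp, yielding a range profile per scan position. The sketch below shows this standard step; it mirrors common practice rather than the exact processing scripts shipped with the dataset.

import numpy as np

def range_profile(beat_signal, bandwidth, c=3e8):
    """Range bins (meters) and magnitudes for one FMCW chirp."""
    n = len(beat_signal)
    spectrum = np.fft.rfft(beat_signal * np.hanning(n))  # window + FFT
    ranges = np.arange(len(spectrum)) * c / (2 * bandwidth)
    return ranges, np.abs(spectrum)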

B) CODE REPOSITORIES

CWFS

https://github.com/Muhammad-Kazim/Deep-Coded-Wavefront-Sensing—Bridging-the-Simulation-Experiment-Gap

Coded wavefront sensing (CWFS) is a recent computational quantitative phase imaging technique that enables one-shot phase retrieval of biological and other phase specimens. CWFS is readily integrable with standard laboratory microscopes and does not require specialized expertise to use. The CWFS phase retrieval method is inspired by optical flow, but to date has relied on conventional optimization techniques; a main reason for this is the lack of publicly available datasets for CWFS, which has prevented researchers from applying deep neural networks to CWFS. In this paper, we present a forward model that utilizes wave optics to generate SynthBeads: a CWFS dataset obtained by modeling the complete experimental setup with high fidelity, including wave propagation through refractive index (RI) volumes of spherical microbeads, a standard microscope, and the phase mask, which is a key component of CWFS. We show that our forward model enables deep CWFS: pre-trained optical flow networks finetuned on SynthBeads successfully generalize to our SynthCell dataset, to experimental microbead measurements, and, remarkably, to complex biological specimens, providing quantitative phase estimates and thereby bridging the simulation-experiment gap.
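
A central ingredient of such a wave-optical forward model is free-space propagation of the complex field, commonly implemented with the angular spectrum method. The sketch below is a standard textbook implementation given for reference; it is not the repository's code.

import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, z):
    """Propagate a square complex field by distance z (units as wavelength/dx)."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))  # evanescent components cut
    return np.fft.ifft2(np.fft.fft2(field) * np.exp(1j * kz * z))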

Further Reading:
S. M. Kazim, P. Müller, and I. Ihrke: Deep Coded Wavefront Sensing: Bridging the Simulation–Experiment Gap. NeurIPS 2025 Workshop: Learning to Sense.

DispBench: Benchmarking Disparity Estimation to Synthetic Corruptions

https://github.com/shashankskagnihotri/benchmarking_robustness/tree/disparity_estimation/final/disparity_estimation

Learning-based disparity estimation methods are typically evaluated on clean benchmark data, although real-world sensing conditions often introduce degradations such as noise, blur, and other image corruptions. To support research in this direction, we introduce DispBench, a benchmarking framework for studying the robustness and generalization of disparity estimation methods under synthetic corruptions. The framework enables standardized evaluation beyond in-distribution accuracy and makes it possible to analyze how well disparity models remain reliable under challenging visual conditions.

Further Reading:
Shashank Agnihotri*, Amaan Ansari*, Annika Dackermann*, Fabian Rösch*, Margret Keuper: DispBench: Benchmarking Disparity Estimation to Synthetic Corruptions. Accepted at CVPR 2025 Workshop on Synthetic Data for Computer Vision.

CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks

https://github.com/shashankskagnihotri/cospgd

Adversarial robustness research has largely focused on image classification, while pixel-wise prediction tasks such as semantic segmentation, optical flow, and disparity estimation remain comparatively underexplored. To support research in this direction, we introduce CosPGD, an efficient white-box adversarial attack for dense prediction tasks. By providing a practical and effective attack formulation tailored to pixel-wise outputs, CosPGD enables systematic robustness evaluation of models whose predictions are structured over the full image domain rather than a single label.
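
In simplified form, the attack can be sketched as a PGD loop whose pixel-wise loss is weighted by the cosine similarity between the model's output distribution and the one-hot target, so that still-correctly-predicted pixels dominate the gradient. The following is a condensed illustration for semantic segmentation; see the paper and repository for the exact formulation.

import torch
import torch.nn.functional as F

def cospgd_attack(model, images, labels, eps=8/255, alpha=2/255, steps=10):
    adv = images.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        logits = model(adv)                                   # (B, C, H, W)
        probs = torch.softmax(logits, dim=1)
        onehot = F.one_hot(labels, logits.shape[1]).permute(0, 3, 1, 2).float()
        cos = F.cosine_similarity(probs, onehot, dim=1)       # (B, H, W)
        pixel_loss = F.cross_entropy(logits, labels, reduction="none")
        loss = (cos.detach() * pixel_loss).mean()             # cosine weighting
        grad, = torch.autograd.grad(loss, adv)
        adv = adv.detach() + alpha * grad.sign()              # ascent step
        adv = images + (adv - images).clamp(-eps, eps)        # L_inf projection
        adv = adv.clamp(0, 1)
    return adv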

Further Reading: 
Shashank Agnihotri, Steffen Jung, Margret Keuper: CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks. In Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235 (2024).

SemSegBench & DetecBench: Benchmarking Reliability and Generalization Beyond Classification

https://github.com/shashankskagnihotri/benchmarking_reliability_generalization

Learning-based vision systems are increasingly deployed in real-world settings, yet their reliability under distribution shifts and adversarial perturbations remains insufficiently understood beyond image classification. To support research in this direction, we introduce SemSegBench and DetecBench, benchmarking frameworks for semantic segmentation and object detection. They enable large-scale, standardized evaluation of reliability and generalization across architectures, backbones, and model sizes, making it possible to systematically study robustness trends for dense prediction and detection tasks under challenging conditions.

Further Reading: 
Shashank Agnihotri*, David Schader*, Jonas Jakubassa*, Nico Sharei*, Simon Kral*, Mehmet Ege Kaçar*, Ruben Weber*, Margret Keuper: SEMSEGBENCH & DETECBENCH: Benchmarking Reliability and Generalization Beyond Classification. arXiv preprint, arXiv:2505.18015 (2025).

FlowBench: Benchmarking Optical Flow Estimation Methods for Reliability and Generalization

https://github.com/shashankskagnihotri/FlowBench

Learning-based optical flow estimation methods have achieved strong performance on standard benchmarks, but their reliability under perturbations and distribution shifts is less well understood. To support research in this direction, we introduce FlowBench, a benchmarking framework for evaluating the robustness and generalization of optical flow methods. The framework supports systematic analysis under adversarial perturbations and common corruptions, enabling standardized comparison of model stability and behavior beyond conventional accuracy metrics.

Further Reading: 
Shashank Agnihotri*, Julian Yuya Caspary*, Luca Schwarz*, Xinyan Gao*, Jenny Schmalfuss, Andrés Bruhn, Margret Keuper: FlowBench: Benchmarking Optical Flow Estimation Methods for Reliability and Generalization. Published in Transactions on Machine Learning Research (2025).

A Granular Study of Safety Pretraining under Model Abliteration

https://github.com/shashankskagnihotri/safety_pretraining

Safety alignment in open-weight language models is often evaluated only at the final model stage, making it difficult to understand which parts of safety pretraining remain effective under post-hoc editing interventions. To support research in this direction, we present a granular study of safety pretraining under model abliteration. By analyzing a sequence of safety-pretrained checkpoints, the work enables systematic study of how different safety properties evolve and which of them remain robust when refusal-related behaviors are removed through lightweight model edits.
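
Model abliteration is commonly realized as directional ablation: a refusal direction is estimated from activation differences between harmful and harmless prompts and projected out of selected weight matrices. The sketch below shows this general technique with placeholder activations; it is not the exact pipeline of this repository.

import torch

def ablate_direction(weight, r):
    """Remove the component of each output along unit direction r: (I - r r^T) W."""
    r = r / r.norm()
    return weight - torch.outer(r, r) @ weight

# Refusal direction as a mean activation difference between prompt sets
# (the activations here are random placeholders).
harmful_acts = torch.randn(32, 512)
harmless_acts = torch.randn(32, 512)
r = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)

W = torch.randn(512, 512)              # e.g. an attention output projection
W_abliterated = ablate_direction(W, r)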

Further reading: 
Shashank Agnihotri*, Jonas Jakubassa*, Priyam Dey, Sachin Goyal, Bernt Schiele, R. Venkatesh Babu, Margret Keuper: A Granular Study of Safety Pretraining under Model Abliteration. 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop.

Task-driven Sensor Layouts

https://github.com/hesom/task_driven_sensors

Computational imaging concepts based on integrated edge AI and neural sensors solve vision problems in an end-to-end, task-specific manner by jointly optimizing algorithmic and hardware parameters to sense data with high information value. They yield energy-, data-, and privacy-efficient solutions, but rely on novel hardware concepts that are yet to be scaled up. In this work, we present the first truly end-to-end trained imaging pipeline that optimizes imaging sensor parameters, available in standard CMOS design methods, jointly with the parameters of a given neural network on a specific task. Specifically, we derive an analytic, differentiable approach to the sensor layout parameterization that allows for task-specific, locally varying pixel resolutions.
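
Conceptually, the joint optimization lets a single task loss back-propagate through a differentiable sensing model into both the sensor layout parameters and the network weights. The sketch below illustrates this with a toy stand-in for the analytic layout model of the paper; all shapes and the modulation are illustrative.

import torch

layout_params = torch.randn(16, requires_grad=True)   # pixel-layout parameters
net = torch.nn.Sequential(torch.nn.Flatten(),
                          torch.nn.Linear(32 * 32, 10))

def render_measurement(scene, params):
    # Toy differentiable stand-in: a layout-dependent per-region gain; the
    # real model instead resamples locally varying pixel sizes analytically.
    gains = torch.sigmoid(params).repeat_interleave(64).view(32, 32)
    return scene * gains

opt = torch.optim.Adam([{"params": [layout_params]},
                        {"params": net.parameters()}], lr=1e-3)
scene, label = torch.rand(1, 32, 32), torch.tensor([3])   # dummy sample
measurement = render_measurement(scene, layout_params)
loss = torch.nn.functional.cross_entropy(net(measurement), label)
opt.zero_grad()
loss.backward()    # gradients reach both sensor and network parameters
opt.step()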

Further reading:
Hendrik Sommerhoff, Shashank Agnihotri, Mohamed Saleh, Michael Moeller, Margret Keuper, Bhaskar Choubey, Andreas Kolb: Task Driven Sensor Layouts – Joint Optimization of Pixel Layout and Network Parameters (https://doi.org/10.1109/ICCP61108.2024.10644474). In IEEE International Conference on Computational Photography (ICCP) (2024).

————————————————
* equal contributions