Welcome to the official repository of the HODOR dataset, a unique long-term multimodal underwater dataset combining synchronized forward-looking sonar and stereo camera data, collected in the Baltic Sea via the autonomous UFO (Underwater Fish Observatory) platform. HODOR supports research in object detection, tracking, and sensor fusion in challenging underwater environments.
Keywords: sonar dataset, underwater imaging, fish detection dataset, marine robotics dataset, multimodal dataset
- What's Inside
- Sequence Lengths
- How to Access the Data
- Tools & Examples
- Join the Discussion
- Data Availability
- Cite Us
Snapshot of sequence 2895 generated by the visualizeSequence.m script.
🔹 5850 synchronized sequences of imaging sonar and stereo camera data
🔹 About 430 hours of continuous sequences per sensor
🔹 Associated abiotic measurements (e.g., CTD, ADCP, fluorometer)
🔹 Machine-learning-based annotations (coming soon)
The individual sequences in HODOR are of variable length, ranging from as short as 1 minute up to several hours. This variability is by design: each sequence's duration is determined by observed biological activity rather than by fixed time intervals.
A sequence continues recording as long as any kind of fish activity can be observed in the sensor data. This means:
- Short sequences (1-10 minutes): Brief encounters with individual fish or small groups
- Long sequences (hours): Persistent activity, such as when schools of small fish remain in the observation area for extended periods
This activity-based segmentation ensures that each sequence captures complete behavioral events, making the dataset particularly valuable for studying natural fish behavior patterns and movement dynamics in their marine habitat.
Using the hodor-python Package (Recommended)
The preferred approach is to use the hodor-python package detailed below.
This allows you to:
- Filter and identify sequences of interest (e.g., by specific species activity)
- Download only the sequences you need using the built-in download functions
- Automatically handle metadata and file organization
Download the tab-delimited metadata at https://doi.pangaea.de/10.1594/PANGAEA.980000.
Look for the "Download Data" section and select "Download ZIP file containing all datasets as tab-delimited text".
Note: This downloads only the metadata for the sonar and camera files (no binary video files) and the activity counts, which specify the amount of biological activity per sequence. The metadata files contain the information needed to construct the absolute download links for the camera and sonar video files.
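As a rough illustration of working with the tab-delimited metadata by hand, the sketch below parses PANGAEA-style text using only the Python standard library. It assumes the file starts with a `/* ... */` description block followed by a tab-separated header row, which is the usual layout of PANGAEA text exports. The sample content and the `sequence_id` column name are made up for the example and are not taken from the actual HODOR files.

```python
import csv
import io


def read_pangaea_tab(text):
    """Parse PANGAEA-style tab-delimited text into a list of row dicts."""
    # Skip the leading /* ... */ description block if one is present.
    if text.lstrip().startswith("/*"):
        _, found, rest = text.partition("*/")
        if found:
            text = rest
    return list(csv.DictReader(io.StringIO(text.lstrip()), delimiter="\t"))


# Synthetic sample; the column names are hypothetical placeholders.
sample = (
    "/* DATA DESCRIPTION:\n"
    "Citation:\tplaceholder\n"
    "*/\n"
    "sequence_id\tfish_cod\tfish_clupeidae\n"
    "0\t0\t12\n"
    "1\t3\t0\n"
)

rows = read_pangaea_tab(sample)
# Keep only rows with at least one counted cod (values are read as strings).
active = [row for row in rows if int(row["fish_cod"]) > 0]
print(active)
```

For the real files, check the header row of each downloaded `.tab` file first and adjust the column names accordingly.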
In a nutshell, this is how it can be done:
Bulk Downloads (PANGAEA registration required):
Single File Downloads (no registration required):
- Camera videos: https://download.pangaea.de/dataset/980001/files/cam1_XXXX.mp4 or cam2_XXXX.mp4, where XXXX is the sequence ID (0000-5849)
- Sonar videos: https://download.pangaea.de/dataset/980002/files/sonar_XXXX.mp4, where XXXX is the sequence ID (0000-5849)
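The URL patterns above can be turned into a small helper. This is a minimal sketch that only builds the links from a sequence ID, zero-padded to four digits as in the patterns. The function name `sequence_urls` is invented for this example and is not part of any HODOR tooling.

```python
CAMERA_BASE = "https://download.pangaea.de/dataset/980001/files"
SONAR_BASE = "https://download.pangaea.de/dataset/980002/files"


def sequence_urls(sequence_id):
    """Build the single-file download URLs for one sequence ID (0-5849)."""
    if not 0 <= sequence_id <= 5849:
        raise ValueError(f"sequence ID out of range: {sequence_id}")
    tag = f"{sequence_id:04d}"  # zero-pad to four digits, e.g. 42 -> "0042"
    return {
        "cam1": f"{CAMERA_BASE}/cam1_{tag}.mp4",
        "cam2": f"{CAMERA_BASE}/cam2_{tag}.mp4",
        "sonar": f"{SONAR_BASE}/sonar_{tag}.mp4",
    }


for name, url in sequence_urls(42).items():
    print(name, url)
```

The resulting links can be passed to any downloader, for example `urllib.request.urlretrieve` from the standard library.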
The HODOR dataset has a dedicated, installable Python package for easy data access and analysis: hodor-python. This package provides a pandas-based interface for working with the dataset metadata and species activity counts, and is maintained in a separate repository: https://github.com/gboeer/hodor_python
Install directly from PyPI:
pip install hodor-python
Or if you're using uv:
uv add hodor-python
from hodor_python import HODOR_Dataset, Species
# Create dataset instance (downloads metadata automatically)
hodor = HODOR_Dataset(dataset_folder="HODOR")
# Access activity counts as pandas DataFrame
print(hodor.counts.head())
# Filter by species activity
cod_sequences = hodor.counts[hodor.counts[Species.FISH_COD] > 0]
# Download the camera and sonar videos for a specific sequence ID (42)
hodor.download_sequence(42)
- Easy data access: Load HODOR metadata into pandas DataFrames
- Species filtering: Built-in Species enum for consistent filtering
- Automatic downloads: Integration with pangaeapy for seamless data retrieval
- Example notebooks: Complete usage examples in meta/hodor_python
  - download_data.ipynb: How to download specific sequences
  - usage_examples.ipynb: Data analysis and visualization examples
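Before downloading, it can help to rank sequences by activity. The sketch below shows one way to do that on a pandas DataFrame with one count column per species. It runs on synthetic data. The helper name `most_active_sequences`, the sample values, and the idea that the DataFrame index holds the sequence ID are assumptions made for illustration, not documented behaviour of hodor-python.

```python
import pandas as pd


def most_active_sequences(counts, column, n=3):
    """Return the index labels of the n rows with the largest positive counts."""
    active = counts[counts[column] > 0]
    return active[column].nlargest(n).index.tolist()


# Synthetic stand-in for an activity-count table (values are invented).
counts = pd.DataFrame(
    {"fish_cod": [0, 4, 1, 7], "fish_plaice": [2, 0, 0, 1]},
    index=[10, 11, 12, 13],
)

top = most_active_sequences(counts, "fish_cod", n=2)
print(top)
```

With the real dataset, the same helper could presumably be applied to `hodor.counts` with `Species.FISH_COD` as the column. Each returned ID could then be passed to `hodor.download_sequence`, provided the index really is the sequence ID.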
- visualizeSequence.m: Display all sensor data at the same time (see example-gif above)
- analyzeCompleteSet: Runs through all dataset video files and performs statistical measurements
We welcome your feedback, questions, and contributions!
- 👉 Use the Issues tab for bug reports and feature requests
- 👉 Start a thread in Discussions to share research ideas or get support
- 🤘 Fork the repo and send us a pull request to contribute new tools, improvements, or annotations
Let’s build the future of underwater perception—together. 🌊🤿
🔹 The dataset is published at PANGAEA: https://doi.org/10.1594/PANGAEA.980000
🔹 Read the paper: https://ieeexplore.ieee.org/ielx8/10347231/10677474/11121653.pdf
🔹 The associated abiotic measurements: https://doi.org/10.1594/PANGAEA.973019
The species counts listed in Table 1 of the HODOR publication were adopted from the previous study [1]. These values were derived using the hybrid estimation algorithm described in that work, which combines the camera-based MaxN detections with sonar detections and extrapolates the near-field (camera) observations to a larger field of view.
In contrast, the metadata provided with the HODOR dataset contains only the partially filtered camera-based MaxN counts, without the extrapolated sonar component.
Consequently, the absolute species counts obtained when analyzing the released dataset will differ from those reported in the publication’s Table 1.
Additionally, the naming conventions in the metadata differ slightly from those used in the publication.
The following mappings apply:
| Metadata label | Publication name |
|---|---|
| fish_unspecified | Unspecified fish |
| fish_clupeidae | Clupeidae |
| fish_cod | Gadus morhua |
| fish_mackerel | Scomber scombrus |
| fish_salmonidae | Salmonidae |
| fish_pipefish | Syngnathinae |
| fish_plaice | Pleuronectes platessa |
| fish_scad | Trachurus trachurus |
| jellyfish_unspecified | Unspecified jellyfish |
| jellyfish_ctenophora | Mnemiopsis leidyi |
| jellyfish_cyanea | Cyanea sp. |
| jellyfish_aurelia | Aurelia aurita |
| bird_cormorant | Cormorant |
| crab_crustacea | Crustacea |
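For scripts that need to report results under the publication's names, the table can be expressed as a plain lookup. The dictionary below copies the mapping verbatim. The helper `publication_name` is a hypothetical convenience function that leaves unknown labels unchanged.

```python
# Metadata label -> name used in the HODOR publication (copied from the table).
LABEL_TO_PUBLICATION_NAME = {
    "fish_unspecified": "Unspecified fish",
    "fish_clupeidae": "Clupeidae",
    "fish_cod": "Gadus morhua",
    "fish_mackerel": "Scomber scombrus",
    "fish_salmonidae": "Salmonidae",
    "fish_pipefish": "Syngnathinae",
    "fish_plaice": "Pleuronectes platessa",
    "fish_scad": "Trachurus trachurus",
    "jellyfish_unspecified": "Unspecified jellyfish",
    "jellyfish_ctenophora": "Mnemiopsis leidyi",
    "jellyfish_cyanea": "Cyanea sp.",
    "jellyfish_aurelia": "Aurelia aurita",
    "bird_cormorant": "Cormorant",
    "crab_crustacea": "Crustacea",
}


def publication_name(label):
    """Translate a metadata label; labels not in the table pass through as-is."""
    return LABEL_TO_PUBLICATION_NAME.get(label, label)


print(publication_name("fish_cod"))
```

The same dictionary can be passed to `DataFrame.rename(columns=...)` when the labels appear as column names.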
Finally, due to quality-based filtering in the released HODOR dataset, the following species reported in the publication are not present in the available video material:
- Anguilla anguilla (previously reported with one occurrence)
If you use HODOR in your research, please cite:
@ARTICLE{11121653,
author={Wilts, Thomas and Böer, Gordon and Winkler, Julian and Cisewski, Boris and Schramm, Hauke and Badri-Hoeher, Sabah},
journal={IEEE Data Descriptions},
title={Descriptor: Hydroacoustic and Optical Dataset for Oceanic Research (HODOR)},
year={2025},
volume={2},
number={},
pages={262-270},
keywords={Sonar;Cameras;Optical sensors;Optical imaging;Fish;Optical recording;Acoustics;Synchronization;Sonar measurements;Baltic Sea;camera;sonar;stereo camera},
doi={10.1109/IEEEDATA.2025.3596913}}
