TAWilts/HODOR


🐟 HODOR: Hydroacoustic and Optical Dataset for Oceanic Research

Welcome to the official repository of the HODOR dataset, a unique long-term multimodal underwater dataset combining synchronized forward-looking sonar and stereo camera data, collected in the Baltic Sea via the autonomous UFO (Underwater Fish Observatory) platform. HODOR supports research in object detection, tracking, and sensor fusion in challenging underwater environments.

Keywords: sonar dataset, underwater imaging, fish detection dataset, marine robotics dataset, multimodal dataset



📦 What's Inside

[Figure: Snapshot of sequence 2895 generated by the visualizeSequence.m script.]

🔹 5850 synchronized sequences of imaging sonar and stereo camera data

🔹 About 430 hours of continuous sequences per sensor

🔹 Associated abiotic measurements (e.g., CTD, ADCP, fluorometer)

🔹 Machine-learning-based annotations (coming soon)
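As a rough sanity check, dividing the total recording time per sensor by the number of sequences gives the mean sequence length. Both inputs are approximate, so the result is too:

```latex
\bar{t} \approx \frac{430\,\text{h} \times 60\,\text{min/h}}{5850\ \text{sequences}}
        = \frac{25\,800\,\text{min}}{5850}
        \approx 4.4\,\text{min per sequence}
```

Because the total is only about 25,800 minutes, at most roughly 25,800 / 120 = 215 sequences could last two hours or more, so the multi-hour sequences described below are the exception rather than the rule.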

📏 Sequence Lengths

The individual sequences in HODOR vary in length, from as short as 1 minute up to several hours. This variability is by design: the duration of each sequence is determined by observed biological activity rather than by fixed time intervals.

Recording of a sequence continues for as long as any fish activity can be observed in the sensor data. This means:

  • Short sequences (1-10 minutes): Brief encounters with individual fish or small groups
  • Long sequences (hours): Persistent activity, such as when schools of small fish remain in the observation area for extended periods

This activity-based segmentation ensures that each sequence captures complete behavioral events, making the dataset particularly valuable for studying natural fish behavior patterns and movement dynamics in their marine habitat.
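The idea of activity-based segmentation can be illustrated with a simplified sketch. It assumes a per-time-step boolean activity signal and introduces a hypothetical `max_gap` tolerance for short pauses; the function name and parameters are illustrative only and do not reproduce the UFO platform's actual recording logic.

```python
from typing import List, Optional, Sequence, Tuple


def segment_by_activity(active: Sequence[bool], max_gap: int = 0) -> List[Tuple[int, int]]:
    """Split a per-time-step activity signal into (start, end) sequences, end exclusive.

    active  -- one boolean per time step, True if fish activity was observed
    max_gap -- hypothetical tolerance: number of consecutive inactive steps
               allowed inside a sequence before it is closed
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None        # index where the current sequence began
    last_active: Optional[int] = None  # index of the most recent active step
    for i, is_active in enumerate(active):
        if is_active:
            if start is None:
                start = i              # activity begins: open a new sequence
            last_active = i
        elif start is not None and i - last_active > max_gap:
            # Inactivity lasted longer than the tolerance: close the sequence
            segments.append((start, last_active + 1))
            start = None
    if start is not None:
        # Activity was still ongoing at the end of the signal
        segments.append((start, last_active + 1))
    return segments


print(segment_by_activity([False, True, True, False, False, True, False]))
# [(1, 3), (5, 6)]
print(segment_by_activity([False, True, True, False, False, True, False], max_gap=2))
# [(1, 6)]
```

With `max_gap=0` every pause ends a sequence; a positive tolerance merges brief gaps, which is one way a long sequence could span a school of fish that repeatedly leaves and re-enters the field of view.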

🚀 How to Access the Data

Using the hodor-python Package (Recommended)

The preferred approach is to use the hodor-python package detailed below.

This allows you to:

  • Filter and identify sequences of interest (e.g., by specific species activity)
  • Download only the sequences you need using the built-in download functions
  • Automatically handle metadata and file organization

Direct Download from PANGAEA

Download the tab-delimited metadata at https://doi.pangaea.de/10.1594/PANGAEA.980000.

Look for the "Download Data" section and select "Download ZIP file containing all datasets as tab-delimited text".

Note: This downloads only the metadata for the sonar and camera files (not the binary video files themselves) and the activity counts data, which specify the amount of biological activity per sequence. The metadata files contain the information needed to construct the absolute download links for the camera and sonar video files.
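Because the metadata arrives as PANGAEA tab-delimited text, it can also be loaded without hodor-python. The sketch below assumes the usual PANGAEA layout (a `/* ... */` description block followed by a tab-separated table with one header row); the function name and the file name in the usage comment are hypothetical, and the columns that hold the file links should be checked in the downloaded files.

```python
import io

import pandas as pd


def parse_pangaea_tab(text: str) -> pd.DataFrame:
    """Parse PANGAEA tab-delimited text, skipping the leading /* ... */ header block."""
    if text.startswith("/*"):
        end = text.find("*/", 2)
        if end == -1:
            raise ValueError("Unterminated PANGAEA header block")
        # The data table starts on the line after the closing "*/"
        text = text[end + 2:].lstrip("\r\n")
    return pd.read_csv(io.StringIO(text), sep="\t")


# Usage (hypothetical file name from the extracted ZIP archive):
# from pathlib import Path
# meta = parse_pangaea_tab(Path("sonar_metadata.tab").read_text(encoding="utf-8"))
# print(meta.columns)
```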

In a nutshell, this is how it can be done:

Bulk Downloads (PANGAEA registration required):

Single File Downloads (no registration required):

🔧 Tools

🐍 Python

The HODOR dataset has a dedicated, installable Python package for easy data access and analysis: hodor-python. This package provides a pandas-based interface for working with the dataset metadata and species activity counts, and is maintained in a separate repository: https://github.com/gboeer/hodor_python

Installation

Install directly from PyPI:

pip install hodor-python

Or if you're using uv:

uv add hodor-python

Quick Start

from hodor_python import HODOR_Dataset, Species

# Create dataset instance (downloads metadata automatically)
hodor = HODOR_Dataset(dataset_folder="HODOR")

# Access activity counts as pandas DataFrame
print(hodor.counts.head())

# Filter by species activity
cod_sequences = hodor.counts[hodor.counts[Species.FISH_COD] > 0]

# Download the camera and sonar videos for a specific sequence ID (42)
hodor.download_sequence(42)
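The single-species filter above extends naturally to whole groups of labels. A minimal pandas sketch, assuming the counts table uses the metadata labels from the mapping table further down (fish_..., jellyfish_..., bird_..., crab_...) as plain string column names; the helper name and the toy numbers in the test data are made up for illustration:

```python
import pandas as pd


def sequences_with_any(counts: pd.DataFrame, prefix: str) -> pd.DataFrame:
    """Return the rows where at least one column starting with `prefix` has a count > 0."""
    cols = [c for c in counts.columns if str(c).startswith(prefix)]
    if not cols:
        # No matching labels: return an empty frame with the same columns
        return counts.iloc[0:0]
    return counts[(counts[cols] > 0).any(axis=1)]


# Usage with the dataset instance from the Quick Start (column names assumed):
# jelly_sequences = sequences_with_any(hodor.counts, "jellyfish_")
```

If hodor-python exposes its columns under different keys (for example `Species` enum members), adapt the prefix test accordingly.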

Features

  • Easy data access: Load HODOR metadata into pandas DataFrames
  • Species filtering: Built-in Species enum for consistent filtering
  • Automatic downloads: Integration with pangaeapy for seamless data retrieval
  • Example notebooks: Complete usage examples in meta/hodor_python

🧠 MATLAB

  • visualizeSequence.m: Displays all sensor data simultaneously (see the animated example above)

  • analyzeCompleteSet: Iterates over all video files in the dataset and computes statistical measurements

💬 Join the Discussion

We welcome your feedback, questions, and contributions!

  • 👉 Use the Issues tab for bug reports and feature requests

  • 👉 Start a thread in Discussions to share research ideas or get support

  • 🤘 Fork the repo and send us a pull request to contribute new tools, improvements, or annotations

Let’s build the future of underwater perception together. 🌊🤿

Data Availability

🔹 The dataset is published at PANGAEA: https://doi.org/10.1594/PANGAEA.980000

🔹 Read the paper: https://ieeexplore.ieee.org/ielx8/10347231/10677474/11121653.pdf

🔹 The associated abiotic measurements: https://doi.org/10.1594/PANGAEA.973019

Discrepancy Between Published Species Counts and HODOR Metadata

The species counts listed in Table 1 of the HODOR publication were adopted from the previous study [1]. These values were derived using the hybrid estimation algorithm described in that work, which combines the camera-based MaxN detections with sonar detections and extrapolates the near-field (camera) observations to a larger field of view.

In contrast, the metadata provided with the HODOR dataset contains only the partially filtered camera-based MaxN counts, without the extrapolated sonar component.
Consequently, the absolute species counts obtained when analyzing the released dataset will differ from those reported in the publication’s Table 1.
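MaxN is the maximum number of individuals of a taxon visible in any single frame of an observation. It avoids counting the same fish twice across frames, but it is a conservative abundance measure. A minimal sketch, assuming per-frame detection counts keyed by metadata label; the input format and function name are illustrative, not the pipeline used to produce the HODOR metadata:

```python
from typing import Dict, Iterable, Mapping


def max_n(frame_counts: Iterable[Mapping[str, int]]) -> Dict[str, int]:
    """MaxN per label: the largest count of that label observed in any single frame."""
    result: Dict[str, int] = {}
    for counts in frame_counts:
        for label, n in counts.items():
            result[label] = max(result.get(label, 0), n)
    return result


frames = [{"fish_cod": 1}, {"fish_cod": 3, "fish_scad": 2}, {"fish_cod": 2}]
print(max_n(frames))
# {'fish_cod': 3, 'fish_scad': 2}
```

Summing the per-frame cod detections in this toy example would give 6, while MaxN reports 3, which is why MaxN-based metadata counts are not directly comparable with extrapolated estimates.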

Additionally, the naming conventions in the metadata differ slightly from those used in the publication.
The following mappings apply:

Metadata label          Publication name
fish_unspecified        Unspecified fish
fish_clupeidae          Clupeidae
fish_cod                Gadus morhua
fish_mackerel           Scomber scombrus
fish_salmonidae         Salmonidae
fish_pipefish           Syngnathinae
fish_plaice             Pleuronectes platessa
fish_scad               Trachurus trachurus
jellyfish_unspecified   Unspecified jellyfish
jellyfish_ctenophora    Mnemiopsis leidyi
jellyfish_cyanea        Cyanea sp.
jellyfish_aurelia       Aurelia aurita
bird_cormorant          Cormorant
crab_crustacea          Crustacea
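When comparing your own analysis with the publication, the label mapping can be applied directly to a counts table. A small sketch, assuming the labels appear as plain string column names; the constant and helper names are hypothetical:

```python
import pandas as pd

# Metadata label -> name used in the HODOR publication (mapping listed above)
LABEL_TO_PUBLICATION_NAME = {
    "fish_unspecified": "Unspecified fish",
    "fish_clupeidae": "Clupeidae",
    "fish_cod": "Gadus morhua",
    "fish_mackerel": "Scomber scombrus",
    "fish_salmonidae": "Salmonidae",
    "fish_pipefish": "Syngnathinae",
    "fish_plaice": "Pleuronectes platessa",
    "fish_scad": "Trachurus trachurus",
    "jellyfish_unspecified": "Unspecified jellyfish",
    "jellyfish_ctenophora": "Mnemiopsis leidyi",
    "jellyfish_cyanea": "Cyanea sp.",
    "jellyfish_aurelia": "Aurelia aurita",
    "bird_cormorant": "Cormorant",
    "crab_crustacea": "Crustacea",
}


def to_publication_names(counts: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with metadata-label columns renamed; unmapped columns are kept as-is."""
    return counts.rename(columns=LABEL_TO_PUBLICATION_NAME)
```

Renaming only aligns the labels. It does not reconcile the values: the released metadata holds camera-based MaxN counts, not the hybrid estimates reported in Table 1.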

Finally, due to quality-based filtering in the released HODOR dataset, the following species reported in the publication is not present in the available video material:
Anguilla anguilla (reported in the publication with one occurrence).

📄 Cite Us

If you use HODOR in your research, please cite:

@ARTICLE{11121653,
  author={Wilts, Thomas and Böer, Gordon and Winkler, Julian and Cisewski, Boris and Schramm, Hauke and Badri-Hoeher, Sabah},
  journal={IEEE Data Descriptions}, 
  title={Descriptor: Hydroacoustic and Optical Dataset for Oceanic Research (HODOR)}, 
  year={2025},
  volume={2},
  number={},
  pages={262-270},
  keywords={Sonar;Cameras;Optical sensors;Optical imaging;Fish;Optical recording;Acoustics;Synchronization;Sonar measurements;Baltic Sea;camera;sonar;stereo camera},
  doi={10.1109/IEEEDATA.2025.3596913}}

About

Accessory code and meta information to the HODOR dataset
