
Best practice for handling unevenly sampled (jagged) observations in Black-Box models? (SimpleBbAsciiFile length constraint & Documentation inquiry) #473

@hurys20

Description


Hello OpenDA developers/community,

I am currently integrating a 2D hydrodynamic and water quality model (CE-QUAL-W2) into OpenDA using the BlackBoxWrapper. Since the model is not natively supported, I am using Python scripts to bridge the model I/O with OpenDA.

Currently, I am using org.openda.blackbox.io.SimpleBbAsciiFile for model results and noosObserver for observations. However, I have hit a severe architectural limitation regarding real-world, unevenly sampled data.

The Physical Scenario (The "Jagged Data" Problem):
In our real-world reservoir monitoring (multiple sites and multiple depths), the sampling frequency is naturally uneven.
For example, within a 60-day simulation window:

Site_A_Depth_0.5m might have 15 observations.

Site_B_Depth_10.0m might have 8 observations.

Site_C_Depth_90.0m might only have 2 observations.

The Technical Bottleneck:
My Python bridge successfully extracts a full, continuous 60-day time series for all simulation points and writes them into model_results.output. However, SimpleBbAsciiFile seems to enforce a strict length contract based on the .noo files.

If the .noo file for Site_B has 8 records, OpenDA throws an error when parsing the 60-day model output:

```
Error preparing algorithm.
Error message: expecting vector of length 8 for time, but length was 60
...
at org.openda.blackbox.io.SimpleBbAsciiFile.initialize
```
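For reference, the matching we would like to happen somewhere (whether inside OpenDA or in our bridge script) is essentially "pick the model values at each site's own observation timestamps". A minimal sketch with synthetic numbers (all day numbers and values below are made up for illustration; this is plain NumPy, not an OpenDA class):

```python
import numpy as np

# Hypothetical data: a continuous 60-day model series for one site,
# and Site_B's 8 observation days (values are illustrative only).
model_days = np.arange(1, 61)                 # 60 daily model outputs
model_temp = 20.0 + 0.05 * model_days         # synthetic temperature series
obs_days = np.array([3, 10, 17, 24, 31, 38, 45, 52])  # 8 sampling days

# Keep only the model values at the observed timestamps, so the series
# written for Site_B has length 8 and satisfies the length contract.
mask = np.isin(model_days, obs_days)
matched_temp = model_temp[mask]

assert matched_temp.shape == (8,)
```

Doing this per site in the Python bridge avoids the error, but it duplicates logic that (we hope) the framework's observation operator could handle generically.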

Workarounds we considered (but are not ideal):

Intersection Method: Only keep the exact dates on which all sites were monitored simultaneously. (This forces all .noo files to be the exact same length, but we would lose a massive amount of valuable field data.)

Scalarization Method: Abandon timeSeries entirely and treat every single observation point at every specific timestamp as an independent, length-1 scalar variable (e.g., SiteA_Day1, SiteA_Day2). This explodes the XML configuration and loses the semantic meaning of a time series.
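To illustrate how lossy the intersection method is with our sampling pattern, here is a toy computation (the per-site day numbers are invented for illustration, matching the 15/8/2 counts above):

```python
# Hypothetical per-site observation dates (day numbers within the
# 60-day window; invented for illustration).
site_a_days = {1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57}  # 15 obs
site_b_days = {5, 13, 21, 29, 37, 45, 53, 57}                            # 8 obs
site_c_days = {29, 57}                                                   # 2 obs

# Keep only the dates on which every site was monitored simultaneously.
common_days = sorted(site_a_days & site_b_days & site_c_days)

# Only the 2 dates shared with the sparsest site survive; every other
# field visit at Site_A and Site_B would be discarded.
```

In other words, the intersection is bounded by the sparsest site, so the deepest, least-sampled stations dictate how much of the richer surface data we can use.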

My Questions:

Advanced I/O Handling: Is there a more advanced generic IO class (e.g., a generic NetCDF wrapper) or a specific Observation Operator configuration in OpenDA that allows us to feed a full, continuous model time series (e.g., length 60) and let OpenDA automatically interpolate or pick the matching timestamps based on the varying lengths of individual .noo files?
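To clarify what we mean by "automatically interpolate or pick the matching timestamps": where observation times do not fall exactly on the model's output grid, linear interpolation in time would be the natural generalization of exact matching. A sketch of that variant (synthetic values; `np.interp` is standard NumPy, shown only to describe the desired behaviour, not an OpenDA API):

```python
import numpy as np

# Continuous 60-day model output for one site (synthetic values).
model_days = np.arange(1.0, 61.0)             # length 60
model_conc = np.linspace(2.0, 8.0, 60)        # synthetic concentration

# Site_B's 8 observation times; note some fall between model outputs.
site_b_obs_days = np.array([3.0, 10.0, 17.5, 24.0, 31.0, 38.5, 45.0, 52.0])

# Interpolate the model series onto the observation times, yielding a
# length-8 predicted vector to compare against the length-8 .noo file.
model_at_obs = np.interp(site_b_obs_days, model_days, model_conc)
```

If OpenDA already provides this time-matching in a generic IO class or observation operator (e.g. via a NetCDF-based wrapper), a pointer to it would resolve our problem directly.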

Best Practices: How do advanced models integrated via the Black-Box approach usually handle varying observation frequencies across different measurement vectors without triggering the length mismatch error?

Documentation & Manuals: Is there a continuously updated reference manual or comprehensive documentation for OpenDA's built-in wrapper classes and algorithms? We often find ourselves unsure about what newer classes might exist, what their underlying constraints/limitations are, how to correctly format their XML paradigms, and how they compare to one another. A detailed guide would greatly help us fully utilize the framework's potential.

Any guidance, documentation links, or examples pointing to a more advanced I/O handler for Black-Box models would be greatly appreciated. Thank you!
