Best practice for handling unevenly sampled (jagged) observations in Black-Box models? (SimpleBbAsciiFile length constraint & Documentation inquiry) #473
Description
Hello OpenDA developers/community,
I am currently integrating a 2D hydrodynamic and water quality model (CE-QUAL-W2) into OpenDA using the BlackBoxWrapper. Since the model is not natively supported, I am using Python scripts to bridge the model I/O with OpenDA.
Currently, I am using org.openda.blackbox.io.SimpleBbAsciiFile for model results and noosObserver for observations. However, I have hit a severe architectural limitation regarding real-world, unevenly sampled data.
The Physical Scenario (The "Jagged Data" Problem):
In our real-world reservoir monitoring (multiple sites and multiple depths), the sampling frequency is naturally uneven.
For example, within a 60-day simulation window:
Site_A_Depth_0.5m might have 15 observations.
Site_B_Depth_10.0m might have 8 observations.
Site_C_Depth_90.0m might only have 2 observations.
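To make the mismatch concrete, here is a small illustrative sketch (synthetic data, hypothetical site names matching the example above) of the "jagged" layout: each monitoring point has its own unevenly spaced sampling times inside the same 60-day window, while the model produces one value per day for every point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Uneven observation times (days since simulation start) per monitoring point.
obs_times = {
    "Site_A_Depth_0.5m":  np.sort(rng.choice(np.arange(60.0), size=15, replace=False)),
    "Site_B_Depth_10.0m": np.sort(rng.choice(np.arange(60.0), size=8,  replace=False)),
    "Site_C_Depth_90.0m": np.sort(rng.choice(np.arange(60.0), size=2,  replace=False)),
}

# The model, by contrast, outputs a full daily series for every point.
model_times = np.arange(60.0)

for site, t in obs_times.items():
    print(site, len(t), "observations vs", len(model_times), "model values")
```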
The Technical Bottleneck:
My Python bridge successfully extracts a full, continuous 60-day time series for all simulation points and writes them into model_results.output. However, SimpleBbAsciiFile seems to enforce a strict length contract based on the .noo files.
If the .noo file for Site_B has 8 records, OpenDA throws an error when parsing the 60-day model output:
"Error preparing algorithm.
Error message: expecting vector of length 8 for time, but length was 60
...
at org.openda.blackbox.io.SimpleBbAsciiFile.initialize"
Workarounds we considered (but are not ideal):
Intersection Method: Keep only the exact dates on which all sites were monitored simultaneously. (This forces all .noo files to be the same length, but we would lose a large amount of valuable field data.)
Scalarization Method: Abandon timeSeries entirely and treat every observation at every timestamp as an independent, length-1 scalar variable (e.g., SiteA_Day1, SiteA_Day2). This explodes the XML configuration and loses the semantic meaning of the time series.
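A third option we could implement on our side (a sketch only, not an OpenDA feature): have the Python bridge interpolate the continuous model series onto each site's observation timestamps before writing model_results.output, so every written series has exactly as many records as the corresponding .noo file. The helper name `subsample_to_obs` and the synthetic series below are ours for illustration.

```python
import numpy as np

def subsample_to_obs(model_times, model_values, obs_times):
    """Linearly interpolate a continuous model series onto the (uneven)
    observation timestamps of one monitoring point, so the series written
    to model_results.output matches the length of the matching .noo file."""
    return np.interp(obs_times, model_times, model_values)

# Example: a 60-day daily model series vs. 8 uneven observation times.
model_times = np.arange(60.0)                          # days 0..59
model_values = 20.0 + 5.0 * np.sin(model_times / 9.0)  # synthetic temperature curve
obs_times = np.array([1.0, 5.5, 12.0, 20.0, 33.0, 41.5, 50.0, 58.0])

matched = subsample_to_obs(model_times, model_values, obs_times)
assert matched.shape == obs_times.shape  # length 8, matching the .noo file
```

This keeps the length contract satisfied without discarding field data, but it moves the interpolation outside OpenDA, which is why we would prefer a built-in mechanism if one exists.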
My Questions:
Advanced I/O Handling: Is there a more advanced generic IO class (e.g., a generic NetCDF wrapper) or a specific Observation Operator configuration in OpenDA that allows us to feed a full, continuous model time series (e.g., length 60) and let OpenDA automatically interpolate or pick the matching timestamps based on the varying lengths of individual .noo files?
Best Practices: How do advanced models integrated via the Black-Box approach usually handle varying observation frequencies across different measurement vectors without triggering the length mismatch error?
Documentation & Manuals: Is there a continuously updated reference manual or comprehensive documentation for OpenDA's built-in wrapper classes and algorithms? We often find ourselves unsure which newer classes exist, what their underlying constraints and limitations are, how to correctly write their XML configurations, and how they compare to one another. A detailed guide would greatly help us fully utilize the framework's potential.
Any guidance, documentation links, or examples pointing to a more advanced I/O handler for Black-Box models would be greatly appreciated. Thank you!