Enhance ResamplerConfig with closed and label options and add comprehensive tests#1344
Enhance ResamplerConfig with closed and label options and add comprehensive tests#1344malteschaaf wants to merge 7 commits intofrequenz-floss:v1.x.xfrom
Conversation
| ) | ||
| minimum_relevant_timestamp = timestamp - period * conf.max_data_age_in_periods | ||
|
|
||
| min_index = bisect( |
There was a problem hiding this comment.
I think the behavior how to resample, i.e. left or right open and the labeling should be config parameters.
There was a problem hiding this comment.
I think we should make left or right opened configurable with the corresponding label, such as:
right open: [t,t+1) -> labeled as t
left open: (t-1,t] -> labeled as t (the current behavior)
Or do you want to allow the user to additionally do something like:
right open, label in the end: [t,t+1) -> labeled as t+1
left open, label in the beginning:(t-1,t] -> labeled as t-1
Because I think the later could lead to a lot of confusion (not sure if its ever needed)
There was a problem hiding this comment.
I think the latter is also reasonable options (see e.g. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html), but don't see a strong reason to implement this now if not needed. If it's well-documented, the users can also adjust the timestamps trivially. So your proposal sounds good to me.
There was a problem hiding this comment.
Yeah, whatever we do, we should probably be much more explicit of how output samples are calculated and structured.
llucax
left a comment
There was a problem hiding this comment.
In general LGTM, I think this would need quite some testing, and we probably want to introduce this using a config option to change the behavior, specially if configurability is something we want to allow in the future (if we don't see having this level of configurability useful, we can hide it in a feature flag using a env var like we did with the new wall clock timer). As we experimented recently with the changes in the component graph, subtle changes in behavior can go easily unnoticed and cause issues.
| ) | ||
| minimum_relevant_timestamp = timestamp - period * conf.max_data_age_in_periods | ||
|
|
||
| min_index = bisect( |
There was a problem hiding this comment.
Yeah, whatever we do, we should probably be much more explicit of how output samples are calculated and structured.
83c6774 to
96dbff9
Compare
closed and label options and add comprehensive tests
closed and label options and add comprehensive tests|
Ready for Review 🧐 Changes implemented:
|
57865ce to
5bd2919
Compare
5bd2919 to
297ec7c
Compare
b39f3de to
1231852
Compare
|
Ready for Review 🧐 Changes implemented:
|
1231852 to
d99614a
Compare
|
Ready for Review again Changes:
|
d99614a to
f745efc
Compare
daniel-zullo-frequenz
left a comment
There was a problem hiding this comment.
LGTM. I'll leave to Luca to review it and approve it
|
Sorry for the delay, will have a look now. |
There was a problem hiding this comment.
LGTM, a few very nitpick comments, and one that is moderately important to me (using match instead of if). I know it might look irrelevant, but it looks to me like one of these "64k is enough for everyone" situation, where it seems impossible that anyone could add a new enum member... until someone does.
- Introduced the `WindowSide` enum with values `LEFT` and `RIGHT`: - `LEFT`: Represents the start of the resampling window. - `RIGHT`: Represents the end of the resampling window. - Exposed the `WindowSide` enum in the `__init__.py` of the timeseries module for public use. Signed-off-by: Malte Schaaf <malte.schaaf@frequenz.com>
…havior - Introduced the `closed` parameter in ResamplerConfig with options `WindowSide.RIGHT` (default) and `WindowSide.LEFT`. - Updated the resampling logic to respect the `closed` configuration: - `RIGHT`: Includes samples at the end of the window, excludes those at the start. - `LEFT`: Includes samples at the start of the window, excludes those at the end. - Adjusted documentation to reflect the new `closed` parameter and its behavior. Signed-off-by: Malte Schaaf <malte.schaaf@frequenz.com>
… resampled data - Introduced the `label` parameter in ResamplerConfig with options `WindowSide.RIGHT` (default) and `WindowSide.LEFT`. - Updated the resampling logic to respect the `label` configuration: - `RIGHT`: The timestamp of the resampled data corresponds to the end of the resampling window. - `LEFT`: The timestamp of the resampled data corresponds to the start of the resampling window. - Adjusted the logic for setting `sample_time` to use the `label` configuration. - Updated documentation to reflect the new `label` parameter and its behavior. Signed-off-by: Malte Schaaf <malte.schaaf@frequenz.com>
- Enhanced the `test_resampler_closed_option` to include additional samples at 2.5, 3, and 4 seconds. - Verified the behavior of the `closed` option (`WindowSide.RIGHT` and `WindowSide.LEFT`) with the extended timeline. - Added assertions to ensure correct resampling function calls and sink outputs for the new samples. - Confirmed that source properties and buffer length are updated correctly after processing the additional samples. Signed-off-by: Malte Schaaf <malte.schaaf@frequenz.com>
…havior - Introduced `test_resampler_label_option` to validate the `label` configuration in ResamplerConfig. - Tested both `WindowSide.LEFT` and `WindowSide.RIGHT` options to ensure the resampled datas timestamp corresponds to the start or end of the resampling window, respectively. - Verified sink outputs with the expected timestamp and resampled value. Signed-off-by: Malte Schaaf <malte.schaaf@frequenz.com>
Improve the docstring of the `_StreamingHelper`s `resample` method to better describe how the timestamp parameter is used in resampling calculations. The timestamp now clearly represents the upper bound for buffered samples considered in the resampling operation. Signed-off-by: Malte Schaaf <malte.schaaf@frequenz.com>
Signed-off-by: Malte Schaaf <malte.schaaf@frequenz.com>
|
READY FOR REVIEW
|
|
Unfortunately I have no rights to merge myself @llucax |
This PR introduces the
WindowSideenum and two new configuration options—closedandlabel—to theResamplerConfigclass, providing a clearer and more flexible way to define the behavior of the resampling process. Comprehensive tests have been added to ensure correctness.Changes:
Introduce
WindowSideenumWindowSideenum with valuesLEFTandRIGHT:LEFT: Represents the start of a resampling window.RIGHT: Represents the end of a resampling window.WindowSidein the__init__.pyof the timeseries module for public use.Add
closedoption to ResamplerConfigclosedparameter with optionsWindowSide.RIGHT(default) andWindowSide.LEFT.closedconfiguration:RIGHT: Includes samples at the end of the window, excludes those at the start.LEFT: Includes samples at the start of the window, excludes those at the end.closedparameter.Add
labeloption to ResamplerConfiglabelparameter with optionsWindowSide.RIGHT(default) andWindowSide.LEFT.labelconfiguration:RIGHT: Timestamp of the resampled data corresponds to the end of the resampling window.LEFT: Timestamp of the resampled data corresponds to the start of the resampling window.sample_timebased on thelabelconfiguration.Add tests for
closedandlabelclosedbehavior with bothWindowSide.RIGHTandWindowSide.LEFT, including additional sample points.labelbehavior to ensure timestamps of resampled data correspond correctly to the start or end of the window.