Add extended precision/rounding functionality and clarify the asymm unc signs (#300)
Conversation
…onvention and requirements for asymm uncertainties
Thanks, but the test …

… a tuple. Fixes the failing test.
Codecov Report ❌ Patch coverage is …

Additional details and impacted files:

```
@@ Coverage Diff @@
##             main     #300      +/-   ##
==========================================
+ Coverage   92.21%   92.45%   +0.23%
==========================================
  Files           5        5
  Lines        1118     1153      +35
  Branches      237      254      +17
==========================================
+ Hits         1031     1066      +35
+ Misses         60       59       -1
- Partials       27       28       +1
```
…: it was not actually treating the error sources independently. Fixed now, and the tests are passing as well as manually sanity-checked.

@GraemeWatt I've taken a look at the "failed" code coverage -- as far as I can see, from https://app.codecov.io/gh/HEPData/hepdata_lib/pull/300/blob/hepdata_lib/helpers.py the new functions I added are fully covered; it's really complaining about the pre-existing …
Pull request overview
This PR extends hepdata_lib’s numeric rounding/precision utilities to support array-based rounding (including multi-uncertainty rounding) and updates the user documentation to clarify the signed convention for asymmetric uncertainties (down/up variations).
Changes:
- Add `get_number_size`, `round_multiple`, and new array-oriented value/uncertainty rounding helpers.
- Refactor/regroup helper functions (file/command helpers vs numerical helpers) and adjust related tests.
- Clarify asymmetric uncertainty sign conventions in the usage documentation.
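For context, HEPData records asymmetric uncertainties as signed (down, up) pairs, so statistical "down" variations normally carry an explicit minus sign. A minimal sketch of the convention using plain tuples (the `format_asymm` helper is hypothetical, independent of hepdata_lib's own classes):

```python
# Signed (down, up) convention for asymmetric uncertainties:
# the "down" delta is usually negative, the "up" delta positive.
def format_asymm(value, down, up):
    """Render a value with its signed down/up variations (illustrative only)."""
    return f"{value} {down:+g} / {up:+g}"

measurement = format_asymm(5.0, -0.3, 0.4)  # -> "5.0 -0.3 / +0.4"
```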
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| hepdata_lib/helpers.py | Adds new precision/rounding helpers and reorganizes helper sections; updates rounding logic to support array inputs and multiple uncertainties. |
| tests/test_helpers.py | Adds unit tests for the new size/rounding helpers and array-based rounding functions. |
| docs/usage.rst | Updates documentation to clarify signed asymmetric-uncertainty conventions (down/up deltas). |
```python
# get_number_size docstring fragments:
#   :param value: number to evaluate
#   :type value: float or tuple[float]
#   :returns: order of magnitude (rounded-up power of 10) of ``value``,
#       normally integer except in the zero-value failure mode

# precision bookkeeping from a neighbouring helper in the same diff:
value_precision = get_number_precision(value)
absolute_digits = -value_precision + relative_digits  # pylint: disable=invalid-unary-operand-type

def round_value_and_multiple_uncertainties_arrs(vals, unclists, sig_digits_unc=2):
    """
    Round values and multiple uncertainty sources according to the precision of the
    largest uncertainty, and also round each (asymm) uncertainty to a given number
    of significant digits.

    Named with the _arrs suffix as the pre-existing, canonically named
    versions operate on dicts from the ROOT reader.

    The rounding of each error source is performed independently, with at least one
    sd always shown. The smallest precision encountered in the error set (i.e. the
    largest uncertainty component) is used to define the precision of the value's
    rounding. At least one sd of the value will always be reported, though 100% errors
    are not commonly published.

    :param cont: dictionary as returned e.g. by ``RootFileReader::read_hist_1d()``
    :type cont: dictionary
    :param sig_digits_unc: how many significant digits used to round the uncertainty
    :type sig_digits_unc: integer
    """
```

```python
# File helpers regrouped in the same diff:
def check_file_size(path_to_file, upper_limit=None, lower_limit=None):
    """
    Check that the file size is between the upper and lower limits.
    If not, raise RuntimeError.

    :param path_to_file: File path to check.
    :type path_to_file: string
    :param upper_limit: Upper size limit in MB.
    :type upper_limit: float
    :param lower_limit: Lower size limit in MB.
    :type lower_limit: float
    """

    # (body lines from an adjacent existence-check helper in the same hunk)
    if not os.path.exists(path_to_file):
        raise RuntimeError("Cannot find file: " + path_to_file)
    return True
```
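The documented rule (each source rounded independently to a number of significant digits, the largest source setting the value's precision) can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the library code: `number_size` and `round_value_and_uncs` are hypothetical names, and `number_size` assumes the "rounded-up power of 10" behaviour described in the docstrings, for nonzero inputs only.

```python
import math

def number_size(x):
    """Rounded-up power of 10 of |x|: 0.567 -> 0, 0.0123 -> -1, 123.0 -> 3."""
    return math.floor(math.log10(abs(x))) + 1

def round_value_and_uncs(value, uncs, sig_digits_unc=2):
    # round each uncertainty independently to sig_digits_unc significant digits
    precisions = [sig_digits_unc - number_size(u) for u in uncs]
    rounded_uncs = [round(u, p) for u, p in zip(uncs, precisions)]
    # the largest uncertainty (smallest decimal precision) sets the value's precision
    rounded_value = round(value, min(precisions))
    return rounded_value, rounded_uncs
```

For example, `round_value_and_uncs(1.2345, [0.0123, 0.567])` keeps three decimals on the small source (0.012) but rounds the value to the two decimals implied by the dominant 0.567 source (1.23, 0.57).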
```python
def get_value_size_wrt_reference(value, reference, size_for_zero=float("nan")):
    """
    Like the get_value_precision_wrt_reference but calling get_number_size
    rather than get_number_precision, and with the optional zero-return
    option of the former.

    ``value`` and ``reference`` are both float and/or int;
    ``value`` can be float when reference is an int and vice-versa.

    :param value: first value
    :type value: float, int
    :param reference: reference value (usually the uncertainty on value)
    :type reference: float, int
    :param size_for_zero: the size value to be used for zero-valued ``value`` or ``reference``
    :type size_for_zero: float, int
    """
    this_function = "get_value_size_wrt_reference()"
    good_types = [int, float]
    arguments = [value, reference]

    # first check all arguments have appropriate type
    for input_arg in arguments:
        if not any(isinstance(input_arg, x) for x in good_types):
            raise ValueError("Unsupported input type passed to " + this_function)

    return get_number_size(value, size_for_zero) - get_number_size(reference, size_for_zero)
```

```python
# Pre-existing dict-rounding helper interleaved in the same diff view:
"""
round all values in a dictionary to some decimals in one go;
default round to 3 digits after period.
possible use case: correlations where typical values are within -1,1

:param cont: dictionary as returned e.g. by RootFileReader::read_hist_1d()
:type cont: dictionary
:param decimals: how many decimals for the rounding
:type decimals: integer
"""
decimals = int(decimals)
for i, val in enumerate(cont[key]):
    if isinstance(val, tuple):
        cont[key][i] = (round(val[0], decimals), round(val[1], decimals))
    else:
        cont[key][i] = round(val, decimals)
```
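Assuming `get_number_size` returns the rounded-up power of 10 described above, the size difference is just the gap in orders of magnitude between a value and its uncertainty. A self-contained illustration (with a local stand-in `number_size`, not the library function):

```python
import math

def number_size(x):
    """Assumed get_number_size behaviour: rounded-up power of 10 of |x|."""
    return math.floor(math.log10(abs(x))) + 1

# e.g. a value of 1234.5 with uncertainty 0.57:
# number_size(1234.5) = 4, number_size(0.57) = 0,
# so the value sits four orders of magnitude above its uncertainty.
size_diff = number_size(1234.5) - number_size(0.57)
```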
```python
subprocess_args = {
    "args": command,
    "stdin": subprocess.PIPE,
    "stdout": subprocess.PIPE,
    "stderr": subprocess.PIPE,
    "shell": True,
    "universal_newlines": True
}
with subprocess.Popen(**subprocess_args) as proc:
    exit_code = proc.wait()
    if exit_code == 127:
        print("Command does not exist:", command)
        return False
    if exit_code != 0:
        result = ""
        for line in proc.stderr:
            result = result + line
        raise RuntimeError(result)
    return True


def convert_pdf_to_png(source, target):
    """
    Wrapper for the ImageMagick convert utility.

    :param source: Source file in PDF format.
    :type source: str
    :param target: Output file in PNG format.
    :type target: str
    """
    assert os.path.exists(source), f"Source file does not exist: {source}"

    command = f"convert -flatten -density 300 -fuzz 1% -trim +repage {source} {target}"
    command_ok = execute_command(command)
```
```python
def file_is_outdated(file_path, reference_file_path):
    """
    Check if the given file is outdated compared to the reference file.

    Also returns true if the reference file does not exist.

    :param file_path: Path to the file to check.
    :type file_path: str
    :param reference_file_path: Path to the reference file.
    :type reference_file_path: str
    """
    if not os.path.exists(reference_file_path):
        raise RuntimeError(f"Reference file does not exist: {reference_file_path}")
    if not os.path.exists(file_path):
```

```python
# handle tuples like get_number_precision does
if isinstance(value, tuple):
    return tuple(get_number_size(x) for x in value)
```
```python
try:  #< if this fails, uncs isn't iterable -> fall back to scalar
    # get orders of magnitude of each component
    unc_orders = [get_number_size(u) for u in uncs]
    # base the nominal precision on the target number of sd's on the largest component
    ptarget = -int(np.nanmax(unc_orders)) + sig_digits
    # customise the precisions for each component (if instructed to prevent rounding to zero)
    ptargets = [(max(ptarget, -uo+1) if no_round_to_zero else ptarget) for uo in unc_orders]
    # do the (maybe custom) rounding
    newuncs = [round(u, ptargets[i]) for (i, u) in enumerate(uncs)]
    # return as a tuple if the input was a tuple (for ROOT use-case & test consistency)
    if isinstance(uncs, tuple):
        newuncs = tuple(newuncs)
    return newuncs, ptargets if no_round_to_zero else ptarget
except TypeError:
    unc_order = get_number_size(uncs)
    newunc = relative_round(uncs, sig_digits)
    return newunc, [-unc_order + sig_digits]
```
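The `no_round_to_zero` branch can be exercised with a self-contained sketch. This is an assumption-laden reimplementation, not the PR's code: `get_number_size` is rewritten here for nonzero scalars only (rounded-up power of 10), and plain `max()` stands in for `np.nanmax`.

```python
import math

def get_number_size(x):
    """Assumed rounded-up power of 10: 0.567 -> 0, 0.004 -> -2 (nonzero x only)."""
    return math.floor(math.log10(abs(x))) + 1

def round_multiple_sketch(uncs, sig_digits=2, no_round_to_zero=False):
    # order of magnitude of each uncertainty component
    unc_orders = [get_number_size(u) for u in uncs]
    # the largest component defines the shared target precision
    ptarget = -max(unc_orders) + sig_digits
    # optionally loosen the precision per component so none rounds to zero
    ptargets = [max(ptarget, -uo + 1) if no_round_to_zero else ptarget
                for uo in unc_orders]
    newuncs = [round(u, p) for u, p in zip(uncs, ptargets)]
    return newuncs, ptargets if no_round_to_zero else ptarget
```

With `uncs=[0.567, 0.004]` the default rounds the small source away to 0.0, while `no_round_to_zero=True` keeps it as 0.004 at its own (looser) precision.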
|
@agbuckley : yes, the reordered functions cause issues for the code coverage and the AI review (just initiated). Can you please look through the AI review comments and address ones relevant to your code changes (if meaningful)? At least the first docstring comments seem valid. You can check Read the Docs for the PR branch. What about adding links from the Uncertainties section to the docs for the relevant helper functions in the code documentation to make them more visible? Maybe @cippy could also review the PR, then I usually leave the final sign-off and merge to @clelange. (It's a CERN holiday today.)
|
Hopefully all sufficient now.
GraemeWatt
left a comment
Looks good to me, but I'll leave it to @clelange to approve and merge.
I'm currently quite busy with detector operations, but it's on my list and I'll try to get to it soon. Thanks for your patience (and contribution and review)!
```python
for iu, u in enumerate(uncs_ipt):
    u_rnd, uprecisions = round_multiple(u, sig_digits_unc, True)
    unclists[iu][ipt] = u_rnd
    minuncprecision = int(np.nanmin(np.hstack((uprecisions, minuncprecision))))
```
Correct me if I'm wrong, but it might be quick to test as well for you: I think a zero uncertainty source can still dominate the precision chosen for the central value here. For example, with vals=[1.234], unclists=[[0.0], [0.0123]], sig_digits_unc=2, line 417 builds uncs_ipt=[0.0, 0.0123]. The first source goes through round_multiple at line 421; because round_multiple(0.0, 2, True) returns precision [0] at lines 332-333, this line here sets minuncprecision to 0. The second source returns precision [3], but minuncprecision stays 0, then line 425 sets valprecision=0 and line 426 rounds 1.234 to 1.0. Could zero-sized uncertainty sources be ignored when there is at least one finite nonzero uncertainty source, with an explicit all-zero fallback?
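A possible shape for the suggested fix, as a hypothetical helper (not part of the PR): ignore zero-sized sources when deriving the value precision, with an explicit all-zero fallback.

```python
import math

def value_precision_from_uncs(uncs, sig_digits_unc=2):
    """Hypothetical sketch: derive the value's rounding precision from the
    largest *nonzero, finite* uncertainty; return None if every source is zero."""
    nonzero = [abs(u) for u in uncs if u != 0.0 and math.isfinite(u)]
    if not nonzero:
        return None  # all-zero fallback: caller leaves the value untouched
    size = math.floor(math.log10(max(nonzero))) + 1  # rounded-up power of 10
    return sig_digits_unc - size
```

With the example above, `uncs=[0.0, 0.0123]` yields precision 3, so 1.234 would stay 1.234 instead of collapsing to 1.0.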
Hi @clelange, thanks for the detailed debugging! I agree -- this is an unintended consequence. I'll rethink the replacement of all-NaN returns from round_multiple, which was written with the all-error-sources-null case in mind; then I had to restructure to handle asymm errors, and it broke the following logic without showing up in my tests. Update soon.
Now done. Tests extended and passing.
```python
uncs[i], uncprecisions = round_multiple(unc, sig_digits_unc, True)
valprecision = -get_number_size(val) + 1
vals[i] = round(val, max(int(np.nanmin(uncprecisions)), valprecision))
```
This changes the behavior of the existing public round_value_and_uncertainty helper for zero scalar uncertainties. For example, round_value_and_uncertainty({"v": [1234.56], "u": [0.0]}, "v", "u", 2) used to leave the value as 1234.56 because relative_round(0.0, 2) returned 0.0 and the old value precision calculation effectively preserved the central value. With this change round_multiple(0.0, 2, True) returns precision [0], so this rounds the value with 0 decimal places and produces 1235.0. Is that intended? If so, could we add an explicit test and documentation for the new zero-uncertainty behavior; otherwise this should probably preserve the previous public-helper behavior. This would also help with coverage, I think.
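Concretely, the behavioural change described here amounts to the value being rounded at 0 decimal places when the zero uncertainty reports precision [0]:

```python
# Old behaviour: a zero scalar uncertainty left the central value untouched.
old_behaviour = 1234.56
# New behaviour: round_multiple(0.0, 2, True) reports precision [0],
# so the value is rounded to 0 decimal places.
new_behaviour = round(1234.56, 0)  # -> 1235.0
```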
Agreed. I've fixed this as well, and added a new optional argument to all of them to set a sig-figs rounding precision for values in the presence of completely null uncertainties (the chances of this actually meaning perfect experimental precision are zero, but the user needs to be the one giving the direction in such a case!)
```python
#assert isinstance(cont, dict)
round_value_and_uncertainty_arrs(cont[val_key], cont[unc_key], sig_digits_unc)
```
This is where round_value_and_uncertainty_arrs gets called in round_value_and_uncertainty, see my comment above
…a value now preserve full value precision, unless a new optional arg is used to set a target sig figs for such cases
Thanks @agbuckley!
Closes #113.
Extended value-precision functions and tidying, cf. the recent #113 discussion. In particular, the functions have been regrouped so as not to mix up the file helpers and data helpers -- sorry, this makes the diff more complicated.

Also added doc text to clarify the asymm-uncertainty convention and hence the (usual) need for minus signs on statistical "down" variations.
📚 Documentation preview 📚: https://hepdata-lib--300.org.readthedocs.build/en/300/