Add extended precision/rounding functionality and clarify the asymm unc signs (#300)
Conversation
…onvention and requirements for asymm uncertainties
Thanks, but the test …

… a tuple. Fixes the failing test.
Codecov Report ❌ Patch coverage is …

Additional details and impacted files:

```
@@ Coverage Diff @@
##             main     #300      +/-   ##
==========================================
+ Coverage   92.21%   92.45%   +0.23%
==========================================
  Files           5        5
  Lines        1118     1153      +35
  Branches      237      254      +17
==========================================
+ Hits         1031     1066      +35
+ Misses         60       59       -1
- Partials       27       28       +1
```
…: it was not actually treating the error sources independently. Fixed now, and the tests are passing as well as manually sanity-checked.

@GraemeWatt I've taken a look at the "failed" code coverage -- as far as I can see, from https://app.codecov.io/gh/HEPData/hepdata_lib/pull/300/blob/hepdata_lib/helpers.py the new functions I added are fully covered; it's really complaining about the pre-existing …
Pull request overview
This PR extends hepdata_lib’s numeric rounding/precision utilities to support array-based rounding (including multi-uncertainty rounding) and updates the user documentation to clarify the signed convention for asymmetric uncertainties (down/up variations).
Changes:
- Add `get_number_size`, `round_multiple`, and new array-oriented value/uncertainty rounding helpers.
- Refactor/regroup helper functions (file/command helpers vs numerical helpers) and adjust related tests.
- Clarify asymmetric uncertainty sign conventions in the usage documentation.
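For context, HEPData records asymmetric uncertainties as signed (down, up) pairs, so statistical "down" variations normally carry an explicit minus sign. A minimal sketch of the convention using plain tuples (the `format_asymm` helper is hypothetical, independent of hepdata_lib's own classes):

```python
# Signed (down, up) convention for asymmetric uncertainties:
# the "down" delta is usually negative, the "up" delta positive.
def format_asymm(value, down, up):
    """Render a value with its signed down/up variations (illustrative only)."""
    return f"{value} {down:+g} / {up:+g}"

measurement = format_asymm(5.0, -0.3, 0.4)  # -> "5.0 -0.3 / +0.4"
```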
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| hepdata_lib/helpers.py | Adds new precision/rounding helpers and reorganizes helper sections; updates rounding logic to support array inputs and multiple uncertainties. |
| tests/test_helpers.py | Adds unit tests for the new size/rounding helpers and array-based rounding functions. |
| docs/usage.rst | Updates documentation to clarify signed asymmetric-uncertainty conventions (down/up deltas). |
```python
# get_number_size docstring fragments:
#   :param value: number to evaluate
#   :type value: float or tuple[float]
#   :returns: order of magnitude (rounded-up power of 10) of ``value``,
#       normally integer except in the zero-value failure mode

# precision bookkeeping from a neighbouring helper in the same diff:
value_precision = get_number_precision(value)
absolute_digits = -value_precision + relative_digits  # pylint: disable=invalid-unary-operand-type

def round_value_and_multiple_uncertainties_arrs(vals, unclists, sig_digits_unc=2):
    """
    Round values and multiple uncertainty sources according to the precision of the
    largest uncertainty, and also round each (asymm) uncertainty to a given number
    of significant digits.

    Named with the _arrs suffix as the pre-existing, canonically named
    versions operate on dicts from the ROOT reader.

    The rounding of each error source is performed independently, with at least one
    sd always shown. The smallest precision encountered in the error set (i.e. the
    largest uncertainty component) is used to define the precision of the value's
    rounding. At least one sd of the value will always be reported, though 100% errors
    are not commonly published.

    :param cont: dictionary as returned e.g. by ``RootFileReader::read_hist_1d()``
    :type cont: dictionary
    :param sig_digits_unc: how many significant digits used to round the uncertainty
    :type sig_digits_unc: integer
    """
```

```python
# File helpers regrouped in the same diff:
def check_file_size(path_to_file, upper_limit=None, lower_limit=None):
    """
    Check that the file size is between the upper and lower limits.
    If not, raise RuntimeError.

    :param path_to_file: File path to check.
    :type path_to_file: string
    :param upper_limit: Upper size limit in MB.
    :type upper_limit: float
    :param lower_limit: Lower size limit in MB.
    :type lower_limit: float
    """

    # (body lines from an adjacent existence-check helper in the same hunk)
    if not os.path.exists(path_to_file):
        raise RuntimeError("Cannot find file: " + path_to_file)
    return True
```
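The documented rule (each source rounded independently to a number of significant digits, the largest source setting the value's precision) can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the library code: `number_size` and `round_value_and_uncs` are hypothetical names, and `number_size` assumes the "rounded-up power of 10" behaviour described in the docstrings, for nonzero inputs only.

```python
import math

def number_size(x):
    """Rounded-up power of 10 of |x|: 0.567 -> 0, 0.0123 -> -1, 123.0 -> 3."""
    return math.floor(math.log10(abs(x))) + 1

def round_value_and_uncs(value, uncs, sig_digits_unc=2):
    # round each uncertainty independently to sig_digits_unc significant digits
    precisions = [sig_digits_unc - number_size(u) for u in uncs]
    rounded_uncs = [round(u, p) for u, p in zip(uncs, precisions)]
    # the largest uncertainty (smallest decimal precision) sets the value's precision
    rounded_value = round(value, min(precisions))
    return rounded_value, rounded_uncs
```

For example, `round_value_and_uncs(1.2345, [0.0123, 0.567])` keeps three decimals on the small source (0.012) but rounds the value to the two decimals implied by the dominant 0.567 source (1.23, 0.57).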
```python
def get_value_size_wrt_reference(value, reference, size_for_zero=float("nan")):
    """
    Like the get_value_precision_wrt_reference but calling get_number_size
    rather than get_number_precision, and with the optional zero-return
    option of the former.

    ``value`` and ``reference`` are both float and/or int;
    ``value`` can be float when reference is an int and vice-versa.

    :param value: first value
    :type value: float, int
    :param reference: reference value (usually the uncertainty on value)
    :type reference: float, int
    :param size_for_zero: the size value to be used for zero-valued ``value`` or ``reference``
    :type size_for_zero: float, int
    """
    this_function = "get_value_size_wrt_reference()"
    good_types = [int, float]
    arguments = [value, reference]

    # first check all arguments have appropriate type
    for input_arg in arguments:
        if not any(isinstance(input_arg, x) for x in good_types):
            raise ValueError("Unsupported input type passed to " + this_function)

    return get_number_size(value, size_for_zero) - get_number_size(reference, size_for_zero)
```

```python
# Pre-existing dict-rounding helper interleaved in the same diff view:
"""
round all values in a dictionary to some decimals in one go;
default round to 3 digits after period.
possible use case: correlations where typical values are within -1,1

:param cont: dictionary as returned e.g. by RootFileReader::read_hist_1d()
:type cont: dictionary
:param decimals: how many decimals for the rounding
:type decimals: integer
"""
decimals = int(decimals)
for i, val in enumerate(cont[key]):
    if isinstance(val, tuple):
        cont[key][i] = (round(val[0], decimals), round(val[1], decimals))
    else:
        cont[key][i] = round(val, decimals)
```
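Assuming `get_number_size` returns the rounded-up power of 10 described above, the size difference is just the gap in orders of magnitude between a value and its uncertainty. A self-contained illustration (with a local stand-in `number_size`, not the library function):

```python
import math

def number_size(x):
    """Assumed get_number_size behaviour: rounded-up power of 10 of |x|."""
    return math.floor(math.log10(abs(x))) + 1

# e.g. a value of 1234.5 with uncertainty 0.57:
# number_size(1234.5) = 4, number_size(0.57) = 0,
# so the value sits four orders of magnitude above its uncertainty.
size_diff = number_size(1234.5) - number_size(0.57)
```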
```python
subprocess_args = {
    "args": command,
    "stdin": subprocess.PIPE,
    "stdout": subprocess.PIPE,
    "stderr": subprocess.PIPE,
    "shell": True,
    "universal_newlines": True
}
with subprocess.Popen(**subprocess_args) as proc:
    exit_code = proc.wait()
    if exit_code == 127:
        print("Command does not exist:", command)
        return False
    if exit_code != 0:
        result = ""
        for line in proc.stderr:
            result = result + line
        raise RuntimeError(result)
    return True


def convert_pdf_to_png(source, target):
    """
    Wrapper for the ImageMagick convert utility.

    :param source: Source file in PDF format.
    :type source: str
    :param target: Output file in PNG format.
    :type target: str
    """
    assert os.path.exists(source), f"Source file does not exist: {source}"

    command = f"convert -flatten -density 300 -fuzz 1% -trim +repage {source} {target}"
    command_ok = execute_command(command)
```
```python
def file_is_outdated(file_path, reference_file_path):
    """
    Check if the given file is outdated compared to the reference file.

    Also returns true if the reference file does not exist.

    :param file_path: Path to the file to check.
    :type file_path: str
    :param reference_file_path: Path to the reference file.
    :type reference_file_path: str
    """
    if not os.path.exists(reference_file_path):
        raise RuntimeError(f"Reference file does not exist: {reference_file_path}")
    if not os.path.exists(file_path):
```

```python
# handle tuples like get_number_precision does
if isinstance(value, tuple):
    return tuple(get_number_size(x) for x in value)
```
```python
try:  #< if this fails, uncs isn't iterable -> fall back to scalar
    # get orders of magnitude of each component
    unc_orders = [get_number_size(u) for u in uncs]
    # base the nominal precision on the target number of sd's on the largest component
    ptarget = -int(np.nanmax(unc_orders)) + sig_digits
    # customise the precisions for each component (if instructed to prevent rounding to zero)
    ptargets = [(max(ptarget, -uo+1) if no_round_to_zero else ptarget) for uo in unc_orders]
    # do the (maybe custom) rounding
    newuncs = [round(u, ptargets[i]) for (i, u) in enumerate(uncs)]
    # return as a tuple if the input was a tuple (for ROOT use-case & test consistency)
    if isinstance(uncs, tuple):
        newuncs = tuple(newuncs)
    return newuncs, ptargets if no_round_to_zero else ptarget
except TypeError:
    unc_order = get_number_size(uncs)
    newunc = relative_round(uncs, sig_digits)
    return newunc, [-unc_order + sig_digits]
```
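The `no_round_to_zero` branch can be exercised with a self-contained sketch. This is an assumption-laden reimplementation, not the PR's code: `get_number_size` is rewritten here for nonzero scalars only (rounded-up power of 10), and plain `max()` stands in for `np.nanmax`.

```python
import math

def get_number_size(x):
    """Assumed rounded-up power of 10: 0.567 -> 0, 0.004 -> -2 (nonzero x only)."""
    return math.floor(math.log10(abs(x))) + 1

def round_multiple_sketch(uncs, sig_digits=2, no_round_to_zero=False):
    # order of magnitude of each uncertainty component
    unc_orders = [get_number_size(u) for u in uncs]
    # the largest component defines the shared target precision
    ptarget = -max(unc_orders) + sig_digits
    # optionally loosen the precision per component so none rounds to zero
    ptargets = [max(ptarget, -uo + 1) if no_round_to_zero else ptarget
                for uo in unc_orders]
    newuncs = [round(u, p) for u, p in zip(uncs, ptargets)]
    return newuncs, ptargets if no_round_to_zero else ptarget
```

With `uncs=[0.567, 0.004]` the default rounds the small source away to 0.0, while `no_round_to_zero=True` keeps it as 0.004 at its own (looser) precision.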
|
@agbuckley : yes, the reordered functions cause issues for the code coverage and the AI review (just initiated). Can you please look through the AI review comments and address ones relevant to your code changes (if meaningful)? At least the first docstring comments seem valid. You can check Read the Docs for the PR branch. What about adding links from the Uncertainties section to the docs for the relevant helper functions in the code documentation to make them more visible? Maybe @cippy could also review the PR, then I usually leave the final sign-off and merge to @clelange. (It's a CERN holiday today.)
|
Hopefully all sufficient now.
GraemeWatt
left a comment
Looks good to me, but I'll leave it to @clelange to approve and merge.
I'm currently quite busy with detector operations, but it's on my list and I'll try to get to it soon. Thanks for your patience (and contribution and review)!
```python
for iu, u in enumerate(uncs_ipt):
    u_rnd, uprecisions = round_multiple(u, sig_digits_unc, True)
    unclists[iu][ipt] = u_rnd
    minuncprecision = int(np.nanmin(np.hstack((uprecisions, minuncprecision))))
```
Correct me if I'm wrong, but it might be quick to test as well for you: I think a zero uncertainty source can still dominate the precision chosen for the central value here. For example, with vals=[1.234], unclists=[[0.0], [0.0123]], sig_digits_unc=2, line 417 builds uncs_ipt=[0.0, 0.0123]. The first source goes through round_multiple at line 421; because round_multiple(0.0, 2, True) returns precision [0] at lines 332-333, this line here sets minuncprecision to 0. The second source returns precision [3], but minuncprecision stays 0, then line 425 sets valprecision=0 and line 426 rounds 1.234 to 1.0. Could zero-sized uncertainty sources be ignored when there is at least one finite nonzero uncertainty source, with an explicit all-zero fallback?
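A possible shape for the suggested fix, as a hypothetical helper (not part of the PR): ignore zero-sized sources when deriving the value precision, with an explicit all-zero fallback.

```python
import math

def value_precision_from_uncs(uncs, sig_digits_unc=2):
    """Hypothetical sketch: derive the value's rounding precision from the
    largest *nonzero, finite* uncertainty; return None if every source is zero."""
    nonzero = [abs(u) for u in uncs if u != 0.0 and math.isfinite(u)]
    if not nonzero:
        return None  # all-zero fallback: caller leaves the value untouched
    size = math.floor(math.log10(max(nonzero))) + 1  # rounded-up power of 10
    return sig_digits_unc - size
```

With the example above, `uncs=[0.0, 0.0123]` yields precision 3, so 1.234 would stay 1.234 instead of collapsing to 1.0.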
Hi @clelange, thanks for the detailed debugging! I agree -- this is an unintended consequence. I'll rethink the replacement of all-NaN returns from round_multiple, which was written with the all-error-sources-null case in mind; then I had to restructure to handle asymm errors, and it broke the following logic without showing up in my tests. Update soon.
Now done. Tests extended and passing.
```python
uncs[i], uncprecisions = round_multiple(unc, sig_digits_unc, True)
valprecision = -get_number_size(val) + 1
vals[i] = round(val, max(int(np.nanmin(uncprecisions)), valprecision))
```
This changes the behavior of the existing public round_value_and_uncertainty helper for zero scalar uncertainties. For example, round_value_and_uncertainty({"v": [1234.56], "u": [0.0]}, "v", "u", 2) used to leave the value as 1234.56 because relative_round(0.0, 2) returned 0.0 and the old value precision calculation effectively preserved the central value. With this change round_multiple(0.0, 2, True) returns precision [0], so this rounds the value with 0 decimal places and produces 1235.0. Is that intended? If so, could we add an explicit test and documentation for the new zero-uncertainty behavior; otherwise this should probably preserve the previous public-helper behavior. This would also help with coverage, I think.
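Concretely, the behavioural change described here amounts to the value being rounded at 0 decimal places when the zero uncertainty reports precision [0]:

```python
# Old behaviour: a zero scalar uncertainty left the central value untouched.
old_behaviour = 1234.56
# New behaviour: round_multiple(0.0, 2, True) reports precision [0],
# so the value is rounded to 0 decimal places.
new_behaviour = round(1234.56, 0)  # -> 1235.0
```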
Agreed. I've fixed this as well, and added a new optional argument to all of them to set a sig-figs rounding precision for values in the presence of completely null uncertainties (the chances of this actually meaning perfect experimental precision are zero, but the user needs to be the one giving the direction in such a case!)
```python
#assert isinstance(cont, dict)
round_value_and_uncertainty_arrs(cont[val_key], cont[unc_key], sig_digits_unc)
```
This is where round_value_and_uncertainty_arrs gets called in round_value_and_uncertainty, see my comment above
…a value now preserve full value precision, unless a new optional arg is used to set a target sig figs for such cases
Thanks @agbuckley!
Closes #113.
Extended value-precision functions and tidying, cf. the recent #113 discussion. In particular, the functions have been regrouped so as not to mix up the file helpers and data helpers -- sorry, this makes the diff more complicated.

Also added doc text to clarify the asymm-uncertainty convention and hence the (usual) need for minus signs on statistical "down" variations.
📚 Documentation preview 📚: https://hepdata-lib--300.org.readthedocs.build/en/300/