
Add extended precision/rounding functionality and clarify the asymm unc signs #300

Merged
clelange merged 10 commits into HEPData:main from agbuckley:ab-precision-multiple-uncs
May 11, 2026

Conversation

@agbuckley
Contributor

@agbuckley agbuckley commented Apr 30, 2026

Closes #113.


Extended value-precision functions and tidying cf. recent #113 discussion. In particular

  • alternative number-size variants with finer control over the size reported for 0
  • value-and-uncertainty rounding operating on array arguments rather than a dict from the ROOT reader
  • a multiple-uncertainty version of the above, performing rounding of the value to match the precision used for the largest uncertainty component
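The array-based rounding described above can be illustrated roughly as follows. This is a minimal sketch with a hypothetical function name, not the actual hepdata_lib implementation, which handles tuples, multiple sources, and zero-uncertainty fallbacks:

```python
import math

def round_value_and_uncertainty_arrs_sketch(vals, uncs, sig_digits_unc=2):
    """Sketch: round each uncertainty to sig_digits_unc significant digits,
    then round the matching value to the same number of decimal places."""
    out_vals, out_uncs = [], []
    for v, u in zip(vals, uncs):
        if u == 0:
            # a zero uncertainty carries no precision information: leave the value alone
            out_vals.append(v)
            out_uncs.append(u)
            continue
        # decimal places giving sig_digits_unc significant digits of u
        ndec = sig_digits_unc - 1 - math.floor(math.log10(abs(u)))
        out_vals.append(round(v, ndec))
        out_uncs.append(round(u, ndec))
    return out_vals, out_uncs
```

For example, a value 12.3456 with uncertainty 0.678 and two significant digits comes out as 12.35 +- 0.68.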

The functions have been regrouped so as not to mix up the file helpers and data helpers -- sorry, this makes the diff more complicated.

Also added doc text to clarify the asymmetric-uncertainty convention and hence the (usual) need for minus signs on statistical "down" variations.
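Concretely, the convention being documented is that asymmetric uncertainties are signed (down, up) pairs (the `Uncertainty(..., is_symmetric=False)` case in hepdata_lib), so a statistical "down" variation normally needs an explicit minus sign. A plain-Python sketch of why the sign matters when the pair is combined with the central value:

```python
# signed (down, up) convention: the "down" delta carries its own minus sign
stat_uncs = [(-0.3, 0.4), (-0.2, 0.2)]

def band(value, downup):
    """Return the (low, high) interval implied by a signed (down, up) pair."""
    down, up = downup
    # the deltas are added as-is, so an unsigned "down" would shift upwards
    return (round(value + down, 6), round(value + up, 6))
```

With a central value of 10.2 and the first pair this gives (9.9, 10.6); had the down variation been written as +0.3, the band would wrongly become (10.5, 10.6).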


📚 Documentation preview 📚: https://hepdata-lib--300.org.readthedocs.build/en/300/

…onvention and requirements for asymm uncertainties
@GraemeWatt
Member

Thanks, but the test test_round_value_and_uncertainty is now failing. The linting also gives three errors. Please add tests for the new function to ensure code coverage and make sure that tests/docs/linting passes on your local instance, as requested at the top of my comment.

@codecov

codecov Bot commented Apr 30, 2026

Codecov Report

❌ Patch coverage is 83.46457% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.45%. Comparing base (1e912d3) to head (23e6a33).

Files with missing lines Patch % Lines
hepdata_lib/helpers.py 83.46% 16 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #300      +/-   ##
==========================================
+ Coverage   92.21%   92.45%   +0.23%     
==========================================
  Files           5        5              
  Lines        1118     1153      +35     
  Branches      237      254      +17     
==========================================
+ Hits         1031     1066      +35     
+ Misses         60       59       -1     
- Partials       27       28       +1     
Flag Coverage Δ
unittests-3.10 90.88% <83.46%> (+0.28%) ⬆️
unittests-3.11 90.88% <83.46%> (+0.28%) ⬆️
unittests-3.12 90.88% <83.46%> (+0.28%) ⬆️
unittests-3.13 90.88% <83.46%> (+0.28%) ⬆️
unittests-3.6 90.45% <83.46%> (+0.30%) ⬆️
unittests-3.7 90.45% <83.46%> (+0.30%) ⬆️
unittests-3.8 92.19% <83.46%> (+0.24%) ⬆️
unittests-3.9 90.63% <83.46%> (+0.29%) ⬆️


…: it was not actually treating the error sources independently. Fixed now, and the tests are passing as well as manually sanity-checked
@agbuckley
Contributor Author

@GraemeWatt I've taken a look at the "failed" code coverage. As far as I can see from https://app.codecov.io/gh/HEPData/hepdata_lib/pull/300/blob/hepdata_lib/helpers.py, the new functions I added are fully covered; the report is really complaining about the pre-existing get_value_size_wrt_reference(), round_value_to_decimals() and round_value_and_uncertainty_to_decimals(). So I think it'd be appropriate to merge anyway, since this patch hasn't actually reduced the level of coverage; it may just have re-highlighted existing gaps.

Copilot AI left a comment

Pull request overview

This PR extends hepdata_lib’s numeric rounding/precision utilities to support array-based rounding (including multi-uncertainty rounding) and updates the user documentation to clarify the signed convention for asymmetric uncertainties (down/up variations).

Changes:

  • Add get_number_size, round_multiple, and new array-oriented value/uncertainty rounding helpers.
  • Refactor/regroup helper functions (file/command helpers vs numerical helpers) and adjust related tests.
  • Clarify asymmetric uncertainty sign conventions in the usage documentation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
hepdata_lib/helpers.py Adds new precision/rounding helpers and reorganizes helper sections; updates rounding logic to support array inputs and multiple uncertainties.
tests/test_helpers.py Adds unit tests for the new size/rounding helpers and array-based rounding functions.
docs/usage.rst Updates documentation to clarify signed asymmetric-uncertainty conventions (down/up deltas).


Comment thread hepdata_lib/helpers.py Outdated
Comment on lines 204 to 209
: param value : number to evaluate
: type value : float or tuple[float]

: returns : order of magnitude (rounded-up power of 10) of ``value``,
    normally integer except in the zero-value failure mode

value_precision = get_number_precision(value)
absolute_digits = -value_precision + relative_digits  # pylint: disable=invalid-unary-operand-type
Comment thread hepdata_lib/helpers.py Outdated
Comment on lines +380 to +399
def round_value_and_multiple_uncertainties_arrs(vals, unclists, sig_digits_unc=2):
    """
    Round values and multiple uncertainty sources according to the precision of the
    largest uncertainty, and also round each (asymm) uncertainty to a given number
    of significant digits.

    Named with the _arrs suffix as the pre-existing, canonically named
    versions operate on dicts from the ROOT reader.

    The rounding of each error source is performed independently, with at least one
    sd always shown. The smallest precision encountered in the error set (i.e. the
    largest uncertainty component) is used to define the precision of the value's
    rounding. At least one sd of the value will always be reported, though 100% errors
    are not commonly published.

    : param cont : dictionary as returned e.g. by ``RootFileReader::read_hist_1d()``
    : type cont : dictionary

    : param sig_digits_unc: how many significant digits used to round the uncertainty
    : type sig_digits_unc: integer

(interleaved diff context from the relocated file helpers:)

def check_file_size(path_to_file, upper_limit=None, lower_limit=None):
    """
    Check that the file size is between the upper and lower limits.
    If not, raise RuntimeError.

    :param path_to_file: File path to check.
    :type path_to_file: string

    :param upper_limit: Upper size limit in MB.
    :type upper_limit: float

    :param lower_limit: Lower size limit in MB.
    :type lower_limit: float

    if not os.path.exists(path_to_file):
        raise RuntimeError("Cannot find file: " + path_to_file)
    return True
Comment thread hepdata_lib/helpers.py
Comment on lines +248 to +276
def get_value_size_wrt_reference(value, reference, size_for_zero=float("nan")):
    """
    Like the get_value_precision_wrt_reference but calling get_number_size
    rather than get_number_precision, and with the optional zero-return
    option of the former.

    ``value`` and ``reference`` are both float and/or int
    ``value`` can be float when reference is an int and vice-versa

    : param value: first value
    : type value: float, int

    : param reference: reference value (usually the uncertainty on value)
    : type reference: float, int

    : param size_for_zero: the size value to be used for zero-valued ``value`` or ``reference``
    : type size_for_zero: float, int
    """

    this_function = "get_value_size_wrt_reference()"
    good_types = [int, float]
    arguments = [value, reference]

    # first check all arguments have appropriate type
    for input_arg in arguments:
        if not any(isinstance(input_arg, x) for x in good_types):
            raise ValueError("Unsupported input type passed to " + this_function)

    return get_number_size(value, size_for_zero) - get_number_size(reference, size_for_zero)

(interleaved diff context from the relocated dict-based rounding helper:)

    round all values in a dictionary to some decimals in one go
    default round to 3 digits after period
    possible use case: correlations where typical values are within -1,1

    : param cont : dictionary as returned e.g. by RootFileReader::read_hist_1d()
    : type cont : dictionary

    : param decimals: how many decimals for the rounding
    : type decimals: integer

    decimals = int(decimals)
    for i, val in enumerate(cont[key]):
        if isinstance(val, tuple):
            cont[key][i] = (round(val[0], decimals), round(val[1], decimals))
        else:
            cont[key][i] = round(val, decimals)
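For orientation, the "size" returned by get_number_size can be read as a rounded-up power of ten. A hypothetical standalone stand-in (not the library code, which also handles tuples):

```python
import math

def number_size(x, size_for_zero=float("nan")):
    """Rounded-up power of 10 of x; size_for_zero is returned for x == 0."""
    if x == 0:
        return size_for_zero
    return math.ceil(math.log10(abs(x)))
```

The wrt-reference helper is then just the difference of two such sizes, e.g. number_size(1.234) - number_size(0.0123) gives 1 - (-1) = 2.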
Comment thread hepdata_lib/helpers.py
Comment on lines 23 to +56
subprocess_args = {
    "args": command,
    "stdin": subprocess.PIPE,
    "stdout": subprocess.PIPE,
    "stderr": subprocess.PIPE,
    "shell": True,
    "universal_newlines": True
}
with subprocess.Popen(**subprocess_args) as proc:
    exit_code = proc.wait()
    if exit_code == 127:
        print("Command does not exist:", command)
        return False
    if exit_code != 0:
        result = ""
        for line in proc.stderr:
            result = result + line
        raise RuntimeError(result)
return True


def convert_pdf_to_png(source, target):
    """
    Wrapper for the ImageMagick convert utility.

    :param source: Source file in PDF format.
    :type source: str
    :param target: Output file in PNG format.
    :type target: str
    """
    assert os.path.exists(source), f"Source file does not exist: {source}"

    command = f"convert -flatten -density 300 -fuzz 1% -trim +repage {source} {target}"
    command_ok = execute_command(command)
Comment thread hepdata_lib/helpers.py
Comment on lines +80 to +93
def file_is_outdated(file_path, reference_file_path):
    """
    Check if the given file is outdated compared to the reference file.

    Also returns true if the reference file does not exist.

    :param file_path: Path to the file to check.
    :type file_path: str
    :param reference_file_path: Path to the reference file.
    :type reference_file_path: str
    """
    if not os.path.exists(reference_file_path):
        raise RuntimeError(f"Reference file does not exist: {reference_file_path}")
    if not os.path.exists(file_path):
Comment thread hepdata_lib/helpers.py Outdated

# handle tuples like get_number_precision does
if isinstance(value, tuple):
    return tuple(get_number_size(x) for x in value)
Comment thread hepdata_lib/helpers.py
Comment on lines +316 to +332
try:  #< if this fails, uncs isn't iterable -> fall back to scalar
    # get orders of magnitude of each component
    unc_orders = [get_number_size(u) for u in uncs]
    # base the nominal precision on the target number of sd's of the largest component
    ptarget = -int(np.nanmax(unc_orders)) + sig_digits
    # customise the precisions for each component (if instructed to prevent rounding to zero)
    ptargets = [(max(ptarget, -uo+1) if no_round_to_zero else ptarget) for uo in unc_orders]
    # do the (maybe custom) rounding
    newuncs = [round(u, ptargets[i]) for (i, u) in enumerate(uncs)]
    # return as a tuple if the input was a tuple (for ROOT use-case & test consistency)
    if isinstance(uncs, tuple):
        newuncs = tuple(newuncs)
    return newuncs, ptargets if no_round_to_zero else ptarget
except TypeError:
    unc_order = get_number_size(uncs)
    newunc = relative_round(uncs, sig_digits)
    return newunc, [-unc_order+sig_digits]
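A self-contained sketch of what the block above does for the iterable case, with get_number_size replaced by a ceil-log10 stand-in and numpy dropped (an illustration of the no_round_to_zero idea, not the library function; it assumes at least one nonzero component):

```python
import math

def round_components(uncs, sig_digits=2, no_round_to_zero=True):
    """Round each component to a precision set by the largest one, optionally
    floored so every nonzero component keeps at least one significant digit."""
    orders = [math.ceil(math.log10(abs(u))) for u in uncs if u != 0]
    assert orders, "sketch assumes at least one nonzero component"
    # common precision: sig_digits significant digits of the largest component
    ptarget = -max(orders) + sig_digits
    out = []
    for u in uncs:
        p = ptarget
        if no_round_to_zero and u != 0:
            # floor the precision so this component keeps >= 1 significant digit
            p = max(ptarget, -math.ceil(math.log10(abs(u))) + 1)
        out.append(round(u, p))
    return out
```

For [0.52, 0.004] with two significant digits, no_round_to_zero=True keeps the small component as 0.004 instead of flattening it to 0.0.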
Comment thread hepdata_lib/helpers.py Outdated
# return as a tuple if the input was a tuple (for ROOT use-case & test consistency)
if isinstance(uncs, tuple):
    newuncs = tuple(newuncs)
return newuncs, ptargets if no_round_to_zero else ptarget
@GraemeWatt
Member

@agbuckley : yes, the reordered functions cause issues for the code coverage and the AI review (just initiated). Can you please look through the AI review comments and address the ones relevant to your code changes (if meaningful)? At least the first docstring comments seem valid. You can check Read the Docs for the PR branch. What about adding links from the Uncertainties section to the docs for the relevant helper functions in the code documentation, to make them more visible? Maybe @cippy could also review the PR; then I usually leave the final sign-off and merge to @clelange. (It's a CERN holiday today.)

@agbuckley
Contributor Author

Hopefully all sufficient now

Member

@GraemeWatt GraemeWatt left a comment


Looks good to me, but I'll leave it to @clelange to approve and merge.

@GraemeWatt GraemeWatt requested a review from clelange May 5, 2026 14:13
@clelange
Copy link
Copy Markdown
Collaborator

clelange commented May 5, 2026

I'm currently quite busy with detector operations, but it's on my list and I'll try to get to it soon. Thanks for your patience (and contribution and review)!

Comment thread hepdata_lib/helpers.py Outdated
for iu, u in enumerate(uncs_ipt):
    u_rnd, uprecisions = round_multiple(u, sig_digits_unc, True)
    unclists[iu][ipt] = u_rnd
    minuncprecision = int(np.nanmin(np.hstack((uprecisions, minuncprecision))))
Collaborator

Correct me if I'm wrong, but it might be quick to test as well for you: I think a zero uncertainty source can still dominate the precision chosen for the central value here. For example, with vals=[1.234], unclists=[[0.0], [0.0123]], sig_digits_unc=2, line 417 builds uncs_ipt=[0.0, 0.0123]. The first source goes through round_multiple at line 421; because round_multiple(0.0, 2, True) returns precision [0] at lines 332-333, this line here sets minuncprecision to 0. The second source returns precision [3], but minuncprecision stays 0, then line 425 sets valprecision=0 and line 426 rounds 1.234 to 1.0. Could zero-sized uncertainty sources be ignored when there is at least one finite nonzero uncertainty source, with an explicit all-zero fallback?
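The fix requested here can be sketched as filtering zero-sized sources out of the precision choice, with an explicit all-zero fallback. This is a hypothetical illustration of the proposed logic, not the merged code:

```python
import math

def value_precision_from_uncs(uncs, sig_digits_unc=2):
    """Decimal places to keep on the central value, ignoring zero-sized sources.
    Returns None when every source is zero (caller keeps full precision)."""
    nonzero = [abs(u) for u in uncs if u != 0.0]
    if not nonzero:
        return None  # all-zero fallback: leave the value untouched
    # precision of the largest (least precise) source at sig_digits_unc sig. digits
    return sig_digits_unc - math.ceil(math.log10(max(nonzero)))
```

For the example above, vals=[1.234] with sources [0.0, 0.0123] now yields precision 3, so the value stays 1.234 instead of collapsing to 1.0.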

Contributor Author

Hi @clelange, thanks for the detailed debugging! I agree, this is an unintended consequence. I'll rethink the replacement of all-NaN returns from round_multiple, which was written with the all-error-sources-null case in mind; when I restructured to handle asymm errors it broke the following logic without showing up in my tests. Update soon

Contributor Author

Now done. Tests extended and passing.

Comment thread hepdata_lib/helpers.py Outdated
Comment on lines +377 to +379
uncs[i], uncprecisions = round_multiple(unc, sig_digits_unc, True)
valprecision = -get_number_size(val)+1
vals[i] = round(val, max(int(np.nanmin(uncprecisions)), valprecision))
Collaborator

This changes the behavior of the existing public round_value_and_uncertainty helper for zero scalar uncertainties. For example, round_value_and_uncertainty({"v": [1234.56], "u": [0.0]}, "v", "u", 2) used to leave the value as 1234.56 because relative_round(0.0, 2) returned 0.0 and the old value precision calculation effectively preserved the central value. With this change round_multiple(0.0, 2, True) returns precision [0], so this rounds the value with 0 decimal places and produces 1235.0. Is that intended? If so, could we add an explicit test and documentation for the new zero-uncertainty behavior; otherwise this should probably preserve the previous public-helper behavior. This would also help with coverage, I think.

Contributor Author

Agreed. I've fixed this as well, and added a new optional argument to all of these functions to set a sig-figs rounding precision for values in the presence of completely null uncertainties (the chances of this actually meaning perfect experimental precision are zero, but the user needs to be the one giving the direction in such a case!)

Comment thread hepdata_lib/helpers.py Outdated
"""
Wrapper for the ImageMagick convert utility.
#assert isinstance(cont, dict)
round_value_and_uncertainty_arrs(cont[val_key], cont[unc_key], sig_digits_unc)
Collaborator

This is where round_value_and_uncertainty_arrs gets called in round_value_and_uncertainty, see my comment above

…a value now preserve full value precision, unless a new optional arg is used to set a target sig figs for such cases
@clelange clelange merged commit 7c8ded7 into HEPData:main May 11, 2026
36 of 37 checks passed
@clelange
Collaborator

Thanks @agbuckley !


Development

Successfully merging this pull request may close these issues.

Precision and number of significant digits

4 participants