fix: std.manifestPython renders floats in CPython 3 repr() style#1006
Open
He-Pin wants to merge 3 commits into
Open
fix: std.manifestPython renders floats in CPython 3 repr() style#1006He-Pin wants to merge 3 commits into
He-Pin wants to merge 3 commits into
Conversation
Motivation:
std.manifestPython rendered non-integer doubles using Java's
Double.toString, producing output like "1.0E-6" (uppercase E, no
exponent sign, no zero-padding) that diverged from Python 3's
repr() ("1e-06") and from go-jsonnet / jrsonnet. It also rendered
very large integer-valued doubles (e.g. 1e100) as 100-digit
fixed-point strings instead of "1e+100". Since std.manifestPython
is explicitly "render as Python source", matching Python 3's
repr() is the canonical behavior.
Modification:
- Override visitFloat64 in PythonRenderer so that float rendering
follows CPython 3's repr() rules:
* integer-valued doubles with |d| < 1e16 emit without a decimal
point (via the existing writeLongDirect path), so 1.0 → "1",
42.0 → "42", 1e15 → "1000000000000000";
* integer-valued doubles with |d| >= 1e16 route through the new
formatPythonFloat path so 1e16 → "1e+16", 1e100 → "1e+100"
(Python 3's repr() switches to scientific at 1e+16);
* negative zero emits as "-0" (preserving the sign bit);
* non-integer doubles use the shortest round-trip form:
scientific notation (lowercase "e", signed exponent, >=2
exponent digits zero-padded) for adjusted exponents outside
[-4, 16), fixed-point otherwise;
* mantissa trailing zeros are stripped (e.g. Java "1.0E20" →
"1e+20", not "1.0e+20").
- formatPythonFloat parses Java's Double.toString output (shortest
round-trip form on JDK 15+, Ryu algorithm) and shifts the decimal
point on the integer mantissa so that exactly one digit precedes
the point. The shift preserves the parsed IEEE 754 value, so
round-trip safety is retained.
- Added manifest_python_repr.jsonnet covering integer-valued doubles
(0.0, 1.0, 42.0, -0.0, 1e15, 1e16), fractional doubles (1.5,
0.0001, pi, 1e-5, 1e-10, 1e-100, 1e20, 1e100), non-float types,
and std.manifestPythonVars (which reuses PythonRenderer).
Result:
std.manifestPython and std.manifestPythonVars now match Python 3
repr() on every covered input. All file tests
(new_test_suite + go_test_suite + test_suite) and EvaluatorTests
pass on Scala 2.12 / 2.13 / 3.3.
Cross-implementation comparison:
| Input | sjsonnet (before) | sjsonnet (after) | Python 3.12 | go-jsonnet 0.22.0 | jrsonnet 0.5.0-pre99 | C++ jsonnet 0.22.0 |
|------------------|-------------------|------------------|-------------|-------------------|----------------------|--------------------|
| 0.000001 | "1.0E-6" | "1e-06" | "1e-06" | "9.999..e-07" | "0.000001" | "9.999..e-07" |
| 1e-10 | "1.0E-10" | "1e-10" | "1e-10" | "1e-10" | "0.0000000001" | "1e-10" |
| 1e16 | "10000000..0" | "1e+16" | "1e+16" | "10000000..0" | "10000000..0" | "10000000..0" |
| 1e20 | "1.0E20" | "1e+20" | "1e+20" | "10000000..0" | "10000000..0" | "10000000..0" |
| 1e100 | 100-digit string | "1e+100" | "1e+100" | 100-digit string | 100-digit string | 100-digit string |
| 1e-4 | "0.0001" | "0.0001" | "0.0001" | "0.0001" | "0.0001" | "0.0001" |
| 1e-5 | "1.0E-5" | "1e-05" | "1e-05" | "1e-05" | "0.00001" | "1e-05" |
| 1.5 | "1.5" | "1.5" | "1.5" | "1.5" | "1.5" | "1.5" |
| 1.0 | "1" | "1" | "1.0" * | "1" | "1" | "1" |
| -0.0 | "-0" | "-0" | "-0.0" * | "-0" | "-0" | "-0" |
| 3.141592653589793| "3.141592653589793"| "3.141592653589793" | "3.141592653589793" | "3.1415926535897931" ** | "3.141592653589793" | "3.1415926535897931" ** |
* Python 3's repr() adds ".0" to integer-valued floats. The existing
sjsonnet/go-jsonnet/jrsonnet/C++ convention is to omit it; this PR
preserves that cross-implementation consensus for 1.0 / -0.0 / 42.0.
** go-jsonnet / C++ jsonnet output one extra digit due to Go's `%g`
formatting; sjsonnet / jrsonnet / Python 3 all agree on the shorter
"3.141592653589793" form.
7 tasks
Motivation: PR databricks#1006's formatPythonFloat relied on JVM-specific Double.toString output format (uppercase 'E', normalized mantissa with 1 digit before decimal). On Scala.js, Double.toString delegates to JavaScript's Number.toString, which uses lowercase 'e', different scientific notation thresholds, and fixed-point format for a wider range of values. This caused CI failures on JS and WASM builds. Modification: Three bugs fixed in PythonRenderer.formatPythonFloat: 1. Parse both uppercase 'E' (JVM) and lowercase 'e' (Scala.js) exponent markers in Double.toString output. 2. Compute adjustedExp using formula rawExp + effectiveDotIdx - firstNonZero - 1, which correctly handles JS fixed-point formats without exponent (e.g. "10000000000000000" for 1e16, "0.000001" for 1e-6) in addition to JVM's normalized scientific notation. 3. Strip leading zeros from fpDigits in the fixed-point adjustedExp < 0 branch to avoid duplicating zeros already present in JS fixed-point format (e.g. "00001" from "0.0001"). Result: All file tests pass on JVM (Scala 2.12/2.13/3.3), JS, and WASM. The manifest_python_repr.jsonnet test fixture now validates correctly across all platforms.
Motivation: Sub-agent review found a regression in the negative fixed-point branch of formatPythonFloat: the padding formula subtracted (sigDigits.length-1) from the leading zero count, producing wrong output for multi-digit significands in [1e-4, 1e-3). E.g. 0.0012 rendered as "0.012" instead of "0.0012". Modification: - Remove incorrect subtraction of (sigDigits.length - 1) from the padding count. The formula -(adjustedExp + 1) alone gives the correct number of zeros between the decimal point and the first significant digit. - Remove dead trailing-zero stripping in the middle fixed-point branch (fpDigits already has trailing zeros stripped upstream). - Add test assertions for 0.0012 and 0.00012 to cover multi-digit significands in the negative fixed-point range. Result: All tests pass on JVM/JS/WASM including the new regression tests.
5298215 to
0284f1c
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
std.manifestPythonpreviously used Java'sDouble.toString,producing
"1.0E-6"instead of Python 3's"1e-06", and rendering1e100as a 100-digit fixed-point string rather than scientificnotation.
Modification
visitFloat64inPythonRendererto match CPython 3repr()rules: integer-valued doubles omit.0; scientific notationuses lowercase
e, signed exponent, zero-padded to 2 digits; fixednotation is used for adjusted exponents in
[-4, 16).E(JVM
Double.toString) and lowercasee(Scala.js) input.manifest_python_repr.jsonnetwith 30+ assertions.Result
std.manifestPythonnow produces output consistent with CPython 3repr()for float values. Cross-implementation comparison showsalignment with go-jsonnet, C++ jsonnet, and jrsonnet. All tests pass
on Scala 2.12 / 2.13 / 3.3 across JVM / JS / Wasm / Native / Graal.
References
repr()for floats:https://docs.python.org/3/library/functions.html#repr