Skip to content

fix: std.manifestPython renders floats in CPython 3 repr() style#1006

Open
He-Pin wants to merge 3 commits into
databricks:masterfrom
He-Pin:worktree-fix-manifest-python-float
Open

fix: std.manifestPython renders floats in CPython 3 repr() style#1006
He-Pin wants to merge 3 commits into
databricks:masterfrom
He-Pin:worktree-fix-manifest-python-float

Conversation

@He-Pin

@He-Pin He-Pin commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Motivation

std.manifestPython previously used Java's Double.toString,
producing "1.0E-6" instead of Python 3's "1e-06", and rendering
1e100 as a 100-digit fixed-point string rather than scientific
notation.

Modification

  • Override visitFloat64 in PythonRenderer to match CPython 3
    repr() rules: integer-valued doubles omit .0; scientific notation
    uses lowercase e, signed exponent, zero-padded to 2 digits; fixed
    notation is used for adjusted exponents in [-4, 16).
  • Cross-platform safe (JVM / JS / WASM) by handling both uppercase E
    (JVM Double.toString) and lowercase e (Scala.js) input.
  • New test fixture manifest_python_repr.jsonnet with 30+ assertions.

Result

std.manifestPython now produces output consistent with CPython 3
repr() for float values. Cross-implementation comparison shows
alignment with go-jsonnet, C++ jsonnet, and jrsonnet. All tests pass
on Scala 2.12 / 2.13 / 3.3 across JVM / JS / Wasm / Native / Graal.

References

Motivation:
std.manifestPython rendered non-integer doubles using Java's
Double.toString, producing output like "1.0E-6" (uppercase E, no
exponent sign, no zero-padding) that diverged from Python 3's
repr() ("1e-06") and from go-jsonnet / jrsonnet. It also rendered
very large integer-valued doubles (e.g. 1e100) as 100-digit
fixed-point strings instead of "1e+100". Since std.manifestPython
is explicitly "render as Python source", matching Python 3's
repr() is the canonical behavior.

Modification:
- Override visitFloat64 in PythonRenderer so that float rendering
  follows CPython 3's repr() rules:
    * integer-valued doubles with |d| < 1e16 emit without a decimal
      point (via the existing writeLongDirect path), so 1.0 → "1",
      42.0 → "42", 1e15 → "1000000000000000";
    * integer-valued doubles with |d| >= 1e16 route through the new
      formatPythonFloat path so 1e16 → "1e+16", 1e100 → "1e+100"
      (Python 3's repr() switches to scientific at 1e+16);
    * negative zero emits as "-0" (preserving the sign bit);
    * non-integer doubles use the shortest round-trip form:
      scientific notation (lowercase "e", signed exponent, >=2
      exponent digits zero-padded) for adjusted exponents outside
      [-4, 16), fixed-point otherwise;
    * mantissa trailing zeros are stripped (e.g. Java "1.0E20" →
      "1e+20", not "1.0e+20").
- formatPythonFloat parses Java's Double.toString output (shortest
  round-trip form on JDK 15+, Ryu algorithm) and shifts the decimal
  point on the integer mantissa so that exactly one digit precedes
  the point. The shift preserves the parsed IEEE 754 value, so
  round-trip safety is retained.
- Added manifest_python_repr.jsonnet covering integer-valued doubles
  (0.0, 1.0, 42.0, -0.0, 1e15, 1e16), fractional doubles (1.5,
  0.0001, pi, 1e-5, 1e-10, 1e-100, 1e20, 1e100), non-float types,
  and std.manifestPythonVars (which reuses PythonRenderer).

Result:
std.manifestPython and std.manifestPythonVars now match Python 3
repr() on every covered input. All file tests
(new_test_suite + go_test_suite + test_suite) and EvaluatorTests
pass on Scala 2.12 / 2.13 / 3.3.

Cross-implementation comparison:
| Input            | sjsonnet (before) | sjsonnet (after) | Python 3.12 | go-jsonnet 0.22.0 | jrsonnet 0.5.0-pre99 | C++ jsonnet 0.22.0 |
|------------------|-------------------|------------------|-------------|-------------------|----------------------|--------------------|
| 0.000001         | "1.0E-6"          | "1e-06"          | "1e-06"     | "9.999..e-07"     | "0.000001"           | "9.999..e-07"      |
| 1e-10            | "1.0E-10"         | "1e-10"          | "1e-10"     | "1e-10"           | "0.0000000001"       | "1e-10"            |
| 1e16             | "10000000..0"     | "1e+16"          | "1e+16"     | "10000000..0"     | "10000000..0"        | "10000000..0"      |
| 1e20             | "1.0E20"          | "1e+20"          | "1e+20"     | "10000000..0"     | "10000000..0"        | "10000000..0"      |
| 1e100            | 100-digit string  | "1e+100"         | "1e+100"    | 100-digit string  | 100-digit string     | 100-digit string   |
| 1e-4             | "0.0001"          | "0.0001"         | "0.0001"    | "0.0001"          | "0.0001"             | "0.0001"           |
| 1e-5             | "1.0E-5"          | "1e-05"          | "1e-05"     | "1e-05"           | "0.00001"            | "1e-05"            |
| 1.5              | "1.5"             | "1.5"            | "1.5"       | "1.5"             | "1.5"                | "1.5"              |
| 1.0              | "1"               | "1"              | "1.0" *     | "1"               | "1"                  | "1"                |
| -0.0             | "-0"              | "-0"             | "-0.0" *    | "-0"              | "-0"                 | "-0"               |
| 3.141592653589793| "3.141592653589793"| "3.141592653589793" | "3.141592653589793" | "3.1415926535897931" ** | "3.141592653589793" | "3.1415926535897931" ** |

* Python 3's repr() adds ".0" to integer-valued floats. The existing
  sjsonnet/go-jsonnet/jrsonnet/C++ convention is to omit it; this PR
  preserves that cross-implementation consensus for 1.0 / -0.0 / 42.0.
** go-jsonnet / C++ jsonnet output one extra digit due to Go's `%g`
  formatting; sjsonnet / jrsonnet / Python 3 all agree on the shorter
  "3.141592653589793" form.
@He-Pin He-Pin marked this pull request as draft June 20, 2026 04:54
Motivation:
PR databricks#1006's formatPythonFloat relied on JVM-specific Double.toString output
format (uppercase 'E', normalized mantissa with 1 digit before decimal).
On Scala.js, Double.toString delegates to JavaScript's Number.toString,
which uses lowercase 'e', different scientific notation thresholds, and
fixed-point format for a wider range of values. This caused CI failures
on JS and WASM builds.

Modification:
Three bugs fixed in PythonRenderer.formatPythonFloat:
1. Parse both uppercase 'E' (JVM) and lowercase 'e' (Scala.js) exponent
   markers in Double.toString output.
2. Compute adjustedExp using formula rawExp + effectiveDotIdx - firstNonZero
   - 1, which correctly handles JS fixed-point formats without exponent
   (e.g. "10000000000000000" for 1e16, "0.000001" for 1e-6) in addition
   to JVM's normalized scientific notation.
3. Strip leading zeros from fpDigits in the fixed-point adjustedExp < 0
   branch to avoid duplicating zeros already present in JS fixed-point
   format (e.g. "00001" from "0.0001").

Result:
All file tests pass on JVM (Scala 2.12/2.13/3.3), JS, and WASM. The
manifest_python_repr.jsonnet test fixture now validates correctly across
all platforms.
@He-Pin He-Pin marked this pull request as ready for review June 20, 2026 07:41
Motivation:
Sub-agent review found a regression in the negative fixed-point branch
of formatPythonFloat: the padding formula subtracted (sigDigits.length-1)
from the leading zero count, producing wrong output for multi-digit
significands in [1e-4, 1e-3). E.g. 0.0012 rendered as "0.012" instead
of "0.0012".

Modification:
- Remove incorrect subtraction of (sigDigits.length - 1) from the
  padding count. The formula -(adjustedExp + 1) alone gives the
  correct number of zeros between the decimal point and the first
  significant digit.
- Remove dead trailing-zero stripping in the middle fixed-point branch
  (fpDigits already has trailing zeros stripped upstream).
- Add test assertions for 0.0012 and 0.00012 to cover multi-digit
  significands in the negative fixed-point range.

Result:
All tests pass on JVM/JS/WASM including the new regression tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant