
Add Spec.Data.Value.Budget benchmark module #7738

Draft
Unisay wants to merge 3 commits into master from yura/issue-2177-value-builtin-budget-spec

Conversation

Unisay (Contributor) commented Apr 23, 2026

What

Adds Spec.Data.Value.Budget under plutus-tx-plugin/test-ledger-api/. The module measures CPU, memory, AST size and flat size for the Value builtins (unsafeDataAsValue, lookupCoin, unionValue) against the pure Plutus Tx Data-backed Value API (valueOf, unionWith). Four shapes, four hit positions each for lookup, plus union-then-lookup:

  • S1 — 1 policy with 1 token.
  • S3 — 3 policies, each with 1 token.
  • S8 — 8 policies, each with 1 token. Tuned so the first-position lookup ratio lands near 1:1 (this is the crossover).
  • S100 — 11 policies, one with 1 token and ten with 10 tokens each.

Each goldenBundle produces .pir, .uplc, and .eval. 108 goldens total under 9.6/. Currency symbols are 28 bytes, token names are 32 bytes.

Why

Picks up IntersectMBO/plutus-private#2177. Aiken community members reported to Philip that switching from pure-Tx Value ops to the new builtins caused regressions on small values. There were no systematic numbers behind the reports, so there was nothing concrete to reproduce or argue about.

Findings

Full write-up posted as a comment on this PR. Two-line version:

  • Lookup has a real crossover. For a first-position hit it lands near N=8 total tokens; below that the builtin wins by 2× to 4×; above it the non-builtin valueOf wins whenever the key is near the front of the list or not present. For last-position hits the builtin keeps winning past N=100.
  • Union has no crossover in this range. unionValue beats unionWith by 15× CPU at S1 and 43× CPU / 677× memory at S100, and the gap grows with N.

The specific small-value regression from the reports does not reproduce under the Plutus Tx plugin. The plugin emits essentially a single-step builtin invocation. Most likely the Aiken reports reflect different compiler output; an Aiken vs Plinth UPLC diff would close that out.

Closes IntersectMBO/plutus-private#2177

Compares the Value builtins (unsafeDataAsValue, lookupCoin, unionValue)
against PlutusLedgerApi.V1.Data.Value's valueOf/unionWith across four
shapes (S1, S3, S8, S100) at four hit positions for lookup, plus
union-then-lookup. 108 goldens under test-ledger-api/.

For IntersectMBO/plutus-private#2177.
Unisay (Contributor, Author) commented Apr 23, 2026

Value builtins vs. pure Plutus Tx — findings (issue #2177)

TL;DR

The lookup path has a crossover around N = 8 total tokens when the lookup key sits at position 0 of the underlying value list. Below that the builtin wins by 2× to 4×. Above it the non-builtin valueOf wins whenever the key is near the front of the list, or not present at all. For last-position hits the builtin keeps winning past N = 100.

Union has no crossover in this range. unionValue beats unionWith by 15× CPU at S1 and 43× CPU / 677× memory at S100, and the gap grows with size.

The small-value regression Philip reported does not reproduce. Under the Plutus Tx plugin the builtin path compiles to roughly one unValueData + lookupCoin pair, which wins at small sizes. The Aiken reports probably reflect different compiler output. A minimal Aiken reproducer we can diff against Plinth would settle it.

Important framing: Value is a list, not a map

PlutusLedgerApi.V1.Data.Value.Value is a newtype over PlutusTx.Data.AssocMap.Map, and Map is itself a newtype over BuiltinList (BuiltinPair BuiltinData BuiltinData):

```haskell
newtype Map k a = Map (BuiltinList (BuiltinPair BuiltinData BuiltinData))
```

No balancing, no hashing, no ordering invariant. The library's own docs are blunt about it (PlutusTx/Data/AssocMap.hs:97-99): "If the Map is not well-defined, the result is the value associated with the left-most occurrence of the key in the list. This operation is O(n)." And Value's flattenValue note (PlutusLedgerApi/V1/Data/Value.hs:426): "the result isn't sorted".

So when the write-up says "position 0", it literally means "head of the underlying BuiltinList". listsToValue (from the testlib) is a straight map over the Haskell list you hand it; the order you wrote is the order you get.
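The left-most-occurrence semantics can be sketched as a plain-Python simulation (hypothetical names; the real library walks a `BuiltinList` of `BuiltinData` pairs, not Python lists):

```python
# Plain-Python simulation of AssocMap's lookup semantics: a linear scan
# that returns the value at the left-most matching key.
def lookup_leftmost(key, pairs):
    for k, v in pairs:
        if k == key:
            return v  # first hit wins, even if duplicates follow
    return None

# Two-level Value lookup: outer list keyed by currency symbol,
# inner list keyed by token name; absent keys default to 0.
def value_of_sim(cs, tn, value):
    inner = lookup_leftmost(cs, value)
    if inner is None:
        return 0
    qty = lookup_leftmost(tn, inner)
    return 0 if qty is None else qty
```

Note that `lookup_leftmost("a", [("a", 1), ("a", 2)])` returns 1: with a non-well-defined map, duplicates after the first match are simply invisible, exactly as the AssocMap docs warn.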

Cost breakdown for a lookup:

  • Position-0 hit: one outer cons unwrap + one equalsData match, then one inner cons unwrap + one equalsData match. Constant in the size of the rest of the value.
  • Position-k hit: k+1 outer cons unwraps + k+1 equalsData comparisons (the last one matches), plus one inner walk of up to m entries. Scales with k.
  • Miss: full outer walk (N+1 cons unwraps, N+1 equalsData comparisons, all failing), then a default-branch return without touching any inner map. Scales with the outer length only.

This is why the non-builtin CPU in the table below is shape-independent for position 0 and shape-dependent for everything else.
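The position-dependence can be made concrete with a toy step counter (my own illustrative helper, not library code) that counts equalsData-style comparisons for the nested scan described above:

```python
# Toy cost model: count equalsData-style key comparisons performed by a
# nested left-to-right lookup over a Value-shaped list of (policy, inner)
# pairs. A position-0 hit touches two entries; a miss walks the whole
# outer list without ever entering an inner one.
def comparisons_for_lookup(cs, tn, value):
    steps = 0
    for policy, inner in value:
        steps += 1                 # one outer key comparison
        if policy == cs:
            for token, _qty in inner:
                steps += 1         # one inner key comparison
                if token == tn:
                    return steps
            return steps           # policy matched, token absent
    return steps                   # miss: outer walk only
```

For an S3-like shape of three single-token policies, a position-0 hit costs 2 comparisons, a last-position hit 4, and a miss 3 — constant in everything except the hit distance.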

Method

Both paths take the same BuiltinData-encoded Value as input (the representation a validator receives from the ledger). The builtin path unwraps via unsafeDataAsValue (internally unValueData) and calls lookupCoin / unionValue. The non-builtin path unwraps via unsafeFromBuiltinData :: Value and calls PlutusLedgerApi.V1.Data.Value.valueOf / unionWith. Each scenario is a goldenBundle producing .pir, .uplc, and .eval goldens (CPU, memory, AST size, flat size).

Shapes:

| Shape | Contents | Total tokens |
|-------|----------|--------------|
| S1 | 1 policy with 1 token (position 0 = ada in testlib convention) | 1 |
| S3 | 3 policies, each with 1 token | 3 |
| S8 | 8 policies, each with 1 token (crossover) | 8 |
| S100 | 11 policies: one with 1 token, ten with 10 tokens each | 101 |

Lookup keys tested:

  • first — head of the outer list (position 0).
  • middle — roughly halfway into the outer list (position N/2).
  • last — the final entry of the outer list.
  • miss — a currency symbol that isn't present at all.

Results: lookup CPU

Ratio is non-builtin / builtin. Ratio > 1 means the builtin is cheaper; ratio < 1 means the non-builtin is cheaper.

| Shape | Position | Builtin CPU | Non-builtin CPU | Ratio | Winner |
|-------|----------|------------:|----------------:|------:|--------|
| S1 | first | 895 629 | 3 387 176 | 3.78× | builtin |
| S1 | miss | 895 629 | 1 875 266 | 2.09× | builtin |
| S3 | first | 1 672 681 | 3 387 176 | 2.03× | builtin |
| S3 | middle | 1 672 681 | 5 145 393 | 3.08× | builtin |
| S3 | last | 1 672 681 | 6 581 773 | 3.93× | builtin |
| S3 | miss | 1 672 681 | 4 748 026 | 2.84× | builtin |
| S8 | first | 3 611 149 | 3 387 176 | 0.94× | parity |
| S8 | middle | 3 611 149 | 9 454 533 | 2.62× | builtin |
| S8 | last | 3 611 149 | 13 763 673 | 3.81× | builtin |
| S8 | miss | 3 611 149 | 11 929 926 | 3.30× | builtin |
| S100 | first | 22 108 153 | 3 387 176 | 0.15× | non-builtin by 6.53× |
| S100 | middle | 22 108 153 | 16 636 433 | 0.75× | non-builtin by 1.33× |
| S100 | last | 22 108 153 | 31 000 233 | 1.40× | builtin |
| S100 | miss | 22 108 153 | 16 239 066 | 0.73× | non-builtin by 1.36× |

Mechanical reading

Two observations fall straight out of the numbers.

First, the builtin CPU within a shape does not depend on hit position. The whole cost sits in unsafeDataAsValue, which has to walk the entire Data to validate its shape and reconstruct a BuiltinValue. lookupCoin afterwards is a single CEK step. Builtin cost is therefore O(total value size), and identical for every position within a given shape.

Second, the non-builtin CPU depends on hit position, not on total size. A first-position hit costs 3 387 176 across every shape, because the outer AssocMap.lookup' exits after one cons. A last-position hit scales with outer-list length plus inner-list length. A miss costs a full outer walk.

The builtin CPU scales roughly linearly in the number of data nodes (policies + tokens): around 200K CPU per node at scale, with a ~500K fixed overhead. The non-builtin CPU tracks hit distance, so lookup_Sn_first is flat at 3 387 176 for every n while lookup_Sn_last grows.
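That linear model can be sanity-checked against the table. The ~200K-per-node and ~500K-fixed constants are the rough figures quoted above; treating "nodes" as policies + tokens is my assumption:

```python
# Rough linear fit for the builtin path's CPU, using the constants quoted
# above (~200K CPU per data node, ~500K fixed). Illustrative only; the
# real cost model is more detailed.
def builtin_cpu_estimate(nodes):
    return 200_000 * nodes + 500_000
```

S8 has 8 policies + 8 tokens = 16 nodes, so the fit predicts 3 700 000 against the measured 3 611 149; S100's 112 nodes predict 22 900 000 against 22 108 153 — within a few percent either way.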

The S8 crossover scenario

S8 (8 single-token policies, 8 tokens total) is the shape where the two paths come within 7% of each other at the first-position lookup: builtin 3 611 149 vs non-builtin 3 387 176. At every other position in S8 the builtin still wins by 2.6× to 3.8×, because those positions force the non-builtin to walk further into the outer list.

The crossover is position-specific. For a first-position hit it's near N=8. For a last-position hit it hasn't been reached at N=100. For a miss it sits somewhere between N=8 and N=100. For union it doesn't happen within any size I looked at.

Results: union-then-lookup

The conservation-of-value pattern: union two BuiltinData-encoded values and read the value of some key in the result. The non-builtin path allocates a fresh nested AssocMap; the builtin path calls unionValue and stops.

| Shape | Builtin CPU | Non-builtin CPU | CPU ratio | Builtin Mem | Non-builtin Mem | Mem ratio |
|-------|------------:|----------------:|----------:|------------:|----------------:|----------:|
| S1 | 1 876 591 | 28 679 559 | 15.3× | 2 279 | 110 486 | 48× |
| S3 | 4 131 831 | 87 810 607 | 21.3× | 2 539 | 302 003 | 119× |
| S8 | 9 766 539 | 313 714 512 | 32.1× | 3 189 | 895 158 | 281× |
| S100 | 79 832 775 | 3 460 294 959 | 43.3× | 11 319 | 7 669 071 | 677× |

The S100 non-builtin union costs 3.46 billion CPU units. That's most of a V3 max-budget block spent on a single conservation check. The builtin path stays at 80 M. Memory is 7.67 M vs 11 K.

unionValue only does the work needed to produce the result BuiltinValue. unionWith (+) walks both outer lists, unions the matching inner lists pointwise through the These algebra, and rebuilds the entire nested structure. Nothing about the size regime makes the non-builtin path cheaper, and the gap grows roughly linearly in N.
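The shape of the non-builtin's extra work can be sketched as a plain-Python simulation (a simplified stand-in for `unionWith (+)` and its These algebra — it walks both outer lists, merges matching inner lists pointwise, and rebuilds the whole nested structure):

```python
# Sketch of the non-builtin union's work: every policy and every token of
# the result is re-materialised, whether or not the caller will read it.
def union_with_sim(v1, v2):
    d1, d2 = dict(v1), dict(v2)
    keys = [k for k, _ in v1] + [k for k, _ in v2 if k not in d1]
    out = []
    for cs in keys:
        inner1 = dict(d1.get(cs, []))
        inner2 = dict(d2.get(cs, []))
        tokens = list(inner1) + [t for t in inner2 if t not in inner1]
        out.append((cs, [(t, inner1.get(t, 0) + inner2.get(t, 0)) for t in tokens]))
    return out
```

Even this toy version makes the asymmetry visible: the rebuild touches every entry of both values, while `unionValue` keeps the intermediate in CEK-heap form.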

Interpretation

Ziyang's hypothesis

From the Slack thread:

> [The regression] is the conversion cost of unValueData. For small values, the non-builtin path wins because valueOf pattern-matches a few levels into the Data and stops.

The small-value half of this does not reproduce. At S1 the builtin is 3.78× faster, not slower. The plugin emits essentially a single-step builtin invocation.

The large-value half is real, though the mechanism is the opposite of what "the conversion dominates at small sizes" would suggest. unsafeDataAsValue's cost grows with total data size, while valueOf's cost is bounded by hit distance. So the crossover favours the non-builtin only once the full-data traversal has grown to match the short-circuit cost at a given position. For a first-position hit, that's around N=8.
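Under the linear fit above, the crossover point follows from simple arithmetic — find the smallest node count where the builtin's size-proportional cost overtakes the non-builtin's constant first-hit cost (constants are the rough fits quoted earlier, not exact cost-model values):

```python
# Crossover estimate: smallest node count at which the linear builtin-path
# cost (per_node * n + fixed) exceeds the non-builtin path's constant
# first-position cost.
def crossover_nodes(per_node, fixed, first_hit_cost):
    n = 1
    while per_node * n + fixed <= first_hit_cost:
        n += 1
    return n
```

`crossover_nodes(200_000, 500_000, 3_387_176)` gives 15 data nodes; with single-token policies contributing roughly two nodes each, that lands at about 8 policies — the S8 shape.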

Why Aiken users might still be right

If Aiken's compiler emits UPLC where the builtin path carries extra overhead (thunks, wrappers, non-inlined intermediates), the crossover shifts left and the builtins can look worse at small N. That's the likeliest explanation for Philip's reports. The Plutus Tx plugin doesn't produce that shape. A minimal Aiken reproducer we can compare UPLC-to-UPLC against Plinth would close this out.

Suggested guidance for V3 users (Plutus Tx plugin)

  • Lookup on small values, up to about 8 total tokens: builtin wins at every position.
  • Lookup on medium values, 8 < N < 100: builtin wins except for the first-position case, where non-builtin edges out past N ≈ 8. The gap is small; prefer builtin unless you've measured your specific shape.
  • Lookup on large values, N ≥ 100: non-builtin valueOf wins for first-position hits, middle-position hits on realistic shapes, and misses. Builtin still wins for last-position hits. If the lookup key is statically known and expected to be near the front (e.g. ada in a sorted Value), prefer non-builtin.
  • Union or any composition that produces a new value: builtin always. The gap widens with size.

V4 impact

Plutus V4 plans to add a Value constructor to Data, which would make unsafeDataAsValue a no-op. The builtin's per-lookup cost would drop back to a single lookupCoin call. The crossover would disappear entirely and the builtin would win at every size, every position. These goldens become the before-picture for that change.

github-actions (bot) commented Apr 23, 2026

Execution Budget Golden Diff

7918873 (master) vs 61b9557


This comment will get updated when changes are made.

Unisay self-assigned this Apr 23, 2026
Adds hand-rolled counterparts that operate directly on raw BuiltinData,
bypassing valueOf's newtype/Maybe wrappers and unionWith's These algebra.
Hand-rolled union additionally skips the zero-filter, exploiting the
ledger invariant that tx-output Values have strictly positive quantities.

18 new bundles across the existing shape matrix (S1, S3, S8, S100):
14 lookup + 4 union-then-lookup, paired with the existing builtin and
non-builtin bundles.

For IntersectMBO/plutus-private#2177.
Unisay (Contributor, Author) commented Apr 24, 2026

Follow-up: hand-rolled variants added

Per the Slack discussion, I added two more paths to the comparison matrix:

  • Hand-rolled lookup operates directly on raw BuiltinData via unsafeDataAsMap / unsafeDataAsB / unsafeDataAsI / equalsByteString. Bypasses the CurrencySymbol and TokenName newtype wrappers, the Maybe wrapping inside AssocMap.lookup, and the withCurrencySymbol continuation that valueOf chains together.
  • Hand-rolled union is a naive O(|m1|·|m2|) double-pass that exploits the positive-quantities invariant to skip the zero-filter. It materialises a fresh BuiltinData for the result and then feeds that into the hand-rolled lookup for the final Integer.

18 new bundles on the existing (S1, S3, S8, S100) × (first, middle, last, miss) matrix. Commit 022384c.
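The structural idea of the hand-rolled lookup can be sketched in plain Python (lists stand in for the raw `BuiltinData` pairs the real code walks via `unsafeDataAsMap` / `unsafeDataAsB` / `unsafeDataAsI`; this is an analogue of the shape of the code, not the code itself):

```python
# Structural analogue of the hand-rolled lookup: a direct nested scan that
# returns the integer quantity with no Maybe/newtype wrapping and no
# continuation plumbing — every intermediate is consumed in place.
def hand_rolled_lookup(cs, tn, value):
    for policy, inner in value:
        if policy == cs:
            for token, qty in inner:
                if token == tn:
                    return qty
            return 0   # policy present, token absent
    return 0           # policy absent entirely
```

Compared with valueOf, there is nothing to allocate on the happy path: no `Just`, no `CurrencySymbol`/`TokenName` wrappers, no `withCurrencySymbol` continuation.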

Lookup CPU

A "Hand-rolled" column is added. All numbers are in thousands of CPU units.

| Shape | Position | Builtin | Non-builtin | Hand-rolled |
|-------|----------|--------:|------------:|------------:|
| S1 | first | **896** | 3 387 | 1 353 |
| S1 | miss | 896 | 1 875 | **885** |
| S3 | first | 1 673 | 3 387 | **1 353** |
| S3 | middle | **1 673** | 5 145 | 1 750 |
| S3 | last | **1 673** | 6 582 | 2 152 |
| S3 | miss | **1 673** | 4 748 | 1 689 |
| S8 | first | 3 611 | 3 387 | **1 353** |
| S8 | middle | 3 611 | 9 455 | **2 956** |
| S8 | last | **3 611** | 13 764 | 4 161 |
| S8 | miss | **3 611** | 11 930 | 3 698 |
| S100 | first | 22 108 | 3 387 | **1 353** |
| S100 | middle | 22 108 | 16 636 | **4 965** |
| S100 | last | 22 108 | 31 000 | **8 983** |
| S100 | miss | 22 108 | 16 239 | **4 903** |

Bold marks the winner for that row.

A few things to call out:

  1. Hand-rolled beats valueOf at every position and every shape. The non-builtin overhead Ziyang flagged in the thread is real and measurable: Maybe wrapping, withCurrencySymbol's continuation call, and the newtype deriving for ToData/UnsafeFromData on CurrencySymbol/TokenName. Stripping all of that gives roughly 2-3× on small shapes and larger gains at S100.

  2. Hand-rolled first-position lookup is essentially a constant 1 353 K CPU across every shape. It short-circuits on the first outer cons and the first inner cons, so nothing downstream is ever touched. Position-bounded, not size-bounded.

  3. Hand-rolled beats the builtin at S3 first-position, at S8 first and middle, and at every position in S100 (it is also marginally ahead at S1 miss); the builtin keeps S3 middle/last/miss, S8 last, and S8 miss. The builtin's unsafeDataAsValue walks the entire data to validate shape and reconstruct a BuiltinValue, so once the value is big enough that "walk everything" costs more than "walk to the hit position", hand-rolled takes over.

  4. Answers Ziyang's question from the thread: valueOf does not compile to optimal UPLC. Hand-rolled has a real advantage on the non-builtin side. Whether that advantage is worth maintaining a separate library of BuiltinData-direct helpers is a product-side judgement, not a Plutus Core one.

Union CPU (and memory)

| Shape | Builtin CPU | Non-builtin CPU | Hand-rolled CPU | Builtin Mem | Non-builtin Mem | Hand-rolled Mem |
|-------|------------:|----------------:|----------------:|------------:|----------------:|----------------:|
| S1 | 1.88 M | 28.68 M | 16.52 M | 2.3 K | 110 K | 59.8 K |
| S3 | 4.13 M | 87.81 M | 76.60 M | 2.5 K | 302 K | 204.6 K |
| S8 | 9.77 M | 313.71 M | 613.29 M | 3.2 K | 895 K | 1 168.8 K |
| S100 | 79.83 M | 3 460.29 M | 11 352.08 M | 11.3 K | 7 669 K | 18 777.5 K |

Two observations:

  1. Hand-rolled beats unionWith at S1 (1.7×) and marginally at S3 (1.15×), then loses at S8 and S100. The zero-filter and These savings matter at tiny sizes but the materialisation cost dominates as shapes grow.

  2. Hand-rolled union never beats the builtin. At S100 the gap is 142×. This matches what Ziyang and Philip both expected from the cost-model side of the thread.

Two caveats on the hand-rolled union:

  • The algorithm is naive. filterMissingOuter does a full containsKey walk per entry, so the filter pass is O(|m1|·|m2|) on its own. A smarter implementation that threads a "consumed" flag or uses a sort-merge could lower this, but a sort-first step isn't free either.
  • The result is materialised as a fresh BuiltinData. The builtin keeps the intermediate in BuiltinValue (CEK-heap) form, which is why its absolute numbers stay so low. Any hand-rolled path that produces BuiltinData output pays that serialisation tax.

If Philip's djed library implements a smarter union with invariant tracking, I'll measure that too and update these numbers. Until then, the story for union is clean: builtin wins at every size, full stop.

Still open from the thread

  • Standalone unValueData overhead per shape (Ziyang's ask from the thread). Will add as a separate commit.
  • Waiting on Philip's djed-library share and chain stats on real-world Value sizes.

Isolates the conversion tax from any downstream operation across the
four shapes (S1, S3, S8, S100). Enables decomposing the builtin-path
cost into `unsafeDataAsValue` + `lookupCoin`.

4 new bundles. Responds to Ziyang's request in the thread.

For IntersectMBO/plutus-private#2177.
Unisay (Contributor, Author) commented Apr 24, 2026

Follow-up: standalone unsafeDataAsValue per shape

Answering Ziyang's ask from the Slack thread. Added a compiled function that evaluates only unsafeDataAsValue bd (returns the BuiltinValue, no downstream op) and measured it across all four shapes. Commit 61b9557.

Decomposition of the builtin path

| Shape | unsafeDataAsValue alone | lookup_*_ada_builtin | Delta (lookupCoin alone) | lookupCoin share |
|-------|------------------------:|---------------------:|-------------------------:|-----------------:|
| S1 | 576 790 | 895 629 | 318 839 | 35.6% |
| S3 | 1 344 398 | 1 672 681 | 328 283 | 19.6% |
| S8 | 3 263 978 | 3 611 149 | 347 171 | 9.6% |
| S100 | 21 732 650 | 22 108 153 | 375 503 | 1.7% |

A few things worth noting:

  1. unsafeDataAsValue scales linearly with value size. From S1 to S100 the value contains ~100× more tokens; the cost grows ~38×. The slope is roughly 200 K CPU per additional policy or token entry in the data structure, which matches what the lookup-path numbers already implied.

  2. lookupCoin on the resulting BuiltinValue is essentially constant at 320–375 K CPU per call, with a tiny upward drift as the value grows (probably a field-access or list-head overhead on the materialised BuiltinValue). It's noise relative to the conversion cost.

  3. At S1 the conversion tax is already 64% of the lookup_ada cost. At S100 it's 98%. So the builtin path is essentially unsafeDataAsValue + a constant.

  4. V4's plan to make unsafeDataAsValue a no-op would, on these numbers, reduce lookup_S100_ada_builtin from 22.1 M to ~0.4 M — a 55× speedup on that particular shape. Broadly: every builtin-path CPU number in the matrix becomes lookupCoin or unionValue alone once unsafeDataAsValue is free.
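The decomposition itself is just subtraction over the measured numbers, which is easy to check (figures copied from the table above):

```python
# Decomposition check: lookupCoin's isolated cost is the measured
# builtin-path total minus the standalone unsafeDataAsValue cost;
# its share is that delta over the total.
def lookup_coin_delta(total_lookup, conversion_only):
    return total_lookup - conversion_only

def lookup_coin_share(total_lookup, conversion_only):
    return 100 * lookup_coin_delta(total_lookup, conversion_only) / total_lookup
```

Plugging in S1 gives a delta of 318 839 (35.6%), and S100 gives 375 503 (1.7%), matching the table.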

Memory

Memory for standalone unsafeDataAsValue stays very small (756 at S1, 3 176 at S100). Almost all of the memory in lookup_*_ada_builtin was also from unsafeDataAsValue (1 257 and 3 677 respectively), so lookupCoin contributes ~500 memory units on top. Again: noise relative to unsafeDataAsValue.

Open from the thread

Waiting on Philip's djed library share for a smarter hand-rolled union, and on his chain-stats for typical Value sizes.
