
Add Spec.Data.Value.Budget benchmark module #7738

Draft
Unisay wants to merge 3 commits into master from yura/issue-2177-value-builtin-budget-spec

Conversation

Unisay (Contributor) commented Apr 23, 2026

What

Adds Spec.Data.Value.Budget under plutus-tx-plugin/test-ledger-api/. The module measures CPU, memory, AST size and flat size for the Value builtins (unsafeDataAsValue, lookupCoin, unionValue) against the pure Plutus Tx Data-backed Value API (valueOf, unionWith). Four shapes, four hit positions each for lookup, plus union-then-lookup:

  • S1 — 1 policy with 1 token.
  • S3 — 3 policies, each with 1 token.
  • S8 — 8 policies, each with 1 token. Tuned so the first-position lookup ratio lands near 1:1 (this is the crossover).
  • S100 — 11 policies, one with 1 token and ten with 10 tokens each.

Each goldenBundle produces .pir, .uplc, and .eval. 108 goldens total under 9.6/. Currency symbols are 28 bytes, token names are 32 bytes.

Why

Picks up IntersectMBO/plutus-private#2177. Aiken community members reported to Philip that switching from pure-Tx Value ops to the new builtins caused regressions on small values. There were no systematic numbers behind the reports, so there was nothing concrete to reproduce or argue about.

Findings

Full write-up posted as a comment on this PR. Two-line version:

  • Lookup has a real crossover. For a first-position hit it lands near N=8 total tokens; below that the builtin wins by 2× to 4×; above it the non-builtin valueOf wins whenever the key is near the front of the list or not present. For last-position hits the builtin keeps winning past N=100.
  • Union has no crossover in this range. unionValue beats unionWith by 15× CPU at S1 and 43× CPU / 677× memory at S100, and the gap grows with N.

The specific small-value regression from the reports does not reproduce under the Plutus Tx plugin. The plugin emits essentially a single-step builtin invocation. Most likely the Aiken reports reflect different compiler output; an Aiken vs Plinth UPLC diff would close that out.

Closes IntersectMBO/plutus-private#2177

Compares the Value builtins (unsafeDataAsValue, lookupCoin, unionValue)
against PlutusLedgerApi.V1.Data.Value's valueOf/unionWith across four
shapes (S1, S3, S8, S100) at four hit positions for lookup, plus
union-then-lookup. 108 goldens under test-ledger-api/.

For IntersectMBO/plutus-private#2177.
Unisay (Contributor, Author) commented Apr 23, 2026

Value builtins vs. pure Plutus Tx — findings (issue #2177)

TL;DR

The lookup path has a crossover around N = 8 total tokens when the lookup key sits at position 0 of the underlying value list. Below that the builtin wins by 2× to 4×. Above it the non-builtin valueOf wins whenever the key is near the front of the list, or not present at all. For last-position hits the builtin keeps winning past N = 100.

Union has no crossover in this range. unionValue beats unionWith by 15× CPU at S1 and 43× CPU / 677× memory at S100, and the gap grows with size.

The small-value regression Philip reported does not reproduce. Under the Plutus Tx plugin the builtin path compiles to roughly one unValueData + lookupCoin pair, which wins at small sizes. The Aiken reports probably reflect different compiler output. A minimal Aiken reproducer we can diff against Plinth would settle it.

Important framing: Value is a list, not a map

PlutusLedgerApi.V1.Data.Value.Value is a newtype over PlutusTx.Data.AssocMap.Map, and Map is itself a newtype over BuiltinList (BuiltinPair BuiltinData BuiltinData):

```haskell
newtype Map k a = Map (BuiltinList (BuiltinPair BuiltinData BuiltinData))
```

No balancing, no hashing, no ordering invariant. The library's own docs are blunt about it (PlutusTx/Data/AssocMap.hs:97-99): "If the Map is not well-defined, the result is the value associated with the left-most occurrence of the key in the list. This operation is O(n)." And Value's flattenValue note (PlutusLedgerApi/V1/Data/Value.hs:426): "the result isn't sorted".

So when the write-up says "position 0", it literally means "head of the underlying BuiltinList". listsToValue (from the testlib) is a straight map over the Haskell list you hand it; the order you wrote is the order you get.
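The left-most-occurrence semantics can be sketched as a plain-Python simulation (hypothetical names; the real library walks a `BuiltinList` of `BuiltinData` pairs, not Python lists):

```python
# Plain-Python simulation of AssocMap's lookup semantics: a linear scan
# that returns the value at the left-most matching key.
def lookup_leftmost(key, pairs):
    for k, v in pairs:
        if k == key:
            return v  # first hit wins, even if duplicates follow
    return None

# Two-level Value lookup: outer list keyed by currency symbol,
# inner list keyed by token name; absent keys default to 0.
def value_of_sim(cs, tn, value):
    inner = lookup_leftmost(cs, value)
    if inner is None:
        return 0
    qty = lookup_leftmost(tn, inner)
    return 0 if qty is None else qty
```

Note that `lookup_leftmost("a", [("a", 1), ("a", 2)])` returns 1: with a non-well-defined map, duplicates after the first match are simply invisible, exactly as the AssocMap docs warn.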

Cost breakdown for a lookup:

  • Position-0 hit: one outer cons unwrap + one equalsData match, then one inner cons unwrap + one equalsData match. Constant in the size of the rest of the value.
  • Position-k hit: k+1 outer cons unwraps + k+1 equalsData comparisons (the last one matches), plus one inner walk of up to m entries. Scales with k.
  • Miss: full outer walk (N+1 cons unwraps, N+1 equalsData comparisons, all failing), then a default-branch return without touching any inner map. Scales with the outer length only.

This is why the non-builtin CPU in the table below is shape-independent for position 0 and shape-dependent for everything else.
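The position-dependence can be made concrete with a toy step counter (my own illustrative helper, not library code) that counts equalsData-style comparisons for the nested scan described above:

```python
# Toy cost model: count equalsData-style key comparisons performed by a
# nested left-to-right lookup over a Value-shaped list of (policy, inner)
# pairs. A position-0 hit touches two entries; a miss walks the whole
# outer list without ever entering an inner one.
def comparisons_for_lookup(cs, tn, value):
    steps = 0
    for policy, inner in value:
        steps += 1                 # one outer key comparison
        if policy == cs:
            for token, _qty in inner:
                steps += 1         # one inner key comparison
                if token == tn:
                    return steps
            return steps           # policy matched, token absent
    return steps                   # miss: outer walk only
```

For an S3-like shape of three single-token policies, a position-0 hit costs 2 comparisons, a last-position hit 4, and a miss 3 — constant in everything except the hit distance.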

Method

Both paths take the same BuiltinData-encoded Value as input (the representation a validator receives from the ledger). The builtin path unwraps via unsafeDataAsValue (internally unValueData) and calls lookupCoin / unionValue. The non-builtin path unwraps via unsafeFromBuiltinData :: Value and calls PlutusLedgerApi.V1.Data.Value.valueOf / unionWith. Each scenario is a goldenBundle producing .pir, .uplc, and .eval goldens (CPU, memory, AST size, flat size).

Shapes:

| Shape | Contents | Total tokens |
|-------|----------|--------------|
| S1 | 1 policy with 1 token (position 0 = ada in testlib convention) | 1 |
| S3 | 3 policies, each with 1 token | 3 |
| S8 | 8 policies, each with 1 token (crossover) | 8 |
| S100 | 11 policies: one with 1 token, ten with 10 tokens each | 101 |

Lookup keys tested:

  • first — head of the outer list (position 0).
  • middle — roughly halfway into the outer list (position N/2).
  • last — the final entry of the outer list.
  • miss — a currency symbol that isn't present at all.

Results: lookup CPU

Ratio is non-builtin / builtin. Ratio > 1 means the builtin is cheaper; ratio < 1 means the non-builtin is cheaper.

| Shape | Position | Builtin CPU | Non-builtin CPU | Ratio | Winner |
|-------|----------|------------:|----------------:|------:|--------|
| S1 | first | 895 629 | 3 387 176 | 3.78× | builtin |
| S1 | miss | 895 629 | 1 875 266 | 2.09× | builtin |
| S3 | first | 1 672 681 | 3 387 176 | 2.03× | builtin |
| S3 | middle | 1 672 681 | 5 145 393 | 3.08× | builtin |
| S3 | last | 1 672 681 | 6 581 773 | 3.93× | builtin |
| S3 | miss | 1 672 681 | 4 748 026 | 2.84× | builtin |
| S8 | first | 3 611 149 | 3 387 176 | 0.94× | parity |
| S8 | middle | 3 611 149 | 9 454 533 | 2.62× | builtin |
| S8 | last | 3 611 149 | 13 763 673 | 3.81× | builtin |
| S8 | miss | 3 611 149 | 11 929 926 | 3.30× | builtin |
| S100 | first | 22 108 153 | 3 387 176 | 0.15× | non-builtin by 6.53× |
| S100 | middle | 22 108 153 | 16 636 433 | 0.75× | non-builtin by 1.33× |
| S100 | last | 22 108 153 | 31 000 233 | 1.40× | builtin |
| S100 | miss | 22 108 153 | 16 239 066 | 0.73× | non-builtin by 1.36× |

Mechanical reading

Two observations fall straight out of the numbers.

First, the builtin CPU within a shape does not depend on hit position. The whole cost sits in unsafeDataAsValue, which has to walk the entire Data to validate its shape and reconstruct a BuiltinValue. lookupCoin afterwards is a single CEK step. Builtin cost is therefore O(total value size), and identical for every position within a given shape.

Second, the non-builtin CPU depends on hit position, not on total size. A first-position hit costs 3 387 176 across every shape, because the outer AssocMap.lookup' exits after one cons. A last-position hit scales with outer-list length plus inner-list length. A miss costs a full outer walk.

The builtin CPU scales roughly linearly in the number of data nodes (policies + tokens): around 200K CPU per node at scale, with a ~500K fixed overhead. The non-builtin CPU tracks hit distance, so lookup_Sn_first is flat at 3 387 176 for every n while lookup_Sn_last grows.
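That linear model can be sanity-checked against the table. The ~200K-per-node and ~500K-fixed constants are the rough figures quoted above; treating "nodes" as policies + tokens is my assumption:

```python
# Rough linear fit for the builtin path's CPU, using the constants quoted
# above (~200K CPU per data node, ~500K fixed). Illustrative only; the
# real cost model is more detailed.
def builtin_cpu_estimate(nodes):
    return 200_000 * nodes + 500_000
```

S8 has 8 policies + 8 tokens = 16 nodes, so the fit predicts 3 700 000 against the measured 3 611 149; S100's 112 nodes predict 22 900 000 against 22 108 153 — within a few percent either way.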

The S8 crossover scenario

S8 (8 single-token policies, 8 tokens total) is the shape where the two paths come within 7% of each other at the first-position lookup: builtin 3 611 149 vs non-builtin 3 387 176. At every other position in S8 the builtin still wins by 2.6× to 3.8×, because those positions force the non-builtin to walk further into the outer list.

The crossover is position-specific. For a first-position hit it's near N=8. For a last-position hit it hasn't been reached at N=100. For a miss it sits somewhere between N=8 and N=100. For union it doesn't happen within any size I looked at.

Results: union-then-lookup

The conservation-of-value pattern: union two BuiltinData-encoded values and read the value of some key in the result. The non-builtin path allocates a fresh nested AssocMap; the builtin path calls unionValue and stops.

| Shape | Builtin CPU | Non-builtin CPU | CPU ratio | Builtin Mem | Non-builtin Mem | Mem ratio |
|-------|------------:|----------------:|----------:|------------:|----------------:|----------:|
| S1 | 1 876 591 | 28 679 559 | 15.3× | 2 279 | 110 486 | 48× |
| S3 | 4 131 831 | 87 810 607 | 21.3× | 2 539 | 302 003 | 119× |
| S8 | 9 766 539 | 313 714 512 | 32.1× | 3 189 | 895 158 | 281× |
| S100 | 79 832 775 | 3 460 294 959 | 43.3× | 11 319 | 7 669 071 | 677× |

The S100 non-builtin union costs 3.46 billion CPU units. That's most of a V3 max-budget block spent on a single conservation check. The builtin path stays at 80 M. Memory is 7.67 M vs 11 K.

unionValue only does the work needed to produce the result BuiltinValue. unionWith (+) walks both outer lists, unions the matching inner lists pointwise through the These algebra, and rebuilds the entire nested structure. Nothing about the size regime makes the non-builtin path cheaper, and the gap grows roughly linearly in N.
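The shape of the non-builtin's extra work can be sketched as a plain-Python simulation (a simplified stand-in for `unionWith (+)` and its These algebra — it walks both outer lists, merges matching inner lists pointwise, and rebuilds the whole nested structure):

```python
# Sketch of the non-builtin union's work: every policy and every token of
# the result is re-materialised, whether or not the caller will read it.
def union_with_sim(v1, v2):
    d1, d2 = dict(v1), dict(v2)
    keys = [k for k, _ in v1] + [k for k, _ in v2 if k not in d1]
    out = []
    for cs in keys:
        inner1 = dict(d1.get(cs, []))
        inner2 = dict(d2.get(cs, []))
        tokens = list(inner1) + [t for t in inner2 if t not in inner1]
        out.append((cs, [(t, inner1.get(t, 0) + inner2.get(t, 0)) for t in tokens]))
    return out
```

Even this toy version makes the asymmetry visible: the rebuild touches every entry of both values, while `unionValue` keeps the intermediate in CEK-heap form.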

Interpretation

Ziyang's hypothesis

From the Slack thread:

> [The regression] is the conversion cost of unValueData. For small values, the non-builtin path wins because valueOf pattern-matches a few levels into the Data and stops.

The small-value half of this does not reproduce. At S1 the builtin is 3.78× faster, not slower. The plugin emits essentially a single-step builtin invocation.

The large-value half is real, though the mechanism is the opposite of what "the conversion dominates at small sizes" would suggest. unsafeDataAsValue's cost grows with total data size, while valueOf's cost is bounded by hit distance. So the crossover favours the non-builtin only once the full-data traversal has grown to match the short-circuit cost at a given position. For a first-position hit, that's around N=8.
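Under the linear fit above, the crossover point follows from simple arithmetic — find the smallest node count where the builtin's size-proportional cost overtakes the non-builtin's constant first-hit cost (constants are the rough fits quoted earlier, not exact cost-model values):

```python
# Crossover estimate: smallest node count at which the linear builtin-path
# cost (per_node * n + fixed) exceeds the non-builtin path's constant
# first-position cost.
def crossover_nodes(per_node, fixed, first_hit_cost):
    n = 1
    while per_node * n + fixed <= first_hit_cost:
        n += 1
    return n
```

`crossover_nodes(200_000, 500_000, 3_387_176)` gives 15 data nodes; with single-token policies contributing roughly two nodes each, that lands at about 8 policies — the S8 shape.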

Why Aiken users might still be right

If Aiken's compiler emits UPLC where the builtin path carries extra overhead (thunks, wrappers, non-inlined intermediates), the crossover shifts left and the builtins can look worse at small N. That's the likeliest explanation for Philip's reports. The Plutus Tx plugin doesn't produce that shape. A minimal Aiken reproducer we can compare UPLC-to-UPLC against Plinth would close this out.

Suggested guidance for V3 users (Plutus Tx plugin)

  • Lookup on small values, up to about 8 total tokens: builtin wins at every position.
  • Lookup on medium values, 8 < N < 100: builtin wins except for the first-position case, where non-builtin edges out past N ≈ 8. The gap is small; prefer builtin unless you've measured your specific shape.
  • Lookup on large values, N ≥ 100: non-builtin valueOf wins for first-position hits, middle-position hits on realistic shapes, and misses. Builtin still wins for last-position hits. If the lookup key is statically known and expected to be near the front (e.g. ada in a sorted Value), prefer non-builtin.
  • Union or any composition that produces a new value: builtin always. The gap widens with size.

V4 impact

Plutus V4 plans to add a Value constructor to Data, which would make unsafeDataAsValue a no-op. The builtin's per-lookup cost would drop back to a single lookupCoin call. The crossover would disappear entirely and the builtin would win at every size, every position. These goldens become the before-picture for that change.

github-actions (bot) commented Apr 23, 2026

Execution Budget Golden Diff

7918873 (master) vs 61b9557


This comment will get updated when changes are made.

Unisay self-assigned this Apr 23, 2026
Adds hand-rolled counterparts that operate directly on raw BuiltinData,
bypassing valueOf's newtype/Maybe wrappers and unionWith's These algebra.
Hand-rolled union additionally skips the zero-filter, exploiting the
ledger invariant that tx-output Values have strictly positive quantities.

18 new bundles across the existing shape matrix (S1, S3, S8, S100):
14 lookup + 4 union-then-lookup, paired with the existing builtin and
non-builtin bundles.

For IntersectMBO/plutus-private#2177.
Unisay (Contributor, Author) commented Apr 24, 2026

Follow-up: hand-rolled variants added

Per the Slack discussion, I added two more paths to the comparison matrix:

  • Hand-rolled lookup operates directly on raw BuiltinData via unsafeDataAsMap / unsafeDataAsB / unsafeDataAsI / equalsByteString. Bypasses the CurrencySymbol and TokenName newtype wrappers, the Maybe wrapping inside AssocMap.lookup, and the withCurrencySymbol continuation that valueOf chains together.
  • Hand-rolled union is a naive O(|m1|·|m2|) double-pass that exploits the positive-quantities invariant to skip the zero-filter. It materialises a fresh BuiltinData for the result and then feeds that into the hand-rolled lookup for the final Integer.

18 new bundles on the existing (S1, S3, S8, S100) × (first, middle, last, miss) matrix. Commit 022384c.
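The structural idea of the hand-rolled lookup can be sketched in plain Python (lists stand in for the raw `BuiltinData` pairs the real code walks via `unsafeDataAsMap` / `unsafeDataAsB` / `unsafeDataAsI`; this is an analogue of the shape of the code, not the code itself):

```python
# Structural analogue of the hand-rolled lookup: a direct nested scan that
# returns the integer quantity with no Maybe/newtype wrapping and no
# continuation plumbing — every intermediate is consumed in place.
def hand_rolled_lookup(cs, tn, value):
    for policy, inner in value:
        if policy == cs:
            for token, qty in inner:
                if token == tn:
                    return qty
            return 0   # policy present, token absent
    return 0           # policy absent entirely
```

Compared with valueOf, there is nothing to allocate on the happy path: no `Just`, no `CurrencySymbol`/`TokenName` wrappers, no `withCurrencySymbol` continuation.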

Lookup CPU

A "Hand-rolled" column is added. All numbers are in thousands of CPU units.

| Shape | Position | Builtin | Non-builtin | Hand-rolled |
|-------|----------|--------:|------------:|------------:|
| S1 | first | **896** | 3 387 | 1 353 |
| S1 | miss | 896 | 1 875 | **885** |
| S3 | first | 1 673 | 3 387 | **1 353** |
| S3 | middle | **1 673** | 5 145 | 1 750 |
| S3 | last | **1 673** | 6 582 | 2 152 |
| S3 | miss | **1 673** | 4 748 | 1 689 |
| S8 | first | 3 611 | 3 387 | **1 353** |
| S8 | middle | 3 611 | 9 455 | **2 956** |
| S8 | last | **3 611** | 13 764 | 4 161 |
| S8 | miss | **3 611** | 11 930 | 3 698 |
| S100 | first | 22 108 | 3 387 | **1 353** |
| S100 | middle | 22 108 | 16 636 | **4 965** |
| S100 | last | 22 108 | 31 000 | **8 983** |
| S100 | miss | 22 108 | 16 239 | **4 903** |

Bold marks the winner for that row.

A few things to call out:

  1. Hand-rolled beats valueOf at every position and every shape. The non-builtin overhead Ziyang flagged in the thread is real and measurable: Maybe wrapping, withCurrencySymbol's continuation call, and the newtype deriving for ToData/UnsafeFromData on CurrencySymbol/TokenName. Stripping all of that gives roughly 2-3× on small shapes and larger gains at S100.

  2. Hand-rolled first-position lookup is essentially a constant 1 353 K CPU across every shape. It short-circuits on the first outer cons and the first inner cons, so nothing downstream is ever touched. Position-bounded, not size-bounded.

  3. Hand-rolled beats the builtin at S3 first-position, at S8 first and middle, and at every position in S100 (it is also marginally ahead at S1 miss); the builtin keeps S3 middle/last/miss, S8 last, and S8 miss. The builtin's unsafeDataAsValue walks the entire data to validate shape and reconstruct a BuiltinValue, so once the value is big enough that "walk everything" costs more than "walk to the hit position", hand-rolled takes over.

  4. Answers Ziyang's question from the thread: valueOf does not compile to optimal UPLC. Hand-rolled has a real advantage on the non-builtin side. Whether that advantage is worth maintaining a separate library of BuiltinData-direct helpers is a product-side judgement, not a Plutus Core one.

Union CPU (and memory)

| Shape | Builtin CPU | Non-builtin CPU | Hand-rolled CPU | Builtin Mem | Non-builtin Mem | Hand-rolled Mem |
|-------|------------:|----------------:|----------------:|------------:|----------------:|----------------:|
| S1 | 1.88 M | 28.68 M | 16.52 M | 2.3 K | 110 K | 59.8 K |
| S3 | 4.13 M | 87.81 M | 76.60 M | 2.5 K | 302 K | 204.6 K |
| S8 | 9.77 M | 313.71 M | 613.29 M | 3.2 K | 895 K | 1 168.8 K |
| S100 | 79.83 M | 3 460.29 M | 11 352.08 M | 11.3 K | 7 669 K | 18 777.5 K |

Two observations:

  1. Hand-rolled beats unionWith at S1 (1.7×) and marginally at S3 (1.15×), then loses at S8 and S100. The zero-filter and These savings matter at tiny sizes but the materialisation cost dominates as shapes grow.

  2. Hand-rolled union never beats the builtin. At S100 the gap is 142×. This matches what Ziyang and Philip both expected from the cost-model side of the thread.

Two caveats on the hand-rolled union:

  • The algorithm is naive. filterMissingOuter does a full containsKey walk per entry, so the filter pass is O(|m1|·|m2|) on its own. A smarter implementation that threads a "consumed" flag or uses a sort-merge could lower this, but a sort-first step isn't free either.
  • The result is materialised as a fresh BuiltinData. The builtin keeps the intermediate in BuiltinValue (CEK-heap) form, which is why its absolute numbers stay so low. Any hand-rolled path that produces BuiltinData output pays that serialisation tax.

If Philip's djed library implements a smarter union with invariant tracking, I'll measure that too and update these numbers. Until then, the story for union is clean: builtin wins at every size, full stop.

Still open from the thread

  • Standalone unValueData overhead per shape (Ziyang's ask from the thread). Will add as a separate commit.
  • Waiting on Philip's djed-library share and chain stats on real-world Value sizes.

Isolates the conversion tax from any downstream operation across the
four shapes (S1, S3, S8, S100). Enables decomposing the builtin-path
cost into `unsafeDataAsValue` + `lookupCoin`.

4 new bundles. Responds to Ziyang's request in the thread.

For IntersectMBO/plutus-private#2177.
Unisay (Contributor, Author) commented Apr 24, 2026

Follow-up: standalone unsafeDataAsValue per shape

Answering Ziyang's ask from the Slack thread. Added a compiled function that evaluates only unsafeDataAsValue bd (returns the BuiltinValue, no downstream op) and measured it across all four shapes. Commit 61b9557.

Decomposition of the builtin path

| Shape | unsafeDataAsValue alone | lookup_*_ada_builtin | Delta (lookupCoin alone) | lookupCoin share |
|-------|------------------------:|---------------------:|-------------------------:|-----------------:|
| S1 | 576 790 | 895 629 | 318 839 | 35.6% |
| S3 | 1 344 398 | 1 672 681 | 328 283 | 19.6% |
| S8 | 3 263 978 | 3 611 149 | 347 171 | 9.6% |
| S100 | 21 732 650 | 22 108 153 | 375 503 | 1.7% |

A few things worth noting:

  1. unsafeDataAsValue scales linearly with value size. From S1 to S100 the value contains ~100× more tokens; the cost grows ~38×. The slope is roughly 200 K CPU per additional policy or token entry in the data structure, which matches what the lookup-path numbers already implied.

  2. lookupCoin on the resulting BuiltinValue is essentially constant at 320–375 K CPU per call, with a tiny upward drift as the value grows (probably a field-access or list-head overhead on the materialised BuiltinValue). It's noise relative to the conversion cost.

  3. At S1 the conversion tax is already 64% of the lookup_ada cost. At S100 it's 98%. So the builtin path is essentially unsafeDataAsValue + a constant.

  4. V4's plan to make unsafeDataAsValue a no-op would, on these numbers, reduce lookup_S100_ada_builtin from 22.1 M to ~0.4 M — a 55× speedup on that particular shape. Broadly: every builtin-path CPU number in the matrix becomes lookupCoin or unionValue alone once unsafeDataAsValue is free.
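The decomposition itself is just subtraction over the measured numbers, which is easy to check (figures copied from the table above):

```python
# Decomposition check: lookupCoin's isolated cost is the measured
# builtin-path total minus the standalone unsafeDataAsValue cost;
# its share is that delta over the total.
def lookup_coin_delta(total_lookup, conversion_only):
    return total_lookup - conversion_only

def lookup_coin_share(total_lookup, conversion_only):
    return 100 * lookup_coin_delta(total_lookup, conversion_only) / total_lookup
```

Plugging in S1 gives a delta of 318 839 (35.6%), and S100 gives 375 503 (1.7%), matching the table.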

Memory

Memory for standalone unsafeDataAsValue stays very small (756 at S1, 3 176 at S100). Almost all of the memory in lookup_*_ada_builtin was also from unsafeDataAsValue (1 257 and 3 677 respectively), so lookupCoin contributes ~500 memory units on top. Again: noise relative to unsafeDataAsValue.

Open from the thread

Waiting on Philip's djed library share for a smarter hand-rolled union, and on his chain-stats for typical Value sizes.
