perf: fuse fvar substitution into instantiateMVars#12233
perf: fuse fvar substitution into instantiateMVars#12233
Conversation
|
!radar |
|
Benchmark results for c83f527 against 9e18eea are in! @nomeata
Large changes (1✅, 2🟥)
Small changes (7🟥)
|
|
Reference manual CI status:
|
|
Mathlib CI status (docs):
|
…date This adds an experimental `instantiateMVarsNoUpdate2` variant (inline in the benchmark) that keeps its expression cache across binder depths using bvar-blind keying, rather than clearing the cache at each depth. Bench2 shows it avoids the O(depth * bodySize) scaling of the old approach when the same closed sub-expression appears at many depths (crossover ~depth=50). However, the ~6ms constant per-node overhead makes it 3-5x slower on the bench1 workload (the realistic delayed-assignment case), where both variants are already sub-millisecond. Not worth adopting. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This PR adds a C++ implementation of `instantiateMVarsNoUpdate`, following the pattern of the existing `instantiate_mvars.cpp`. Key features: - FVar substitution via hash map instead of `replace_fvars`, avoiding a separate traversal pass for delayed assignments - Depth tracking with lazy lifting of substitution values - Two-level cache: a depth-dependent cache (cleared under binders) and a global cache for fvar-free expressions that persists across binder depths - When the fvar substitution is empty, updates mvar assignments with their normalized values so subsequent `instantiateMVars` calls are free Benchmark (delayed-assignment workload): - C++ NoUpdate is ~2x faster than the Lean NoUpdate implementation - C++ NoUpdate is 10-26x faster than standard instantiateMVars - The global cache eliminates O(depth) scaling for fvar-free expressions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
267f6b4 to
860b8fd
Compare
This PR makes instantiateMVarsNoUpdate the default implementation behind lean_instantiate_expr_mvars. The NoUpdate variant carries a free variable substitution during traversal instead of doing repeated replace_fvars, avoiding quadratic behavior in terms with many delayed-assigned metavariables. Key fixes for drop-in compatibility: - Use name_hash_map for fvar substitution (structural name equality, not pointer comparison) - Always call apply_beta with preserve_data=false for regular mvar assignments to strip mdata wrappers like _recApp - Clear depth-dependent cache when extending fvar substitution in visit_delayed - Handle unassigned mvars with active fvar substitution by creating delayed assignments via elimMVarDeps - Match standard instantiateMVars guards for delayed assignment resolution Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
!radar |
|
Benchmark results for c5d6267 against 9e18eea are in! @nomeata Warning These warnings may indicate that the benchmark results are not directly comparable, for example due to changes in the runner configuration or hardware.
Large changes (7🟥)
Medium changes (19🟥)
Small changes (1✅, 212🟥) Too many entries to display here. View the full report on radar instead. |
|
!radar |
|
Benchmark results for ee068d5 against 87ec768 are in! @nomeata
No significant changes detected. |
…lean4 into joachim/instantiateMVarsNoUpdate
ee068d5 to
ae9230f
Compare
|
!radar |
|
Benchmark results for ae9230f against 87ec768 are in! @nomeata
Large changes (8🟥)
Medium changes (21🟥) Too many entries to display here. View the full report on radar instead. Small changes (1✅, 207🟥) Too many entries to display here. View the full report on radar instead. |
|
Mathlib CI status (docs):
|
Pass 2 of instantiateAllMVars does not handle level metavariables (pass 1 already resolves them), so has_mvar checks are unnecessarily broad. Replace with has_expr_mvar throughout pass 2. This makes the global cache redundant: expressions with no fvars and no expr mvars are returned unchanged by the early-return check, so there is nothing to cache. Remove m_global_cache and simplify the visit logic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
!bench |
|
Benchmark results for e2bbd74 against cda8470 are in! @nomeata Warning These warnings may indicate that the benchmark results are not directly comparable, for example due to changes in the runner configuration or hardware.
Large changes (3✅)
Medium changes (3✅, 3🟥)
Small changes (64✅, 39🟥) Too many entries to display here. View the full report on radar instead. |
…andling In pass 2, direct mvar assignments have already been resolved by pass 1. Assert this invariant in visit_app and for bare mvars instead of handling the impossible case. Inline the pending value lookup in visit_delayed instead of going through visit(mk_mvar(mid_pending)). Remove the now unnecessary visit_mvar and visit_args_and_beta methods from pass 2. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This moves the resolvability checking from a separate pre-computation phase into pass 2 itself, where it is performed on-demand when each delayed mvar is first encountered. Caches are kept across invocations for efficiency. This eliminates the `resolvability_checker` class and the `head_to_pending` map from pass 1. Pass 1 now only tracks a `m_has_delayed` bool, and only sets it when a delayed mvar's pending value is actually assigned (unassigned pending values will clearly fail the resolvability check, so pass 2 can be skipped entirely in that case). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
!radar |
|
Benchmark results for a61d5b6 against cda8470 are in! @nomeata Warning These warnings may indicate that the benchmark results are not directly comparable, for example due to changes in the runner configuration or hardware.
Large changes (4✅)
Medium changes (2✅, 3🟥)
Small changes (72✅, 28🟥) Too many entries to display here. View the full report on radar instead. |
This exports `DelayedMetavarAssignment.fvars` and `DelayedMetavarAssignment.mvarIdPending` as C functions and uses them in both `instantiate_mvars.cpp` and `instantiate_mvars_all.cpp`, replacing raw `cnstr_get` access. Also simplifies `get_assignment` in pass 2 by removing redundant `has_expr_mvar`/`has_fvar` checks that `visit` already performs. Pass 1 now only sets `m_has_delayed` when the pending mvar is actually assigned. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Inside a resolvable subtree, all nested delayed mvars are also resolvable, so the non-resolvable branch is only reachable in outer mode (empty fvar substitution). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Mathlib CI status (docs):
|
This commit introduces consistent terminology throughout instantiate_mvars_all.cpp: direct MVar, pending MVar, assigned MVar, MVar DAG, resolvable MVar, updateable MVar, outer/inner mode, and the updateable-MVar cut. Renames fvar_subst_empty() to in_outer_mode() and m_has_delayed to m_has_updateable_delayed for alignment. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Mathlib CI status (docs):
|
This commit replaces the scope_cache comment with a clearer description of the conceptual model (stack of hashmaps) and how the implementation inverts this for efficiency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Mathlib CI status (docs):
|
This PR replaces the default
instantiateMVarsimplementation with a two-pass variant that fuses fvar substitution into the traversal, avoiding separatereplace_fvarscalls for delayed-assigned MVars and preserving sharing.Terminology (used in this PR and in the source)
DelayedMetavarAssignment.In the MVar DAG, the updateable delayed-assigned MVars form a cut (the updateable-MVar cut) with only assigned MVars behind it and no resolvable delayed-assigned MVars before it.
Two-pass architecture
Pass 1 (
instantiate_direct_fn): Traverses all MVars and expressions reachable from the initial expression and instantiates all updateable direct MVars (updating their assignment with the result), instantiates all level MVars, and determines if there are any updateable delayed-assigned MVars.Pass 2 (
instantiate_delayed_fn): Only run if pass 1 found updateable delayed-assigned MVars. Has an outer and an inner mode, depending on whether it has crossed the updateable-MVar cut.In outer mode (empty fvar substitution), all MVars are either unassigned direct MVars (left alone), non-updateable delayed-assigned MVars (pending MVar traversed in outer mode and updated with the result), or updateable delayed-assigned MVars. When a delayed-assigned MVar is encountered, its MVar DAG is explored (via
is_resolvable_pending) to determine if it is resolvable (and thus updateable). Results are cached across invocations.If it is updateable, the substitution is initialized from its arguments and traversal continues with the value of its pending MVar in inner mode. In inner mode (non-empty substitution), all encountered delayed-assigned MVars are, by construction, resolvable but not updateable. The substitution is carried along and extended as we cross such MVars. Pending MVars of these delayed-assigned MVars are NOT updated with the result (as the result is valid only for this substitution, not in general).
Applying the substitution in one go, rather than instantiating each delayed-assigned MVar on its own from inside out, avoids the quadratic overhead of that approach when there are long chains of delayed-assigned MVars.
Write-back behavior: Pass 2 writes back the normalized pending MVar values of delayed-assigned MVars above the updateable-MVar cut (the non-resolvable ones whose children may have been resolved). This is exactly the right set: these MVars are visited in outer mode, so their normalized values are suitable for storing in the mctx. MVars below the cut are visited in inner mode, so their intermediate values cannot be written back.
Pass 2 scope-tracked caching
A
scope_cachedata structure ensures that sharing is preserved even across different delayed-assigned MVars (and hence with different substitutions), when possible. Eachvisit_delayedcall pushes a new scope with fresh fvar bindings. The cache correctly handles cross-scope reuse, fvar shadowing, and late-binding via generation counters and scope-level tracking.Behavioral differences from original
instantiateMVarsThe implementation matches the original single-pass
instantiateMVarsbehavior with one cosmetic difference: the new implementation substitutes fvars inline during traversal rather than constructing intermediate beta-redexes, producing more beta-reduced terms in some edge cases. This changes the pretty-printed output for two elab tests (1179b,depElim1) but all terms remain definitionally equal.Files
src/kernel/instantiate_mvars_all.cpp— C++ implementation of both passessrc/kernel/scope_cache.h— scope-aware cache data structuresrc/kernel/instantiate_mvars.cpp— redirectslean_instantiate_expr_mvarsto the new variantsrc/Lean/MetavarContext.lean— exported accessors forDelayedMetavarAssignmentfieldstests/lean/run/instantiateAllMVarsSharing.lean— sharing preservation testtests/lean/run/instantiateAllMVarsShadow.lean— fvar shadowing and late-binding correctness teststests/elab/1179b.lean.out.expected,tests/elab/depElim1.lean.out.expected— updated expected output