Skip to content

feat: unify name demangling with single Lean implementation#12539

Merged
hargoniX merged 7 commits intomasterfrom
feat/lean-name-demangling
Mar 6, 2026
Merged

feat: unify name demangling with single Lean implementation#12539
hargoniX merged 7 commits intomasterfrom
feat/lean-name-demangling

Conversation

@kim-em
Copy link
Collaborator

@kim-em kim-em commented Feb 18, 2026

This PR replaces three independent name demangling implementations (Lean, C++, Python) with a single source of truth in Lean.Compiler.NameDemangling. The new module handles the full pipeline: prefix parsing (l_, lp_, _init_, initialize_, lean_apply_N, _lean_main), postprocessing (suffix flags, private name stripping, hygienic suffix stripping, specialization contexts), backtrace line parsing, and C exports via @[export].

The C++ runtime backtrace handler now calls the Lean-exported functions instead of its own 792-line reimplementation. This is safe because print_backtrace is only called from lean_panic_impl (soft panics), not lean_internal_panic.

The Python profiler demangler (script/profiler/lean_demangle.py) is replaced with a thin subprocess wrapper around a Lean CLI tool, preserving the demangle_lean_name API so downstream scripts work unchanged.

New files:

  • src/Lean/Compiler/NameDemangling.lean — single source of truth (483 lines)
  • tests/lean/run/demangling.lean — comprehensive tests (281 lines)
  • script/profiler/lean_demangle_cli.leanc++filt-style CLI tool

Deleted files:

  • src/runtime/demangle.cpp (792 lines)
  • src/runtime/demangle.h (26 lines)
  • script/profiler/test_demangle.py (670 lines)

Net: −1,381 lines of duplicated C++/Python code.

🤖 Prepared with Claude Code

@kim-em kim-em requested a review from leodemoura as a code owner February 18, 2026 00:17
@kim-em kim-em added the changelog-compiler Compiler, runtime, and FFI label Feb 18, 2026
@kim-em kim-em requested a review from hargoniX as a code owner February 18, 2026 00:17
@kim-em kim-em added the changelog-compiler Compiler, runtime, and FFI label Feb 18, 2026
@kim-em kim-em force-pushed the feat/lean-name-demangling branch 3 times, most recently from 07dc564 to d5e99dc Compare February 18, 2026 00:32
@github-actions github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Feb 18, 2026
@mathlib-lean-pr-testing
Copy link

mathlib-lean-pr-testing bot commented Feb 18, 2026

Mathlib CI status (docs):

  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase e9cc84b7c99f64220c952dee026ac0f06a031fc4 --onto 91bd6e19a7b7aca769f9720e1127c768df0cd2a7. You can force Mathlib CI using the force-mathlib-ci label. (2026-02-18 01:23:08)
  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase e9cc84b7c99f64220c952dee026ac0f06a031fc4 --onto 309f44d00757482e1c5a2b2427fdca818a24615a. You can force Mathlib CI using the force-mathlib-ci label. (2026-02-20 06:48:15)
  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase e9cc84b7c99f64220c952dee026ac0f06a031fc4 --onto 333ab1c6f0ce11f33551d6a736054cb72812cad9. You can force Mathlib CI using the force-mathlib-ci label. (2026-03-06 06:53:08)
  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase a165292462a973c20d3cc8c8b23a3ac2334a2a4a --onto 333ab1c6f0ce11f33551d6a736054cb72812cad9. You can force Mathlib CI using the force-mathlib-ci label. (2026-03-06 10:02:57)

@leanprover-bot
Copy link
Collaborator

leanprover-bot commented Feb 18, 2026

Reference manual CI status:

  • ❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase e9cc84b7c99f64220c952dee026ac0f06a031fc4 --onto 5c7a508e21c5ebc710b0dfe65648d5beba4e34e0. You can force reference manual CI using the force-manual-ci label. (2026-02-18 01:23:10)
  • ❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase e9cc84b7c99f64220c952dee026ac0f06a031fc4 --onto cda84702e9b31165f1f83c657b532f36f34e0bd0. You can force reference manual CI using the force-manual-ci label. (2026-03-06 06:53:10)
  • ❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase a165292462a973c20d3cc8c8b23a3ac2334a2a4a --onto cda84702e9b31165f1f83c657b532f36f34e0bd0. You can force reference manual CI using the force-manual-ci label. (2026-03-06 10:02:58)

Copy link
Contributor

@hargoniX hargoniX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the approach in NameDemandling.lean looks generally sensible. It would be great if we could use the new String API to as opposed to the legacy one.

For the linking details in object.cpp I would like to defer to @Kha

@kim-em kim-em force-pushed the feat/lean-name-demangling branch 2 times, most recently from 5aca15a to 8e51b9b Compare February 20, 2026 09:54
Copy link
Contributor

@Rob23oba Rob23oba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the review comments, I think there should be an option to disable post-processing, since e.g. myFn (λ) can be very ambiguous when a function has many lambdas

@kim-em
Copy link
Collaborator Author

kim-em commented Mar 4, 2026

Apart from the review comments, I think there should be an option to disable post-processing, since e.g. myFn (λ) can be very ambiguous when a function has many lambdas

@Rob23oba thanks for your very helpful comments.

Can you point to where you would want to call a version of this with disabled post-processing? I'd like to understand the use case so I can know exactly what should be disabled.

@Rob23oba
Copy link
Contributor

Rob23oba commented Mar 4, 2026

Alright, here's a script that should demonstrate the ambiguity involved:

open Lean Name Demangle
run_meta
  let ext := Compiler.LCNF.impureSigExt
  let env ← getEnv
  let mut map : Std.HashMap String Nat := {}
  for idx in *...env.header.modules.size do
    for decl in ext.getModuleEntries env idx do
      map := map.alter (postprocessNameParts (nameToNameParts decl.name)) fun
        | none => some 1
        | some a => some (a + 1)
  let arr := map.toArray.qsort fun (a, b) (c, d) => b > d ∨ (b = d ∧ a < c)
  let arr := arr.take 100
  Lean.logInfo m!"Result: {arr}"

The top results are

Result: [(Lean.initFn, 276),
 (Lean.initFn [private], 218),
 (Lean.initFn._closed_0, 134),
 (Lean.initFn._closed_1, 126),
 (Lean.initFn._closed_2, 122),
 (Lean.initFn._closed_3, 115),
 (Lean.Server.registerLspRequestHandler spec at Lean.Server.FileWorker.initFn, 111),
 (Lean.initFn [λ], 110),
 (Lean.initFn._closed_4, 109),
 (Lean.initFn._closed_0 [private], 104),
 (Lean.initFn._closed_1 [private], 103),
 (Lean.initFn._closed_2 [private], 102),
 (Lean.initFn._closed_3 [private], 101),
 (Lean.initFn._closed_4 [private], 101),
 (Lean.Meta.initFn, 96),
 (Lean.Meta.initFn [private], 92),
 (Lean.initFn._closed_5 [private], 88),
 (Lean.Compiler.LCNF.initFn [private], 78),
 (Lean.initFn [λ, private], 71),

So if Lean.initFn popped up in the backtrace, there are 276 candidate functions that could have been the cause. This might also be a good test for the postprocessor though because _closed_<n> seems like it should've been converted to [closed] and some of these Lean.initFns have _boxed after the macro scopes. Anyways, even in other cases, it might be good to trace back an error not just to a single function but to a single IR declaration, e.g. when the function is large and you want to narrow down where the error was or when the compiler might have been the problem (i.e. you want to investigate miscompilations).

@kim-em
Copy link
Collaborator Author

kim-em commented Mar 6, 2026

Alright, here's a script that should demonstrate the ambiguity involved:

open Lean Name Demangle
run_meta
  let ext := Compiler.LCNF.impureSigExt
  let env ← getEnv
  let mut map : Std.HashMap String Nat := {}
  for idx in *...env.header.modules.size do
    for decl in ext.getModuleEntries env idx do
      map := map.alter (postprocessNameParts (nameToNameParts decl.name)) fun
        | none => some 1
        | some a => some (a + 1)
  let arr := map.toArray.qsort fun (a, b) (c, d) => b > d ∨ (b = d ∧ a < c)
  let arr := arr.take 100
  Lean.logInfo m!"Result: {arr}"

The top results are

Result: [(Lean.initFn, 276),
 (Lean.initFn [private], 218),
 (Lean.initFn._closed_0, 134),
 (Lean.initFn._closed_1, 126),
 (Lean.initFn._closed_2, 122),
 (Lean.initFn._closed_3, 115),
 (Lean.Server.registerLspRequestHandler spec at Lean.Server.FileWorker.initFn, 111),
 (Lean.initFn [λ], 110),
 (Lean.initFn._closed_4, 109),
 (Lean.initFn._closed_0 [private], 104),
 (Lean.initFn._closed_1 [private], 103),
 (Lean.initFn._closed_2 [private], 102),
 (Lean.initFn._closed_3 [private], 101),
 (Lean.initFn._closed_4 [private], 101),
 (Lean.Meta.initFn, 96),
 (Lean.Meta.initFn [private], 92),
 (Lean.initFn._closed_5 [private], 88),
 (Lean.Compiler.LCNF.initFn [private], 78),
 (Lean.initFn [λ, private], 71),

So if Lean.initFn popped up in the backtrace, there are 276 candidate functions that could have been the cause. This might also be a good test for the postprocessor though because _closed_<n> seems like it should've been converted to [closed] and some of these Lean.initFns have _boxed after the macro scopes. Anyways, even in other cases, it might be good to trace back an error not just to a single function but to a single IR declaration, e.g. when the function is large and you want to narrow down where the error was or when the compiler might have been the problem (i.e. you want to investigate miscompilations).

Thanks for the explanation here @Rob23oba.

I think with LEAN_BACKTRACE_RAW available to disable postprocessing in stack traces, I might just proceed for now. But I think this would be good to follow up on, and I'd be happy if you wanted to raise an issue, or suggest a particular place to hook in disabling post-processing.

@kim-em kim-em enabled auto-merge March 6, 2026 06:11
@hargoniX hargoniX disabled auto-merge March 6, 2026 08:37
kim-em and others added 7 commits March 6, 2026 08:38
This PR replaces three independent name demangling implementations (Lean,
C++, Python) with a single source of truth in Lean. The new
`Lean.Compiler.NameDemangling` module handles the full pipeline: prefix
parsing, postprocessing (suffix flags, private names, hygienic stripping,
specialization contexts), backtrace line parsing, and C exports. The C++
runtime backtrace handler now calls `@[export]`ed Lean functions instead
of its own reimplementation. The Python profiler demangler is replaced
with a thin subprocess wrapper around a Lean CLI tool.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n link dependency

The `@[export lean_demangle_bt_line_cstr]` function is defined in
`Lean.Compiler.NameDemangling` (in `libLean`) but called from `libleanrt`
(`print_backtrace`). On Linux these are in separate linker groups, so
downstream executables fail to link with undefined reference.

Use a weak symbol with a no-op stub so `leanrt` doesn't require `libLean`
at link time. When the Lean demangler is linked in (as it always is for
Lean executables), the real implementation overrides the stub.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This PR fixes issues found during code review:
- add re-entrancy guard in C runtime panic handler to prevent recursive
  panic when demangling itself panics
- expand test coverage from 44 to 118 test cases
- fix bug in findLpSplit where _private was absorbed into package name
- replace hardcoded byte offsets with dropPrefix?
- remove unnecessary partial annotations using bounded for loops
- move public import from Lean.lean to Lean/Compiler.lean
- remove AI-slop stylistic artifacts (section dividers, verbose comments)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This PR addresses review feedback from hargoniX and Kha:
- add missing lean_dec(s) in the weak symbol stub (object.cpp)
- add Char.isHexDigit to Init and use it instead of custom isHexChar
- reuse Lean.NamePart instead of defining a local Component type
- remove _lambda_N, _jp_N, _closed_N numbered suffix handling (compiler
  only produces _lam_N and _elam_N)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…mangling

This PR removes the custom raw-position string helpers (rawGet, rawNext,
rawAtEnd, rawExtract, rawEnd) and rewrites all string traversal to use
the modern Lean 4 String API: s.positions iterator, s.Pos with proofs,
s.find? with char/predicate/substring patterns, s.extract, and a
skipWhile helper with termination proof. Net reduction of 52 lines.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The g_in_demangle guard was preventing recursive demangling if the
demangler itself panicked. The demangler is pure string manipulation
that should not panic, and if it does we'd rather discover the bug
than silently fall back to raw output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This PR addresses three issues raised in review:

1. Use `Name.toString` for correct escaping of demangled names (#9):
   replace manual `String.intercalate "."` formatting with
   `Name.toString`, which handles `«»` escaping for names containing
   special characters. Add `namePartsToName` helper and guard
   `formatNameParts` against empty input.

2. Use `Name.demangle?` for robust `lp_` package splitting (#8):
   replace `findLpSplit`/`hasUpperStart`/`unmanglePkg` with a simpler
   `demangleWithPkg` that tries each underscore as a split point and
   validates both package and body via `Name.demangle?` round-trip.

3. Fix suffix flags lost after hygienic sections: move suffix flag
   collection (`_boxed`, `_lam`, etc.) before `_@` hygienic suffix
   stripping, so flags appearing after hygienic sections are recognized.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@hargoniX
Copy link
Contributor

hargoniX commented Mar 6, 2026

Needs a rebase...

@hargoniX hargoniX force-pushed the feat/lean-name-demangling branch from d684699 to 25e8e04 Compare March 6, 2026 08:40
@hargoniX hargoniX enabled auto-merge March 6, 2026 08:40
@hargoniX hargoniX added this pull request to the merge queue Mar 6, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Mar 6, 2026
@hargoniX hargoniX added this pull request to the merge queue Mar 6, 2026
Merged via the queue into master with commit 5f3ca3a Mar 6, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog-compiler Compiler, runtime, and FFI toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants