[WIP] feat(size opt): Go linktime like DCE#1618
[WIP] feat(size opt): Go linktime like DCE#1618luoliwoshang wants to merge 172 commits intoxgo-dev:mainfrom
Conversation
Summary of ChangesHello @luoliwoshang, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a refined approach to dead code elimination by integrating Go-specific semantic information directly into the IR graph construction. By capturing implicit dependencies like interface conversions and reflection usage at the SSA lowering stage, the system can build a more accurate and comprehensive dependency graph. This enables a more effective DCE pass that can prune unused methods and metadata, leading to smaller binaries while maintaining Go's runtime semantics. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Ignored Files
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This is a significant refactoring that introduces a Go-style dead code elimination (DCE) pass. The changes are extensive, touching SSA lowering, IR graph construction, and the build process. The core idea is to build a global dependency graph from per-package bitcode and relocation metadata, perform a reachability analysis, and then apply a DCE pass to remove unreachable code and data. The implementation appears solid and well-tested. My main feedback is a suggestion to simplify a complex conditional block in the build process to improve readability and remove dead code.
|
|
||
| dceLLName := "" | ||
| if ctx.buildConf.DCE { | ||
| roots, err := dceEntryRoots(merged) |
| if ctx.buildConf.DCE { | ||
| if dceLLName == "" { | ||
| merged.Dispose() | ||
| return fmt.Errorf("missing DCE module .ll output") | ||
| } | ||
| // NOTE: after the DCE pass the merged module may contain mutated | ||
| // constants that occasionally trip LLVM type walkers. Skip the | ||
| // ABI metadata dump in DCE mode to avoid crashes during verbose runs. | ||
| if !ctx.buildConf.DCE { | ||
| dumpAbiTypeMetadata(merged, dceLLName, verbose) | ||
| } | ||
|
|
||
| combinedObj := strings.TrimSuffix(dceLLName, ".ll") + ".o" | ||
| args := []string{"-o", combinedObj, "-c", dceLLName, "-Wno-override-module"} | ||
| if verbose { | ||
| fmt.Fprintln(os.Stderr, "clang", args) | ||
| } | ||
| if err := ctx.compiler().Compile(args...); err != nil { | ||
| merged.Dispose() | ||
| return fmt.Errorf("failed to compile DCE ll: %v", err) | ||
| } | ||
| merged.Dispose() | ||
| objFiles = []string{combinedObj} | ||
| } else { | ||
| combinedBc, err := os.CreateTemp("", "llgo-link-*.bc") | ||
| if err != nil { | ||
| merged.Dispose() | ||
| return err | ||
| } | ||
| combinedBcName := combinedBc.Name() | ||
| if err := llvm.WriteBitcodeToFile(merged, combinedBc); err != nil { | ||
| combinedBc.Close() | ||
| merged.Dispose() | ||
| return fmt.Errorf("write combined bitcode: %w", err) | ||
| } | ||
| combinedBc.Close() | ||
| if verbose { | ||
| fmt.Fprintf(os.Stderr, "[bc-pass] merged bitcode written to %s\n", combinedBcName) | ||
| } | ||
|
|
||
| // Optional debug: dump abi.Type metadata from the merged module. | ||
| dumpAbiTypeMetadata(merged, combinedBcName, verbose) | ||
|
|
||
| // Compile the merged BC to a single object for final native link. | ||
| combinedObj := strings.TrimSuffix(combinedBcName, ".bc") + ".o" | ||
| args := []string{"-o", combinedObj, "-c", combinedBcName, "-Wno-override-module"} | ||
| if verbose { | ||
| fmt.Fprintln(os.Stderr, "clang", args) | ||
| } | ||
| if err := ctx.compiler().Compile(args...); err != nil { | ||
| merged.Dispose() | ||
| return fmt.Errorf("failed to compile combined bc: %v", err) | ||
| } | ||
|
|
||
| merged.Dispose() | ||
| objFiles = []string{combinedObj} | ||
| } |
| relocs := d.relocsByOwner[m.Type] | ||
| for i := 0; i < len(relocs); i++ { | ||
| r := relocs[i] | ||
| if r.Kind != irgraph.EdgeRelocMethodOff || int(r.Addend) != m.Index { | ||
| continue | ||
| } | ||
| d.mark(r.Target) | ||
| if i+1 < len(relocs) && relocs[i+1].Kind == irgraph.EdgeRelocMethodOff { | ||
| d.mark(relocs[i+1].Target) | ||
| } | ||
| if i+2 < len(relocs) && relocs[i+2].Kind == irgraph.EdgeRelocMethodOff { | ||
| d.mark(relocs[i+2].Target) | ||
| } | ||
| break | ||
| } |
There was a problem hiding this comment.
Performance: This linear scan searches all relocs for the owner type to find the matching method index. For types with many methods and relocs, this is O(methods × relocs_per_type). Consider building an index (map from (owner, methodIndex) to reloc positions) during initialization to make this O(1).
| fnTyp := r.FnType | ||
| if fnTyp == "" { | ||
| panic("useifacemethod missing fnType") | ||
| } |
There was a problem hiding this comment.
Code Quality: Using panic() for malformed input makes the code less robust. Since this processes internal compiler data, crashes here could be confusing to users. Consider returning an error instead:
| fnTyp := r.FnType | |
| if fnTyp == "" { | |
| panic("useifacemethod missing fnType") | |
| } | |
| return fmt.Errorf("useifacemethod missing fnType for target %s", r.Target) |
If you prefer to keep the panic for internal consistency checks, consider wrapping with a more descriptive message that helps debugging.
| var verbose bool | ||
|
|
||
| // SetVerbose enables verbose debugging output during Analyze. | ||
| func SetVerbose(v bool) { | ||
| verbose = v | ||
| } |
There was a problem hiding this comment.
Code Quality: Global mutable state with SetVerbose() is not thread-safe and makes the code harder to test in parallel. Consider passing the verbose flag through Analyze() via an options struct:
type AnalyzeOptions struct {
Verbose bool
}
func Analyze(g *irgraph.Graph, roots []irgraph.SymID, opts AnalyzeOptions) Result {This would also allow different callers to have different verbosity settings without interfering with each other.
| // clearUnreachableMethods zeros Mtyp/Ifn/Tfn for unreachable methods in type | ||
| // metadata constants. All method slots are cleared by default; the whitelist | ||
| // reachMethods marks which (type,index) to keep. |
There was a problem hiding this comment.
Documentation/Code Quality: The comment mentions "zeros Mtyp/Ifn/Tfn" but the code actually preserves mtypField (field 1) and only zeros fields at indices 2 and 3 (Ifn and Tfn). The comment should say "zeros Ifn/Tfn" to match the actual behavior:
| // clearUnreachableMethods zeros Mtyp/Ifn/Tfn for unreachable methods in type | |
| // metadata constants. All method slots are cleared by default; the whitelist | |
| // reachMethods marks which (type,index) to keep. | |
| // clearUnreachableMethods zeros Ifn/Tfn for unreachable methods in type | |
| // metadata constants. All method slots are cleared by default; the whitelist | |
| // reachMethods marks which (type,index) to keep. |
| // Stats reports basic DCE pass metrics. | ||
| type Stats struct { | ||
| Reachable int | ||
| DroppedFuncs int | ||
| DroppedGlobal int | ||
| DroppedMethod int | ||
| DroppedMethodDetail map[irgraph.SymID][]int | ||
| } |
There was a problem hiding this comment.
Code Quality: The Stats struct declares DroppedFuncs and DroppedGlobal fields that are never populated by the Apply function. These unused fields are misleading as they suggest the DCE pass tracks dropped functions and globals, but it currently does not. Consider removing them or adding a TODO comment if they are planned for future use.
| newMethods[i] = zeroed | ||
| changed = true | ||
| dropped++ | ||
| detail[irgraph.SymID(g.Name())] = append(detail[irgraph.SymID(g.Name())], i) |
There was a problem hiding this comment.
Performance: Minor optimization: g.Name() is called twice, creating two SymID conversions. Consider caching the key:
key := irgraph.SymID(g.Name())
detail[key] = append(detail[key], i)| // Walk a value tree (constants + constant expressions) and record any | ||
| // function pointer it embeds. This captures func values stored in globals, | ||
| // composite literals, or casted constants. | ||
| visited := make(map[llvm.Value]bool) |
There was a problem hiding this comment.
Performance: A new visited map is allocated on every call to addRefEdgesFromValue. This function is called for every global initializer and every instruction operand in the module. For large modules with thousands of functions and instructions, this creates significant GC pressure.
Consider either:
- Passing a reusable
visitedmap from the caller and clearing it between uses - Using
sync.Poolfor map allocation - Moving the visited map to a struct field that persists across calls
Code Review SummaryThis PR refactors DCE to build the reloc graph from package context rather than parsing LLVM IR. The new Strengths:
Areas for improvement:
Overall, this is solid compiler infrastructure code. The inline comments highlight specific actionable items. |
Code Review SummaryThis PR implements a well-structured dead code elimination architecture with good separation between reachability analysis ( Key observations:
Areas for improvement:
|
Implement #1587
Background and Motivation
The current llgo build pipeline compiles each package/file into
.o(or.bc) and then performs a final link.This works well for explicit symbol references (functions/globals), but it is weak for implicit Go semantic dependencies, which leads to larger final binaries.
We want to follow Go linker's model: reachability analysis + method-table sentinel/backfill.
The first priority is to mark the right symbols; pruning/backfilling strategy can evolve iteratively.
Goals
Current State and Pain Points
.go/.cproduces an independent object, then everything is linked together.abi.Type -> abi.Method,Ifn/Tfnare real function pointers; once a type is reachable, method entries are treated as strong dependencies.Result: many actually-unused methods and type metadata are retained, increasing binary size.
Reference Relationship between
abi.Typeandabi.Method(Simplified)Example source:
Simplified generated metadata:
Key points:
abi.Type.UncommonType.Moffpoints to the[n]Methodarray.abi.Methodcontains:Name(method-name symbol)Mtyp(method function-type symbol)Ifn(interface-call wrapper symbol)Tfn(direct method-call symbol)Under the default layout,
Ifn/Tfnare real addresses, so the linker cannot treat these relationships as optional.Core Bottleneck
From LLD's perspective,
Ifn/Tfnin method tables are hard references.Even if
initGameis never called, once theGametype descriptor is reachable, the method-tableIfn/Tfnare considered reachable and cannot be precisely pruned.This is the main reason for the current size optimization ceiling.
How Go Does It (Reference Model)
LLGo is a Go compiler, so the right reference point is Go's own pipeline.
At a high level, Go uses two phases:
.o.Phase 1: Per-package semantic marking
The compiler records semantic reachability markers on functions/symbols. Main marker kinds:
R_USEIFACE: recorded on concrete-type-to-interface conversion.Written on the current function symbol, targeting concrete type
type.*.Example:
R_USEIFACEMETHOD: recorded on interface method calls.Written on the current function symbol, targeting interface type
type.*with method offset.Example:
R_USENAMEDMETHOD: conservative marker when only method name is known.Typical for generic interface calls or constant-name
MethodByName("Foo").Example:
R_METHODOFF: relocations in method-table triples (Mtyp/Ifn/Tfn).Example:
type.Bmethod table, bothLoadandhiddeneach have threeR_METHODOFFentries (Mtyp/Ifn/Tfn), and link time decides real offset vs sentinel.Conceptual relation:
R_USEIFACE(type.B)onmain.mainmakestype.Benter the reachable path.R_USENAMEDMETHOD("Foo")on some function conservatively retains methods namedFoo.type.Bcontributes threeR_METHODOFF, and reachability decides real offset vs sentinel.Combined source-to-markers example:
Markers are conceptually placed as:
main.mainrelocation table:R_USEIFACE -> Sym=type.BR_USEIFACEMETHOD -> Sym=type.A, Add=offset of LoadR_USENAMEDMETHOD("Load")(constant-name case)type.Bmethod table (uncommon.Methods):R_METHODOFFentries per method (Mtyp/Ifn/Tfn)Phase 2: Global link-time reachability (deadcode flood)
The linker performs a global flood using compile-time markers:
main/main..inittask, runtime base symbols, plugin/export entries,runtime.unreachableMethod, etc.R_METHODOFFis backfilled by reachability: real offset if reachable,-1sentinel if unreachable.AttrReflectMethod,R_USENAMEDMETHOD, etc.) trigger conservative method retention.dynexp) are treated as roots at deadcode initialization to avoid over-pruning.Minimal BFS flood example:
Traversal:
main.main, see reference tofoo, mark+enqueuefoo.foo, see reference tobar, mark+enqueuebar.bar, no new refs, queue ends.Reachable:
main, foo, bar.Another example: precise interface-method retention by reloc matching (
abiType + Name):Simplified reachability/backfill flow:
main; flood reachesuse, andR_USEIFACE(type.T)putstype.Tinto interface-related reachable domain.R_USEIFACEMETHODonuseto obtain interface method demand (Name=Load+ compatible signature).type.Tmethod-tableR_METHODOFFtriples; match byabiType + Name; mark onlyLoadas reachable method.Ifn/TfnforLoad; write sentinel/null for unmatched methods (Hidden), enabling later pruning.Relevant Go sources:
cmd/compile/internal/reflectdata/reflect.go(MarkTypeUsedInInterface,MarkUsedIfaceMethod)cmd/compile/internal/walk/expr.go(usemethod, etc.)cmd/link/internal/ld/deadcode.go(roots +R_USEIFACE/R_USEIFACEMETHOD/R_USENAMEDMETHOD/R_METHODOFF)cmd/link/internal/ld/data.go(R_METHODOFFreal offset for reachable,-1sentinel for unreachable)LLGo Go-Style LinkTime Adaptation Strategy (This PR)
LLGo strategy in this PR:
call/reffrom each package'sllvm.Module.USEIFACE/USEIFACEMETHOD/USENAMEDMETHOD/METHODOFF/...) are produced by llssa during SSA lowering.relocgraph.Graph, then globally merged and analyzed with one flood algorithm.type -> method indices), we emit strong symbol overrides inmainPkgand remove unnecessaryIFn/TFn.Core data structures (simplified):
ReachableMethodsis the direct input to the later rewrite stage:type symbol -> reachable method index set.Data Flow (This PR)
Key properties of this design:
call/ref) and Go semantic reachability (reloc) are modeled in one graph..opipeline (not.ll-only).Why This Design
First, this path is structurally aligned with Go.
Per-package semantic markers + global link-time reachability + method-table backfill is the fastest, most stable way to approach Go-equivalent pruning behavior.
Second, reachability is computed directly from symbol references.
For many ordinary call paths (especially runtime calls),
llvm.Moduleuse/refalready gives stable graph edges directly, without requiring extra SSA-level interpretation of how Go semantics get lowered into llgo calls.These relationships are already close to binary-level expression: stable and composable.
Three minimal examples:
Example 1: Channel ops map directly to runtime symbols
internal/relocgraph/_testdata/chanops/in.gointernal/relocgraph/_testdata/chanops/out.txtExample 2: Closure calls are explicit at symbol level
internal/relocgraph/_testdata/closure/in.gointernal/relocgraph/_testdata/closure/out.txtExample 3: Local function pointers become stable symbol edges
internal/relocgraph/_testdata/funcptrlocal/in.gointernal/relocgraph/_testdata/funcptrlocal/out.txtKey Points of This PR
Current Limitation and Follow-up (cache-hit and ForceRebuild)
In the current implementation, DCE temporarily enables
ForceRebuildbecause:call/refedges collected from the package LLVM module..aartifacts and link metadata, but we do not yet restore the LLVM-sidecall/refedge set.relocgraph, which can impact global reachability correctness.So for now, DCE forces rebuild to guarantee correctness.
Planned follow-up:
.a(for example as an archive member or a sidecar file).ForceRebuildwhile keeping correctness.