Improve build performance without CI fan-out #1832
cpunion wants to merge 45 commits into xgo-dev:main from
Code Review
This pull request introduces significant build performance optimizations and caching mechanisms across the build system. Key improvements include parallelizing C and assembly compilation with a configurable worker pool, caching pkg-config results and LLVM version detection, and optimizing manifest fingerprinting using unsafe string-to-byte conversions to avoid allocations. Additionally, the PR implements a fast-path for manifest metadata parsing to bypass full YAML decoding and optimizes C file scanning by leveraging package metadata. Review feedback suggests further performance gains by caching environment variables used in pkg-config lookups and increasing the default parallel job limit for high-performance build environments.
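The allocation-free string-to-byte conversion mentioned in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's actual helper: the names `stringBytes` and `fingerprint` are hypothetical, and it assumes Go 1.20 or later for `unsafe.StringData`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"unsafe"
)

// stringBytes returns a read-only []byte view of s without copying.
// Hypothetical helper: the caller must never write to the result,
// because Go strings are immutable.
func stringBytes(s string) []byte {
	if s == "" {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// fingerprint hashes manifest text without first copying it into a []byte.
func fingerprint(manifest string) string {
	sum := sha256.Sum256(stringBytes(manifest))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(fingerprint("pkg: example\n"))
}
```

The view is only safe while it is treated as read-only; passing it to a hash function fits that constraint, while passing it to code that mutates its input would not.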
    for _, env := range os.Environ() {
        if strings.HasPrefix(env, "PKG_CONFIG") {
            keyParts = append(keyParts, env)
        }
    }
Calling os.Environ() inside cachedPkgConfig is inefficient because it allocates and copies the entire environment on every call. Since this function is called for every #cgo pkg-config directive across all files in a package, this can add significant overhead in projects with many cgo dependencies. Consider caching the filtered PKG_CONFIG environment variables globally or within the build context to improve performance.
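One way to act on this suggestion is to snapshot the filtered variables once per process. The sketch below is an assumption about how that could look, not code from the PR: `filterPkgConfigEnv` and `cachedPkgConfigEnv` are hypothetical names, and the approach assumes the `PKG_CONFIG*` variables do not change after the first lookup.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
	"sync"
)

// filterPkgConfigEnv returns the PKG_CONFIG* entries from environ, sorted
// so the result can be used as a stable cache-key component.
func filterPkgConfigEnv(environ []string) []string {
	var out []string
	for _, kv := range environ {
		if strings.HasPrefix(kv, "PKG_CONFIG") {
			out = append(out, kv)
		}
	}
	sort.Strings(out)
	return out
}

var (
	pkgConfigEnvOnce sync.Once
	pkgConfigEnv     []string
)

// cachedPkgConfigEnv calls os.Environ() only once per process.
// Assumption: later changes to the environment are intentionally ignored.
func cachedPkgConfigEnv() []string {
	pkgConfigEnvOnce.Do(func() {
		pkgConfigEnv = filterPkgConfigEnv(os.Environ())
	})
	return pkgConfigEnv
}

func main() {
	fmt.Println(cachedPkgConfigEnv())
}
```

If tests or callers modify `PKG_CONFIG_PATH` mid-process, a per-build-context cache would be the safer variant of the same idea.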
    if jobs > 16 {
        return 16, nil
    }
The hardcoded limit of 16 parallel jobs may be too low for modern high-performance build environments. Since this PR focuses on build performance, consider increasing this limit or removing it entirely to allow full utilization of available CPU cores, especially as clang processes are independent.
Suggested change:
-   if jobs > 16 {
-       return 16, nil
-   }
+   if jobs > 64 {
+       return 64, nil
+   }
2c3d623 to 7fbdf87
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1832      +/-   ##
==========================================
- Coverage   88.37%   88.36%   -0.01%
==========================================
  Files          51       51
  Lines       14498    14756     +258
==========================================
+ Hits        12812    13039     +227
- Misses       1468     1485      +17
- Partials      218      232      +14
8566a2d to cd7cc02
Result: {"status":"keep","rebased_internal_build_wall":30.925,"go_reported_s":30.325,"baseline_s":32.555,"patched_s":30.925,"delta_s":-1.63,"base_go_reported_s":31.983,"patched_go_reported_s":30.325}
Result: {"status":"keep","rebased_internal_build_wall":31.8,"go_reported_s":31.224,"baseline_s":31.8,"patched_s":31.8,"delta_s":0,"base_go_reported_s":31.224,"patched_go_reported_s":31.224}
Result: {"status":"keep","warm_internal_build_wall":29.279,"go_reported_s":28.415,"baseline_s":31.157,"patched_s":29.279,"delta_s":-1.878,"base_go_reported_s":30.589,"patched_go_reported_s":28.415,"wall_s":29.279}
Result: {"status":"keep","warm_internal_build_wall":29.256,"go_reported_s":28.699,"baseline_s":31.312,"patched_s":29.256,"delta_s":-2.056,"base_go_reported_s":30.727,"patched_go_reported_s":28.699,"wall_s":29.256}
Result: {"status":"keep","warm_internal_build_wall":29.456,"go_reported_s":28.89,"baseline_s":30.32,"patched_s":29.456,"delta_s":-0.864,"base_go_reported_s":29.756,"patched_go_reported_s":28.89,"wall_s":29.456}
Result: {"status":"keep","warm_internal_build_wall":28.574,"go_reported_s":27.996,"baseline_s":30.326,"patched_s":28.574,"delta_s":-1.752,"base_go_reported_s":29.768,"patched_go_reported_s":27.996,"wall_s":28.574}
340e732 to fd5680f
Codecov Report
✅ All modified and coverable lines are covered by tests.
Summary
This PR is the pure build-performance subset of the previous build-perf work. It intentionally removes the CI workflow/job split and sharding changes so the branch focuses only on compiler/build hot-path improvements.
What Changed
Intentionally Not Included
llgo binary.
Validation
Ran locally on this branch:
Latest Local Follow-up: Async Native Object Emission
Added source-only commit e94d62a to overlap native host object emission with later package work. The main compiler still serializes LLGo/LLVM IR generation, then sends serialized LLVM IR to bounded external clang workers for native object emission. This avoids same-process LLVM concurrency while overlapping object generation. LLGO_PARALLEL_OBJECT_EMIT=0 is available as an opt-out, and debug/-genll/command-tracing paths remain synchronous.
Local evidence before pushing:
go test ./internal/build -run '^TestExtest$' -count=3
go test ./internal/build -count=1
go clean -cache && go build -a -tags=dev ./cmd/llgo
Additional local validation passed: targeted object-emission gating tests, go test -race subset for internal/build, clean go build -a -tags=dev -o <tmp> ./cmd/llgo, test ! -e llgo, and git diff --check.
Follow-up commit 1bb466b lowers the bounded object-emission worker cap from 4 to 2 to reduce external clang contention. Local full-suite A/B over e94d62a:
go test ./internal/build -count=1
Cap 1 and cap 3 both regressed locally, so cap 2 is the current local best. Race subset and clean go build -a -tags=dev guards passed after the cap change.
Follow-up commit 740b15c avoids copying the serialized LLVM IR string into a []byte before writing the temporary .ll file. Local full-suite A/B over cap-2 async emission:
go test ./internal/build -count=1
A targeted TestExtest / object-emission gating test run passed before pushing.
Follow-up commit 48cde59 uses the C clang driver instead of clang++ for native async IR object emission while preserving the configured compiler for cross/-genll paths. Local full-suite A/B over 740b15c:
go test ./internal/build -count=1
Targeted TestExtest / object-emission gating tests passed before pushing; full go test ./internal/build -count=1, clean go build -a -tags=dev -o <tmp> ./cmd/llgo, test ! -e llgo, and git diff --check also passed after pushing.
Follow-up commit 10cc3f4 broadens async object emission to external-clang/cross builds too, while explicitly keeping -genll, IR checking, and command-tracing paths synchronous. Local full-suite A/B over 48cde59:
go test ./internal/build -count=1
Targeted TestParallelObjectEmitEnabled / TestExtest, race subset, and clean go build -a -tags=dev guards passed before pushing. Follow-up a2e2505 keeps external/cross async emission on the target-specific compiler (instead of the native clang driver) after Targets CI exposed xtensa builds using the wrong compiler; local build.sh empty esp32 passed with the fix.
Follow-up commit 61b1b1d pipes async LLVM IR to clang via stdin (clang -x ir -c -) instead of writing temporary .ll files when GenLL/IR-checking are disabled. Debug/check paths still materialize .ll files. Local full-suite A/B over a2e2505:
go test ./internal/build -count=1
Validation also passed: targeted async/object-emission tests plus TestExtest, build.sh empty esp32, race subset, clean go build -a -tags=dev, and git diff --check.
Follow-up commit 1c53158 leaves go/types.Info.Scopes nil in LLGo package loads because LLGo and x/tools/go/ssa do not consume lexical scope records during compilation. This avoids extra type-checker scope recording. Local full-suite A/B over 61b1b1d:
go test ./internal/build -count=1
Validation passed: go test ./internal/packages ./internal/build ./ssa ./cl -count=1, race subset, clean go build -a -tags=dev, and git diff --check.
Follow-up commit 2ecb5b6 avoids copying yaml.Marshal manifest bytes into a second string by using the existing read-only unsafe byte-slice-to-string helper. Local full-suite A/B over 1c53158:
go test ./internal/build -count=1
Validation passed: manifest/fingerprint/cache targeted tests, full build-cache script, clean go build -a -tags=dev, and git diff --check.
Follow-up commit 22d440a creates all x/tools/go/ssa packages first, then calls Program.Build() once so the upstream SSA builder can use its documented parallel package build path. LLGo/LLVM codegen still runs sequentially after this phase. Local full-suite A/B over 2ecb5b6:
go test ./internal/build -count=1
Validation passed: go test ./internal/build ./ssa ./cl -count=1, race subset, clean go build -a -tags=dev, and git diff --check.
Follow-up commit c0ff171 avoids constructing full go/types method sets in the local SSA order fixup. The fixup now visits explicit named method functions directly via Program.FuncValue, avoiding MethodSet allocation for every type. Local full-suite A/B over 22d440a:
go test ./internal/build -count=1
Validation passed: go test ./internal/build ./ssa ./cl -count=1, targeted SSA-order tests, race subset, clean go build -a -tags=dev, and git diff --check.
Follow-up commit 60e0404 tracks whether buildSSAPkgs actually created new SSA packages and skips a redundant Program.Build() traversal when the call only wraps packages built by earlier setup; local SSA fixups still run for returned packages. Local full-suite A/B over c0ff171:
go test ./internal/build -count=1
Validation passed: targeted TestExtest/SSA-order/object-emission tests, race subset, and git diff --check.
Follow-up commit cc6d908 writes LLGo build manifests with a deterministic specialized YAML emitter instead of using generic yaml.Marshal reflection for the hot package-manifest path. The cache manifest remains YAML and existing YAML decoding/legacy fallback remain in place. Local full-suite A/B over 60e0404:
go test ./internal/build -count=1
Validation passed: manifest/fingerprint/cache targeted tests, full build-cache script, clean go build -a -tags=dev, and git diff --check.
Follow-up commit e213989 avoids strconv.Quote for manifest strings that are safe plain YAML scalars, reducing allocations and manifest size in the specialized emitter while still quoting ambiguous/special values. Local full-suite A/B over cc6d908:
go test ./internal/build -count=1
Validation passed: manifest/fingerprint/cache targeted tests, full build-cache script, clean go build -a -tags=dev, and git diff --check.
Follow-up commit 86d5af4 trims dead build helper code and reuses scratch state in the SSA-order fixup:
yaml.Marshal fallback for build-manifest emission,
manifestBuilder.Fingerprint, digestFile),
fixSSAOrderBlock.
Local paired A/B evidence:
go build -a -tags=dev -o <tmp> ./cmd/llgo (yaml.Marshal fallback removal)
go build -a -tags=dev -o <tmp> ./cmd/llgo
digestFile helper
go test ./internal/build -count=1 (SSA-order scratch reuse)
A third SSA-order repeat was slightly negative (+0.2%), so that part is treated as a small low-risk allocation cleanup rather than a large claimed speedup. Validation before push passed targeted manifest/fingerprint/digest/metadata/SSA-order/TestExtest tests, clean go build -a -tags=dev, test ! -e llgo, and git diff --check.
Follow-up commit 8549fc4 inlines the only remaining production digestBytes call site in the overlay file-digest path and keeps the hash helper test-local. This trims a dead production helper after the earlier digestFile cleanup while preserving the same sha256 + hex encoding logic.
Local clean-build A/B over 86d5af4:
go build -a -tags=dev -o <tmp> ./cmd/llgo
Validation before push passed targeted digest/manifest/metadata/TestExtest tests, full go test ./internal/build -count=1, clean go build -a -tags=dev -o <tmp> ./cmd/llgo, test ! -e llgo, and git diff --check.
CI Result for Latest Head (8549fc4)
Latest pushed head is CI-clean:
test (macos-latest, 19) in Go workflow)
Compared with the previous clean head 86d5af4, this run improves total runner time by 6m40s, LLGo workflow total by 13m10s, and end-to-end wall time by 5m34s, while the longest job is 1m28s longer due to the Go workflow macOS test job. Compared with the coverage-equivalent main baseline b4d9167, it is lower in total runner time (-39m42s), LLGo workflow total (-40m41s), wall time (vs that main sample), and longest-job time (-0m05s). As before, hosted-runner variance remains significant; local paired A/B is the primary source-level evidence.
CI Result for Latest Head (86d5af4)
Latest pushed head is CI-clean:
llgo (macos-15-intel, 19, 1.24.2) in LLGo workflow)
Compared with the previous clean head e213989, this run improves total runner time by 14m37s and LLGo workflow total by 7m31s, with a similar longest job (-19s) and slightly higher end-to-end wall time (+47s). Compared with the coverage-equivalent main baseline b4d9167, it is lower in total runner time (-33m02s), LLGo workflow total (-27m31s), and longest job (-1m33s). As before, local paired A/B remains the primary evidence for source-level changes because hosted-runner timing is noisy.
CI Result for Latest Head (e213989)
Latest pushed head is CI-clean:
test (macos-latest, 19) in Go workflow)
Compared with the previous clean head 60e0404, this hosted-runner sample is mixed/noisier: total runner time is +13m16s and LLGo workflow total is +5m17s, while the longest job improves by 2m59s. Compared with the coverage-equivalent main baseline b4d9167, it remains faster in total runner time (-18m25s), LLGo workflow total (-20m00s), and longest job (-1m14s). The manifest-emitter commits are therefore justified primarily by local paired A/B and build-cache validation rather than by claiming a whole-CI timing win from this single run.
CI Result for Latest Head (60e0404)
Latest CI completed clean: 57 successful checks and 1 skipped check; merge state is CLEAN.
Compared with the previous clean async-tuning sample (48cde59), CI is mixed: total runner time and LLGo workflow total are higher on this run, while wall time is essentially unchanged and several individual jobs still improve. This reinforces that the later SSA/cache hot-path commits are justified primarily by local paired A/B evidence, not by a single hosted-runner timing sample.
Compared with the coverage-equivalent main baseline (b4d9167), the latest head remains lower in total runner time and LLGo workflow total, though the longest single job is higher in this sample.
CI Result for Latest Async Object Emission Tuning (48cde59)
Latest CI completed clean: 57 successful checks and 1 skipped check. Compared with the first async object-emission CI sample (e94d62a), the follow-up cap/IR-copy/clang-driver tuning improves total runner time, LLGo workflow total, and end-to-end wall time, though the single longest job is longer on this sample.
Compared with the pre-async PR head (7004afe), the latest head is also lower by total runner time (5h59m35s vs 6h05m05s) and LLGo workflow total (4h19m13s vs 4h32m44s), with the same workflow topology/job coverage.
CI Result for Async Object Emission (e94d62a)
The first CI attempt hit a transient GitHub 502 Bad Gateway while downloading the ESP newlib tarball in hello (macos-latest, 19, 1.26.0). Rerunning the failed job succeeded; final PR status is clean: 57 successful checks and 1 skipped check.
Compared with the previous PR head 7004afe, the latest run improves the longest single job but does not show a total-runner-time win on this one CI sample.
Against the fastest coverage-equivalent goplus/main baseline (b4d9167), the latest PR head remains faster overall (6h13m10s vs 6h39m33s total runner time), but the async object-emission commit itself needs more CI samples before claiming a whole-CI improvement.
CI Runtime Snapshot (latest head after tool environment caching)
Measured from GitHub Actions job startedAt/completedAt timestamps. Skipped jobs are excluded from runtime totals. Codecov checks are not included because they are external status checks rather than Actions runtime jobs. This source-only branch keeps the same workflow topology as main; end-to-end wall time can still vary significantly with hosted-runner queueing, so total runner time is the less noisy cost proxy.
Data sources:
43d48c75f5c284805930291cdbfd38f4f9c9bc7d: 25047647064, 25047647084, 25047647105, 25047647095, 25047647059, 25047647044, 25047647061, 25047647056
goplus/main CI run set by total runner time at b4d9167e460d91a4a0f09a0f8616670a8fbd23fa: 24972314382, 24972314373, 24972314376, 24972314381, 24972314377, 24972314387, 24972314374, 24972314386
Baseline selection note for future snapshots: compare against the fastest completed goplus/main run set that has the same workflow topology / job coverage (same non-skipped and skipped job count where possible). Older completed main runs with fewer jobs are not used as the main baseline because they are not coverage-equivalent. In the currently queried recent completed main runs, there are 3 completed successful main run sets with the same 55 non-skipped / 1 skipped job topology; b4d9167 remains the fastest by total runner time, while 7ea3148 is the fastest by wall time.
Latest longest PR job: Go / test (macos-latest, 19).
Fastest comparable main total runner-time baseline: b4d9167e460d91a4a0f09a0f8616670a8fbd23fa. Fastest comparable main wall-time baseline: 7ea31484337c1d3b560fea9f07bbca1dcf75150a at 2h09m59s.
Latest Pure Build Hot-Path Follow-up
After the rebase, added one source-only follow-up commit 85d1523 focused on cgo build metadata and pragma hot paths, without changing CI workflow topology.
Local focused benchmarks used during the follow-up:
splitDirectiveArgs mostly-unquoted args
go:cgo_* build-flow pragma collection
buildCgo with complete metadata
OtherFiles metadata extraction
Additional local validation after the follow-up:
Additional Pure Build Hot-Path Follow-up
Added source-only commit b427daf with further cgo metadata / pragma scan reductions. CI workflow topology is still unchanged.
Local focused benchmarks from this follow-up:
buildCgo with complete metadata
OtherFiles metadata
//go:cgo_ line-comment pragma parsing
Additional local validation after this follow-up:
Additional Pure Build Hot-Path Follow-up 2
Added source-only commit 437d443 to reuse cgo pragma scan results across Darwin Plan9 asm handling and reduce go:cgo_import_dynamic parsing overhead. CI workflow topology is still unchanged.
Local focused benchmark from this follow-up:
This removes a duplicate AST comment scan between compilePkgSFiles' Darwin trampoline skip check and the later cgo alias/link-arg collection, then reduces allocation while parsing repeated exact //go:cgo_import_dynamic line directives.
Additional local validation after this follow-up:
Whole Build Pipeline Follow-up
Added source-only commit b236c0a to remove duplicate archive work in the LLGo build cache miss path. Previously an uncached package was archived to a temporary .a, then copied into the build cache. The build now publishes the archive directly at the cache path and uses that archive for the current link, falling back to the temporary archive path only when cache publication is unavailable.
End-to-end local evidence focused on the whole internal/build pipeline rather than microbenchmarks:
go test ./internal/build -run '^TestExtest$' -count=1 wall
Rejected during the same whole-process pass: linking uncached main package object files directly instead of archiving them. It passed TestExtest, but did not improve over the cache-archive change and added linker-order complexity, so it was dropped.
Additional local validation after this follow-up:
CI workflow topology is still unchanged.
Whole Build Setup Follow-up
Added source-only commit e36a84c to cache successful macOS SDK sysroot discovery within a process. Native macOS builds call xcrun --sdk macosx --show-sdk-path while setting up crosscompile flags; the full internal/build test package invokes the build pipeline repeatedly, so reusing a successful sysroot lookup avoids repeated external setup work without changing generated outputs. Failed lookups are not cached, so transient xcrun failures can still be retried.
End-to-end local evidence used the full internal/build package test pipeline rather than a microbenchmark:
go test ./internal/build -count=1 wall
Additional local validation after this follow-up:
CI workflow topology is still unchanged.
Whole Build Setup Follow-up 2
Added source-only commit 6b54e5f to keep LLVM's bin directory first in PATH without prepending duplicate entries on every internal/build.Do call. The previous setup mutated process PATH repeatedly during multi-build processes such as go test ./internal/build, growing duplicate LLVM path entries and increasing external tool lookup/setup overhead. Empty LLVM bin dirs are now ignored instead of prepending an empty path component.
End-to-end local evidence again used the full internal/build package test pipeline rather than a microbenchmark:
go test ./internal/build -count=1 wall
Additional local validation after this follow-up:
CI workflow topology is still unchanged.
Whole Build Setup Follow-up 3
Added source-only commit 90ea768 to make LLVM target initialization idempotent within the process. internal/build.Do is called repeatedly by the full internal/build test pipeline; each call previously invoked llssa.Initialize(llssa.InitAll). LLVM target initialization is process-global, so already-initialized flag groups can be skipped while still allowing later calls with additional flags to initialize any missing groups.
End-to-end local evidence used the full internal/build package test pipeline:
go test ./internal/build -count=1 wall
Additional local validation after this follow-up:
CI workflow topology is still unchanged.
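The "skip already-initialized groups, initialize only the missing ones" logic can be illustrated with a bitmask. Everything named here is hypothetical: `initFlags`, its constants, and `targetInitializer` are stand-ins and do not reproduce the real `llssa` flag set.

```go
package main

import (
	"fmt"
	"sync"
)

// initFlags is a hypothetical bitmask of initialization groups.
type initFlags uint

const (
	initNativeTarget initFlags = 1 << iota
	initAllTargets
	initAllAsmPrinters
)

// targetInitializer tracks which groups were already initialized.
type targetInitializer struct {
	mu        sync.Mutex
	done      initFlags
	initGroup func(initFlags) // performs the real work for one group
}

// initialize runs initGroup once for each requested group that has not been
// initialized yet and returns the set of groups it actually initialized.
func (t *targetInitializer) initialize(flags initFlags) initFlags {
	t.mu.Lock()
	defer t.mu.Unlock()
	missing := flags &^ t.done
	for bit := initFlags(1); bit != 0 && bit <= missing; bit <<= 1 {
		if missing&bit != 0 {
			t.initGroup(bit)
		}
	}
	t.done |= missing
	return missing
}

func main() {
	ti := &targetInitializer{initGroup: func(g initFlags) {
		fmt.Printf("initializing group %03b\n", uint(g))
	}}
	ti.initialize(initNativeTarget | initAllTargets)
	ti.initialize(initNativeTarget | initAllTargets)     // no work
	ti.initialize(initNativeTarget | initAllAsmPrinters) // only the new group
}
```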
Whole Build Pipeline Follow-up 2
Added source-only commit 1629549 to overlap cache archive/manifest publication with later package builds. A temporary phase trace of the whole TestExtest pipeline showed cache publication as one of the larger traced subphases (saveToCache about 1.48s cumulative in that workload). The build now starts bounded asynchronous cache saves after cache misses and waits before linking, so archive/manifest I/O can overlap with subsequent package codegen while preserving link inputs and falling back to a temporary archive if cache publication does not produce one.
End-to-end local evidence used the full internal/build package test pipeline:
go test ./internal/build -count=1 wall
Additional local validation after this follow-up:
CI workflow topology is still unchanged.
Next Optimization Work
Potential follow-ups after this PR, ordered by expected value and required design work:
cl compile, LLVM emit, archive, link, and test-run phases. Use it on TestExtest, go test ./internal/build, and clean/warm go build -tags=dev ./cmd/llgo before making more source changes.
cmd/llgo dependency graph slimming: clean go build -a -tags=dev ./cmd/llgo still spends much time in transitive stdlib / x-tools / crosscompile dependencies. Larger gains may require CLI/dependency layering changes, which should be evaluated separately from this focused build-hot-path PR.
Directions already measured and not worth revisiting without new phase evidence: hard-coded GC tuning, disabling linker ICF, removing packages.NeedExportFile, disabling build cache, broad crosscompile/env negative caches, native-only LLVM init, ABI global scan gating, and further small cgo parser/string micro-optimizations.
Whole Build Pipeline Follow-up 3
Added source-only commit 4e8a123 to let the existing bounded cache-publication workers also perform the archive fallback for uncached packages. The previous async cache save path overlapped cache archive/manifest publication, but packages that are intentionally not cached (notably main packages) still fell back to normalizeToArchive while waiting for pending cache saves. The worker now:
main, and fingerprint/manifest are available; force rebuild still bypasses cache reads but repopulates cache entries;
main packages.
End-to-end local evidence used the full internal/build package test pipeline:
go test ./internal/build -count=1 wall
A focused attempt to lower the worker cap from 4 to 2 was discarded because it did not improve the whole workload and would overfit a local run.
Additional local validation after this follow-up and the force-rebuild cache refresh fix (e733953):
CI workflow topology is still unchanged.
Whole Build Setup Follow-up 4
Added source-only commit 43d48c7 to cache repeated tool-environment lookups inside long-lived build/test processes:
internal/env.GoEnvWithEnv now caches successful go env ... results by requested variables and effective environment, while still not caching failures. This avoids repeatedly spawning go env GOROOT GOVERSION across multiple internal/build.Do calls in the same process.
xtool/env/llvm.New now caches successful llvm-config --bindir results by effective llvm-config/PATH selection, while still retrying failures. This avoids repeatedly spawning llvm-config --bindir during multi-build workloads.
Local evidence used repeated full internal/build package runs after the async cache-publication changes:
go test ./internal/build -count=1 wall
go test ./internal/env ./xtool/env/llvm ./internal/build -count=1 wall
Rejected during this local-only sweep before pushing:
internal/build badly;
ssa.Program.Build: no improvement;
llvm-config path lookup: no improvement beyond caching successful --bindir.
Additional local validation before pushing this follow-up:
CI workflow topology is still unchanged.
Whole Build Pipeline Follow-up 5
Added source-only commit 3d3a3e3 to move expensive golang.org/x/tools/go/ssa sanity checking out of the default build hot path:
InstantiateGenerics but does not run SanityCheckFunctions for every compiled package.
LLGO_SSA_SANITY=1 restores the old sanity-check behavior for debugging/validation.
LLGO_SSA_SANITY is included in cache fingerprint env inputs, so enabling it forces rebuilds instead of reusing default no-sanity cache entries.
TestSSABuildModeSanityOptIn to cover the default and opt-in modes.
Local evidence from the representative internal/build workload:
go test ./internal/build -count=1 wall
go test ./internal/build -count=1 go-reported
This was profile-guided: TestExtest CPU/memory profiles showed ssa.mustSanityCheck / ssa.WriteFunction allocating roughly 287MB in the old default path. The new default removes that validation cost from normal builds while preserving an explicit opt-in path.
Rejected during this sweep:
ssa type-conversion allocation tweak: regressed full internal/build.
Additional local validation before pushing this follow-up:
Whole Build Pipeline Follow-up 6
Added source-only commit b79f897 to reduce repeated setup and scheduler overhead in multi-build processes:
LLGO_ROOT discovery now prints the repeated “Using LLGO root for devel” warning only once per root in a process, preserving the diagnostic while avoiding repeated stderr I/O during internal/build tests and other long-lived build drivers.
internal/packages now avoids spawning package-load goroutines for narrow import fanout and for the common single-root load case, while preserving parallel loading for wider import graph nodes.
TestLLGoROOTWarnsOnceForDevelRoot for the warning-once behavior.
Local representative evidence from this sweep:
go test ./internal/build -count=1 wall
go test ./internal/build -count=1 go-reported
go build -a -tags=dev ./cmd/llgo
llgo artifact
Rejected during this local-only sweep before pushing:
LPkg miss hidden ABI/link side effects;
internal/build cgo shim: failed include-path portability and would add package-level cgo risk;
ssa/abi: passed SSA tests but regressed the full internal/build workload;
TypesInfo.Scopes collection and small ABI metadata slice preallocation: both regressed representative runs;
LLGO_ROOT caching: no representative win and higher stale-global-state risk.
Additional local validation before pushing this follow-up:
CI workflow topology is still unchanged.
Whole Build Pipeline Follow-up 7
Added source-only commit fff8fff to reduce go/types map growth during package loading:
internal/packages.loadPackageEx now gives the types.Info maps modest initial capacities based on the number of parsed source files.
go/types.Checker.recordTypeAndValue / TypesInfo population without changing which type information LLGo collects.
TypesInfo maps (Types, Defs, Uses, Implicits, Instances, Scopes, Selections) because omitting any required map is unsafe.
Local representative evidence from this sweep:
go test ./internal/build -count=1 wall
go test ./internal/build -count=1 go-reported
go build -a -tags=dev ./cmd/llgo
llgo artifact
Capacity tuning kept 1024 * len(syntax) for Types, 1/2 of that for Defs/Uses, and small per-file capacities for the smaller maps. Rejected alternatives:
TypesInfo.Implicits: crashed immediately;
TypesInfo.Scopes: previously passed but regressed;
Types capacity to 1536 or 2048 per file: regressed;
Defs/Uses to match Types: regressed;
internal/build cgo shim: improved internal/build execution but doubled clean go build -a ./cmd/llgo, so it was rejected and not pushed.
Additional local validation before pushing this follow-up:
CI workflow topology is still unchanged.
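A pre-sized `types.Info` along the lines described above might be constructed like this. It is a sketch, not the PR's `loadPackageEx` code: `newTypesInfo` is a hypothetical name, the 1024-per-file and half-of-that figures follow the description, and the per-file capacity of 16 for the smaller maps is an invented placeholder.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/types"
)

// newTypesInfo allocates all types.Info maps with initial capacities scaled
// by the number of parsed source files.
func newTypesInfo(numFiles int) *types.Info {
	typesCap := 1024 * numFiles
	smallCap := 16 * numFiles // assumed value; the source only says "small"
	return &types.Info{
		Types:      make(map[ast.Expr]types.TypeAndValue, typesCap),
		Defs:       make(map[*ast.Ident]types.Object, typesCap/2),
		Uses:       make(map[*ast.Ident]types.Object, typesCap/2),
		Implicits:  make(map[ast.Node]types.Object, smallCap),
		Instances:  make(map[*ast.Ident]types.Instance, smallCap),
		Scopes:     make(map[ast.Node]*types.Scope, smallCap),
		Selections: make(map[*ast.SelectorExpr]*types.Selection, smallCap),
	}
}

func main() {
	info := newTypesInfo(3)
	fmt.Println(len(info.Types), info.Scopes != nil)
}
```

Since `go/types` only records into maps that are non-nil, keeping all seven maps allocated preserves the collected information; only the initial capacity changes.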
Latest CI Runtime Snapshot (fff8fff)
Latest head: fff8fffec897f5a4ccf0d2f52426440962d08adb (pre-size package type info maps). All checks completed successfully; CI workflow topology remains unchanged.
Compared against the fastest completed coverage-equivalent goplus/main baseline by total runner time from the current baseline pool (b4d9167, same 55 non-skipped / 1 skipped topology):
fff8fff | b4d9167
Go / test (ubuntu-latest, 19)) | LLGo / llgo (macos-15-intel, 19, 1.26.0))
Baseline pool checked: latest completed successful goplus/main run sets with equivalent 55 non-skipped / 1 skipped coverage. Fastest by total runner time was b4d9167 at 6h39m33s; fastest by end-to-end wall time was 7ea3148 at 2h10m04s. PR fff8fff end-to-end wall time was 51m04s.
Per-workflow comparison against b4d9167:
Top latest PR jobs by duration:
test (ubuntu-latest, 19)
test (macos-latest, 19)
llgo (macos-15-intel, 19, 1.26.0)
llgo (macos-15-intel, 19, 1.24.2)
llgo (macos-15-intel, 19, 1.21.13)
Note: end-to-end wall time is affected by GitHub-hosted runner queueing and scheduling; total runner time is usually the better signal for source-level build-performance changes.
Whole Build Pipeline Follow-up 8
Added source-only commit 66693e8 to skip parser object resolution during LLGo package loads:
internal/build now supplies a packages.Config.ParseFile callback that uses parser.SkipObjectResolution while still preserving parser.ParseComments for LLGo directives (go:linkname, llgo:*, go:embed, cgo pragmas, etc.).
go/types and x/tools SSA object information, not parser-populated ast.Ident.Obj / ast.File.Scope links, so this avoids unnecessary parser work without reducing comment/directive coverage.
TestParseBuildFileSkipsObjectResolutionAndKeepsComments to cover both properties: comments are retained, parser object links are not populated.
Local representative evidence:
go test ./internal/build -count=1 wall
go test ./internal/build -count=1 go-reported
go build -a -tags=dev ./cmd/llgo
Rejected during this local-only sweep before pushing:
internal/llvmext cgo package: improved LLGo execution workloads but regressed clean go build -a ./cmd/llgo to 12.8s; still best pursued by adding EmitToFile to the existing github.com/goplus/llvm binding instead;
TypesInfo map sizing: improved noisy internal/build runs but regressed clean cmd build guard;
TypesInfo capacity from 1024/file to 512/file: small/noisy internal improvement but clean cmd guard regressed slightly;
SkipObjectResolution alone and has higher risk of missing directive forms.
Additional local validation before pushing this follow-up:
CI workflow topology is still unchanged.
Latest CI Runtime Snapshot (`66693e8`)
Latest head: `66693e87e859e1d912209f1172533bdb8e95ffc4` (skip parser object resolution in build loads). All checks completed successfully; CI workflow topology remains unchanged.

Compared against the fastest completed coverage-equivalent `goplus/main` baseline by total runner time from the current baseline pool (`b4d9167`, same `55` non-skipped / `1` skipped topology):
- Compared: `66693e8` vs `b4d9167`
- Jobs listed: `LLGo / llgo (macos-15-intel, 19, 1.21.13)`, `LLGo / llgo (macos-15-intel, 19, 1.26.0)`

Compared with the previous PR head snapshot (`fff8fff`), this run was slower in CI (`+17m39s` total runner, `+8m16s` wall). The local representative tests for `66693e8` improved, so this CI delta is treated cautiously because the longest latest job shifted to a single macOS Intel matrix job.

Per-workflow comparison against `b4d9167`:

Top latest PR jobs by duration:
- `llgo (macos-15-intel, 19, 1.21.13)`
- `llgo (macos-15-intel, 19, 1.26.0)`
- `test (ubuntu-latest, 19)`
- `test (macos-latest, 19)`
- `llgo (macos-15-intel, 19, 1.24.2)`

Note: end-to-end wall time is affected by GitHub-hosted runner queueing and scheduling; total runner time is usually the better signal for source-level build-performance changes.
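To make the distinction in the note concrete, here is a small sketch of how the two metrics respond to queueing. The `job` type, the job names, and every duration are made up for illustration; none of them are measurements from this PR, and the definitions of "total runner time" (sum of job durations) and "wall time" (earliest start to latest end) are the assumptions stated in the comments.

```go
package main

import (
	"fmt"
	"time"
)

// job is a hypothetical record of one CI job; start and end are offsets
// from the moment the workflow run was triggered.
type job struct {
	name       string
	start, end time.Duration
}

// totalRunner sums the time each job actually spent executing.
func totalRunner(jobs []job) time.Duration {
	var sum time.Duration
	for _, j := range jobs {
		sum += j.end - j.start
	}
	return sum
}

// wall is the span from the earliest start to the latest end.
func wall(jobs []job) time.Duration {
	if len(jobs) == 0 {
		return 0
	}
	first, last := jobs[0].start, jobs[0].end
	for _, j := range jobs[1:] {
		if j.start < first {
			first = j.start
		}
		if j.end > last {
			last = j.end
		}
	}
	return last - first
}

// delay returns a copy of jobs in which the named job waited longer in
// the queue; its execution time is unchanged.
func delay(jobs []job, name string, wait time.Duration) []job {
	out := make([]job, len(jobs))
	copy(out, jobs)
	for i := range out {
		if out[i].name == name {
			out[i].start += wait
			out[i].end += wait
		}
	}
	return out
}

// sampleRun returns illustrative data only.
func sampleRun() []job {
	return []job{
		{"test-linux", 0, 10 * time.Minute},
		{"test-macos", 0, 12 * time.Minute},
		{"llgo-macos-intel", 1 * time.Minute, 21 * time.Minute},
	}
}

func main() {
	run := sampleRun()
	queued := delay(run, "llgo-macos-intel", 8*time.Minute)
	fmt.Println("as scheduled:", totalRunner(run), "runner,", wall(run), "wall")
	fmt.Println("one job queued:", totalRunner(queued), "runner,", wall(queued), "wall")
}
```

In this toy data the queued job lengthens wall time while the runner total stays the same, which is why a single slow-to-schedule matrix job can dominate the wall-time figure without saying anything about the source change.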
LLVM EmitToFile CI Validation (`b549c2d`)
Temporary CI-validation commit `b549c2d` switches native host object emission from `EmitToMemoryBuffer` + Go-side object-file write to the new `TargetMachine.EmitToFile` API on a forked LLVM module branch:
- `github.com/cpunion/llvm`, branch `feat/emit-to-file`
- `40fdafa target: emit target machine output to file`
- `replace github.com/goplus/llvm => github.com/cpunion/llvm v0.8.9-0.20260429084913-40fdafa22ac4`

Local A/B before pushing (`v0.8.8` memory buffer path vs `EmitToFile`):
- `go test ./internal/build -run '^TestExtest$' -count=3` (wall)
- `go test ./internal/build -run '^TestExtest$' -count=3` (go-reported)
- `go test ./internal/build -count=1`
- `go build -a -tags=dev ./cmd/llgo`

This validates the direction that PR #1823 enabled: native object emission should move from memory-buffer output to direct file output, but the API belongs in the existing `github.com/goplus/llvm` binding rather than a new LLGo-side cgo shim. Once the LLVM PR is merged/tagged, the temporary `replace` should be removed and LLGo should depend on the released `github.com/goplus/llvm` version.

CI result for LLVM `EmitToFile` replace commit (`b549c2d`)
All checks completed successfully for `b549c2d9e8cdedfad0b4ba919beb72e20d7340cf` (`57` success, `1` skipped; merge state `CLEAN`).

Coverage-equivalent comparison (`55` non-skipped jobs, `1` skipped job):
- Compared: `b549c2d` with forked LLVM `EmitToFile`, `66693e8`, `b4d9167`
- Includes the `llgo` build-job bucket

Per-workflow totals for `b549c2d` vs previous PR head `66693e8`:
- Compared: `b549c2d`, `66693e8`

CI is noisy at whole-run granularity: total runner time regressed vs the previous PR head mostly from Go/Docs/test-job variance, while end-to-end wall, longest job, Release Build, Targets, and the LLGo `llgo` build-job bucket improved. The local targeted A/B remains the clearest signal for the `EmitToFile` implementation itself (`TestExtest -count=3`: 32.1s -> 23.2s wall, -27.7%).

EmitToFile validation discarded
The temporary `b549c2d` validation commit using `replace github.com/goplus/llvm => github.com/cpunion/llvm ...` has been dropped from this PR and the branch has been reset back to `66693e8`.

Reason: while the forked LLVM `EmitToFile` API was functional and showed a targeted local improvement for `TestExtest`, the completed CI run did not show a clear whole-pipeline/total-runner-time improvement over the previous PR head. To keep this PR focused on proven build-performance changes, the temporary dependency replace and LLGo code change are discarded. The LLVM API work can still be pursued separately as a cleanup/API PR, but it is not included here as a build-performance change.

Local follow-up after dropping EmitToFile (`7004afe`)
After discarding the temporary LLVM `EmitToFile` replace commit, this source-only follow-up adds two low-risk local hot-path reductions:
- `internal/goembed`: skip the full `go:embed` declaration walk when parsed comments contain no `go:embed` text.
- `ssa/type_cvt.go`: lazily allocate converted tuple/interface/struct slices only when a nested type actually changes.

Local paired A/B (`origin/improve/build-perf-pure` at `66693e8` vs combined patch):
- Compared against: `7004afe` patch
- `go test ./internal/build -run '^TestExtest$' -count=3` (wall)

Incremental local A/B while developing:
- `go:embed` pre-scan

Discarded during this round because paired A/B regressed:
- `isGoSSAOpaqueType` reflection fast path for public `go/types` implementations.
- `runtime.Version()` parse cache in package loading.
- `cvtNamed` zero-method allocation special case.

Local validation before pushing `7004afe`:
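For the `ssa/type_cvt.go` item in this follow-up, the lazy-allocation idea can be sketched with `go/types` alone. This is not the code from `7004afe`: `convert`, `convertTuple`, and the sample tuples are hypothetical, and the `int` to `int64` rewrite is an arbitrary stand-in for whatever per-type conversion the real code performs.

```go
package main

import (
	"fmt"
	"go/types"
)

// convert is a stand-in conversion: it rewrites int to int64 and leaves
// every other type untouched (returning the identical value).
func convert(t types.Type) types.Type {
	if b, ok := t.(*types.Basic); ok && b.Kind() == types.Int {
		return types.Typ[types.Int64]
	}
	return t
}

// convertTuple returns t itself when no element type changes, so the
// common case allocates nothing. The output slice is created only at the
// first changed element and back-filled with the unchanged prefix.
func convertTuple(t *types.Tuple) *types.Tuple {
	var vars []*types.Var // stays nil until a change is seen
	for i := 0; i < t.Len(); i++ {
		v := t.At(i)
		nt := convert(v.Type())
		changed := nt != v.Type()
		if changed && vars == nil {
			vars = make([]*types.Var, t.Len())
			for j := 0; j < i; j++ {
				vars[j] = t.At(j)
			}
		}
		if vars == nil {
			continue
		}
		if changed {
			v = types.NewVar(v.Pos(), v.Pkg(), v.Name(), nt)
		}
		vars[i] = v
	}
	if vars == nil {
		return t
	}
	return types.NewTuple(vars...)
}

// newTuple builds an anonymous tuple from the given element types.
func newTuple(ts ...types.Type) *types.Tuple {
	vars := make([]*types.Var, len(ts))
	for i, t := range ts {
		vars[i] = types.NewVar(0, nil, "", t)
	}
	return types.NewTuple(vars...)
}

// sampleUnchanged contains no int, so conversion is a no-op.
func sampleUnchanged() *types.Tuple {
	return newTuple(types.Typ[types.String], types.Typ[types.Bool])
}

// sampleChanged has an int in the middle, forcing one allocation.
func sampleChanged() *types.Tuple {
	return newTuple(types.Typ[types.String], types.Typ[types.Int], types.Typ[types.Bool])
}

func main() {
	same := sampleUnchanged()
	fmt.Println("unchanged tuple reused:", convertTuple(same) == same)
	fmt.Println("converted tuple:", convertTuple(sampleChanged()))
}
```

The design choice is that callers can compare the result with the input by identity to learn whether anything changed, and unchanged elements are shared rather than copied.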