Dedup target list in ProjectGraph.ExpandDefaultTargets to prevent graph explosion#13855
Open
dfederm wants to merge 2 commits into
Open
Dedup target list in ProjectGraph.ExpandDefaultTargets to prevent graph explosion#13855dfederm wants to merge 2 commits into
dfederm wants to merge 2 commits into
Conversation
…ph explosion ProjectGraph.ExpandDefaultTargets now unconditionally dedupes its output via a hybrid fast/slow path. The downstream BFS cross-products entry-targets against matching PRT items, so any N-duplicate entry list at one hop becomes ~N^2 propagations at the next; over BFS depth D this is N^D edges. Duplicates arise from PRT marker double-emission and from explicit literal duplicates in Targets metadata. ExpandDefaultTargets has a zero-allocation fast path (inline O(n^2) scan for n <= 8) returning the input unchanged when no marker or duplicate is found, and a HashSet-backed slow path otherwise. Dedup is OrdinalIgnoreCase, first-occurrence wins, matching the existing post-BFS dedup at the public GetTargetLists boundary -- so no consumer-observable behavior changes and no ChangeWave is needed. BFS hot path moved off ImmutableList<string>/ImmutableList<TargetSpecification>: ProjectGraphBuildRequest.RequestedTargets, ExpandDefaultTargets, TargetsToPropagate.FromProjectAndEntryTargets, and GetApplicableTargetsForReference all flow string[] end-to-end. TargetsToPropagate collapses two ImmutableLists to one flat TargetSpecification[] + outer-build count. LINQ removed from the per-edge loop in FromProjectAndEntryTargets. ImmutableList<string> is retained only at the public GetTargetLists boundary and in targetLists[node] where the AddRange chain actually derives each version from the prior. 11 new tests (22 with TFMs) cover dedup behavior plus end-to-end GetTargetLists smoke at depth 12 with the duplicate-marker shape and depth 6 with the common single-marker shape. End-to-end GetTargetLists(["Build"]) benchmark on realistic .NET-shape graphs: 19-51% faster and 24-31% lower allocation across small/medium/large trees and linear chains; the pathological duplicate-marker shape is 187x faster and 172x lower allocation at 50 nodes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
rainersigwald
approved these changes
May 26, 2026
- Wrap unit tests in TestEnvironment.Create(_output) to match the in-file end-to-end test pattern and the sibling ProjectGraph_Tests convention, ensuring evaluation state from in-memory Project instances doesn't leak across tests via the global ProjectCollection (Copilot bot suggestion). - Defer the empty-list allocation in FromProjectAndEntryTargets until the first target is actually appended, avoiding an empty List<TargetSpecification> allocation when a matched PRT item has empty Targets metadata (Copilot bot suggestion). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Context
Microsoft.Build.Graph.ProjectGraphbuilds the static graph by walking project references breadth-first. For each edge it expandsProjectReferenceTargets(PRT) items to decide which targets to propagate, including two markers:.default→ the referenced project'sDefaultTargets.projectReferenceTargetsOrDefaultTargets→ the entry-point PRT value if set, else.defaultIf marker expansion produces a target that also appears literally in the same
Targetsmetadata — or if the materializedTargetsmetadata itself contains duplicates — the per-edge target list grows. The downstreamProjectInterpretation.TargetsToPropagate.FromProjectAndEntryTargetscross-products each entry-target against every matching PRT, so an N-duplicate entry list at one hop produces ~N² propagations at the next. Across BFS depth D this becomes ~N^D, burning gigabytes and minutes on graphs of only a few dozen nodes.The reproducer that surfaced this in a large internal codebase was two SDK targets files both prepending to the same property and both emitting
<ProjectReferenceTargets Include="Build" Targets="$(...)"/>. The second emitter snapshotted a property value that already contained the marker, so the materializedTargetshad the marker more than once and the explosion took off from hop 1. The authoring-side double-emission could also be fixed where it originates, but this PR is the engine-side guard rail so any future recurrence (or any literalTargets="Build;Build"-style authoring quirk) can't take the graph down.Changes Made
The fix (
src/Build/Graph/ProjectGraph.cs)ExpandDefaultTargetsnow dedupes its output unconditionally, with a hybrid fast/slow shape:ExpandDefaultTargetsSlow): oneHashSet<string>sized to count plus a lazily-allocatedList<string>buffer. Single pass, expands markers in place, dedupes via the set. Used for n > 8 or when the fast scan flagged anything.Dedup is
OrdinalIgnoreCase, first-occurrence wins.Why this is behavior-preserving (no ChangeWave)
GetTargetListsalready collapses each per-node final target list toOrdinalIgnoreCase-unique entries viaImmutableHashSet+ImmutableList.AddRange, andExpandDefaultTargetsis only called fromGetTargetLists. Adding inner dedup only changes BFS internal state (encounteredEdgesset membership, per-edgerequestedTargetslist size) — never the public return value. No new public API, no new warnings or errors, no consumer-observable change.Supporting refactor (justification below)
The dedup fix on its own is small. The PR also takes the BFS hot path off
ImmutableList<string>, which is the dominant cost once the explosion is gone:ProjectGraphBuildRequest.RequestedTargets:ImmutableList<string>→string[].Equals/GetHashCodeuse.Length+ indexer, no virtual dispatch or AVL traversal.ProjectGraph.cs:string[]cascade replacesImmutableList<string>/IReadOnlyList<string>.ImmutableList<string>is kept at the publicGetTargetListsboundary and intargetLists[node], whereAddRangeactually derives each version from the prior.ProjectInterpretation.TargetsToPropagate: signature widened tostring[];_outerBuildTargets+_allTargets(twoImmutableList<TargetSpecification>) collapsed to a singleTargetSpecification[] _allTargets+int _outerBuildTargetCount— one allocation/copy per source list instead of three+two.Where(...).Select(...).ToArray()+AddRange→ directforeachover theSemiColonTokenizerstruct fromExpressionShredder.SplitSemiColonSeparatedList, appending via areflocal. Drops theWhereSelectArrayIteratorstate machine, an intermediateTargetSpecification[], and tokenizer boxing.GetApplicableTargetsForReferencereturnsstring[]directly, sized for the no-skip common case andArray.Resized only when something is skipped. Drops LINQ state machine +LargeArrayBuilderdoubling copies.Why drop
ImmutableList<T>here at all: the original draft usedImmutableList<string>.Builderfor the dedup buffer. In review it became clear that AVL-treeImmutableList<T>is materially more expensive per element thanList<T>/string[], and the structural-sharing benefit doesn't apply on this path — every per-edgeapplicableTargets/expandedTargetsis built fresh from rawProjectReferenceTargetsitems, never as an.Add/.Removederivative of a common ancestor, so the trees share zero internal nodes.Testing
New file:
src/Build.UnitTests/Graph/ProjectGraph_ExpandDefaultTargetsDedup_Tests.cs— 11 tests, covering:.projectReferenceTargetsOrDefaultTargetswith literal duplicates is deduped.Build;Build;Build) are deduped.DefaultTargets.GetTargetListssmoke at depth 12 with the duplicate-marker shape — confirms result size stays bounded.GetTargetListssanity at depth 6 for the common single-marker case.All 11 pass on
net10.0andnet48(22/22). The rest ofMicrosoft.Build.Graph.UnitTestsis unchanged by this PR; the 6 pre-existing failures (3 tests × 2 TFMs) are all[ActiveIssue("https://github.com/dotnet/msbuild/issues/4368")].Performance evidence (why the refactor scope is justified)
End-to-end
ProjectGraph.GetTargetLists(["Build"])via BenchmarkDotNet on .NET 10.0.8, X64 RyuJIT AVX2,--job short. The graph is built once in[GlobalSetup]so the measured op is purely the BFS hot path. Each project carries<ProjectReferenceTargets Include="Build" Targets=".projectReferenceTargetsOrDefaultTargets;GetNativeManifest;_GetCopyToOutputDirectoryItemsFromThisProject"/>to mirror the realistic shape produced byMicrosoft.Common.CurrentVersion.targets. V1 = upstreammainat the PR base; the same benchmark DLL is rebuilt against each engine.Every realistic shape is 19–51% faster and 24–31% lower allocation. The pathological shape — which is what would otherwise OOM/hang on a large graph — is 187× faster and 172× lower allocation at just 50 nodes; because the growth is geometric in BFS depth, the gap widens rapidly past that.
Benchmark source is kept out of this PR to keep the diff focused; happy to upstream it separately if maintainers want it in
ref/ordocumentation/.Notes
ChangeWaves.mdentry — there is no observable behavior change at theGetTargetListsboundary, only reduced internal work.Strings.resxchanges.ProjectGraph.cs+137/-33,ProjectInterpretation.cs+78/-33, new test file +268/-0.Targetsmetadata — is now bounded.