Optimize artifact lookup in DefaultProjectDependencyAnalyzer#270
Optimize artifact lookup in DefaultProjectDependencyAnalyzer#270sanjana2505006 wants to merge 2 commits intoapache:masterfrom
Conversation
|
Hello @slawekjaranowski, I’ve opened a PR with a small improvement I noticed. |
|
Hello @slawekjaranowski, just a gentle follow-up on this PR. Whenever you have time, could you please share your feedback or let me know if any further changes are needed from my side? Happy to update anything required. Thank you |
There was a problem hiding this comment.
Pull request overview
This PR optimizes core dependency analysis by replacing per-class linear artifact lookup with a precomputed class-to-artifact map, improving performance for projects with large dependency graphs.
Changes:
- Build a
Map<String, Artifact>index (buildClassToArtifactMap) for O(1) artifact lookup by class name. - Update used-artifact computation to use the new map and modernize collection population via
computeIfAbsent. - Minor robustness/style updates (scope constant, formatting).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return entry.getKey(); | ||
| Artifact artifact = entry.getKey(); | ||
| for (String className : entry.getValue()) { | ||
| classToArtifactMap.put(className, artifact); |
There was a problem hiding this comment.
buildClassToArtifactMap overwrites any existing mapping when the same class name appears in multiple artifacts. Previously the lookup returned the first matching artifact encountered in artifactClassMap iteration order; with put(...) the last artifact wins, which can change results (e.g., shaded/duplicate classes) and cause incorrect used/unused dependency reporting. Preserve the previous behavior (e.g., putIfAbsent to keep the first mapping) or explicitly detect duplicates and handle/report them deterministically.
| classToArtifactMap.put(className, artifact); | |
| classToArtifactMap.putIfAbsent(className, artifact); |
There was a problem hiding this comment.
I've updated it to putIfAbsent in the latest commit to preserve the original first-match-wins semantics. Thank you for flagging this, @slawekjaranowski!
| private static Map<Artifact, Set<DependencyUsage>> buildUsedArtifacts( | ||
| Map<Artifact, Set<String>> artifactClassMap, Set<DependencyUsage> dependencyClasses) { | ||
| Map<String, Artifact> classToArtifactMap, Set<DependencyUsage> dependencyClasses) { | ||
| Map<Artifact, Set<DependencyUsage>> usedArtifacts = new HashMap<>(); | ||
|
|
||
| for (DependencyUsage classUsage : dependencyClasses) { | ||
| Artifact artifact = findArtifactForClassName(artifactClassMap, classUsage.getDependencyClass()); | ||
| Artifact artifact = classToArtifactMap.get(classUsage.getDependencyClass()); | ||
|
|
||
| if (artifact != null && !includedInJDK(artifact)) { | ||
| Set<DependencyUsage> classesFromArtifact = usedArtifacts.get(artifact); | ||
| if (classesFromArtifact == null) { | ||
| classesFromArtifact = new HashSet<>(); | ||
| usedArtifacts.put(artifact, classesFromArtifact); | ||
| } | ||
| classesFromArtifact.add(classUsage); | ||
| usedArtifacts.computeIfAbsent(artifact, k -> new HashSet<>()).add(classUsage); | ||
| } | ||
| } |
There was a problem hiding this comment.
There are no unit tests covering the new class-to-artifact resolution path (including duplicate-class scenarios and ensuring the chosen artifact is deterministic). Adding focused tests around buildUsedArtifacts/buildClassToArtifactMap would help prevent regressions in dependency classification as this is core analysis logic.
There was a problem hiding this comment.
That's a fair point. I'll add focused unit tests covering the new class-to-artifact resolution path, including duplicate-class scenarios, to ensure the chosen artifact is deterministic. Will update it.
…mantics The original findArtifactForClassName returned the first matching artifact during iteration. Using put() overwrites previous entries, making the last artifact win instead. putIfAbsent preserves the first-match-wins behavior, preventing incorrect used/unused dependency reporting for shaded/duplicate classes.
While reviewing the core analysis logic, I noticed a performance bottleneck in$O(N)$ for every class usage), which could be a bottleneck in large projects with many dependencies. These changes switch to a
DefaultProjectDependencyAnalyzer. \n\nPreviously, it used a linear search to find which artifact a class belongs to (Map-based lookup ($O(1)$) and modernize the code usingcomputeIfAbsentand more robust scope checks.\n\nThis significantly improves the efficiency of the analysis process for projects with large dependency trees.