refactor(main): extract repository analyze/process services#132
Conversation
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📝 WalkthroughWalkthroughThis PR refactors the main IPC handler in src/main/index.ts by extracting large inline repository analysis and processing logic into two new public services: analyzeRepository and processRepository. The changes consolidate duplicated logic, update related IPC types, and introduce new test coverage for the processing service. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Reviewer's GuideRefactors repository analysis and processing logic out of the main IPC handler into dedicated service modules, adds unit coverage for the new processing service, and wires the IPC layer to delegate to these services with injectable logging hooks. Sequence diagram for repo:analyze IPC delegating to repository analyzer servicesequenceDiagram
actor Renderer
participant IpcMain
participant RepositoryAnalyzer as RepositoryAnalyzerService
participant GitignoreParser
participant FileAnalyzer
participant TokenCounter
Renderer->>IpcMain: ipcMain.handle repo:analyze
IpcMain->>RepositoryAnalyzer: analyzeRepository(rootPath, configContent, selectedFiles, gitignoreParser, onWarn, onInfo)
RepositoryAnalyzer->>RepositoryAnalyzer: yaml.parse(configContent) -> ConfigObject
RepositoryAnalyzer->>TokenCounter: new TokenCounter()
RepositoryAnalyzer->>GitignoreParser: parseGitignore(rootPath)
GitignoreParser-->>RepositoryAnalyzer: GitignorePatterns
RepositoryAnalyzer->>FileAnalyzer: new FileAnalyzer(config, tokenCounter, gitignoreOptions)
loop for each selected file
RepositoryAnalyzer->>RepositoryAnalyzer: path.resolve(rootPath, filePath)
RepositoryAnalyzer->>RepositoryAnalyzer: isPathWithinRoot(rootPath, resolvedPath)?
alt outside root
RepositoryAnalyzer-->>IpcMain: onWarn Skipping file outside current root directory
else inside root
RepositoryAnalyzer->>RepositoryAnalyzer: getRelativePath(resolvedPath, rootPath)
RepositoryAnalyzer->>RepositoryAnalyzer: isBinaryFile(resolvedPath)?
alt binary file
RepositoryAnalyzer-->>IpcMain: onInfo Binary file detected
RepositoryAnalyzer->>RepositoryAnalyzer: push FileInfo isBinary true
else text file
RepositoryAnalyzer->>FileAnalyzer: shouldProcessFile(relativePath)?
alt should process
RepositoryAnalyzer->>FileAnalyzer: analyzeFile(resolvedPath)
FileAnalyzer-->>RepositoryAnalyzer: tokenCount | null
alt tokenCount not null
RepositoryAnalyzer->>RepositoryAnalyzer: push FileInfo tokens
end
end
end
end
end
RepositoryAnalyzer->>RepositoryAnalyzer: sort(filesInfo) by tokens desc
RepositoryAnalyzer-->>IpcMain: AnalyzeRepositoryResult
IpcMain-->>Renderer: filesInfo, totalTokens, skippedBinaryFiles
Sequence diagram for repo:process IPC delegating to repository processing servicesequenceDiagram
actor Renderer
participant IpcMain
participant RepositoryProcessing as RepositoryProcessingService
participant ContentProcessor
participant TokenCounter
Renderer->>IpcMain: ipcMain.handle repo:process
IpcMain->>RepositoryProcessing: processRepository(rootPath, filesInfo, treeView, options, onWarn, onInfo)
RepositoryProcessing->>TokenCounter: new TokenCounter()
RepositoryProcessing->>ContentProcessor: new ContentProcessor(tokenCounter)
RepositoryProcessing->>RepositoryProcessing: resolveRepositoryProcessingOptions(options)
RepositoryProcessing-->>IpcMain: onInfo Processing with options
RepositoryProcessing->>RepositoryProcessing: normalizedFilesInfo = filesInfo ?? []
RepositoryProcessing->>RepositoryProcessing: buildRepositoryHeader(processingOptions, treeView, normalizedFilesInfo)
loop for each fileInfo in normalizedFilesInfo
alt fileInfo.path is falsy
RepositoryProcessing-->>IpcMain: onWarn Skipping invalid file info entry
else valid path
RepositoryProcessing->>RepositoryProcessing: processRepositoryFile(rootPath, fileInfo, contentProcessor, processingOptions, onWarn)
alt processedFile is null
RepositoryProcessing->>RepositoryProcessing: skippedFiles++
else processedFile not null
RepositoryProcessing->>RepositoryProcessing: append processedFile.content
RepositoryProcessing->>RepositoryProcessing: totalTokens += tokenCount
RepositoryProcessing->>RepositoryProcessing: processedFiles++
end
end
end
RepositoryProcessing->>RepositoryProcessing: buildRepositoryFooter(processingOptions, summary)
RepositoryProcessing-->>IpcMain: ProcessRepositoryResult
IpcMain-->>Renderer: content, exportFormat, totals, filesInfo
Updated class diagram for repository analyzer and processing servicesclassDiagram
class RepositoryAnalyzerService {
<<service>>
+AnalyzeRepositoryResult analyzeRepository(rootPath, configContent, selectedFiles, gitignoreParser, onWarn, onInfo)
}
class GitignoreParserLike {
<<interface>>
+GitignorePatterns parseGitignore(rootPath)
}
class FileAnalyzer {
+FileAnalyzer(config, tokenCounter, options)
+boolean shouldProcessFile(relativePath)
+number~null analyzeFile(resolvedFilePath)
}
class TokenCounter {
+TokenCounter()
+number countTokens(content)
}
class RepositoryProcessingService {
<<service>>
+ProcessRepositoryResult processRepository(rootPath, filesInfo, treeView, options, onWarn, onInfo)
-RepositoryProcessingOptions resolveRepositoryProcessingOptions(options)
-string buildRepositoryHeader(processingOptions, treeView, filesInfo)
-ProcessedRepositoryFileResult processRepositoryFile(rootPath, fileInfo, contentProcessor, processingOptions, onWarn)
-string buildRepositoryFooter(processingOptions, summary)
}
class ContentProcessor {
+ContentProcessor(tokenCounter)
+string~null processFile(fullPath, relativePath, options)
}
class RepositoryProcessingOptions {
+boolean showTokenCount
+boolean includeTreeView
+string exportFormat
}
class FileInfo {
+string path
+number tokens
+boolean isBinary
}
class AnalyzeRepositoryResult {
+FileInfo[] filesInfo
+number totalTokens
+number skippedBinaryFiles
}
class ProcessRepositoryResult {
+string content
+string exportFormat
+number totalTokens
+number processedFiles
+number skippedFiles
+FileInfo[] filesInfo
}
RepositoryAnalyzerService --> GitignoreParserLike : uses
RepositoryAnalyzerService --> FileAnalyzer : creates
RepositoryAnalyzerService --> TokenCounter : creates
RepositoryAnalyzerService --> AnalyzeRepositoryResult : returns
RepositoryAnalyzerService --> FileInfo : populates
RepositoryProcessingService --> ContentProcessor : creates
RepositoryProcessingService --> TokenCounter : creates
RepositoryProcessingService --> RepositoryProcessingOptions : configures
RepositoryProcessingService --> FileInfo : consumes
RepositoryProcessingService --> ProcessRepositoryResult : returns
ContentProcessor --> TokenCounter : uses
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Review Summary by QodoExtract repository analyze and process services
WalkthroughsDescription• Extract repository analysis logic into dedicated service module • Extract repository processing/output assembly into dedicated service module • Simplify IPC handlers by delegating to extracted services • Add comprehensive unit test coverage for repository processing service Diagramflowchart LR
A["IPC Handler<br/>repo:analyze"] -->|delegates| B["analyzeRepository<br/>service"]
C["IPC Handler<br/>repo:process"] -->|delegates| D["processRepository<br/>service"]
B -->|uses| E["FileAnalyzer<br/>TokenCounter"]
D -->|uses| F["ContentProcessor<br/>Export Format"]
G["Unit Tests"] -->|covers| D
File Changes1. src/main/index.ts
|
Summary of ChangesHello @Mehdi-Bl, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the main Electron process by extracting complex repository analysis and processing functionalities into separate, modular services. This improves the maintainability and testability of the codebase by adhering to the single responsibility principle, making the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Hey - I've found 1 issue
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `src/main/services/repository-analyzer.ts:30-32` </location>
<code_context>
+ onInfo?: (message: string) => void;
+};
+
+const EMPTY_GITIGNORE_PATTERNS: GitignorePatterns = {
+ excludePatterns: [],
+ includePatterns: [],
+};
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Using a shared EMPTY_GITIGNORE_PATTERNS object may be risky if FileAnalyzer mutates it
Previously each call received a new `gitignorePatterns` object; now calls share `EMPTY_GITIGNORE_PATTERNS` when `use_gitignore === false`. If `FileAnalyzer` or downstream code mutates this object, state will leak between invocations. To keep the old behavior and avoid hidden coupling, either clone `EMPTY_GITIGNORE_PATTERNS` per call or construct a new object instead of reusing the shared one.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Code Review
This is a solid refactoring that significantly improves the structure and maintainability of the main process logic. By extracting the repository analysis and processing concerns into dedicated services, the IPC handlers in src/main/index.ts have become much cleaner and easier to understand. The introduction of unit tests for the new repository-processing service is also a great addition.
I've found one high-severity bug in the new tree-generation logic within repository-processing.ts that can lead to incorrect tree view output in certain scenarios. I've provided a detailed comment with a suggested fix. I've also suggested an additional test case to cover this scenario and prevent future regressions.
Overall, this is a high-quality pull request that moves the codebase in the right direction. Once the identified issue is addressed, this will be ready to merge.
| const upsertPathPart = (tree: PathTree, part: string, isLeaf: boolean): PathTree | null => { | ||
| const existing = tree[part]; | ||
| if (existing !== undefined) { | ||
| return existing; | ||
| } | ||
|
|
||
| const nextValue = isLeaf ? null : {}; | ||
| tree[part] = nextValue; | ||
| return nextValue; | ||
| }; | ||
|
|
||
| const addFilePathToTree = (pathTree: PathTree, filePath: string): void => { | ||
| const parts = filePath.split('/'); | ||
| let currentLevel: PathTree = pathTree; | ||
|
|
||
| for (const [index, part] of parts.entries()) { | ||
| const isLeaf = index === parts.length - 1; | ||
| const nextLevel = upsertPathPart(currentLevel, part, isLeaf); | ||
| if (isLeaf || nextLevel === null) { | ||
| continue; | ||
| } | ||
| currentLevel = nextLevel; | ||
| } | ||
| }; |
There was a problem hiding this comment.
The current implementation for building the path tree has a logic issue when handling file paths where one is a prefix of another (e.g., a file a/b and another file a/b/c.txt). Because the files are sorted alphabetically, a/b is processed first, marking b as a leaf node. When a/b/c.txt is processed, the logic fails to convert b into a directory and add c.txt underneath it, resulting in an incorrect tree structure.
I suggest updating both upsertPathPart and addFilePathToTree to correctly handle this scenario by allowing a leaf node to be promoted to a directory node if needed.
const upsertPathPart = (tree: PathTree, part: string, isLeaf: boolean): PathTree | null => {
const existing = tree[part];
if (existing === null && !isLeaf) {
// Path was a file, but now it's a directory path segment. Convert it.
tree[part] = {};
return tree[part] as PathTree;
}
if (existing !== undefined) {
return existing;
}
const nextValue = isLeaf ? null : {};
tree[part] = nextValue;
return nextValue;
};
const addFilePathToTree = (pathTree: PathTree, filePath: string): void => {
const parts = filePath.split('/');
let currentLevel: PathTree = pathTree;
for (const [index, part] of parts.entries()) {
const isLeaf = index === parts.length - 1;
const nextLevel = upsertPathPart(currentLevel, part, isLeaf);
if (isLeaf || nextLevel === null) {
// We've reached a leaf or a conflict we can't resolve. Stop descending.
break;
}
currentLevel = nextLevel;
}
};There was a problem hiding this comment.
Addressed in 6c42908. The tree builder now promotes an existing leaf node to a directory when a deeper segment is encountered, and traversal flow was updated to avoid misplaced descendants.
| }; | ||
| }; | ||
|
|
||
| describe('repository-processing service', () => { |
There was a problem hiding this comment.
These are great tests! To make them even more robust, consider adding a test case for the tree view generation that specifically handles paths where one is a prefix of another (e.g., files at a/b and a/b/c.txt). This would help catch the edge case in the tree building logic I mentioned in another comment and prevent future regressions.
There was a problem hiding this comment.
Added in 6c42908: a regression test (keeps nested tree structure stable when a path prefix appears as a file entry) in tests/unit/main/repository-processing.test.ts.
Dependency Review✅ No vulnerabilities or license issues found.Scanned FilesNone |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e6482fb1c4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (existing !== undefined) { | ||
| return existing; |
There was a problem hiding this comment.
Handle leaf-to-branch upgrades in path tree
When includeTreeView is enabled, this helper keeps an existing null node unchanged and returns it, so a later nested path cannot descend into that segment. With filesInfo like src and src/index.js, the nested file is attached at the wrong level and the rendered tree is incorrect. The previous logic upgraded a leaf to an object in this case; please handle existing === null && !isLeaf so prefix-collision inputs still produce a valid hierarchy.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Addressed in 239e6a2. Tree nodes now support leaf-to-branch promotion and the regression is covered by keeps nested tree structure stable when a path prefix appears as a file entry.
Code Review by Qodo
1. Prototype-pollutable path tree
|
| const upsertPathPart = (tree: PathTree, part: string, isLeaf: boolean): PathTree | null => { | ||
| const existing = tree[part]; | ||
| if (existing !== undefined) { | ||
| return existing; | ||
| } | ||
|
|
||
| const nextValue = isLeaf ? null : {}; | ||
| tree[part] = nextValue; | ||
| return nextValue; | ||
| }; | ||
|
|
||
| const addFilePathToTree = (pathTree: PathTree, filePath: string): void => { | ||
| const parts = filePath.split('/'); | ||
| let currentLevel: PathTree = pathTree; | ||
|
|
||
| for (const [index, part] of parts.entries()) { | ||
| const isLeaf = index === parts.length - 1; | ||
| const nextLevel = upsertPathPart(currentLevel, part, isLeaf); | ||
| if (isLeaf || nextLevel === null) { | ||
| continue; | ||
| } | ||
| currentLevel = nextLevel; | ||
| } |
There was a problem hiding this comment.
1. Prototype-pollutable path tree 🐞 Bug ⛨ Security
repository-processing builds a tree using a plain object keyed by path segments derived from repository file/folder names; segments like __proto__ / constructor can mutate prototypes or traverse into Object.prototype, leading to crashes or unpredictable output when rendering the tree view. This can be triggered by opening a repo containing such folder/file names.
Agent Prompt
### Issue description
`repository-processing` constructs a directory tree using `{}` nodes and assigns child nodes via `tree[part]`, where `part` comes from repository path segments. Path segments like `__proto__` / `constructor` can interact with JavaScript object prototype semantics, causing prototype pollution or traversal into `Object.prototype`, which can crash or corrupt tree rendering.
### Issue Context
File paths originate from filesystem directory entries and are only normalized for slash direction (not sanitized). This means a repository can legitimately contain a folder named `__proto__`.
### Fix Focus Areas
- src/main/services/repository-processing.ts[37-110]
- src/utils/filter-utils.ts[10-13]
### Suggested approach
- Prefer `Map<string, PathTreeNode>` for nodes, or ensure every node is created with `Object.create(null)`.
- Optionally add a small guard that rejects or escapes path segments equal to `__proto__`, `prototype`, or `constructor` before insertion.
- Add a unit test that includes a `filesInfo` entry like `__proto__/x.txt` and asserts tree rendering is stable and does not pollute global objects.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
There was a problem hiding this comment.
Addressed in 239e6a2. Path-tree nodes now use Object.create(null) via createPathTreeNode, including promotions/insertions, and a regression test covers special segments (__proto__, constructor) to ensure no prototype mutation.
|



Summary
Validation
Notes
Summary by Sourcery
Extract repository analysis and processing logic into dedicated services and wire them into the main IPC handlers.
Enhancements:
Documentation:
Tests:
Summary by CodeRabbit
Refactor
Tests