fix(git): improve diff handling and code quality in GitDataStore #15

Merged: marevol merged 1 commit into master from fix/git-datastore-diff-handling on Mar 15, 2026

@marevol (Collaborator) commented Mar 15, 2026

Summary

Fix incorrect path resolution for deleted files in diff processing and improve overall code quality in GitDataStore.

Changes Made

  • Fix DELETE diff handling: Use getOldPath() instead of getNewPath() for DiffEntry.ChangeType.DELETE entries, since JGit reports /dev/null (DiffEntry.DEV_NULL) as the new path of a deleted file
  • Support auto-adding PREV_COMMIT_ID: When PREV_COMMIT_ID is not present in handler parameters, append it automatically so incremental crawling works on first run
  • Scope git log query: Pass currentCommitId to git.log().add() in getRevCommit() so that the log traversal starts from that commit
  • Fix logging: Replace string concatenation with SLF4J parameterized logging ({}) in logger.warn() calls
  • Simplify temp directory creation: Use Files.createTempDirectory() instead of manual createTempFile + delete + mkdir sequence
  • Remove dead code: Remove unused isUpdateCommitId variable and redundant configMap.put(REPOSITORY, repository) call
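
The DELETE fix in the list above can be illustrated with a minimal sketch. It does not reproduce the GitDataStore code: the `Change` record and `ChangeType` enum are hypothetical stand-ins for JGit's `DiffEntry` so that the example compiles without JGit, and `resolvePath` is an illustrative helper name. The only JGit behavior assumed is that an entry carries `/dev/null` on the side where the file does not exist.

```java
import java.util.List;

// Hypothetical stand-in types; the real code works with org.eclipse.jgit.diff.DiffEntry.
final class DiffPathSketch {
    enum ChangeType { ADD, MODIFY, DELETE, RENAME, COPY }

    // Mirrors the value of JGit's DiffEntry.DEV_NULL.
    static final String DEV_NULL = "/dev/null";

    // Simplified model of a diff entry: change type plus old and new paths.
    record Change(ChangeType type, String oldPath, String newPath) {}

    // A deleted file only has a meaningful old path; every other change type uses the new path.
    static String resolvePath(Change change) {
        return change.type() == ChangeType.DELETE ? change.oldPath() : change.newPath();
    }

    public static void main(String[] args) {
        List<Change> changes = List.of(
                new Change(ChangeType.ADD, DEV_NULL, "docs/new.md"),
                new Change(ChangeType.MODIFY, "docs/guide.md", "docs/guide.md"),
                new Change(ChangeType.DELETE, "docs/old.md", DEV_NULL));
        for (Change change : changes) {
            System.out.println(change.type() + " -> " + resolvePath(change));
        }
    }
}
```

Reading the new path of a DELETE entry would hand `/dev/null` to the code that removes the document from the index, so the removed file would never be matched.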

Testing

  • Verified compilation succeeds with mvn compile

Breaking Changes

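  • getRevCommit() now declares throws IOException. Callers, and subclass overrides that delegate to super.getRevCommit(), must catch or declare the exception; an override that does not call super compiles unchanged, because Java allows an override to omit a checked exception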
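
How the added throws clause affects a subclass depends on whether its override delegates to the parent method. The sketch below uses hypothetical classes (`BaseStore`, `DelegatingStore`, `StandaloneStore`) and a simplified `getRevCommit(String)` signature; the real method's parameters and return type are not shown in this description and will differ.

```java
import java.io.IOException;

final class OverrideSketch {
    // Hypothetical stand-in for the data store after the change.
    static class BaseStore {
        protected String getRevCommit(String commitId) throws IOException {
            if (commitId == null) {
                throw new IOException("commit id is required");
            }
            return "commit:" + commitId;
        }
    }

    // Delegates to super, so it must declare (or catch) IOException.
    static class DelegatingStore extends BaseStore {
        @Override
        protected String getRevCommit(String commitId) throws IOException {
            return super.getRevCommit(commitId) + " (delegated)";
        }
    }

    // Never calls super, so the throws clause may be omitted.
    static class StandaloneStore extends BaseStore {
        @Override
        protected String getRevCommit(String commitId) {
            return "standalone:" + commitId;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(new DelegatingStore().getRevCommit("abc123"));
        System.out.println(new StandaloneStore().getRevCommit("abc123"));
    }
}
```

Code that calls the method through a `BaseStore` reference has to handle the exception in either case, since the compiler checks the declared type.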

Additional Notes

  • The DELETE path fix is important for incremental crawling scenarios where files are removed between commits
  • The PREV_COMMIT_ID change in updateDataConfig lets the first crawl record its commit ID, so later runs can crawl incrementally without PREV_COMMIT_ID being configured by hand
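
The first-run behavior can be sketched as an "update or append" step on the handler parameters. Everything specific here is an assumption made for illustration: the parameters are modeled as newline-separated `key=value` text, the key is assumed to be `prev_commit_id`, and `withPrevCommitId` is a hypothetical helper, not the actual updateDataConfig implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class PrevCommitIdSketch {
    // Assumed key name; the value of the real PREV_COMMIT_ID constant is not shown in this description.
    static final String PREV_COMMIT_ID = "prev_commit_id";

    // Replaces an existing prev_commit_id entry, or appends one when none is present.
    static String withPrevCommitId(String handlerParameters, String commitId) {
        String entry = PREV_COMMIT_ID + "=" + commitId;
        String params = handlerParameters == null ? "" : handlerParameters;
        List<String> lines = new ArrayList<>();
        boolean found = false;
        for (String line : params.split("\n")) {
            if (line.startsWith(PREV_COMMIT_ID + "=")) {
                lines.add(entry);       // later runs: overwrite the stored commit ID
                found = true;
            } else if (!line.isBlank()) {
                lines.add(line);        // keep unrelated parameters untouched
            }
        }
        if (!found) {
            lines.add(entry);           // first run: no entry yet, so append one
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        System.out.println(withPrevCommitId("uri=https://example.com/repo.git", "abc123"));
        System.out.println(withPrevCommitId("prev_commit_id=old\nbase_url=x", "def456"));
    }
}
```

Without the append branch, a configuration that never contained the key would never store a commit ID, and every run would behave like a first run.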

- Use getOldPath() for DELETE entries instead of getNewPath()
- Support adding PREV_COMMIT_ID when not present in handler parameters
- Scope git log to current commit ID in getRevCommit()
- Replace string concatenation with parameterized logging
- Simplify temp directory creation using Files.createTempDirectory()
- Remove unused variable and redundant configMap.put()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@marevol merged commit a43c8b7 into master on Mar 15, 2026
1 check passed