Proposal: Include a Hibernate-Based DFS Implementation in the JGit Distribution #251

carstenartur · 2026-03-08T07:56:16Z


Hello JGit Community,

JGit already provides the DfsRepository / DfsObjDatabase abstraction for pluggable storage backends. I have built a full database-backed implementation of this API using Hibernate ORM and Hibernate Search, and I would like to discuss including it as an optional module in the JGit distribution.

What This Is

A complete DFS backend that stores Git objects, packs, refs, and reflogs in a relational database:

HibernateRepository extends DfsRepository
HibernateObjDatabase extends DfsObjDatabase — stores pack data as BLOBs, keyed by pack name and extension
HibernateRefDatabase, HibernateReflogWriter, HibernateReflogReader — database-backed ref and reflog storage
HibernateRepositoryBuilder extends DfsRepositoryBuilder
JPA entity mappings for GitObjectEntity, GitRefEntity, GitPackEntity, GitReflogEntity, GitCommitIndex, JavaBlobIndex, FilePathHistory
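
To make the "keyed by pack name and extension" idea concrete, here is a minimal sketch in plain Java. It is not the code from the module: `PackStoreSketch` and `PackKey` are hypothetical names, and an in-memory map stands in for the database table so the example runs without Hibernate or a JDBC driver. The composite key plays the role that a two-column primary key would play in a relational schema.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a pack table; not the actual HibernateObjDatabase. */
final class PackStoreSketch {

    /** Composite key, analogous to a (pack_name, extension) primary key. */
    public record PackKey(String packName, String extension) {}

    // Each map entry models one row: key columns plus the BLOB payload.
    private final Map<PackKey, byte[]> rows = new ConcurrentHashMap<>();

    /** Inserts one pack file; returns false if that (name, extension) pair already exists. */
    public boolean insert(String packName, String extension, byte[] data) {
        return rows.putIfAbsent(new PackKey(packName, extension), data.clone()) == null;
    }

    /** Returns a copy of the stored bytes, or an empty Optional if no such row exists. */
    public Optional<byte[]> read(String packName, String extension) {
        byte[] data = rows.get(new PackKey(packName, extension));
        return data == null ? Optional.empty() : Optional.of(data.clone());
    }

    public static void main(String[] args) {
        PackStoreSketch store = new PackStoreSketch();
        // The pack name and extensions below are made-up sample values.
        store.insert("pack-1a2b", "pack", new byte[] {1, 2, 3});
        store.insert("pack-1a2b", "idx", new byte[] {9});
        System.out.println(store.read("pack-1a2b", "pack").map(b -> b.length).orElse(-1)); // prints 3
        System.out.println(store.read("pack-1a2b", "bitmap").isPresent()); // prints false
    }
}
```

In a real database-backed version, the insert and the read would run inside a transaction, and the payload would be a BLOB column rather than a byte array held in memory.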

It is actively used in my Sandbox project as an Eclipse plugin — see the sandbox-jgit-storage-hibernate module.

Why Include It in the JGit Distribution

Atomicity and Transactional Guarantees
Database transactions provide true ACID guarantees for pack commits, ref updates, and rollbacks — something inherently difficult to achieve with filesystem-based storage.
Packaging and Dependency Management
Maintaining this implementation externally means dealing with version compatibility against JGit internals, OSGi/p2 packaging challenges, and duplicated build infrastructure. Including it as an optional module in JGit would eliminate this burden and ensure it stays in sync with API changes.
A Real-World DFS Implementation Beyond InMemoryRepository
JGit ships InMemoryRepository as its only DFS implementation. A Hibernate-based module would be the natural persistent counterpart — usable with any JDBC-compatible database (H2 for testing/embedded, PostgreSQL for production) — and would serve as a reference implementation that helps validate and harden the DFS API.

Features This Has Enabled

The database layer has made it possible to build features that would be extremely difficult or impossible with filesystem-based storage:

ECJ-Based Java Source Tokenizer for Lucene
An EcjTokenizer uses Eclipse's own Java compiler scanner to produce lexically correct Java tokens for Lucene indexing. It is combined with an EcjTokenFilter that applies CamelCase splitting, string literal stripping, and token-type-aware processing. Together they give true language-aware full-text search over Java source code stored in Git, which is fundamentally different from plain text search.
AST-Based Structural Indexing
A JavaBlobExtractor and JavaFileStrategy that parse Java source files using JDT's ASTParser and extract structural metadata — package names, declared types, methods, fields, supertypes, interfaces, imports — all indexed and queryable via Hibernate Search/Lucene.
Semantic and Hybrid Code Search
A SemanticSearchClient that supports natural language queries over the indexed repository content: semantic search (vector-based), hybrid search (full-text + semantic), type search, symbol search, commit message search, and changed-path search. This allows asking questions like "find all implementations of a caching strategy" rather than just grepping for a string.
LLM-Powered Commit Analysis
A CommitAnalysisJob that feeds commit diffs to an LLM service to generate DSL rules and semantic evaluations of code changes. The database layer makes it practical to store, index, and query both the raw repository data and the AI-generated analysis results together.
Structured Querying via GitDatabaseQueryService
SQL/HQL queries over commits, trees, blobs, and refs — e.g., "find all commits by author X touching files in path Y between dates A and B" — without walking the entire object graph.
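
As an illustration of the CamelCase splitting mentioned for the EcjTokenFilter above, the following sketch shows one common way to break a Java identifier into searchable sub-tokens. It is an assumption about the general technique, not the filter's actual code: `CamelCaseSplitSketch` is a hypothetical name, it uses a plain regular expression instead of the Lucene TokenFilter API, and it does not handle underscores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper; the real EcjTokenFilter may split identifiers differently. */
final class CamelCaseSplitSketch {

    // Two zero-width boundaries:
    //  1. a lowercase letter or digit followed by an uppercase letter ("obj|Database")
    //  2. the end of an acronym: uppercase followed by uppercase + lowercase ("XML|Parser")
    private static final String BOUNDARY =
            "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";

    /** Splits an identifier at CamelCase boundaries and lowercases each part. */
    public static List<String> split(String identifier) {
        List<String> parts = new ArrayList<>();
        for (String part : identifier.split(BOUNDARY)) {
            if (!part.isEmpty()) {
                parts.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("DfsObjDatabase")); // prints [dfs, obj, database]
        System.out.println(split("XMLParser"));      // prints [xml, parser]
    }
}
```

Indexing the sub-tokens alongside the full identifier is what lets a query for "database" match DfsObjDatabase, which a plain whitespace tokenizer would miss.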

None of these features require changes to the DFS API itself — they all build on top of the Hibernate storage layer. But they demonstrate why having this implementation inside the JGit distribution (rather than maintained externally) would benefit the broader ecosystem.

References

JGit Fork (persistence layer work):
👉 https://github.com/carstenartur/jgit
Hibernate Storage Module (in Sandbox):
👉 sandbox-jgit-storage-hibernate
Sandbox Project (Eclipse product with all integrations):
👉 https://github.com/carstenartur/sandbox
Related EGit Discussion — allowing plugins to switch the persistence layer:
👉 eclipse-egit/egit#145

What I Would Like to Discuss

Is there interest in including this as an optional module in the JGit distribution?
What concerns are there around additional dependencies (Hibernate ORM, Hibernate Search, Lucene)?
Would the team be open to reviewing the implementation as a starting point?
Are there DFS API improvements that should be addressed to better support database backends?

Thank you for your time and feedback!

Best regards,
Carsten Hammer
GitHub: @carstenartur
