refactor(indexing): eliminate Tier 1, add hash gate & cooldown, rename to AST/Embed by SerPeter · Pull Request #9 · SerPeter/code-atlas

SerPeter · 2026-03-03T22:28:04Z

Summary

Remove Tier 1 pass-through consumer — Tier1GraphConsumer converted FileChanged → ASTDirty with zero value-add. Pipeline simplified from 3-tier to 2-stage: Watcher → file-changed → AST → embed-dirty → Embed
Add file hash gate — SHA-256 content hash (with whitespace normalization) skips unchanged files before parsing. Stored on Module/Package nodes in Memgraph via batch read/write.
Add per-file cooldown — Throttles rapid re-processing of the same file in daemon mode (default 10s). Deferred events are re-published after cooldown expires. Disabled in CLI reindex mode.
Rename Tier 2/3 → AST/Embed — Classes, consumer groups, log prefixes, span names, variables, and all documentation updated across 19 files.
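
The per-file cooldown in the summary can be illustrated with a small in-memory throttle. This is only a sketch under stated assumptions, not the code in `consumers.py`: the class name `FileCooldown`, its methods, and the injectable clock are hypothetical, and the real consumer re-publishes deferred events to a Valkey stream rather than returning them from a method.

```python
import time
from collections.abc import Callable


class FileCooldown:
    """Hypothetical per-file throttle; names and structure are illustrative."""

    def __init__(
        self,
        cooldown_s: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._cooldown_s = cooldown_s  # 0 disables the gate (e.g. CLI reindex)
        self._clock = clock
        self._last_processed: dict[str, float] = {}
        self._deferred: dict[str, float] = {}  # path -> time it becomes ready

    def admit(self, path: str) -> bool:
        """Return True if the file may be processed now, else defer it."""
        if self._cooldown_s <= 0:
            return True
        now = self._clock()
        last = self._last_processed.get(path)
        if last is not None and now - last < self._cooldown_s:
            # Still cooling down: remember the path so it can be re-published.
            self._deferred[path] = last + self._cooldown_s
            return False
        self._last_processed[path] = now
        return True

    def pop_ready(self) -> list[str]:
        """Return deferred paths whose cooldown has expired and forget them."""
        now = self._clock()
        ready = [p for p, ready_at in self._deferred.items() if ready_at <= now]
        for p in ready:
            del self._deferred[p]
        return ready
```

Repeated events for the same file collapse into a single deferred entry, so a burst of saves results in at most one re-publish once the window expires.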

Commits

refactor(indexing): remove Tier 1 consumer, simplify to two-tier pipeline
feat(indexing): add file hash gate to skip unchanged files
feat(indexing): add per-file cooldown for daemon mode
test(indexing): add integration tests for two-tier pipeline, hash gate, and cooldown
refactor(indexing): rename Tier 2/3 to AST/Embed stage across code and docs

Test plan

Unit tests pass (463 passed)
Integration tests pass (7 passed, including 5 new tests against live Memgraph + Valkey)
Ruff lint + format clean
ty check clean (2 pre-existing warnings only)

…line

Tier 1 was a pure pass-through converting FileChanged → ASTDirty with zero value-add. Remove it to reduce latency, eliminate an extra Valkey stream hop, and simplify the architecture.

Before: Watcher → file-changed → Tier1 → ast-dirty → Tier2 → Tier3
After: Watcher → file-changed → Tier2 → embed-dirty → Tier3
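
The simplified flow can be sketched with in-memory queues standing in for the Valkey streams. Everything here is an assumption made for illustration: the event fields (`path`, `entity_id`), the `ast_stage` placeholder, and `run_pipeline` are hypothetical and do not mirror the definitions in `events.py`.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class FileChanged:
    path: str  # hypothetical field


@dataclass(frozen=True)
class EmbedDirty:
    entity_id: str  # hypothetical field


def ast_stage(event: FileChanged) -> list[EmbedDirty]:
    """Placeholder for parsing and graph writes; emits one event per entity."""
    return [EmbedDirty(entity_id=f"{event.path}::module")]


def run_pipeline(changed_paths: list[str]) -> list[str]:
    """Watcher -> file-changed -> AST -> embed-dirty -> Embed, with no Tier 1 hop."""
    file_changed = deque(FileChanged(p) for p in changed_paths)
    embed_dirty: deque[EmbedDirty] = deque()

    # The AST stage consumes FileChanged directly; no intermediate ASTDirty event.
    while file_changed:
        embed_dirty.extend(ast_stage(file_changed.popleft()))

    # The embed stage drains embed-dirty; here it only records the entity ids.
    embedded = []
    while embed_dirty:
        embedded.append(embed_dirty.popleft().entity_id)
    return embedded
```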

Compute SHA-256 of file contents before parsing and compare against stored hashes in Memgraph. Files with matching hashes are skipped entirely, avoiding unnecessary AST parsing and graph writes.

- strip_whitespace mode normalizes formatting before hashing so formatter-only changes (e.g. ruff format) are ignored
- Hash gate is bypassed for deleted files and full reindexes (where stored hashes are empty)
- Pre-read file bytes are passed to the parser to avoid double I/O
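
A minimal sketch of such a gate, using only `hashlib` from the standard library. The function names `content_hash` and `select_changed` are hypothetical, and the normalization shown (strip trailing whitespace per line, drop blank lines) is one assumed interpretation of `strip_whitespace`; the actual rule in the PR may differ.

```python
import hashlib


def content_hash(data: bytes, *, strip_whitespace: bool = False) -> str:
    """SHA-256 hex digest of file bytes, optionally whitespace-normalized."""
    if strip_whitespace:
        # Assumed normalization: rstrip each line and drop empty lines.
        # Leading indentation is kept because it is significant in Python.
        lines = (line.rstrip() for line in data.splitlines())
        data = b"\n".join(line for line in lines if line)
    return hashlib.sha256(data).hexdigest()


def select_changed(
    files: dict[str, bytes],
    stored: dict[str, str],
    *,
    strip_whitespace: bool = False,
) -> tuple[list[str], dict[str, str]]:
    """Return (paths that need parsing, freshly computed hashes for all files).

    `files` maps path -> pre-read bytes, so the parser can reuse them.
    An empty `stored` mapping (full reindex) lets every file through.
    """
    new_hashes = {
        path: content_hash(data, strip_whitespace=strip_whitespace)
        for path, data in files.items()
    }
    changed = [p for p, h in new_hashes.items() if stored.get(p) != h]
    return changed, new_hashes
```

Passing the pre-read bytes in and returning all new hashes keeps the gate to one read per file and one batch write afterwards.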

Copilot

Pull request overview

Refactors the indexing pipeline from three tiers to two stages (AST → Embed), adding a content-hash gate to skip unchanged files and a per-file cooldown to throttle rapid reprocessing in daemon mode.

Changes:

Remove the Tier 1 pass-through consumer and wire FileChanged directly into the AST stage.
Add a SHA-256-based file hash gate (with optional whitespace normalization) persisted on Module/Package nodes in Memgraph.
Add per-file cooldown deferral/re-publish logic for daemon mode; rename Tier2/3 terminology to AST/Embed across code, tests, and docs.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.

File	Description
tests/unit/search/test_embeddings.py	Updates embed consumer naming and doc-section strings for AST/Embed rename.
tests/integration/indexing/test_consumers.py	Adds integration tests for AST consumer, hash gate, and cooldown behavior.
tests/conftest.py	Renames “Tier3” wording to “embed stage” in NO_EMBED docstring.
src/code_atlas/settings.py	Adds `index.file_hash_gate`, `index.strip_whitespace`, and `watcher.cooldown_s` settings; updates embed settings wording.
src/code_atlas/search/embeddings.py	Updates doc-section breadcrumb example text for “AST Stage”.
src/code_atlas/indexing/orchestrator.py	Removes Tier1/Tier2 wiring; runs AST + optional Embed consumers; updates drain/publish logic.
src/code_atlas/indexing/daemon.py	Starts AST/Embed consumers and passes watcher cooldown into AST consumer.
src/code_atlas/indexing/consumers.py	Deletes Tier1; implements AST consumer hash gate + cooldown; renames Tier3 to Embed consumer.
src/code_atlas/indexing/__init__.py	Re-exports `ASTConsumer`/`EmbedConsumer` instead of Tier1/2/3.
src/code_atlas/graph/client.py	Adds batch read/write helpers for persisting file hashes on Module/Package nodes.
src/code_atlas/events.py	Removes `ASTDirty` event/topic; updates event union and EmbedDirty docstring.
scripts/profile_index.py	Renames profiling spans/labels from tier2/tier3 to ast/embed.
docs/guides/repo-guidelines.md	Updates example consumer name to `ASTConsumer`.
docs/benchmarks.md	Renames benchmark stage labels to AST/Embed.
docs/architecture.md	Updates pipeline diagrams and narrative to the two-stage AST/Embed pipeline.
docs/adr/0006-pure-python-tree-sitter.md	Updates ADR wording to AST consumer/stage naming.
docs/adr/0005-deployment-process-model.md	Updates deployment/process model diagrams and wording to AST/Embed.
docs/adr/0004-event-driven-tiered-pipeline.md	Updates ADR to describe the two-stage pipeline (FileChanged → AST → EmbedDirty → Embed).
CLAUDE.md	Updates repository architecture and event model documentation for two-stage pipeline.
CHANGELOG.md	Updates historical changelog wording to AST/Embed terminology.
.gitattributes	Adds repo-wide gitattributes for text/binary handling and LFS patterns.

SerPeter added 5 commits March 3, 2026 20:22

feat(indexing): add per-file cooldown for daemon mode

533ac73

test(indexing): add integration tests for two-tier pipeline, hash gate, and cooldown

b7c9db3

refactor(indexing): rename Tier 2/3 to AST/Embed stage across code and docs

168c9eb

Copilot AI review requested due to automatic review settings March 3, 2026 22:28

Copilot started reviewing on behalf of SerPeter March 3, 2026 22:28

Copilot AI reviewed Mar 3, 2026

fix(indexing): address PR feedback — rstrip, contract fix, public stats

50101e0

SerPeter merged commit 154c283 into main Mar 3, 2026
7 checks passed

SerPeter deleted the refactor/two-tier-pipeline branch March 3, 2026 23:52

SerPeter merged 6 commits into main from refactor/two-tier-pipeline