Merged
84 changes: 84 additions & 0 deletions .gitattributes
@@ -0,0 +1,84 @@
# Default
# ==================
* text=auto eol=lf

# Python Source files
# =================
*.pxd text diff=python
*.py text diff=python
*.py3 text diff=python
*.pyw text diff=python
*.pyx text diff=python
*.pyz text diff=python
*.pyi text diff=python

# Python Binary files
# =================
*.db binary
*.p binary
*.pkl binary
*.pickle binary
*.pyc binary export-ignore
*.pyo binary export-ignore
*.pyd binary

# Jupyter notebook
# =================
*.ipynb text

# ML models
# =================
*.h5 filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
unigram.json filter=lfs diff=lfs merge=lfs -text

# Data files
# =================
*.csv filter=lfs diff=lfs merge=lfs -text
*.tsv filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text

# Document files
# =================
*.pptx filter=lfs diff=lfs merge=lfs -text
*.docx filter=lfs diff=lfs merge=lfs -text
*.xlsx filter=lfs diff=lfs merge=lfs -text
*.xls filter=lfs diff=lfs merge=lfs -text
*.pdf filter=lfs diff=lfs merge=lfs -text

# Archives
# =================
*.7z filter=lfs diff=lfs merge=lfs -text
*.br filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.tar.gz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text

# Image files
# =================
*.jpg filter=lfs diff=lfs merge=lfs -text
*.jpeg filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
*.gif filter=lfs diff=lfs merge=lfs -text
*.webp filter=lfs diff=lfs merge=lfs -text
*.bmp filter=lfs diff=lfs merge=lfs -text
*.svg filter=lfs diff=lfs merge=lfs -text
*.tiff filter=lfs diff=lfs merge=lfs -text

# Other
# =================
*.exe filter=lfs diff=lfs merge=lfs -text

# Windows scripts - keep CRLF
*.bat text eol=crlf
*.cmd text eol=crlf
*.ps1 text eol=crlf
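As a rough illustration of how these rules resolve, patterns are evaluated top to bottom and the last matching line wins for each attribute. The matcher below is a hypothetical simplification (real git matching also handles `**`, path anchoring, and macro attributes), with a hand-picked subset of the rules above:

```python
from fnmatch import fnmatch

# Hypothetical, simplified resolver: a subset of the rules above, in file
# order. Later matching lines override earlier ones per attribute.
RULES = [
    ("*",      {"text": "auto", "eol": "lf"}),
    ("*.py",   {"text": "set", "diff": "python"}),
    ("*.pkl",  {"binary": "set"}),
    ("*.onnx", {"filter": "lfs", "diff": "lfs", "merge": "lfs", "text": "unset"}),
    ("*.bat",  {"text": "set", "eol": "crlf"}),
]

def attributes_for(path: str) -> dict:
    attrs: dict = {}
    for pattern, updates in RULES:
        if fnmatch(path, pattern):
            attrs.update(updates)  # last matching line wins
    return attrs
```

For example, `attributes_for("model.onnx")` resolves to the LFS filter with `text` unset, while `attributes_for("setup.bat")` keeps CRLF line endings.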
6 changes: 3 additions & 3 deletions CHANGELOG.md
@@ -21,7 +21,7 @@
- **ci**: Resolve ty check failures with --all-extras in CI
([`3f74816`](https://github.com/SerPeter/code-atlas/commit/3f7481635091d2d676aed75c3fbcaa5db4332242))

- **consumers**: Group batches by project in Tier1/Tier2
- **consumers**: Group batches by project in AST/Embed consumers
([#2](https://github.com/SerPeter/code-atlas/pull/2),
[`5107b24`](https://github.com/SerPeter/code-atlas/commit/5107b24a7dfbcb44cadc7917f632ae6a9743c057))

@@ -190,7 +190,7 @@
- **docs**: Add markdown parser with tree-sitter-markdown
([`e8d372c`](https://github.com/SerPeter/code-atlas/commit/e8d372c162652d6d73d1f66da5e14a61fcb2136a))

- **embeddings**: Add EmbedClient with litellm routing and Tier 3 pipeline
- **embeddings**: Add EmbedClient with litellm routing and embed pipeline
([`ad7c972`](https://github.com/SerPeter/code-atlas/commit/ad7c9726f2e48fdb8746b50547089c5c483bcb75))

- **embeddings**: Add three-tier embedding cache with Valkey backend
@@ -241,7 +241,7 @@
- **naming**: Worktree-aware naming and monorepo sub-project prefixing
([`2acdfb3`](https://github.com/SerPeter/code-atlas/commit/2acdfb33ba4b486f966272a01cf8a37f670661f6))

- **parser**: Add py-tree-sitter parser, implement Tier 2 pipeline, drop Rust
- **parser**: Add py-tree-sitter parser, implement AST pipeline, drop Rust
([`d56e7d2`](https://github.com/SerPeter/code-atlas/commit/d56e7d2a686ec279a52d85bbc4903f4d85f51a4e))

- **parsing**: Add multi-language support (10 languages, 7 modules)
8 changes: 4 additions & 4 deletions CLAUDE.md
@@ -51,7 +51,7 @@ src/code_atlas/
├── __init__.py # __version__ only
├── schema.py # Graph schema (labels, relationships, DDL generators)
├── settings.py # Pydantic configuration (atlas.toml + env vars)
├── events.py # Event types (FileChanged, ASTDirty, EmbedDirty) + Valkey Streams EventBus
├── events.py # Event types (FileChanged, EmbedDirty) + Valkey Streams EventBus
├── telemetry.py # OpenTelemetry integration
├── cli.py # Typer CLI entrypoint (index, search, status, mcp, daemon commands)
@@ -69,7 +69,7 @@ src/code_atlas/
├── indexing/
│ ├── orchestrator.py # Full-index, monorepo detection, staleness checking
│ ├── consumers.py # Tier 1/2/3 event consumers (batch-pull pattern)
│ ├── consumers.py # AST + Embed event consumers (batch-pull pattern)
│ ├── watcher.py # Filesystem watcher (watchfiles + hybrid debounce)
│ └── daemon.py # Daemon lifecycle manager (watcher + pipeline)
@@ -78,13 +78,13 @@
└── health.py # Infrastructure health checks + diagnostics
```

**Event Pipeline:** File Watcher → Valkey Streams → Tier 1 (graph metadata) → Tier 2 (AST diff + gate) → Tier 3 (embeddings) → Memgraph
**Event Pipeline:** File Watcher → Valkey Streams → AST stage (hash gate + parse + diff) → Embed stage (embeddings) → Memgraph

**Query Pipeline:** MCP Server → Query Router → [Graph Search | Vector Search | BM25 Search] → RRF Fusion → Results

**Deployment:** Daemon (`atlas daemon start`) for indexing + MCP (`atlas mcp`) per agent session, decoupled via Valkey + Memgraph

**Event model:** Events are atomic — one logical change per event (one file per ASTDirty, one entity per EmbedDirty). Never bundle lists of work items into a single event; use `EventBus.publish_many()` for network-efficient batch publishing. The consumer's `max_batch_size` must directly control work volume, not just message count.
**Event model:** Events are atomic — one logical change per event (one file per FileChanged, one entity per EmbedDirty). Never bundle lists of work items into a single event; use `EventBus.publish_many()` for network-efficient batch publishing. The consumer's `max_batch_size` must directly control work volume, not just message count.
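A minimal sketch of the atomic-event rule, with illustrative field names (EmbedDirty's exact fields are assumptions here): each event carries exactly one work item, and batch publishing only amortizes the network round-trip, not the work accounting.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmbedDirty:
    qualified_name: str   # exactly one entity per event — never a list
    significance: str

def publish_many(events: list) -> int:
    # Stand-in for EventBus.publish_many(): one pipelined round-trip,
    # but still one stream entry per event, so a consumer's
    # max_batch_size caps actual work items, not just messages.
    return len(events)    # number of stream entries written

events = [EmbedDirty(f"pkg.mod.fn{i}", "minor") for i in range(5)]
assert publish_many(events) == 5   # five entries, one round-trip
```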

**Infrastructure:** Memgraph (graph DB, port 7687), TEI (embeddings, port 8080), Valkey (event bus, port 6379)

67 changes: 32 additions & 35 deletions docs/adr/0004-event-driven-tiered-pipeline.md
@@ -45,53 +45,50 @@ Redis Streams provide the pub/sub backbone with consumer groups:
Typed frozen dataclasses with JSON serialization for Redis transport:

- `FileChanged(path, change_type, timestamp)` — published by file watcher
- `ASTDirty(paths, batch_id)` — published by Tier 1
- `EmbedDirty(entities: list[EntityRef], significance, batch_id)` — published by Tier 2
- `EmbedDirty(entities: list[EntityRef], significance, batch_id)` — published by AST stage
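The typed-event pattern can be sketched as follows; the serialization helpers are illustrative, not the project's actual API:

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)   # frozen: events are immutable once published
class FileChanged:
    path: str
    change_type: str
    timestamp: float

    def to_json(self) -> str:
        # Flat JSON payload suitable for a Redis/Valkey stream field
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "FileChanged":
        return cls(**json.loads(raw))

evt = FileChanged("src/app.py", "modified", 1700000000.0)
assert FileChanged.from_json(evt.to_json()) == evt  # lossless round-trip
```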

### Three-Stream Pipeline
### Two-Stage Pipeline

```
                   atlas:file-changed        atlas:ast-dirty          atlas:embed-dirty
                         stream                   stream                   stream
                           │                        │                        │
                    ┌──────▼───────┐         ┌──────▼───────┐         ┌──────▼───────┐
File Watcher ────►  │    Tier 1    │ ──────► │    Tier 2    │ ─gate─► │    Tier 3    │
                    │Graph Metadata│  always │  AST Diff +  │  only   │  Embeddings  │
                    │ (0.5s batch) │         │ Graph Update │  if sig │ (15s batch)  │
                    └──────────────┘         │  (3s batch)  │  change └──────────────┘
                                             └──────────────┘

                   atlas:file-changed                            atlas:embed-dirty
                         stream                                       stream
                           │                                            │
                    ┌──────▼───────┐                             ┌──────▼───────┐
File Watcher ────►  │  AST Stage   │ ── significance gate ─────► │ Embed Stage  │
                    │  hash gate + │    only if semantically     │  Embeddings  │
                    │ parse + diff │    changed                  │ (15s batch)  │
                    │  (3s batch)  │                             └──────────────┘
                    └──────────────┘
```
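The staged pipeline above can be mimicked with two queue-fed coroutines — a toy sketch in which `asyncio.Queue` stands in for a Valkey stream and the file-extension check stands in for the significance gate (all names here are illustrative):

```python
import asyncio

async def ast_stage(files: asyncio.Queue, entities: asyncio.Queue) -> None:
    while True:
        path = await files.get()
        if path.endswith(".py"):                # stand-in significance gate
            await entities.put(f"{path}::entity")
        files.task_done()

async def embed_stage(entities: asyncio.Queue, results: list) -> None:
    while True:
        entity = await entities.get()
        results.append(entity)                  # stand-in for embedding work
        entities.task_done()

async def run_pipeline(paths: list) -> list:
    files: asyncio.Queue = asyncio.Queue()
    entities: asyncio.Queue = asyncio.Queue()
    results: list = []
    tasks = [asyncio.create_task(ast_stage(files, entities)),
             asyncio.create_task(embed_stage(entities, results))]
    for p in paths:
        await files.put(p)
    await files.join()       # AST stage has drained its stream
    await entities.join()    # Embed stage has drained its stream
    for t in tasks:
        t.cancel()
    return results
```

Running `asyncio.run(run_pipeline(["a.py", "notes.txt", "b.py"]))` yields `["a.py::entity", "b.py::entity"]` — the non-Python file is gated out before the embed stage ever sees it.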

Each tier pulls at its own pace via `XREADGROUP`, deduplicates within its batch window, and publishes downstream only if
warranted.
Each stage pulls at its own pace via `XREADGROUP`, deduplicates within its batch window, and publishes downstream only
if warranted.
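The within-batch deduplication can be sketched as a pure function over one pulled batch — one work item per dedup key (file path for the AST stage), with the latest message winning:

```python
def dedupe_batch(messages: list) -> list:
    # Collapse a batch pulled in one window to one work item per path;
    # the latest message for a path wins ("same file changed 5x = 1 item").
    latest: dict = {}
    for msg in messages:          # messages arrive in stream order
        latest[msg["path"]] = msg
    return list(latest.values())  # insertion order preserved

batch = [
    {"path": "a.py", "change_type": "modified"},
    {"path": "b.py", "change_type": "modified"},
    {"path": "a.py", "change_type": "deleted"},  # supersedes the first a.py
]
assert len(dedupe_batch(batch)) == 2
```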

### Per-Consumer Batch Policy

| Tier | Window | Max Batch | Dedup Key |
| -------------- | ------ | --------- | --------------------- |
| Tier 1 (Graph) | 0.5s | 50 | File path |
| Tier 2 (AST) | 3.0s | 20 | File path |
| Tier 3 (Embed) | 15.0s | 100 | Entity qualified name |
| Stage | Window | Max Batch | Dedup Key |
| ----- | ------ | --------- | --------------------- |
| AST | 3.0s | 30 | File path |
| Embed | 15.0s | 100 | Entity qualified name |

Hybrid batching: flush when count OR time threshold hit, whichever first. Same file changed 5× in window = 1 work item.
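A minimal hybrid batcher looks roughly like this (class and method names are illustrative, not the project's API): flush when either the count threshold or the time window is hit, whichever comes first.

```python
import time

class HybridBatcher:
    """Flush when max_batch items accumulate OR window_s elapses."""

    def __init__(self, window_s: float, max_batch: int) -> None:
        self.window_s = window_s
        self.max_batch = max_batch
        self.items: list = []
        self.opened_at = 0.0

    def add(self, item) -> None:
        if not self.items:                       # first item opens the window
            self.opened_at = time.monotonic()
        self.items.append(item)

    def should_flush(self) -> bool:
        if not self.items:
            return False
        return (len(self.items) >= self.max_batch
                or time.monotonic() - self.opened_at >= self.window_s)

    def flush(self) -> list:
        batch, self.items = self.items, []
        return batch
```

With `window_s=3.0, max_batch=30` (the AST row above), a burst of 30 events flushes immediately on count; a trickle of 2 events flushes when the 3-second window expires.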

### Event Data Flow

```
  FileChanged                 ASTDirty                       EmbedDirty
┌─────────────┐           ┌──────────────────┐           ┌──────────────────────────┐
│ path: str   │           │ paths: [str]     │           │ entities: [EntityRef]    │
│ change_type │ ─Tier 1─► │ batch_id: str    │ ─Tier 2─► │ significance: str        │
│ timestamp   │           └──────────────────┘    gate   │ batch_id: str            │
└─────────────┘                                          └──────────────────────────┘
                                                           EntityRef:
                                                             qualified_name, node_type,
                                                             file_path

  FileChanged                                            EmbedDirty
┌─────────────┐                                        ┌──────────────────────────┐
│ path: str   │                                        │ entity: EntityRef        │
│ change_type │ ─── AST stage ── sig gate ───────────► │ significance: str        │
│ timestamp   │                                        └──────────────────────────┘
└─────────────┘                                          EntityRef:
                                                           qualified_name, node_type,
                                                           file_path
```

### Significance Gating (Tier 2 → 3)
### Significance Gating (AST → Embed)

Tier 2 evaluates whether a change is semantically significant enough to warrant re-embedding:
The AST stage evaluates whether a change is semantically significant enough to warrant re-embedding:
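The gate itself can be sketched as a pure function from diff facts to a significance level (the condition names here are assumptions, not the project's actual heuristics):

```python
def classify(change: dict) -> str:
    # Hypothetical conditions; the real heuristics live in the AST-diff
    # stage and are summarized in the table that follows.
    if change.get("signature_changed"):
        return "major"
    if change.get("body_changed"):
        return "minor"
    return "none"                      # e.g. formatting/comment-only edits

def should_reembed(level: str) -> bool:
    return level != "none"             # the gate: skip insignificant diffs

assert should_reembed(classify({"signature_changed": True}))
assert not should_reembed(classify({"formatting_only": True}))
```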

| Condition | Level | Action |
| --------------------------- | -------- | ------------------- |
@@ -115,25 +112,25 @@
- Cheap operations (staleness flags, graph metadata) are near-instant — MCP queries reflect changes within ~1s
- Expensive operations (embeddings) only run when semantically justified — significant cost reduction
- Decoupled stages can be developed, tested, and scaled independently
- Batching per tier matches the cost profile of each operation
- Batching per stage matches the cost profile of each operation
- Multi-process from day one — no rewrite needed when scaling
- Dual-use of Valkey for event bus + embedding cache
- Natural extension point: new tiers or event types can be added without restructuring
- Natural extension point: new stages or event types can be added without restructuring

### Negative

- More architectural complexity than a simple "reindex everything on change"
- Significance threshold heuristics need tuning and may produce false negatives (skipping re-embeds that should have
happened)
- Debugging event flow across tiers is harder than a linear pipeline
- Debugging event flow across stages is harder than a linear pipeline
- Additional infrastructure dependency (Valkey), though lightweight

### Risks

- Threshold tuning: too aggressive = stale embeddings, too conservative = excessive TEI calls. Need observability on
gate decisions.
- Event ordering: if Tier 2 processes file A before file B, but B depends on A's entities, the diff may be incorrect.
Batch boundaries must align with dependency boundaries.
- Event ordering: if the AST stage processes file A before file B, but B depends on A's entities, the diff may be
incorrect. Batch boundaries must align with dependency boundaries.
- Complexity creep: the event bus must stay simple. If we find ourselves adding routing rules, dead-letter queues, or
retry logic, we've gone too far.

22 changes: 8 additions & 14 deletions docs/adr/0005-deployment-process-model.md
@@ -95,12 +95,12 @@ decoupled via Valkey Streams and Memgraph:
└───────┬────────┘
┌────────────────┐
│ Create Consumer│ Idempotent XGROUP CREATE for all 3 streams
│ Create Consumer│ Idempotent XGROUP CREATE for pipeline streams
│ Groups │
└───────┬────────┘
┌────────────────┐
│ Start Tier │ asyncio.gather(tier1.run(), tier2.run(), tier3.run())
│ Start Pipeline │ asyncio.gather(ast.run(), embed.run())
│ Consumers │
└───────┬────────┘
@@ -111,7 +111,7 @@
┌────────────────┐ Git-based fast path: diff stored_commit..HEAD
│ Reconcile │ Fallback: mtime comparison for non-git or rebases
│ (progressive) │ Enqueue stale files → Tier 1 → 2 → 3
│ (progressive) │ Enqueue stale files → AST → Embed
└───────┬────────┘
┌────────────────┐
@@ -230,17 +230,11 @@ Queries: Agent calls MCP tools ─────► Memgraph ◄──── Da
### Data Flow at Runtime

```
┌──────────┐  FileChanged          ┌─────────┐  ASTDirty             ┌─────────┐
│  File    │ ──► events ────────►  │ Tier 1  │ ──► events ────────►  │ Tier 2  │
│ Watcher  │  (Valkey Stream)      │ (graph) │  (Valkey Stream)      │  (AST)  │
└──────────┘                       └─────────┘                       └────┬────┘
                                                                 gate     │
                                                            EmbedDirty    │
                                                             (if sig)     │
                                                                     ┌────▼────┐
                                                                     │ Tier 3  │
                                                                     │ (embed) │
                                                                     └────┬────┘

┌──────────┐  FileChanged          ┌───────────┐  EmbedDirty        ┌───────────┐
│  File    │ ──► events ────────►  │ AST Stage │ ──► events ──────► │   Embed   │
│ Watcher  │  (Valkey Stream)      │  (parse)  │  (Valkey Stream)   │   Stage   │
└──────────┘                       └─────┬─────┘                    └─────┬────┘
                                         │                                │
            ┌──────────┐                 │                                │
Agent ◄──── MCP Server ◄──── reads  │ Memgraph │ ◄──── writes ────────────┘
8 changes: 4 additions & 4 deletions docs/adr/0006-pure-python-tree-sitter.md
@@ -17,29 +17,29 @@ actual cost breakdown:
- **Subprocess overhead** (spawn, JSON serialization, IPC) exceeded the parse time itself for typical files
- **Build complexity** required both `uv` and `cargo` toolchains in dev/CI/Docker
- **Contributor friction** — Rust was isolated to one component, but still required a full toolchain install
- **Parallelism** is already handled by the event bus (multiple Tier 2 consumer instances via Valkey Streams), not by
- **Parallelism** is already handled by the event bus (multiple AST consumer instances via Valkey Streams), not by
Rust's threading model

Meanwhile, `py-tree-sitter` uses the exact same C parsing library (tree-sitter) via Python bindings. The grammar
packages (`tree-sitter-python`, etc.) ship pre-compiled wheels — no compilation step needed.

## Decision

Drop the Rust binary (`crates/atlas-parser`) and use **py-tree-sitter** called in-process within the Tier 2 pipeline
Drop the Rust binary (`crates/atlas-parser`) and use **py-tree-sitter** called in-process within the AST pipeline
consumer. The parser module lives at `src/code_atlas/parser.py`.

### Architecture

```
Tier 2 Consumer
AST Consumer
└── parser.parse_file(path, source, project_name)
└── tree-sitter C engine (via py-tree-sitter bindings)
└── tree-sitter-python grammar (pre-compiled wheel)
```

### Parallelism Model

Multiple Tier 2 consumer instances can run concurrently — each pulls from the `atlas:ast-dirty` Valkey Stream via its
Multiple AST consumer instances can run concurrently — each pulls from the `atlas:file-changed` Valkey Stream via its
own consumer group member. This gives process-level parallelism without the GIL concern, since each consumer is an
independent process.
