80 changes: 80 additions & 0 deletions .github/instructions/context.instructions.md
@@ -0,0 +1,80 @@
---
applyTo: src/contextweaver/context/**
---

# Context Engine — Agent Instructions

Path-scoped guidance for `src/contextweaver/context/`. Read before modifying any file here.

## Pipeline stage ordering (must not be reordered)

`ContextManager.build()` executes exactly these 8 stages in order:

1. `generate_candidates` (`candidates.py`) — phase + policy filter over event log
2. `resolve_dependency_closure` (`candidates.py`) — pull in parent items via `parent_id`
3. `apply_sensitivity_filter` (`sensitivity.py`) — drop/redact by sensitivity level
4. `apply_firewall_to_batch` (`firewall.py`) — intercept raw `tool_result` text
5. `score_candidates` (`scoring.py`) — recency + Jaccard token overlap + kind priority + token penalty
6. `deduplicate_candidates` (`dedup.py`) — near-duplicate removal
7. `select_and_pack` (`selection.py`) — budget-aware token selection
8. `render_context` (`prompt.py`) — final prompt assembly

**Never reorder these stages.** Stages 2 and 4 have hard ordering constraints:
dependency closure must run before scoring (ancestors must be scoreable), and the
firewall must run before scoring (summaries, not raw text, must be scored).
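
One way to guard the ordering against accidental edits is to declare it once as data and check the two hard constraints in a test. This is a minimal sketch, not the project's code: `PIPELINE_STAGES` and `ordering_is_valid` are hypothetical names, and only the stage names are taken from the list above.

```python
# Illustrative only: PIPELINE_STAGES and ordering_is_valid are hypothetical
# helpers, not contextweaver APIs. Stage names come from the list above.
PIPELINE_STAGES = (
    "generate_candidates",
    "resolve_dependency_closure",
    "apply_sensitivity_filter",
    "apply_firewall_to_batch",
    "score_candidates",
    "deduplicate_candidates",
    "select_and_pack",
    "render_context",
)


def ordering_is_valid(stages):
    """Return True if closure and firewall both run before scoring."""
    position = {name: index for index, name in enumerate(stages)}
    scoring = position["score_candidates"]
    return (
        position["resolve_dependency_closure"] < scoring
        and position["apply_firewall_to_batch"] < scoring
    )
```

A check like this fails as soon as someone moves scoring ahead of stage 2 or stage 4, which are the two constraints called out above.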

## Firewall invariants

- Raw `tool_result` text **never** reaches the prompt. `apply_firewall` replaces
`item.text` with a summary and stores the raw bytes in `ArtifactStore`.
- The artifact handle is always `f"artifact:{item.id}"`.
- `item.artifact_ref` is set on every firewall-processed item.
- Do not bypass `apply_firewall_to_batch` or move raw text past stage 4.
- See `firewall.py` and `docs/agent-context/invariants.md` for full rationale.
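
The invariants above can be illustrated with a small stand-in. This sketch assumes a plain dict in place of `ArtifactStore` and a hypothetical `ItemSketch` dataclass; the real types, the summary format, and the function signature in `firewall.py` may differ. Only the handle format and the `artifact_ref` field come from the rules above.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class ItemSketch:
    """Hypothetical stand-in for a context item."""
    id: str
    kind: str
    text: str
    artifact_ref: Optional[str] = None


def firewall_item_sketch(item: ItemSketch, store: Dict[str, bytes]) -> ItemSketch:
    """Replace raw tool_result text with a summary; keep the raw bytes in `store`."""
    if item.kind != "tool_result":
        return item
    handle = f"artifact:{item.id}"  # handle format stated in the invariants
    raw = item.text.encode("utf-8")
    store[handle] = raw
    # Assumed summary format; the real summariser is not described here.
    summary = f"[tool_result: {len(raw)} bytes stored as {handle}]"
    return replace(item, text=summary, artifact_ref=handle)
```

After the call, the returned item carries only the summary and the handle, while the raw payload is retrievable from the store by that handle.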

## Async-first pattern

- The core pipeline runs in `_build()`, which is **synchronous**. Both `build()`
(async) and `build_sync()` (sync) delegate directly to `_build()`.
- `build()` is `async def` so callers can `await` it today, even though it does
no asynchronous work yet. Real async I/O can be added later, if pipeline stages
gain `await`-able steps, without changing the caller-facing signature.
- Do not wrap `_build()` in `asyncio.run()` — `build_sync()` calls it directly.
- The same pattern applies to `_build_call_prompt()` → `build_call_prompt()` /
`build_call_prompt_sync()`.
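
The delegation pattern described above looks roughly like this. `ManagerSketch` and its `query` parameter are hypothetical and stand in for `ContextManager`; the real `_build()` signature is not shown in this file.

```python
import asyncio


class ManagerSketch:
    """Hypothetical stand-in showing the build / build_sync delegation."""

    def _build(self, query: str) -> str:
        # Synchronous core: all pipeline work would happen here.
        return f"prompt for: {query}"

    async def build(self, query: str) -> str:
        # Async entry point; currently just delegates, with no awaits inside.
        return self._build(query)

    def build_sync(self, query: str) -> str:
        # Sync entry point; calls the core directly, with no event loop involved.
        return self._build(query)
```

Async callers write `await manager.build(...)`; sync callers use `build_sync(...)` and never need an event loop.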

## Dependency closure

- `resolve_dependency_closure()` (stage 2) walks `item.parent_id` chains and
adds missing ancestors to the candidate list.
- **Must run before scoring and deduplication.** Removing or skipping it produces
incoherent context: tool results appear without their tool calls.
- Closure count is tracked in `BuildStats.closures_added`.
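
A minimal sketch of the closure walk, assuming a hypothetical `EventSketch` type and an `by_id` lookup over the event log. The real function's signature and return type are not given here, so treat the names and the `(items, added)` return shape as illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class EventSketch:
    """Hypothetical stand-in for an event-log item."""
    id: str
    parent_id: Optional[str] = None


def closure_sketch(
    candidates: List[EventSketch], by_id: Dict[str, EventSketch]
) -> Tuple[List[EventSketch], int]:
    """Append missing ancestors of each candidate; return (items, number added)."""
    selected = {item.id for item in candidates}
    result = list(candidates)
    added = 0
    for item in candidates:
        parent_id = item.parent_id
        # Walk up until we hit the root or an ancestor that is already present.
        while parent_id is not None and parent_id not in selected:
            parent = by_id.get(parent_id)
            if parent is None:
                break  # dangling parent_id: nothing to add
            result.append(parent)
            selected.add(parent_id)
            added += 1
            parent_id = parent.parent_id
    return result, added
```

The `added` counter plays the role that `BuildStats.closures_added` is described as playing above.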

## `manager.py` size and decomposition

- `manager.py` is currently ~876 lines, which exceeds the ≤300-line module
guideline. Decomposition is tracked in dgenio/contextweaver#73 and
dgenio/contextweaver#69.
- Do not add new methods to `ContextManager` until the decomposition is complete.
- Prefer adding new logic to an existing focused module (e.g. `candidates.py`,
`scoring.py`) and calling it from the manager.

## Sensitivity enforcement

- `sensitivity.py` is security-grade code. Changes require extra review scrutiny.
- Never weaken the default sensitivity floor or default drop action.
- See `.github/instructions/sensitivity.instructions.md` for full rules.

## Import rules

- Raise custom exceptions from `contextweaver.exceptions`, not bare `ValueError`
or `RuntimeError`.
- Text similarity utilities (`tokenize`, `jaccard`, `TfIdfScorer`) must be
imported from `contextweaver._utils` — never duplicated here.
- Use `from __future__ import annotations` in every source file.

## Related issues

- dgenio/contextweaver#73 — `manager.py` decomposition (large file)
- dgenio/contextweaver#69 — context pipeline refactor
- dgenio/contextweaver#63 — context firewall design
87 changes: 87 additions & 0 deletions .github/instructions/routing.instructions.md
@@ -0,0 +1,87 @@
---
applyTo: src/contextweaver/routing/**
---

# Routing Engine — Agent Instructions

Path-scoped guidance for `src/contextweaver/routing/`. Read before modifying any file here.

## ChoiceGraph validation invariants (`graph.py`)

`ChoiceGraph._validate()` enforces four rules — all must hold at all times:

1. **Root exists** — `root_id` must be present in `_nodes`.
2. **Children resolve** — every edge destination must exist in `_nodes | _items`.
3. **No cycles** — `topological_order()` must succeed (raises `GraphBuildError` if not).
4. **All items reachable** — every item in `_items` must be reachable from `root_id`.

Cycle detection is **eager**: `add_edge()` calls `_creates_cycle()` immediately and
raises `GraphBuildError` before the edge is persisted. Do not bypass this check.
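
Eager cycle detection can be sketched as a reachability check run before the edge is stored: adding `src -> dst` creates a cycle exactly when `src` is already reachable from `dst`. The class below is a hypothetical stand-in, not the `graph.py` implementation, and `GraphBuildErrorSketch` is a local placeholder for `GraphBuildError`.

```python
from typing import Dict, List


class GraphBuildErrorSketch(Exception):
    """Local placeholder for contextweaver's GraphBuildError."""


class GraphSketch:
    def __init__(self) -> None:
        self._edges: Dict[str, List[str]] = {}

    def _creates_cycle(self, src: str, dst: str) -> bool:
        # Depth-first search from dst; reaching src means src -> dst closes a loop.
        stack = [dst]
        seen = set()
        while stack:
            node = stack.pop()
            if node == src:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self._edges.get(node, []))
        return False

    def add_edge(self, src: str, dst: str) -> None:
        # Check first, persist second: a rejected edge leaves no trace.
        if self._creates_cycle(src, dst):
            raise GraphBuildErrorSketch(f"edge {src} -> {dst} would create a cycle")
        self._edges.setdefault(src, []).append(dst)
```

Because the check runs before the mutation, a failed `add_edge()` leaves the graph exactly as it was.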

Deserialisation via `from_dict()` rebuilds `children` / `child_types` from `_edges`
to guarantee consistency. Never rely on serialised node metadata for child lists.

## TreeBuilder grouping strategies (`tree.py`)

`TreeBuilder.build()` tries three strategies in priority order:

1. **Namespace grouping** — group by first dot-segment of `item.namespace`;
requires ≥ 50 % of items to have a namespace and ≥ 2 groups.
2. **Jaccard clustering** — farthest-first seeding over `tokenize(_text_repr(item))`;
falls back if clustering yields < 2 groups.
3. **Alphabetical fallback** — sort by `item.name.lower()`, split into even chunks.

The builder is **deterministic**: it sorts items by `item.id` before processing.
Do not introduce randomness or non-deterministic ordering inside `_build_subtree`.

Every node has at most `max_children` children (default 20). Oversized groups are
coalesced via `_coalesce_groups()` or re-split before adding edges.
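
The eligibility test for the first strategy can be sketched as follows. This assumes items are plain dicts with `id` and `namespace` keys, which is a simplification; it also simply omits items without a namespace, since how the real builder places them is not stated above.

```python
from collections import defaultdict


def namespace_groups_sketch(items):
    """Group item ids by first dot-segment of namespace, or return None
    when fewer than 50 % of items are namespaced or fewer than 2 groups form."""
    groups = defaultdict(list)
    # Sort by id first so the output is deterministic for any input order.
    for item in sorted(items, key=lambda it: it["id"]):
        namespace = item.get("namespace") or ""
        if namespace:
            groups[namespace.split(".")[0]].append(item["id"])
    namespaced = sum(len(ids) for ids in groups.values())
    if not items or namespaced * 2 < len(items) or len(groups) < 2:
        return None  # caller would fall through to the next strategy
    return dict(groups)
```

Returning `None` models the fall-through to Jaccard clustering and then to the alphabetical fallback.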

## Router beam-search constraints (`router.py`)

- **Deterministic tie-breaking**: children are sorted `(-score, id)` — descending
score, alphabetical ID for ties. Never change this sort key.
- `confidence_gap` (default 0.15) widens the beam by 1 when rank-1 and rank-2
scores differ by less than the gap. Must stay in `[0.0, 1.0]`.
- Results are ranked `(-score, item_id)` — same determinism guarantee end-to-end.
- The TF-IDF index is lazily built on the first `route()` call via `_ensure_index()`.
Items are indexed by sorted `item_id` before non-item (navigation) nodes; do not
change this order.
- Fallback scoring (nodes not in TF-IDF index) uses `jaccard()` from
`contextweaver._utils` — never duplicate this logic here.
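
The tie-breaking and beam-widening rules above can be sketched together. The function name, the `(id, score)` pair format, and the `beam_width` parameter are assumptions for illustration; only the `(-score, id)` sort key and the `confidence_gap` default of 0.15 come from the rules above.

```python
def rank_children_sketch(scored, beam_width=2, confidence_gap=0.15):
    """Return the ids kept in the beam, given (id, score) pairs."""
    # Descending score, then ascending id for ties: fully deterministic.
    ranked = sorted(scored, key=lambda pair: (-pair[1], pair[0]))
    width = beam_width
    # Widen by one when the top two candidates are too close to call.
    if len(ranked) >= 2 and (ranked[0][1] - ranked[1][1]) < confidence_gap:
        width += 1
    return [node_id for node_id, _ in ranked[:width]]
```

With scores `b=0.9, a=0.9, c=0.5, d=0.1` the top two tie, so `a` sorts before `b` and the beam widens from 2 to 3 entries; with a clear leader the beam keeps its configured width.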

## Catalog invariants (`catalog.py`)

- Item IDs must be unique within a `Catalog`; `register()` raises `CatalogError`
on duplicates.
- `generate_sample_catalog(n, seed=42)` is seeded for reproducibility. The default
seed **must not change** — demos and tests depend on deterministic output.
- `Catalog.hydrate()` returns **shallow copies** of `args_schema`, `examples`, and
`constraints`. Callers must not mutate the returned dicts; use `copy.deepcopy`
if mutation is needed.
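
The shallow-copy hazard is easy to reproduce with plain dicts. The snippet below does not call `Catalog.hydrate()`; it uses `dict(...)` as a stand-in for any shallow copy and shows why nested edits leak back into the stored schema while `copy.deepcopy` does not.

```python
import copy

# Stand-in for a schema held inside the catalog.
stored = {"properties": {"path": {"type": "string"}}}

# A shallow copy shares the nested dicts with the original.
shallow = dict(stored)
shallow["properties"]["path"]["type"] = "integer"
leaked_type = stored["properties"]["path"]["type"]  # the stored schema changed

# A deep copy is fully independent and safe to mutate.
stored_safe = {"properties": {"path": {"type": "string"}}}
deep = copy.deepcopy(stored_safe)
deep["properties"]["path"]["type"] = "integer"
preserved_type = stored_safe["properties"]["path"]["type"]
```

Top-level key assignment on a shallow copy is harmless; it is mutation of nested structures that corrupts the shared state.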

## ChoiceCard constraints (`cards.py`)

- `ChoiceCard` must **never** include a full argument schema. It is a compact,
LLM-friendly summary; full schemas are hydrated on demand via `Catalog.hydrate()`.
- Keep card text representation minimal to avoid consuming prompt tokens.

## Synchronous-only routing

- The entire routing engine is **synchronous** (pure computation, DAG traversal,
beam search). Do not introduce `async`/`await` anywhere in `routing/`.
- The engine has zero runtime dependencies on the context engine — do not import
from `contextweaver.context.*` inside `routing/`.

## Import rules

- Raise custom exceptions from `contextweaver.exceptions` (`GraphBuildError`,
`RouteError`, `CatalogError`, `ItemNotFoundError`), not bare exceptions.
- Text similarity (`tokenize`, `jaccard`, `TfIdfScorer`) must come from
`contextweaver._utils`.
- Use `from __future__ import annotations` in every source file.

## Related issues

- dgenio/contextweaver#73 — module size tracking
- dgenio/contextweaver#69 — routing refactor work
- dgenio/contextweaver#63 — ChoiceGraph design and validation
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added
- Path-scoped Copilot instructions for `context/` and `routing/` (#95)

## [0.1.5] - 2026-03-07

### Added