Goal
Validate whether graphify-ts should move from the current repo → graph → retrieval model toward a task → anchors → program slice → budgeted context pack model.
This is a research/measurement issue before large rewrites.
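For concreteness, one possible shape of the slice-based output is sketched below. The type names (`ContextPack`, `Anchor`, `SliceEntry`) are illustrative only, not an existing graphify-ts API; they describe the hypothesized flow so the prototype and the current pipeline can be measured against the same output contract.

```ts
// Illustrative only: these types do not exist in graphify-ts today.

/** Entry point the task is anchored to (symbol, file, route, test, ...). */
interface Anchor {
  file: string;
  symbol?: string;        // e.g. "AuthService.validateToken"
  reason: string;         // why this anchor was selected for the task
}

/** One piece of evidence pulled into the program slice. */
interface SliceEntry {
  file: string;
  range?: { startLine: number; endLine: number };
  tokens: number;         // token cost of including this entry
  relation: string;       // e.g. "calls", "imports", "config-read"
}

/** Final artifact handed to the model, trimmed to a token budget. */
interface ContextPack {
  task: string;           // the original prompt
  anchors: Anchor[];
  slice: SliceEntry[];
  budgetTokens: number;
  usedTokens: number;
}
```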
Why
Recent real usage showed inconsistent behavior: running against only backend/ was slower and noisier, while running against the full GoValidate workspace produced better, faster, lower-token results. That suggests the problem may be methodological, not merely a matter of optimization.
Scope
Create a small evaluation harness that compares:
- Current graphify-ts retrieval/context-pack behavior
- Simple lexical/file retrieval baseline
- Prototype task-conditioned slicing strategy
- Optional manual/full-context baseline where practical
Use 5–10 real prompts from a TypeScript/NestJS backend or a similar large repo.
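A minimal harness sketch follows, assuming a shared strategy interface. The strategy implementations (current graphify-ts retrieval, a lexical baseline, the slicing prototype) are placeholders; graphify-ts does not necessarily expose these exact entry points.

```ts
// Minimal harness sketch; strategy implementations are placeholders.
interface RetrievalResult {
  files: string[];          // files (or symbols) selected as context
  contextTokens: number;    // size of the assembled context
}

interface RetrievalStrategy {
  name: string;
  run(prompt: string, repoRoot: string): Promise<RetrievalResult>;
}

async function runComparison(
  prompts: string[],
  repoRoot: string,
  strategies: RetrievalStrategy[],
): Promise<void> {
  for (const prompt of prompts) {
    for (const strategy of strategies) {
      const started = Date.now();
      const result = await strategy.run(prompt, repoRoot);
      const runtimeMs = Date.now() - started;
      // One row per prompt × strategy; feed these rows into the comparison table.
      console.log(
        [strategy.name, prompt, result.files.length, result.contextTokens, runtimeMs].join('\t'),
      );
    }
  }
}
```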
Prompts to include
Examples:
- Explain the auth flow end to end
- Why is report generation slow?
- Can this PR break onboarding?
- What tests should cover this change?
- What can break if this service changes?
- Where does this config/env variable affect runtime behavior?
Metrics
For each prompt × strategy run, capture:
- runtime
- output token count / context token count
- selected files/symbols count
- missing-context rate
- irrelevant-context rate
- whether selected evidence is enough to answer
- false-confidence cases
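One possible per-run record for these metrics is sketched below, with the two rates derived from manual annotation of each answer. Field names and the rate definitions are illustrative, not an established schema.

```ts
// Illustrative per-run metrics record; not an existing schema.
interface RunMetrics {
  prompt: string;
  strategy: string;
  runtimeMs: number;
  contextTokens: number;
  outputTokens: number;
  selectedFiles: number;
  selectedSymbols: number;
  // Manual annotation after reading the answer and the selected context:
  missingContextItems: number;    // needed but not selected
  irrelevantContextItems: number; // selected but unused by the answer
  evidenceSufficient: boolean;    // could the prompt be answered from the pack?
  falseConfidence: boolean;       // confident answer despite missing evidence
}

/** Share of selected items that did not contribute to the answer. */
function irrelevantContextRate(m: RunMetrics): number {
  return m.selectedFiles === 0 ? 0 : m.irrelevantContextItems / m.selectedFiles;
}

/** Share of needed items that the strategy failed to select (one possible definition). */
function missingContextRate(m: RunMetrics): number {
  const needed = m.selectedFiles - m.irrelevantContextItems + m.missingContextItems;
  return needed === 0 ? 0 : m.missingContextItems / needed;
}
```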
Deliverables
- docs/experiments/task-conditioned-slicing.md
- Script or fixture under examples/ or src/**/__tests__ that can be rerun
- A comparison table of current vs prototype behavior (an example shape is sketched below)
- A recommendation: keep the current method, adjust it, or move to the slicing architecture
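One possible shape for the comparison table, built from the metrics above; the values are placeholders to be filled in during the experiment.

| Prompt | Strategy | Runtime (ms) | Context tokens | Files selected | Missing-context rate | Irrelevant-context rate | Evidence sufficient? |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Explain the auth flow end to end | current retrieval | tbd | tbd | tbd | tbd | tbd | tbd |
| Explain the auth flow end to end | task-conditioned slice | tbd | tbd | tbd | tbd | tbd | tbd |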
Acceptance criteria
- At least 5 real prompts evaluated
- Current behavior and prototype behavior are compared using the same prompts
- Results include both quality notes and token/runtime measurements
- The issue ends with concrete next-step recommendations, not vague notes
Suggested labels
enhancement, research, performance, context-quality