An in-source communication protocol where the writing agent encodes architectural context and the reading agent decodes it. The file is the channel. Every fragment carries the whole.
CodeDNA is an inter-agent communication protocol implemented as in-source annotations. The writing agent embeds architectural context directly into source files; the reading agent decodes it at any point in the file. The model is biological DNA, or a hologram: cut a hologram in half and you get two smaller complete images.
No RAG. No vector DB. No external rules files. Minimal drift (context co-located with code).
Less Prompt Engineering Needed: CodeDNA annotations help AI agents navigate the codebase with less manual guidance. Even less-technical users can get better multi-file fixes by describing the problem, because the architectural context is already in the code.
AI coding agents waste a significant fraction of their context window exploring irrelevant files, re-reading code, and missing cross-file constraints or reverse dependencies. The result: incomplete patches, higher token costs, and models that repeat the same mistakes across sessions.
The root cause is structural. Information like reverse dependencies and domain constraints cannot be inferred from a single file; it requires reading the whole codebase. Without a way to persist that knowledge, every agent starts from scratch.
CodeDNA embeds this context directly in source files: used_by: maps reverse dependencies, rules: encodes domain constraints, and agent: / message: accumulate knowledge across sessions. It is not intended to replace retrieval systems, vector databases, or external memory. It provides a persistent architectural context layer inside the repository that any of those systems can build on.
This also enables agent-to-agent communication: a constraint discovered by Agent A is available to Agent B in a different session or a different model. Knowledge compounds in a versioned, inspectable form.
Preliminary results are encouraging: +13pp F1 on Gemini 2.5 Flash and +9pp on DeepSeek Chat on Django tasks, zero-shot, with no fine-tuning, just annotations. These figures come from a small sample and require larger-scale validation.
Every AI coding agent relies on multiple memory layers to navigate a codebase. Most of them are external to the code: chat history, vector databases, markdown rules files. CodeDNA is different: it is the only layer that lives inside the source files themselves.
| Layer | Examples | Where it lives | Shared across tools? |
|---|---|---|---|
| LLM / Agent | Claude, GPT-4, Cursor, Copilot | Cloud | - |
| External memory | Chat history, Projects, Memory API | Cloud / external DB | ❌ tool-specific |
| Native agent memory | Claude auto-memory, Cursor memory, Windsurf memories, Devin session memory, ... | Local machine / tool cloud | ❌ tool-specific |
| RAG / Vector DB | Embeddings, Pinecone, pgvector | External infrastructure | depends |
| Markdown / Config | README, CLAUDE.md, .cursorrules, AGENTS.md | Repo (outside source files) | partial (tool-specific files) |
| CodeDNA | exports, rules, agent, message, .codedna | Inside every source file + repo root | ✅ always |
Every other layer is either external to the code or tool-specific. CodeDNA is the only memory that:
- Travels with the source file through clones, forks, and CI pipelines, with no infrastructure dependency
- Is readable by any agent on any tool: Claude, Cursor, Windsurf, Copilot, OpenCode, or a custom script all see the same annotations

CodeDNA does not replace native agent memories; it is additive. Every agentic tool (Claude Code, Cursor, Windsurf, Devin, and any future agent) has its own native memory for user preferences, feedback, and tool-specific context. That context belongs outside the repo. CodeDNA handles the architectural context that belongs inside it. Use both.
This is what makes CodeDNA composable. RAG systems, vector databases, native tool memories, and external memory layers can all be built on top of or alongside CodeDNA annotations. The in-source layer is the shared foundation any of those systems can read from, and the only one that survives a `git clone`.
AI coding agents usually begin from a semantic prompt and must infer structure by exploring the repository.
Without persistent architectural context, each session starts from scratch.
CodeDNA turns semantic reasoning into structured reasoning.
Annotations allow the agent to follow explicit dependency and constraint signals instead of relying only on token similarity or retrieval.
This suggests that source code alone may not be the optimal reasoning layer for AI agents. While binary is the lowest layer for execution, structured source + annotations may be closer to the lowest layer for understanding.
Figure: three visual metaphors of the same real data (django__django-11808 · DeepSeek-Chat · 5 runs). Without CodeDNA, the agent opens 2 random files and stops, missing 8 of 10 critical files. With CodeDNA, it follows the used_by: chain and finds 6 of 10 critical files. Retry risk: -52%. An interactive version with all 3 metaphors is available.
The Network Effect: When an AI agent writes CodeDNA annotations, it leaves a navigable trail for every other agent that reads the code after it, regardless of vendor or model. The more agents that participate, the more useful the protocol becomes.
| You are... | Without CodeDNA | With CodeDNA |
|---|---|---|
| Non-technical user | Must learn prompt engineering to guide the AI agent through the codebase | Just describe the problem; annotations give the agent structural context to follow |
| Junior developer | AI finds the obvious file, misses the 5 related ones | used_by: graph helps surface related files that may need changes |
| Senior developer | Spends time writing detailed prompts every session | Writes annotations once; that context persists across sessions |
| Team lead | Each developer's AI makes different mistakes | Annotations encode team knowledge, with potentially more consistent results |
The core idea: today, the quality of AI-assisted coding often depends on the user's ability to prompt. CodeDNA moves some of that knowledge from ephemeral prompts into persistent, version-controlled source code.
Setting up CodeDNA is two steps:
1. Install the integration for your AI tool: this tells the agent how to follow the protocol (Option 1 below)
2. Annotate your existing codebase: this adds CodeDNA headers to files already in your repo (Option 2 below)

For a new project, step 1 is enough, because the agent annotates files as it creates and edits them. For an existing codebase, run both: step 1 first, then step 2 to bulk-annotate what's already there.
Want to try CodeDNA on a sample project or contribute to the codebase? See CONTRIBUTING.md for the dev setup.
Run one command for your tool:
```bash
bash <(curl -fsSL https://raw.githubusercontent.com/Larens94/codedna/main/integrations/install.sh) <tool>
```

| Tool | Option | Enforcement |
|---|---|---|
| Claude Code | claude-hooks | ✅ Active: 4 hooks + .claude/settings.local.json |
| Cursor | cursor-hooks | ✅ Active: hook scripts in .cursor/hooks/ (v1.7+) |
| GitHub Copilot | copilot-hooks | ✅ Active: .github/hooks/hooks.json + scripts |
| Cline | cline-hooks | ✅ Active: hook scripts in .clinerules/hooks/ (v3.36+) |
| OpenCode | opencode | ✅ Active: JS plugin in .opencode/plugins/ |
| Windsurf | windsurf | |
| Antigravity / custom agents | agents | |
| Aider | claude | |

Active enforcement = hooks validate annotations on every file write/edit automatically, regardless of session length or task complexity. Full reference: integrations/README.md.
`all` installs everything at once; this is only useful for teams where each developer uses a different tool.
Done. Your AI tool now follows the CodeDNA protocol. If you have existing files to annotate, continue with Option 2.
Annotate an entire project from the terminal. Supports local models via Ollama at zero cost:
```bash
pip install git+https://github.com/Larens94/codedna.git
# Free: structural only, no AI
codedna init /path/to/project --no-llm
# Free: local model via Ollama
codedna init /path/to/project --model ollama/llama3
# Paid: Anthropic Haiku (~$1-3 for a Django project)
ANTHROPIC_API_KEY=sk-... codedna init /path/to/project --model claude-haiku-4-5-20251001
```

| Command | What it does |
|---|---|
| `codedna init PATH` | First-time annotation: L1 module headers + L2 function Rules: |
| `codedna update PATH` | Incremental: only unannotated files (safe to re-run) |
| `codedna check PATH` | Coverage report without modifying files |
| `codedna init PATH --extensions ts go` | Annotate TypeScript + Go files too (L1 only) |
Supported models via --model:
| Provider | Example | Cost |
|---|---|---|
| Ollama (local) | ollama/llama3, ollama/mistral | Free |
| Anthropic | claude-haiku-4-5-20251001 | ~$1-3 / project |
| OpenAI | openai/gpt-4o-mini | Low |
| Google | gemini/gemini-2.0-flash | Low |
| None | --no-llm | Free |
Status: accepted by Anthropic, currently under review, and not yet available in the public directory. Use Option 1 or 2 in the meantime.
Once available:

```bash
claude plugin install codedna
```

No API key. No extra cost. Uses your existing Claude subscription. Adds the /codedna:init, /codedna:check, /codedna:manifest, and /codedna:impact commands plus four enforcement hooks.
5 real Django issues from SWE-bench, tested across multiple LLMs. Same prompt, same tools, same tasks. Only difference: CodeDNA annotations.
Metric: File Localization F1, the harmonic mean of recall and precision on files read vs ground truth. It isolates the navigation bottleneck that precedes code generation.
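As a minimal sketch of this metric, the snippet below computes precision, recall, and F1 from two sets of file paths. The function name `localization_f1` and the example file names are illustrative assumptions, not taken from the benchmark scripts.

```python
def localization_f1(files_read: set[str], ground_truth: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, f1) for files read vs ground-truth files."""
    hits = len(files_read & ground_truth)
    precision = hits / len(files_read) if files_read else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical run: 3 files read, 2 of them in a 4-file ground truth.
read = {"a.py", "b.py", "c.py"}
truth = {"a.py", "b.py", "d.py", "e.py"}
precision, recall, f1 = localization_f1(read, truth)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Reading an irrelevant file lowers precision and missing a ground-truth file lowers recall, so the score penalises both scattered exploration and stopping too early.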
Statistical test: Wilcoxon signed-rank test (one-tailed, H1: CodeDNA > Control) over F1 pairs across 5 tasks. N=5 with ≥5 runs per task at T=0.1.
| Model | Ctrl F1 | DNA F1 | Δ F1 | p-value | Tasks Won |
|---|---|---|---|---|---|
| Gemini 2.5 Flash | 60% | 72% | +13% | 0.040* | 4/5 |
| DeepSeek Chat | 50% | 60% | +9% | 0.11 | 4/5 |
| Gemini 2.5 Pro | 60% | 69% | +9% | 0.11 | 3/5 |
3 of 3 models complete. Full data: benchmark_agent/runs/
Gemini 2.5 Flash: W+=14, N=5, p=0.040 (significant). DeepSeek Chat: W+=12, N=5, p=0.11. Gemini 2.5 Pro: W+=12, N=5, p=0.11. All runs: 5 tasks × 3-5 runs at T=0.1.
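The reported p-values can be checked from W+ and N alone. The sketch below computes the one-tailed p-value two ways: with the normal approximation (no continuity correction) and exactly, by enumerating all sign assignments. That the benchmark script uses the normal approximation is an assumption inferred from the numbers, and both function names are illustrative. The sketch also assumes no ties and no zero differences.

```python
import math
from itertools import combinations


def normal_approx_p(w_plus: float, n: int) -> float:
    """One-tailed p-value for W+ via the normal approximation (no continuity correction)."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


def exact_p(w_plus: int, n: int) -> float:
    """Exact one-tailed p-value: share of sign assignments with rank sum >= w_plus."""
    ranks = range(1, n + 1)
    at_least = sum(
        1
        for k in range(n + 1)
        for subset in combinations(ranks, k)
        if sum(subset) >= w_plus
    )
    return at_least / 2 ** n


for w in (14, 12):
    print(w, round(normal_approx_p(w, 5), 3), exact_p(w, 5))
```

With N=5, the approximation gives about 0.040 for W+=14 and about 0.11 for W+=12, matching the table. The exact test gives 2/32 = 0.0625 and 5/32 ≈ 0.156, so at this sample size the significance marker depends on which variant is used.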
Empirical analysis across 5 tasks (Gemini 2.5 Flash, ≥5 runs each) suggests a pattern:
| Task type | Example | Δ F1 |
|---|---|---|
| Clear dependency chain: A calls B which delegates to C | dbshell → client → subprocess (12508) | +9% |
| Delegation with backend fan-out: one interface, N backends | Trunc → ops.date_trunc_sql (13495) | +21% |
| Feature addition with flag gating: new capability across feature/schema layers | INCLUDE clause in Index (11991) | +17% |
| XOR feature with multi-layer propagation | Q() XOR support (14480) | +18% |
| Cross-cutting fix: same pattern in N unrelated files, no shared ancestor | `__eq__` NotImplemented (11808) | ~0% |
| Task | What it is | Why hard without CodeDNA | Δ F1 (Flash / DeepSeek) |
|---|---|---|---|
| 12508 dbshell | Add -c SQL flag to dbshell management command | Entry point is obvious by name; 4 backend runshell_db() clients are hidden | +9% / +1% |
| 11991 INCLUDE | Add INCLUDE clause support to Index | schema.py is findable; 4 backend schema editors are not | +17% / +6% |
| 14480 Q() XOR | Add XOR operator to Q() and QuerySet() | ORM→SQL→backends cascade requires touching 7 files | +18% / +14% |
| 13495 Trunc tzinfo | Fix timezone handling in TruncDay() for non-DateTimeField | Per-backend date_trunc_sql() override not reachable by grep alone | +22% / -8% † |
| 11808 `__eq__` | Fix `__eq__` to return NotImplemented for unknown types | Entry is models/base.py (847 lines, generic name); 5 subclasses are unconnected | ≈0% / +34% |

† Task 13495 shows a model-dependent anomaly: Flash benefits strongly (+22pp) while DeepSeek and Pro regress (-8/-9pp). Under investigation.
Transparency note on 11808: the cross-cutting task was included deliberately to test the limits of the protocol. The benchmark annotations do not pre-populate a list of affected files; the agent must discover them independently. CodeDNA v0.7 shows Δ ≈ 0% on this task type. This is reported as a known limitation, not hidden. See SPEC.md §2.4 for the proposed v0.8 extension (cross_cutting_patterns:) and why it would not constitute cheating.
CodeDNA is most effective when there is a navigable call chain. The used_by: graph guides the agent from entry point to all affected files. For cross-cutting concerns (same fix in many independent files with no shared ancestor), the benefit is smaller because there is no natural navigation path to follow.
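The difference between the two task types can be sketched as a graph walk. The example below does a breadth-first traversal over a hand-written used_by: graph. All file names are hypothetical, and this is an illustration of the idea, not the agent's actual navigation logic.

```python
from collections import deque

# Hypothetical used_by: graph (file -> files that depend on it).
USED_BY = {
    "db/backends/base/client.py": [
        "db/backends/mysql/client.py",
        "db/backends/postgresql/client.py",
    ],
    "db/backends/mysql/client.py": [],
    "db/backends/postgresql/client.py": [],
    # Cross-cutting case: needs the same fix, but no edge leads here.
    "contrib/messages/storage.py": [],
}


def reachable(entry: str, graph: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk over used_by: edges, returning files in visit order."""
    seen = {entry}
    queue = deque([entry])
    order = []
    while queue:
        current = queue.popleft()
        order.append(current)
        for dependant in graph.get(current, []):
            if dependant not in seen:
                seen.add(dependant)
                queue.append(dependant)
    return order


found = reachable("db/backends/base/client.py", USED_BY)
print(found)
print("contrib/messages/storage.py" in found)
```

The backend files are reached from the entry point, while the unconnected file is never visited. That is the situation the cross-cutting task creates.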
A full audit confirmed no task-specific hints are embedded in the codedna/ files. Where ground-truth (GT) files appear in used_by: targets, it is because those files are genuine callers or subclasses, not cherry-picked. The cross-cutting task (11808, Δ≈0%) confirms this: annotations described the architecture accurately but gave no navigation advantage because there is no call chain to follow.
One correction was made during the audit: base/schema.py in task 11991 initially listed only postgresql/schema.py in used_by: and was updated to include all 4 backend schema editors that genuinely inherit from it.
Full audit: benchmark_agent/claude_code_challenge/django__django-13495/BENCHMARK_RESULTS.md
Pattern: cheaper models appear to benefit most. Flash (cheapest of the three) shows the strongest gain (p=0.040). This suggests annotating once may allow cheaper models to perform closer to more expensive ones, though the sample is small.
Full data: benchmark_agent/runs/ · Script: benchmark_agent/swebench/run_agent_multi.py
The SWE-bench benchmark above tests single-agent file navigation. Here we test a different question: can CodeDNA help teams of agents divide work without collisions and produce integrated software?
Two experiments, both using 5-agent teams orchestrated with Agno (TeamMode.coordinate). Same task, same model, same tools; only the instructions differ.
| Metric | Exp 1: RPG (DeepSeek Chat) | Exp 2: SaaS (DeepSeek R1) |
|---|---|---|
| Duration (A / B) | 1h 59m / 3h 11m (1.6x faster) | 82.6m / 99m (17% faster) |
| Output quality | Playable game / static scene | Lower complexity (2.1 vs 3.1) |
| Annotation adoption | 94% | 98.2% (spontaneous, no reminders) |
| message: adoption | 0 (not in prompt) | 54 files (100%, organic) |
| Judge fixes needed | 8 / 12 | - |

Full reports: Exp 1 report · Exp 2 data
Setup: identical 5-agent team (GameDirector → GameEngineer → GraphicsSpecialist → GameplayDesigner → DataArchitect), same task, same model (DeepSeek deepseek-chat), same tool budget. Only the instructions differed.
| Metric | Condition A: CodeDNA | Condition B: Standard |
|---|---|---|
| Total duration | 1h 59m | 3h 11m |
| Python files | 50 | 45 |
| Total LOC | 10,194 | 14,096 |
| Avg LOC/file | 203 | 313 |
| Annotation coverage | 94% | 0% |
| Judge fixes to boot | 8 | 12 |
| Player controllable after fixes | Yes (WASD) | No |
CodeDNA was 1.60× faster. More importantly, after judge intervention to fix both outputs, condition A produced a playable game (ECS running, 5 entities, WASD input). Condition B produced a visible but static scene: engine/ecs.py and gameplay/systems/player_system.py were both correct, but the integration layer connecting them was never written.
Without used_by: contracts, the director spent 25 minutes occupying all four module namespaces before delegating (vs 12 minutes with CodeDNA). Every downstream specialist inherited structure they didn't design:

```
B Director builds full scaffold (25m, 2.0× A)
→ GameEngineer reverse-engineers structure (36m, 3.9× A)
→ GraphicsSpecialist works around pre-built renderer (41m, 1.4× A)
→ GameplayDesigner inherits 545-line monolith (35m, 2.6× A)
→ DataArchitect: independent domain, cleanest run (35m, 0.75× A, only exception)
```

The cascade peaks at the agent nearest to the director's territorial decisions and diminishes toward the most independent domain. used_by: forces ownership upfront: the director cannot occupy a module it declared as belonging to another agent.
All 8 fixes in condition A were corrections to existing code. Condition B had 12 fixes: 4 on existing code and 8 for missing modules, including entity_system.py, physics_engine.py, ai_system.py, player_controller.py, and the entire integration/ directory. These modules were declared by the director in game_state.py but never written by anyone. Writing them from scratch would be outside the scope of judge intervention.
More LOC does not mean more coverage. B produced 38% more lines (14,096 vs 10,194) but 10% fewer files. Average file size: 313 lines vs 203. More code, less functionality.
Full report: experiments/runs/run_20260329_234232/REPORT.md · Run data: experiments/runs/run_20260329_234232/
Setup: same 5-agent team, same task (build AgentHub, a multi-tenant SaaS platform to rent, configure and deploy AI agents), upgraded model: DeepSeek R1 (deepseek-reasoner). Two conditions run sequentially on the same machine.
| Metric | Condition A: CodeDNA | Condition B: Standard |
|---|---|---|
| Duration | 82.6 min | 99.0 min |
| Python files | 55 | 50 |
| Total LOC | 14,156 | 11,872 |
| Avg function length | 14.3 lines | 26.2 lines |
| Avg cyclomatic complexity | 2.11 | 3.07 |
| Max function complexity | 10 | 16 |
| Classes | 90 | 50 |
| Annotation coverage | 98.2% | 0% |
| Syntax errors | 1 | 0 |
| Validation score | 0.73 | 0.87 |
The single syntax error in condition A was an em-dash character (U+2014) introduced inside a rules: annotation field. Without it, validation scores would be near-equal. The gap does not reflect a systematic correctness difference.
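The source does not show the offending line, so the following is only a sketch of how such a character can break a file. In Python, U+2014 is harmless inside a docstring or comment but is rejected by the tokenizer anywhere else. The helper name `parses` and the sample strings are hypothetical.

```python
import ast

EM_DASH = "\u2014"  # written as an escape so this snippet itself stays ASCII


def parses(source: str) -> bool:
    """Return True if the source is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# Inside a docstring or a comment the character is just text.
in_docstring = f'"""rules: never call X directly {EM_DASH} use the wrapper"""\n'
in_comment = f"total = 1  # credits {EM_DASH} see billing\n"
# Outside a string or comment the tokenizer rejects it.
in_code = f"total = a {EM_DASH} b\n"

print(parses(in_docstring), parses(in_comment), parses(in_code))
```

A check like `parses()` run on each written file would catch this class of error before validation, whatever character caused it.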
DeepSeek R1 annotated 54 of 55 files with all 5 CodeDNA fields (exports, used_by, rules, agent, message) across a full 83-minute multi-agent session, without any mid-run prompting to "remember annotations." This is the highest adoption rate observed across all experiments.
Example: app/agents/agent_wrapper.py (written by the AgentIntegrator specialist):

```python
"""app/agents/agent_wrapper.py - Wraps agno.Agent, counts tokens, enforces credit cap.

exports: AgentWrapper, CreditExhaustedError
used_by: app/agents/agent_runner.py → run_agent_stream,
         app/services/agno_integration.py → agent execution
rules:   Never call agno.Agent directly from API layer - always go through AgentWrapper
         Token count must be extracted from agno response metadata and stored in agent run tokens_used
         AgentWrapper must raise CreditExhaustedError (HTTP 402) before starting if balance < min_credits
         All agent instructions must be sanitised (strip HTML, limit to 10k chars)
agent:   AgentIntegrator | 2024-12-05 | implemented AgentWrapper with token counting and credit cap
         message: "implement tool usage tracking and cost estimation"
"""
```

The rules: field encodes four constraints (API layer isolation, token tracking, credit pre-check, input sanitization) that cannot be inferred by reading the file alone; they require knowing the full call chain. The message: field leaves a forward-planning note for the next agent in the session.
The same file shows L2 adoption inside the class body:
```python
class AgentWrapper:
    """Wraps an agno.Agent instance with token counting and credit enforcement.

    Rules:
        1. Token counting is extracted from agno response metadata
        2. Credit cap is enforced before execution
        3. Instructions are sanitized (HTML stripped, length limited)
        4. All agent interactions go through this wrapper
    """
```

The message: field was used consistently across all 54 annotated files to encode work that the agent knew was needed but was out of scope for its current task:

```
# app/agents/agent_runner.py
agent:   AgentIntegrator | 2024-12-05 | implemented agent runner with streaming and credit management
         message: "implement concurrent execution with asyncio semaphore"

# app/agents/memory_manager.py
agent:   AgentIntegrator | 2024-12-05 | implemented persistent memory with similarity search
         message: "implement memory summarization for long conversations"

# app/services/scheduler_service.py
agent:   Product Architect | 2024-03-30 | created scheduler service skeleton
         message: "implement job persistence for fault tolerance across restarts"

# app/services/agent_service.py
agent:   Product Architect | 2024-03-30 | created agent service skeleton
         message: "implement agent configuration validation against Agno framework schema"
```

These are not instructions the agent received. They are observations it left for itself (and for future agents), co-located with the code where the work would eventually happen. No agent was told to use message: this way.
Condition B (no CodeDNA) produced working code but with a notable structural anomaly: the agent started Flask, then pivoted to FastAPI mid-session, leaving both stacks in the codebase simultaneously.
- `app/__init__.py` imports `Flask`, `SQLAlchemy`, `JWTManager`, `Bcrypt`, `Celery` and initializes `db = SQLAlchemy()`
- `app/main.py` creates a FastAPI application via `create_fastapi_app()`
- `run.py` calls `create_app()` with a Flask-style `app.run()`
- Jinja2 templates (`base.html`, `home.html`, `marketplace.html`) and static JS files are residue from the Flask phase

The pivot is not a bug in the usual sense: condition B's individual files are syntactically correct (0 errors). But the integration layer is inconsistent. CodeDNA's rules: and used_by: fields force the agent to declare architectural boundaries upfront, which appears to reduce mid-session pivots.
Despite the architectural inconsistency, condition B fully implemented modules that A left as stubs:
- `app/billing/credit_engine.py` (413 LOC): complete `CreditEngine` with `debit()`, `credit()`, `reserve()`, `release()`, transaction logging, `InsufficientCreditsError`
- `app/memory/manager.py` (638 LOC): `MemoryManager` with vector similarity search, importance scoring, TTL expiry
- `demo_seed.py`: realistic seed data (A had none)
- `test_app.py`: basic test file (A had none)
A built stronger architecture (ServiceContainer DI, 9 exception types, async SQLAlchemy); B built more domain implementation. Neither was production-ready without further work.
| Question | Answer |
|---|---|
| Does a reasoning model adopt CodeDNA spontaneously? | Yes: 98.2% across 54 files, sustained over 83 min |
| Does CodeDNA change code structure? | Yes: lower complexity (2.11 vs 3.07), shorter functions (14 vs 26 lines), more classes (90 vs 50) |
| Does it prevent bugs? | No: the one syntax error was inside an annotation field |
| Does message: get used as designed? | Yes: 54 files, organically, without explicit instruction |
| Does it prevent mid-session architectural pivots? | Likely yes: B changed stack mid-session; A did not |
N=1 per condition. Results are directional, not statistically powered. The experiment is presented as a qualitative case study to complement the SWE-bench navigation benchmark.
Full run data: experiments/runs/run_20260331_002754/ · Script: experiments/run_experiment_webapp2.py
Both multi-agent experiments are N=1 per condition, so results are directional, not statistically powered. Experiment 2 used sequential runs on shared hardware (machine state may differ between conditions). Task 13495 shows an unexplained model-dependent anomaly (Flash +22pp, DeepSeek -8pp). Independent replication across different models, team sizes, and project types is needed.
The SWE-bench benchmark measures file navigation (did the agent open the right files?). This second benchmark measures fix completeness (did the agent produce the correct patch?).
Setup: two Claude Code sessions on django__django-13495, same model (claude-sonnet-4-6), same prompt, same bug. Ground truth: the official Django patch (7 files).
Bug: `TruncDay('created_at', output_field=DateField(), tzinfo=tz_kyiv)` generates SQL without `AT TIME ZONE`; the timezone parameter is silently ignored.
Results:
| Metric | Control | CodeDNA |
|---|---|---|
| Session time | ~10-11 min | ~8 min |
| Total interactions (estimated) | ~33 | ~30 |
| Failed edits | 5 | 0 |
| Files matching official patch | 6 / 7 | 7 / 7 |
| date_trunc_sql fixed (DateField) | ✅ all backends | ✅ all backends |
| time_trunc_sql fixed (TimeField) | ❌ not touched | ✅ all backends |
| sqlite3/base.py updated | ❌ | ✅ |
| SQLite approach matches official patch | ❌ | ✅ |
| Knowledge left for next agent | ❌ | ✅ rules: + agent: updated |
What made the difference: a single rules: annotation on TimezoneMixin.get_tzname():
```python
def get_tzname(self):
    """
    Rules: Timezone conversion must occur BEFORE applying datetime functions;
    database stores UTC but results must reflect input datetime's timezone.
    """
```

This described an architectural principle, not the bug. The control saw the same time_trunc_sql call on the line immediately below the reported bug and didn't touch it. CodeDNA read the constraint and applied the fix to the full pattern.
Validity note: this is a single run, not a statistically powered study. The result is presented as an illustrative case, not a population estimate. The causal mechanism is traceable: one annotation changed the frame from "fix DateField" to "fix the timezone pattern across all output fields."
Full report: benchmark_agent/claude_code_challenge/django__django-13495/BENCHMARK_RESULTS.md
Session logs: control · codedna
Reproduce: HOW_TO_RERUN.md
Run it yourself:
- Clone the control repository:
git clone https://github.com/Larens94/codedna-challenge-control
- Clone the CodeDNA-annotated version:
git clone https://github.com/Larens94/codedna-challenge-codedna
- Open either repository in your AI coding agent (Claude Code, Cursor, etc.)
- Paste the same prompt into your agent and score how many of the 7 patch files it touches.
Quick test with the CLI:
```bash
# Check annotation coverage
codedna check ./codedna-challenge-codedna
# Run a dry-run annotation (no LLM)
codedna init ./codedna-challenge-codedna --no-llm --dry-run
```

CodeDNA v0.8 is the current release. The planned development path:
| Milestone | Goal | Status |
|---|---|---|
| M1: Protocol & CLI | v0.8 spec · codedna init/update/check · AST-based auto-extraction · message: agent chat layer | ✅ Done |
| M2: Benchmark Expansion | 20+ SWE-bench tasks · 5+ LLMs · Zenodo dataset · public dashboard | Pending |
| M3: Multi-Tool Hooks | Active enforcement hooks for Claude Code · Cursor · Copilot · Cline · OpenCode, validating on every write | ✅ Done |
| M4: Language Extension | 11 languages: Python · TS/JS · Go · PHP (Laravel) · Rust · Java · Kotlin · Ruby · C# · Swift · Blade/Jinja2/Vue | ✅ Done |
| M5: Editor & Workflow | VS Code extension (used_by graph · agent timeline · model heatmap) · GitHub Action CI | Pending |
| M6: Research & Dissemination | arXiv preprint · ICSE NIER/workshop submission · annotate Flask, FastAPI | Pending |

This roadmap is part of a funding application to NLnet NGI0 Commons Fund (deadline April 1st 2026). If you find CodeDNA useful and want to support its development, star the repo and share it.
The agent: field records what an agent did. The message: sub-field (new in v0.8) adds a conversational layer: soft observations, open questions, and forward-looking notes left directly for the next agent.

```python
"""analytics/revenue.py - Monthly/annual revenue aggregation.
...
agent:   claude-sonnet-4-6 | anthropic | 2026-03-10 | Implemented monthly_revenue.
         message: "rounding edge case in multi-currency - investigate before next release"
agent:   gemini-2.5-pro | google | 2026-03-18 | Added annual_summary.
         message: "@prev: confirmed, promoted to rules:. New: timezone rollover in January"
"""
```

message: works at both levels:
- Level 1 (module docstring): for agents that read the full file
- Level 2 (function docstring): for agents using a sliding window that never sees the top of the file

The lifecycle: an observation left in message: either gets promoted to rules: (architectural truth confirmed) or dismissed with a reply. Append-only, never deleted.
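A reader of these entries might parse them along the following lines. This is a minimal sketch that assumes the four-part `model | provider | date | summary` layout shown in the v0.8 example. The `AgentEntry` class, the `parse_agent_entries` function, and the sample header are hypothetical, not part of the CodeDNA CLI.

```python
from dataclasses import dataclass

# Hypothetical header, modelled on the v0.8 example.
HEADER = '''analytics/revenue.py - Monthly/annual revenue aggregation.
agent:   claude-sonnet-4-6 | anthropic | 2026-03-10 | Implemented monthly_revenue.
         message: "rounding edge case in multi-currency"
agent:   gemini-2.5-pro | google | 2026-03-18 | Added annual_summary.
         message: "@prev: confirmed, promoted to rules:"
'''


@dataclass
class AgentEntry:
    model: str
    provider: str
    date: str
    summary: str
    message: str = ""


def parse_agent_entries(docstring: str) -> list[AgentEntry]:
    """Collect agent: lines and attach the message: line that follows each one."""
    entries: list[AgentEntry] = []
    for line in docstring.splitlines():
        stripped = line.strip()
        if stripped.startswith("agent:"):
            parts = [p.strip() for p in stripped[len("agent:"):].split("|")]
            if len(parts) == 4:  # assumed layout; other shapes are skipped
                entries.append(AgentEntry(*parts))
        elif stripped.startswith("message:") and entries:
            entries[-1].message = stripped[len("message:"):].strip().strip('"')
    return entries


entries = parse_agent_entries(HEADER)
for entry in entries:
    print(entry.date, entry.model, "->", entry.message)
```

Because entries are append-only, the list order doubles as the session history: the last entry is the most recent agent, and an `@prev:` reply refers to the entry before it.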
Git is already immutable, append-only, and diff-complete. v0.8 uses git trailers (the same standard as Co-Authored-By:, natively recognised by GitHub) to embed AI session metadata directly in commit messages:

```
implement monthly revenue aggregation

AI-Agent: claude-sonnet-4-6
AI-Provider: anthropic
AI-Session: s_a1b2c3
AI-Visited: analytics/revenue.py, payments/models.py, api/reports.py
AI-Message: found rounding edge case in multi-currency - investigate before next release
```

Git already records the diff, date, and changed files. AI-Visited: is the only addition: the files read during the session, which git does not track natively.
This gives you audit queries immediately:

```bash
git log --grep="AI-Agent:"                                  # all AI commits
git log --grep="AI-Agent: claude" -p -- revenue.py          # claude's changes to a file
git log --format="%b" | grep "AI-Agent:" | sort | uniq -c   # model distribution
```

Three-tier architecture: git (authoritative audit, full diff) → .codedna (lean session summary for agent navigation) → file agent: field (one-liner, sliding-window safe). A session_id links all three.
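As a sketch of how a tool could consume these trailers, the snippet below extracts the AI-* keys from a commit message string. The function name `parse_ai_trailers` is hypothetical, and the parsing is deliberately simpler than git's own trailer rules: it only looks at the last paragraph and expects `Key: value` lines.

```python
COMMIT_MESSAGE = """implement monthly revenue aggregation

AI-Agent: claude-sonnet-4-6
AI-Provider: anthropic
AI-Session: s_a1b2c3
AI-Visited: analytics/revenue.py, payments/models.py, api/reports.py
"""


def parse_ai_trailers(message: str) -> dict[str, str]:
    """Extract AI-* trailers from the last paragraph of a commit message."""
    last_block = message.strip().split("\n\n")[-1]
    trailers = {}
    for line in last_block.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key.startswith("AI-"):
            trailers[key] = value.strip()
    return trailers


trailers = parse_ai_trailers(COMMIT_MESSAGE)
visited = [p.strip() for p in trailers.get("AI-Visited", "").split(",") if p.strip()]
print(trailers["AI-Agent"], len(visited))
```

In a real repository the message would come from `git log`, and `git interpret-trailers --parse` is the more robust way to split trailers from the body.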
Built on top of git log with AI trailers:
- CodeLens: last AI agent + commit count inline on every file and function
- File heatmap: how many AI sessions touched each file, by provider
- Agent Timeline: chronological session log with git diff per session
- Stats panel: model distribution chart, navigation efficiency per model

Full spec: SPEC.md §4.7-4.8 · The VS Code extension is planned for M5.
A single YAML file at the repo root. The agent reads this first, before opening any source file, to understand packages, their purposes, and inter-package dependencies.

```yaml
# .codedna - auto-generated by codedna init
project: myapp
packages:
  payments/:
    purpose: "Invoice generation, payment processing"
  analytics/:
    purpose: "Revenue reports, KPI dashboards"
    depends_on: [payments/, tenants/]
  tenants/:
    purpose: "Multi-tenant management, suspension"
```

A docstring at the top of every file. It includes only information that cannot be inferred from the code: the public API (exports:), who depends on this file (used_by:), and domain constraints (rules:). Import statements already declare dependencies, so there is no need to duplicate them.
```python
"""orders/orders.py - Order lifecycle management.

exports: get_active_orders() -> list[dict] | create_order(user_id, items) -> None
used_by: analytics/revenue.py → get_revenue_rows
rules:   User system uses soft delete - NEVER return orders for users
         where users.deleted_at IS NOT NULL. Always JOIN on users.
"""
```

Rules: docstrings on critical functions, written organically by agents as they discover constraints. Each agent that fixes a bug or learns something important leaves a Rules: for the next agent, so knowledge accumulates over time.
```python
def get_active_orders() -> list[dict]:
    """Return all non-cancelled orders for active (non-deleted) users.

    Rules: MUST JOIN users and filter deleted_at before returning results.
           Failure to filter inflates revenue reports with deleted-user orders.
    """
```

Variable names encode type, shape, domain, and origin. Any 10-line extract is self-documenting.
```python
# ❌ Standard: agent must trace the entire call chain
data = get_users()
price = request.json["price"]

# ✅ CodeDNA: readable in any context window
list_dict_users_from_db = get_users()
int_cents_price_from_req = request.json["price"]
```

To plan edits across 10+ files: read .codedna first, then read only the module docstring of each file (first 8-12 lines), build an exports: → used_by: graph, then open only the relevant files in full.
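The graph-building step can be sketched as follows. This is not the CodeDNA CLI: `parse_header`, `used_by_graph`, and the two sample headers are hypothetical. It assumes one `field: value` pair per line, indented continuation lines, and `→` separating a file from the symbol that uses it.

```python
import re

# Hypothetical module docstrings, keyed by file path.
HEADERS = {
    "orders/orders.py": '''orders/orders.py - Order lifecycle management.
exports: get_active_orders() -> list[dict]
used_by: analytics/revenue.py → get_revenue_rows
rules:   NEVER return orders for soft-deleted users.
''',
    "analytics/revenue.py": '''analytics/revenue.py - Revenue aggregation.
exports: get_revenue_rows() -> list[dict]
used_by: api/reports.py → revenue_route
rules:   none
''',
}

FIELD = re.compile(r"^(exports|used_by|rules|agent|message):\s*(.*)$")


def parse_header(docstring: str) -> dict[str, str]:
    """Parse `field: value` lines; other non-empty lines continue the previous field."""
    fields: dict[str, str] = {}
    current = None
    for line in docstring.splitlines()[1:]:  # skip the "path - purpose" line
        match = FIELD.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2).strip()
        elif current and line.strip():
            fields[current] += " " + line.strip()
    return fields


def used_by_graph(headers: dict[str, str]) -> dict[str, list[str]]:
    """Map each file to the files listed in its used_by: field."""
    graph = {}
    for path, doc in headers.items():
        targets = parse_header(doc).get("used_by", "")
        graph[path] = [t.split("→")[0].strip() for t in targets.split(",") if t.strip()]
    return graph


graph = used_by_graph(HEADERS)
print(graph)
```

Only the headers are read at this stage. The resulting graph tells the agent which files to open in full for the task at hand.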
The key rule for rules: annotations: describe the mechanism, not the solution.
```
# ❌ Wrong: gives away the answer
rules: Fix mysql/operations.py, oracle/operations.py, postgresql/operations.py

# ✅ Correct: describes the delegation chain
rules: Trunc.as_sql() delegates to connection.ops.date_trunc_sql() and
       time_trunc_sql(). Each backend implements these independently.
```

used_by: is a navigation map, not a to-do list. The agent reasons about which targets are relevant to the current task and opens only those. In the benchmark, CodeDNA runs showed precision P=100% (zero wasted reads) on the tasks measured, while control runs scattered across irrelevant files.
CodeDNA is designed for multi-agent environments: different models, different tools, different sessions. Each agent leaves knowledge for the next:

Agent A fixes a bug → adds Rules: "MUST filter soft-deleted users"
Agent B reads Rules: → avoids the same bug without re-discovering it
Agent C discovers an edge case → extends the Rules:

Unlike docs (which go stale), Rules: annotations are co-located with the code and are read every time the function is edited.
Current benchmark results are zero-shot, with no fine-tuning on the protocol. Models follow used_by: and rules: by general language understanding alone. A fine-tuned model could potentially treat these as native structured signals, which might reduce variance further; this remains to be tested.
See SPEC.md for the full inter-agent model, verification protocol, fine-tuning potential, and training corpus design.
CodeDNA v0.8 supports 11 languages. Python is the reference implementation with full AST-based extraction (L1 module headers + L2 function Rules:). All other languages get L1-only annotation via regex adapters, with no external toolchain required.

| Language | Extensions | L1 | L2 | Framework awareness |
|---|---|---|---|---|
| Python | .py | ✅ AST | ✅ AST | - |
| TypeScript / JavaScript | .ts .tsx .js .jsx .mjs | ✅ | - | - |
| Go | .go | ✅ | - | - |
| PHP | .php | ✅ | - | Laravel (Route facades, Eloquent) · Phalcon (Controller/Model, DI, Router) |
| Rust | .rs | ✅ | - | - |
| Java | .java | ✅ | - | - |
| Kotlin | .kt .kts | ✅ | - | - |
| C# | .cs | ✅ | - | - |
| Swift | .swift | ✅ | - | - |
| Ruby | .rb | ✅ | - | - |

Template engines (L1 via block-comment extraction):

| Template | Extensions | Comment syntax |
|---|---|---|
| Blade (Laravel) | .blade.php | `{{-- --}}` |
| Jinja2 / Twig | .j2 .jinja2 .twig | `{# #}` |
| Volt (Phalcon) | .volt | `{# #}` |
| ERB / EJS | .erb .ejs | `<%# %>` |
| Handlebars / Mustache | .hbs .mustache | `{{!-- --}}` |
| Razor / Cshtml | .cshtml .razor | `@* *@` |
| Vue SFC / Svelte | .vue .svelte | `<!-- -->` |

Pass --extensions to annotate non-Python files:
codedna init ./src --extensions ts go # TypeScript + Go
codedna init ./app --extensions php # PHP/Laravel or PHP/Phalcon
codedna init ./templates --extensions volt blade # Phalcon Volt + Laravel Blade
codedna init . --extensions ts go php rs java # mixed project
codedna check . --extensions ts go -v # coverage report

<?php
// app/Http/Controllers/UserController.php - Handles user CRUD endpoints.
//
// exports: UserController::index() -> Response
//          UserController::store(Request) -> JsonResponse
// used_by: routes/web.php -> Route::resource('users', UserController::class)
// rules: must extend App\Http\Controllers\Controller.
//        all public methods are auto-detected as exports.
// agent: claude-sonnet-4-6 | anthropic | 2026-04-02 | s_20260402_001 | initial controller scaffold

<?php
// app/controllers/UserController.php - Handles user CRUD in Phalcon MVC.
//
// exports: UserController::indexAction() -> Response
//          UserController::createAction() -> Response
//          route:/users
//          service:userService
// used_by: app/config/router.php -> $router->addGet('/users', ...)
// rules: extends Phalcon\Mvc\Controller; do not add constructor, use DI.
//        $di->set('userService', ...) registers this service globally.
// agent: claude-sonnet-4-6 | anthropic | 2026-04-02 | s_20260402_001 | initial Phalcon controller
namespace App\Controllers;
use Phalcon\Mvc\Controller;
class UserController extends Controller
{
    public function indexAction() { ... }
    public function createAction() { ... }
}

The PHP adapter auto-detects:

- extends Controller / extends Model / extends Phalcon\Mvc\Controller → marks as Phalcon component
- $router->addGet('/uri', ...) → exports as route:/uri
- $di->set('serviceName', ...) / $di->setShared(...) → exports as service:serviceName
- Public methods → annotated as ClassName::method

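The detection rules listed above lend themselves to a regex-based pass. The sketch below is a simplified illustration of that idea in Python, not the actual PHP adapter: `detect_php_exports` and the four patterns are assumptions written for this example, and they only cover the `addGet`, `set`, and `setShared` calls named in the list.

```python
import re

# Hypothetical patterns, one per detection rule in the list above.
CLASS = re.compile(r"\bclass\s+(\w+)")
METHOD = re.compile(r"\bpublic\s+function\s+(\w+)\s*\(")
ROUTE = re.compile(r"\$router->addGet\(\s*['\"]([^'\"]+)['\"]")
SERVICE = re.compile(r"\$di->set(?:Shared)?\(\s*['\"]([^'\"]+)['\"]")


def detect_php_exports(source: str) -> list[str]:
    """Return export entries found in a PHP source string."""
    exports: list[str] = []
    class_match = CLASS.search(source)
    if class_match:
        class_name = class_match.group(1)
        # Public methods are reported as ClassName::method().
        exports += [f"{class_name}::{name}()" for name in METHOD.findall(source)]
    exports += [f"route:{uri}" for uri in ROUTE.findall(source)]
    exports += [f"service:{name}" for name in SERVICE.findall(source)]
    return exports


if __name__ == "__main__":
    php_source = """
class UserController extends Controller
{
    public function indexAction() {}
    public function createAction() {}
}
$router->addGet('/users', 'User::index');
$di->setShared('userService', function () {});
"""
    for entry in detect_php_exports(php_source):
        print(entry)
```

A regex pass like this needs no PHP toolchain, which matches the "no external toolchain" goal, at the cost of missing constructs that only a real parser would resolve (for example, methods declared in traits).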
codedna/
├── README.md                      ← you are here
├── QUICKSTART.md                  ← 2-minute setup for every AI tool
├── SPEC.md                        ← full technical specification v0.8
├── integrations/
│   ├── CLAUDE.md                  ← Claude Code system prompt
│   ├── .cursorrules               ← Cursor / Windsurf rules file
│   ├── .windsurfrules             ← Windsurf rules file
│   ├── .clinerules                ← Cline rules file
│   ├── copilot-instructions.md    ← GitHub Copilot instructions
│   └── install.sh                 ← one-line installer for all tools
├── codedna_tool/                  ← installable CLI package (codedna init/update/check)
│   ├── cli.py
│   ├── __init__.py
│   └── languages/                 ← per-language annotation adapters
├── codedna-plugin/                ← Claude Code plugin (pending review)
├── benchmark_agent/
│   ├── swebench/
│   │   ├── run_agent_multi.py     ← multi-model benchmark (5 providers)
│   │   └── analyze_multi.py       ← multi-model comparison
│   ├── claude_code_challenge/     ← fix-quality benchmark (control vs CodeDNA)
│   │   └── django__django-13495/
│   └── runs/                      ← results by model
├── examples/
│   ├── python/                    ← annotated Python example
│   ├── python-api/                ← annotated Flask/FastAPI example
│   ├── typescript-api/            ← annotated TypeScript example
│   ├── go-api/                    ← annotated Go example
│   ├── java-service/              ← annotated Java example
│   ├── rust-cli/                  ← annotated Rust example
│   ├── php-laravel/               ← annotated Laravel example
│   └── ruby-sinatra/              ← annotated Ruby/Sinatra example
├── paper/                         ← scientific paper (arXiv preprint)
│   ├── codedna_paper.pdf
│   ├── codedna_paper.html
│   ├── codedna_whitepaper_EN.html
│   └── codedna_paper_IT.html
└── tools/
    ├── pre-commit                 ← CodeDNA v0.8 pre-commit hook (validates staged files)
    ├── install-hooks.sh           ← installer: copies pre-commit into .git/hooks/
    ├── validate_manifests.py      ← deep annotation validator (format, agent dates, purpose length)
    ├── agent_history.py           ← session history viewer (reads AI git trailers)
    ├── traces_to_training.py      ← SFT/DPO/PRM dataset converter from benchmark runs
    └── extract_city_data.py       ← extract annotations to JSON for city visualization

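As an illustration of the kind of check a validator could run on agent: lines, the sketch below splits one into its pipe-separated parts and verifies that the date is a real ISO date. It is not the code in tools/validate_manifests.py. `AgentEntry` and `parse_agent_line` are hypothetical, and the field meanings (model, provider, date, session id, summary) are inferred from the PHP header examples in this README.

```python
from datetime import date
from typing import NamedTuple


class AgentEntry(NamedTuple):
    # Field names are assumptions inferred from the example agent: lines.
    model: str
    provider: str
    day: date
    session: str
    summary: str


def parse_agent_line(line: str) -> AgentEntry:
    """Parse 'agent: model | provider | YYYY-MM-DD | session | summary'."""
    _, _, payload = line.partition("agent:")
    parts = [part.strip() for part in payload.split("|", 4)]
    if len(parts) != 5:
        raise ValueError(f"expected 5 pipe-separated fields, got {len(parts)}")
    model, provider, day, session, summary = parts
    # date.fromisoformat raises ValueError on malformed or impossible dates.
    return AgentEntry(model, provider, date.fromisoformat(day), session, summary)


if __name__ == "__main__":
    entry = parse_agent_line(
        "// agent: claude-sonnet-4-6 | anthropic | 2026-04-02 | "
        "s_20260402_001 | initial controller scaffold"
    )
    print(entry)
```

Splitting with a maximum of four cuts keeps any later `|` characters inside the summary, so free-text messages do not break the parse.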
This is my first paper. I'm not a researcher; I'm a developer who is genuinely passionate about AI and how it interacts with code.
I built CodeDNA because I kept running into the same problem: AI agents making mistakes not because they were wrong, but because they had no context. I wondered: what if the context was already in the file? What if every snippet the agent read was self-sufficient?
I'm sharing this with complete humility. The benchmark is real, the data is reproducible, and the spec is open. Maybe it's useful to you. Maybe it sparks a better idea. Either way, I hope it contributes something.
If you find it helpful, try it, break it, improve it, or just tell me what you think. Feedback from people who actually use it is the only way this gets better.
If CodeDNA saved you some context tokens, a coffee is always welcome: ko-fi.com/codedna
Fabrizio
See CONTRIBUTING.md. Examples in any language are welcome.


