LLM proposes. System decides. State persists.
Enterprise-grade memory and state management for any LLM — crash recovery, conflict tracking, audit trails, and deterministic lifecycle control. Single file. Zero dependencies. Zero lock-in. Outperforms multi-tool stacks while fitting inside a chat window.
Every LLM session starts from zero. Close the tab, lose the state. The industry "solutions" are duct tape: chat history dumps, vector DBs whose retrieval can surface irrelevant or stale chunks, and framework lock-in that breaks across platforms.
RAG Runtime Kernel wraps around your project — it doesn't replace your workflow, it adds a structured memory and orchestration layer on top. One markdown file. Zero dependencies. Drop it into any LLM session and you get: deterministic state persistence, crash recovery, conflict tracking, and cross-session memory that actually works — across Claude, GPT, and any LLM.
In head-to-head benchmarks, this single-file specification matches or exceeds multi-tool stacks (Claude Code, lean-ctx, LLM Wiki) on state management, crash recovery, and cross-platform interoperability — while requiring zero installation.
Key benefits:
- Persistence — your project state survives across sessions, tabs, and platforms
- Reduced context loss — HOT/COLD memory tiers keep only what's needed in context
- Improved autonomy — the LLM self-enforces all rules without external tooling
- Audit trail — every decision, conflict, and state change is logged and traceable
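As a rough illustration of the HOT/COLD idea, the sketch below keeps a small HOT tier and demotes overflow to COLD. It is not taken from rag_kernel: the class name `TieredMemory`, the `hot_limit` parameter, and the least-recently-used demotion rule are all assumptions made for this example.

```python
from collections import OrderedDict
from typing import Optional


class TieredMemory:
    """Toy HOT/COLD store: HOT stays small, overflow is demoted to COLD."""

    def __init__(self, hot_limit: int = 3) -> None:
        self.hot_limit = hot_limit          # hypothetical cap on HOT entries
        self.hot: "OrderedDict[str, str]" = OrderedDict()
        self.cold: dict = {}

    def put(self, key: str, value: str) -> None:
        """Insert into HOT; demote the least recently used entries if over the cap."""
        self.cold.pop(key, None)            # a key lives in exactly one tier
        self.hot[key] = value
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_limit:
            old_key, old_value = self.hot.popitem(last=False)
            self.cold[old_key] = old_value

    def get(self, key: str) -> Optional[str]:
        """Read from HOT, or load on demand from COLD and promote."""
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        if key in self.cold:
            value = self.cold[key]
            self.put(key, value)
            return value
        return None
```

Only the HOT tier would be loaded into context at boot under this scheme; COLD entries are fetched when something asks for them.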
- Create a project folder with a RAG/ subfolder
- Copy rag_kernel/ into your project (see below)
- Open Cowork, select the folder, start a session
- Run: python -m rag_kernel init --spec RAG/INIT_UNIVERSAL_RUNTIME_KERNEL_v3.1.8.md --output RAG/ (deterministic bootstrap, zero tokens, zero LLM)
- Optionally merge project-specific context: python -m rag_kernel configure --rag RAG/RAG_MASTER.json --context your_context.json
Copy rag_kernel/ into your project:
PowerShell:
git clone https://github.com/arcadamarket/rag-runtime-kernel.git temp-clone
Copy-Item -Recurse temp-clone\rag_kernel YOUR_PROJECT\rag_kernel
Remove-Item -Recurse -Force temp-clone
CMD:
git clone https://github.com/arcadamarket/rag-runtime-kernel.git temp-clone
xcopy temp-clone\rag_kernel YOUR_PROJECT\rag_kernel\ /E /I
rmdir /s /q temp-clone
Bash:
git clone https://github.com/arcadamarket/rag-runtime-kernel.git temp-clone
cp -r temp-clone/rag_kernel YOUR_PROJECT/rag_kernel
rm -rf temp-clone
Then run as MCP server or HTTP server:
python -m rag_kernel mcp --project /path/to/your/RAG # MCP mode (Claude Desktop)
python -m rag_kernel serve --project /path/to/your/RAG # HTTP mode (GPT / any LLM)
Full platform-specific setup: docs/LAUNCH_MANUAL.md
Note: The Init Prompt is a full specification (~16K tokens). It goes into a project session, not the Instructions/System Prompt field (which has size limitations on most platforms).
- Create a new Project (or open an existing one)
- Copy rag_kernel/ into your project folder and place INIT_UNIVERSAL_RUNTIME_KERNEL_v3.1.8.md in RAG/
- Start a new session, run: python -m rag_kernel init --spec RAG/INIT_UNIVERSAL_RUNTIME_KERNEL_v3.1.8.md --output RAG/
- The kernel parses the spec deterministically and produces RAG_MASTER.json, using zero LLM tokens
- Copy the generated pointer block into your Project Instructions when prompted
- All subsequent sessions auto-load the RAG and enforce all rules
Without rag_kernel (Autonomous mode): Drop INIT_UNIVERSAL_RUNTIME_KERNEL_v3.1.8.md into a session as a file and send "Initialize the project." — the LLM self-bootstraps.
- Open a new conversation (or use Custom GPT if available)
- Upload or paste the contents of INIT_UNIVERSAL_RUNTIME_KERNEL_v3.1.8.md
- Send your first message; the system bootstraps in autonomous mode
- Follow on-screen steps (same as above)
- At session end, download the generated RAG files and save to your project folder
- Upload RAG files at the start of each new session to restore state
For hard runtime validation of every state transition, use the Python runtime:
# HTTP mode (for GPT Chat Custom Actions or any HTTP client)
python -m rag_kernel serve --project /path/to/your/RAG --port 7437
# MCP mode (for Claude Desktop)
python -m rag_kernel mcp --project /path/to/your/RAG
Full setup instructions for all platforms and modes: docs/LAUNCH_MANUAL.md
Cowork is Anthropic's desktop tool for non-developers to automate file and task management.
New project: Create a project folder with a RAG/ subfolder, open Cowork, start a session, and drop the Init Prompt file in. The system bootstraps, scans your project folder, and builds the RAG.
Existing project: Point the system to your existing project folder during bootstrap. The boot scan inventories all existing files, classifies them by tier, and extracts knowledge into COLD storage. Your existing work becomes queryable, trackable, and persistent.
Benefits: Cowork's file access lets the kernel read/write RAG files directly — no manual copy-paste. Task automation pairs naturally with the kernel's checkpoint and audit system.
Claude Code is Anthropic's CLI tool for agentic coding tasks.
New project: Initialize your project directory, reference the Init Prompt in a Claude Code session, and the system creates RAG files in your RAG/ directory via direct filesystem access.
Existing project: Add a RAG/ directory to your existing codebase, bootstrap the kernel — it scans your project, builds inventory, and starts tracking state.
How it enhances Claude Code: Context persistence across stateless sessions. Deterministic state machine structures long-running development. Zero-token file ops via direct filesystem access. Conflict ledger preserves both sides when code changes contradict prior decisions.
Full benchmark: docs/benchmark_comparison.md
| Capability | RAG Runtime Kernel | Claude Code | lean-ctx | LLM Wiki |
|---|---|---|---|---|
| Cross-session memory | Full: HOT/COLD + WAL + crash recovery | Partial: CLAUDE.md, no crash recovery | None | Pattern only |
| Deterministic state machine | BOOTING > READY > WORKING > CHECKPOINTING > CLOSING + RECOVERY | None | None | None |
| Token efficiency | 60-90% reduction (HOT-only boot ~4K tokens) | Unbounded growth without curation | 60-99% raw compression (best-in-class I/O) | Depends on wiki quality |
| Cross-platform | Claude + GPT + any LLM, same spec | Claude Code only | Editor-focused | Platform-agnostic pattern |
| Dependencies | Zero. Single markdown file | Node.js + CLI | Rust binary | Varies |
| Crash recovery | WAL replay + .bak rotation + RECOVERY state | File-history checkpoints | N/A | None |
| Conflict tracking | Explicit ledger — both sources preserved | None | N/A | None |
- Only system with a formal state machine on LLM workflows — deterministic transition guards, not ad-hoc
- Only system that works identically across Claude and GPT — the spec is the invariant
- Only system with atomic write protocol + WAL + backup rotation — enterprise-grade persistence
- Formally verified with TLA+ — the same technique Amazon uses for AWS infrastructure (see below)
- Zero install, zero dependencies — the specification IS the product
- Conflict ledger is unique — no other system tracks disagreements between sources
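To make the conflict ledger idea concrete, here is a minimal Python sketch of a ledger that keeps both sides of a disagreement instead of overwriting one. The names `Claim`, `ConflictLedger`, and `record`, and the `"OPEN"` status value, are hypothetical and chosen for this example; the kernel's actual ledger format is defined by the specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    source: str   # where the statement came from (file, session, user)
    value: str    # what that source asserts


@dataclass
class ConflictLedger:
    facts: dict = field(default_factory=dict)       # key -> accepted Claim
    conflicts: list = field(default_factory=list)   # unresolved disagreements

    def record(self, key: str, claim: Claim) -> bool:
        """Store a claim; on disagreement, log both sides and return False."""
        current = self.facts.get(key)
        if current is None or current.value == claim.value:
            self.facts.setdefault(key, claim)
            return True
        # Neither side is discarded: the existing fact stays in place and the
        # incoming claim is preserved next to it for later resolution.
        self.conflicts.append(
            {"key": key, "existing": current, "incoming": claim, "status": "OPEN"}
        )
        return False
```

The design choice illustrated here is that a contradiction never silently wins; it becomes a ledger entry that a person or a later step has to resolve.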
A specification — a complete protocol that turns any LLM into a controlled, auditable agent with persistent project memory. 3-layer architecture:
LLM (reasoning engine)
| JSON proposals
Policy Layer (this specification)
| validated transitions
Runtime Kernel (state + persistence)
| atomic writes
Filesystem (source of truth)
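The flow in the diagram (the LLM proposes JSON, the policy layer validates, the kernel commits) might look roughly like the sketch below. The proposal shape (`action` plus `payload`), the `ALLOWED_ACTIONS` set, and the function names are assumptions for illustration only and are not the schema used by rag_kernel.

```python
import json

# Hypothetical policy: the only actions this toy kernel will accept.
ALLOWED_ACTIONS = {"set_fact", "start_task"}


def validate(raw: str):
    """Return (ok, reason) for a raw JSON proposal string."""
    try:
        proposal = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(proposal, dict):
        return False, "proposal must be an object"
    if proposal.get("action") not in ALLOWED_ACTIONS:
        return False, "action not permitted"
    if not isinstance(proposal.get("payload"), dict):
        return False, "payload must be an object"
    return True, "ok"


def commit(state: dict, raw: str) -> dict:
    """Apply a proposal to a copy of the state, or reject it without side effects."""
    ok, reason = validate(raw)
    if not ok:
        raise ValueError(f"rejected: {reason}")
    proposal = json.loads(raw)
    new_state = dict(state)              # the caller's state is never mutated
    new_state.update(proposal["payload"])
    return new_state
```

The point of the separation is that the model's output is treated as untrusted input: nothing reaches the state unless it passes `validate` first.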
The state machine is verified using TLA+ and the TLC model checker — the same formal methods technique used by Amazon to verify AWS infrastructure.
TLC exhaustively explored 389,522 states (168,520 distinct) at depth 19 and verified all 8 safety invariants + 3 liveness properties with zero violations:
| Safety Invariant | What It Proves |
|---|---|
| TypeInvariant | All state variables hold valid types at all times |
| TransitionSafety | Every reachable state is legal per the transition graph |
| SingleWriter | At most one proposal staged at any time (no concurrent mutations) |
| WALConsistency | Write-ahead log is append-only, monotone, and never lags behind state |
| TerminalSafety | CLOSING is irreversible — no exit, no crash, no pending proposals |
| NoDeadlock | Every non-terminal state has at least one enabled action |
| CrashRecoveryConsistency | Crash flag is only true when state is RECOVERY |
| WALPrecedesStateChange | WAL entry exists before any state transition commits |
| Liveness Property | What It Proves |
|---|---|
| EventualReady | The system always eventually reaches READY from any reachable state |
| EventualCheckpoint | Once in WORKING, the system always eventually checkpoints |
| EventualClose | The system always eventually reaches CLOSING (no infinite loops) |
Phase 2 verification found and fixed two genuine liveness bugs: a BOOTING/RECOVERY direct-transition loop, and a crash-at-full-WAL deadlock.
The TLA+ specification (formal/RAGKernel.tla) is a direct transcription of the Python state machine — every transition, guard, and invariant maps 1:1 to the runtime code. Full results in formal/TLC_RESULTS.md.
Unit tests show that these 401 scenarios work. TLC model checking shows that no reachable state of the checked model violates the invariants, and that the modeled system always makes progress. Within the bounds of that model, this is a fundamentally stronger guarantee.
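For readers unfamiliar with model checking, the toy Python sketch below shows the core idea of exhaustive exploration: visit every reachable state and test a property in each one. It is not the TLA+ specification and not how TLC works internally. The transition graph is an assumption loosely based on the state names in this document, and the checks cover only deadlock freedom and reachability of CLOSING, which is weaker than the temporal liveness properties listed above.

```python
from collections import deque

# Hypothetical transition graph; the real one is defined in formal/RAGKernel.tla.
TRANSITIONS = {
    "BOOTING": {"READY", "RECOVERY"},
    "RECOVERY": {"READY"},
    "READY": {"WORKING", "CLOSING"},
    "WORKING": {"CHECKPOINTING", "RECOVERY"},
    "CHECKPOINTING": {"READY"},
    "CLOSING": set(),  # terminal state
}


def explore(initial: str) -> set:
    """Breadth-first search over every state reachable from `initial`."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in TRANSITIONS[state]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def check_invariants(initial: str = "BOOTING", terminal: str = "CLOSING") -> list:
    """Return a list of violations found across all reachable states."""
    violations = []
    for state in sorted(explore(initial)):
        # Deadlock check: every non-terminal state needs an outgoing edge.
        if state != terminal and not TRANSITIONS[state]:
            violations.append(f"deadlock in {state}")
        # Reachability check: the terminal state must stay reachable.
        if terminal not in explore(state):
            violations.append(f"{state} cannot reach {terminal}")
    return violations
```

A real model checker also tracks variables such as the WAL and crash flag, so its state space is far larger than these six nodes.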
Structured Memory (HOT/COLD) — Active state stays lean (~15KB). Archival data loads on-demand with automatic partitioning.
Deterministic State Machine — BOOTING > READY > WORKING > CHECKPOINTING > CLOSING with RECOVERY path.
Proposal > Validation > Commit — LLM proposes JSON actions. System validates against policy, then commits or rejects.
Atomic Persistence — All writes atomic and verified. WAL enables crash recovery.
COLD Partitioning — Auto-splits into sessions/inventory/conflicts/evidence with sub-partitioning and integrity-preserving chopping.
Tool Fallback Chain — Ordered fallback for file operations across platform tools.
Cross-Platform — Claude Projects, ChatGPT, Cowork, Claude Code, any LLM.
Multi-Account Safety — Session identity tagging, write collision detection, anti-corruption guards.
Full Audit Trail — Every state transition, decision, and conflict logged.
Token Efficiency — 70-95% reduction vs. naive approaches.
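The atomic persistence and WAL features above can be approximated with standard-library calls, as in the sketch below. This is a simplified illustration under stated assumptions, not the code in persistence.py: the file names `wal.jsonl` and `state.json`, the one-deep `.bak` copy, and the function names are all hypothetical.

```python
import json
import os
import shutil
import tempfile
from pathlib import Path


def append_wal(wal_path: Path, entry: dict) -> None:
    """Append one JSON line to the write-ahead log and flush it to disk."""
    with open(wal_path, "a", encoding="utf-8") as wal:
        wal.write(json.dumps(entry, sort_keys=True) + "\n")
        wal.flush()
        os.fsync(wal.fileno())


def atomic_write_json(path: Path, data: dict) -> None:
    """Replace `path` in one step so readers never see a half-written file."""
    path = Path(path)
    if path.exists():
        # Keep the previous version as a single backup copy.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def checkpoint(project_dir: Path, state: dict) -> None:
    """Log the intent to the WAL first, then persist the state file."""
    project_dir = Path(project_dir)
    append_wal(project_dir / "wal.jsonl", {"op": "checkpoint", "state": state})
    atomic_write_json(project_dir / "state.json", state)
```

Writing to a temporary file in the same directory and then calling `os.replace` is what makes the swap all-or-nothing; the WAL entry written beforehand is what a recovery step would replay if the process died in between.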
| Mode | How It Works |
|---|---|
| Autonomous | LLM self-enforces all rules. No external software needed. Default mode. |
| Enforced | Python runtime (v0.2.0) intercepts all mutations. 9 modules, 401 tests. Zero-touch bootstrap: rag_kernel init parses spec deterministically — no LLM needed. |
Minimum: An LLM that supports file uploads or long-form input + a project folder.
Recommended: Filesystem MCP for direct file read/write.
Optional: Shell/PowerShell MCP, Python 3.10+ (ENFORCED mode), Claude Code or Cowork.
rag-runtime-kernel/
├── INIT_UNIVERSAL_RUNTIME_KERNEL_v3.1.8.md # The specification (current version)
├── INIT_UNIVERSAL_RUNTIME_KERNEL_v3.1.7.md # Previous version (archived)
├── CONTRIBUTING.md # How to report issues
├── CHANGELOG.md # Version history
├── docs/
│ ├── architecture.md # System architecture
│ ├── benchmark_comparison.md # Head-to-head vs alternatives
│ ├── design_principles.md # Core design philosophy
│ ├── test_analysis_gpt_web.md # GPT Web test findings
│ ├── LAUNCH_MANUAL.md # Full setup guide (all platforms + modes)
│ ├── LOCAL_TESTING_GUIDE.md # Local dev testing & GPT Custom Actions
│ ├── v3.2_ARCHITECTURE_DESIGN.md # Runtime architecture (v0.1.0 design doc)
│ └── ROADMAP.md # Development roadmap
├── rag_kernel/ # v0.2.0 Runtime Kernel (ENFORCED mode)
│ ├── __init__.py # Package entry, discover() capability registry
│ ├── __main__.py # CLI entry point (init / configure / health / serve / mcp)
│ ├── api.py # HTTP API (FastAPI)
│ ├── state_machine.py # Deterministic state engine
│ ├── persistence.py # Atomic writes, WAL, hash verification
│ ├── cold_manager.py # COLD partition manager
│ ├── concurrency.py # Lock manager, write collision guard
│ ├── mcp_transport.py # MCP tool interface
│ ├── schemas.py # Pydantic models for proposals/state
│ └── spec_parser.py # Deterministic MD→RAG parser (zero LLM)
├── tests/ # Test suites
│ ├── test_state_machine.py # State machine unit tests
│ ├── test_persistence.py # Persistence + WAL tests
│ ├── test_cold_manager.py # COLD partition tests
│ ├── test_concurrency.py # Lock + collision tests
│ ├── test_api.py # HTTP API tests
│ ├── test_mcp_transport.py # MCP transport tests
│ ├── test_schemas.py # Schema validation tests
│ ├── test_main.py # CLI entry point tests
│ ├── test_spec_parser.py # Spec parser + init/configure tests (64)
│ ├── UNIT_TEST_CLAUDE_DESKTOP.md # Claude Desktop spec-level tests (42)
│ └── UNIT_TEST_GPT_WEB.md # GPT Web spec-level tests (43)
├── .github/
│ ├── FUNDING.yml # GitHub Sponsors
│ └── ISSUE_TEMPLATE/ # Bug report + feature request templates
├── formal/
│ ├── RAGKernel.tla # TLA+ state machine specification
│ ├── RAGKernel.cfg # TLC model checker configuration
│ └── TLC_RESULTS.md # Verification results (389K states, 8 safety + 3 liveness)
├── LICENSE # AGPL-3.0
└── README.md
- BOOTING — Load HOT, verify consistency, check WAL, probe tools
- READY — Accept tasks
- WORKING / INGESTING — Execute tasks, ingest files, extract knowledge
- CHECKPOINTING — Save atomically with backup rotation
- CLOSING — Audit findings, final save
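A guarded lifecycle like the one listed above can be sketched in Python with an enum and a table of allowed moves. The edges in `ALLOWED` are an assumption for illustration (including the RECOVERY path mentioned elsewhere in this document), and `Lifecycle` and `IllegalTransition` are hypothetical names, not the classes in state_machine.py.

```python
from enum import Enum


class State(Enum):
    BOOTING = "BOOTING"
    READY = "READY"
    WORKING = "WORKING"
    CHECKPOINTING = "CHECKPOINTING"
    CLOSING = "CLOSING"
    RECOVERY = "RECOVERY"


# Assumed edges; the authoritative guards live in the specification.
ALLOWED = {
    State.BOOTING: {State.READY, State.RECOVERY},
    State.READY: {State.WORKING, State.CLOSING},
    State.WORKING: {State.CHECKPOINTING},
    State.CHECKPOINTING: {State.READY},
    State.RECOVERY: {State.READY},
    State.CLOSING: set(),  # terminal: nothing may follow
}


class IllegalTransition(Exception):
    """Raised when a requested move is not in the ALLOWED table."""


class Lifecycle:
    def __init__(self) -> None:
        self.state = State.BOOTING
        self.history = [State.BOOTING]   # doubles as a minimal audit trail

    def advance(self, target: State) -> None:
        """Move to `target` only if the transition table permits it."""
        if target not in ALLOWED[self.state]:
            raise IllegalTransition(f"{self.state.value} -> {target.value}")
        self.state = target
        self.history.append(target)
```

Because every move goes through `advance`, an illegal request fails loudly and leaves the current state untouched.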
- Autonomous mode is self-enforced — the LLM follows the spec by instruction, not by hard runtime constraints
- Persistence depends on platform — full atomic writes with MCP; manual file management on GPT Web
- Context window ceiling — spec consumes ~16K tokens; large projects may hit limits
- Not a database — structured file-based memory, not a production database replacement
See docs/test_analysis_gpt_web.md for detailed platform-specific findings.
- Context window bound — spec ~16K tokens; large projects may hit limits
- No cross-filesystem bridge yet — relies on platform tools; user-assisted I/O without them
- Single-writer — concurrent writes detected-and-halted, not auto-merged
- GPT Web — no atomic writes, no real token counter, manual persistence
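The single-writer limitation (detect and halt, no auto-merge) can be illustrated with a content-hash guard. This is a minimal in-memory sketch, assuming a SHA-256 digest as the version marker; `GuardedDocument` and `WriteCollision` are hypothetical names and the real guard in concurrency.py may work differently.

```python
import hashlib


class WriteCollision(Exception):
    """Raised when the stored content changed since this session read it."""


def digest(content: str) -> str:
    """Stable fingerprint of the content, used as a version marker."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class GuardedDocument:
    def __init__(self, content: str = "") -> None:
        self.content = content

    def read(self):
        """Return the content together with the digest the writer must echo back."""
        return self.content, digest(self.content)

    def write(self, new_content: str, expected_digest: str) -> None:
        """Write only if nobody else has changed the content since the read."""
        if digest(self.content) != expected_digest:
            # Detect and halt: no attempt is made to merge the two versions.
            raise WriteCollision("content changed since last read; halting")
        self.content = new_content
```

Two sessions that read the same version can both attempt a write, but only the first succeeds; the second is stopped and has to re-read before trying again.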
See docs/ROADMAP.md for complete roadmap.
| Release | Status | Focus |
|---|---|---|
| v3.1.8 | Released | Machine-parseable spec with rag-config fenced blocks for deterministic parsing. Zero-touch bootstrap target. |
| v0.2.1 | Released | Graduated POV enforcement (STRICT/ADVISORY/SILENT), 427 tests. Version scheme cleanup. |
| v0.2.0 | Released | 9 modules, 401 tests. Zero-touch bootstrap (rag_kernel init), capability self-discovery (discover()), project configuration (rag_kernel configure). Paradigm shift: fully autonomous OS-level Python backbone — LLM role reduced to task assignor, results checker, orchestrator. |
| v0.3.0 | In Progress | Delta checkpoints (ENH-006), conflict auto-categorization (ENH-005) |
| v0.4.0+ | Planned | Graph Orchestrator: DAG execution, parallel tasks, dependency tracking |
Found a bug? Please open an issue using the provided templates. See CONTRIBUTING.md.
Developer: Artem Pakhol
LinkedIn: linkedin.com/in/pakhol
This project is licensed under the GNU Affero General Public License v3.0 — see LICENSE.
What this means: You may use, modify, and distribute this software, but any modified version you deploy (including as a network service) must also be released under AGPL-3.0 with attribution to the original project.
