CFAdv is a context compiler for LLMs. It ingests files of any format, scores and selects the most relevant content under a token budget, and assembles provider-ready packets for OpenAI, Anthropic, Ollama, and compatible APIs.
Built on context-fusion and extended with an attention fusion layer (AttnRes-inspired) that reorders selected context by query relevance, so the most useful content appears first in the prompt.
- Multiformat ingestion: text, PDF, DOCX, CSV, JSON, images (OCR), code, Markdown
- Normalization: uniform `ContextBlock` objects with token counts, trust and freshness scores
- Task-specific compact representations: QA, code, agent, and universal variants
- Utility and risk scoring: relevance, trust, freshness, structure, diversity, hallucination proxy
- Multi-objective planner: value density + token + latency + cacheability ranking
- Attention-based fusion: query-dependent softmax weighting inspired by AttnRes (arxiv 2603.15031)
- Two-level block attention: intra-block ranking + cross-block mean-pooled ordering
- Canonical IR and delta fusion: `ContextPacket`, `CacheSegment`, incremental `ContextDelta`
- Dedup and fingerprinting: exact + near-duplicate collapse with provenance retention
- Multi-provider adapters: OpenAI, Anthropic, Ollama, and OpenAI-compatible APIs
- Provider-aware compilation: `chat`, `qa`, `code`, `agent` packers with mode-aware system prompts
- Cache-aware assembly: stable/dynamic segment split for reuse across repeated turns
- MCP server: expose CFAdv tools and resources over MCP
- Framework integrations: retriever wrappers for LangChain and LlamaIndex
- Precompute pipeline: fingerprints, summaries, token stats, compact variants, features
- Compression pipeline: JSON minify, schema prune, citation compaction
- Ablation studies: identify which context blocks contribute most to outcomes
- Memory management: persistent storage with compaction and retention policies
- Web UI: local browser app to run and inspect pipeline outputs
```bash
pip install context-portfolio-optimizer
```

For development:

```bash
git clone https://github.com/rotsl/CFAdv.git
cd CFAdv
make install-dev
```

Copy `.env.example` to `.env` and fill in your API keys:

```bash
cp .env.example .env
```

```python
from context_portfolio_optimizer import PipelineRunner

runner = PipelineRunner()
result = runner.run(["document.pdf", "code.py"], budget=3000)
print(result["context"])  # Optimized, attention-ranked context string
print(result["stats"])    # Processing statistics
```

```bash
# Ingest and display content
cpo ingest ./data

# Run full optimization pipeline
cpo run ./data --budget 3000 --query "Summarize architecture" --output context.txt \
    --provider openai --model gpt-5-mini --mode chat --profile openai_chat

# Plan context for a task
cpo plan "Summarize these documents" --budget 5000

# Compile provider-ready packet (qa/code/agent/chat modes)
cpo compile ./data \
    --task "Answer with citations" \
    --provider openai \
    --model gpt-5-mini \
    --mode qa \
    --budget 4000 \
    --compression light \
    --delta

# Precompute artifacts for latency reduction
cpo precompute ./data --store-dir .cpo_cache/precompute --semantic-dedup

# Run MCP-style server
cpo serve-mcp --host <host> --port 8765

# Inspect cache + precompute store
cpo inspect-cache

# Run ablation study
cpo ablate ./data --budget 3000

# Launch local visualization UI
cpo ui --host <host> --port 8080
```

Provider/client mapping:
- ChatGPT / OpenAI API: `--provider openai` with `OPENAI_API_KEY`
- Claude AI / Claude API: `--provider anthropic` with `ANTHROPIC_API_KEY`
- Local models with Ollama: `--provider ollama` (no cloud key required)
- OpenAI-compatible APIs (Grok, DeepSeek, etc.): `--provider openai_compatible` with `OPENAI_COMPAT_BASE_URL`
- MCP clients: `cpo serve-mcp --host <host> --port <port>`
- `README.md` for setup and commands
- `.env` for provider API keys
- `configs/` for provider and budget config overrides (optional)
- `examples/gui_input/` for quick GUI test inputs
- CLI commands: `run`, `compile`, `ui`, `serve-mcp`
```mermaid
flowchart TD
    A[Install CFAdv] --> B[Add API keys in .env]
    B --> C{Pick workflow}
    C --> D[Chat workflow]
    C --> E[Agent workflow]
    D --> D1[Choose model: gpt-5-mini or claude-sonnet-4-6 or local ollama]
    E --> E1[Choose agentic model: claude-sonnet-4-6 or gpt-5-mini or tool-using model]
    D1 --> F[Run cpo compile or cpo run]
    E1 --> F
    F --> G[Provider adapter builds request]
    G --> H[Model response + citations + context stats]
```
```mermaid
flowchart TD
    A[Prepare corpus] --> B[Run cpo precompute]
    B --> C[Run benchmarks and tests]
    C --> D{Serve path}
    D --> E[CLI and app integration]
    D --> F[Web UI]
    E --> G{Runtime mode}
    F --> G
    G --> H[Chat or QA packer]
    G --> I[Agent packer + delta fusion]
    H --> J[Provider adapter]
    I --> J
    J --> K[OpenAI or Anthropic or Ollama or compatible]
    K --> L[Track token, latency, and cache metrics]
```
CFAdv uses a middleware pipeline:
```text
Ingest → Normalize → Canonical IR → Precompute → Dedup/Fingerprint
→ Query Classify → Candidate Retrieval → Fast Rerank → Budget Planner
→ Context Compression → Attention Fusion → Delta Fusion → Provider Adapter → Cache-Aware Assemble
```
- Ingest: Extract content from multiple file formats
- Normalize: Convert to uniform `ContextBlock` objects
- Represent: Generate alternative compact representations per block
- Precompute: Persist compact variants, token stats, retrieval features, and fingerprints
- Retrieve: Query classify → top-100 lexical retrieval → top-20/25 rerank
- Plan: Multi-objective latency-aware representation selection under token budget
- Fuse: Query-dependent attention ranking (AttnRes-inspired) + `ContextDelta` for agent turns
- Assemble: Build cache segments and the canonical `ContextPacket`
- Compile: Build provider-specific request-ready payloads
See docs/architecture.md for full component detail and docs/attention_fusion.md for the attention fusion design.
| Format | Extensions | Dependencies |
|---|---|---|
| Text | .txt, .log | — |
| Documents | .pdf | pdfminer.six |
| | .docx | python-docx |
| Structured | .csv, .tsv | pandas |
| | .json, .jsonl | — |
| Images | .png, .jpg, .tiff | Pillow, pytesseract |
| Code | .py, .js, .ts, .go, .rs, etc. | tree-sitter (optional) |
| Markdown | .md | — |
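Ingestion dispatch of the kind implied by this table can be sketched as a simple extension-to-loader lookup. This is an illustrative stand-in, not CFAdv's actual loader registry; the loader names are hypothetical:

```python
from pathlib import Path

# Illustrative extension -> loader-name table mirroring the formats above.
LOADERS = {
    ".txt": "text", ".log": "text",
    ".pdf": "pdf", ".docx": "docx",
    ".csv": "table", ".tsv": "table",
    ".json": "json", ".jsonl": "json",
    ".png": "ocr", ".jpg": "ocr", ".tiff": "ocr",
    ".py": "code", ".md": "markdown",
}

def pick_loader(path: str) -> str:
    # Normalize the suffix so "report.PDF" and "report.pdf" dispatch alike.
    ext = Path(path).suffix.lower()
    return LOADERS.get(ext, "text")  # fall back to plain-text ingestion
```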
Copy configs/default.yaml or create your own config.yaml:
```yaml
budget:
  instructions: 1000
  retrieval: 3000
  memory: 2000
  examples: 1500
  tool_trace: 1000
  output_reserve: 1000
scoring:
  utility_weights:
    retrieval: 0.25
    trust: 0.20
    freshness: 0.15
    structure: 0.15
    diversity: 0.15
    token_cost: -0.10
provider:
  name: anthropic
  model: claude-sonnet-4-6
features:
  use_attention_fusion: true
  attention_temperature: 1.0
```

Available providers: `openai`, `anthropic`, `ollama`, `openai_compatible`.
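Enforcing per-category token budgets like the ones configured above can be sketched as a greedy cap per category. This is a minimal illustration under assumed block fields (`category`, `tokens`), not CFAdv's actual allocator:

```python
# Example caps matching the config above (output_reserve is held back, not filled).
BUDGETS = {"instructions": 1000, "retrieval": 3000, "memory": 2000,
           "examples": 1500, "tool_trace": 1000}

def apply_budgets(blocks, budgets=BUDGETS):
    """Keep blocks in order until each category's token cap is reached."""
    used = {k: 0 for k in budgets}
    kept = []
    for b in blocks:
        cat = b["category"]
        if used.get(cat, 0) + b["tokens"] <= budgets.get(cat, 0):
            used[cat] = used.get(cat, 0) + b["tokens"]
            kept.append(b)
    return kept
```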
CFAdv formulates context selection as a multi-objective knapsack problem:
```text
maximize   Σ ( w_u * utility_i
             - w_r * risk_i
             - w_t * token_cost_i
             - w_l * latency_cost_i
             + w_c * cacheability_i
             + w_d * diversity_i ) * z_i

subject to:
  Σ (token_i * z_i) <= token_budget
  z_i ∈ {0, 1}
```
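A common approximation to this 0/1 knapsack is a greedy pass over value-per-token density. The sketch below is illustrative only (the field names and weights are example values, not CFAdv's actual planner):

```python
def greedy_plan(candidates, token_budget):
    """Greedy 0/1 knapsack approximation: highest value density first.

    `candidates` is a list of dicts with illustrative keys:
    {"id", "tokens", "utility", "risk", "latency", "cacheability", "diversity"}.
    """
    def value(c):
        # Net objective value per candidate (example weights).
        return (0.5 * c["utility"] - 0.2 * c["risk"] - 0.1 * c["latency"]
                + 0.1 * c["cacheability"] + 0.1 * c["diversity"])

    # Rank by value density so cheap, useful blocks are considered first.
    ranked = sorted(candidates,
                    key=lambda c: value(c) / max(c["tokens"], 1),
                    reverse=True)
    selected, used = [], 0
    for c in ranked:
        if used + c["tokens"] <= token_budget:
            selected.append(c)
            used += c["tokens"]
    return selected
```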
After selection, contexts are reordered by query-dependent attention weights (softmax over cosine similarity between the query embedding and each context embedding), so the most relevant content appears first. See docs/algorithm.md.
CFAdv adds AttentionContextFusion and BlockAttentionFusion on top of the base planner,
inspired by Block Attention Residuals (AttnRes, arxiv 2603.15031):
- Each context is embedded with `bow_embedding` (64-dim, L2-normalized, vocabulary-aware)
- Query-to-context cosine similarity scores are computed and passed through a temperature-scaled softmax
- Contexts are reordered by descending weight, so the highest-relevance content appears first
- `BlockAttentionFusion` applies the same hierarchy to named blocks (system / history / retrieval / tools), using mean-pooled embeddings as block representatives for cross-block ranking
See docs/attention_fusion.md for the full design and formulas.
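The reordering step can be sketched in plain Python. The toy vocabulary-sized bag-of-words vector below stands in for the 64-dim `bow_embedding`; this is not the actual implementation:

```python
import math
from collections import Counter

def bow_vec(text, vocab):
    # L2-normalized bag-of-words vector over a fixed vocabulary
    # (toy stand-in for CFAdv's 64-dim `bow_embedding`).
    counts = Counter(text.lower().split())
    v = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def attention_order(query, contexts, temperature=1.0):
    """Reorder contexts by temperature-scaled softmax over cosine similarity."""
    vocab = sorted({w for t in [query, *contexts] for w in t.lower().split()})
    q = bow_vec(query, vocab)
    # Cosine similarity reduces to a dot product of unit vectors.
    sims = [sum(a * b for a, b in zip(q, bow_vec(c, vocab))) for c in contexts]
    exps = [math.exp(s / temperature) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Highest attention weight first.
    ranked = sorted(zip(weights, contexts), key=lambda p: p[0], reverse=True)
    return [c for _, c in ranked]
```

Lowering `temperature` sharpens the weight distribution; the ordering itself is unchanged since softmax is monotone in similarity.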
```bash
cpo precompute ./data --store-dir .cpo_cache/precompute --semantic-dedup
```

Stores fingerprints, summaries, compact variants, and retrieval features in `.cpo_cache/precompute`.
Use `--precomputed-only` in `run`/`compile` to avoid regeneration on cache hits.
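Exact-duplicate fingerprinting of the kind the precompute store relies on can be sketched with a content hash. This is an illustrative scheme (whitespace-normalized SHA-256), not necessarily CFAdv's actual one:

```python
import hashlib

def fingerprint(text: str) -> str:
    # Normalize whitespace so trivially reformatted copies collapse together.
    canonical = " ".join(text.split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

def dedup(blocks):
    """Keep the first block per fingerprint; record provenance of collapsed copies."""
    seen, kept, provenance = set(), [], {}
    for b in blocks:
        fp = fingerprint(b["text"])
        if fp in seen:
            provenance.setdefault(fp, []).append(b["source"])
        else:
            seen.add(fp)
            kept.append(b)
    return kept, provenance
```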
| Mode | Packing strategy |
|---|---|
| `chat` | Concise context for standard conversation prompts |
| `qa` | Extractive evidence + citation-first packing |
| `code` | Signatures, changed regions, dependency-focused packing |
| `agent` | Working-memory and constraint deltas with optional incremental fusion |
Compression levels (none, light, medium, aggressive) apply:
- Citation map compaction (`Source URI` → `[id]`)
- JSON minification
- Schema field pruning for structured payloads
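The first two transforms are easy to illustrate in a few lines. This is a minimal sketch, assuming citations are replaced with `[n]` ids; CFAdv's actual compression pipeline may differ:

```python
import json

def minify_json(payload: str) -> str:
    # Re-serialize without whitespace; sorted keys give stable output.
    return json.dumps(json.loads(payload), separators=(",", ":"), sort_keys=True)

def compact_citations(text, sources):
    """Replace long source URIs with short [n] ids and return the mapping."""
    mapping = {}
    for i, uri in enumerate(sources, start=1):
        short = f"[{i}]"
        mapping[short] = uri
        text = text.replace(uri, short)
    return text, mapping
```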
Use `--delta` with `run` or `compile` to compute incremental packet changes across turns:
- added blocks, updated blocks, removed blocks, unchanged block IDs
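These four sets fall out of a comparison of block-id-to-fingerprint maps for two consecutive turns. A minimal sketch (illustrative shape, not the actual `ContextDelta` type):

```python
def context_delta(prev, curr):
    """Diff two turns, where prev/curr map block id -> content fingerprint."""
    prev_ids, curr_ids = set(prev), set(curr)
    both = prev_ids & curr_ids
    return {
        "added":     sorted(curr_ids - prev_ids),
        "removed":   sorted(prev_ids - curr_ids),
        "updated":   sorted(i for i in both if prev[i] != curr[i]),
        "unchanged": sorted(i for i in both if prev[i] == curr[i]),
    }
```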
Each packet splits into stable and dynamic segments:
- stable: task/system instructions, citation maps, cacheable blocks
- dynamic: non-cacheable or volatile blocks
Enables reuse across repeated chat/agent turns and lowers effective prompt churn.
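The split itself is a simple partition on a cacheability flag, with stable content placed first so a provider-side prompt cache can reuse the prefix. A minimal sketch (the `cacheable` field is an assumed block attribute):

```python
def split_segments(blocks):
    """Partition packet blocks into a stable, cache-reusable part and a dynamic part."""
    stable = [b for b in blocks if b.get("cacheable", False)]
    dynamic = [b for b in blocks if not b.get("cacheable", False)]
    # Stable content goes first so repeated turns share an identical prompt prefix.
    return {"stable": stable, "dynamic": dynamic}
```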
```bash
python examples/multiformat_ingestion_demo.py   # multi-format ingestion
python examples/rag_context_optimizer.py        # RAG-optimized context selection
python examples/memory_compaction_demo.py       # memory management
python examples/ablation_demo.py                # ablation studies

make examples   # run all four
```

See `examples/EXAMPLE_RESULTS.md` for the latest run outputs.
```bash
cpo ui --host <host> --port 8080
# or
make ui
```

Open `http://<host>:8080` to:

- Choose `Input Mode` (`Directory` or `File list`) and enter a path (e.g. `./examples/gui_input`)
- Set `Task Mode` (`chat`, `qa`, `code`, `agent`) and enter a query
- Set `Budget` (token budget)
- Pick `Provider` and `Model` (default: `anthropic` / `claude-sonnet-4-6`)
- Click `Run Pipeline`
The results panel shows run stats, representation usage, selected blocks, context preview, and model answer.
CFAdv is built on context-fusion and adds:
| Capability | context-fusion | CFAdv |
|---|---|---|
| Multiformat ingestion, normalization, scoring | ✓ | ✓ |
| Knapsack budget planner + BM25 retrieval | ✓ | ✓ |
| Compact representations, delta fusion, providers | ✓ | ✓ |
| Query-dependent context ordering | — | ✓ |
| Two-level block attention hierarchy | — | ✓ |
| Vocabulary-aware 64-dim embeddings (L2-norm) | — | ✓ |
| `docs/attention_fusion.md` | — | ✓ |
| Test count | ~49 | 72 |
For a detailed side-by-side, see docs/comparison.md.
```bash
make benchmark          # tiny eval (baseline vs cf_uniform vs cf_attention)
make benchmark-weights  # same with attention weight detail
make benchmark-api      # live Anthropic API benchmark (requires .env)
make benchmark-all      # all local benchmarks
```

Latest tiny benchmark (2026-03-21, local deterministic):
| Mode | Avg tokens | Success | vs baseline |
|---|---|---|---|
| baseline | 99.0 | 100% | — |
| cf_uniform | 3.7 | 100% | −96.3% |
| cf_attention | 3.7 | 100% | −96.3% |
Latest Claude API benchmark (2026-03-21, claude-sonnet-4-6):
| Mode | Avg context tokens | Success |
|---|---|---|
| with_cfadv | 10.3 | 100% |
| without_cfadv | 947.0 | 100% |
Context-token reduction with CFAdv: 98.9%
```text
Tiny benchmark — context tokens (lower is better)
With CFAdv      3.7 | █
Without CFAdv  99.0 | ████████████████████████

Claude API — context tokens (lower is better)
With CFAdv     10.3 | █
Without CFAdv 947.0 | ████████████████████████████████████████
```
See benchmarks/BENCHMARK_RESULTS.md, benchmarks/BENCHMARK_API_RESULTS.md, and
benchmarks/BENCHMARK_SUPPLEMENTAL_RESULTS.md for full per-task detail.
```bash
make test              # run full suite
make test-cov          # with coverage report
make test-integration
```

Latest run (2026-03-21): 72 passed, 0 failed. See `tests/TEST_RESULTS.md`.
Coverage highlights: attention_fusion.py 83%, planner.py 95%, bm25.py 97%, registry.py 98%.
Latest local smoke checks (2026-03-21):
- pipeline: `cpo run ./docs --budget 600 --query "Summarize key architecture points"` — passed
- GUI: `cpo ui --host <host> --port 8081` — HTML served, `/api/run` responded with JSON
```bash
make bootstrap     # first-time setup
make install-dev   # install package + dev tools + pre-commit hooks
make lint          # ruff check
make format        # ruff format
make type-check    # mypy
make all-checks    # format + lint + type-check + test
make build         # build sdist + wheel
make docs          # build MkDocs site
```

```text
CFAdv/
├── README.md
├── CITATION.cff
├── CONTRIBUTING.md
├── SECURITY.md
├── pyproject.toml
├── Makefile
├── requirements.txt
├── requirements-dev.txt
├── .env.example
├── src/context_portfolio_optimizer/
│   ├── ingestion/        # File loaders (text, PDF, DOCX, CSV, JSON, image, code)
│   ├── normalization/    # ContextBlock building
│   ├── representations/  # Compact representation variants
│   ├── retrieval/        # BM25 + reranker + query classifier
│   ├── scoring/          # Utility and risk models
│   ├── allocation/       # Budget + knapsack + multi-objective planner
│   ├── dedup/            # Fingerprinting + duplicate collapse
│   ├── compression/      # JSON/citation/schema compression
│   ├── caching/          # Cache segment and packet cache
│   ├── fusion/           # Attention fusion + delta computation
│   ├── assembly/         # Provider-aware packet compiler
│   ├── ir/               # Canonical ContextPacket IR
│   ├── providers/        # Provider adapters + registry
│   ├── precompute/       # Offline precompute pipeline + bow_embedding
│   ├── orchestration/    # Pipeline runner
│   ├── memory/           # Memory storage + compaction
│   ├── agents/           # Agent loop support
│   ├── integrations/     # LangChain / LlamaIndex wrappers
│   ├── mcp_server/       # MCP-style server
│   ├── web_ui.py         # Local visualization server
│   └── cli.py            # Command-line interface (`cpo`)
├── configs/              # Provider and runtime YAML configs
├── docs/                 # Architecture, algorithm, attention_fusion, comparison, CLI
├── benchmarks/           # Benchmark runners + result reports
├── examples/             # Demo scripts + GUI input samples
└── tests/                # Test suite (72 tests)
```
Local-only artifacts excluded by .gitignore: .env, virtualenvs, caches, coverage outputs.
```bibtex
@software{r2026cfadv,
  author  = {Rohan R},
  title   = {CFAdv},
  year    = {2026},
  url     = {https://github.com/rotsl/CFAdv},
  version = {0.1.0},
  orcid   = {0009-0005-9225-1775}
}
```

Apache-2.0. See LICENSE for details.
See CONTRIBUTING.md for guidelines.
- Additional file format support (EPUB, HTML)
- Learned utility models from feedback
- Distributed processing for large datasets
- Tighter integration with popular RAG frameworks
CFAdv builds on ideas from information retrieval and operations research. The attention fusion module is inspired by Block Attention Residuals (AttnRes, arxiv 2603.15031).
CFAdv: less context, more signal.