Score RAG chunks by information gain — not just relevance.
Vector databases and semantic chunkers retrieve by relevance (cosine similarity). A chunk can be highly relevant yet contain only facts the model already memorized during training — wasted context window. pymrsf uses the model's own predictive surprise to detect which chunks contain genuinely new information.
- Novelty: Does the model already know this? (surprise-based)
- Relevance: Is this related to the query? (cosine similarity)
- Query Ignorance: Does the model even know the answer? (probe-based gate)
- Diversity: Does a better chunk already cover this? (dedup post-filter)
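
To make the novelty and diversity signals concrete, here is a minimal sketch under stated assumptions. It is not pymrsf's implementation: the helper names (`mean_surprisal`, `cosine`, `dedup`), the 0.9 similarity threshold, and the idea that per-token log-probabilities and embeddings are already available are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration only; none of these names are pymrsf APIs.


def mean_surprisal(token_logprobs: Sequence[float]) -> float:
    """Average negative log-probability per token (in nats).

    A higher value means the model found the text harder to predict,
    which is one simple proxy for novelty.
    """
    if not token_logprobs:
        return 0.0
    return -sum(token_logprobs) / len(token_logprobs)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup(
    candidates: Sequence[Tuple[str, float, Sequence[float]]],
    threshold: float = 0.9,  # assumed cutoff, not a pymrsf default
) -> List[str]:
    """Greedy diversity post-filter over (text, score, embedding) triples.

    Chunks are visited from best to worst score; a chunk is dropped when
    its embedding is too similar to one that was already kept.
    """
    kept: List[Tuple[str, float, Sequence[float]]] = []
    for text, score, emb in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(cosine(emb, k[2]) < threshold for k in kept):
            kept.append((text, score, emb))
    return [text for text, _, _ in kept]
```

In this sketch a near-duplicate of a higher-scoring chunk is discarded even if its own score is respectable, which matches the question "does a better chunk already cover this?".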
# Start fast with an API provider (no 4 GB model download)
pip install pymrsf[openai]
export OPENAI_API_KEY='sk-...'
# Or for full features (probing, smart_chunk, round-trip):
pip install pymrsf[local]

All providers require Ollama for embeddings:
ollama pull nomic-embed-text

from pymrsf import score_chunk, filter_chunks
chunks = [
"Backpropagation computes gradients using the chain rule.",
"Neural networks are inspired by the human brain.",
"The sky is blue because of Rayleigh scattering.",
]
# Score a single chunk
result = score_chunk(chunks[0], query="How does backpropagation work?")
print(result["rag_score"]) # 0–100
print(result["verdict"]) # "excellent" / "good" / "moderate" / "weak" / "skip"
# Filter to only the useful chunks
useful = filter_chunks(chunks, query="How does backpropagation work?", min_rag_score=50)
# useful ≈ ["Backpropagation computes gradients..."]

With async for production pipelines:
import asyncio
from pymrsf import filter_chunks_async
useful = asyncio.run(filter_chunks_async(chunks, query="...", min_rag_score=50))

Instead of splitting at fixed sizes or sentence boundaries, smart_chunk uses the model's surprise signal to find natural knowledge transitions:
from pymrsf import smart_chunk
long_article = """
Quantum computing leverages superposition and entanglement to perform
calculations that would be infeasible for classical computers. Unlike
classical bits, qubits can exist in multiple states simultaneously.
...
Machine learning models learn patterns from data through iterative
optimization of a loss function. Neural networks, in particular,
use backpropagation to adjust millions of parameters.
...
"""
# Chunks split at the boundary between "quantum computing" and "ML" —
# where the model's surprise signal drops after absorbing one topic
pieces = smart_chunk(long_article, min_chunk_len=200, max_chunk_len=800)

Requires the local provider. Falls back to sentence splitting for API providers.
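
The idea of surprise-guided boundaries can be sketched in a few lines. This is an illustration, not smart_chunk's actual algorithm: the function `split_on_surprise`, the per-sentence surprise values, and the "mean plus k standard deviations" spike rule are all assumptions. In particular, it treats a spike in surprise as the start of a new topic, which is only one plausible reading of the boundary rule.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def split_on_surprise(
    sentences: Sequence[str],
    surprise: Sequence[float],  # one hypothetical surprise value per sentence
    min_chunk_len: int = 200,
    max_chunk_len: int = 800,
    k: float = 1.0,  # assumed sensitivity: spike = mean + k * std
) -> List[str]:
    """Group sentences into chunks, cutting where surprise spikes.

    A cut happens before a sentence when either:
      * its surprise exceeds the spike threshold and the current chunk
        has already reached min_chunk_len characters, or
      * appending it would push the chunk past max_chunk_len characters.
    """
    if len(sentences) != len(surprise):
        raise ValueError("need exactly one surprise value per sentence")
    if not sentences:
        return []

    threshold = mean(surprise) + k * pstdev(surprise)
    chunks: List[str] = []
    current = sentences[0]
    for sent, s in zip(sentences[1:], surprise[1:]):
        too_long = len(current) + 1 + len(sent) > max_chunk_len
        spike = s > threshold and len(current) >= min_chunk_len
        if too_long or spike:
            chunks.append(current)
            current = sent
        else:
            current = current + " " + sent
    chunks.append(current)
    return chunks
```

With real data the surprise values would come from a local model's token probabilities, which is why this kind of splitting is unavailable when only an API provider is configured.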
This is the most important table in this README — it tells you which features work with which provider.
| Feature | local | openai | anthropic |
|---|---|---|---|
| RAG scoring | Full (novelty + relevance + ignorance) | Relevance-only | Relevance-only |
| Knowledge probing | ✅ Full | ❌ | ❌ |
| smart_chunk (surprise-guided) | ✅ Yes | Fallback to sentence | Fallback to sentence |
| Delta compression / round-trip | ✅ Yes | ❌ | ❌ |
| Model session (KV-cache) | ✅ Yes | ❌ | ❌ |
| Async scoring | ✅ | ✅ | ✅ |
| Score caching | ✅ | ✅ | ✅ |
Key takeaway: probing, smart_chunk, and the experimental round-trip storage all require the local provider (pip install pymrsf[local] + a GGUF model). If you only need relevance-based RAG scoring, OpenAI or Anthropic work fine.
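
If you want to guard feature use in your own code, the table can be mirrored as a small lookup. This is a hand-written sketch, not the programmatic check that PROVIDER_SUPPORT.md describes: the `CAPABILITIES` dict, the feature names, and the `supports` / `require` helpers are hypothetical, and knowledge probing is treated as unsupported on Anthropic based on the takeaway above.

```python
from typing import Dict, FrozenSet

# Hypothetical feature names that mirror the rows of the table above.
_API_FEATURES = frozenset({"relevance_scoring", "async_scoring", "score_caching"})

CAPABILITIES: Dict[str, FrozenSet[str]] = {
    "local": _API_FEATURES
    | {
        "novelty_scoring",
        "knowledge_probing",
        "surprise_chunking",
        "round_trip",
        "kv_cache_session",
    },
    "openai": _API_FEATURES,
    "anthropic": _API_FEATURES,
}


def supports(provider: str, feature: str) -> bool:
    """Return True if the table lists the feature as available for the provider."""
    return feature in CAPABILITIES.get(provider, frozenset())


def require(provider: str, feature: str) -> None:
    """Fail early with a readable message instead of deep inside a pipeline."""
    if not supports(provider, feature):
        raise RuntimeError(
            f"{feature!r} is not available with provider {provider!r}; "
            "it needs the local provider"
        )
```

Calling `require("openai", "knowledge_probing")` at startup would raise immediately, which is usually easier to debug than a degraded score later on.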
import pymrsf
# Enable pymrsf log output (silent by default)
pymrsf.configure_logging("INFO")
# Tweak runtime settings without touching env vars
pymrsf.configure(
provider="openai",
embed_timeout=60,
default_relevance_cutoff=0.4,
)

Environment variables for container/CI environments:
PYMRSF_PROVIDER=openai
OPENAI_API_KEY=sk-...
PYMRSF_ALLOW_PROVIDER_FALLBACK=true # log a warning and continue on embed failures
PYMRSF_EMBED_TIMEOUT=30

- PYMRSF_ALLOW_PROVIDER_FALLBACK: when true, embed failures log a warning and continue instead of raising. Off by default (fail-fast).
- pymrsf.configure_logging("WARNING"): pymrsf ships with a NullHandler, so import pymrsf is silent until you opt in.
See ENV_CONFIG.md for all supported variables.
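
For wrapper scripts that need to inspect the same variables, a reader could look roughly like the sketch below. It is not how pymrsf parses its environment: the `Settings` dataclass, `load_settings`, the accepted truthy strings, and the fallback defaults for the provider ("local") and timeout (30) are assumptions. Only the fail-fast default for PYMRSF_ALLOW_PROVIDER_FALLBACK comes from the text above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed set of strings treated as "true"; pymrsf may accept a different set.
_TRUE = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    provider: str
    allow_provider_fallback: bool
    embed_timeout: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the three variables shown above from a mapping (default: os.environ)."""
    env = os.environ if env is None else env

    timeout = float(env.get("PYMRSF_EMBED_TIMEOUT", "30"))  # assumed default
    if timeout <= 0:
        raise ValueError("PYMRSF_EMBED_TIMEOUT must be positive")

    fallback_raw = env.get("PYMRSF_ALLOW_PROVIDER_FALLBACK", "false")
    return Settings(
        provider=env.get("PYMRSF_PROVIDER", "local").strip().lower(),  # assumed default
        allow_provider_fallback=fallback_raw.strip().lower() in _TRUE,
        embed_timeout=timeout,
    )
```

Passing a plain dict instead of relying on `os.environ` keeps the function easy to test in CI.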
The round-trip storage backend stores only "surprise" tokens (40–60% compression) and reconstructs text via O(n) model inference. Import from pymrsf.experimental to signal the research-grade scope:
from pymrsf.experimental import mrsf_write, mrsf_read, save_index
doc = mrsf_write("The Eiffel Tower was built in 1889.")
print(doc["compression"]) # 0.47 — 47% of tokens were predictable
save_index()
results = mrsf_read("famous French landmark", top_k=1)

| Score | Verdict | Suggested action |
|---|---|---|
| 80–100 | excellent | Prioritise |
| 60–79 | good | Include |
| 40–59 | moderate | Include if space allows |
| 20–39 | weak | Skip if better chunks exist |
| 0–19 | skip | Model already knows this |
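
The bands in this table translate directly into a lookup. The sketch below is a convenience for downstream code, assuming the band edges are inclusive lower bounds as the table suggests; `verdict_for` is a hypothetical helper, not a pymrsf function (the library already returns the verdict in `result["verdict"]`).

```python
def verdict_for(score: float) -> str:
    """Map a 0-100 RAG score to the verdict labels used in the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 80:
        return "excellent"
    if score >= 60:
        return "good"
    if score >= 40:
        return "moderate"
    if score >= 20:
        return "weak"
    return "skip"
```

A helper like this can be handy when scores are cached or logged without their verdict and the label needs to be recovered later.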
- PROVIDER_SUPPORT.md — full capability matrix with programmatic checks
- ENV_CONFIG.md — all environment variables
- docs/CONCURRENCY.md — threading and process-safety model
- CHANGELOG.md — version history
The technical approach is described in the MRSF paper (link forthcoming). For now, see CHANGELOG.md for the research lineage and the experimental module for the delta-compression implementation.
MIT