rylinjames/openclaw-semantic-cache

Semantic Cache

An OpenClaw skill that caches LLM responses by meaning using Redis vector search. Similar questions return cached answers in ~100ms instead of making expensive API calls.

77x faster. 60-80% cost reduction. One command to install.

Demo

clawhub install semantic-cache

The Problem

Every OpenClaw agent makes LLM API calls. Many of these are semantically identical — "How do I reset my password?" and "I forgot my password, how do I change it?" are the same question worded differently. Without caching, each one costs tokens and takes seconds.

Exact-match caching doesn't help because users rarely phrase the same question the same way twice.

The Solution

Semantic Cache embeds every query into a vector and stores it in Redis. When a new query comes in, it finds the most semantically similar cached query using cosine similarity. If the match is strong enough, it returns the cached response instantly.
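The "strong enough" check is a cosine-similarity comparison between embedding vectors. In the skill itself Redis computes this inside the HNSW index; this standalone function is just a sketch of the math the threshold is applied to:

```javascript
// Cosine similarity between two embedding vectors (illustrative only --
// in the actual skill, Redis performs this comparison inside the index).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Identical vectors score 1.0; orthogonal vectors score 0.0.
console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```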

First ask:  "What are the benefits of drinking water?"
            → Cache MISS → LLM call (9.2s, 398 tokens) → cached

Second ask: "What are the health benefits of drinking water regularly?"
            → Cache HIT (0.832 similarity, 119ms) → instant response, zero tokens

How It Works

Query → Embed (text-embedding-3-small) → Redis HNSW Vector Search
                                              ↓
                                    similarity > 0.80?
                                    ↓              ↓
                                   YES             NO
                                    ↓              ↓
                              Return cached    Call LLM → Cache → Return
                              (100ms)          (2-10s)
  1. Incoming query is embedded into a 1536-dimension vector
  2. Redis vector search (HNSW algorithm) finds the nearest cached query
  3. If cosine similarity exceeds the threshold (default 0.80), return the cached response
  4. If not, pass through to the LLM, cache the response for future similar queries
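The four steps above can be sketched as a single lookup function. This is a hypothetical sketch, not the skill's actual source: `embed`, `searchNearest`, `callLLM`, and `store` are stand-ins for the OpenAI embeddings call, the Redis HNSW query, the LLM request, and the Redis write, injected as parameters so the decision logic is visible on its own.

```javascript
// Hypothetical sketch of the cache decision flow (not the skill's real code).
// embed(query)              -> 1536-dim vector (text-embedding-3-small)
// searchNearest(vec)        -> { response, similarity } of closest entry, or null
// callLLM(query)            -> fresh LLM response
// store(query, vec, resp)   -> writes the new entry into Redis
async function cachedQuery(query, { embed, searchNearest, callLLM, store, threshold = 0.80 }) {
  const vector = await embed(query);                 // step 1: embed the query
  const nearest = await searchNearest(vector);       // step 2: HNSW nearest neighbor
  if (nearest && nearest.similarity >= threshold) {  // step 3: strong enough match?
    return { source: 'cache', response: nearest.response };
  }
  const response = await callLLM(query);             // step 4: miss -> call the LLM
  await store(query, vector, response);              //         cache for next time
  return { source: 'llm', response };
}
```

On a hit it returns the cached response without touching the LLM; on a miss it pays for exactly one call and seeds the cache for the next paraphrase.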

Performance

Metric              Value
Cache hit lookup    ~100-120ms
Full LLM call       ~2-10 seconds
Speedup             77x on cache hits
Embedding cost      ~$0.00002 per query
Storage per entry   ~6KB
Concurrent lookups  5 in 491ms (98ms avg)

Install

clawhub install semantic-cache

Environment Variables

Variable                   Required  Description
REDIS_URL                  Yes       Redis connection string (Redis Cloud or Redis Stack with vector search)
OPENAI_API_KEY             Yes       For generating embeddings
SEMANTIC_CACHE_THRESHOLD   No        Similarity threshold, 0-1 (default: 0.80)
SEMANTIC_CACHE_TTL         No        Cache TTL in seconds (default: 86400 = 24 hours)
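A sketch of how these variables and their documented defaults might be read in Node (the function name and shape are assumptions, not the skill's actual config code):

```javascript
// Hypothetical config loading -- variable names match the table above,
// defaults match the documented ones (0.80 similarity, 86400s TTL).
function loadConfig(env = process.env) {
  return {
    redisUrl: env.REDIS_URL,
    openaiApiKey: env.OPENAI_API_KEY,
    threshold: parseFloat(env.SEMANTIC_CACHE_THRESHOLD ?? '0.80'),
    ttlSeconds: parseInt(env.SEMANTIC_CACHE_TTL ?? '86400', 10),
  };
}
```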

Redis Setup

The free tier on Redis Cloud works. Vector search support is required; it is included in both Redis Cloud and Redis Stack.
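For reference, a vector index along these lines is what the numbers in this README imply (1536-dim FLOAT32 vectors, cosine distance, HNSW). The index name, key prefix, and field name below are illustrative, not the skill's actual schema:

```shell
# Illustrative only -- index name, prefix, and field name are hypothetical.
redis-cli FT.CREATE idx:semantic_cache ON HASH PREFIX 1 "cache:" \
  SCHEMA embedding VECTOR HNSW 6 TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE
```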

Usage

Check cache, call LLM on miss, cache result

node scripts/cache.js query "How do I reset my password?"

Store a response manually

node scripts/cache.js store "What is your return policy?" "We offer 30-day returns on all items."

Look up without calling LLM

node scripts/cache.js lookup "How can I return a product?"

View cache stats

node scripts/cache.js stats

Clear cache

node scripts/cache.js clear

Run tests

node scripts/cache.js test

Test Results

=== SEMANTIC CACHE STRESS TEST ===

Test 1: Store and exact recall
  PASS: Exact match is a hit
  PASS: Similarity is ~1.0
  PASS: Lookup under 500ms

Test 2: Semantic paraphrase
  PASS: Paraphrase detected as similar

Test 3: Completely different query
  PASS: Different query is a miss
  PASS: Low similarity

Test 4: Multiple entries
  PASS: Cancel query matches cancel entry
  PASS: Payment query matches payment entry
  PASS: Docs query matches docs entry

Test 5: Edge cases
  PASS: Empty string doesn't crash
  PASS: Single char doesn't crash
  PASS: Very long query doesn't crash

Test 6: Concurrent lookups
  PASS: 5 concurrent lookups complete
  PASS: Total time under 3s (491ms for 5 queries)

Test 7: Hit count tracking
  PASS: Entries exist

=== RESULTS ===
Passed: 15/15
Failed: 0/15

Tuning the Threshold

The SEMANTIC_CACHE_THRESHOLD controls the tradeoff between cache hits and accuracy:

Threshold  Behavior                                                  Best For
0.70       Aggressive — more hits, higher false positive risk        FAQ bots, support with limited question variety
0.80       Balanced (default) — good hit rate, low false positives   General purpose
0.90       Conservative — fewer hits, very precise matching          Code generation, technical queries where precision matters
0.95       Strict — near-exact matches only                          Safety-critical applications
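To make the tradeoff concrete: the paraphrase from the demo earlier scored 0.832 similarity, and whether that counts as a hit depends entirely on where the threshold sits:

```javascript
// The demo paraphrase scored 0.832; sweep the documented threshold values.
const similarity = 0.832;
for (const threshold of [0.70, 0.80, 0.90, 0.95]) {
  const result = similarity >= threshold ? 'HIT' : 'MISS';
  console.log(`threshold ${threshold.toFixed(2)}: ${result}`);
}
// threshold 0.70: HIT
// threshold 0.80: HIT
// threshold 0.90: MISS
// threshold 0.95: MISS
```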

Architecture

┌─────────────────────────────────────────────┐
│               OpenClaw Agent                │
├─────────────────────────────────────────────┤
│            Semantic Cache Skill             │
│                                             │
│  ┌────────────┐    ┌─────────────────────┐  │
│  │  OpenAI    │    │     Redis Cloud     │  │
│  │ Embeddings │    │ ┌─────────────────┐ │  │
│  │            │    │ │  HNSW Vector    │ │  │
│  │ text-      │───>│ │  Search Index   │ │  │
│  │ embedding- │    │ │                 │ │  │
│  │ 3-small    │    │ │ 1536-dim float  │ │  │
│  └────────────┘    │ │ cosine distance │ │  │
│                    │ └─────────────────┘ │  │
│                    └─────────────────────┘  │
└─────────────────────────────────────────────┘

Built With

  • Redis (Gold Sponsor) — HNSW vector search for semantic similarity matching
  • OpenAI — text-embedding-3-small for query vectorization
  • OpenClaw — Skill framework and ClawHub publishing

License

MIT-0 (ClawHub standard)
