rylinjames/openclaw-semantic-cache

Semantic Cache

An OpenClaw skill that caches LLM responses by meaning using Redis vector search. Similar questions return cached answers in ~100ms instead of making expensive API calls.

77x faster. 60-80% cost reduction. One command to install.

Demo

clawhub install semantic-cache

The Problem

Every OpenClaw agent makes LLM API calls. Many of these are semantically identical — "How do I reset my password?" and "I forgot my password, how do I change it?" are the same question worded differently. Without caching, each one costs tokens and takes seconds.

Exact-match caching doesn't help because users rarely phrase the same question the same way twice.

The Solution

Semantic Cache embeds every query into a vector and stores it in Redis. When a new query comes in, it finds the most semantically similar cached query using cosine similarity. If the match is strong enough, it returns the cached response instantly.
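The "strong enough" check is a cosine-similarity comparison between embedding vectors. In the skill itself Redis computes this inside the HNSW index; this standalone function is just a sketch of the math the threshold is applied to:

```javascript
// Cosine similarity between two embedding vectors (illustrative only --
// in the actual skill, Redis performs this comparison inside the index).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Identical vectors score 1.0; orthogonal vectors score 0.0.
console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```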

First ask:  "What are the benefits of drinking water?"
            → Cache MISS → LLM call (9.2s, 398 tokens) → cached

Second ask: "What are the health benefits of drinking water regularly?"
            → Cache HIT (0.832 similarity, 119ms) → instant response, zero tokens

How It Works

Query → Embed (text-embedding-3-small) → Redis HNSW Vector Search
                                              ↓
                                    similarity > 0.80?
                                    ↓              ↓
                                   YES             NO
                                    ↓              ↓
                              Return cached    Call LLM → Cache → Return
                              (100ms)          (2-10s)
  1. Incoming query is embedded into a 1536-dimension vector
  2. Redis vector search (HNSW algorithm) finds the nearest cached query
  3. If cosine similarity exceeds the threshold (default 0.80), return the cached response
  4. If not, pass through to the LLM, cache the response for future similar queries
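The four steps above can be sketched as a single lookup function. This is a hypothetical sketch, not the skill's actual source: `embed`, `searchNearest`, `callLLM`, and `store` are stand-ins for the OpenAI embeddings call, the Redis HNSW query, the LLM request, and the Redis write, injected as parameters so the decision logic is visible on its own.

```javascript
// Hypothetical sketch of the cache decision flow (not the skill's real code).
// embed(query)              -> 1536-dim vector (text-embedding-3-small)
// searchNearest(vec)        -> { response, similarity } of closest entry, or null
// callLLM(query)            -> fresh LLM response
// store(query, vec, resp)   -> writes the new entry into Redis
async function cachedQuery(query, { embed, searchNearest, callLLM, store, threshold = 0.80 }) {
  const vector = await embed(query);                 // step 1: embed the query
  const nearest = await searchNearest(vector);       // step 2: HNSW nearest neighbor
  if (nearest && nearest.similarity >= threshold) {  // step 3: strong enough match?
    return { source: 'cache', response: nearest.response };
  }
  const response = await callLLM(query);             // step 4: miss -> call the LLM
  await store(query, vector, response);              //         cache for next time
  return { source: 'llm', response };
}
```

On a hit it returns the cached response without touching the LLM; on a miss it pays for exactly one call and seeds the cache for the next paraphrase.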

Performance

Metric              Value
Cache hit lookup    ~100-120ms
Full LLM call       ~2-10 seconds
Speedup             77x on cache hits
Embedding cost      ~$0.00002 per query
Storage per entry   ~6KB
Concurrent lookups  5 in 491ms (98ms avg)

Install

clawhub install semantic-cache

Environment Variables

Variable                   Required  Description
REDIS_URL                  Yes       Redis connection string (Redis Cloud or Redis Stack with vector search)
OPENAI_API_KEY             Yes       For generating embeddings
SEMANTIC_CACHE_THRESHOLD   No        Similarity threshold, 0-1 (default: 0.80)
SEMANTIC_CACHE_TTL         No        Cache TTL in seconds (default: 86400 = 24 hours)
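A sketch of how these variables and their documented defaults might be read in Node (the function name and shape are assumptions, not the skill's actual config code):

```javascript
// Hypothetical config loading -- variable names match the table above,
// defaults match the documented ones (0.80 similarity, 86400s TTL).
function loadConfig(env = process.env) {
  return {
    redisUrl: env.REDIS_URL,
    openaiApiKey: env.OPENAI_API_KEY,
    threshold: parseFloat(env.SEMANTIC_CACHE_THRESHOLD ?? '0.80'),
    ttlSeconds: parseInt(env.SEMANTIC_CACHE_TTL ?? '86400', 10),
  };
}
```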

Redis Setup

The free tier on Redis Cloud works. Vector search support is required; it is included in both Redis Cloud and Redis Stack.
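For reference, a vector index along these lines is what the numbers in this README imply (1536-dim FLOAT32 vectors, cosine distance, HNSW). The index name, key prefix, and field name below are illustrative, not the skill's actual schema:

```shell
# Illustrative only -- index name, prefix, and field name are hypothetical.
redis-cli FT.CREATE idx:semantic_cache ON HASH PREFIX 1 "cache:" \
  SCHEMA embedding VECTOR HNSW 6 TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE
```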

Usage

Check cache, call LLM on miss, cache result

node scripts/cache.js query "How do I reset my password?"

Store a response manually

node scripts/cache.js store "What is your return policy?" "We offer 30-day returns on all items."

Look up without calling LLM

node scripts/cache.js lookup "How can I return a product?"

View cache stats

node scripts/cache.js stats

Clear cache

node scripts/cache.js clear

Run tests

node scripts/cache.js test

Test Results

=== SEMANTIC CACHE STRESS TEST ===

Test 1: Store and exact recall
  PASS: Exact match is a hit
  PASS: Similarity is ~1.0
  PASS: Lookup under 500ms

Test 2: Semantic paraphrase
  PASS: Paraphrase detected as similar

Test 3: Completely different query
  PASS: Different query is a miss
  PASS: Low similarity

Test 4: Multiple entries
  PASS: Cancel query matches cancel entry
  PASS: Payment query matches payment entry
  PASS: Docs query matches docs entry

Test 5: Edge cases
  PASS: Empty string doesn't crash
  PASS: Single char doesn't crash
  PASS: Very long query doesn't crash

Test 6: Concurrent lookups
  PASS: 5 concurrent lookups complete
  PASS: Total time under 3s (491ms for 5 queries)

Test 7: Hit count tracking
  PASS: Entries exist

=== RESULTS ===
Passed: 15/15
Failed: 0/15

Tuning the Threshold

The SEMANTIC_CACHE_THRESHOLD controls the tradeoff between cache hits and accuracy:

Threshold  Behavior                                                  Best For
0.70       Aggressive — more hits, higher false positive risk        FAQ bots, support with limited question variety
0.80       Balanced (default) — good hit rate, low false positives   General purpose
0.90       Conservative — fewer hits, very precise matching          Code generation, technical queries where precision matters
0.95       Strict — near-exact matches only                          Safety-critical applications
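To make the tradeoff concrete: the paraphrase from the demo earlier scored 0.832 similarity, and whether that counts as a hit depends entirely on where the threshold sits:

```javascript
// The demo paraphrase scored 0.832; sweep the documented threshold values.
const similarity = 0.832;
for (const threshold of [0.70, 0.80, 0.90, 0.95]) {
  const result = similarity >= threshold ? 'HIT' : 'MISS';
  console.log(`threshold ${threshold.toFixed(2)}: ${result}`);
}
// threshold 0.70: HIT
// threshold 0.80: HIT
// threshold 0.90: MISS
// threshold 0.95: MISS
```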

Architecture

┌─────────────────────────────────────────────┐
│               OpenClaw Agent                │
├─────────────────────────────────────────────┤
│            Semantic Cache Skill             │
│                                             │
│  ┌────────────┐    ┌─────────────────────┐  │
│  │  OpenAI    │    │     Redis Cloud     │  │
│  │ Embeddings │    │ ┌─────────────────┐ │  │
│  │            │    │ │  HNSW Vector    │ │  │
│  │ text-      │───>│ │  Search Index   │ │  │
│  │ embedding- │    │ │                 │ │  │
│  │ 3-small    │    │ │ 1536-dim float  │ │  │
│  └────────────┘    │ │ cosine distance │ │  │
│                    │ └─────────────────┘ │  │
│                    └─────────────────────┘  │
└─────────────────────────────────────────────┘

Built With

  • Redis (Gold Sponsor) — HNSW vector search for semantic similarity matching
  • OpenAI — text-embedding-3-small for query vectorization
  • OpenClaw — Skill framework and ClawHub publishing

License

MIT-0 (ClawHub standard)
