Summary
Add a result cache to the TTC Lambda pipeline so that repeat text candidates skip embedding, KNN, and reranking. Cache lives in a new OpenSearch index in the existing cluster, keyed by (normalized_text, data_field) with the final reranked LOINC Code as the value.
Motivation
Common lab test names (e.g. "Hemoglobin A1c", "BUN", "Creatinine") will likely repeat across eICRs. Currently every pipeline run pays for a SentenceTransformer encode, an OpenSearch KNN query, and a CrossEncoder rerank. The pipeline output is deterministic on (candidate text, data field), so a content-addressable cache eliminates the redundant work without changing semantics.
Proposed design
New cache index (ttc-result-cache, same OpenSearch domain, no KNN field):
{
  "mappings": {
    "properties": {
      "cache_key": {"type": "keyword"},
      "text": {"type": "keyword"},
      "data_field": {"type": "keyword"},
      "code": {"type": "object", "enabled": false},
      "score": {"type": "float"},
      "cached_at": {"type": "date"}
    }
  }
}
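Under these assumptions, creating the index at deploy time could look like the following sketch. The `create_cache_index` helper is a name invented here; `client` is assumed to be an opensearch-py `OpenSearch` client.

```python
CACHE_INDEX = "ttc-result-cache"

CACHE_MAPPINGS = {
    "mappings": {
        "properties": {
            "cache_key": {"type": "keyword"},
            "text": {"type": "keyword"},
            "data_field": {"type": "keyword"},
            # Store the serialized Code verbatim; "enabled": False skips
            # indexing its inner fields, so the object shape can evolve freely.
            "code": {"type": "object", "enabled": False},
            "score": {"type": "float"},
            "cached_at": {"type": "date"},
        }
    }
}


def create_cache_index(client) -> None:
    """Create the cache index if it does not already exist (idempotent)."""
    if not client.indices.exists(index=CACHE_INDEX):
        client.indices.create(index=CACHE_INDEX, body=CACHE_MAPPINGS)
```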
Key: sha256(text.strip().lower() + "|" + data_field.value), used as the OpenSearch doc _id.
Value: serialized Code (from shared_models) plus the reranker score and a cached_at timestamp for diagnostics.
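A minimal sketch of the key derivation, assuming `data_field` is already the enum's `.value` string:

```python
import hashlib


def cache_key(text: str, data_field: str) -> str:
    """Normalize the candidate text and derive the deterministic doc _id."""
    normalized = text.strip().lower()
    return hashlib.sha256(f"{normalized}|{data_field}".encode("utf-8")).hexdigest()
```

Because the key is the doc `_id`, a cache write for a repeated text is a plain overwrite rather than a duplicate entry.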
Pipeline integration: in text-to-code-lambda/lambda_function.py, between select_relevant_text and embed:
- Compute key, attempt GET /ttc-result-cache/_doc/{key}.
- Hit → reconstruct Code, build NonstandardCodeInstance, emit cache_hit metric, continue.
- Miss → run existing embed/query/rerank, write the result back to the cache index, emit cache_miss metric.
Encapsulate this in a new ResultCache service under packages/text-to-code/src/text_to_code/services/result_cache.py.
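A rough sketch of what that ResultCache service could look like. The `get`/`put` shape is an assumption of this proposal, not an existing API; `client` is assumed to be an opensearch-py client, and the broad `except` stands in for opensearch-py's `NotFoundError` to keep the sketch dependency-free.

```python
from datetime import datetime, timezone


class ResultCache:
    """Thin wrapper over the ttc-result-cache index (illustrative sketch)."""

    def __init__(self, client, index: str = "ttc-result-cache"):
        self._client = client
        self._index = index

    def get(self, key: str):
        """Return the cached entry's _source dict, or None on a miss."""
        try:
            resp = self._client.get(index=self._index, id=key)
        except Exception:  # opensearch-py raises NotFoundError for a missing doc
            return None
        return resp["_source"]

    def put(self, key: str, text: str, data_field: str, code: dict, score: float) -> None:
        """Write (or overwrite) the final reranked result under the cache key."""
        self._client.index(
            index=self._index,
            id=key,
            body={
                "cache_key": key,
                "text": text,
                "data_field": data_field,
                "code": code,
                "score": score,
                "cached_at": datetime.now(timezone.utc).isoformat(),
            },
        )
```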
Invalidation: wipe-and-refill on reindex. The ttc-index-lambda is invoked at deploy time; extend it to DELETE /ttc-result-cache and recreate the empty index. When the OSIS pipeline reloads LOINC data, a follow-up invocation of the index Lambda clears the cache so stale entries can't survive a LOINC refresh.
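The wipe-and-refill step in the index Lambda could be sketched as below; `reset_cache_index` is a hypothetical helper, and `client` an opensearch-py client as above.

```python
CACHE_INDEX_NAME = "ttc-result-cache"


def reset_cache_index(client, mappings=None) -> None:
    """Drop and recreate the cache index so no stale entry survives a LOINC reload."""
    if client.indices.exists(index=CACHE_INDEX_NAME):
        client.indices.delete(index=CACHE_INDEX_NAME)
    client.indices.create(index=CACHE_INDEX_NAME, body=mappings or {"mappings": {}})
```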
Tasks
- Implement the ResultCache service in packages/text-to-code/src/text_to_code/services/result_cache.py with get/put methods and unit tests in packages/text-to-code/tests/unit/test_result_cache.py.
- Extend packages/index-lambda/src/index_lambda/lambda_function.py to create (and on each invocation, drop and recreate) the ttc-result-cache index.
- Integrate the cache lookup into the packages/text-to-code-lambda/src/text_to_code_lambda/lambda_function.py per-error loop.
- Emit cache_hit/cache_miss CloudWatch metrics (via embedded metric format) and a cache_lookup_ms timing.
- Add a RESULT_CACHE_INDEX_NAME env var to terraform/main.tf for both the TTC and index Lambdas (default ttc-result-cache), and document it in terraform/README.md.
- Extend the end-to-end tests (e2e/test_e2e.py) and aws_e2e.sh.
Out of scope / questions
- Caching of intermediate artifacts (embedding / raw KNN hits)? Should only the final reranked top result get cached, or should we store the retriever/reranker results?
- Per-key TTL or version-tagging. Wipe-and-refill on reindex is the only invalidation mechanism for v1.
Risks
- Cold-start penalty after reindex: every entry must be repopulated. Mitigated by the fact that reindex is monthly and warm-up is cheap relative to total volume.
- Cache index growth: at ~1 KB/entry, 1 M unique texts ≈ 1 GB. Well within current 10 GB EBS per node, but worth a CloudWatch alarm on index size.
- Filter logic drift: if the VectorSearchParams.filter_value mapping changes (e.g., new data fields, changed loinc_type semantics), cached results become incorrect for the new mapping. Mitigation: include a filter-mapping version in the key, or force a wipe whenever the mapping changes.
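If the versioned-key mitigation is chosen, the key derivation could be extended as in this sketch; FILTER_MAP_VERSION is a hypothetical constant that would be bumped alongside any VectorSearchParams.filter_value mapping change.

```python
import hashlib

# Hypothetical: bump this constant whenever the filter_value mapping changes,
# so every cached entry written under the old mapping becomes unreachable.
FILTER_MAP_VERSION = "v1"


def versioned_cache_key(text: str, data_field: str) -> str:
    normalized = text.strip().lower()
    payload = f"{FILTER_MAP_VERSION}|{normalized}|{data_field}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Stale entries under the old version are then simply orphaned until the next wipe, rather than served incorrectly.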