Is there an existing issue for this?
Bug summary
Description
The KnowledgeGraph class currently uses functools.lru_cache decorators on several instance methods (e.g., concept_view, predicate, edges, parents) to cache query results. This application-level caching introduces architectural risk and memory leaks while providing negligible performance benefit, especially when running against an in-process SQLite backend.
As explicitly noted in the module-level comments:
# IMPORTANT: The lru_cache has access to self in each cache. We need to avoid this if we use it
# TODO: Get rid of the LRU cache and instead optimise the queries!
This issue tracks the complete removal of these lru_cache decorators and the shift of responsibility for data caching and optimization to the underlying relational database layer.
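The concern raised in the first module comment can be reproduced in isolation. The following is a minimal sketch, assuming a hypothetical stand-in class (not the project's real KnowledgeGraph) with a single cached method whose body is a placeholder rather than a real query:

```python
import functools
import gc
import weakref


class KnowledgeGraph:
    """Hypothetical stand-in for illustration; not the project's real class."""

    @functools.lru_cache(maxsize=128)
    def parents(self, concept_id):
        # The cache key is built from (self, concept_id), so the cache
        # stored on the function object keeps a strong reference to self.
        return (concept_id - 1,)  # placeholder result, not a real query


graph = KnowledgeGraph()
graph.parents(42)            # populates the cache with a key containing `graph`
ref = weakref.ref(graph)     # lets us observe whether the instance is freed

del graph
gc.collect()
alive_before_clear = ref() is not None   # the cache still references the instance

KnowledgeGraph.parents.cache_clear()     # drops the cached key, and with it `self`
gc.collect()
alive_after_clear = ref() is not None
```

On CPython, alive_before_clear is True and alive_after_clear is False: the instance survives `del graph` only because the cache entry still holds it.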
Justification
- Memory Leaks (Strong Instance References): Because lru_cache is applied to instance methods and stores its cache on the function object shared by all instances, the self instance of KnowledgeGraph is implicitly included in every cache key. The cache therefore holds strong references to the graph instance, preventing the Python garbage collector from reclaiming KnowledgeGraph objects or their associated resources until the entries are evicted or clear_caches() is manually invoked.
- Redundancy with SQLite's In-Process Architecture: When using a SQLite backend, the database engine is an in-process C library running within the same memory space as the Python application. SQLite's internal page cache already serves high-frequency index and table reads from memory with microsecond latency. Storing fully hydrated Python objects via lru_cache creates a "cache of a cache" that holds the same data in memory a second time.
- Data Staleness & Cache Invalidation Risks: lru_cache is unaware of the state of the underlying relational database and of transaction scopes. If database contents are mutated by an external process, a concurrent connection, or a different application state, the graph will continue to return stale snapshots from Python memory, breaking data integrity expectations.
- Code Simplicity: Removing the decorators eliminates the need to maintain an explicit, custom clear_caches() housekeeping method and simplifies unit testing, where state separation between test cases is critical.
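The staleness risk in particular is easy to demonstrate. The sketch below assumes a hypothetical single-table schema and hypothetical method names (concept_name, concept_name_cached); it is not the project's actual query code, only an illustration of a cached read diverging from the database after a write:

```python
import functools
import sqlite3


class KnowledgeGraph:
    """Hypothetical stand-in; table and method names are illustrative only."""

    def __init__(self, conn):
        self.conn = conn

    def concept_name(self, concept_id):
        # Uncached read: always reflects the current database contents.
        row = self.conn.execute(
            "SELECT concept_name FROM concept WHERE concept_id = ?",
            (concept_id,),
        ).fetchone()
        return row[0] if row else None

    @functools.lru_cache(maxsize=None)
    def concept_name_cached(self, concept_id):
        # Cached read: the result is frozen after the first call.
        return self.concept_name(concept_id)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT)"
)
conn.execute("INSERT INTO concept VALUES (1, 'old name')")
conn.commit()

graph = KnowledgeGraph(conn)
first = graph.concept_name_cached(1)     # reads from SQLite, then caches

# Mutate the row behind the cache's back.
conn.execute("UPDATE concept SET concept_name = 'new name' WHERE concept_id = 1")
conn.commit()

cached = graph.concept_name_cached(1)    # served from the Python cache
fresh = graph.concept_name(1)            # served from the database
```

Here cached is still 'old name' while fresh is 'new name'. Nothing in lru_cache observes the UPDATE, so the only remedy is an explicit cache clear.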
Proposed Changes
- Remove the @lru_cache decorators from KnowledgeGraph methods in src/graph/facade.py (or respective path).
- Remove the clear_caches() method from the KnowledgeGraph interface.
- Ensure the underlying queries (e.g., q_concept_view, q_edges) are backed by proper structural database indexes on the OMOP tables (CONCEPT, CONCEPT_ANCESTOR, CONCEPT_RELATIONSHIP).
- Remove any remaining reliance on the cache_clear capability.
Code for reproduction
Not needed. Part of the regular code.
Error messages