Remove application-level lru_cache from KnowledgeGraph #12

@nicoloesch

Description

Is there an existing issue for this?

  • I have searched the existing issues

Bug summary

The KnowledgeGraph class currently uses functools.lru_cache decorators on several instance methods (e.g., concept_view, predicate, edges, parents) to cache query results. This application-level caching introduces architectural risks and memory leaks while providing negligible performance benefits, especially when running against an in-process SQLite backend.

As explicitly noted in the module-level comments:

# IMPORTANT: The lru_cache has access to self in each cache. We need to avoid this if we use it
# TODO: Get rid of the LRU cache and instead optimise the queries!

This issue tracks the complete removal of these global lru_cache decorators and shifts responsibility for data caching and optimization to the underlying relational database layer.

Justification

  1. Memory Leaks (Strong Instance References): Because lru_cache is applied at class level to instance methods, a single cache is shared by all instances and the self instance of KnowledgeGraph is implicitly included in every cache key. The cache therefore holds strong references to the graph instance, preventing the Python garbage collector from reclaiming KnowledgeGraph objects or their associated resources until the entries are evicted or clear_caches() is manually invoked.

  2. Redundancy with SQLite's In-Process Architecture: When utilizing a SQLite backend, the database engine is an in-process C library running within the same memory space as the Python application. SQLite's internal page cache already serves frequently accessed index and table pages from memory. Storing fully hydrated Python objects via lru_cache creates a "cache of a cache," holding data in memory a second time without a clear benefit.

  3. Data Staleness & Cache Invalidation Risks: lru_cache operates blindly with respect to the state of the underlying relational database or transaction scopes. If database contents are mutated by an external process, a concurrent connection, or a different application state, the graph will continue to return stale snapshots out of Python memory, breaking data integrity expectations.

  4. Code Simplicity: Removing the decorators eliminates the need to maintain an explicit, custom clear_caches() housekeeping method and simplifies unit testing environments where state separation between test cases is critical.
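Points 1 and 3 can be reproduced in isolation. The following is a minimal sketch, not the actual KnowledgeGraph code: the TinyGraph class, its concept_name method, and the one-table schema are hypothetical stand-ins chosen only to show how lru_cache on an instance method keeps the instance alive and returns stale rows.

```python
import gc
import sqlite3
import weakref
from functools import lru_cache


class TinyGraph:
    """Hypothetical stand-in for KnowledgeGraph (not the real class)."""

    def __init__(self, conn):
        self.conn = conn

    # The cache is stored on the function object and keyed on (self, concept_id).
    @lru_cache(maxsize=128)
    def concept_name(self, concept_id):
        row = self.conn.execute(
            "SELECT name FROM concept WHERE concept_id = ?", (concept_id,)
        ).fetchone()
        return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO concept VALUES (1, 'old name')")

graph = TinyGraph(conn)
first_read = graph.concept_name(1)

# Staleness: the row changes, but the cached value is returned again.
conn.execute("UPDATE concept SET name = 'new name' WHERE concept_id = 1")
stale_read = graph.concept_name(1)

# Leak: dropping the last visible reference does not free the instance,
# because the cache key still holds a strong reference to it.
graph_ref = weakref.ref(graph)
del graph
gc.collect()
alive_before_clear = graph_ref() is not None

# Only clearing the cache (or eviction) releases the instance.
TinyGraph.concept_name.cache_clear()
gc.collect()
alive_after_clear = graph_ref() is not None
```

In this sketch the second read still yields the pre-update value, and the instance only becomes collectable once cache_clear() has been called on the decorated method.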

Proposed Changes

  • Remove all @lru_cache decorators from KnowledgeGraph methods in src/graph/facade.py (or respective path).
  • Remove the clear_caches() method from the KnowledgeGraph interface.
  • Ensure that all core underlying queries (e.g., q_concept_view, q_edges) are backed by proper structural database indexes on the OMOP tables (CONCEPT, CONCEPT_ANCESTOR, CONCEPT_RELATIONSHIP).
  • Update any existing test suites that explicitly assert or rely on cache_clear capability.
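As a rough illustration of the index check in the third bullet, the sketch below creates two indexes on a cut-down schema and inspects the SQLite query plan. The column names follow the OMOP CDM vocabulary tables, but the index names, the reduced table definitions, and the two example queries are assumptions; the real q_concept_view and q_edges queries may filter on different columns and need different indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Reduced versions of the OMOP tables, for illustration only.
    CREATE TABLE concept_ancestor (
        ancestor_concept_id INTEGER NOT NULL,
        descendant_concept_id INTEGER NOT NULL,
        min_levels_of_separation INTEGER NOT NULL,
        max_levels_of_separation INTEGER NOT NULL
    );
    CREATE TABLE concept_relationship (
        concept_id_1 INTEGER NOT NULL,
        concept_id_2 INTEGER NOT NULL,
        relationship_id TEXT NOT NULL
    );
    -- Hypothetical index names.
    CREATE INDEX IF NOT EXISTS idx_ca_descendant
        ON concept_ancestor (descendant_concept_id, ancestor_concept_id);
    CREATE INDEX IF NOT EXISTS idx_cr_source
        ON concept_relationship (concept_id_1, relationship_id);
    """
)


def query_plan(sql, params=()):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Assumed shape of a "parents" lookup: direct ancestors of one concept.
parents_plan = query_plan(
    "SELECT ancestor_concept_id FROM concept_ancestor "
    "WHERE descendant_concept_id = ? AND min_levels_of_separation = 1",
    (1,),
)

# Assumed shape of an "edges" lookup: outgoing relationships of one concept.
edges_plan = query_plan(
    "SELECT concept_id_2, relationship_id FROM concept_relationship "
    "WHERE concept_id_1 = ?",
    (1,),
)
```

A plan that mentions the index name indicates an index search rather than a full table scan; a similar check could be added to the test suite once the real queries are known.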

Code for reproduction

Not needed. This is part of the regular code.

Error messages

Labels: bug
