Conversation
Pull request overview
This PR implements module-level caching for ML model artifacts to address issue #17, delivering massive performance improvements for batch processing scenarios.
Summary: The PR adds a simple but highly effective caching mechanism using module-level global variables that load models once and reuse them for subsequent calls within the same Python process. This eliminates repeated disk I/O operations.
Key changes:
- Implemented module-level cache in `src/triage/model.py` with global variables `_VECTORIZER` and `_MODEL`
- Updated `src/triage/cli.py` to use the cached loader instead of direct `joblib.load()`
- Added comprehensive test suite with 7 tests covering caching behavior, memory stability, and performance
- Created benchmark tool to measure performance improvements
- Added extensive documentation in `docs/performance.md` covering architecture, results, and troubleshooting
- Created beginner-friendly tutorial notebook `notebooks/00_getting_started_tutorial.ipynb`
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 18 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_model_caching.py | Comprehensive test suite (7 tests) verifying cache behavior, memory stability, CLI integration, and performance gains |
| src/triage/model.py | Enhanced docstring for load_vectorizer_and_model() explaining caching behavior, benefits, and invalidation strategy |
| src/triage/cli.py | Updated load_artifacts() to use cached load_vectorizer_and_model() function instead of direct file loading |
| scripts/benchmark_model_loading.py | Performance measurement tool comparing baseline vs cached loading with memory leak detection |
| notebooks/00_getting_started_tutorial.ipynb | Interactive tutorial for new users covering setup, predictions, visualizations, and uncertainty analysis |
| docs/performance.md | Complete optimization guide with architecture details, benchmark results, troubleshooting, and best practices |
| docs/notebooks.md | Updated to include new tutorial notebook as recommended starting point |
| README.md | Added performance optimization feature and documentation reference |
```python
# Module-level cache for vectorizer and model artifacts
# These persist for the lifetime of the Python process, enabling:
# 1. Fast repeated predictions in batch mode
# 2. Reduced disk I/O overhead
```

The comment on line 11 states "2. Reduced disk I/O overhead", which appears to be part of a list that should start with "1." However, there's no "1." visible in the provided diff. This comment may be misleading or incomplete without proper context.
```python
vectorizer = joblib.load(vectorizer_path)
clf = joblib.load(model_path)
```

Variable `vectorizer` is not used.

Suggested change:

```diff
-vectorizer = joblib.load(vectorizer_path)
-clf = joblib.load(model_path)
+joblib.load(vectorizer_path)
+joblib.load(model_path)
```
```python
vectorizer = joblib.load(vectorizer_path)
clf = joblib.load(model_path)
```

Variable `clf` is not used.

Suggested change:

```diff
-vectorizer = joblib.load(vectorizer_path)
-clf = joblib.load(model_path)
+_ = joblib.load(vectorizer_path)
+_ = joblib.load(model_path)
```
```python
# Subsequent calls (cache hits - should be nearly instant)
for i in range(10):
    start = time.perf_counter()
    vectorizer2, clf2 = load_vectorizer_and_model()
```

Variable `vectorizer2` is not used.

Suggested change:

```diff
-    vectorizer2, clf2 = load_vectorizer_and_model()
+    load_vectorizer_and_model()
```
```python
# Subsequent calls (cache hits - should be nearly instant)
for i in range(10):
    start = time.perf_counter()
    vectorizer2, clf2 = load_vectorizer_and_model()
```

Variable `clf2` is not used.

Suggested change:

```diff
-    vectorizer2, clf2 = load_vectorizer_and_model()
+    load_vectorizer_and_model()
```
```python
def test_cache_consistency_across_predictions():
    """Test that cache works correctly for multiple load operations."""
    from src.triage.model import load_vectorizer_and_model
    import src.triage.model as model_module
```

Module 'src.triage.model' is imported with both 'import' and 'import from'.
```python
def test_cli_load_artifacts_uses_cache():
    """Test that CLI's load_artifacts() function uses the cache."""
    from src.triage.cli import load_artifacts
    import src.triage.model as model_module
```

Module 'src.triage.model' is imported with both 'import' and 'import from'.
```python
def test_cache_memory_stability():
    """Test that repeated calls don't cause memory issues."""
    from src.triage.model import load_vectorizer_and_model
    import src.triage.model as model_module
```

Module 'src.triage.model' is imported with both 'import' and 'import from'.
```python
def test_cache_invalidation_on_reset():
    """Test that cache can be manually invalidated."""
    from src.triage.model import load_vectorizer_and_model
    import src.triage.model as model_module
```

Module 'src.triage.model' is imported with both 'import' and 'import from'.
```python
def test_concurrent_predictions_use_cache():
    """Test that cache provides performance benefit for repeated loads."""
    from src.triage.model import load_vectorizer_and_model
    import src.triage.model as model_module
```

Module 'src.triage.model' is imported with both 'import' and 'import from'.
texasbe2trill left a comment:

@KnightofInd Please review Copilot's review and the comments left on #28. I'll complete my review once the changes are implemented. There's also a conflict with the main branch. Did you run git pull or sync your fork before implementing and opening this PR? Unless there were changes to notebooks/00_getting_started_tutorial.ipynb, this PR is attempting to merge another ~5k lines of code. Once the changes are implemented, I'll review #28 for squash and merge into main. Thanks again for the help!
Implemented module-level cache to prevent repeated disk I/O. Models load once per process, providing massive speedup for batch operations.
Resolving the conflicts
- Modified load_artifacts() to dynamically resolve model module from sys.modules
- Fixes test_cli_load_artifacts_uses_cache failure in CI environment
- Handles both 'triage.model' (installed) and 'src.triage.model' (dev) import paths
- Ensures cache works correctly even when tests reload modules with importlib.reload()
- Resolves issue where cached imports referenced old module instances after reload
…olution The test suite uses src.triage.model and may reload it with importlib.reload(). When both triage.model and src.triage.model exist in sys.modules (after pip install -e .), we must check src.triage.model FIRST to get the correct reloaded module instance. This ensures load_artifacts() always uses the same module instance that tests manipulate.
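The resolution order described in this commit message could be sketched roughly as below; `resolve_module` and its parameters are illustrative names, not code from the PR:

```python
import importlib
import sys


def resolve_module(candidates=("src.triage.model", "triage.model")):
    """Return the first candidate already present in sys.modules.

    Checking 'src.triage.model' before 'triage.model' means that when a
    test reloads the dev-path module, callers see that same (reloaded)
    module object. Falls back to a normal import when neither is loaded.
    """
    for name in candidates:
        mod = sys.modules.get(name)
        if mod is not None:
            return mod
    return importlib.import_module(candidates[-1])
```

Resolving through `sys.modules` at call time, rather than binding the module at import time, is what keeps the cache and the tests pointed at the same module instance after `importlib.reload()`.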
- Changed test to import triage.model (when installed) instead of always using src.triage.model
- Ensures test checks the same module object that load_artifacts() uses
- Fixes test failure where triage.model and src.triage.model were different objects in CI
Hi @texasbe2trill, PR #28 has one test failing in CI but passing locally: `test_cli_load_artifacts_uses_cache`.
Issue:
Options:
- Remove this test (it's redundant since we already test caching directly)
Which approach would you prefer? Thanks!
@KnightofInd I'll take a look into this, but I'm thinking we can remove the redundant test for this implementation.

@KnightofInd I'll review this in the morning before squashing and merging into main. Thanks again!
Pull request overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated 19 comments.
Comments suppressed due to low confidence (1)
src/triage/model.py:47
- The module-level caching using global variables is not thread-safe. While the documentation claims "Thread-safe enough: Python GIL protects reads/writes" (line 189), this is misleading. The GIL only prevents race conditions for individual operations, but the check-then-set pattern in `load_vectorizer_and_model()` (lines 46-47) is not atomic.
If two threads call this function simultaneously when the cache is empty, both could pass the check on line 46 before either populates the cache, resulting in duplicate loads. While this might be acceptable for the current single-threaded CLI use case, the code should either:
- Add a threading.Lock to protect the cache access
- Document clearly that this is NOT thread-safe and should only be used in single-threaded contexts
- Use threading.local() if thread-safety is desired without locks
```python
global _VECTORIZER, _MODEL
if _VECTORIZER is not None and _MODEL is not None:
    return _VECTORIZER, _MODEL
```
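If thread-safety were needed, the reviewer's first option (a `threading.Lock`) is commonly implemented as a double-checked pattern. A minimal sketch under that assumption, with `_load` standing in for `joblib.load` and hypothetical filenames:

```python
import threading

_LOCK = threading.Lock()
_VECTORIZER = None
_MODEL = None


def _load(name):
    return f"artifact:{name}"  # placeholder for joblib.load(path)


def load_vectorizer_and_model():
    global _VECTORIZER, _MODEL
    if _VECTORIZER is not None and _MODEL is not None:
        return _VECTORIZER, _MODEL  # fast path: no lock on cache hits
    with _LOCK:
        # Re-check inside the lock: another thread may have populated
        # the cache while this thread waited to acquire it.
        if _VECTORIZER is None or _MODEL is None:
            _VECTORIZER = _load("vectorizer.joblib")
            _MODEL = _load("model.joblib")
    return _VECTORIZER, _MODEL
```

The re-check inside the lock is what closes the race the reviewer describes: without it, two threads that both pass the outer `None` check would each load the artifacts.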
```python
model_module._VECTORIZER = None
model_module._MODEL = None
```
The global variable names _VECTORIZER and _MODEL are intended to be private (indicated by the leading underscore), but the tests directly access and modify them. This violates encapsulation and makes the tests tightly coupled to the implementation details.
Consider adding a public function to reset the cache for testing purposes, such as clear_cache() or reset_cache(), which would make the API cleaner and the tests less brittle if the internal variable names change in the future.
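A minimal sketch of such a public reset helper; `clear_cache` and `_load` are illustrative names, not code from the PR:

```python
_VECTORIZER = None
_MODEL = None


def _load(name):
    return object()  # placeholder for joblib.load(path)


def load_vectorizer_and_model():
    global _VECTORIZER, _MODEL
    if _VECTORIZER is None or _MODEL is None:
        _VECTORIZER = _load("vectorizer.joblib")
        _MODEL = _load("model.joblib")
    return _VECTORIZER, _MODEL


def clear_cache():
    """Public cache reset: the next load re-reads artifacts from disk."""
    global _VECTORIZER, _MODEL
    _VECTORIZER = None
    _MODEL = None
```

Tests would then call `clear_cache()` between cases instead of poking `_VECTORIZER` and `_MODEL` directly, so renaming the internals would not break the suite.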
```python
original_joblib_load = None
load_count = {"count": 0}

import joblib
original_joblib_load = joblib.load
```

The variable `original_joblib_load` is initialized to None on line 26 and then immediately reassigned on line 30. The initial assignment is unnecessary and can be removed for cleaner code.
```python
# We expect at least 1000x improvement based on benchmark results
speedup = first_time / avg_cached_time if avg_cached_time > 0 else 0

print(f"\n First load: {first_time*1000:.2f} ms")
print(f" Avg cached load: {avg_cached_time*1000:.4f} ms")
print(f" Speedup: {speedup:.0f}x")

# Very conservative check - cache should be at least 100x faster
assert speedup > 100, \
    f"Cache not providing significant speedup: {speedup:.1f}x (expected >100x)"
```
The performance test assumes at least 100x speedup but uses time.perf_counter() which can be unreliable for very fast operations (sub-millisecond). On some systems or under heavy load, the cached load time could fluctuate enough to cause this test to be flaky.
Consider either:
- Reducing the threshold to something more conservative (e.g., 10x or 50x)
- Running the cached operations multiple times and averaging to get more stable measurements
- Adding a tolerance range instead of a hard threshold
Suggested change:

```diff
-# We expect at least 1000x improvement based on benchmark results
+# We expect a substantial improvement, but use a conservative threshold
 speedup = first_time / avg_cached_time if avg_cached_time > 0 else 0
 print(f"\n First load: {first_time*1000:.2f} ms")
 print(f" Avg cached load: {avg_cached_time*1000:.4f} ms")
 print(f" Speedup: {speedup:.0f}x")
-# Very conservative check - cache should be at least 100x faster
-assert speedup > 100, \
-    f"Cache not providing significant speedup: {speedup:.1f}x (expected >100x)"
+# Conservative check - cache should be at least 50x faster
+assert speedup > 50, \
+    f"Cache not providing significant speedup: {speedup:.1f}x (expected >50x)"
```
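One way to get the more stable measurements the reviewer suggests is to time many runs and take the median, which is far less sensitive to scheduler noise than a single `perf_counter()` sample. A small sketch (not part of the PR's benchmark script; `median_time` is a hypothetical helper):

```python
import statistics
import time


def median_time(fn, repeats=30):
    """Median wall-clock duration of fn() over several runs.

    The median discards outliers caused by GC pauses or OS scheduling,
    so a speedup ratio computed from it is much less flaky in CI.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

The flaky assertion could then compare `first_time / median_time(load_fn)` against the threshold instead of averaging raw per-call timings.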
```python
    from triage import model as model_module
except ImportError:
    from src.triage import model as model_module
```

The module resolution logic here is overly complex and could behave unexpectedly. The code checks sys.modules first and then falls back to imports, but this creates potential issues:
- If a module is partially imported or in an error state, it could still be in `sys.modules` but unusable
- The fallback import order (triage first, then src.triage) is opposite to the sys.modules check order
- This complexity suggests the import structure may need refactoring
Consider simplifying this to use a consistent import approach, or at minimum, ensure the fallback order matches the sys.modules check order for consistency.
Suggested change:

```diff
-    from triage import model as model_module
-except ImportError:
-    from src.triage import model as model_module
+    from src.triage import model as model_module
+except ImportError:
+    from triage import model as model_module
```
```markdown
1. **Simple**: No additional dependencies or complexity
2. **Effective**: Perfect for single-threaded CLI usage
3. **Process-scoped**: Automatic cleanup on process exit
4. **Thread-safe enough**: Python GIL protects reads/writes
```
The claim "Thread-safe enough: Python GIL protects reads/writes" is misleading and technically incorrect. While the GIL prevents race conditions on individual operations, it does NOT make the check-then-set pattern in the caching logic thread-safe. Two threads could both pass the None check before either sets the cache, resulting in duplicate loads.
This should be updated to clearly state that the caching is NOT thread-safe and is designed for single-threaded CLI usage only. If thread-safety is needed in the future, a lock would be required.
Suggested change:

```diff
-4. **Thread-safe enough**: Python GIL protects reads/writes
+4. **Single-threaded only**: Cache is *not* thread-safe; intended for CLI usage in a single process. A lock would be required for safe multi-threaded use.
```
```python
    4. Cache works correctly in CLI context
    """

    import sys
```

Import of 'sys' is not used.

Suggested change:

```diff
-    import sys
```
```python
    """

    import sys
    import importlib
```

Import of 'importlib' is not used.

Suggested change:

```diff
-    import importlib
```
🚀 Performance Optimization: Model Caching
Summary
Implements module-level caching for ML model artifacts, eliminating repeated disk I/O operations and providing massive performance improvements for batch processing scenarios.
Problem
Model artifacts (vectorizer + ML model) were loaded from disk on every CLI invocation, causing:
Solution
Implemented simple module-level cache using global variables:
Performance Gains
Changes
- `src/triage/cli.py` - Use cached loader in `load_artifacts()`
- `src/triage/model.py` - Enhanced documentation for caching behavior
- `scripts/benchmark_model_loading.py` - Performance measurement tool
- `tests/test_model_caching.py` - 7 comprehensive tests (all passing)
- `docs/performance.md` - Complete optimization guide
- `README.md` - Added performance feature references
Testing
Acceptance Criteria
Breaking Changes
None - fully backward compatible
Documentation
See docs/performance.md for complete guide including:
Closes #17