Updates for issue #17 (#28)

Open

KnightofInd wants to merge 7 commits into texasbe2trill:main from KnightofInd:main

Conversation

@KnightofInd
Contributor

🚀 Performance Optimization: Model Caching

Summary

Implements module-level caching for ML model artifacts, eliminating repeated disk I/O operations and providing massive performance improvements for batch processing scenarios.

Problem

Model artifacts (vectorizer + ML model) were loaded from disk on every CLI invocation, causing:

  • ~300ms overhead per load
  • ~30 seconds wasted for 100 incident batch processing
  • Unnecessary disk I/O stress

Solution

Implemented simple module-level cache using global variables:

  • Load models once on first call
  • Reuse cached objects for subsequent calls within same process
  • Automatic cleanup on process exit

Performance Gains

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| First load | 296.14 ms | 21.28 ms | 13.9x faster |
| Cached load | 296.14 ms | 0.0009 ms | 22,885x faster |
| 100-incident batch | ~29.6 s | ~0.02 s | 1,480x faster |
| Memory growth (50 calls) | N/A | +0.00 MB | ✅ Stable |

Changes

  • src/triage/cli.py - Use cached loader in load_artifacts()
  • src/triage/model.py - Enhanced documentation for caching behavior
  • scripts/benchmark_model_loading.py - Performance measurement tool
  • tests/test_model_caching.py - 7 comprehensive tests (all passing)
  • docs/performance.md - Complete optimization guide
  • README.md - Added performance feature references

Testing

  • ✅ 7/7 caching tests passing
  • ✅ Zero memory leaks (verified with 50+ calls)
  • ✅ Cache prevents disk reloading
  • ✅ Returns identical object references
  • ✅ CLI integration working correctly
  • ✅ Performance improvement verified by benchmark

Acceptance Criteria

  • Add model caching to prevent reloading
  • Measure startup time before and after
  • Document performance improvement
  • No memory leaks (verified with multiple invocations)
  • All tests pass
  • Update/create docs/performance.md
  • Include cache invalidation strategy

Breaking Changes

None - fully backward compatible

Documentation

See docs/performance.md for complete guide including:

  • Architecture details
  • Benchmark methodology
  • Cache invalidation strategies
  • Troubleshooting guide
  • Future enhancements

Closes #17


Copilot AI left a comment


Pull request overview

This PR implements module-level caching for ML model artifacts to address issue #17, delivering massive performance improvements for batch processing scenarios.

Summary: The PR adds a simple but highly effective caching mechanism using module-level global variables that load models once and reuse them for subsequent calls within the same Python process. This eliminates repeated disk I/O operations.

Key changes:

  • Implemented module-level cache in src/triage/model.py with global variables _VECTORIZER and _MODEL
  • Updated src/triage/cli.py to use cached loader instead of direct joblib.load()
  • Added comprehensive test suite with 7 tests covering caching behavior, memory stability, and performance
  • Created benchmark tool to measure performance improvements
  • Added extensive documentation in docs/performance.md covering architecture, results, and troubleshooting
  • Created beginner-friendly tutorial notebook notebooks/00_getting_started_tutorial.ipynb

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 18 comments.

Summary per file:

  • tests/test_model_caching.py: Comprehensive test suite (7 tests) verifying cache behavior, memory stability, CLI integration, and performance gains
  • src/triage/model.py: Enhanced docstring for load_vectorizer_and_model() explaining caching behavior, benefits, and invalidation strategy
  • src/triage/cli.py: Updated load_artifacts() to use the cached load_vectorizer_and_model() function instead of direct file loading
  • scripts/benchmark_model_loading.py: Performance measurement tool comparing baseline vs cached loading, with memory leak detection
  • notebooks/00_getting_started_tutorial.ipynb: Interactive tutorial for new users covering setup, predictions, visualizations, and uncertainty analysis
  • docs/performance.md: Complete optimization guide with architecture details, benchmark results, troubleshooting, and best practices
  • docs/notebooks.md: Updated to include the new tutorial notebook as the recommended starting point
  • README.md: Added performance optimization feature and documentation reference


Comment thread src/triage/model.py
# Module-level cache for vectorizer and model artifacts
# These persist for the lifetime of the Python process, enabling:
# 1. Fast repeated predictions in batch mode
# 2. Reduced disk I/O overhead
Copilot AI Dec 27, 2025

The comment on line 11 states "2. Reduced disk I/O overhead", but this appears to be part of a list that should start with "1." However, there is no "1." visible in the provided diff, so the comment may be misleading or incomplete without proper context.
Comment thread scripts/benchmark_model_loading.py (outdated)
Comment on lines +60 to +61

    vectorizer = joblib.load(vectorizer_path)
    clf = joblib.load(model_path)

Copilot AI Dec 27, 2025

Variable vectorizer is not used.

Suggested change:

    -vectorizer = joblib.load(vectorizer_path)
    -clf = joblib.load(model_path)
    +joblib.load(vectorizer_path)
    +joblib.load(model_path)
Comment thread scripts/benchmark_model_loading.py (outdated)
Comment on lines +60 to +61

    vectorizer = joblib.load(vectorizer_path)
    clf = joblib.load(model_path)

Copilot AI Dec 27, 2025

Variable clf is not used.

Suggested change:

    -vectorizer = joblib.load(vectorizer_path)
    -clf = joblib.load(model_path)
    +_ = joblib.load(vectorizer_path)
    +_ = joblib.load(model_path)
Comment thread scripts/benchmark_model_loading.py (outdated)

    # Subsequent calls (cache hits - should be nearly instant)
    for i in range(10):
        start = time.perf_counter()
        vectorizer2, clf2 = load_vectorizer_and_model()

Copilot AI Dec 27, 2025

Variable vectorizer2 is not used.

Suggested change:

    -vectorizer2, clf2 = load_vectorizer_and_model()
    +load_vectorizer_and_model()
Comment thread scripts/benchmark_model_loading.py (outdated)

    # Subsequent calls (cache hits - should be nearly instant)
    for i in range(10):
        start = time.perf_counter()
        vectorizer2, clf2 = load_vectorizer_and_model()

Copilot AI Dec 27, 2025

Variable clf2 is not used.

Suggested change:

    -vectorizer2, clf2 = load_vectorizer_and_model()
    +load_vectorizer_and_model()
    def test_cache_consistency_across_predictions():
        """Test that cache works correctly for multiple load operations."""
        from src.triage.model import load_vectorizer_and_model
        import src.triage.model as model_module

Copilot AI Dec 27, 2025

Module 'src.triage.model' is imported with both 'import' and 'import from'.
Comment thread tests/test_model_caching.py (outdated)

    def test_cli_load_artifacts_uses_cache():
        """Test that CLI's load_artifacts() function uses the cache."""
        from src.triage.cli import load_artifacts
        import src.triage.model as model_module

Copilot AI Dec 27, 2025

Module 'src.triage.model' is imported with both 'import' and 'import from'.
    def test_cache_memory_stability():
        """Test that repeated calls don't cause memory issues."""
        from src.triage.model import load_vectorizer_and_model
        import src.triage.model as model_module

Copilot AI Dec 27, 2025

Module 'src.triage.model' is imported with both 'import' and 'import from'.
    def test_cache_invalidation_on_reset():
        """Test that cache can be manually invalidated."""
        from src.triage.model import load_vectorizer_and_model
        import src.triage.model as model_module

Copilot AI Dec 27, 2025

Module 'src.triage.model' is imported with both 'import' and 'import from'.
    def test_concurrent_predictions_use_cache():
        """Test that cache provides performance benefit for repeated loads."""
        from src.triage.model import load_vectorizer_and_model
        import src.triage.model as model_module

Copilot AI Dec 27, 2025

Module 'src.triage.model' is imported with both 'import' and 'import from'.
@texasbe2trill texasbe2trill self-requested a review December 27, 2025 18:03
@texasbe2trill (Owner) left a comment

@KnightofInd Please review Copilot's review and the comments left on #28; I'll complete my review once the changes are implemented. There's also a conflict with the main branch. Did you run git pull or sync your fork before implementing and opening this PR? Unless there were changes to notebooks/00_getting_started_tutorial.ipynb, this PR is attempting to merge another 5k lines or so of code. Once the changes are implemented, I'll review #28 for squash and merge into main. Thanks again for the help!

KnightofInd added a commit to KnightofInd/AlertSage that referenced this pull request Dec 28, 2025

KnightofInd and others added 3 commits December 28, 2025 11:45:

Implemented module-level cache to prevent repeated disk I/O. Models load once per process, providing massive speedup for batch operations.

- Modified load_artifacts() to dynamically resolve the model module from sys.modules
- Fixes test_cli_load_artifacts_uses_cache failure in the CI environment
- Handles both 'triage.model' (installed) and 'src.triage.model' (dev) import paths
- Ensures the cache works correctly even when tests reload modules with importlib.reload()
- Resolves an issue where cached imports referenced old module instances after reload

…olution

The test suite uses src.triage.model and may reload it with importlib.reload(). When both triage.model and src.triage.model exist in sys.modules (after pip install -e .), we must check src.triage.model FIRST to get the correct reloaded module instance. This ensures load_artifacts() always uses the same module instance that tests manipulate.

- Changed the test to import triage.model (when installed) instead of always using src.triage.model
- Ensures the test checks the same module object that load_artifacts() uses
- Fixes a test failure where triage.model and src.triage.model were different objects in CI
@KnightofInd (Contributor, Author)

Hi @texasbe2trill,

PR #28 has one test failing in CI but passing locally: test_cli_load_artifacts_uses_cache

Issue:
The test checks if load_artifacts() uses caching, but fails in CI due to module import resolution differences between environments. All other caching tests (5 tests) pass and confirm the underlying load_vectorizer_and_model() caching works correctly.

Options:

  1. Remove this test (it's redundant since we already test caching directly)
  2. Simplify the load_artifacts() import logic

The caching feature itself works; just this one wrapper test has environment-specific issues.

Which approach would you prefer?

Thanks!

@texasbe2trill (Owner)

@KnightofInd I’ll take a look into this but thinking we can remove the redundant test for this implementation.

@texasbe2trill texasbe2trill moved this to Backlog in AlertSage Jan 3, 2026
@texasbe2trill texasbe2trill moved this from Backlog to In progress in AlertSage Jan 3, 2026
@texasbe2trill texasbe2trill moved this from In progress to In review in AlertSage Jan 3, 2026
@texasbe2trill (Owner)

@KnightofInd I'll review this in the morning before squash and merging into main. Thanks again!

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 19 comments.

Comments suppressed due to low confidence (1)

src/triage/model.py:47

The module-level caching using global variables is not thread-safe. While the documentation's comment on line 189 says "Thread-safe enough: Python GIL protects reads/writes", this is misleading. The GIL only prevents race conditions for individual operations; the check-then-set pattern in load_vectorizer_and_model() (lines 46-47) is not atomic.

If two threads call this function simultaneously when the cache is empty, both could pass the check on line 46 before either populates the cache, resulting in duplicate loads. While this might be acceptable for the current single-threaded CLI use case, the code should either:

  1. Add a threading.Lock to protect the cache access
  2. Document clearly that this is NOT thread-safe and should only be used in single-threaded contexts
  3. Use threading.local() if thread-safety is desired without locks

    global _VECTORIZER, _MODEL

    if _VECTORIZER is not None and _MODEL is not None:
        return _VECTORIZER, _MODEL


Comment on lines +22 to +23

    model_module._VECTORIZER = None
    model_module._MODEL = None

Copilot AI Jan 5, 2026

The global variable names _VECTORIZER and _MODEL are intended to be private (indicated by the leading underscore), but the tests directly access and modify them. This violates encapsulation and makes the tests tightly coupled to implementation details.

Consider adding a public function to reset the cache for testing purposes, such as clear_cache() or reset_cache(), which would make the API cleaner and the tests less brittle if the internal variable names change in the future.
Comment on lines +26 to +30

    original_joblib_load = None
    load_count = {"count": 0}

    import joblib
    original_joblib_load = joblib.load

Copilot AI Jan 5, 2026

The variable original_joblib_load is initialized to None on line 26 and then immediately reassigned on line 30. The initial assignment is unnecessary and can be removed for cleaner code.
Comment on lines +218 to +227

    # We expect at least 1000x improvement based on benchmark results
    speedup = first_time / avg_cached_time if avg_cached_time > 0 else 0

    print(f"\n  First load: {first_time*1000:.2f} ms")
    print(f"  Avg cached load: {avg_cached_time*1000:.4f} ms")
    print(f"  Speedup: {speedup:.0f}x")

    # Very conservative check - cache should be at least 100x faster
    assert speedup > 100, \
        f"Cache not providing significant speedup: {speedup:.1f}x (expected >100x)"

Copilot AI Jan 5, 2026

The performance test assumes at least 100x speedup but uses time.perf_counter(), which can be unreliable for very fast (sub-millisecond) operations. On some systems or under heavy load, the cached load time could fluctuate enough to make this test flaky.

Consider either:

  1. Reducing the threshold to something more conservative (e.g., 10x or 50x)
  2. Running the cached operations multiple times and averaging to get more stable measurements
  3. Adding a tolerance range instead of a hard threshold

Suggested change:

    -# We expect at least 1000x improvement based on benchmark results
    +# We expect a substantial improvement, but use a conservative threshold
    -# Very conservative check - cache should be at least 100x faster
    -assert speedup > 100, \
    -    f"Cache not providing significant speedup: {speedup:.1f}x (expected >100x)"
    +# Conservative check - cache should be at least 50x faster
    +assert speedup > 50, \
    +    f"Cache not providing significant speedup: {speedup:.1f}x (expected >50x)"
Comment thread src/triage/cli.py
Comment on lines +1176 to +1178

        from triage import model as model_module
    except ImportError:
        from src.triage import model as model_module

Copilot AI Jan 5, 2026

The module resolution logic here is overly complex and could behave unexpectedly. The code checks sys.modules first and then falls back to imports, but this creates potential issues:

  1. If a module is partially imported or in an error state, it could still be in sys.modules but unusable
  2. The fallback import order (triage first, then src.triage) is the opposite of the sys.modules check order
  3. This complexity suggests the import structure may need refactoring

Consider simplifying this to use a consistent import approach, or at minimum, ensure the fallback order matches the sys.modules check order for consistency.

Suggested change:

    -    from triage import model as model_module
    -except ImportError:
    -    from src.triage import model as model_module
    +    from src.triage import model as model_module
    +except ImportError:
    +    from triage import model as model_module
Comment thread docs/performance.md

    1. **Simple**: No additional dependencies or complexity
    2. **Effective**: Perfect for single-threaded CLI usage
    3. **Process-scoped**: Automatic cleanup on process exit
    4. **Thread-safe enough**: Python GIL protects reads/writes

Copilot AI Jan 5, 2026

The claim "Thread-safe enough: Python GIL protects reads/writes" is misleading and technically incorrect. While the GIL prevents race conditions on individual operations, it does NOT make the check-then-set pattern in the caching logic thread-safe. Two threads could both pass the None check before either sets the cache, resulting in duplicate loads.

This should be updated to clearly state that the caching is NOT thread-safe and is designed for single-threaded CLI usage only. If thread-safety is needed in the future, a lock would be required.

Suggested change:

    -4. **Thread-safe enough**: Python GIL protects reads/writes
    +4. **Single-threaded only**: Cache is *not* thread-safe; intended for CLI usage in a single process. A lock would be required for safe multi-threaded use.
    def test_cache_memory_stability():
        """Test that repeated calls don't cause memory issues."""
        from src.triage.model import load_vectorizer_and_model
        import src.triage.model as model_module

Copilot AI Jan 5, 2026

Module 'src.triage.model' is imported with both 'import' and 'import from'.
    def test_cache_invalidation_on_reset():
        """Test that cache can be manually invalidated."""
        from src.triage.model import load_vectorizer_and_model
        import src.triage.model as model_module

Copilot AI Jan 5, 2026

Module 'src.triage.model' is imported with both 'import' and 'import from'.
    def test_concurrent_predictions_use_cache():
        """Test that cache provides performance benefit for repeated loads."""
        from src.triage.model import load_vectorizer_and_model
        import src.triage.model as model_module

Copilot AI Jan 5, 2026

Module 'src.triage.model' is imported with both 'import' and 'import from'.
    4. Cache works correctly in CLI context
    """

    import sys

Copilot AI Jan 5, 2026

Import of 'sys' is not used.

Suggested change:

    -import sys
"""

import sys
import importlib
Copy link

Copilot AI Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'importlib' is not used.

Suggested change
import importlib

Copilot uses AI. Check for mistakes.

Labels

None yet

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

Optimize Model Loading Performance

3 participants