Conversation
Pull request overview
This PR implements module-level caching for ML model artifacts to address issue #17, delivering massive performance improvements for batch processing scenarios.
Summary: The PR adds a simple but highly effective caching mechanism using module-level global variables that load models once and reuse them for subsequent calls within the same Python process. This eliminates repeated disk I/O operations.
Key changes:
- Implemented module-level cache in `src/triage/model.py` with global variables `_VECTORIZER` and `_MODEL`
- Updated `src/triage/cli.py` to use the cached loader instead of direct `joblib.load()`
- Added comprehensive test suite with 7 tests covering caching behavior, memory stability, and performance
- Created benchmark tool to measure performance improvements
- Added extensive documentation in `docs/performance.md` covering architecture, results, and troubleshooting
- Created beginner-friendly tutorial notebook `notebooks/00_getting_started_tutorial.ipynb`
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 18 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_model_caching.py | Comprehensive test suite (7 tests) verifying cache behavior, memory stability, CLI integration, and performance gains |
| src/triage/model.py | Enhanced docstring for load_vectorizer_and_model() explaining caching behavior, benefits, and invalidation strategy |
| src/triage/cli.py | Updated load_artifacts() to use cached load_vectorizer_and_model() function instead of direct file loading |
| scripts/benchmark_model_loading.py | Performance measurement tool comparing baseline vs cached loading with memory leak detection |
| notebooks/00_getting_started_tutorial.ipynb | Interactive tutorial for new users covering setup, predictions, visualizations, and uncertainty analysis |
| docs/performance.md | Complete optimization guide with architecture details, benchmark results, troubleshooting, and best practices |
| docs/notebooks.md | Updated to include new tutorial notebook as recommended starting point |
| README.md | Added performance optimization feature and documentation reference |
```python
# Module-level cache for vectorizer and model artifacts
# These persist for the lifetime of the Python process, enabling:
# 1. Fast repeated predictions in batch mode
# 2. Reduced disk I/O overhead
```

The comment on line 11 states "2. Reduced disk I/O overhead", which appears to be part of a list that should start with "1." However, there's no "1." visible in the provided diff. This comment may be misleading or incomplete without proper context.
```python
vectorizer = joblib.load(vectorizer_path)
clf = joblib.load(model_path)
```

Variable `vectorizer` is not used.

Suggested change:

```diff
-vectorizer = joblib.load(vectorizer_path)
-clf = joblib.load(model_path)
+joblib.load(vectorizer_path)
+joblib.load(model_path)
```
```python
vectorizer = joblib.load(vectorizer_path)
clf = joblib.load(model_path)
```

Variable `clf` is not used.

Suggested change:

```diff
-vectorizer = joblib.load(vectorizer_path)
-clf = joblib.load(model_path)
+_ = joblib.load(vectorizer_path)
+_ = joblib.load(model_path)
```
```python
# Subsequent calls (cache hits - should be nearly instant)
for i in range(10):
    start = time.perf_counter()
    vectorizer2, clf2 = load_vectorizer_and_model()
```

Variable `vectorizer2` is not used.

Suggested change:

```diff
-    vectorizer2, clf2 = load_vectorizer_and_model()
+    load_vectorizer_and_model()
```
```python
# Subsequent calls (cache hits - should be nearly instant)
for i in range(10):
    start = time.perf_counter()
    vectorizer2, clf2 = load_vectorizer_and_model()
```

Variable `clf2` is not used.

Suggested change:

```diff
-    vectorizer2, clf2 = load_vectorizer_and_model()
+    load_vectorizer_and_model()
```
```python
def test_cache_consistency_across_predictions():
    """Test that cache works correctly for multiple load operations."""
    from src.triage.model import load_vectorizer_and_model
    import src.triage.model as model_module
```

Module 'src.triage.model' is imported with both 'import' and 'import from'.
```python
def test_cli_load_artifacts_uses_cache():
    """Test that CLI's load_artifacts() function uses the cache."""
    from src.triage.cli import load_artifacts
    import src.triage.model as model_module
```

Module 'src.triage.model' is imported with both 'import' and 'import from'.
```python
def test_cache_memory_stability():
    """Test that repeated calls don't cause memory issues."""
    from src.triage.model import load_vectorizer_and_model
    import src.triage.model as model_module
```

Module 'src.triage.model' is imported with both 'import' and 'import from'.
```python
def test_cache_invalidation_on_reset():
    """Test that cache can be manually invalidated."""
    from src.triage.model import load_vectorizer_and_model
    import src.triage.model as model_module
```

Module 'src.triage.model' is imported with both 'import' and 'import from'.
```python
def test_concurrent_predictions_use_cache():
    """Test that cache provides performance benefit for repeated loads."""
    from src.triage.model import load_vectorizer_and_model
    import src.triage.model as model_module
```

Module 'src.triage.model' is imported with both 'import' and 'import from'.
texasbe2trill left a comment:

@KnightofInd Please review Copilot's review and the comments left on #28. I'll complete my review once the changes are implemented. There's also a conflict with the main branch. Did you run git pull or sync your fork before implementing and opening this PR? Unless there were changes to notebooks/00_getting_started_tutorial.ipynb, this PR is attempting to merge another ~5k lines of code. Once the changes are implemented, I'll review #28 for squash and merge into main. Thanks again for the help!
Implemented module-level cache to prevent repeated disk I/O. Models load once per process, providing massive speedup for batch operations.
Resolving the conflicts
- Modified load_artifacts() to dynamically resolve model module from sys.modules
- Fixes test_cli_load_artifacts_uses_cache failure in CI environment
- Handles both 'triage.model' (installed) and 'src.triage.model' (dev) import paths
- Ensures cache works correctly even when tests reload modules with importlib.reload()
- Resolves issue where cached imports referenced old module instances after reload
…olution The test suite uses src.triage.model and may reload it with importlib.reload(). When both triage.model and src.triage.model exist in sys.modules (after pip install -e .), we must check src.triage.model FIRST to get the correct reloaded module instance. This ensures load_artifacts() always uses the same module instance that tests manipulate.
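The resolution order described in this commit message could be sketched roughly as below; `resolve_module` and its parameters are illustrative names, not code from the PR:

```python
import importlib
import sys


def resolve_module(candidates=("src.triage.model", "triage.model")):
    """Return the first candidate already present in sys.modules.

    Checking 'src.triage.model' before 'triage.model' means that when a
    test reloads the dev-path module, callers see that same (reloaded)
    module object. Falls back to a normal import when neither is loaded.
    """
    for name in candidates:
        mod = sys.modules.get(name)
        if mod is not None:
            return mod
    return importlib.import_module(candidates[-1])
```

Resolving through `sys.modules` at call time, rather than binding the module at import time, is what keeps the cache and the tests pointed at the same module instance after `importlib.reload()`.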
- Changed test to import triage.model (when installed) instead of always using src.triage.model
- Ensures test checks the same module object that load_artifacts() uses
- Fixes test failure where triage.model and src.triage.model were different objects in CI
Hi @texasbe2trill, PR #28 has one test failing in CI but passing locally: `test_cli_load_artifacts_uses_cache`.
Issue:
Options:
- Remove this test (it's redundant since we already test caching directly)
Which approach would you prefer? Thanks!
@KnightofInd I'll take a look into this, but I'm thinking we can remove the redundant test for this implementation.

@KnightofInd I'll review this in the morning before squashing and merging into main. Thanks again!
Pull request overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated 19 comments.
Comments suppressed due to low confidence (1)
src/triage/model.py:47
- The module-level caching using global variables is not thread-safe. While the documentation claims "Thread-safe enough: Python GIL protects reads/writes" (line 189), this is misleading. The GIL only prevents race conditions for individual operations, but the check-then-set pattern in `load_vectorizer_and_model()` (lines 46-47) is not atomic.
If two threads call this function simultaneously when the cache is empty, both could pass the check on line 46 before either populates the cache, resulting in duplicate loads. While this might be acceptable for the current single-threaded CLI use case, the code should either:
- Add a threading.Lock to protect the cache access
- Document clearly that this is NOT thread-safe and should only be used in single-threaded contexts
- Use threading.local() if thread-safety is desired without locks
```python
global _VECTORIZER, _MODEL
if _VECTORIZER is not None and _MODEL is not None:
    return _VECTORIZER, _MODEL
```
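If thread-safety were needed, the reviewer's first option (a `threading.Lock`) is commonly implemented as a double-checked pattern. A minimal sketch under that assumption, with `_load` standing in for `joblib.load` and hypothetical filenames:

```python
import threading

_LOCK = threading.Lock()
_VECTORIZER = None
_MODEL = None


def _load(name):
    return f"artifact:{name}"  # placeholder for joblib.load(path)


def load_vectorizer_and_model():
    global _VECTORIZER, _MODEL
    if _VECTORIZER is not None and _MODEL is not None:
        return _VECTORIZER, _MODEL  # fast path: no lock on cache hits
    with _LOCK:
        # Re-check inside the lock: another thread may have populated
        # the cache while this thread waited to acquire it.
        if _VECTORIZER is None or _MODEL is None:
            _VECTORIZER = _load("vectorizer.joblib")
            _MODEL = _load("model.joblib")
    return _VECTORIZER, _MODEL
```

The re-check inside the lock is what closes the race the reviewer describes: without it, two threads that both pass the outer `None` check would each load the artifacts.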
```python
model_module._VECTORIZER = None
model_module._MODEL = None
```
The global variable names _VECTORIZER and _MODEL are intended to be private (indicated by the leading underscore), but the tests directly access and modify them. This violates encapsulation and makes the tests tightly coupled to the implementation details.
Consider adding a public function to reset the cache for testing purposes, such as clear_cache() or reset_cache(), which would make the API cleaner and the tests less brittle if the internal variable names change in the future.
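A minimal sketch of such a public reset helper; `clear_cache` and `_load` are illustrative names, not code from the PR:

```python
_VECTORIZER = None
_MODEL = None


def _load(name):
    return object()  # placeholder for joblib.load(path)


def load_vectorizer_and_model():
    global _VECTORIZER, _MODEL
    if _VECTORIZER is None or _MODEL is None:
        _VECTORIZER = _load("vectorizer.joblib")
        _MODEL = _load("model.joblib")
    return _VECTORIZER, _MODEL


def clear_cache():
    """Public cache reset: the next load re-reads artifacts from disk."""
    global _VECTORIZER, _MODEL
    _VECTORIZER = None
    _MODEL = None
```

Tests would then call `clear_cache()` between cases instead of poking `_VECTORIZER` and `_MODEL` directly, so renaming the internals would not break the suite.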
```python
original_joblib_load = None
load_count = {"count": 0}

import joblib
original_joblib_load = joblib.load
```

The variable `original_joblib_load` is initialized to None on line 26 and then immediately reassigned on line 30. The initial assignment is unnecessary and can be removed for cleaner code.
```python
# We expect at least 1000x improvement based on benchmark results
speedup = first_time / avg_cached_time if avg_cached_time > 0 else 0

print(f"\n First load: {first_time*1000:.2f} ms")
print(f" Avg cached load: {avg_cached_time*1000:.4f} ms")
print(f" Speedup: {speedup:.0f}x")

# Very conservative check - cache should be at least 100x faster
assert speedup > 100, \
    f"Cache not providing significant speedup: {speedup:.1f}x (expected >100x)"
```
The performance test assumes at least 100x speedup but uses time.perf_counter() which can be unreliable for very fast operations (sub-millisecond). On some systems or under heavy load, the cached load time could fluctuate enough to cause this test to be flaky.
Consider either:
- Reducing the threshold to something more conservative (e.g., 10x or 50x)
- Running the cached operations multiple times and averaging to get more stable measurements
- Adding a tolerance range instead of a hard threshold
Suggested change:

```diff
-# We expect at least 1000x improvement based on benchmark results
+# We expect a substantial improvement, but use a conservative threshold
 speedup = first_time / avg_cached_time if avg_cached_time > 0 else 0
 print(f"\n First load: {first_time*1000:.2f} ms")
 print(f" Avg cached load: {avg_cached_time*1000:.4f} ms")
 print(f" Speedup: {speedup:.0f}x")
-# Very conservative check - cache should be at least 100x faster
-assert speedup > 100, \
-    f"Cache not providing significant speedup: {speedup:.1f}x (expected >100x)"
+# Conservative check - cache should be at least 50x faster
+assert speedup > 50, \
+    f"Cache not providing significant speedup: {speedup:.1f}x (expected >50x)"
```
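One way to get the more stable measurements the reviewer suggests is to time many runs and take the median, which is far less sensitive to scheduler noise than a single `perf_counter()` sample. A small sketch (not part of the PR's benchmark script; `median_time` is a hypothetical helper):

```python
import statistics
import time


def median_time(fn, repeats=30):
    """Median wall-clock duration of fn() over several runs.

    The median discards outliers caused by GC pauses or OS scheduling,
    so a speedup ratio computed from it is much less flaky in CI.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

The flaky assertion could then compare `first_time / median_time(load_fn)` against the threshold instead of averaging raw per-call timings.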
```python
    from triage import model as model_module
except ImportError:
    from src.triage import model as model_module
```

The module resolution logic here is overly complex and could behave unexpectedly. The code checks sys.modules first and then falls back to imports, but this creates potential issues:
- If a module is partially imported or in an error state, it could still be in `sys.modules` but unusable
- The fallback import order (triage first, then src.triage) is opposite to the sys.modules check order
- This complexity suggests the import structure may need refactoring
Consider simplifying this to use a consistent import approach, or at minimum, ensure the fallback order matches the sys.modules check order for consistency.
Suggested change:

```diff
-    from triage import model as model_module
-except ImportError:
-    from src.triage import model as model_module
+    from src.triage import model as model_module
+except ImportError:
+    from triage import model as model_module
```
```markdown
1. **Simple**: No additional dependencies or complexity
2. **Effective**: Perfect for single-threaded CLI usage
3. **Process-scoped**: Automatic cleanup on process exit
4. **Thread-safe enough**: Python GIL protects reads/writes
```
The claim "Thread-safe enough: Python GIL protects reads/writes" is misleading and technically incorrect. While the GIL prevents race conditions on individual operations, it does NOT make the check-then-set pattern in the caching logic thread-safe. Two threads could both pass the None check before either sets the cache, resulting in duplicate loads.
This should be updated to clearly state that the caching is NOT thread-safe and is designed for single-threaded CLI usage only. If thread-safety is needed in the future, a lock would be required.
Suggested change:

```diff
-4. **Thread-safe enough**: Python GIL protects reads/writes
+4. **Single-threaded only**: Cache is *not* thread-safe; intended for CLI usage in a single process. A lock would be required for safe multi-threaded use.
```
```python
    4. Cache works correctly in CLI context
    """

    import sys
```

Import of 'sys' is not used.

Suggested change:

```diff
-    import sys
```
```python
    """

    import sys
    import importlib
```

Import of 'importlib' is not used.

Suggested change:

```diff
-    import importlib
```
🚀 Performance Optimization: Model Caching
Summary
Implements module-level caching for ML model artifacts, eliminating repeated disk I/O operations and providing massive performance improvements for batch processing scenarios.
Problem
Model artifacts (vectorizer + ML model) were loaded from disk on every CLI invocation, causing:
Solution
Implemented simple module-level cache using global variables:
Performance Gains
Changes
- `src/triage/cli.py` - Use cached loader in `load_artifacts()`
- `src/triage/model.py` - Enhanced documentation for caching behavior
- `scripts/benchmark_model_loading.py` - Performance measurement tool
- `tests/test_model_caching.py` - 7 comprehensive tests (all passing)
- `docs/performance.md` - Complete optimization guide
- `README.md` - Added performance feature references
Testing
Acceptance Criteria
Breaking Changes
None - fully backward compatible
Documentation
See docs/performance.md for complete guide including:
Closes #17