Refactor: Enhance inference testability, add --start-frame support, and improve error handling #149
Conversation
- Replaced assert checks in cli with ValueError + logging
- Extracted _read_frame out of the run_inference pipeline
- Integrated dependency injection in run_inference for mocked unit testing
- Added --start-frame capability
- Implemented offline-compatible unit tests mocking PyTorch logic
There are some smart ideas here, and I'm always a fan of keeping things more modular. Separating I/O from inference logic makes sense.
- Split long logger.warning line in clip_manager.py to fix E501
- Remove trailing whitespace in clip_manager.py and corridorkey_cli.py
- Remove global sys.modules torch/cv2 mocking in test_clip_manager.py that was poisoning torch for test_color_utils.py in the same session, causing TypeError on <= comparisons; use @patch decorators instead

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixed all three CI failures — removed the global sys.modules torch/cv2 mocking in test_clip_manager.py that was poisoning torch for test_color_utils.py in the same session.
shezmic
left a comment
Thanks for tackling this — engine_override for testing and --start-frame for resumable renders are both genuinely useful additions. A few things worth flagging before this lands:
Test regression
The new tests/test_clip_manager.py replaces 292 lines with 141. The deleted tests covered stable utility functions (is_image_file, is_video_file, map_path, ClipAsset frame counting, ClipEntry.find_assets, organize_target) that have no coverage elsewhere. Removing them drops coverage on code that hasn't changed. Would it be possible to keep the existing tests and add the new run_inference tests alongside them?
Mock / interface mismatch
The new tests assert mock_engine.process_frame.call_count == 2, but the engine interface uses .forward(), not .process_frame(). As written, the tests would pass even if run_inference never calls the engine at all — which means they wouldn't catch a regression.
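One way to make this class of mismatch fail loudly is to `spec` the mock against the real interface, so a typo'd method name raises instead of silently recording calls. A minimal sketch (the `Engine` class here is a hypothetical stand-in, not the project's actual engine class):

```python
from unittest.mock import MagicMock

class Engine:
    """Hypothetical stand-in for the real engine interface."""
    def process_frame(self, frame):
        ...

# An unconstrained MagicMock auto-creates any attribute, so configuring
# a wrong method name raises no error and records calls happily.
loose = MagicMock()
loose.forward.return_value = {"alpha": None}  # wrong name: silently accepted

# A spec'd mock only exposes the real interface.
strict = MagicMock(spec=Engine)
strict.process_frame.return_value = {"alpha": None}  # matches the interface
try:
    strict.forward  # not on the spec'd interface
    caught = False
except AttributeError:
    caught = True
assert caught
```

With `spec=Engine` on the mocks, an assertion against the wrong method name would have surfaced immediately rather than passing vacuously.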
Atomicity
This PR bundles four independent concerns:
- `_read_frame()` extraction (architecture)
- `--start-frame` feature (new CLI flag)
- `engine_override` (testing infrastructure)
- `assert` → `ValueError` + `logger.debug` (error handling)
Each could be a clean, reviewable PR on its own. Given the prior feedback on atomic PRs (#118 → #119/#120), splitting these would likely make review faster.
Happy to be corrected if I've misread anything — just wanted to surface these before merge.
- Replace forward.return_value (wrong method) with process_frame.return_value using the correct key contract: alpha/fg/comp/processed
- Fix alpha shape to (10, 10, 1) so the ndim == 3 branch is exercised correctly
- Add TestFileTypeHelpers covering is_image_file / is_video_file
- Add TestClipAsset covering _calculate_length for sequences
- Add TestClipEntry covering find_assets for the sequence path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@shezmic Test regression + mock mismatch both fixed in the latest commit (0e78f8d). The utility tests (is_image_file, …

Atomicity: you're right that these are four distinct concerns. The honest reason they landed together: _read_frame() …

Let me know if there's anything else to address before merge.
Code Review — Community Contribution

Hey @karimnagdii, thanks for putting this together. I've gone through the diff carefully against the current …

1.
I tested your PR locally and got the following error:

Starting Corridor Key locally...
Target: "C:\Users\Rajes\Downloads\sample_inputs"
Uninstalled 2 packages in 1.59s
░░░░░░░░░░░░░░░░░░░░ [0/2] Installing wheels... warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance.
If the cache and target directories are on different filesystems, hardlinking may not be supported.
If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning.
Installed 2 packages in 1m 29s
[22:19:28] INFO Auto-selected device: cuda device_utils.py:22
INFO Using device: cuda corridorkey_cli.py:218
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ CORRIDOR KEY — SMART WIZARD │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Windows Path: C:\Users\Rajes\Downloads\sample_inputs
Running locally: C:\Users\Rajes\Downloads\sample_inputs
Found 1 potential clip folders.
Found 1 folders needing setup:
• sample_inputs
Organize clips & create hint folders? [y/n] (n): y
[22:21:33] INFO Organizing Target: C:\Users\Rajes\Downloads\sample_inputs clip_manager.py:880
Organization complete.
Status Report
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Category ┃ Count ┃ Clips ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Ready (AlphaHint) │ 1 │ sample_inputs │
├────────────────────────────┼───────┼───────────────┤
│ Masked (VideoMaMaMaskHint) │ 0 │ — │
├────────────────────────────┼───────┼───────────────┤
│ Raw (Input only) │ 0 │ — │
└────────────────────────────┴───────┴───────────────┘
╭───────────────────────────────────────────────────────────────────────────────── Actions ─────────────────────────────────────────────────────────────────────────────────╮
│ i — Run Inference (1 ready clips) │
│ r — Re-scan folders │
│ q — Quit │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Select action [v/g/b/i/r/q] (q): i
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Corridor Key Inference │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Inference Settings │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Input colorspace [linear/srgb] (srgb): linear
Despill strength (0–10, 10 = max despill) (5):
Enable auto-despeckle (removes tracking dots)? [y/n] (y):
Despeckle size (min pixels for a spot) (400):
Refiner strength multiplier (experimental) (1.0):
[22:21:52] INFO Found 1 clips ready for inference. clip_manager.py:643
INFO Not Apple Silicon — using torch backend backend.py:53
INFO Torch engine loaded: CorridorKey.pth (device=cuda) backend.py:238
INFO Loading CorridorKey from D:\CorridorKey\CorridorKeyModule\checkpoints\CorridorKey.pth inference_engine.py:68
INFO Initializing hiera_base_plus_224.mae_in1k_ft_in1k (img_size=2048) model_transformer.py:159
[22:21:53] INFO Skipped downloading base weights (relying on custom checkpoint) model_transformer.py:164
INFO Patched input layer: 3 → 4 channels (extra initialized to 0) model_transformer.py:240
INFO Feature channels: [112, 224, 448, 896] model_transformer.py:177
W0314 22:22:15.718000 24040 Lib\site-packages\torch\_inductor\utils.py:1250] [0/0] Not enough SMs to use max_autotune_gemm mode
[22:24:52] INFO Running Inference on: sample_inputs clip_manager.py:665
INFO Input frames: 15, Alpha frames: 15 -> Processing form frame 0 up to 15 clip_manager.py:684
⠼ sample_inputs ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/15 0:00:00
╭──────────────────────────────────────────────────────────────────── Traceback (most recent call last) ────────────────────────────────────────────────────────────────────╮
│ D:\CorridorKey\corridorkey_cli.py:328 in wizard │
│ │
│ 325 │ path: Annotated[str, typer.Argument(help="Target path (Windows or local)")], │
│ 326 ) -> None: │
│ 327 │ """Interactive wizard for organizing clips and running the pipeline.""" │
│ ❱ 328 │ interactive_wizard(path, device=ctx.obj["device"]) │
│ 329 │
│ 330 │
│ 331 # --------------------------------------------------------------------------- │
│ │
│ D:\CorridorKey\corridorkey_cli.py:551 in interactive_wizard │
│ │
│ 548 │ │ │ try: │
│ 549 │ │ │ │ settings = _prompt_inference_settings() │
│ 550 │ │ │ │ with ProgressContext() as ctx_progress: │
│ ❱ 551 │ │ │ │ │ run_inference( │
│ 552 │ │ │ │ │ │ ready, │
│ 553 │ │ │ │ │ │ device=device, │
│ 554 │ │ │ │ │ │ settings=settings, │
│ │
│ D:\CorridorKey\clip_manager.py:737 in run_inference │
│ │
│ 734 │ │ │ │ img_srgb = _read_frame(None, input_files, clip.input_asset.path, i, inpu │
│ 735 │ │ │ │ input_stem = os.path.splitext(input_files[i])[0] │
│ 736 │ │ │ │ │
│ ❱ 737 │ │ │ │ is_exr = fpath.lower().endswith(".exr") │
│ 738 │ │ │ │ if is_exr: │
│ 739 │ │ │ │ │ img_srgb = read_image_frame(fpath, gamma_correct_exr=not input_is_li │
│ 740 │ │ │ │ │ if img_srgb is None: │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
UnboundLocalError: cannot access local variable 'fpath' where it is not associated with a value
Press any key to continue . . .
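For what it's worth, that `UnboundLocalError` (rather than a `NameError`) suggests `fpath` is still assigned somewhere later in `run_inference`, while the assignment that used to precede this line moved into `_read_frame()`. The pattern reproduces in miniature (names simplified, hypothetical):

```python
def _read_frame(files, i):
    # After the extraction, the assignment to fpath lives in the helper.
    fpath = files[i]
    return f"decoded:{fpath}"

def run_inference(files):
    img = None
    for i in range(len(files)):
        img = _read_frame(files, i)
        # Python marks fpath as local to run_inference because of the
        # assignment below, so reading it here raises UnboundLocalError.
        is_exr = fpath.lower().endswith(".exr")
        fpath = files[i]  # e.g. a surviving assignment on another branch
    return img

try:
    run_inference(["0001.exr"])
    raised = False
except UnboundLocalError:
    raised = True
assert raised
```

Returning the source path from `_read_frame()` (or recomputing it before use) would close the gap.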
Context & Objective
This pull request addresses findings from a recent codebase audit, focusing on architectural decoupling, robust error handling, and developer experience. The primary objective is to enable local testing in hardware-constrained environments by decoupling core inference logic from heavy ML dependencies. Additionally, this PR introduces a highly requested workflow enhancement for handling interrupted rendering sequences.
Technical Implementation
I/O Abstraction: Extracted image and video reading logic (handling .exr, .png, and video caps) from the main run_inference loop into an isolated _read_frame() helper function. This separates file I/O from core inference logic, significantly improving maintainability.
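A sketch of what that seam might look like (signatures are illustrative, not the PR's exact code); returning the source path alongside the frame also spares the caller any `fpath` bookkeeping of its own:

```python
import os

def _read_frame(cap, files, base_path, index):
    """Return (frame, source_path) for frame `index`: from an open video
    capture if one is provided, otherwise from an image sequence on disk."""
    if cap is not None:
        ok, frame = cap.read()          # video branch: pull the next frame
        return (frame if ok else None), base_path
    fpath = os.path.join(base_path, files[index])
    # Image branch: the real code dispatches on .exr/.png here; the actual
    # decode is elided in this sketch.
    return f"decoded:{fpath}", fpath

frame, fpath = _read_frame(None, ["0001.exr"], "clips/shot01", 0)
```

Keeping the decode behind a single function means tests can stub exactly one seam instead of mocking cv2 and the EXR reader separately.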
Dependency Injection: Refactored run_inference to accept an optional engine_override parameter. This allows us to bypass the resource-intensive create_engine() method (which loads PyTorch/Triton) and inject a lightweight mock engine during testing.
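The injection seam can be sketched as follows (the `create_engine` body and the `process_frame` key contract are taken from this thread, not verified against the codebase):

```python
from unittest.mock import MagicMock

def create_engine(device):
    # Heavy path: in the real code this loads PyTorch/Triton and weights.
    raise RuntimeError("would load PyTorch/Triton here")

def run_inference(frames, device="cuda", engine_override=None):
    # Tests inject a lightweight engine; production falls through to
    # the expensive create_engine call.
    engine = engine_override if engine_override is not None else create_engine(device)
    return [engine.process_frame(f) for f in frames]

mock_engine = MagicMock()
mock_engine.process_frame.return_value = {"alpha": None, "fg": None,
                                          "comp": None, "processed": None}
results = run_inference(["f0", "f1"], engine_override=mock_engine)
assert mock_engine.process_frame.call_count == 2
```

The test can then assert on call counts and per-frame outputs without any GPU or model checkpoint present.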
In-Memory Test Suite: Implemented an isolated automated test suite (tests/test_clip_manager.py). Using unittest.mock, heavy dependencies (torch, torchvision, cv2) are patched prior to import, enabling rapid validation of the run_inference logic flow on standard CPU environments without hardware bottlenecks.
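Given the earlier CI breakage from global `sys.modules` mutation, scoping the patch keeps the fake modules from leaking into other tests in the same session. A sketch:

```python
import sys
from unittest.mock import MagicMock, patch

# A global assignment like `sys.modules["torch"] = MagicMock()` persists
# for the whole session and can poison unrelated tests that run later.
# patch.dict restores the previous state when the decorated test returns.
@patch.dict(sys.modules, {"torch": MagicMock(), "cv2": MagicMock()})
def test_logic_without_heavy_deps():
    import torch  # bound to the MagicMock only inside this test
    assert isinstance(torch, MagicMock)

test_logic_without_heavy_deps()
# Outside the test, the fakes are gone (or the real modules are restored).
assert not isinstance(sys.modules.get("torch"), MagicMock)
```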
Resumable Rendering: Implemented a --start-frame argument in corridorkey_cli.py, propagated to run_inference in clip_manager.py. This prevents the need to restart full renders from frame 0 if long sequence processing is interrupted.
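The boundary arithmetic behind the flag is small but easy to get wrong at the edges; a hypothetical helper illustrating the clamping the validation checklist refers to (`frame_range` is illustrative, not the PR's actual function):

```python
def frame_range(start_frame: int, total_frames: int) -> range:
    """Frames still to process when resuming; clamped so a --start-frame
    past the end yields an empty run instead of an index error."""
    first = max(0, min(start_frame, total_frames))
    return range(first, total_frames)

assert list(frame_range(0, 5)) == [0, 1, 2, 3, 4]   # fresh run
assert list(frame_range(3, 5)) == [3, 4]            # resume mid-sequence
assert list(frame_range(99, 5)) == []               # past the end: no-op
```

Because iteration keeps the original frame indices, resumed runs write into the same output filenames as a fresh run would.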
Production-Safe Validation: Replaced development-only assert statements in corridorkey_cli.py with explicit ValueError exceptions, mitigating the risk of silent failures in optimized (.pyc) execution environments.
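The motivation: `assert` statements are stripped entirely when Python runs with `-O`, so they cannot guard user input in production. A sketch of the pattern (the validator name and bounds are illustrative, borrowed from the wizard's despill prompt):

```python
import logging

logger = logging.getLogger(__name__)

def validate_despill_strength(value: float) -> float:
    # Before: `assert 0 <= value <= 10` -- removed under `python -O`,
    # so bad input would flow through silently in optimized environments.
    if not 0 <= value <= 10:
        logger.debug("rejected despill strength: %r", value)
        raise ValueError(f"despill strength must be in [0, 10], got {value!r}")
    return value
```

An explicit `raise` runs regardless of the interpreter's optimization level, and the `logger.debug` call leaves a trace for diagnosis.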
Verbose Exception Logging: Added explicit exception handling and logging to the interactive_wizard clip scanning phase, providing transparent failure reasons for invalid media.
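A sketch of what per-clip error surfacing in the scan phase could look like (`scan_clip` is a hypothetical stand-in for the real scanning call):

```python
import logging

logger = logging.getLogger(__name__)

def scan_all(paths, scan_clip):
    """Scan every clip folder; log and collect failures instead of letting
    one invalid media file abort (or silently vanish from) the whole scan."""
    ready, failed = [], []
    for path in paths:
        try:
            ready.append(scan_clip(path))
        except Exception as exc:
            # Surface the concrete reason for the failure to the user.
            logger.warning("Skipping %s: %s", path, exc)
            failed.append((path, str(exc)))
    return ready, failed
```

The `failed` list gives the wizard something concrete to report per clip rather than a bare count.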
Validation
[x] Executed mocked unit test suite locally (execution time: ~15ms).
[x] Verified --start-frame boundary calculations to ensure correct frame iteration limits without altering output directory structures.