
Refactor: Enhance inference testability, add --start-frame support, and improve error handling#149

Open
karimnagdii wants to merge 5 commits into nikopueringer:main from karimnagdii:refactor/infrastructure-enhancements

Conversation

@karimnagdii

Context & Objective
This pull request addresses findings from a recent codebase audit, focusing on architectural decoupling, robust error handling, and developer experience. The primary objective is to enable local testing in hardware-constrained environments by decoupling core inference logic from heavy ML dependencies. Additionally, this PR introduces a highly requested workflow enhancement for handling interrupted rendering sequences.

Technical Implementation

  1. Architecture & Code Quality

I/O Abstraction: Extracted image and video reading logic (handling .exr, .png, and video caps) from the main run_inference loop into an isolated _read_frame() helper function. This separates file I/O from core inference logic, significantly improving maintainability.

Dependency Injection: Refactored run_inference to accept an optional engine_override parameter. This allows us to bypass the resource-intensive create_engine() method (which loads PyTorch/Triton) and inject a lightweight mock engine during testing.
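The shape of the injection point, simplified for illustration (the engine method name and clip handling here are assumptions, not the project's exact interface):

```python
def create_engine(device):
    # Stand-in for the real factory, which loads PyTorch/Triton weights.
    raise RuntimeError("heavy engine creation; not available in unit tests")

def run_inference(clips, device="cuda", engine_override=None):
    """Use the injected engine when provided; otherwise build the real one."""
    engine = engine_override if engine_override is not None else create_engine(device)
    return [engine.process_frame(clip) for clip in clips]
```

A test can then pass any lightweight object with the expected method, never touching the heavy factory.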

  2. Testing Infrastructure

In-Memory Test Suite: Implemented an isolated automated test suite (tests/test_clip_manager.py). Using unittest.mock, heavy dependencies (torch, torchvision, cv2) are patched prior to import, enabling rapid validation of the run_inference logic flow on standard CPU environments without hardware bottlenecks.
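The pattern, in miniature (the `run_inference` stub here stands in for the real `clip_manager` function so the test is runnable on its own):

```python
import unittest
from unittest import mock

def run_inference(clips, engine_override=None):
    # Hypothetical stand-in for clip_manager.run_inference, so the
    # test below runs without importing the real module.
    return [engine_override.process_frame(c) for c in clips]

class TestRunInference(unittest.TestCase):
    def test_engine_called_once_per_clip(self):
        engine = mock.Mock()
        engine.process_frame.return_value = {"alpha": None, "comp": None}
        run_inference(["clip_a", "clip_b"], engine_override=engine)
        self.assertEqual(engine.process_frame.call_count, 2)
```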

  3. Feature Additions

Resumable Rendering: Implemented a --start-frame argument in corridorkey_cli.py, propagated to run_inference in clip_manager.py. This prevents the need to restart full renders from frame 0 if long sequence processing is interrupted.
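A sketch of the resume logic (function name and bounds check are illustrative, not the PR's exact code): iterate `[start_frame, num_frames)` so frame indices, and therefore output filenames, are unchanged; only the starting point moves.

```python
def frame_range(num_frames, start_frame=0):
    """Yield the frame indices to process, resuming at start_frame."""
    if not 0 <= start_frame < num_frames:
        raise ValueError(
            f"--start-frame {start_frame} outside [0, {num_frames})")
    return range(start_frame, num_frames)
```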

  4. Security & Error Handling

Production-Safe Validation: Replaced development-only assert statements in corridorkey_cli.py with explicit ValueError exceptions, mitigating the risk of silent failures in optimized (.pyc) execution environments.
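The motivation: `assert` statements are stripped under `python -O`, so validation must be explicit. A sketch of the pattern (flag names follow the reviewer's quote below; the real CLI wires its own values):

```python
def validate_inference_settings(linear, despill, despeckle, refiner):
    """Explicit check that survives `python -O`, where asserts are stripped."""
    flags = {"linear": linear, "despill": despill,
             "despeckle": despeckle, "refiner": refiner}
    missing = [name for name, value in flags.items() if value is None]
    if missing:
        raise ValueError(f"Missing required inference flags: {missing}")
```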

Verbose Exception Logging: Added explicit exception handling and logging to the interactive_wizard clip scanning phase, providing transparent failure reasons for invalid media.
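The general shape of that change (a sketch; `probe` stands in for whatever media inspection the wizard performs per clip folder):

```python
import logging

logger = logging.getLogger("corridorkey.wizard")

def scan_clip(path, probe):
    """Probe one clip folder, logging the concrete failure reason
    instead of silently skipping invalid media."""
    try:
        return probe(path)
    except Exception as exc:
        logger.warning("Skipping %s (%s: %s)", path, type(exc).__name__, exc)
        return None
```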

Validation
[x] Executed mocked unit test suite locally (execution time: ~15ms).

[x] Verified --start-frame boundary calculations to ensure correct frame iteration limits without altering output directory structures.

…placed assert checks in cli with ValueError logging
- Extracted _read_frame out of run_inference pipeline
- Integrated dependency injection in run_inference for mocked unit testing
- Added --start-frame capability
- Implemented offline-compatible unit tests mocking PyTorch logic
@nikopueringer
Owner

There are some smart ideas here, and I'm always a fan of keeping things more modular. Separating I/O from inference logic makes sense.

karimnagdii and others added 2 commits March 13, 2026 11:33
- Split long logger.warning line in clip_manager.py to fix E501
- Remove trailing whitespace in clip_manager.py and corridorkey_cli.py
- Remove global sys.modules torch/cv2 mocking in test_clip_manager.py
  that was poisoning torch for test_color_utils.py in the same session,
  causing TypeError on <= comparisons; use @patch decorators instead

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@karimnagdii
Author

Fixed all three CI failures — removed the global sys.modules torch/cv2 mocking in test_clip_manager.py that was
leaking into test_color_utils.py (causing the <= TypeError), and split the long warning line in clip_manager.py to clear the E501. Tests and lint should be green now.
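For anyone hitting the same pattern, the leak and the scoped alternative can be reproduced in a few lines (`torch_stub` is a made-up module name standing in for torch):

```python
import sys
from unittest import mock

# Global patching: the stub stays in sys.modules for the whole session.
sys.modules["torch_stub"] = mock.MagicMock()
import torch_stub

leaked = False
try:
    # MagicMock ordering methods default to NotImplemented, so any later
    # test doing a real <= comparison on "tensors" hits a TypeError.
    torch_stub.tensor(1) <= torch_stub.tensor(2)
except TypeError:
    leaked = True
del sys.modules["torch_stub"]  # global patching needs manual cleanup

# Scoped patching: mock.patch.dict restores sys.modules on exit.
with mock.patch.dict(sys.modules, {"torch_stub": mock.MagicMock()}):
    import torch_stub  # stub visible only inside this block
assert "torch_stub" not in sys.modules
```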

Contributor

@shezmic left a comment


Thanks for tackling this — engine_override for testing and --start-frame for resumable renders are both genuinely useful additions. A few things worth flagging before this lands:

Test regression

The new tests/test_clip_manager.py replaces 292 lines with 141. The deleted tests covered stable utility functions (is_image_file, is_video_file, map_path, ClipAsset frame counting, ClipEntry.find_assets, organize_target) that have no coverage elsewhere. Removing them drops coverage on code that hasn't changed. Would it be possible to keep the existing tests and add the new run_inference tests alongside them?

Mock / interface mismatch

The new tests assert mock_engine.process_frame.call_count == 2, but the engine interface uses .forward(), not .process_frame(). As written, the tests would pass even if run_inference never calls the engine at all — which means they wouldn't catch a regression.
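A general `unittest.mock` guard against this kind of interface drift (a technique suggestion, not part of the PR) is to spec the mock against the real engine class, so calls to nonexistent methods fail immediately. `Engine` below is a hypothetical stand-in:

```python
from unittest import mock

class Engine:
    """Hypothetical engine interface exposing forward()."""
    def forward(self, frame):
        raise NotImplementedError

loose = mock.Mock()
loose.process_frame(b"frame")   # wrong method name, silently absorbed

strict = mock.Mock(spec=Engine)
strict.forward(b"frame")        # matches the real interface
try:
    strict.process_frame(b"frame")
    drift_caught = False
except AttributeError:          # spec= rejects names Engine lacks
    drift_caught = True
```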

Atomicity

This PR bundles four independent concerns:

  1. _read_frame() extraction (architecture)
  2. --start-frame feature (new CLI flag)
  3. engine_override (testing infrastructure)
  4. assert → ValueError + logger.debug (error handling)

Each could be a clean, reviewable PR on its own. Given the prior feedback on atomic PRs (#118, #119, #120), splitting these would likely make review faster.

Happy to be corrected if I've misread anything — just wanted to surface these before merge.

- Replace forward.return_value (wrong method) with process_frame.return_value
  using the correct key contract: alpha/fg/comp/processed
- Fix alpha shape to (10,10,1) so ndim==3 branch is exercised correctly
- Add TestFileTypeHelpers covering is_image_file / is_video_file
- Add TestClipAsset covering _calculate_length for sequences
- Add TestClipEntry covering find_assets for the sequence path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@karimnagdii
Author

@shezmic
Thanks for the thorough review; all fair points.

Test regression + mock mismatch: both fixed in the latest commit (0e78f8d). The utility tests (is_image_file,
is_video_file, ClipAsset._calculate_length, ClipEntry.find_assets) are restored as new test classes. The mock is now
correctly wired to process_frame.return_value with the right key contract (alpha/fg/comp/processed) and shapes, so the
assertions actually verify pipeline behavior rather than passing vacuously.

Atomicity: you're right that these are four distinct concerns. The honest reason they landed together: _read_frame()
extraction was the prerequisite to make engine_override injectable without threading the engine through unrelated call
sites, and the assert → ValueError sweep happened while touching those same lines. In hindsight, the --start-frame flag
could have been a separate PR. Noted for next time; I'll keep the #118 pattern in mind.

Let me know if there's anything else to address before merge.

@shezmic
Contributor

shezmic commented Mar 14, 2026

Code Review — Community Contribution

Hey @karimnagdii, thanks for putting this together. I've gone through the diff carefully against the current main branch and wanted to flag a few things that would need attention before this could merge cleanly.


1. NameError in the else branch after _read_frame refactor

After extracting the _read_frame() helper, the else branch in run_inference still references fpath on this line:

else:
    img_srgb = _read_frame(None, input_files, clip.input_asset.path, i, input_is_linear)
    input_stem = os.path.splitext(input_files[i])[0]

    is_exr = fpath.lower().endswith(".exr")  # ← fpath is no longer defined here

fpath was previously assigned as os.path.join(clip.input_asset.path, input_files[i]) before the extraction, but that line was removed when _read_frame took over the read logic. This will raise a NameError at runtime for any image-sequence clip.
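One minimal fix (a sketch; the helper name is made up) is to re-derive the path inside the branch before the extension check:

```python
import os

def sequence_frame_info(input_files, asset_path, i):
    """Recompute the values the else-branch needs after the refactor."""
    fpath = os.path.join(asset_path, input_files[i])
    input_stem = os.path.splitext(input_files[i])[0]
    is_exr = fpath.lower().endswith(".exr")
    return fpath, input_stem, is_exr
```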

2. _read_frame accepts is_linear but never uses it

The is_linear parameter is accepted by _read_frame(cap, files, path, index, is_linear) but the function body never references it. This is misleading — callers pass it expecting it to affect behaviour, but it has no effect. Either remove the parameter or implement the linear/sRGB handling inside the helper.
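If the parameter is kept, the handling could move into the read path along these lines (a sketch; `to_linear` stands in for the project's actual colourspace conversion):

```python
def apply_input_colorspace(img, is_linear, to_linear):
    """Honour is_linear inside the helper instead of ignoring it."""
    if img is None or is_linear:
        return img            # nothing was read, or input is already linear
    return to_linear(img)     # convert sRGB input before inference
```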

3. Redundant guard after required_flags_set

required_flags_set = all(v is not None for v in [linear, despill, despeckle, refiner])
if required_flags_set:
    if any(v is None for v in [linear, despill, despeckle, refiner]):
        raise ValueError("Missing required flags for inference settings.")

The inner if any(v is None ...) can never be True, because required_flags_set already guarantees all values are non-None. The assert → ValueError intention is good (production safety), but as written the check is logically dead code. A meaningful guard would invert the condition: raise when any of the values is None, which removes the need for the required_flags_set gate entirely.

4. Typo in log message

f" Alpha frames: {clip.alpha_asset.frame_count} -> Processing form frame {start_frame} up to {num_frames}"

"form" → "from"

5. Existing test suite replaced entirely

The PR replaces the full test_clip_manager.py file — removing ~200 lines of well-structured pytest-style tests covering is_image_file, is_video_file, map_path, ClipAsset, ClipEntry.find_assets, validate_pair, and organize_target. The replacement uses unittest.TestCase style and covers a narrower surface (mostly run_inference with mocks + a few file-type checks).

These existing tests caught real regressions. Removing them without re-implementing equivalent coverage (especially map_path, validate_pair, organize_target, find_assets edge cases like empty dirs and missing inputs) would be a net loss. I'd suggest adding the new inference tests alongside the existing ones rather than replacing them.

6. Scope overlap with existing PRs

The --start-frame feature overlaps with PR #160 (--skip-existing) which provides resumable renders through a different mechanism (checking for existing output files). Worth coordinating to avoid merge conflicts and decide which approach best serves the use case.


The engine_override dependency injection pattern and the assert → ValueError change are both solid ideas worth keeping. The _read_frame extraction is a good direction too — it just needs the fpath reference fixed and the unused parameter addressed. Happy to re-review once the issues above are resolved.

@HYP3R00T
Contributor

I tested your PR locally and hit the following error:

Starting Corridor Key locally...
Target: "C:\Users\Rajes\Downloads\sample_inputs"
Uninstalled 2 packages in 1.59s
░░░░░░░░░░░░░░░░░░░░ [0/2] Installing wheels...                                                                                                                              warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance.
         If the cache and target directories are on different filesystems, hardlinking may not be supported.
         If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning.
Installed 2 packages in 1m 29s
[22:19:28] INFO     Auto-selected device: cuda                                                                                                             device_utils.py:22
           INFO     Using device: cuda                                                                                                                 corridorkey_cli.py:218
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ CORRIDOR KEY — SMART WIZARD                                                                                                                                               │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Windows Path: C:\Users\Rajes\Downloads\sample_inputs
Running locally: C:\Users\Rajes\Downloads\sample_inputs

Found 1 potential clip folders.
Found 1 folders needing setup:
  • sample_inputs

Organize clips & create hint folders? [y/n] (n): y
[22:21:33] INFO     Organizing Target: C:\Users\Rajes\Downloads\sample_inputs                                                                             clip_manager.py:880
Organization complete.
                    Status Report
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Category                   ┃ Count ┃ Clips         ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Ready (AlphaHint)          │     1 │ sample_inputs │
├────────────────────────────┼───────┼───────────────┤
│ Masked (VideoMaMaMaskHint) │     0 │ —             │
├────────────────────────────┼───────┼───────────────┤
│ Raw (Input only)           │     0 │ —             │
└────────────────────────────┴───────┴───────────────┘
╭───────────────────────────────────────────────────────────────────────────────── Actions ─────────────────────────────────────────────────────────────────────────────────╮
│ i — Run Inference (1 ready clips)                                                                                                                                         │
│ r — Re-scan folders                                                                                                                                                       │
│ q — Quit                                                                                                                                                                  │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Select action [v/g/b/i/r/q] (q): i
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Corridor Key Inference                                                                                                                                                    │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Inference Settings                                                                                                                                                        │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Input colorspace [linear/srgb] (srgb): linear
Despill strength (0–10, 10 = max despill) (5):
Enable auto-despeckle (removes tracking dots)? [y/n] (y):
Despeckle size (min pixels for a spot) (400):
Refiner strength multiplier (experimental) (1.0):
[22:21:52] INFO     Found 1 clips ready for inference.                                                                                                    clip_manager.py:643
           INFO     Not Apple Silicon — using torch backend                                                                                                     backend.py:53
           INFO     Torch engine loaded: CorridorKey.pth (device=cuda)                                                                                         backend.py:238
           INFO     Loading CorridorKey from D:\CorridorKey\CorridorKeyModule\checkpoints\CorridorKey.pth                                              inference_engine.py:68
           INFO     Initializing hiera_base_plus_224.mae_in1k_ft_in1k (img_size=2048)                                                                model_transformer.py:159
[22:21:53] INFO     Skipped downloading base weights (relying on custom checkpoint)                                                                  model_transformer.py:164
           INFO     Patched input layer: 3 → 4 channels (extra initialized to 0)                                                                     model_transformer.py:240
           INFO     Feature channels: [112, 224, 448, 896]                                                                                           model_transformer.py:177
W0314 22:22:15.718000 24040 Lib\site-packages\torch\_inductor\utils.py:1250] [0/0] Not enough SMs to use max_autotune_gemm mode
[22:24:52] INFO     Running Inference on: sample_inputs                                                                                                   clip_manager.py:665
           INFO       Input frames: 15, Alpha frames: 15 -> Processing form frame 0 up to 15                                                              clip_manager.py:684
⠼ sample_inputs ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/15 0:00:00
╭──────────────────────────────────────────────────────────────────── Traceback (most recent call last) ────────────────────────────────────────────────────────────────────╮
│ D:\CorridorKey\corridorkey_cli.py:328 in wizard                                                                                                                           │
│                                                                                                                                                                           │
│   325 │   path: Annotated[str, typer.Argument(help="Target path (Windows or local)")],                                                                                    │
│   326 ) -> None:                                                                                                                                                          │
│   327 │   """Interactive wizard for organizing clips and running the pipeline."""                                                                                         │
│ ❱ 328 │   interactive_wizard(path, device=ctx.obj["device"])                                                                                                              │
│   329                                                                                                                                                                     │
│   330                                                                                                                                                                     │
│   331 # ---------------------------------------------------------------------------                                                                                       │
│                                                                                                                                                                           │
│ D:\CorridorKey\corridorkey_cli.py:551 in interactive_wizard                                                                                                               │
│                                                                                                                                                                           │
│   548 │   │   │   try:                                                                                                                                                    │
│   549 │   │   │   │   settings = _prompt_inference_settings()                                                                                                             │
│   550 │   │   │   │   with ProgressContext() as ctx_progress:                                                                                                             │
│ ❱ 551 │   │   │   │   │   run_inference(                                                                                                                                  │
│   552 │   │   │   │   │   │   ready,                                                                                                                                      │
│   553 │   │   │   │   │   │   device=device,                                                                                                                              │
│   554 │   │   │   │   │   │   settings=settings,                                                                                                                          │
│                                                                                                                                                                           │
│ D:\CorridorKey\clip_manager.py:737 in run_inference                                                                                                                       │
│                                                                                                                                                                           │
│    734 │   │   │   │   img_srgb = _read_frame(None, input_files, clip.input_asset.path, i, inpu                                                                           │
│    735 │   │   │   │   input_stem = os.path.splitext(input_files[i])[0]                                                                                                   │
│    736 │   │   │   │                                                                                                                                                      │
│ ❱  737 │   │   │   │   is_exr = fpath.lower().endswith(".exr")                                                                                                            │
│    738 │   │   │   │   if is_exr:                                                                                                                                         │
│    739 │   │   │   │   │   img_srgb = read_image_frame(fpath, gamma_correct_exr=not input_is_li                                                                           │
│    740 │   │   │   │   │   if img_srgb is None:                                                                                                                           │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
UnboundLocalError: cannot access local variable 'fpath' where it is not associated with a value
Press any key to continue . . .
