diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 00000000..cd3724c8 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,165 @@ +# CorridorKey — AGENTS.md + +> Agent-facing project guide following the [AGENTS.md open format](https://agents.md). +> For deeper architectural detail, see [`docs/LLM_HANDOVER.md`](docs/LLM_HANDOVER.md). + +## Project Overview + +**CorridorKey** is a neural-network-based green screen removal tool built for professional VFX pipelines. Unlike traditional keyers that produce binary masks, CorridorKey physically unmixes the foreground from the green screen at every pixel — including semi-transparent regions like motion blur, hair, and out-of-focus edges. + +**Core inputs:** + +- **RGB image** — the green screen plate (sRGB color gamut). +- **Coarse Alpha Hint** — a rough black-and-white mask isolating the subject (does not need to be precise). + +**Core outputs:** + +- **Alpha** — a clean, linear alpha channel. +- **Foreground Straight** — the un-multiplied straight color of the foreground element (sRGB), with the green screen contribution removed. + +**License:** [CC-BY-NC-SA-4.0](LICENSE) + +## Architecture & Dataflow + +### GreenFormer Architecture + +The core model is called the **GreenFormer**: + +- **Backbone:** A `timm` **Hiera** vision-transformer (`hiera_base_plus_224.mae_in1k_ft_in1k`), patched to accept 4 input channels (RGB + Coarse Alpha Hint). +- **Decoders:** Multiscale feature fusion heads predicting coarse Alpha (1 ch) and Foreground (3 ch) logits. +- **Refiner (`CNNRefinerModule`):** A custom CNN head with dilated residual blocks. It takes the original RGB input and the coarse predictions, outputting purely additive delta logits applied to the backbone outputs before final Sigmoid activation. + +### Dataflow Rules + +1. **Tensor range:** Model input and output are strictly `[0.0, 1.0]` float tensors. The foreground is sRGB; the alpha is linear. +2. 
**EXR pipeline:** To build the `Processed` EXR output, the sRGB foreground is converted via the piecewise `srgb_to_linear()` function, then premultiplied by the linear alpha, and saved as half-float EXR (`cv2.IMWRITE_EXR_TYPE_HALF`). +3. **Inference resizing:** The engine is trained on **2048×2048** crops. `inference_engine.py` uses OpenCV **Lanczos4** to resize arbitrary input to 2048×2048, runs inference, then resizes predictions back to the original resolution. +4. **Despill:** A luminance-preserving `despill()` function removes residual green contamination from the foreground. + +> ⚠️ **Gamma 2.2 warning:** Never apply a pure mathematical gamma 2.2 curve. Always use the piecewise sRGB transfer functions defined in `color_utils.py`. A naive power-law curve will produce incorrect results in the toe region and break compositing math. + +## Key File Map + +| Path | Responsibility | +|---|---| +| `CorridorKeyModule/core/model_transformer.py` | GreenFormer PyTorch architecture (Hiera backbone, decoders, CNNRefinerModule) | +| `CorridorKeyModule/inference_engine.py` | `CorridorKeyEngine` class — loads weights, handles 2048×2048 resize and frame processing API | +| `CorridorKeyModule/core/color_utils.py` | Pure math for compositing: `srgb_to_linear()`, `linear_to_srgb()`, `premultiply()`, `despill()` | +| `clip_manager.py` | User-facing CLI wizard — directory scanning, inference settings, piping data to the engine | +| `device_utils.py` | Compute device detection and selection (CUDA / MPS / CPU), backend resolution | +| `backend/` | FastAPI-based backend service: job queue, project management, FFmpeg tools, frame I/O | + +## Dev Environment Setup + +**Prerequisites:** Python ≥ 3.10 and [uv](https://docs.astral.sh/uv/). + +uv handles Python installation, virtual environment creation, and package management — no manual `pip install` or virtualenv setup required. 
+ +```bash +git clone https://github.com/nikopueringer/CorridorKey.git +cd CorridorKey +uv sync --group dev # installs all dependencies + dev tools (pytest, ruff, hypothesis) +``` + +## Build & Test Commands + +```bash +uv run pytest # run all tests +uv run pytest -v # verbose output +uv run pytest -m "not gpu" # skip GPU-dependent tests +uv run ruff check # lint check +uv run ruff format --check # formatting check (no changes) +uv run ruff format # auto-format +``` + +Tests that require a CUDA GPU are marked with `@pytest.mark.gpu` and are automatically skipped when no GPU is available. CI runs `pytest -m "not gpu"` to exclude them. + +## Code Style + +The project uses **[Ruff](https://docs.astral.sh/ruff/)** for both linting and formatting. + +| Setting | Value | +|---|---| +| Lint rules | `E`, `F`, `W`, `I`, `B` | +| Line length | 120 | +| Target version | `py311` | +| Excluded dirs | `gvm_core/`, `VideoMaMaInferenceModule/` | + +`gvm_core/` and `VideoMaMaInferenceModule/` are third-party research code kept close to upstream — they are excluded from lint enforcement. + +## Platform-Specific Caveats + +### Apple Silicon (macOS) + +- **MPS operator fallback:** Some PyTorch operations are not yet implemented for MPS. Enable CPU fallback: + ```bash + export PYTORCH_ENABLE_MPS_FALLBACK=1 + ``` + +### Windows + +- **CUDA 12.8:** GPU acceleration on Windows requires NVIDIA drivers supporting **CUDA 12.8** or higher. Older drivers will cause a silent fallback to CPU. + +## Prohibited Actions + +1. **Do not apply a pure gamma 2.2 curve.** Always use the piecewise sRGB transfer functions in `color_utils.py`. A naive `pow(x, 2.2)` breaks the toe region and produces incorrect compositing results. +2. **Do not modify files inside `gvm_core/` or `VideoMaMaInferenceModule/`.** These are third-party research modules kept close to upstream. Changes should be made in wrapper code or upstream PRs. + +## PR Workflow & GitHub Templates + +### Workflow + +1. 
Fork the repo and create a branch for your change.
+2. Make your changes.
+3. Run `uv run pytest` and `uv run ruff check` to verify everything passes.
+4. Open a pull request against `main`.
+
+PR descriptions should focus on **why** the change was made, not just what changed. If fixing a bug, describe the symptoms. If adding a feature, explain the use case.
+
+Before preparing any pull request, check `.github/` for PR templates, issue templates, and CI workflows.
+
+### PR Template
+
+The repository includes a PR template (`.github/pull_request_template.md`) with the following structure:
+
+- **"What does this change?"** — Explain the motivation and scope of the change.
+- **"How was it tested?"** — Describe specific test steps or commands run to verify correctness.
+- **Checklist:**
+  - `uv run pytest` passes
+  - `uv run ruff check` passes
+  - `uv run ruff format --check` passes
+
+Fill in all sections thoroughly.
+
+### CI Workflow (`ci.yml`)
+
+Runs on every push and pull request to `main`:
+
+- **Lint job:** `ruff format --check` + `ruff check`.
+- **Test job:** `pytest -v --tb=short -m "not gpu"` on Python **3.10** and **3.13**. GPU tests are excluded via the `-m "not gpu"` marker filter.
+
+### Docs Workflow (`docs.yml`)
+
+Triggers on pushes to `main` that change files matching `docs/**` or `zensical.toml`. Builds and deploys the documentation site to GitHub Pages via **Zensical**.
+
+## Documentation Accuracy
+
+When making code changes, evaluate whether the change affects the accuracy of existing documentation. If a code change alters behavior, CLI flags, file paths, or configuration described in the docs, flag or update the outdated documentation.
+ +Documentation files to check: + +- `README.md` +- `CONTRIBUTING.md` +- `AGENTS.md` +- `docs/LLM_HANDOVER.md` +- All pages under `docs/` + +## AI Directives + +- **Skip basic tutorials.** The user is a VFX professional and coder. Dive straight into advanced implementation guidance, but document math thoroughly. +- **Prioritize performance.** This is video processing — every `.numpy()` transfer or `cv2.resize` matters in a loop running on 4K footage. +- **Check sRGB-to-linear conversion order.** If the user reports "crushed shadows" or "dark fringes", the problem is almost certainly an sRGB-to-linear conversion step happening in the wrong order inside `color_utils.py`. + +## Further Reading + +- [`docs/LLM_HANDOVER.md`](docs/LLM_HANDOVER.md) — Detailed architecture walkthrough, dataflow properties, inference pipeline, and AI directives for the CorridorKey codebase. diff --git a/README.md b/README.md index 933bbd49..c40909ee 100644 --- a/README.md +++ b/README.md @@ -3,6 +3,7 @@ https://github.com/user-attachments/assets/1fb27ea8-bc91-4ebc-818f-5a3b5585af08 +> 📖 **[Full Documentation](https://nikopueringer.github.io/CorridorKey)** — Installation guides, usage instructions, and developer docs. When you film something against a green screen, the edges of your subject inevitably blend with the green background. This creates pixels that are a mix of your subject's color and the green screen's color. Traditional keyers struggle to untangle these colors, forcing you to spend hours building complex edge mattes or manually rotoscoping. Even modern "AI Roto" solutions typically output a harsh binary mask, completely destroying the delicate, semi-transparent pixels needed for a realistic composite. diff --git a/docs/_snippets/apple-silicon-note.md b/docs/_snippets/apple-silicon-note.md new file mode 100644 index 00000000..d1403cce --- /dev/null +++ b/docs/_snippets/apple-silicon-note.md @@ -0,0 +1,18 @@ +!!! 
note "Apple Silicon (MPS / MLX)" + CorridorKey runs on Apple Silicon Macs using unified memory. Two backend + options are available: + + - **MPS** — PyTorch's Metal Performance Shaders backend. Works out of the + box but some operators may require the CPU fallback flag: + + ```bash + export PYTORCH_ENABLE_MPS_FALLBACK=1 + ``` + + - **MLX** — Native Apple Silicon acceleration via the + [MLX framework](https://github.com/ml-explore/mlx). Avoids PyTorch's MPS + layer entirely and typically runs faster. Requires installing the MLX + extras (`uv sync --extra mlx`) and obtaining `.safetensors` weights. + + Because Apple Silicon shares memory between the CPU and GPU, the full + system RAM is available to the model — no separate VRAM budget applies. diff --git a/docs/_snippets/model-download.md b/docs/_snippets/model-download.md new file mode 100644 index 00000000..8e49cc9a --- /dev/null +++ b/docs/_snippets/model-download.md @@ -0,0 +1,14 @@ +**Download the CorridorKey checkpoint (~300 MB):** + +[Download CorridorKey_v1.0.pth from Hugging Face](https://huggingface.co/nikopueringer/CorridorKey_v1.0/resolve/main/CorridorKey_v1.0.pth){ .md-button } + +Place the file inside `CorridorKeyModule/checkpoints/` and rename it to +**`CorridorKey.pth`** so the final path is: + +``` +CorridorKeyModule/checkpoints/CorridorKey.pth +``` + +!!! warning + The engine will not start without this checkpoint. Make sure the filename + is exactly `CorridorKey.pth` (not `CorridorKey_v1.0.pth`). diff --git a/docs/_snippets/optional-weights.md b/docs/_snippets/optional-weights.md new file mode 100644 index 00000000..18a00384 --- /dev/null +++ b/docs/_snippets/optional-weights.md @@ -0,0 +1,25 @@ +!!! tip "Optional — GVM and VideoMaMa weights" + These modules generate Alpha Hints automatically but have large model files + and extreme hardware requirements. Installing them is **completely optional**; + you can always provide your own Alpha Hints from other software. 
+
+    **GVM** (~80 GB VRAM required):
+
+    ```bash
+    uv run hf download geyongtao/gvm --local-dir gvm_core/weights
+    ```
+
+    **VideoMaMa** (originally 80 GB+ VRAM; community optimisations bring it
+    under 24 GB, though not yet fully integrated here):
+
+    ```bash
+    # Fine-tuned VideoMaMa weights
+    uv run hf download SammyLim/VideoMaMa \
+      --local-dir VideoMaMaInferenceModule/checkpoints/VideoMaMa
+
+    # Stable Video Diffusion base model (VAE + image encoder, ~2.5 GB)
+    # Accept the licence at stabilityai/stable-video-diffusion-img2vid-xt first
+    uv run hf download stabilityai/stable-video-diffusion-img2vid-xt \
+      --local-dir VideoMaMaInferenceModule/checkpoints/stable-video-diffusion-img2vid-xt \
+      --include "feature_extractor/*" "image_encoder/*" "vae/*" "model_index.json"
+    ```
diff --git a/docs/_snippets/uv-install.md b/docs/_snippets/uv-install.md
new file mode 100644
index 00000000..429e4221
--- /dev/null
+++ b/docs/_snippets/uv-install.md
@@ -0,0 +1,16 @@
+This project uses **[uv](https://docs.astral.sh/uv/)** to manage Python and all
+dependencies. uv is a fast, modern replacement for pip that automatically
+handles Python versions, virtual environments, and package installation in a
+single step. You do **not** need to install Python yourself — uv does it for
+you.
+
+Install uv if you don't already have it:
+
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+```
+
+!!! tip
+    On Windows the automated `.bat` installers handle uv installation for you.
+    If you see `'uv' is not recognized` in a terminal that was already open
+    when you installed uv, close and reopen it so the updated PATH takes effect.
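If you are unsure whether the install worked, the following POSIX-shell check reports either the installed version or a PATH hint. The `~/.local/bin` location is an assumption based on the installer's usual default; adjust if your setup differs.

```shell
# Print uv's version if the shell can find it, otherwise print a PATH hint
if command -v uv >/dev/null 2>&1; then
    uv --version
else
    echo "uv not found - check that ~/.local/bin is on your PATH"
fi
```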
diff --git a/docs/ai-assisted-development.md b/docs/ai-assisted-development.md new file mode 100644 index 00000000..c8e9583d --- /dev/null +++ b/docs/ai-assisted-development.md @@ -0,0 +1,282 @@ +# AI-Assisted Development + +CorridorKey ships two files designed to give AI coding assistants deep +context about the project: + +- **`AGENTS.md`** — a structured project guide at the repo root. +- **`docs/LLM_HANDOVER.md`** — a detailed architecture walkthrough and + AI directive reference. + +Together these files cover the codebase layout, dataflow rules, dev +commands, code style, and common pitfalls. Most AI tools can consume +them directly — the sections below explain what each file provides and +how tools discover them. + +--- + +## Context Sources + +### `AGENTS.md` + +`AGENTS.md` sits at the repository root and follows the open +[AGENTS.md standard](https://agents.md). It gives any AI assistant a +compact overview of the project: architecture summary, key file map, +build and test commands, code style settings, platform caveats, and +prohibited actions. + +Because the format is an open standard, multiple AI coding tools — +including GitHub Copilot, Windsurf, and Kiro — read it natively +without extra configuration. Dropping into the repo and opening a +session is often enough to get useful context. + +!!! tip "Read the source" + The full file is at the repo root: + [`AGENTS.md`](../AGENTS.md). + Refer to it directly rather than relying on summaries here. + +### `LLM_HANDOVER.md` + +`LLM_HANDOVER.md` lives in the `docs/` directory and provides a much +deeper technical handover. It covers the GreenFormer architecture in +detail, critical dataflow properties (color space and gamma math), +inference pipeline internals, and AI-specific directives for working +with the codebase. + +If `AGENTS.md` is the quick-reference card, `LLM_HANDOVER.md` is the +full briefing document. 
Point your assistant at it when you need help +with inference code, compositing math, or EXR pipeline work. + +!!! tip "Read the source" + The full handover document is at + [`docs/LLM_HANDOVER.md`](LLM_HANDOVER.md). + It is the authoritative deep-dive — this page only summarises + what it contains. + +--- + +## Quick Start + +Get a working dev environment and point your AI assistant at the +project context — this works with any tool. + +```bash +git clone https://github.com/nikopueringer/CorridorKey.git +cd CorridorKey +uv sync --group dev # installs all dependencies + dev tools +``` + +Once the repo is cloned, open `AGENTS.md` in your AI assistant as +the first step. It gives the assistant the project layout, key +rules, and common pitfalls in one read. For deeper architecture +context, also point it at `docs/LLM_HANDOVER.md`. + +Core dev commands to keep handy: + +```bash +uv run pytest # run all tests +uv run ruff check # check for lint errors +uv run ruff format --check # check formatting (no changes) +``` + +--- + +## Tool Configuration + +Each AI coding assistant has its own way of loading project context. +Pick your tool below for CorridorKey-specific setup instructions. + +=== "Kiro" + + Kiro uses **steering files** stored in `.kiro/steering/*.md` to + provide persistent project context. Each file is a Markdown + document that Kiro loads according to one of three inclusion + modes: + + | Mode | Behaviour | + |------|-----------| + | **Always-on** (default) | Loaded at the start of every session automatically. | + | **Conditional** | Loaded only when the active file matches a `fileMatch` glob pattern (e.g., `*.py`). | + | **Manual** | User provides the file explicitly via `#` in the chat prompt. 
| 
+
+    To give Kiro the full CorridorKey context, create a steering file
+    that references both `AGENTS.md` and `LLM_HANDOVER.md`:
+
+    ```markdown title=".kiro/steering/corridorkey-context.md"
+    # CorridorKey Project Context
+
+    This steering file gives Kiro persistent context about the
+    CorridorKey codebase.
+
+    ## Primary References
+
+    - Read `AGENTS.md` at the repo root for the project overview,
+      key file map, build commands, code style, and prohibited
+      actions.
+    - Read `docs/LLM_HANDOVER.md` for the deep architecture
+      walkthrough, dataflow rules, and AI-specific directives.
+
+    ## Key Rules
+
+    - Tensor range is strictly [0.0, 1.0] float.
+    - Never use pow(x, 2.2) for gamma — use piecewise sRGB
+      transfer functions in `color_utils.py`.
+    - Do not modify files in `gvm_core/` or
+      `VideoMaMaInferenceModule/`.
+    ```
+
+=== "Claude Code"
+
+    Claude Code loads a **`CLAUDE.md`** file from the repository root
+    automatically at the start of every session. This is the primary
+    way to give Claude Code persistent project context.
+
+    Create a `CLAUDE.md` that points Claude Code at the existing
+    context files:
+
+    ```markdown title="CLAUDE.md"
+    # CorridorKey — Claude Code Context
+
+    Read these files for full project context:
+
+    - `AGENTS.md` — project overview, key file map, build/test
+      commands, code style, prohibited actions.
+    - `docs/LLM_HANDOVER.md` — deep architecture walkthrough,
+      dataflow rules, inference pipeline, AI directives.
+
+    Key rules:
+    - Tensors are [0.0, 1.0] float. Foreground is sRGB, alpha is
+      linear.
+    - Use piecewise sRGB transfer functions, never pow(x, 2.2).
+    - Do not modify gvm_core/ or VideoMaMaInferenceModule/.
+    ```
+
+    Claude Code reads `CLAUDE.md` once at session start, so any
+    updates require restarting the session to take effect.
+
+=== "Cursor"
+
+    Cursor uses **rule files** stored in `.cursor/rules/*.mdc` to
+    inject project context into the assistant.
Each rule file
+    is an MDC (`.mdc`) document whose frontmatter controls when it
+    activates:
+
+    | Mode | Frontmatter | Behaviour |
+    |------|-------------|-----------|
+    | **Always** | `alwaysApply: true` | Loaded in every chat and Cmd-K session. |
+    | **Auto Attached** | `globs: ["*.py"]` | Loaded when the active file matches the pattern. |
+    | **Manual** | no `description`, `globs`, or `alwaysApply` | User includes it explicitly via `@ruleName`. |
+    | **Agent Requested** | `description` set | The model decides whether to load it based on the description. |
+
+    Example rule file for CorridorKey:
+
+    ```markdown title=".cursor/rules/corridorkey.mdc"
+    ---
+    description: CorridorKey project context and coding rules
+    alwaysApply: true
+    ---
+
+    # CorridorKey Context
+
+    Read `AGENTS.md` at the repo root for the project overview,
+    key file map, and build commands.
+
+    Read `docs/LLM_HANDOVER.md` for the deep architecture
+    walkthrough and dataflow rules.
+
+    Key rules:
+    - Tensors are [0.0, 1.0] float. Foreground sRGB, alpha linear.
+    - Use piecewise sRGB functions, never pow(x, 2.2).
+    - Do not modify gvm_core/ or VideoMaMaInferenceModule/.
+    ```
+
+=== "GitHub Copilot"
+
+    GitHub Copilot supports project-level instructions via a
+    **`.github/copilot-instructions.md`** file. This file is
+    automatically included in Copilot Chat requests to provide
+    project-specific guidance.
+
+    Copilot also reads **`AGENTS.md`** natively, so CorridorKey's
+    existing `AGENTS.md` already provides baseline context without
+    any extra configuration. The instructions file is useful for
+    adding Copilot-specific guidance beyond what `AGENTS.md` covers.
+
+    ```markdown title=".github/copilot-instructions.md"
+    # CorridorKey — Copilot Instructions
+
+    This project already has an `AGENTS.md` at the repo root that
+    Copilot reads automatically. For deeper context, also refer to
+    `docs/LLM_HANDOVER.md`.
+
+    Key rules:
+    - Tensors are [0.0, 1.0] float. Foreground sRGB, alpha linear.
+ - Use piecewise sRGB functions in color_utils.py, never + pow(x, 2.2). + - Do not modify gvm_core/ or VideoMaMaInferenceModule/. + ``` + +=== "Windsurf" + + Windsurf uses **`.windsurf/rules/`** for project-level context + files. Rules placed in this directory are loaded automatically + during coding sessions. + + Windsurf also reads **`AGENTS.md`** files natively with + directory-based auto-scoping — it picks up `AGENTS.md` at the + repo root and applies its content as project-wide context. This + means CorridorKey's existing `AGENTS.md` works out of the box. + + For additional Windsurf-specific rules, create a file in the + rules directory: + + ```markdown title=".windsurf/rules/corridorkey.md" + # CorridorKey Context + + AGENTS.md at the repo root is loaded automatically. For the + deep architecture walkthrough, also read + docs/LLM_HANDOVER.md. + + Key rules: + - Tensors are [0.0, 1.0] float. Foreground sRGB, alpha linear. + - Use piecewise sRGB functions, never pow(x, 2.2). + - Do not modify gvm_core/ or VideoMaMaInferenceModule/. + ``` + +=== "Gemini CLI" + + Gemini CLI uses a **`GEMINI.md`** file at the repository root + for project-level context. It supports hierarchical context + loading across three levels: + + | Level | Location | Scope | + |-------|----------|-------| + | **Global** | `~/.gemini/GEMINI.md` | Applied to all projects on the machine. | + | **Project** | `GEMINI.md` (repo root) | Applied to the current project. | + | **Subdirectory** | `GEMINI.md` in any subdirectory | Applied when working within that directory. | + + Gemini CLI merges context from all three levels, with more + specific files taking precedence. For CorridorKey, a project-level + file is sufficient: + + ```markdown title="GEMINI.md" + # CorridorKey — Gemini CLI Context + + Read these files for full project context: + + - `AGENTS.md` — project overview, key file map, build/test + commands, code style, prohibited actions. 
+ - `docs/LLM_HANDOVER.md` — deep architecture walkthrough, + dataflow rules, inference pipeline, AI directives. + + Key rules: + - Tensors are [0.0, 1.0] float. Foreground sRGB, alpha linear. + - Use piecewise sRGB functions, never pow(x, 2.2). + - Do not modify gvm_core/ or VideoMaMaInferenceModule/. + ``` + +--- + +## Community Contributions + +PRs adding configuration guides for AI tools not yet covered here +are welcome. See the [Contributing](contributing.md) page for the +PR workflow and submission guidelines. diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 00000000..92eea84d --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,112 @@ +# Architecture + +CorridorKey is a neural-network-based green screen removal tool. It takes an +RGB image and a "Coarse Alpha Hint" (a rough mask isolating the subject) and +produces mathematically perfect, physically unmixed Alpha and Foreground +Straight color — with the green screen unmixed from semi-transparent pixels. + +For the full technical handover document aimed at AI assistants, see the +[LLM Handover](LLM_HANDOVER.md) page. + +--- + +## The GreenFormer Model + +The core architecture is called the **GreenFormer**. It combines a +vision-transformer backbone with a convolutional refiner head. + +### Backbone — Hiera + +The backbone is a [timm](https://github.com/huggingface/pytorch-image-models) +implementation of `hiera_base_plus_224.mae_in1k_ft_in1k`. The first layer is +patched to accept **4 input channels** (RGB + Coarse Alpha Hint) instead of the +standard 3. + +### Decoders + +Multiscale feature-fusion heads sit on top of the backbone and predict: + +- **Coarse Alpha** (1 channel) +- **Coarse Foreground** (3 channels) + +### CNN Refiner (`CNNRefinerModule`) + +A custom CNN head built from dilated residual blocks. It receives the original +RGB input together with the coarse predictions from the backbone and outputs +purely **additive Delta Logits**. 
These deltas are applied directly to the +backbone's outputs before the final Sigmoid activation, refining edge detail +without replacing the backbone's predictions. + +--- + +## Critical Dataflow Properties + +The biggest challenge in this codebase is **color space** and **gamma math**. +When debugging compositing issues, check these rules first. + +### 1. Model Input / Output — Strictly `[0.0, 1.0]` Float Tensors + +- The model assumes inputs are **sRGB**. +- The predicted **Foreground** (`res['fg']`) is natively sRGB — the model is + trained to predict the un-multiplied straight-color foreground element. +- The predicted **Alpha** (`res['alpha']`) is inherently **Linear**. + +### 2. EXR Handling (the `Processed` Output Pass) + +EXR files store Linear float data, premultiplied. To build the `Processed` EXR: + +1. Take the sRGB foreground. +2. Convert it through `srgb_to_linear()` (the piecewise real sRGB transfer + function defined in `color_utils.py` — **not** a pure mathematical + γ = 2.2 curve). +3. Premultiply by the Linear Alpha. +4. Save via OpenCV with `cv2.IMWRITE_EXR_TYPE_HALF`. + +!!! warning "Bug History" + Do **not** apply a pure γ 2.2 curve. Always use the piecewise sRGB + transfer functions in `color_utils.py`. + +### 3. Inference Resizing (`img_size`) + +The engine is strictly trained on **2048 × 2048** crops. In +`inference_engine.py`, `process_frame()` uses OpenCV (Lanczos4) to +upscale/downscale the user's arbitrary input resolution to 2048 × 2048, feeds +the model, and then resizes the predictions back to the original resolution. + +--- + +## Key Source Files + +| File | Responsibility | +|------|----------------| +| `CorridorKeyModule/core/model_transformer.py` | PyTorch architecture definition — Hiera backbone + CNN Refiner head. | +| `CorridorKeyModule/inference_engine.py` | `CorridorKeyEngine` class — loads weights, handles resize API, packs output passes. 
| +| `CorridorKeyModule/core/color_utils.py` | Pure-math compositing utilities: `srgb_to_linear()`, `premultiply()`, luminance-preserving `despill()`, morphological matte cleaning. | +| `clip_manager.py` | User-facing CLI wizard — scans directories, prompts for settings, pipes frames into the engine. | + +--- + +## Inference Pipeline Overview + +Users typically launch the system via the shell scripts +(`CorridorKey_DRAG_CLIPS_HERE_local.bat` / `.sh`) which boot the +`clip_manager.py` wizard. + +1. **Scan** — Looks for folders containing an `Input` sequence (RGB) and an + `AlphaHint` sequence (BW). +2. **Config** — Prompts for settings (gamma space, despill strength, + auto-despeckle threshold, refiner strength). +3. **Execution** — Loops frame-by-frame, passing `[H, W, 3]` NumPy arrays to + `engine.process_frame()`. +4. **Export** — Writes four output folders: + + | Folder | Format | Color Space | + |--------|--------|-------------| + | `FG/` | Half-float EXR, RGB | sRGB (convert to linear before compositing) | + | `Matte/` | Half-float EXR, Grayscale | Linear | + | `Processed/` | Half-float EXR, RGBA | Linear, Premultiplied | + | `Comp/` | 8-bit PNG | sRGB composite over checkerboard (preview) | + +For deeper implementation details, see the +[CorridorKeyModule README](https://github.com/nikopueringer/CorridorKey/tree/main/CorridorKeyModule) +and the [LLM Handover](LLM_HANDOVER.md) document. diff --git a/docs/contributing.md b/docs/contributing.md new file mode 100644 index 00000000..eebf3618 --- /dev/null +++ b/docs/contributing.md @@ -0,0 +1,153 @@ +# Contributing + +Thanks for your interest in improving CorridorKey! Whether you're a VFX artist, +a pipeline TD, or a machine learning researcher, contributions of all kinds are +welcome — bug reports, feature ideas, documentation fixes, and code. 
+ +## Legal Agreement + +By contributing to this project you agree that your contributions will be +licensed under the project's +**[CorridorKey Licence](https://github.com/nikopueringer/CorridorKey/blob/main/LICENSE)**. + +By submitting a Pull Request you specifically acknowledge and agree to the terms +set forth in **Section 6 (CONTRIBUTIONS)** of the license. This ensures that +Corridor Digital maintains the full right to use, distribute, and sublicense +this codebase, including PR contributions. + +--- + +## Prerequisites + +- Python 3.10 or newer +- [uv](https://docs.astral.sh/uv/) for dependency management + +--8<-- "docs/_snippets/uv-install.md" + +--- + +## Dev Setup + +```bash +git clone https://github.com/nikopueringer/CorridorKey.git +cd CorridorKey +uv sync --group dev # installs all dependencies + dev tools (pytest, ruff) +``` + +That's it. No manual virtualenv creation, no `pip install` — uv handles +everything. + +--- + +## Running Tests + +```bash +uv run pytest # run all tests +uv run pytest -v # verbose (shows each test name) +uv run pytest -m "not gpu" # skip tests that need a CUDA GPU +uv run pytest --cov # show test coverage +``` + +Most tests run in a few seconds and don't need a GPU or model weights. Tests +that require CUDA are marked with `@pytest.mark.gpu` and will be skipped +automatically if no GPU is available. + +--- + +## Apple Silicon (Mac) Notes + +--8<-- "docs/_snippets/apple-silicon-note.md" + +If you are contributing on an Apple Silicon Mac, there are a few extra things to +be aware of. + +### `uv.lock` Drift + +Running `uv run pytest` on macOS regenerates `uv.lock` with macOS-specific +dependency markers. **Do not commit this file.** Before staging your changes, +always run: + +```bash +git restore uv.lock +``` + +### Backend Selection + +CorridorKey auto-detects MPS on Apple Silicon. 
To test with the MLX backend or +force CPU, set the environment variable before running: + +```bash +export CORRIDORKEY_BACKEND=mlx # use native MLX on Apple Silicon +export CORRIDORKEY_DEVICE=cpu # force CPU (useful for isolating device bugs) +``` + +### MPS Operator Fallback + +If PyTorch raises an error about an unsupported MPS operator, enable CPU +fallback for those ops: + +```bash +export PYTORCH_ENABLE_MPS_FALLBACK=1 +``` + +--- + +## Linting and Formatting + +The project uses [ruff](https://docs.astral.sh/ruff/) for both linting and +formatting. + +```bash +uv run ruff check # check for lint errors +uv run ruff format --check # check formatting (no changes) +uv run ruff format # auto-format your code +``` + +| Setting | Value | +|---------|-------| +| Lint rules | `E, F, W, I, B` (basic style, unused imports, import sorting, common bug patterns) | +| Line length | 120 characters | +| Excluded dirs | `gvm_core/`, `VideoMaMaInferenceModule/` (third-party research code kept close to upstream) | + +CI runs both checks on every pull request. Running them locally before pushing +saves a round-trip. + +--- + +## Making Changes + +### Pull Request Workflow + +1. Fork the repo and create a branch for your change. +2. Make your changes. +3. Run `uv run pytest` and `uv run ruff check` to make sure everything passes. +4. Open a pull request against `main`. + +In your PR description, focus on **why** you made the change, not just what +changed. If you're fixing a bug, describe the symptoms. If you're adding a +feature, explain the use case. A couple of sentences is plenty. + +### What Makes a Good Contribution + +- **Bug fixes** — especially for edge cases in EXR/linear workflows, color + space handling, or platform-specific issues. +- **Tests** — more test coverage is always welcome, particularly for + `clip_manager.py` and `inference_engine.py`. +- **Documentation** — better explanations, usage examples, or clarifying + comments in tricky code. 
+- **Performance** — reducing GPU memory usage, speeding up frame processing, or + optimizing I/O. + +### Model Weights + +The model checkpoint (`CorridorKey_v1.0.pth`) and optional GVM/VideoMaMa +weights are **not** in the git repo. Most tests don't need them. If you're +working on inference code and need the weights, follow the download instructions +in the [Installation](installation.md) guide. + +--- + +## Questions? + +Join the [Discord](https://discord.gg/zvwUrdWXJm) — it's the fastest way to +get help or discuss ideas before opening a PR. diff --git a/docs/device-and-backend-selection.md b/docs/device-and-backend-selection.md new file mode 100644 index 00000000..c4bfa541 --- /dev/null +++ b/docs/device-and-backend-selection.md @@ -0,0 +1,145 @@ +# Device and Backend Selection + +## Device Selection + +By default, CorridorKey auto-detects the best available compute device in this +priority order: + +**CUDA → MPS → CPU** + +### Override via CLI Flag + +```bash +uv run python clip_manager.py --action wizard --win_path "V:\..." --device mps +uv run python clip_manager.py --action run_inference --device cpu +``` + +### Override via Environment Variable + +```bash +export CORRIDORKEY_DEVICE=cpu +uv run python clip_manager.py --action wizard --win_path "V:\..." +``` + +!!! info "Resolution order" + `--device` flag > `CORRIDORKEY_DEVICE` env var > auto-detect. + +--8<-- "docs/_snippets/apple-silicon-note.md" + +## Backend Selection + +CorridorKey supports two inference backends: + +| Backend | Platforms | Notes | +|---|---|---| +| **Torch** (default on Linux / Windows) | CUDA, MPS, CPU | Standard PyTorch inference. | +| **MLX** (Apple Silicon) | Metal | Native Apple Silicon acceleration — avoids PyTorch's MPS layer entirely and typically runs faster. 
| + +### Override via CLI Flag + +```bash +uv run python corridorkey_cli.py --action wizard --win_path "/path/to/clips" --backend mlx +uv run python corridorkey_cli.py --action run_inference --backend torch +``` + +### Override via Environment Variable + +```bash +export CORRIDORKEY_BACKEND=mlx +uv run python corridorkey_cli.py --action run_inference +``` + +!!! info "Resolution order" + `--backend` flag > `CORRIDORKEY_BACKEND` env var > auto-detect. + + Auto mode prefers MLX on Apple Silicon when the package is installed. + + +## MLX Setup (Apple Silicon) + +Follow these steps to use the native MLX backend on an M1+ Mac. + +### 1. Install the MLX Extras + +```bash +uv sync --extra mlx +``` + +### 2. Obtain MLX Weights (`.safetensors`) + +=== "Option A — Download Pre-Converted Weights (simplest)" + + ```bash + # Download weights from GitHub Releases into a local cache directory + uv run python -m corridorkey_mlx weights download + + # Print the cached path, then copy to the checkpoints folder + WEIGHTS=$(uv run python -m corridorkey_mlx weights download --print-path) + cp "$WEIGHTS" CorridorKeyModule/checkpoints/corridorkey_mlx.safetensors + ``` + +=== "Option B — Convert from an Existing `.pth` Checkpoint" + + ```bash + # Clone the MLX repo (contains the conversion script) + git clone https://github.com/nikopueringer/corridorkey-mlx.git + cd corridorkey-mlx + uv sync + + # Convert (point --checkpoint at your CorridorKey.pth) + uv run python scripts/convert_weights.py \ + --checkpoint ../CorridorKeyModule/checkpoints/CorridorKey_v1.0.pth \ + --output ../CorridorKeyModule/checkpoints/corridorkey_mlx.safetensors + cd .. + ``` + +Either way, the final file must be at: + +``` +CorridorKeyModule/checkpoints/corridorkey_mlx.safetensors +``` + +### 3. Run + +```bash +CORRIDORKEY_BACKEND=mlx uv run python clip_manager.py --action run_inference +``` + +MLX uses `img_size=2048` by default (same as Torch). 
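The flag > env var > auto resolution order described above can be sketched in a few lines. This is an illustrative model only, not the project's actual implementation (which lives in `device_utils.py`), and the `resolve_backend` helper is hypothetical:

```python
import os
import platform


def resolve_backend(cli_flag=None, mlx_installed=False):
    """Hypothetical sketch of backend resolution:
    --backend flag > CORRIDORKEY_BACKEND env var > auto-detect."""
    if cli_flag:
        # An explicit CLI flag always wins.
        return cli_flag
    env = os.environ.get("CORRIDORKEY_BACKEND")
    if env:
        # The environment variable is the next fallback.
        return env
    # Auto mode prefers MLX on Apple Silicon when the package is installed.
    if (platform.system() == "Darwin"
            and platform.machine() == "arm64"
            and mlx_installed):
        return "mlx"
    return "torch"
```

For example, `resolve_backend("torch")` returns `"torch"` even when `CORRIDORKEY_BACKEND=mlx` is exported, matching the precedence in the info box above.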
+
+## Troubleshooting
+
+### MPS (PyTorch Metal)
+
+**Confirm MPS is active** — run with verbose logging to see which device was
+selected:
+
+```bash
+uv run python clip_manager.py --action list 2>&1 | grep -i "device\|backend\|mps"
+```
+
+**MPS operator errors** (`NotImplementedError: ... not implemented for 'MPS'`):
+Some PyTorch operations are not yet supported on MPS. Enable CPU fallback:
+
+```bash
+export PYTORCH_ENABLE_MPS_FALLBACK=1
+uv run python corridorkey_cli.py --action wizard --win_path "/path/to/clips"
+```
+
+!!! tip "Make the fallback permanent"
+    Add `export PYTORCH_ENABLE_MPS_FALLBACK=1` to your shell profile
+    (`~/.zshrc`) so it is always active. With it set, unsupported operations
+    run on the CPU instead of raising an error; those ops are slower than
+    native MPS, but the run completes.
+
+**Use native MLX instead of PyTorch MPS** — MLX avoids PyTorch's MPS layer
+entirely and typically runs faster on Apple Silicon. See the
+[MLX Setup](#mlx-setup-apple-silicon) section above.
+
+### MLX
+
+| Symptom | Fix |
+|---|---|
+| `No .safetensors checkpoint found` | Place MLX weights in `CorridorKeyModule/checkpoints/`. |
+| `corridorkey_mlx not installed` | Run `uv sync --extra mlx`. |
+| `MLX requires Apple Silicon` | MLX only works on M1+ Macs. |
+| Auto picked Torch unexpectedly | Set `CORRIDORKEY_BACKEND=mlx` explicitly. |
diff --git a/docs/hardware-requirements.md b/docs/hardware-requirements.md
new file mode 100644
index 00000000..e5bcb7b0
--- /dev/null
+++ b/docs/hardware-requirements.md
@@ -0,0 +1,39 @@
+# Hardware Requirements
+
+CorridorKey was designed and built on a Linux workstation equipped with an
+NVIDIA RTX Pro 6000 (96 GB VRAM). The community is actively optimising it for
+consumer GPUs — the most recent build should work on cards with **6–8 GB of
+VRAM**, and it can run on most Mac systems with unified memory.
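As a rough way to see which tier a machine falls into, the CUDA → MPS → CPU priority can be approximated with the standard library alone. This is a simplified sketch, not the project's actual `device_utils.py` logic: the presence of an `nvidia-smi` binary is only a weak proxy for a usable CUDA device, and MPS availability is assumed on Apple Silicon.

```python
import platform
import shutil


def guess_device():
    """Rough device guess following the CUDA > MPS > CPU priority.

    Illustrative only: checks for an NVIDIA driver binary and an
    Apple Silicon platform rather than querying PyTorch directly.
    """
    if shutil.which("nvidia-smi"):
        return "cuda"
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mps"
    return "cpu"
```

A machine that resolves to `"cpu"` here can still run CorridorKey; it will simply be slower than the GPU tiers in the tables below.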
+ +## Core Engine (CorridorKey) + +| Spec | Minimum | Recommended | +|---|---|---| +| GPU VRAM | 6 GB | 8 GB+ | +| Compute | CUDA, MPS, or CPU | CUDA (NVIDIA) | +| System RAM | 8 GB | 16 GB+ | + +The engine dynamically scales inference to its native 2048×2048 backbone, so +more VRAM allows larger plates to be processed without tiling. + +!!! warning "Windows CUDA driver requirement" + To run GPU acceleration natively on Windows, your system **must** have + NVIDIA drivers that support **CUDA 12.8 or higher**. If your drivers only + support older CUDA versions, the installer will likely fall back to the CPU. + +## Optional Modules + +GVM and VideoMaMa are optional Alpha Hint generators with significantly higher +hardware requirements. You do **not** need them — you can always provide your +own Alpha Hints from other software. + +--8<-- "docs/_snippets/optional-weights.md" + +| Module | VRAM Required | Notes | +|---|---|---| +| **GVM** | ~80 GB | Uses massive Stable Video Diffusion models. | +| **VideoMaMa** | 80 GB+ (native) / <24 GB (community optimised) | Community tweaks reduce VRAM, but extreme optimisations are not yet fully integrated in this repo. | + +## Apple Silicon + +--8<-- "docs/_snippets/apple-silicon-note.md" diff --git a/docs/index.md b/docs/index.md index e69de29b..2f8b0bca 100644 --- a/docs/index.md +++ b/docs/index.md @@ -0,0 +1,64 @@ +# CorridorKey + +> 📖 **[GitHub Repository](https://github.com/nikopueringer/CorridorKey)** — Source code, issues, and releases. + +When you film something against a green screen, the edges of your subject +inevitably blend with the green background — creating pixels that mix your +subject's true color with the screen. Traditional keyers struggle to untangle +these colors, and even modern "AI Roto" solutions typically output a harsh +binary mask, destroying the delicate semi-transparent pixels needed for a +realistic composite. + +CorridorKey solves this *unmixing* problem. 
You input a raw green screen frame, +and the neural network completely separates the foreground object from the green +screen. For every single pixel — even highly transparent ones like motion blur +or out-of-focus edges — the model predicts the true, un-multiplied straight +color of the foreground element alongside a clean, linear alpha channel. + +No more fighting with garbage mattes or agonizing over "core" vs "edge" keys. +Give CorridorKey a hint of what you want, and it separates the light for you. + +## Features + +- **Physically Accurate Unmixing** — Clean extraction of straight color + foreground and linear alpha channels, preserving hair, motion blur, and + translucency. +- **Resolution Independent** — The engine dynamically scales inference to + handle 4K plates while predicting using its native 2048×2048 high-fidelity + backbone. +- **VFX Standard Outputs** — Natively reads and writes 16-bit and 32-bit + Linear float EXR files, preserving true color math for integration in Nuke, + Fusion, or Resolve. +- **Auto-Cleanup** — Includes a morphological cleanup system to automatically + prune any tracking markers or tiny background features that slip through + detection. + +## Get Started + +