165 changes: 165 additions & 0 deletions AGENTS.md
@@ -0,0 +1,165 @@
# CorridorKey — AGENTS.md

> Agent-facing project guide following the [AGENTS.md open format](https://agents.md).
> For deeper architectural detail, see [`docs/LLM_HANDOVER.md`](docs/LLM_HANDOVER.md).

## Project Overview

**CorridorKey** is a neural-network-based green screen removal tool built for professional VFX pipelines. Unlike traditional keyers that produce binary masks, CorridorKey physically unmixes the foreground from the green screen at every pixel — including semi-transparent regions like motion blur, hair, and out-of-focus edges.

**Core inputs:**

- **RGB image** — the green screen plate (sRGB color gamut).
- **Coarse Alpha Hint** — a rough black-and-white mask isolating the subject (does not need to be precise).

**Core outputs:**

- **Alpha** — a clean, linear alpha channel.
- **Foreground Straight** — the un-multiplied straight color of the foreground element (sRGB), with the green screen contribution removed.

**License:** [CC-BY-NC-SA-4.0](LICENSE)

## Architecture & Dataflow

### GreenFormer Architecture

The core model is called the **GreenFormer**:

- **Backbone:** A `timm` **Hiera** vision-transformer (`hiera_base_plus_224.mae_in1k_ft_in1k`), patched to accept 4 input channels (RGB + Coarse Alpha Hint).
- **Decoders:** Multiscale feature fusion heads predicting coarse Alpha (1 ch) and Foreground (3 ch) logits.
- **Refiner (`CNNRefinerModule`):** A custom CNN head with dilated residual blocks. It takes the original RGB input and the coarse predictions, outputting purely additive delta logits applied to the backbone outputs before final Sigmoid activation.
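
The refiner's "additive delta logits" behaviour can be illustrated with scalar values. This is a minimal sketch, not the code in `model_transformer.py`: the function names `sigmoid` and `refine` are hypothetical, and the real model operates on tensors rather than floats.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function, mapping a logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def refine(coarse_logit: float, delta_logit: float) -> float:
    """Hypothetical helper: add the refiner delta in logit space, then activate.

    The correction is applied before the sigmoid, so the refined value
    always stays inside (0, 1) no matter how large the delta is.
    """
    return sigmoid(coarse_logit + delta_logit)


coarse = 2.0  # backbone decoder logit for one alpha pixel
print("coarse alpha :", sigmoid(coarse))
print("refined alpha:", refine(coarse, -1.0))  # refiner pulls the edge pixel down
```

Adding the correction in logit space rather than to the activated value means no clamping is needed after refinement.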

### Dataflow Rules

1. **Tensor range:** Model input and output are strictly `[0.0, 1.0]` float tensors. The foreground is sRGB; the alpha is linear.
2. **EXR pipeline:** To build the `Processed` EXR output, the sRGB foreground is converted via the piecewise `srgb_to_linear()` function, then premultiplied by the linear alpha, and saved as half-float EXR (`cv2.IMWRITE_EXR_TYPE_HALF`).
3. **Inference resizing:** The model is trained on **2048×2048** crops. `inference_engine.py` resizes arbitrary input to 2048×2048 with OpenCV **Lanczos4**, runs inference, then resizes the predictions back to the original resolution.
4. **Despill:** A luminance-preserving `despill()` function removes residual green contamination from the foreground.
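
As an illustration of what "luminance-preserving" despill can mean, here is one common approach on a single pixel. It is a sketch under stated assumptions and may differ from the project's `despill()` in `color_utils.py`: the green limit (average of red and blue), the Rec. 709 luma weights, and the name `despill_sketch` are all choices made for this example.

```python
# Rec. 709 luma weights (an assumption; the project may weight channels differently).
REC709_LUMA = (0.2126, 0.7152, 0.0722)


def luma(rgb):
    """Weighted sum of the three channels."""
    r, g, b = rgb
    return REC709_LUMA[0] * r + REC709_LUMA[1] * g + REC709_LUMA[2] * b


def despill_sketch(rgb):
    """Hypothetical despill: cap green at the red/blue average, then restore luma."""
    r, g, b = rgb
    limit = (r + b) / 2.0
    spill = max(0.0, g - limit)
    if spill == 0.0:
        return (r, g, b)  # green does not dominate, nothing to remove
    # Removing `spill` from green lowers luma by this amount.
    lost = REC709_LUMA[1] * spill
    # Adding the loss back equally to every channel restores the original luma
    # (the weights sum to 1) without reintroducing a green cast.
    return (r + lost, g - spill + lost, b + lost)


pixel = (0.2, 0.8, 0.2)
clean = despill_sketch(pixel)
print("before:", pixel, "luma", luma(pixel))
print("after :", clean, "luma", luma(clean))
```

The sketch does not clamp its output, so very bright spill pixels can exceed 1.0; a production implementation has to decide how to handle that case.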

> ⚠️ **Gamma 2.2 warning:** Never apply a pure mathematical gamma 2.2 curve. Always use the piecewise sRGB transfer functions defined in `color_utils.py`. A naive power-law curve will produce incorrect results in the toe region and break compositing math.
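
To make the warning concrete, the sketch below compares the standard piecewise sRGB decoding with a pure 2.2 power law on scalar values. The formulas follow the published sRGB definition; they are written out here for illustration and are not copied from `color_utils.py`. `naive_gamma_to_linear` is a hypothetical name for the curve to avoid.

```python
def srgb_to_linear(v: float) -> float:
    """Piecewise sRGB decoding for an encoded value in [0, 1]."""
    if v <= 0.04045:
        return v / 12.92  # linear toe segment
    return ((v + 0.055) / 1.055) ** 2.4


def linear_to_srgb(v: float) -> float:
    """Inverse of srgb_to_linear for a linear value in [0, 1]."""
    if v <= 0.0031308:
        return v * 12.92
    return 1.055 * v ** (1.0 / 2.4) - 0.055


def naive_gamma_to_linear(v: float) -> float:
    """Pure power-law approximation: the curve the warning says not to use."""
    return v ** 2.2


for v in (0.01, 0.04045, 0.2, 0.5):
    exact = srgb_to_linear(v)
    naive = naive_gamma_to_linear(v)
    print(f"encoded={v:.5f}  piecewise={exact:.6f}  gamma2.2={naive:.6f}")
```

The two curves are close in the midtones but diverge sharply near black, which is exactly where premultiplied edge pixels tend to live.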

## Key File Map

| Path | Responsibility |
|---|---|
| `CorridorKeyModule/core/model_transformer.py` | GreenFormer PyTorch architecture (Hiera backbone, decoders, CNNRefinerModule) |
| `CorridorKeyModule/inference_engine.py` | `CorridorKeyEngine` class — loads weights, handles 2048×2048 resize and frame processing API |
| `CorridorKeyModule/core/color_utils.py` | Pure math for compositing: `srgb_to_linear()`, `linear_to_srgb()`, `premultiply()`, `despill()` |
| `clip_manager.py` | User-facing CLI wizard — directory scanning, inference settings, piping data to the engine |
| `device_utils.py` | Compute device detection and selection (CUDA / MPS / CPU), backend resolution |
| `backend/` | FastAPI-based backend service: job queue, project management, FFmpeg tools, frame I/O |

## Dev Environment Setup

**Prerequisites:** Python ≥ 3.10 and [uv](https://docs.astral.sh/uv/).

uv handles Python installation, virtual environment creation, and package management — no manual `pip install` or virtualenv setup required.

```bash
git clone https://github.com/nikopueringer/CorridorKey.git
cd CorridorKey
uv sync --group dev # installs all dependencies + dev tools (pytest, ruff, hypothesis)
```

## Build & Test Commands

```bash
uv run pytest # run all tests
uv run pytest -v # verbose output
uv run pytest -m "not gpu" # skip GPU-dependent tests
uv run ruff check # lint check
uv run ruff format --check # formatting check (no changes)
uv run ruff format # auto-format
```

Tests that require a CUDA GPU are marked with `@pytest.mark.gpu` and are automatically skipped when no GPU is available. CI runs `pytest -m "not gpu"` to exclude them.
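
How the repository implements the automatic skip is not shown here. One common pattern is a `conftest.py` hook like the sketch below; the helper name `cuda_available` and the skip reason are assumptions for illustration, while `pytest_collection_modifyitems` is pytest's standard collection hook.

```python
# conftest.py (illustrative sketch, not the project's actual file)
import importlib.util

import pytest


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and reports a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return bool(torch.cuda.is_available())


def pytest_collection_modifyitems(config, items):
    """Attach a skip marker to every test marked `gpu` when CUDA is missing."""
    if cuda_available():
        return
    skip_gpu = pytest.mark.skip(reason="no CUDA GPU available")
    for item in items:
        if "gpu" in item.keywords:
            item.add_marker(skip_gpu)
```

With a hook of this kind, a test decorated with `@pytest.mark.gpu` is reported as skipped rather than failed on CPU-only machines. The `gpu` marker would also need to be registered in the pytest configuration to avoid unknown-marker warnings.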

## Code Style

The project uses **[Ruff](https://docs.astral.sh/ruff/)** for both linting and formatting.

| Setting | Value |
|---|---|
| Lint rules | `E`, `F`, `W`, `I`, `B` |
| Line length | 120 |
| Target version | `py311` |
| Excluded dirs | `gvm_core/`, `VideoMaMaInferenceModule/` |

`gvm_core/` and `VideoMaMaInferenceModule/` are third-party research code kept close to upstream — they are excluded from lint enforcement.

## Platform-Specific Caveats

### Apple Silicon (macOS)

- **MPS operator fallback:** Some PyTorch operations are not yet implemented for MPS. Enable CPU fallback:
  ```bash
  export PYTORCH_ENABLE_MPS_FALLBACK=1
  ```

### Windows

- **CUDA 12.8:** GPU acceleration on Windows requires NVIDIA drivers supporting **CUDA 12.8** or higher. Older drivers will cause a silent fallback to CPU.
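
Because the CPU fallback is silent, it can help to print the resolved device before starting a long render. The sketch below shows one way to do that with standard PyTorch calls; `resolve_device` is a hypothetical name and is not the API of `device_utils.py`.

```python
import importlib.util


def resolve_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can use."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch not installed at all
    import torch

    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is missing on older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = resolve_device()
print(f"Selected compute device: {device}")
if device == "cpu":
    print("Warning: no GPU backend detected; inference will run on the CPU.")
```

On a Windows machine that is expected to use CUDA, a `cpu` result from a check like this is a hint to look at the installed NVIDIA driver version.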

## Prohibited Actions

1. **Do not apply a pure gamma 2.2 curve.** Always use the piecewise sRGB transfer functions in `color_utils.py`. A naive `pow(x, 2.2)` breaks the toe region and produces incorrect compositing results.
2. **Do not modify files inside `gvm_core/` or `VideoMaMaInferenceModule/`.** These are third-party research modules kept close to upstream. Changes should be made in wrapper code or upstream PRs.

## PR Workflow & GitHub Templates

### Workflow

1. Fork the repo and create a branch for your change.
2. Make your changes.
3. Run `uv run pytest` and `uv run ruff check` to verify everything passes.
4. Open a pull request against `main`.

PR descriptions should focus on **why** the change was made, not just what changed. If fixing a bug, describe the symptoms. If adding a feature, explain the use case.

Before preparing any pull request, check `.github/` for PR templates, issue templates, and CI workflows.

### PR Template

The repository includes a PR template (`.github/pull_request_template.md`) with the following structure:

- **"What does this change?"** — Explain the motivation and scope of the change.
- **"How was it tested?"** — Describe specific test steps or commands run to verify correctness.
- **Checklist:**
  - `uv run pytest` passes
  - `uv run ruff check` passes
  - `uv run ruff format --check` passes

Fill in all sections thoroughly.

### CI Workflow (`ci.yml`)

Runs on every push and pull request to `main`:

- **Lint job:** `ruff format --check` + `ruff check`.
- **Test job:** `pytest -v --tb=short -m "not gpu"` on Python **3.10** and **3.13**. GPU tests are excluded via the `-m "not gpu"` marker filter.

### Docs Workflow (`docs.yml`)

Triggers on pushes to `main` that change files matching `docs/**` or `zensical.toml`. Builds and deploys the documentation site to GitHub Pages via **Zensical**.

## Documentation Accuracy

When making code changes, evaluate whether the change affects the accuracy of existing documentation. If a code change alters behavior, CLI flags, file paths, or configuration described in the docs, flag or update the outdated documentation.

Documentation files to check:

- `README.md`
- `CONTRIBUTING.md`
- `AGENTS.md`
- `docs/LLM_HANDOVER.md`
- All pages under `docs/`

## AI Directives

- **Skip basic tutorials.** The user is a VFX professional and coder. Dive straight into advanced implementation guidance, but document math thoroughly.
- **Prioritize performance.** This is video processing — every `.numpy()` transfer or `cv2.resize` matters in a loop running on 4K footage.
- **Check sRGB-to-linear conversion order.** If the user reports "crushed shadows" or "dark fringes", the problem is almost certainly an sRGB-to-linear conversion step happening in the wrong order inside `color_utils.py`.
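
The "wrong order" failure mode can be reproduced on a single semi-transparent pixel. This is a sketch using the standard sRGB decoding formula; `processed_correct` and `processed_wrong` are hypothetical names used only to contrast the two orderings.

```python
def srgb_to_linear(v: float) -> float:
    """Piecewise sRGB decoding for an encoded value in [0, 1]."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4


def processed_correct(fg_srgb: float, alpha: float) -> float:
    """Documented order: linearise the foreground first, then premultiply."""
    return srgb_to_linear(fg_srgb) * alpha


def processed_wrong(fg_srgb: float, alpha: float) -> float:
    """Buggy order: premultiply in sRGB space, then linearise the product."""
    return srgb_to_linear(fg_srgb * alpha)


fg, alpha = 0.5, 0.5  # mid-grey foreground on a half-transparent edge pixel
print("correct:", processed_correct(fg, alpha))
print("wrong  :", processed_wrong(fg, alpha))
```

In this example the wrong ordering yields a darker premultiplied value. The two orderings agree at alpha 0 and alpha 1, so the error only shows up on soft edges, which matches the "dark fringes" symptom.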

## Further Reading

- [`docs/LLM_HANDOVER.md`](docs/LLM_HANDOVER.md) — Detailed architecture walkthrough, dataflow properties, inference pipeline, and AI directives for the CorridorKey codebase.
1 change: 1 addition & 0 deletions README.md
@@ -3,6 +3,7 @@

https://github.com/user-attachments/assets/1fb27ea8-bc91-4ebc-818f-5a3b5585af08

> 📖 **[Full Documentation](https://nikopueringer.github.io/CorridorKey)** — Installation guides, usage instructions, and developer docs.

When you film something against a green screen, the edges of your subject inevitably blend with the green background. This creates pixels that are a mix of your subject's color and the green screen's color. Traditional keyers struggle to untangle these colors, forcing you to spend hours building complex edge mattes or manually rotoscoping. Even modern "AI Roto" solutions typically output a harsh binary mask, completely destroying the delicate, semi-transparent pixels needed for a realistic composite.

18 changes: 18 additions & 0 deletions docs/_snippets/apple-silicon-note.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
!!! note "Apple Silicon (MPS / MLX)"
    CorridorKey runs on Apple Silicon Macs using unified memory. Two backend
    options are available:

    - **MPS** — PyTorch's Metal Performance Shaders backend. Works out of the
      box but some operators may require the CPU fallback flag:

        ```bash
        export PYTORCH_ENABLE_MPS_FALLBACK=1
        ```

    - **MLX** — Native Apple Silicon acceleration via the
      [MLX framework](https://github.com/ml-explore/mlx). Avoids PyTorch's MPS
      layer entirely and typically runs faster. Requires installing the MLX
      extras (`uv sync --extra mlx`) and obtaining `.safetensors` weights.

    Because Apple Silicon shares memory between the CPU and GPU, the full
    system RAM is available to the model — no separate VRAM budget applies.
14 changes: 14 additions & 0 deletions docs/_snippets/model-download.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
**Download the CorridorKey checkpoint (~300 MB):**

[Download CorridorKey_v1.0.pth from Hugging Face](https://huggingface.co/nikopueringer/CorridorKey_v1.0/resolve/main/CorridorKey_v1.0.pth){ .md-button }

Place the file inside `CorridorKeyModule/checkpoints/` and rename it to
**`CorridorKey.pth`** so the final path is:

```
CorridorKeyModule/checkpoints/CorridorKey.pth
```

!!! warning
    The engine will not start without this checkpoint. Make sure the filename
    is exactly `CorridorKey.pth` (not `CorridorKey_v1.0.pth`).
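
The rename step is easy to forget, so a small pre-flight check can be useful. The sketch below uses only the paths and filenames given above; the function name `ensure_checkpoint` is hypothetical and is not part of the CorridorKey codebase.

```python
from pathlib import Path

CHECKPOINT_DIR = Path("CorridorKeyModule") / "checkpoints"
DOWNLOADED_NAME = "CorridorKey_v1.0.pth"  # filename as served by Hugging Face
EXPECTED_NAME = "CorridorKey.pth"         # filename the engine looks for


def ensure_checkpoint(repo_root: Path) -> Path:
    """Return the checkpoint path, renaming a freshly downloaded file if needed."""
    ckpt_dir = repo_root / CHECKPOINT_DIR
    target = ckpt_dir / EXPECTED_NAME
    if target.is_file():
        return target
    downloaded = ckpt_dir / DOWNLOADED_NAME
    if downloaded.is_file():
        downloaded.rename(target)  # apply the documented rename
        return target
    raise FileNotFoundError(
        f"Expected {target}; download {DOWNLOADED_NAME} into {ckpt_dir} first."
    )
```

Calling `ensure_checkpoint(Path.cwd())` from the repository root either returns the ready-to-use path or fails with a message that names the missing file.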
25 changes: 25 additions & 0 deletions docs/_snippets/optional-weights.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
!!! tip "Optional — GVM and VideoMaMa weights"
    These modules generate Alpha Hints automatically but have large model files
    and extreme hardware requirements. Installing them is **completely optional**;
    you can always provide your own Alpha Hints from other software.

    **GVM** (~80 GB VRAM required):

    ```bash
    uv run hf download geyongtao/gvm --local-dir gvm_core/weights
    ```

    **VideoMaMa** (originally 80 GB+ VRAM; community optimisations bring it
    under 24 GB, though not yet fully integrated here):

    ```bash
    # Fine-tuned VideoMaMa weights
    uv run hf download SammyLim/VideoMaMa \
      --local-dir VideoMaMaInferenceModule/checkpoints/VideoMaMa

    # Stable Video Diffusion base model (VAE + image encoder, ~2.5 GB)
    # Accept the licence at stabilityai/stable-video-diffusion-img2vid-xt first
    uv run hf download stabilityai/stable-video-diffusion-img2vid-xt \
      --local-dir VideoMaMaInferenceModule/checkpoints/stable-video-diffusion-img2vid-xt \
      --include "feature_extractor/*" "image_encoder/*" "vae/*" "model_index.json"
    ```
16 changes: 16 additions & 0 deletions docs/_snippets/uv-install.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
This project uses **[uv](https://docs.astral.sh/uv/)** to manage Python and all
dependencies. uv is a fast, modern replacement for pip that automatically
handles Python versions, virtual environments, and package installation in a
single step. You do **not** need to install Python yourself — uv does it for
you.

Install uv if you don't already have it:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

!!! tip
    On Windows the automated `.bat` installers handle uv installation for you.
    If you see `'uv' is not recognized` after installing uv, close and reopen
    the terminal so the updated PATH takes effect.