## Summary

Add hardware-accelerated AV1 video encoding and decoding to StreamKit, leveraging GPU capabilities via VA-API, NVIDIA NVENC/NVDEC, and (future) Vulkan Video. This builds on the CPU AV1 foundation (#216) and the GPU compositing work (#194) to move toward a zero-copy GPU media pipeline.
## Motivation

- CPU AV1 encoding (rav1e) at real-time speeds trades away significant quality; HW encoders deliver better quality at the same latency
- GPU compositing (GPU-accelerated compositing via wgpu, #194) already keeps frames on the GPU — HW codecs reduce CPU↔GPU roundtrips
- AV1 now has broad HW encode support: Intel (Arc+), AMD (RDNA 3+), NVIDIA (RTX 40xx+)
- The ultimate goal is a zero-copy pipeline: GPU decode → GPU composite → GPU encode, with no CPU copies
## Prerequisites

- CPU AV1 support (#216) merged — the `av1` feature, the `Av1EncoderNode`/`Av1DecoderNode` interfaces, and MoQ codec negotiation
## HW Acceleration Options
### Option A: VA-API via cros-codecs / cros-libva

| Property | Detail |
|---|---|
| Crate | `cros-codecs` v0.0.6 (BSD-3-Clause) |
| AV1 decode | Supported (VA-API) |
| AV1 encode | Supported (VA-API) |
| VP9 decode/encode | Also supported |
| HW support | Intel, AMD, NVIDIA on Linux (via Mesa / intel-media-driver) |
| Maturity | Solid — developed for ChromeOS/crosvm, 546k downloads |
| wgpu interop | No native interop — requires a DMA-BUF → Vulkan `VkImage` → wgpu texture bridge (custom work) |
### Option B: NVIDIA NVENC/NVDEC (native SDK bindings)

| Property | Detail |
|---|---|
| SDK | NVIDIA Video Codec SDK |
| AV1 encode | Supported (RTX 40xx+, Ada Lovelace) |
| AV1 decode | Supported (RTX 30xx+, Ampere) |
| Rust bindings | Would need custom FFI bindings, or an existing crate such as `nvidia-video-codec-sdk` |
| HW support | NVIDIA GPUs only |
| Performance | Best-in-class on NVIDIA hardware; lower latency than VA-API on NVIDIA |
| wgpu interop | CUDA↔Vulkan interop is possible but complex |
### Option C: Vulkan Video via vk-video

| Property | Detail |
|---|---|
| Crate | `vk-video` v0.2.1 (MIT) |
| AV1 support | Not yet — currently H.264 only |
| wgpu interop | Native — built on wgpu; decoded frames are `wgpu::Texture`s |
| Maturity | Very early (1.2k downloads); developed by Software Mansion for Smelter |
| Vulkan spec | `VK_KHR_video_decode_av1` ratified Feb 2024; `VK_KHR_video_encode_av1` ratified Nov 2024 |
| Potential | Best long-term option for a zero-copy pipeline, but needs AV1 support added |
## Proposed Phased Approach

### Phase 2a: VA-API HW acceleration (most practical today)
```rust
// crates/nodes/src/video/vaapi.rs (new)

/// VA-API accelerated AV1 encoder using cros-codecs.
///
/// Input: I420/NV12 raw frames. Output: AV1 encoded packets.
/// Uses cros_codecs::encoder for VA-API AV1 encoding.
pub struct VaapiAv1EncoderNode {
    config: VaapiAv1EncoderConfig,
}

/// VA-API accelerated AV1 decoder using cros-codecs.
///
/// Input: AV1 encoded packets. Output: NV12 frames
/// (VA surface mapped to a CPU buffer).
pub struct VaapiAv1DecoderNode {
    config: VaapiAv1DecoderConfig,
}

/// Runtime capability detection: checks the available VA-API
/// profiles/entrypoints for AV1 encode/decode.
pub fn probe_vaapi_capabilities() -> VaapiCapabilities { ... }
```
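Before wiring up the Rust probe, VA-API AV1 support on a host can be eyeballed from the shell with `vainfo` (from libva-utils); this is a quick sanity check, not part of the proposed API:

```shell
# List VA-API profiles/entrypoints and keep only the AV1 lines.
# On a capable GPU this shows entries like VAProfileAV1Profile0 with
# VAEntrypointVLD (decode) and/or VAEntrypointEncSlice* (encode).
vainfo 2>/dev/null | grep -i av1 || echo "no VA-API AV1 support found"
```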
- Feature: `vaapi = ["dep:cros-codecs"]`
- Fallback: auto-detect VA-API at runtime; fall back to CPU rav1e/rav1d if unavailable
- Works on Intel/AMD/NVIDIA on Linux
- Docker: requires the `--gpus` flag plus VA-API drivers
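The auto-detect-then-fallback behavior could look roughly like the sketch below. All names here (`probe_vaapi_capabilities`, `VaapiCapabilities`, `EncoderBackend`) are hypothetical, and the probe is stubbed out; a real implementation would query VA-API profiles/entrypoints through cros-libva.

```rust
#[derive(Debug, PartialEq)]
enum EncoderBackend {
    VaapiAv1, // hardware path (cros-codecs)
    Rav1e,    // CPU fallback from the existing `av1` feature
}

struct VaapiCapabilities {
    av1_encode: bool,
}

// Stand-in for the real probe; returns None when no usable
// VA-API device is present.
fn probe_vaapi_capabilities() -> Option<VaapiCapabilities> {
    None // simulate a host without VA-API
}

fn select_encoder() -> EncoderBackend {
    match probe_vaapi_capabilities() {
        Some(caps) if caps.av1_encode => EncoderBackend::VaapiAv1,
        _ => EncoderBackend::Rav1e,
    }
}

fn main() {
    let backend = select_encoder();
    // With the stubbed probe returning None, we land on the CPU encoder.
    assert_eq!(backend, EncoderBackend::Rav1e);
    println!("selected backend: {:?}", backend);
}
```

Whether this selection happens implicitly or is user-configurable is one of the open questions below.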
### Phase 2b: NVIDIA NVENC/NVDEC (optional, for best NVIDIA performance)

- Feature: `nvenc = ["dep:nvidia-video-codec-sdk"]` (or custom FFI bindings)
- Only worth pursuing if VA-API performance on NVIDIA is insufficient
- NVENC typically has lower latency and better rate control than VA-API on NVIDIA
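In `Cargo.toml` terms, the two feature gates above might be declared roughly like this (a sketch — the optional-dependency layout is assumed, and the NVENC dependency may end up as custom FFI bindings instead of the `nvidia-video-codec-sdk` crate):

```toml
# Hypothetical crates/nodes/Cargo.toml fragment
[dependencies]
cros-codecs = { version = "0.0.6", optional = true }

[features]
vaapi = ["dep:cros-codecs"]
# nvenc would similarly gate either the nvidia-video-codec-sdk
# crate or custom FFI bindings
```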
### Phase 2c: Vulkan Video zero-copy pipeline (future, when the ecosystem matures)

AV1 packets → Vulkan Video decode → `wgpu::Texture` (NV12)
→ GPU compositor (wgpu, already merged) → `wgpu::Texture` (output)
→ Vulkan Video encode → AV1 packets

- No CPU roundtrips — frames never leave GPU memory
- Requires the vk-video crate to add AV1 support, OR building Vulkan Video AV1 directly using ash plus wgpu Vulkan interop
- The wgpu `Device` and `Queue` are shared between compositor and codec
- This is the ultimate performance target
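The shape of that flow can be modeled in a toy sketch. `GpuDevice`, `GpuTexture`, and the three stage types below are illustrative stand-ins, not real wgpu or vk-video types; the point shown is that every stage borrows one shared device and frames stay in GPU-side handles from decode to encode.

```rust
use std::sync::Arc;

struct GpuDevice;                         // stand-in for a shared wgpu::Device
struct GpuTexture { label: &'static str } // stand-in for an NV12 wgpu::Texture

struct VulkanDecoder { _dev: Arc<GpuDevice> }
struct Compositor    { _dev: Arc<GpuDevice> }
struct VulkanEncoder { _dev: Arc<GpuDevice> }

impl VulkanDecoder {
    fn decode(&self, _av1_packet: &[u8]) -> GpuTexture {
        GpuTexture { label: "decoded-nv12" }
    }
}
impl Compositor {
    fn composite(&self, inputs: &[GpuTexture]) -> GpuTexture {
        let label = if inputs.is_empty() { "blank" } else { "composited" };
        GpuTexture { label }
    }
}
impl VulkanEncoder {
    fn encode(&self, tex: &GpuTexture) -> Vec<u8> {
        // A real encoder would submit `tex` via VK_KHR_video_encode_av1;
        // here we just emit a placeholder packet.
        tex.label.as_bytes().to_vec()
    }
}

fn main() {
    let dev = Arc::new(GpuDevice); // one device shared by every stage
    let dec  = VulkanDecoder { _dev: Arc::clone(&dev) };
    let comp = Compositor    { _dev: Arc::clone(&dev) };
    let enc  = VulkanEncoder { _dev: Arc::clone(&dev) };

    let frame  = dec.decode(&[0u8; 4]);  // AV1 packet -> GPU texture
    let out    = comp.composite(&[frame]); // stays "on GPU"
    let packet = enc.encode(&out);       // GPU texture -> AV1 packet
    assert!(!packet.is_empty());
    println!("encoded {} bytes from '{}'", packet.len(), out.label);
}
```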
## Tasks

- [ ] PoC: AV1 encode/decode with cros-codecs
- [ ] Implement VaapiAv1EncoderNode and VaapiAv1DecoderNode
- [ ] Track vk-video AV1 progress for future zero-copy integration
## Open Questions

- **VA-API vs NVENC priority** — start with VA-API (cross-vendor) or NVENC (best NVIDIA perf)?
- **DMA-BUF → wgpu bridge** — build custom interop for Phase 2a, or wait for Vulkan Video (Phase 2c)?
- **Auto-selection** — should the runtime automatically choose HW over CPU, or let the user configure?
- **Docker story** — separate `Dockerfile.gpu-vaapi` / `Dockerfile.gpu-nvenc`, or unified with runtime detection?