feat(nodes): add HW video codec backends (Vulkan Video H.264, VA-API AV1, NVENC/NVDEC AV1) #279
staging-devin-ai-integration[bot] wants to merge 20 commits into main from
Conversation
…AV1, NVENC/NVDEC AV1)

Implement hardware-accelerated video encoding and decoding for StreamKit, targeting Linux with Intel and NVIDIA GPUs (issue #217). Three backends behind optional feature flags:

- vulkan_video — H.264 encode/decode via Vulkan Video (vk-video v0.3). Cross-vendor (Intel ANV, NVIDIA, AMD RADV). Includes lazy encoder creation on the first frame for resolution detection, NV12/I420 input support, and configurable bitrate/framerate/keyframe interval.
- vaapi — AV1 encode/decode via VA-API (cros-codecs v0.0.6). Primarily Intel (intel-media-driver), also AMD. Uses GBM surfaces for zero-copy VA-API buffer management. Includes stride-aware NV12 plane read/write helpers with odd-width correctness.
- nvcodec — AV1 encode/decode via NVENC/NVDEC (shiguredo_nvcodec v2025.2). NVIDIA only (RTX 30xx+ decode, RTX 40xx+ AV1 encode). Dynamic CUDA loading — no build-time CUDA Toolkit required for the host binary.

All backends share:

- HwAccelMode enum (auto/force_hw/force_cpu) for graceful fallback
- ProcessorNode trait integration with health reporting
- Consistent config structs with serde deny_unknown_fields validation
- Comprehensive unit tests (mock-based, no GPU required)

Closes #217

Signed-off-by: Devin AI <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
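The HwAccelMode fallback policy described above can be sketched as follows. This is a minimal illustration, not StreamKit's actual code: the variant names mirror the auto/force_hw/force_cpu values from the commit message, but the helper function and its signature are hypothetical.

```rust
/// Sketch of the hardware-acceleration mode shared by all three backends.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HwAccelMode {
    Auto,     // use HW when available (for HW-only nodes this equals ForceHw)
    ForceHw,  // fail node construction if no suitable GPU is present
    ForceCpu, // reject the HW node; a software node must be selected instead
}

/// Decide whether a HW-only node may be constructed, given device probing.
/// Illustrative helper only — name and signature are not from the PR.
fn hw_node_allowed(mode: HwAccelMode, hw_available: bool) -> Result<(), String> {
    match mode {
        // HW-only nodes have no CPU fallback, so Auto behaves like ForceHw.
        HwAccelMode::Auto | HwAccelMode::ForceHw if hw_available => Ok(()),
        HwAccelMode::Auto | HwAccelMode::ForceHw => {
            Err("no suitable hardware device found".into())
        }
        HwAccelMode::ForceCpu => {
            Err("force_cpu: select a software encoder/decoder node instead".into())
        }
    }
}
```

As later discussion in this PR notes, CPU fallback is achieved at the pipeline level by choosing a software node, which is why Auto and ForceHw collapse to the same behaviour here.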
The self-hosted GPU runner (skit-demo-eu-gpu) has an NVIDIA GPU but the CI workflow wasn't exercising the nvcodec feature tests. Add the missing cargo test invocation so NVENC/NVDEC AV1 tests run alongside the existing GPU compositor tests. Signed-off-by: Devin AI <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
The shiguredo_nvcodec build script requires cuda.h at compile time. Install nvidia-cuda-toolkit on the self-hosted GPU runner if CUDA headers aren't already present. Signed-off-by: Devin AI <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Ubuntu's nvidia-cuda-toolkit installs cuda.h to /usr/include, but shiguredo_nvcodec's build script defaults to /usr/local/cuda/include. Set CUDA_INCLUDE_PATH=/usr/include so the build finds the headers. Signed-off-by: Devin AI <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
```rust
let encode_task = tokio::task::spawn_blocking(move || {
    // Encoder and device are lazily initialised on the first frame
    // so we know the actual resolution.
    let mut encoder: Option<vk_video::BytesEncoder> = None;
    let mut device: Option<Arc<vk_video::VulkanDevice>> = None;
    let mut current_dimensions: Option<(u32, u32)> = None;

    while let Some((frame, metadata)) = encode_rx.blocking_recv() {
        if result_tx.is_closed() {
            return;
        }

        let dims = (frame.width, frame.height);

        // (Re-)create encoder when dimensions change.
        if current_dimensions != Some(dims) {
            tracing::info!(
                "VulkanVideoH264EncoderNode: (re)creating encoder for {}×{}",
                dims.0,
                dims.1,
            );

            let dev = match init_vulkan_encode_device(device.as_ref()) {
                Ok(d) => d,
                Err(err) => {
                    let _ = result_tx.blocking_send(Err(err));
                    return;
                },
            };

            let max_bitrate = u64::from(
                config.max_bitrate.unwrap_or_else(|| config.bitrate.saturating_mul(4)),
            );

            let output_params = match dev.encoder_output_parameters_high_quality(
                vk_video::parameters::RateControl::VariableBitrate {
                    average_bitrate: u64::from(config.bitrate),
                    max_bitrate,
                    virtual_buffer_size: Duration::from_secs(2),
                },
            ) {
                Ok(p) => p,
                Err(err) => {
                    let _ = result_tx.blocking_send(Err(format!(
                        "failed to get encoder output parameters: {err}"
                    )));
                    return;
                },
            };

            let width = NonZeroU32::new(dims.0).unwrap_or(NonZeroU32::MIN);
            let height = NonZeroU32::new(dims.1).unwrap_or(NonZeroU32::MIN);

            let enc =
                match dev.create_bytes_encoder(vk_video::parameters::EncoderParameters {
                    input_parameters: vk_video::parameters::VideoParameters {
                        width,
                        height,
                        target_framerate: config.framerate.into(),
                    },
                    output_parameters: output_params,
                }) {
                    Ok(e) => e,
                    Err(err) => {
                        let _ = result_tx.blocking_send(Err(format!(
                            "failed to create BytesEncoder: {err}"
                        )));
                        return;
                    },
                };

            device = Some(dev);
            encoder = Some(enc);
            current_dimensions = Some(dims);
        }

        let Some(enc) = encoder.as_mut() else {
            let _ = result_tx.blocking_send(Err("encoder not initialised".to_string()));
            return;
        };

        // Convert I420 → NV12 if necessary.
        let nv12_data = match frame.pixel_format {
            PixelFormat::Nv12 => frame.data.as_slice().to_vec(),
            PixelFormat::I420 => i420_to_nv12(&frame),
            other => {
                let _ = result_tx.blocking_send(Err(format!(
                    "VulkanVideoH264EncoderNode: unsupported pixel format {other:?}, \
                     expected NV12 or I420"
                )));
                continue;
            },
        };

        let force_keyframe = metadata.as_ref().and_then(|m| m.keyframe).unwrap_or(false);

        let input_frame = vk_video::InputFrame {
            data: vk_video::RawFrameData {
                frame: nv12_data,
                width: frame.width,
                height: frame.height,
            },
            pts: metadata.as_ref().and_then(|m| m.timestamp_us),
        };

        let encode_start = Instant::now();
        let result = enc.encode(&input_frame, force_keyframe);
        encode_duration_histogram.record(encode_start.elapsed().as_secs_f64(), &[]);

        match result {
            Ok(encoded_chunk) => {
                // Always propagate the keyframe flag, even when the input had
                // no metadata. Without this, downstream RTMP/MoQ transport
                // cannot detect keyframes for stream initialisation.
                let out_meta = match metadata {
                    Some(mut m) => {
                        m.keyframe = Some(encoded_chunk.is_keyframe);
                        Some(m)
                    },
                    None => Some(PacketMetadata {
                        timestamp_us: None,
                        duration_us: None,
                        sequence: None,
                        keyframe: Some(encoded_chunk.is_keyframe),
                    }),
                };

                let output = EncoderOutput {
                    data: Bytes::from(encoded_chunk.data),
                    metadata: out_meta,
                };
                if result_tx.blocking_send(Ok(output)).is_err() {
                    return;
                }
            },
            Err(err) => {
                let _ = result_tx
                    .blocking_send(Err(format!("Vulkan Video H.264 encode error: {err}")));
            },
        }
    }
});
```
🚩 Vulkan Video encoder uses custom encode task instead of shared EncoderNodeRunner pattern
The VulkanVideoH264EncoderNode implements its own blocking encode task inline (lines 496-638) rather than using the StandardVideoEncoder / EncoderNodeRunner trait pattern used by all other encoders (nv_av1.rs:475-493, vaapi_av1.rs:824-842, vp9.rs, av1.rs, openh264.rs). This means it misses the shared infrastructure in encoder_trait.rs including: end-of-stream flush, flush-on-dimension-change, RGBA8 rejection, and the FrameBudgetMonitor for detecting real-time encoding lag. The missing flush is reported as a bug; the missing budget monitor is an additional gap that could hide performance issues in production.
Acknowledged — this is a valid finding. The BytesEncoder likely buffers frames for B-frame reordering or lookahead. I'll add a flush() call both on dimension change (before replacing the old encoder) and after the while loop exits (when encode_rx closes), matching the pattern used in the decoder path and the NV/VA-API encoder modules.
```rust
let mut coded_width: u32 = 0;
let mut coded_height: u32 = 0;

while let Some((data, metadata)) = decode_rx.blocking_recv() {
    if result_tx.is_closed() {
        return;
    }

    let decode_start = Instant::now();
    let timestamp = metadata.as_ref().and_then(|m| m.timestamp_us).unwrap_or(0);

    // Feed bitstream to the decoder. The decoder may process it in
    // multiple chunks and may require event handling between calls.
    let mut offset = 0usize;
    let bitstream = data.as_ref();
    let mut eagain_empty_retries: u32 = 0;

    while offset < bitstream.len() {
        let gbm_ref = Arc::clone(&gbm);
        let cw = coded_width;
        let ch = coded_height;
        let mut alloc_cb = move || {
            gbm_ref
                .clone()
                .new_frame(
                    nv12_fourcc(),
                    CrosResolution { width: cw, height: ch },
                    CrosResolution { width: cw, height: ch },
                    GbmUsage::Decode,
                )
                .ok()
        };
```
📝 Info: VA-API decoder initial coded dimensions are 0×0 until FormatChanged fires
In the VA-API decode loop (vaapi_av1.rs:537-538), coded_width and coded_height start at 0. The GBM allocation callback at line 559-568 captures these values. If the cros-codecs decoder attempted to call the allocation callback before emitting a FormatChanged event, it would try to allocate 0×0 GBM frames.
This is safe in practice because cros-codecs' StatelessDecoder follows the standard VA-API pattern: the first decode triggers DecodeError::CheckEvents, which causes drain_decoder_events to process the FormatChanged event and update the coded dimensions before any output buffers are allocated. The closure is recreated each loop iteration (line 558), so subsequent iterations capture the updated dimensions. Confirmed this is a standard decoder initialization pattern, not a bug.
Acknowledged. The alloc_cb closure does return Ok(None) via .ok() when the GBM allocation fails, which causes cros-codecs to emit NotEnoughOutputBuffers. The decoder loop handles this by retrying after yielding. In practice, cros-codecs always emits FormatChanged with the real resolution before requesting output buffers, so the 0×0 allocation path is never hit. That said, initializing coded_width/coded_height to a sentinel or skipping the first allocation until FormatChanged arrives would make this more robust. Will address in a follow-up if the maintainer agrees.
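The capture-per-iteration pattern discussed here — a `move` closure snapshots the coded dimensions at creation time, so it must be rebuilt after an event updates them — can be reduced to a small self-contained illustration. All names below are illustrative; this is not the cros-codecs API.

```rust
// Simulate the decode loop: dimensions start at 0×0 (like the VA-API decoder),
// an event mid-stream updates them (standing in for FormatChanged), and the
// allocation closure is recreated each iteration so it captures current values.
fn alloc_sizes_seen() -> Vec<(u32, u32)> {
    let mut coded = (0u32, 0u32);
    let mut seen = Vec::new();
    for event in [None, Some((1920u32, 1080u32)), None] {
        // Stand-in for drain_decoder_events updating the coded dimensions.
        if let Some(new_dims) = event {
            coded = new_dims;
        }
        // Recreate the closure each iteration; a closure built once before the
        // loop would keep the stale 0×0 snapshot forever.
        let (cw, ch) = coded;
        let alloc_cb = move || (cw, ch);
        seen.push(alloc_cb());
    }
    seen
}
```

Only the first iteration would see 0×0, which is why the real code is safe as long as FormatChanged fires before output buffers are requested.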
Remove conditional nvidia-cuda-toolkit install (already pre-installed on the self-hosted runner) and add BINDGEN_EXTRA_CLANG_ARGS to point bindgen at the LLVM 18 clang builtin includes so stddef.h is found. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
The streamkit-engine GPU test binary segfaults (SIGSEGV) during cleanup after all 25 tests pass — this is a pre-existing issue likely related to wgpu/Vulkan teardown. Move the nvcodec node tests before the engine GPU tests so they are not blocked by the crash. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
The force_cpu_encoder_rejected test was constructing NvAv1EncoderConfig with all fields explicitly but missed the new framerate field added in the review-fix round. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…ove dead code

- Add cfg-gated registration calls for vulkan_video, vaapi, and nvcodec nodes in register_video_nodes() — without these, users enabling the features would get 'node not found' errors at runtime.
- Fix i420_to_nv12 in vulkan_video.rs to use div_ceil(2) for chroma dimensions instead of truncating integer division (h/2, w/2), matching the correct implementation in nv_av1.rs.
- Update HwAccelMode::Auto doc comment to accurately reflect that HW-only nodes do not implement CPU fallback — Auto and ForceHw behave identically; CPU fallback is achieved by selecting a different (software) node at the pipeline level.
- Remove dead default_quality() and default_framerate() functions in vaapi_av1.rs (unused — the struct uses a manual Default impl).
- Add registration regression tests to nv_av1 and vaapi_av1 modules.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
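The div_ceil fix for chroma dimensions can be illustrated with a standalone I420→NV12 conversion. This is a sketch of the corrected arithmetic only, assuming tightly packed planes with no stride padding (the actual helpers in the PR are stride-aware):

```rust
/// Interleave I420 U/V planes into an NV12 UV plane, using ceiling division
/// for the chroma plane dimensions so odd-sized frames keep their last chroma
/// sample. Sketch only: assumes tightly packed planes, unlike the real helpers.
fn i420_to_nv12(w: usize, h: usize, i420: &[u8]) -> Vec<u8> {
    // div_ceil(2), NOT w / 2 and h / 2 — truncation drops the last chroma
    // column/row on odd-dimension frames, which was the bug being fixed.
    let (cw, ch) = (w.div_ceil(2), h.div_ceil(2));
    let y_len = w * h;
    let c_len = cw * ch;
    let (y, rest) = i420.split_at(y_len);
    let (u, v) = rest.split_at(c_len);

    let mut nv12 = Vec::with_capacity(y_len + 2 * c_len);
    nv12.extend_from_slice(y);
    for i in 0..c_len {
        nv12.push(u[i]); // interleave U and V into a single UV plane
        nv12.push(v[i]);
    }
    nv12
}
```

For a 3×3 frame, the chroma planes are 2×2 (not 1×1 as truncation would give), so the NV12 buffer is 9 luma bytes plus 8 interleaved chroma bytes.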
```rust
    }
});
```
🔴 Vulkan Video H.264 encoder never flushes — loses trailing frames at end-of-stream
The blocking encode task in VulkanVideoH264EncoderNode exits the while let Some(...) = encode_rx.blocking_recv() loop (line 637) without flushing the encoder. H.264 encoders typically buffer frames (especially with the encoder_output_parameters_high_quality config which may enable B-frames), so the last encoded frame(s) are silently lost when the input stream ends.
The decoder in the same file correctly calls decoder.flush() at crates/nodes/src/video/vulkan_video.rs:245. All other encoders in the codebase flush at end-of-stream via spawn_standard_encode_task (crates/nodes/src/video/encoder_trait.rs:436-451). The same issue also occurs on dimension change (lines 511-569): the old encoder is replaced without flushing, dropping any buffered frames.
Missing flush after encode loop (compare with decoder flush at line 245)
After line 637 (end of while loop), there should be flush logic similar to:
```rust
if let Some(enc) = encoder.as_mut() {
    if !result_tx.is_closed() {
        // flush remaining frames from enc
    }
}
```

Prompt for agents
The Vulkan Video H.264 encoder's blocking encode task (spawn_blocking closure starting at line 496) never flushes the encoder. When the encode_rx channel closes (line 637, end of while loop), the encoder is simply dropped.
Two fixes are needed:
1. End-of-stream flush: After the while loop exits (line 637), add flush logic. The BytesDecoder in the same file uses decoder.flush() at line 245 — the BytesEncoder likely has an analogous method. Flush remaining frames and send them through result_tx, matching the pattern in encoder_trait.rs lines 436-451.
2. Dimension-change flush: Before replacing the encoder on dimension change (line 567-568), flush the old encoder first. The other HW encoders (NvAv1Encoder, VaapiAv1Encoder) handle this via StandardVideoEncoder::flush_on_dimension_change() -> true in spawn_standard_encode_task. Since VulkanVideoH264EncoderNode uses a custom encode task, this must be done manually.
Consider refactoring VulkanVideoH264EncoderNode to use the StandardVideoEncoder / EncoderNodeRunner trait pattern like the NV AV1 and VA-API AV1 encoders do, which would get both flush behaviors for free.
Valid finding. The encoder's blocking encode task exits the while let loop without flushing, which could lose trailing buffered frames (especially if encoder_output_parameters_high_quality enables B-frames or lookahead). The decoder in the same file correctly calls decoder.flush() at line 245.
Two fixes needed:

- End-of-stream flush: after the `while` loop (line 637), call the encoder's flush method and forward any remaining frames.
- Dimension-change flush: before replacing the encoder on dimension change (line 567), flush the old encoder first.
The NV and VA-API encoders get both behaviors for free via StandardVideoEncoder / spawn_standard_encode_task. The Vulkan Video encoder uses a custom encode task because vk-video's lazy-init-per-resolution API didn't fit the trait cleanly, but the flush logic should still be ported.
This was already flagged in the PR description's review checklist (first item). Will fix if requested.
…plane offsets

- vulkan_video.rs: document that vk-video 0.3.0 BytesEncoder has no flush() method (unlike BytesDecoder); frame-at-a-time, no B-frames
- nv_av1.rs: reject cuda_device > i32::MAX at construction time instead of silently wrapping via 'as i32' cast
- vaapi_av1.rs: use gbm_frame.get_plane_offset() for FrameLayout instead of manually computing y_stride * coded_height; also fix stride fallback to use coded_width instead of display width

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…reamkit-nodes Without these forwarding features, `just extra_features="--features vulkan_video" skit` would silently ignore the feature since streamkit-server didn't know about it. Adds vulkan_video, vaapi, and nvcodec feature forwarding, matching the existing pattern for svt_av1 and dav1d. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Add oneshot and dynamic (MoQ) sample pipelines for each HW video codec backend:

- Vulkan Video H.264: video_vulkan_video_h264_colorbars (oneshot + MoQ)
- VA-API AV1: video_vaapi_av1_colorbars (oneshot + MoQ)
- NVENC AV1: video_nv_av1_colorbars (oneshot + MoQ)

Each oneshot pipeline generates SMPTE color bars, HW-encodes, muxes into a container (MP4 for H.264, WebM for AV1), and outputs via HTTP. Each dynamic pipeline generates color bars, HW-encodes, and streams via MoQ for live playback in the browser.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
get_plane_offset() is private in cros-codecs 0.0.6. Fall back to computing the UV plane offset from pitch × coded_height, which is correct for linear NV12 allocations used by VA-API encode surfaces. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Add vaapi_h264 module with VaapiH264EncoderNode and VaapiH264DecoderNode using cros-codecs StatelessEncoder/StatelessDecoder for H.264 via VA-API.

- Encoder: CQP rate control, Main profile, macroblock-aligned coding
- Decoder: stateless H.264 decode with format-change handling
- Reuses shared helpers from vaapi_av1 (GBM/NV12 I/O, device detection)
- Registration: video::vaapi::h264_encoder, video::vaapi::h264_decoder
- Sample pipelines: oneshot MP4 + dynamic MoQ for VA-API H.264

Supported on Intel (Sandy Bridge+), AMD, and NVIDIA (decode only).

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Modern Intel GPUs (Gen 9+ / Skylake onwards) only expose the low-power fixed-function encoder (VAEntrypointEncSliceLP), not the full encoder (VAEntrypointEncSlice). Query the driver for supported entrypoints and auto-select the correct one instead of hardcoding low_power=false. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
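The entrypoint auto-selection described in this commit can be sketched as pure selection logic. The enum below is a stand-in for the libva binding's type (only the VA-API constant names are real), and the preference order — low-power first, full encoder as fallback — follows this PR's summary; the query itself goes through the VA-API driver and is not shown.

```rust
/// VA-API entrypoints relevant to H.264 encode. Variant names mirror the
/// VAEntrypointEncSlice / VAEntrypointEncSliceLP constants; the enum itself
/// is a stand-in, not the libva binding's type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Entrypoint {
    EncSlice,   // full-featured encoder
    EncSliceLp, // low-power fixed-function encoder (Gen 9+ Intel)
}

/// Pick the encode entrypoint from the driver-reported list instead of
/// hardcoding low_power = false. Prefers the low-power path (often the only
/// one exposed on modern Intel GPUs), falling back to the full encoder.
fn select_encode_entrypoint(supported: &[Entrypoint]) -> Option<Entrypoint> {
    if supported.contains(&Entrypoint::EncSliceLp) {
        Some(Entrypoint::EncSliceLp)
    } else if supported.contains(&Entrypoint::EncSlice) {
        Some(Entrypoint::EncSlice)
    } else {
        None // no H.264 encode support exposed by this driver
    }
}
```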
Replace GBM-backed frame allocation with direct VA surface creation and Image API uploads for both H.264 and AV1 VA-API encoders.

The cros-codecs GBM allocator uses GBM_BO_USE_HW_VIDEO_ENCODER, a flag that Mesa's iris driver does not support for NV12 on some hardware (e.g. Intel Tiger Lake with Mesa 23.x), causing 'Error allocating contiguous buffer' failures. By using libva Surface<()> handles instead:

- Surfaces are created via vaCreateSurfaces (no GBM needed)
- NV12 data is uploaded via the VA Image API (vaCreateImage + vaPutImage)
- The encoder's import_picture passthrough accepts Surface<()> directly
- Pitches/offsets come from the VA driver's VAImage, not GBM

This also adds two new shared helpers in vaapi_av1.rs:

- open_va_display(): opens VA display without GBM device
- write_nv12_to_va_surface(): uploads NV12/I420 frame data to a VA surface using the Image API, returning driver pitches/offsets

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…upload

write_nv12_to_va_surface used truncating integer division (w / 2, h / 2) for chroma plane dimensions, which would corrupt chroma data for frames with odd width or height. VideoLayout::packed uses (width + 1) / 2 for chroma dimensions, so the upload function must match.

Changes:

- NV12 path: use (h+1)/2 for uv_h, ((w+1)/2)*2 for chroma row bytes
- I420 path: use (w+1)/2 for uv_w, (h+1)/2 for uv_h

This matches the existing write_nv12_to_mapping (which uses div_ceil) and i420_to_nv12_buffer in nv_av1.rs.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
```rust
let (pitches, offsets) = write_nv12_to_va_surface(&self.display, &surface, frame)?;

let is_keyframe = metadata.as_ref().and_then(|m| m.keyframe).unwrap_or(false);
let timestamp = metadata.as_ref().and_then(|m| m.timestamp_us).unwrap_or(self.frame_count);
```
🚩 VA-API encoder uses frame_count as timestamp fallback, not PTS
In VaapiAv1Encoder::encode at vaapi_av1.rs:1071 (and identically in vaapi_h264.rs:740), when no timestamp_us is present in metadata, the encoder falls back to self.frame_count as the timestamp passed to cros-codecs. This means the encoder passes a simple incrementing counter (0, 1, 2, ...) rather than a microsecond-scale timestamp. Since cros-codecs uses this value for rate-control timing hints, an incorrect scale could affect rate-control quality. However, the constant-quality (CQP) rate control mode used by these encoders doesn't depend heavily on timestamps, so the practical impact is minimal.
Acknowledged — this is intentional. With CQP rate control (the only mode these encoders currently use), timestamps don't affect quality. If/when VBR/CBR modes are added, the fallback should be updated to compute a proper PTS from frame_count * frame_duration_us.
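The suggested future fallback — deriving a microsecond PTS from the frame counter and the configured framerate — amounts to the following. This is a sketch of the suggestion only; the function name is illustrative and the current code passes `frame_count` through unchanged under CQP.

```rust
/// Derive a microsecond-scale PTS from a frame index and framerate — a sketch
/// of the proposed VBR/CBR-era fallback, not the PR's current behaviour.
fn fallback_pts_us(frame_count: u64, framerate: u32) -> u64 {
    // Scale before dividing so rounding error does not accumulate per frame;
    // guard against a zero framerate from a misconfigured pipeline.
    frame_count * 1_000_000 / u64::from(framerate.max(1))
}
```

At 30 fps this yields 0, 33_333, 66_666, … microseconds — a timescale cros-codecs could meaningfully use for rate-control timing hints, unlike the bare counter 0, 1, 2, ….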
For odd-width frames, chroma_row_bytes (e.g. 642 for w=641) is the correct number of bytes per UV row in VideoLayout::packed format. Clamping to .min(w) would drop the last V sample on every UV row. Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
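The row-byte arithmetic this commit defends can be written out explicitly. The sketch below assumes a linear NV12 allocation where the UV plane starts at pitch × coded_height (as the earlier commit message states for VA-API encode surfaces); the struct and function are illustrative, and the real code takes pitches/offsets from the driver where available.

```rust
/// Plane geometry for a linear NV12 surface, using ceiling division so odd
/// frame dimensions keep their last chroma sample (matching VideoLayout::packed).
/// Illustrative sketch of the arithmetic only.
struct Nv12Layout {
    uv_offset: usize,        // start of the interleaved UV plane
    uv_rows: usize,          // number of UV rows: (h + 1) / 2
    chroma_row_bytes: usize, // bytes per UV row: ((w + 1) / 2) * 2
}

fn nv12_layout(width: u32, height: u32, pitch: usize, coded_height: u32) -> Nv12Layout {
    Nv12Layout {
        // For a linear allocation the UV plane follows the full Y plane.
        uv_offset: pitch * coded_height as usize,
        uv_rows: (height as usize).div_ceil(2),
        // Clamping this to width would drop the last V byte on odd widths:
        // for w = 641 the correct value is 642, not 641.
        chroma_row_bytes: (width as usize).div_ceil(2) * 2,
    }
}
```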
Signed-off-by: StreamKit Devin <devin@streamkit.dev> Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Summary
Add four hardware video codec backends behind feature gates:
| Feature | Encoder | Decoder |
| --- | --- | --- |
| vaapi | VaapiAv1EncoderNode | VaapiAv1DecoderNode |
| vaapi | VaapiH264EncoderNode | VaapiH264DecoderNode |
| vulkan_video | VulkanVideoH264EncoderNode | VulkanVideoH264DecoderNode |
| nvcodec | NvAv1EncoderNode | NvAv1DecoderNode |

Key implementation details
- The VA-API encoders use `libva::Surface<()>` with the VA Image API (`vaCreateImage` + `vaPutImage`) to upload NV12 data directly, avoiding the `GBM_BO_USE_HW_VIDEO_ENCODER` flag that Mesa's iris driver doesn't support for NV12 on Intel Tiger Lake (and possibly other hardware). The decoder path still uses GBM for output frames.
- The H.264 encoder queries the driver and prefers `VAEntrypointEncSliceLP` (low-power fixed-function encoder on modern Intel GPUs) before falling back to `VAEntrypointEncSlice`.
- Shared helpers in vaapi_av1.rs: `open_va_display()`, `write_nv12_to_va_surface()`, `open_va_and_gbm()`, `read_nv12_from_mapping()`, `write_nv12_to_mapping()` — reused by both VA-API encoder modules.
- All encoders except Vulkan Video use the `EncoderNodeRunner` / `StandardVideoEncoder` trait pattern.
Uses ceiling division `(w+1)/2` for chroma plane dimensions, matching `VideoLayout::packed` and `i420_to_nv12_buffer`. This correctly handles odd-dimension frames without truncating the last chroma sample.

Review & Testing Checklist for Human
- Run the sample pipelines on VA-API hardware, e.g. `curl -X POST http://localhost:4545/api/v1/process -F pipeline=@samples/pipelines/oneshot/video_vaapi_av1_colorbars.yml > output.mp4`
- Review `write_nv12_to_va_surface` — the core data upload path. Verify bounds checks are correct for both NV12 and I420 input formats.
- Review the entrypoint selection in `vaapi_h264.rs` — ensure fallback from `EncSliceLP` to `EncSlice` is correct.
- The decoder path still uses GBM (`GbmVideoFrame`) for output since `GBM_BO_USE_DECODE` is widely supported. Only the encoder path needed the `Surface<()>` bypass.
- `frame_count` is used as timestamp fallback when `timestamp_us` is absent — acceptable for CQP rate control but should be revisited if VBR/CBR modes are added.

Link to Devin session: https://staging.itsdev.in/sessions/a6cecab926d64a46ab31002c843a5552
Requested by: @streamer45