[Discussion] Video Return Format and Transmission Overhead in Inference Performance #289

@wu6u3tw

Motivation

Background

As we add video generation workloads to the endpoints framework, we need to decide how video transmission between client and server should be handled, and whether the transmission/compression cost should be included in the inference performance measurement.
Metric: throughput
Single stream: videos per second
Output size: ~220 MB per video as a raw bytes tensor (or 5-10 MB per video if compressed to MP4)
Concurrency = 1 (GB200x4) today; this needs to scale up to GB200/GB300x72
Output is a single video blob; the model does not emit output frame by frame.
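To get a feel for how much the payload size matters, here is a rough back-of-the-envelope sketch of the ideal transfer time per video. The sizes come from the numbers above; the link speeds are purely hypothetical placeholders (the actual client-server link is not specified in this issue), MB is taken as 10^6 bytes, and protocol overhead is ignored.

```python
# Ideal per-video transfer time for raw vs. MP4 output.
# ASSUMPTION: link speeds below are illustrative only, not measured values.

RAW_MB = 220.0              # raw bytes tensor size, from this issue
MP4_MB_RANGE = (5.0, 10.0)  # compressed MP4 size range, from this issue
ASSUMED_LINKS_GBPS = (1.0, 10.0, 100.0)  # hypothetical link speeds


def transfer_seconds(size_mb: float, link_gbps: float) -> float:
    """Ideal time to move size_mb megabytes (10^6 bytes) over a link_gbps link."""
    size_bits = size_mb * 1e6 * 8
    return size_bits / (link_gbps * 1e9)


def overhead_table():
    """Return (link_gbps, raw_s, mp4_low_s, mp4_high_s) for each assumed link."""
    return [
        (
            gbps,
            transfer_seconds(RAW_MB, gbps),
            transfer_seconds(MP4_MB_RANGE[0], gbps),
            transfer_seconds(MP4_MB_RANGE[1], gbps),
        )
        for gbps in ASSUMED_LINKS_GBPS
    ]


if __name__ == "__main__":
    for gbps, raw_s, mp4_lo, mp4_hi in overhead_table():
        print(f"{gbps:6.1f} Gbps: raw {raw_s:.3f} s, mp4 {mp4_lo:.4f}-{mp4_hi:.4f} s")
```

Whether these times are significant depends on the per-video generation time, which would need to be measured on the target system.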

Problem Statement

The endpoints framework requires video transmission between client and server.
This raises several design questions:

  • Should transmission be counted in inference perf? That is, do we measure click-to-download/play, or pass the video directly?

248 videos for Accuracy Mode
50 videos for Performance Mode (but we currently collect perf with 100 videos)

Key Questions

  1. Does video transmission count as inference performance?
    - Option A: Measure only the model inference time; transmission/compression is out-of-scope
    - Option B: Include transmission in the latency/throughput measurement (click-to-download, or click-to-play with video streaming)
    - Related: should the response carry only a path/hash, or the video blob itself?
  2. What is the API response-complete signal?
    - When is a request considered "done" — when the model finishes generating, or when the encoded video is available for download?
  3. Does MP4 compression count in inference perf?
    - MP4 is required for VBench accuracy scoring
    - Compression could be folded into the accuracy phase (download → compress → score), keeping it out of the performance-phase critical path
  4. Hardware path for encoding:
    - Is there a GPU-accelerated path for encoding/decoding (e.g., on B200)?
    - Could compression be offloaded to a separate hardware unit?
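One way to keep Options A and B comparable is to record both completion points per request and derive either metric afterwards. The sketch below is only an illustration under stated assumptions: `RequestTiming`, `single_stream_throughput`, and `video_reference` are hypothetical names, not part of the endpoints framework, and it assumes all three timestamps are on the same clock (in practice the server would likely have to report generation time as a duration).

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestTiming:
    """Hypothetical per-request timestamps, in seconds on a shared clock."""
    t_sent: float       # client issues the request
    t_generated: float  # model finishes generating (Option A "done")
    t_received: float   # video fully available to the client (Option B "done")

    @property
    def inference_only(self) -> float:
        """Option A: model inference time only."""
        return self.t_generated - self.t_sent

    @property
    def end_to_end(self) -> float:
        """Option B: includes encoding/transmission after generation."""
        return self.t_received - self.t_sent

    @property
    def transmission_overhead(self) -> float:
        """Time spent after generation until the client has the video."""
        return self.t_received - self.t_generated


def single_stream_throughput(timings, include_transmission: bool) -> float:
    """Videos per second at concurrency 1: count divided by summed latency."""
    total = sum(
        t.end_to_end if include_transmission else t.inference_only
        for t in timings
    )
    return len(timings) / total


def video_reference(payload: bytes, path: str) -> dict:
    """Path/hash-only response body that stands in for the video blob."""
    return {
        "path": path,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "size_bytes": len(payload),
    }
```

For example, two requests that each take 10 s to generate and 2 s more to deliver give 0.1 videos/s under Option A and about 0.083 videos/s under Option B. Returning a `video_reference` instead of the blob would let the accuracy phase fetch and verify the file later without putting the transfer on the performance-phase critical path.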

Proposed Solution

See the options listed under Key Questions above.

Alternatives Considered

No response

Additional Context

No response
