libpgs NDJSON Streaming & Encoding Reference

Overview

The libpgs stream command extracts PGS (Presentation Graphic Stream) subtitles from MKV and M2TS containers and outputs structured data as newline-delimited JSON (NDJSON) to stdout. The libpgs encode command reads the same format from stdin and writes .sup files. Together they enable full round-trip workflows: extract, transform in any language, and write back.

Each NDJSON line is a self-contained JSON object. This enables any language to consume and produce PGS data via subprocess pipes — no temp files, no waiting for full extraction, no PGS format knowledge required.

Usage

libpgs stream <file>                                 # All tracks
libpgs stream <file> -t 3                            # Single track
libpgs stream <file> -t 3 -t 5                       # Multiple tracks
libpgs stream <file> --raw-payloads                  # Include base64 raw segment bytes
libpgs stream <file> --start 0:05:00                 # From 5 minutes to end of file
libpgs stream <file> --start 0:05:00 --end 0:10:00   # 5-minute window only
libpgs stream <file> --with-header                   # Prepend manifest header (.sup only)

Timestamps accept HH:MM:SS.ms, MM:SS.ms, SS.ms, or plain seconds (e.g., 300). When --start or --end is specified, libpgs seeks directly to the estimated byte offset — data before the start point is not read. If no display sets fall within the range, the stream outputs the tracks header followed by EOF (no error).
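
Callers that build --start and --end values programmatically may want to normalize the accepted forms to seconds first. The helper below is a client-side sketch, not libpgs's own parser: parse_timestamp is a hypothetical name, and it assumes colon-separated fields with an optional fractional part.

```python
def parse_timestamp(text: str) -> float:
    """Convert 'HH:MM:SS.ms', 'MM:SS.ms', 'SS.ms', or plain seconds to seconds."""
    parts = text.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported timestamp: {text!r}")
    seconds = 0.0
    for part in parts:
        # Each colon shifts the running total up by one sexagesimal place.
        seconds = seconds * 60 + float(part)
    return seconds


print(parse_timestamp("0:05:00"))  # same instant as the plain-seconds form "300"
```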

Output is flushed after every line. Closing the pipe (e.g., head -n 10) causes a clean exit.

Protocol

The output consists of up to three types of JSON lines:

header: optional, emitted only for .sup inputs when --with-header is passed. When present, it is the very first line and carries total display-set counts so consumers can show a progress denominator immediately.
tracks: always emitted. It is the first line unless a header line precedes it, in which case it is the second.
display_set: one per subtitle event, for the remainder of the stream.

Check the "type" field to distinguish them.
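
A consumer can dispatch on that field one line at a time. The following is a minimal sketch. iter_messages is a hypothetical helper name, and the sample lines are abbreviated stand-ins for real libpgs output.

```python
import json


def iter_messages(lines):
    """Yield (type, message) pairs from an iterable of NDJSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        msg = json.loads(line)
        yield msg.get("type"), msg


# Abbreviated sample input; in practice pass sys.stdin or a subprocess pipe.
sample = [
    '{"type": "tracks", "tracks": [{"track_id": 3}]}',
    '{"type": "display_set", "track_id": 3, "index": 0}',
]

for kind, msg in iter_messages(sample):
    if kind == "header":
        print("total display sets:", msg["total_display_sets"])
    elif kind == "tracks":
        print("tracks:", [t["track_id"] for t in msg["tracks"]])
    elif kind == "display_set":
        print("display set", msg["index"], "on track", msg["track_id"])
```

Because each line is parsed independently, the loop works equally well on sys.stdin when libpgs stream is piped into the script.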

Manifest Header (`.sup` only, opt-in)

When --with-header is passed on a .sup input, libpgs runs a pre-scan of the file and prepends a single header line with total display-set counts. The flag is opt-in because the pre-scan adds an upfront latency before the first display_set line is emitted; consumers that don't need a progress denominator should omit the flag. Containers (MKV, M2TS) ignore the flag — counting there would require a full demux, and MKV already surfaces per-track display_set_count via the tracks line when Tags are present.

{
  "type": "header",
  "total_display_sets": 1823,
  "total_content_display_sets": 1456,
  "total_clear_display_sets": 367
}

Field	Type	Description
`total_display_sets`	`number`	All display sets (count of END segments).
`total_content_display_sets`	`number`	PCSes with at least one composition object — visible subtitle frames.
`total_clear_display_sets`	`number`	PCSes with zero composition objects — "remove from screen" display sets.

total_content_display_sets + total_clear_display_sets == total_display_sets.

The pre-scan reads only 13-byte segment headers (and tiny PCS payloads) while seeking over other payloads — ~1–2% of file bytes, completing in well under a second on multi-GB files.
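
One way a consumer might use the header as a progress denominator is sketched below. progress_fraction is a hypothetical helper. It assumes the whole file is streamed (no --start or --end), since this reference does not say how the totals relate to a time-limited window.

```python
def progress_fraction(header, seen):
    """Return the completed fraction in [0, 1], or None when no header line was emitted."""
    if not header:
        return None
    total = header["total_display_sets"]
    parts = header["total_content_display_sets"] + header["total_clear_display_sets"]
    if parts != total:
        # The two sub-counts are documented to sum to the total.
        raise ValueError("header counts are inconsistent")
    return 1.0 if total == 0 else min(1.0, seen / total)


header = {
    "type": "header",
    "total_display_sets": 1823,
    "total_content_display_sets": 1456,
    "total_clear_display_sets": 367,
}
print(f"{progress_fraction(header, 912):.1%}")
```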

Track Discovery

The first line (or the second, after the header on .sup inputs) describes all PGS tracks found in the container.

{
  "type": "tracks",
  "tracks": [
    {
      "track_id": 3,
      "language": "en",
      "container": "Matroska",
      "name": "English Subtitles",
      "is_default": true,
      "is_forced": false,
      "display_set_count": 1234,
      "indexed": true
    }
  ]
}

Track fields

Field	Type	Description
`track_id`	`number`	Unique track identifier within the container
`language`	`string \| null`	BCP 47 language code (e.g., `"en"`, `"ja"`). Uses ISO 639-1 (2-letter) where available, ISO 639-2/T (3-letter) otherwise.
`container`	`string`	Source format: `"Matroska"`, `"M2TS"`, `"TransportStream"`, or `"SUP"`
`name`	`string \| null`	Track name from container metadata (MKV TrackName). `null` for M2TS.
`is_default`	`boolean \| null`	Whether this track is flagged as default. `null` for M2TS.
`is_forced`	`boolean \| null`	Whether this track is flagged as forced. `null` for M2TS.
`display_set_count`	`number \| null`	Expected number of display sets (from MKV Tags). `null` if unknown.
`indexed`	`boolean \| null`	Whether the container has a seek index for this track, enabling fast random access. `null` for M2TS.
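
Since several of these fields can be null, track selection logic should not assume booleans are present. The sketch below shows one possible policy (language match first, then the default flag, then the first track). pick_track is a hypothetical helper, and the policy itself is an assumption rather than libpgs behavior.

```python
def pick_track(tracks, language=None):
    """Return a track_id, or None when no track matches the requested language."""
    candidates = [t for t in tracks if language is None or t.get("language") == language]
    if not candidates:
        return None
    for track in candidates:
        # is_default may be None (e.g. M2TS); None is treated as "not default".
        if track.get("is_default"):
            return track["track_id"]
    return candidates[0]["track_id"]


tracks = [
    {"track_id": 3, "language": "en", "is_default": True},
    {"track_id": 5, "language": "ja", "is_default": False},
    {"track_id": 4608, "language": "en", "is_default": None},
]
print(pick_track(tracks, "ja"))
```

The chosen ID can then be passed to libpgs stream with -t.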

Display Sets

Each subsequent line represents one display set — a complete subtitle composition event.

PGS background

A PGS display set defines a single screen update. It contains:

A composition that describes what to show and where (screen dimensions, object placements)
Windows — rectangular screen regions where objects are drawn
Palettes — color lookup tables (YCrCbA format, up to 256 entries)
Objects — RLE-compressed bitmap images

Display sets appear in three states:

epoch_start — A completely new display. Contains everything needed to render from scratch.
acquisition_point — A refresh point. Contains full replacement data for all objects. Used for mid-stream joining (e.g., seeking into a video).
normal — An incremental update. Only contains what changed since the last composition. Commonly used to clear the screen (0 composition objects).

Full example

{
  "type": "display_set",
  "track_id": 3,
  "index": 42,
  "pts": 92863980,
  "pts_ms": 1031822.0,
  "composition": {
    "number": 430,
    "state": "epoch_start",
    "video_width": 1920,
    "video_height": 1080,
    "palette_only": false,
    "palette_id": 0,
    "objects": [
      {
        "object_id": 0,
        "window_id": 0,
        "x": 773,
        "y": 108,
        "crop": null
      },
      {
        "object_id": 1,
        "window_id": 1,
        "x": 739,
        "y": 928,
        "crop": null
      }
    ]
  },
  "windows": [
    { "id": 0, "x": 773, "y": 108, "width": 377, "height": 43 },
    { "id": 1, "x": 739, "y": 928, "width": 472, "height": 43 }
  ],
  "palettes": [
    {
      "id": 0,
      "version": 0,
      "entries": [
        { "id": 0, "luminance": 16, "cr": 128, "cb": 128, "alpha": 0 },
        { "id": 1, "luminance": 235, "cr": 128, "cb": 128, "alpha": 255 },
        { "id": 2, "luminance": 16, "cr": 128, "cb": 128, "alpha": 255 }
      ]
    }
  ],
  "objects": [
    {
      "id": 0,
      "version": 0,
      "sequence": "complete",
      "data_length": 8635,
      "width": 377,
      "height": 43,
      "bitmap": "<base64 palette indices, 377*43 = 16211 bytes>"
    },
    {
      "id": 1,
      "version": 0,
      "sequence": "complete",
      "data_length": 5210,
      "width": 472,
      "height": 43,
      "bitmap": "<base64 palette indices, 472*43 = 20296 bytes>"
    }
  ]
}

Display set fields

Field	Type	Description
`type`	`string`	Always `"display_set"`
`track_id`	`number`	Matches a `track_id` from the tracks header
`index`	`number`	0-based sequence number, counted per track
`pts`	`number`	Presentation timestamp in 90 kHz ticks
`pts_ms`	`number`	Presentation timestamp in milliseconds (`pts / 90`)
`composition`	`object \| null`	Composition data (from PCS segment). `null` if payload was malformed.
`windows`	`array`	Window definitions (from WDS segments). Empty array if none present.
`palettes`	`array`	Palette definitions (from PDS segments). Empty array if none present.
`objects`	`array`	Object definitions (from ODS segments). Empty array if none present.
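
A common use of these fields is deriving on-screen intervals. The sketch below assumes that a composition stays visible until the next display set on the same track, whether that is a clear or a replacement. That is a simplification (for example, a palette-only update would split a cue), and cue_intervals is a hypothetical helper.

```python
def cue_intervals(display_sets):
    """Yield (track_id, start_ms, end_ms) for each composition with visible objects."""
    open_cues = {}  # track_id -> pts_ms of the composition currently on screen
    for ds in display_sets:
        comp = ds.get("composition")
        if comp is None:
            continue  # malformed PCS payload; nothing to time
        track = ds["track_id"]
        start = open_cues.pop(track, None)
        if start is not None:
            yield track, start, ds["pts_ms"]
        if comp["objects"]:
            open_cues[track] = ds["pts_ms"]
    # A cue still open at end of input has no known end time and is not yielded.


sample_sets = [
    {"track_id": 3, "pts_ms": 1000.0, "composition": {"objects": [{"object_id": 0}]}},
    {"track_id": 3, "pts_ms": 3000.0, "composition": {"objects": []}},
    {"track_id": 3, "pts_ms": 5000.0, "composition": {"objects": [{"object_id": 0}]}},
    {"track_id": 3, "pts_ms": 6000.0, "composition": {"objects": [{"object_id": 1}]}},
]
for track, start, end in cue_intervals(sample_sets):
    print(track, start, end)
```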

Composition object

The composition field contains the presentation composition — the "control plane" of the display set.

Field	Type	Description
`number`	`number`	Composition number, incremented per graphics update
`state`	`string`	`"epoch_start"`, `"acquisition_point"`, or `"normal"`
`video_width`	`number`	Video frame width in pixels (e.g., 1920)
`video_height`	`number`	Video frame height in pixels (e.g., 1080)
`palette_only`	`boolean`	If `true`, this update only changes the palette — no new objects or positions
`palette_id`	`number`	ID of the palette used for this composition
`objects`	`array`	Placement instructions — where to draw each object on screen

Composition object placements

Each entry in composition.objects is a placement instruction: "draw object X in window Y at position (x, y)."

Field	Type	Description
`object_id`	`number`	References an object in the top-level `objects` array by `id`
`window_id`	`number`	References a window in the `windows` array by `id`
`x`	`number`	Horizontal pixel offset from the top-left corner of the screen
`y`	`number`	Vertical pixel offset from the top-left corner of the screen
`crop`	`object \| null`	Cropping rectangle, or `null` if not cropped

Crop object (when present)

Field	Type	Description
`x`	`number`	Horizontal crop offset within the object
`y`	`number`	Vertical crop offset within the object
`width`	`number`	Crop width in pixels
`height`	`number`	Crop height in pixels

Cropping is used for progressive subtitle reveal (e.g., showing a few words first, then the rest).
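
Applying a crop to an already decoded bitmap amounts to slicing rows and columns. The sketch below assumes the bitmap has been split into one bytes object per row and that crop offsets are relative to the object's top-left corner. crop_rows is a hypothetical helper.

```python
def crop_rows(rows, crop):
    """Return the sub-rectangle of `rows` selected by `crop`, or `rows` unchanged if crop is None."""
    if crop is None:
        return rows
    x, y = crop["x"], crop["y"]
    w, h = crop["width"], crop["height"]
    return [row[x:x + w] for row in rows[y:y + h]]


rows = [bytes([0, 1, 2, 3]), bytes([4, 5, 6, 7]), bytes([8, 9, 10, 11])]
visible = crop_rows(rows, {"x": 1, "y": 1, "width": 2, "height": 2})
print([list(r) for r in visible])
```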

Window definitions

Each entry in windows defines a rectangular screen region where objects are drawn.

Field	Type	Description
`id`	`number`	Window ID (referenced by `composition.objects[].window_id`)
`x`	`number`	Horizontal pixel offset from top-left of screen
`y`	`number`	Vertical pixel offset from top-left of screen
`width`	`number`	Window width in pixels
`height`	`number`	Window height in pixels

Palette definitions

Each entry in palettes defines a color lookup table. Object bitmaps reference palette entries by ID to determine pixel color.

Field	Type	Description
`id`	`number`	Palette ID (referenced by `composition.palette_id`)
`version`	`number`	Palette version within the current epoch
`entries`	`array`	Color entries (up to 256)

Palette entry

Colors are in YCrCb color space with alpha transparency.

Field	Type	Description
`id`	`number`	Entry index (0-255). Object bitmap pixels reference this ID.
`luminance`	`number`	Luminance / Y component (0-255)
`cr`	`number`	Chrominance red (0-255)
`cb`	`number`	Chrominance blue (0-255)
`alpha`	`number`	Transparency (0 = fully transparent, 255 = fully opaque)

Color conversion (YCrCb to RGB), using full-range BT.601 coefficients. Clamp each result to 0–255:

R = luminance + 1.402 * (cr - 128)
G = luminance - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
B = luminance + 1.772 * (cb - 128)
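
The same conversion as a small function, as a sketch. ycrcb_to_rgba is a hypothetical helper. Rounding to the nearest integer and clamping to 0–255 are choices made here, not something libpgs prescribes.

```python
def ycrcb_to_rgba(entry):
    """Convert one palette entry dict to an (r, g, b, a) tuple of 0-255 integers."""
    y, cr, cb = entry["luminance"], entry["cr"], entry["cb"]

    def clamp(value):
        return max(0, min(255, round(value)))

    r = clamp(y + 1.402 * (cr - 128))
    g = clamp(y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128))
    b = clamp(y + 1.772 * (cb - 128))
    return (r, g, b, entry["alpha"])


# Neutral chroma (cr = cb = 128) yields a gray whose level equals the luminance.
print(ycrcb_to_rgba({"id": 1, "luminance": 235, "cr": 128, "cb": 128, "alpha": 255}))
```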

Object definitions

Each entry in objects defines a subtitle image. The RLE-compressed bitmap data is automatically decoded into a flat buffer of palette indices.

Field	Type	Description
`id`	`number`	Object ID (referenced by `composition.objects[].object_id`)
`version`	`number`	Object version within the current epoch
`sequence`	`string`	`"complete"`, `"reassembled"`, `"first"`, `"last"`, or `"continuation"`
`data_length`	`number`	Total object data length in bytes (includes 4 bytes for width+height)
`width`	`number`	Image width in pixels
`height`	`number`	Image height in pixels
`bitmap`	`string \| null`	Base64-encoded palette indices (1 byte per pixel, row-major). `null` if decoding failed.

Bitmap format

The bitmap field contains the decoded subtitle image as a base64-encoded buffer of palette entry indices. Each byte is an index (0–255) into the palettes[].entries[] array. Pixels are stored in row-major order (left to right, top to bottom). The decoded buffer is exactly width * height bytes.

To render the image, look up each pixel's palette entry to get its YCrCb color and alpha value. libpgs does not perform color conversion — consumers choose their own color space handling.
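
Decoding the field into rows takes only a few lines. This is a sketch. bitmap_rows is a hypothetical helper, and the length check simply enforces the width * height rule stated above.

```python
import base64


def bitmap_rows(obj):
    """Decode an object's "bitmap" into a list of rows of palette indices, or None if absent."""
    if obj.get("bitmap") is None:
        return None  # decoding failed upstream
    data = base64.b64decode(obj["bitmap"])
    w, h = obj["width"], obj["height"]
    if len(data) != w * h:
        raise ValueError(f"expected {w * h} bytes, got {len(data)}")
    # Row-major layout: row r occupies bytes [r*w, (r+1)*w).
    return [data[r * w:(r + 1) * w] for r in range(h)]


demo = {
    "width": 3,
    "height": 2,
    "bitmap": base64.b64encode(bytes([0, 1, 2, 3, 4, 5])).decode("ascii"),
}
print([list(row) for row in bitmap_rows(demo)])
```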

Object fragmentation

Large objects in the PGS format may be split across multiple ODS segments. libpgs automatically reassembles fragments within each display set and decodes the combined bitmap. Reassembled objects have "sequence": "reassembled" to distinguish them from single-segment "complete" objects.

Value	Meaning
`"complete"`	Single-segment object (most common)
`"reassembled"`	Multiple fragments were combined into one object

With --raw-payloads, the payload field of a reassembled object contains the concatenated raw payloads of all fragments.

Cross-references

The data model uses ID-based cross-references between sections:

composition.objects[].object_id  -->  objects[].id
composition.objects[].window_id  -->  windows[].id
composition.palette_id           -->  palettes[].id

A composition object placement says: "draw the bitmap from objects[id=X] using colors from palettes[id=Y] inside the screen region windows[id=Z] at pixel position (x, y)."
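
These lookups can be resolved with three dictionaries keyed by ID. The sketch below uses a hypothetical helper, resolve_placements. It returns None for any reference that is not defined in the same display set, which can plausibly happen in normal-state updates that reuse earlier objects.

```python
def resolve_placements(ds):
    """Return one dict per placement with its object, window, and palette resolved by ID."""
    comp = ds.get("composition")
    if comp is None:
        return []
    objects = {o["id"]: o for o in ds["objects"]}
    windows = {w["id"]: w for w in ds["windows"]}
    palettes = {p["id"]: p for p in ds["palettes"]}
    palette = palettes.get(comp["palette_id"])
    return [
        {
            "placement": placement,
            "object": objects.get(placement["object_id"]),
            "window": windows.get(placement["window_id"]),
            "palette": palette,
        }
        for placement in comp["objects"]
    ]
```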

Raw payloads (`--raw-payloads`)

By default, only structured data is output. Pass --raw-payloads to include the raw PGS segment bytes as base64-encoded strings.

When enabled, each item gains a "payload" field:

{
  "composition": { "...": "...", "payload": "<base64>" },
  "windows": [{ "...": "...", "payload": "<base64>" }],
  "palettes": [{ "...": "...", "payload": "<base64>" }],
  "objects": [{ "...": "...", "payload": "<base64>" }]
}

The payload contains the raw segment payload bytes (after the PGS header). For ODS objects, this includes the RLE-compressed bitmap data. Use this if you need to:

Write .sup files
Decode RLE bitmaps yourself
Pass raw data to another PGS-aware tool

If a segment's structured data could not be parsed (malformed payload), the semantic fields will be null but the raw payload is still included.
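
For the "decode RLE bitmaps yourself" case, the sketch below implements the run-length scheme as it is commonly documented for PGS object data. It is not taken from this reference, so verify its output against the decoded bitmap field before relying on it. decode_rle is a hypothetical helper that expects only the RLE bytes; locating them inside an ODS payload is not covered here.

```python
def decode_rle(rle: bytes, width: int, height: int) -> bytes:
    """Decode PGS-style RLE data into width*height palette indices (row-major)."""
    out = bytearray()
    row = bytearray()
    i = 0
    while i < len(rle) and len(out) < width * height:
        byte = rle[i]
        i += 1
        if byte != 0:
            row.append(byte)  # a non-zero byte is a single pixel of that color
            continue
        flags = rle[i]
        i += 1
        if flags == 0:
            # 00 00 marks end of line: pad or truncate the row to the image width.
            out += row[:width].ljust(width, b"\x00")
            row = bytearray()
            continue
        run = flags & 0x3F
        if flags & 0x40:  # long form: 14-bit run length
            run = (run << 8) | rle[i]
            i += 1
        color = 0
        if flags & 0x80:  # an explicit color byte follows; otherwise color 0
            color = rle[i]
            i += 1
        row += bytes([color]) * run
    if row and len(out) < width * height:
        out += row[:width].ljust(width, b"\x00")  # flush a final unterminated row
    return bytes(out)


encoded = bytes([0x01, 0x00, 0x83, 0x02, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00])
print(list(decode_rle(encoded, 4, 2)))
```

Truncated input raises IndexError in this sketch; production code would want explicit bounds checks.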

Common patterns

Get subtitle timing

libpgs stream movie.mkv | jq -r 'select(.type == "display_set") | "\(.pts_ms)ms track=\(.track_id) state=\(.composition.state)"'

Get object positions and sizes

libpgs stream movie.mkv | jq 'select(.type == "display_set") | .composition.objects[] | {object_id, x, y, window_id}'

Count display sets per track

libpgs stream movie.mkv | jq -s '[.[] | select(.type == "display_set")] | group_by(.track_id) | map({track: .[0].track_id, count: length})'

Filter epoch starts only

libpgs stream movie.mkv | jq 'select(.type == "display_set" and .composition.state == "epoch_start")'

Stream a specific time range

# Get subtitles between 1:30:00 and 1:35:00
libpgs stream movie.mkv --start 1:30:00 --end 1:35:00

# Pipe a 5-minute window to a Python consumer
libpgs stream movie.mkv -t 3 --start 0:05:00 --end 0:10:00 | python process.py

Extract palette colors as RGB

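libpgs stream movie.mkv | jq -c 'def clamp: if . < 0 then 0 elif . > 255 then 255 else . end | round; select(.type == "display_set") | .palettes[].entries[] | select(.alpha > 0) | {id, r: (.luminance + 1.402 * (.cr - 128) | clamp), g: (.luminance - 0.344136 * (.cb - 128) - 0.714136 * (.cr - 128) | clamp), b: (.luminance + 1.772 * (.cb - 128) | clamp), alpha}'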

Render bitmap to image (Python)

import base64
import json
import sys

from PIL import Image

# Fallback for indices with no matching palette entry: fully transparent.
TRANSPARENT = {"luminance": 0, "cr": 128, "cb": 128, "alpha": 0}


def clamp(value):
    return max(0, min(255, int(value)))


saved = False
for line in sys.stdin:
    msg = json.loads(line)
    if msg["type"] != "display_set":
        continue
    # Bitmap bytes reference palette entries by "id", not by list position.
    # For simplicity this uses the first palette; composition.palette_id names the active one.
    entries = msg["palettes"][0]["entries"] if msg["palettes"] else []
    palette = {entry["id"]: entry for entry in entries}
    for obj in msg["objects"]:
        if not obj.get("bitmap"):
            continue
        w, h = obj["width"], obj["height"]
        indices = base64.b64decode(obj["bitmap"])
        img = Image.new("RGBA", (w, h))
        for i, idx in enumerate(indices):
            entry = palette.get(idx, TRANSPARENT)
            y_val, cr, cb, a = entry["luminance"], entry["cr"], entry["cb"], entry["alpha"]
            r = clamp(y_val + 1.402 * (cr - 128))
            g = clamp(y_val - 0.344136 * (cb - 128) - 0.714136 * (cr - 128))
            b = clamp(y_val + 1.772 * (cb - 128))
            img.putpixel((i % w, i // w), (r, g, b, a))
        img.save(f"subtitle_{obj['id']}.png")
        saved = True
        break  # first object only
    if saved:
        break  # stop after the first display set that produced an image

Encoding (NDJSON → .sup)

The libpgs encode command reads the same NDJSON format that stream produces and writes a .sup file. This closes the round-trip loop (extract, transform in any language, and write back):

libpgs stream movie.mkv | python modify.py | libpgs encode -o modified.sup

Usage

libpgs encode -o <output.sup>       # Reads NDJSON from stdin

Field handling

The encode command consumes display_set lines and ignores tracks lines (and blank lines). Each display set is rebuilt from its structured fields using DisplaySetBuilder, which handles RLE encoding and ODS fragmentation automatically.

Field	Handling
`pts`	Primary timestamp source (90 kHz ticks). Used as-is.
`pts_ms`	Fallback: if `pts` is absent, computes `pts = round(pts_ms * 90)`.
`track_id`	Honored. Multiple track IDs produce separate output files.
`index`	Ignored. Display sets are written in input order.
`composition`	Required. Display sets with `null` composition are skipped with a stderr warning.
`composition.state`	Required. Must be `"epoch_start"`, `"acquisition_point"`, or `"normal"`.
`composition.objects[]`	Honored, including optional `crop` fields.
`windows`	Optional. Passed through to WDS segments when present.
`palettes`	Optional. All entries honored (id, luminance, cr, cb, alpha).
`objects`	Optional. The `bitmap` field (base64 palette indices) is re-encoded to RLE.
`objects[].bitmap`	Required per object. Base64-decoded, then RLE-encoded and fragmented as needed.
`data_length`	Ignored. Recomputed from the re-encoded bitmap.
`sequence`	Ignored. Recomputed based on re-encoded size and fragmentation.
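
To generate input for encode from scratch rather than from stream output, a producer needs to emit display_set lines in this shape. The sketch below fills every documented field except the ignored ones (index, data_length, sequence). Which of them encode strictly requires beyond the table above is not stated here, and make_display_set and its parameters are hypothetical.

```python
import base64
import json


def make_display_set(track_id, pts_ms, x, y, rows, entries, video_size=(1920, 1080)):
    """Build one display_set dict from rows of palette indices."""
    height, width = len(rows), len(rows[0])
    pixels = b"".join(bytes(row) for row in rows)  # row-major, 1 byte per pixel
    return {
        "type": "display_set",
        "track_id": track_id,
        "pts": round(pts_ms * 90),  # 90 kHz ticks, same rule as the pts_ms fallback
        "composition": {
            "number": 0,
            "state": "epoch_start",
            "video_width": video_size[0],
            "video_height": video_size[1],
            "palette_only": False,
            "palette_id": 0,
            "objects": [{"object_id": 0, "window_id": 0, "x": x, "y": y, "crop": None}],
        },
        "windows": [{"id": 0, "x": x, "y": y, "width": width, "height": height}],
        "palettes": [{"id": 0, "version": 0, "entries": entries}],
        "objects": [{"id": 0, "version": 0, "width": width, "height": height,
                     "bitmap": base64.b64encode(pixels).decode("ascii")}],
    }


entries = [
    {"id": 0, "luminance": 16, "cr": 128, "cb": 128, "alpha": 0},
    {"id": 1, "luminance": 235, "cr": 128, "cb": 128, "alpha": 255},
]
line = json.dumps(make_display_set(3, 1000.0, 100, 900, [[0, 1, 1, 0], [1, 1, 1, 1]], entries))
print(line)
```

Each such line, followed by a newline, can be written to the stdin of libpgs encode. A later line with an empty composition.objects list would clear the subtitle from the screen.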

Multi-track output

If all display sets share the same track_id (or none is specified), the output is written directly to the -o path. If multiple track_id values appear, encode splits the output into separate files:

output.sup          → output_track3.sup, output_track5.sup, ...
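
A wrapper script that needs to know which files to expect can mirror this naming rule. The sketch below follows the pattern shown above. expected_outputs is a hypothetical helper, and how libpgs handles paths with directories or unusual extensions is an assumption.

```python
from pathlib import Path


def expected_outputs(output, track_ids):
    """Predict output paths for an -o value and the track_id values present in the input."""
    ids = sorted(set(track_ids))
    path = Path(output)
    if len(ids) <= 1:
        return [str(path)]  # single track (or none): written directly to the -o path
    return [str(path.with_name(f"{path.stem}_track{tid}{path.suffix}")) for tid in ids]


print(expected_outputs("output.sup", [3, 5, 3]))
```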

Round-trip example (Python)

import subprocess, json

# Stream from source
stream = subprocess.Popen(
    ["libpgs", "stream", "movie.mkv"],
    stdout=subprocess.PIPE, text=True
)

# Encode to output
encode = subprocess.Popen(
    ["libpgs", "encode", "-o", "modified.sup"],
    stdin=subprocess.PIPE, text=True
)

for line in stream.stdout:
    msg = json.loads(line)
    if msg["type"] == "display_set":
        # Example: brighten all palette entries
        for palette in msg.get("palettes", []):
            for entry in palette["entries"]:
                entry["luminance"] = min(255, entry["luminance"] + 20)
    encode.stdin.write(json.dumps(msg) + "\n")

encode.stdin.close()
encode.wait()
stream.wait()

Error handling

Errors include 1-based line numbers for easy debugging:

line 42: missing field 'composition'
line 108: 'pts' is not a number
line 203: palette entry missing 'luminance'

Display sets with null composition are skipped with a stderr warning rather than aborting, so partially malformed input can still produce output for the valid display sets.
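
A caller that captures encode's diagnostics can map them back to input lines. The sketch below assumes messages have exactly the "line N: message" shape shown above. parse_encode_errors is a hypothetical helper, and lines in any other shape are ignored.

```python
import re

ERROR_RE = re.compile(r"^line (\d+): (.*)$")


def parse_encode_errors(text):
    """Return (line_number, message) pairs for lines shaped like 'line N: message'."""
    found = []
    for raw in text.splitlines():
        match = ERROR_RE.match(raw.strip())
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return found


captured = "line 42: missing field 'composition'\nline 108: 'pts' is not a number\n"
for number, message in parse_encode_errors(captured):
    print(f"input line {number}: {message}")
```

Because the numbers are 1-based, subtract one before indexing into a Python list of the lines that were sent.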

Notes

Timestamps use a 90 kHz clock (standard for MPEG transport streams). Divide by 90 to get milliseconds, or use the pre-computed pts_ms field.
Palette colors are in YCrCb, not RGB. See the conversion formula above.
Up to 2 objects can be shown simultaneously per composition (e.g., top and bottom subtitle lines), though the PGS spec supports up to 64 per epoch.
Normal-state display sets with 0 composition objects are "clear screen" events — they signal that the previous subtitle should be removed.
Palette-only updates (palette_only: true) change colors without replacing objects. The screen content changes appearance but the bitmap data stays the same.
