Streaming image content analyzer for adaptive codec pipelines. One pass over a
zenpixels::PixelSlice extracts the numeric features that decision trees,
selectors, and per-image encoder configurators consume — variance, edge density,
chroma sharpness, palette population, DCT energy, alpha statistics, and (behind
opt-in cargo features) classifier-style content-class likelihoods plus
source-direct HDR / wide-gamut / bit-depth signals that codecs use to detect
when a descriptor over-promises and the actual pixel content is encodable in
something smaller.
[dependencies]
zenanalyze = "0.1"
# Opt in to source-direct HDR / wide-gamut / descriptor-gap signals:
# zenanalyze = { version = "0.1", features = ["experimental"] }
# Opt in to classifier-style composite scores (text / screen / natural /
# line-art likelihoods). The raw signals these consume are stable; the
# composite coefficients are calibration-driven and may drift in 0.1.x:
# zenanalyze = { version = "0.1", features = ["composites"] }| feature | what it gates | stability |
|---|---|---|
| (default) | Stable raw signals: variance, edge density, chroma sharpness, DCT energy, alpha, palette, distinct-color bins, etc. | Numeric drift in 0.1.x bounded by the threshold contract; signatures frozen |
experimental |
Research-stage signals: PatchFraction, AqMapMean/Std, NoiseFloorY/UV, GradientFraction, source-direct HDR / wide-gamut / bit-depth | Metric definition or scale may change; opt in only if you re-validate per patch |
composites |
Classifier-style scores: TextLikelihood, ScreenContentLikelihood, NaturalLikelihood, LineArtScore | Hand-tuned weighted combinators; the combination coefficients drift as the corpus calibration matures. The raw signals they consume are stable. |
Modern adaptive codecs (zenjpeg, zenwebp, zenpng, zenavif, zenjxl) all want the same handful of cheap content features to pick a quality knob — is this a photograph or a screenshot? does it have alpha? is it palette-friendly? is the HDR flag stale? does this Rec.2020 file actually use the wider gamut? — before they run their expensive encoder. Re-deriving that from scratch in each codec means three copies of the same Tier 1 SIMD scan, three slightly different threshold contracts, and three independent oracle retrains every time the math moves.
zenanalyze is the shared single-pass scanner: codecs ask for the feature set they care about, the orchestrator unions the requests, and one walk over the image returns every signal. Tiers gate themselves out when their outputs aren't needed.
use zenanalyze::{
analyze_features,
feature::{AnalysisFeature, AnalysisQuery, FeatureSet},
};
const JPEG_FEATURES: FeatureSet = FeatureSet::new()
.with(AnalysisFeature::Variance)
.with(AnalysisFeature::EdgeDensity)
.with(AnalysisFeature::HighFreqEnergyRatio);
const WEBP_FEATURES: FeatureSet = FeatureSet::new()
.with(AnalysisFeature::Variance)
.with(AnalysisFeature::AlphaPresent)
.with(AnalysisFeature::AlphaUsedFraction);
let needed = JPEG_FEATURES.union(WEBP_FEATURES);
let results = analyze_features(slice, &AnalysisQuery::new(needed))?;
let variance = results.get_f32(AnalysisFeature::Variance);
let alpha = results.get(AnalysisFeature::AlphaPresent)
.and_then(|v| v.as_bool());AnalysisFeature is #[non_exhaustive] with stable u16 discriminants —
retired ids stay reserved, new ids are sequential, the wire format never breaks.
FeatureSet has full const fn set math (union, intersect, difference)
so per-codec presets compose at compile time. AnalysisQuery is intentionally
opaque: sampling budgets are crate invariants, not per-call knobs.
For a packed RGB8 buffer the convenience entry skips the PixelSlice ceremony:
// Panicking — for known-good inputs (freshly decoded buffers).
let r = zenanalyze::analyze_features_rgb8(&rgb_bytes, w, h, &q);
// Fallible — for untrusted input. Returns AnalyzeError::InvalidInput on
// length / stride mismatch and AnalyzeError::OutOfMemory on (future)
// fallible-allocation paths.
let r = zenanalyze::try_analyze_features_rgb8(&rgb_bytes, w, h, &q)?;Codecs read the source PixelDescriptor directly off the result for encode-side
metadata decisions:
let descriptor = results.source_descriptor();
match descriptor.primaries {
zenpixels::ColorPrimaries::Bt709 => /* sRGB-class */,
zenpixels::ColorPrimaries::DisplayP3 => /* P3 wide-gamut */,
zenpixels::ColorPrimaries::Bt2020 => /* Rec.2020 / HDR */,
zenpixels::ColorPrimaries::AdobeRgb => /* AdobeRGB */,
_ => /* unknown / future */,
}FeatureSet::iter() walks the contained features in AnalysisFeature::id()
order — convenient for sidecars, harnesses, and Python fitters that need to
enumerate the surface without hand-listing variants.
Default features (every codec gets these):
| Feature | Type | Description |
|---|---|---|
Variance |
f32 | Luma variance on the BT.601 [0, 255] scale. |
EdgeDensity |
f32 | Fraction of sampled interior pixels with |∇L| > 20. |
ChromaComplexity |
f32 | √(Var(Cb) + Var(Cr)) over sampled pixels. |
CbSharpness / CrSharpness |
f32 | Mean per-axis chroma gradient. |
Uniformity |
f32 | Fraction of 8×8 blocks with luma variance < 25. |
FlatColorBlockRatio |
f32 | Fraction of 8×8 blocks with R/G/B ranges all ≤ 4. |
DistinctColorBins |
u32 | Distinct 5-bit-per-channel RGB bins observed. |
Cb*Sharpness / Cr*Sharpness (Horiz/Vert/Peak) |
f32 | Per-channel per-axis chroma sharpness. |
HighFreqEnergyRatio |
f32 | DCT AC energy ratio over sampled 8×8 luma blocks. |
LumaHistogramEntropy |
f32 | Shannon entropy of a 32-bin luma histogram (bits). |
AlphaPresent / AlphaUsedFraction / AlphaBimodalScore |
bool / f32 / f32 | Straight-alpha statistics. |
Behind the composites cargo feature (calibration-driven, may drift in 0.1.x):
| Feature | Type | Description |
|---|---|---|
TextLikelihood / ScreenContentLikelihood / NaturalLikelihood |
f32 | Soft [0, 1] content-class scores. Combine raw signals (variance, edge density, distinct-color bins, etc.) via hand-tuned weights. |
LineArtScore |
f32 | Soft [0, 1] rendered-line-art score from luma histogram bimodality + entropy. |
Behind the experimental cargo feature, organised by what they drive:
| Feature | Drives |
|---|---|
GrayscaleScore |
zenjpeg ColorMode::Grayscale, AVIF Yuv400, png/jxl gray paths (~30–40% smaller for B&W) |
AqMapMean / AqMapStd |
zenjpeg hybrid trellis λ, webp segments + sns_strength, avif vaq |
NoiseFloorY / NoiseFloorUV |
zenjpeg pre_blur, jxl noise/denoise, webp sns_strength, zenrav1e film_grain |
LineArtScore (behind composites) |
webp Preset::Drawing, jxl splines / patches, png palette preference |
GradientFraction |
jxl with_force_strategy (DCT16 / DCT32 selection), zenrav1e deblock strength |
SkinToneFraction |
photo-vs-other dispatch (one-direction signal, AUC 0.80) — webp Preset::Photo, jxl perceptual presets, jpeg chroma-aware quant |
EdgeSlopeStdev |
screen-vs-photo dispatch (AUC 0.84, second only to PatchFraction) — webp Preset::Drawing vs Photo, jxl modular vs VarDCT |
These read source samples without going through RowConverter, since
RowConverter doesn't tonemap — a 4000-nit PQ source and a 100-nit-clipped
SDR source would otherwise produce byte-identical RGB8 streams.
| Feature | Drives |
|---|---|
PeakLuminanceNits / P99LuminanceNits |
AVIF clli, JXL intensity_target, HDR encoder peak |
HdrHeadroomStops / HdrPixelFraction |
HDR vs SDR encode-mode selection |
WideGamutPeak / WideGamutFraction |
"Linear value > 1.0" detection |
GamutCoverageSrgb / GamutCoverageP3 |
Descriptor-gap signal — if a Rec.2020 source's pixels all live in the sRGB sub-gamut, encode at sRGB primaries and save the wide-gamut metadata + encoder modes |
EffectiveBitDepth |
AVIF / JXL bit_depth, png near_lossless_bits (catches u8-promoted u16) |
HdrPresent |
Composite "transfer claims HDR AND pixels are actually bright" — catches stale HDR flags |
Colourfulness, LaplacianVariance, VarianceSpread, PaletteDensity,
DctCompressibilityY, DctCompressibilityUV, PatchFraction,
PaletteLog2Size, PaletteFitsIn256. Numeric scale or definition may
change in 0.1.x patches.
The analyzer's job is to spot the gap between what the descriptor promises and what the data actually carries, so encoders don't bloat files paying for capacity the source doesn't need.
| Gap | Signal |
|---|---|
| RGB declared, content is grayscale | GrayscaleScore ≥ 0.99 |
| Wider primaries declared, content fits sRGB | GamutCoverageSrgb ≥ 0.99 |
| Rec.2020 declared, content fits Display P3 | GamutCoverageP3 ≥ 0.99 |
| HDR transfer declared, content is SDR | HdrPresent == false |
| u16 declared, content is u8-promoted | EffectiveBitDepth == 8 |
| RGBA declared, alpha is constant 1.0 | AlphaUsedFraction == 0 |
| Standard 8×8 transforms, content is smooth | GradientFraction ≥ 0.5 (use larger DCTs) |
| HDR flag set, peak is dim | PeakLuminanceNits < 200 && HdrPresent (mis-tagged HDR) |
Each one is a place where a codec orchestrator can downcast metadata + encoder modes before encoding, saving real bytes on real corpora.
analyze_features accepts every layout zenpixels-convert::RowConverter can
ingest — RGB8 / RGBA8 / BGRA8, RGB16 / RGBA16, RGB-F32 / RGBA-F32 (linear, sRGB,
PQ, HLG), grayscale variants, all primaries (sRGB / Display P3 / Rec.2020 /
AdobeRGB). One entry, no opt-in step. The principle: per-image codec decisions
don't usually break on a few LSBs of luma drift, they break on the analyzer
refusing to run.
u8-promotion invariance is locked by tests. An RGB8 image promoted to
RGB16 via the standard u8 * 257 doubling, or to RGBF32 via u8 / 255.0,
produces bit-identical features to the original RGB8 source. Codecs that
upgrade from u8 to wider formats internally don't see different analyzer
answers. (Verified by garb's exact-identity narrowing
(u16 * 255 + 32768) >> 16 for u16 = u8 * 257.)
Wide gamut adapts the values, not the API. RGB8 with Display P3 / Rec.2020
/ AdobeRGB primaries passes through the zero-copy Native row path with its
bytes intact. The standard tiers pick the right luma matrix per source
primaries: BT.601 weights for sRGB / BT.709 (preserving the trained-threshold
baseline — coefficient's existing thresholds were calibrated against this
matrix on sRGB content), BT.2020 weights for Rec.2020 sources, the Y row of
each primary set's RGB→XYZ matrix for Display P3 / AdobeRGB. Fixed-point
integer-luma scales are normalised to the same sum-220 libwebp baseline so a
pure-white pixel hits the same histogram bin regardless of source primaries —
what differs is the per-channel weight that lands it there. No conversion, no
clipping, just the right matrix. See src/luma.rs.
HDR f32 / linear inputs. Standard tiers see what an SDR display would
show — RowConverter clips out-of-[0, 1] linear values, applies the sRGB
OETF, and narrows to u8. That's the legitimate input for SDR-calibrated
thresholds; tonemapping a 4 000-nit highlight into a visible mid-tone
before measuring "high-frequency-energy ratio" would just lie about what's
there. The above-clip signal lives in tier_depth, which reads the source
samples directly via PixelSlice::row (bypassing RowConverter entirely)
and decodes through the descriptor's transfer function — sRGB / BT.709 /
Gamma 2.2 / Linear / PQ / HLG — to linear nits. Two views of the same
source: the SDR-display view for trained thresholds, the source-direct view
for HDR / wide-gamut signal.
The tier_depth reference convention is stable across 0.1.x:
| Transfer | Linear 1.0 maps to | Convention |
|---|---|---|
Srgb / Bt709 / Gamma22 / Linear |
80 nits | sRGB display reference (IEC 61966-2-1) |
Pq |
10 000 nits | SMPTE ST 2084 absolute |
Hlg |
1 000 nits | nominal HLG broadcast |
The standard tiers' threshold contract is calibrated on display-space RGB8 bytes; the depth tier surfaces the additional metadata-gap and HDR signals that the RGB8 narrowing destroys.
#[non_exhaustive]
pub enum AnalyzeError {
Convert(String), // RowConverter setup failed
InvalidInput(String), // user-supplied bad layout / length
OutOfMemory { bytes_requested: Option<usize> }, // future fallible-alloc path
Internal(String), // unexpected
}Production code handling untrusted images should pattern-match on
InvalidInput / OutOfMemory explicitly. Today every internal allocation is
infallible (so OutOfMemory is reserved, never returned by current builds);
the variant is part of the public surface so a future minor that flips
internals to Vec::try_reserve doesn't break anyone's match.
Five passes, each gated by what the requested FeatureSet actually needs:
| Pass | Iterates over | Reads | Cost (4 MP) | Drives |
|---|---|---|---|---|
| Tier 1 | Stripe-sampled rows | RGB8 | ~1 ms | luma stats, edges, chroma, uniformity, grayscale |
| Tier 2 | 3-row sliding window | RGB8 | ~2 ms | per-axis Cb/Cr sharpness |
| Tier 3 | Sampled 8×8 DCT blocks | RGB8 | ~3 ms | DCT energy, entropy, AQ map, noise floor, line-art, gradient, patch fraction |
| Palette | Full image | RGB8 | ~1 ms | distinct color bins |
| Alpha | Stride-sampled rows | Source bytes | ~0.3 ms | alpha presence / used / bimodal |
tier_depth (experimental) |
Stride-sampled rows | Source bytes | ~0.5 ms HDR, ~0 SDR-fast-path | HDR / wide-gamut / bit-depth / gamut-coverage |
Tier 1/2/3 + Palette read RGB8 via RowStream, which has three internal paths:
- Native (zero-copy) — RGB8-byte-layout-compatible inputs. Sub-slice straight from the source.
- StripAlpha8 (zero RowConverter, scratch-only) — RGBA8 / BGRA8 / Rgbx8 / Bgrx8. Tight strip-and-maybe-swap into the row scratch. Skips the RowConverter alloc + plan + per-row CPU work.
- Convert — everything else (16-bit, f32, grayscale, CMYK, …) goes through
RowConverterrow-by-row.
Alpha and tier_depth always read source bytes directly, never through
RowStream — a load-bearing detail for HDR (RowConverter doesn't tonemap;
its narrowing clips PQ / HLG into sRGB-display).
Release build, AVX2, no target-cpu=native, full FeatureSet::SUPPORTED:
| Input | 4 MP | RowStream path |
|---|---|---|
| RGB8 / Rgbx8 with sRGB / wide-gamut primaries | 9.5 ms | Native (zero-copy slice subindex) |
| RGBA8 | 10.9 ms | StripAlpha8 (garb SIMD strip) |
| BGRA8 | 12.0 ms | StripAlpha8 (garb SIMD strip + swap) |
| RGB16 | 24.7 ms | Convert (zenpixels-convert RowConverter) |
| RGBA16 | 28.6 ms | Convert (zenpixels-convert handles strip + narrow) |
The RGBA8 strip uses garb::bytes::rgba_to_rgb / bgra_to_rgb
(SIMD-dispatched via archmage::incant!) — measured 7× faster than
the previous in-tree scalar strip on a 2048-px row, dropping the
RGBA8 overhead vs RGB8 baseline from +5.8 ms to +1.4 ms. The 16-bit
input paths cost more because RowConverter does transfer-function-
aware narrowing — that's the correct tool for genuinely
heterogeneous input.
Per-call working-set memory is ~265 KB across ~7 allocations (largest single
chunk is the Tier 1 stripe scratch at 9 × width × 3 = 108 KB at 4 K). All
allocations are infallible today; the OutOfMemory variant exists so a
future minor can flip them without API breakage.
Picked on a 219-image labeled corpus from coefficient
(benchmarks/classifier-eval/labels.tsv, spanning cid22-train/val,
clic2025-1024, gb82, gb82-sc, imageflow, kadid10k, qoi-benchmark). 174
photo, 36 screen, 9 illustration, 44 marked synthetic. F1 / AUC are
for binary screen-vs-photo classification.
For codec-orchestrator dispatch ("is this a screen or a photo?"):
| signal | threshold | F1 | AUC | notes |
|---|---|---|---|---|
line_art_score > 0 |
any nonzero | 0.978 | 0.750 | near-deterministic — line art ⇒ screen-like |
natural_likelihood >= 0.06 |
photo detection | 0.924 | 0.814 | high precision photo classifier |
patch_fraction >= 0.27 |
screen detection | 0.769 | 0.880 | strongest single screen discriminator |
edge_slope_stdev >= 35 |
screen detection | — | 0.844 | second-strongest screen discriminator — photos cluster 15–32, screens 32–58 |
screen_content_likelihood >= 0.60 |
screen detection | 0.750 | 0.831 | derived from flat blocks + palette + chroma |
flat_color_block_ratio >= 0.53 |
screen detection | 0.750 | 0.838 | raw — same F1 as the derived _likelihood |
skin_tone_fraction >= 0.05 |
photo detection | 0.824 | 0.799 | one-direction (presence ⇒ photo); pigmentation-invariant Chai-Ngan YCbCr |
text_likelihood >= 0.30 |
text detection | 0.682 | 0.774 | weaker but real |
grayscale_score >= 0.99 |
grayscale dispatch | — | — | encoder gap-filler, near-binary on real grayscale |
Note: the three *_likelihood features empirically saturate at
~0.70 (not 1.0) on real content, because each is a weighted sum of
clamped sub-components that don't simultaneously max on real images.
Don't threshold them at >= 0.8 — nothing will fire. Operating
points are in the 0.3–0.6 band. The exact corpus maxes are:
text_likelihoodmax 0.71screen_content_likelihoodmax 0.70natural_likelihoodmax 0.69
For descriptor-gap detection the thresholds are content-physical
(see the "Descriptor-gap detection" table above): GrayscaleScore >= 0.99,
GamutCoverageSrgb >= 0.99, etc. Those are spec-driven, not corpus-fit.
The full per-class distributions, ROC-AUC ranking for every feature,
Spearman redundancy matrix, and the recalibration findings that were
considered and rejected are recorded in
docs/calibration-corpus-2026-04-27.md.
That file is the pre-0.1.0-ship empirical baseline; subsequent 0.1.x
patches that drift numerics should compare against it.
Numeric thresholds and normalisation scales drift during 0.1.x. Downstream consumers that compile-in fitted models (oracle decision trees, content selectors) must pin to a specific zenanalyze patch version and re-validate when they bump it.
There is no 0.2.x. Every change in 0.1.x is additive — new variants on
#[non_exhaustive] enums, new parallel functions, never a signature change to
a shipped item. See CLAUDE.md.
130+ tests covering math invariants on synthetic inputs (solid colours,
horizontal bands, uniform luma distribution, palette-locked images, two-tone
line drawings, smooth gradients, pure noise), the full 16-arm dispatch matrix,
every supported pixel format (3 channel-types × 6 transfers × 4 primaries × 2
alpha = 144 sanity-matrix combinations), tier sizes from 1×1 to 4096×4096,
deterministic-input bit-equality (catches accumulator non-determinism),
u8-promotion bit-equality across u16 / f32 sources, HDR-survival (PQ ~1000-nit
content preserved end-to-end where standard tiers would have clipped to SDR),
gamut-coverage projections (saturated Rec.2020 green correctly fails sRGB
coverage), and AnalyzeError Display / source coverage. Math locks use
absolute tolerances chosen to clear ULP-level f32 noise from SIMD tree
reductions but catch any genuine architecture divergence.
Note on coverage tooling: the SIMD kernels in
tier1.rs,palette.rs,tier2_chroma.rs, andtier3.rsuse#[magetypes(... v4, v3, neon, wasm128, scalar)]to generate one source-level monomorphisation per architecture tier. At runtime archmage'sincant!dispatches to whichever the CPU supports; the other variants stay compiled but unreachable. Line- coverage tools count each variant separately, so the raw percentage on these files looks ≈30 % on x86_64. Real coverage of executable code paths (counted on the dispatched variant only) is ≥95 % across every module.
The repository hosts three sibling pieces that the codecs in
imazen/zenjpeg,
imazen/zenwebp, etc. compose with
zenanalyze:
| Path | Identity | Status |
|---|---|---|
zenpredict/ |
Rust runtime — zero-copy MLP loader (ZNPR v2 binary format), forward pass, masked argmin, typed metadata, OOD bounds, two-shot rescue policy. Used by codec pickers (zenjpeg/zenwebp/zenavif/zenjxl) and by zensim V0.4 perceptual scoring |
Crate, 0.1.x |
zenpicker/ |
Codec-family meta-picker — given features + target quality + an allowed-family mask, picks {jpeg, webp, jxl, avif, png, gif}; per-codec pickers then resolve the family into a concrete encoder config. Wraps zenpredict::Predictor |
Crate, 0.1.x |
zentrain/ |
Python training pipeline — pareto sweep harness, teacher fit, distill, ablation, holdout probes, safety reports, bake to ZNPR v2 (via tools/bake_picker.py → zenpredict-bake). Produces both meta-picker and per-codec bakes |
Tooling, in-repo |
The runtime crates (zenpredict, zenpicker) and the trainer (zentrain) version
independently; the binary format (ZNPR v2) is the contract between them.
See MIGRATION.md for the path from the previous
(unpublished) zenpicker Rust shell to the current layout.
Cross-codec defaults + data discipline: read
zentrain/PRINCIPLES.md before adopting or
re-baking a picker for any codec (zenjpeg / zenwebp / zenavif /
zenjxl / zenpng / zengif / zenpicker / zensim). It's the source of
truth for what's invariant — corpus shape, argmin objectives, time-
budget patterns, multi-metric bakes, OOD / reach gates, validation
gates that block release.
AGPL-3.0-only OR LicenseRef-Imazen-Commercial. Commercial licensing available
from imazen — contact lilith@imazen.io.