Skip to content

imazen/zenanalyze

Repository files navigation

zenanalyze CI crates.io lib.rs docs.rs license

Streaming image content analyzer for adaptive codec pipelines. One pass over a zenpixels::PixelSlice extracts the numeric features that decision trees, selectors, and per-image encoder configurators consume — variance, edge density, chroma sharpness, palette population, DCT energy, alpha statistics, and (behind opt-in cargo features) classifier-style content-class likelihoods plus source-direct HDR / wide-gamut / bit-depth signals that codecs use to detect when a descriptor over-promises and the actual pixel content is encodable in something smaller.

[dependencies]
zenanalyze = "0.1"
# Opt in to source-direct HDR / wide-gamut / descriptor-gap signals:
# zenanalyze = { version = "0.1", features = ["experimental"] }
# Opt in to classifier-style composite scores (text / screen / natural /
# line-art likelihoods). The raw signals these consume are stable; the
# composite coefficients are calibration-driven and may drift in 0.1.x:
# zenanalyze = { version = "0.1", features = ["composites"] }

Cargo features

feature what it gates stability
(default) Stable raw signals: variance, edge density, chroma sharpness, DCT energy, alpha, palette, distinct-color bins, etc. Numeric drift in 0.1.x bounded by the threshold contract; signatures frozen
experimental Research-stage signals: PatchFraction, AqMapMean/Std, NoiseFloorY/UV, GradientFraction, source-direct HDR / wide-gamut / bit-depth Metric definition or scale may change; opt in only if you re-validate per patch
composites Classifier-style scores: TextLikelihood, ScreenContentLikelihood, NaturalLikelihood, LineArtScore Hand-tuned weighted combinators; the combination coefficients drift as the corpus calibration matures. The raw signals they consume are stable.

Why

Modern adaptive codecs (zenjpeg, zenwebp, zenpng, zenavif, zenjxl) all want the same handful of cheap content features to pick a quality knob — is this a photograph or a screenshot? does it have alpha? is it palette-friendly? is the HDR flag stale? does this Rec.2020 file actually use the wider gamut? — before they run their expensive encoder. Re-deriving that from scratch in each codec means three copies of the same Tier 1 SIMD scan, three slightly different threshold contracts, and three independent oracle retrains every time the math moves.

zenanalyze is the shared single-pass scanner: codecs ask for the feature set they care about, the orchestrator unions the requests, and one walk over the image returns every signal. Tiers gate themselves out when their outputs aren't needed.

Quick start

use zenanalyze::{
    analyze_features,
    feature::{AnalysisFeature, AnalysisQuery, FeatureSet},
};

const JPEG_FEATURES: FeatureSet = FeatureSet::new()
    .with(AnalysisFeature::Variance)
    .with(AnalysisFeature::EdgeDensity)
    .with(AnalysisFeature::HighFreqEnergyRatio);

const WEBP_FEATURES: FeatureSet = FeatureSet::new()
    .with(AnalysisFeature::Variance)
    .with(AnalysisFeature::AlphaPresent)
    .with(AnalysisFeature::AlphaUsedFraction);

let needed = JPEG_FEATURES.union(WEBP_FEATURES);
let results = analyze_features(slice, &AnalysisQuery::new(needed))?;

let variance = results.get_f32(AnalysisFeature::Variance);
let alpha    = results.get(AnalysisFeature::AlphaPresent)
                      .and_then(|v| v.as_bool());

AnalysisFeature is #[non_exhaustive] with stable u16 discriminants — retired ids stay reserved, new ids are sequential, the wire format never breaks. FeatureSet has full const fn set math (union, intersect, difference) so per-codec presets compose at compile time. AnalysisQuery is intentionally opaque: sampling budgets are crate invariants, not per-call knobs.

For a packed RGB8 buffer the convenience entry skips the PixelSlice ceremony:

// Panicking — for known-good inputs (freshly decoded buffers).
let r = zenanalyze::analyze_features_rgb8(&rgb_bytes, w, h, &q);

// Fallible — for untrusted input. Returns AnalyzeError::InvalidInput on
// length / stride mismatch and AnalyzeError::OutOfMemory on (future)
// fallible-allocation paths.
let r = zenanalyze::try_analyze_features_rgb8(&rgb_bytes, w, h, &q)?;

Codecs read the source PixelDescriptor directly off the result for encode-side metadata decisions:

let descriptor = results.source_descriptor();
match descriptor.primaries {
    zenpixels::ColorPrimaries::Bt709     => /* sRGB-class */,
    zenpixels::ColorPrimaries::DisplayP3 => /* P3 wide-gamut */,
    zenpixels::ColorPrimaries::Bt2020    => /* Rec.2020 / HDR */,
    zenpixels::ColorPrimaries::AdobeRgb  => /* AdobeRGB */,
    _ => /* unknown / future */,
}

FeatureSet::iter() walks the contained features in AnalysisFeature::id() order — convenient for sidecars, harnesses, and Python fitters that need to enumerate the surface without hand-listing variants.

What it computes

Default features (every codec gets these):

Feature Type Description
Variance f32 Luma variance on the BT.601 [0, 255] scale.
EdgeDensity f32 Fraction of sampled interior pixels with |∇L| > 20.
ChromaComplexity f32 √(Var(Cb) + Var(Cr)) over sampled pixels.
CbSharpness / CrSharpness f32 Mean per-axis chroma gradient.
Uniformity f32 Fraction of 8×8 blocks with luma variance < 25.
FlatColorBlockRatio f32 Fraction of 8×8 blocks with R/G/B ranges all ≤ 4.
DistinctColorBins u32 Distinct 5-bit-per-channel RGB bins observed.
Cb*Sharpness / Cr*Sharpness (Horiz/Vert/Peak) f32 Per-channel per-axis chroma sharpness.
HighFreqEnergyRatio f32 DCT AC energy ratio over sampled 8×8 luma blocks.
LumaHistogramEntropy f32 Shannon entropy of a 32-bin luma histogram (bits).
AlphaPresent / AlphaUsedFraction / AlphaBimodalScore bool / f32 / f32 Straight-alpha statistics.

Behind the composites cargo feature (calibration-driven, may drift in 0.1.x):

Feature Type Description
TextLikelihood / ScreenContentLikelihood / NaturalLikelihood f32 Soft [0, 1] content-class scores. Combine raw signals (variance, edge density, distinct-color bins, etc.) via hand-tuned weights.
LineArtScore f32 Soft [0, 1] rendered-line-art score from luma histogram bimodality + entropy.

Behind the experimental cargo feature, organised by what they drive:

Codec-orchestrator gap-fillers

Feature Drives
GrayscaleScore zenjpeg ColorMode::Grayscale, AVIF Yuv400, png/jxl gray paths (~30–40% smaller for B&W)
AqMapMean / AqMapStd zenjpeg hybrid trellis λ, webp segments + sns_strength, avif vaq
NoiseFloorY / NoiseFloorUV zenjpeg pre_blur, jxl noise/denoise, webp sns_strength, zenrav1e film_grain
LineArtScore (behind composites) webp Preset::Drawing, jxl splines / patches, png palette preference
GradientFraction jxl with_force_strategy (DCT16 / DCT32 selection), zenrav1e deblock strength
SkinToneFraction photo-vs-other dispatch (one-direction signal, AUC 0.80) — webp Preset::Photo, jxl perceptual presets, jpeg chroma-aware quant
EdgeSlopeStdev screen-vs-photo dispatch (AUC 0.84, second only to PatchFraction) — webp Preset::Drawing vs Photo, jxl modular vs VarDCT

Source-direct HDR / wide-gamut / bit-depth tier

These read source samples without going through RowConverter, since RowConverter doesn't tonemap — a 4000-nit PQ source and a 100-nit-clipped SDR source would otherwise produce byte-identical RGB8 streams.

Feature Drives
PeakLuminanceNits / P99LuminanceNits AVIF clli, JXL intensity_target, HDR encoder peak
HdrHeadroomStops / HdrPixelFraction HDR vs SDR encode-mode selection
WideGamutPeak / WideGamutFraction "Linear value > 1.0" detection
GamutCoverageSrgb / GamutCoverageP3 Descriptor-gap signal — if a Rec.2020 source's pixels all live in the sRGB sub-gamut, encode at sRGB primaries and save the wide-gamut metadata + encoder modes
EffectiveBitDepth AVIF / JXL bit_depth, png near_lossless_bits (catches u8-promoted u16)
HdrPresent Composite "transfer claims HDR AND pixels are actually bright" — catches stale HDR flags

Research-stage

Colourfulness, LaplacianVariance, VarianceSpread, PaletteDensity, DctCompressibilityY, DctCompressibilityUV, PatchFraction, PaletteLog2Size, PaletteFitsIn256. Numeric scale or definition may change in 0.1.x patches.

Descriptor-gap detection

The analyzer's job is to spot the gap between what the descriptor promises and what the data actually carries, so encoders don't bloat files paying for capacity the source doesn't need.

Gap Signal
RGB declared, content is grayscale GrayscaleScore ≥ 0.99
Wider primaries declared, content fits sRGB GamutCoverageSrgb ≥ 0.99
Rec.2020 declared, content fits Display P3 GamutCoverageP3 ≥ 0.99
HDR transfer declared, content is SDR HdrPresent == false
u16 declared, content is u8-promoted EffectiveBitDepth == 8
RGBA declared, alpha is constant 1.0 AlphaUsedFraction == 0
Standard 8×8 transforms, content is smooth GradientFraction ≥ 0.5 (use larger DCTs)
HDR flag set, peak is dim PeakLuminanceNits < 200 && HdrPresent (mis-tagged HDR)

Each one is a place where a codec orchestrator can downcast metadata + encoder modes before encoding, saving real bytes on real corpora.

Wide gamut, HDR, and bit depth

analyze_features accepts every layout zenpixels-convert::RowConverter can ingest — RGB8 / RGBA8 / BGRA8, RGB16 / RGBA16, RGB-F32 / RGBA-F32 (linear, sRGB, PQ, HLG), grayscale variants, all primaries (sRGB / Display P3 / Rec.2020 / AdobeRGB). One entry, no opt-in step. The principle: per-image codec decisions don't usually break on a few LSBs of luma drift, they break on the analyzer refusing to run.

u8-promotion invariance is locked by tests. An RGB8 image promoted to RGB16 via the standard u8 * 257 doubling, or to RGBF32 via u8 / 255.0, produces bit-identical features to the original RGB8 source. Codecs that upgrade from u8 to wider formats internally don't see different analyzer answers. (Verified by garb's exact-identity narrowing (u16 * 255 + 32768) >> 16 for u16 = u8 * 257.)

Wide gamut adapts the values, not the API. RGB8 with Display P3 / Rec.2020 / AdobeRGB primaries passes through the zero-copy Native row path with its bytes intact. The standard tiers pick the right luma matrix per source primaries: BT.601 weights for sRGB / BT.709 (preserving the trained-threshold baseline — coefficient's existing thresholds were calibrated against this matrix on sRGB content), BT.2020 weights for Rec.2020 sources, the Y row of each primary set's RGB→XYZ matrix for Display P3 / AdobeRGB. Fixed-point integer-luma scales are normalised to the same sum-220 libwebp baseline so a pure-white pixel hits the same histogram bin regardless of source primaries — what differs is the per-channel weight that lands it there. No conversion, no clipping, just the right matrix. See src/luma.rs.

HDR f32 / linear inputs. Standard tiers see what an SDR display would show — RowConverter clips out-of-[0, 1] linear values, applies the sRGB OETF, and narrows to u8. That's the legitimate input for SDR-calibrated thresholds; tonemapping a 4 000-nit highlight into a visible mid-tone before measuring "high-frequency-energy ratio" would just lie about what's there. The above-clip signal lives in tier_depth, which reads the source samples directly via PixelSlice::row (bypassing RowConverter entirely) and decodes through the descriptor's transfer function — sRGB / BT.709 / Gamma 2.2 / Linear / PQ / HLG — to linear nits. Two views of the same source: the SDR-display view for trained thresholds, the source-direct view for HDR / wide-gamut signal.

The tier_depth reference convention is stable across 0.1.x:

Transfer Linear 1.0 maps to Convention
Srgb / Bt709 / Gamma22 / Linear 80 nits sRGB display reference (IEC 61966-2-1)
Pq 10 000 nits SMPTE ST 2084 absolute
Hlg 1 000 nits nominal HLG broadcast

The standard tiers' threshold contract is calibrated on display-space RGB8 bytes; the depth tier surfaces the additional metadata-gap and HDR signals that the RGB8 narrowing destroys.

Errors

#[non_exhaustive]
pub enum AnalyzeError {
    Convert(String),                                  // RowConverter setup failed
    InvalidInput(String),                             // user-supplied bad layout / length
    OutOfMemory { bytes_requested: Option<usize> },   // future fallible-alloc path
    Internal(String),                                 // unexpected
}

Production code handling untrusted images should pattern-match on InvalidInput / OutOfMemory explicitly. Today every internal allocation is infallible (so OutOfMemory is reserved, never returned by current builds); the variant is part of the public surface so a future minor that flips internals to Vec::try_reserve doesn't break anyone's match.

How it's organised

Five passes, each gated by what the requested FeatureSet actually needs:

Pass Iterates over Reads Cost (4 MP) Drives
Tier 1 Stripe-sampled rows RGB8 ~1 ms luma stats, edges, chroma, uniformity, grayscale
Tier 2 3-row sliding window RGB8 ~2 ms per-axis Cb/Cr sharpness
Tier 3 Sampled 8×8 DCT blocks RGB8 ~3 ms DCT energy, entropy, AQ map, noise floor, line-art, gradient, patch fraction
Palette Full image RGB8 ~1 ms distinct color bins
Alpha Stride-sampled rows Source bytes ~0.3 ms alpha presence / used / bimodal
tier_depth (experimental) Stride-sampled rows Source bytes ~0.5 ms HDR, ~0 SDR-fast-path HDR / wide-gamut / bit-depth / gamut-coverage

Tier 1/2/3 + Palette read RGB8 via RowStream, which has three internal paths:

  • Native (zero-copy) — RGB8-byte-layout-compatible inputs. Sub-slice straight from the source.
  • StripAlpha8 (zero RowConverter, scratch-only) — RGBA8 / BGRA8 / Rgbx8 / Bgrx8. Tight strip-and-maybe-swap into the row scratch. Skips the RowConverter alloc + plan + per-row CPU work.
  • Convert — everything else (16-bit, f32, grayscale, CMYK, …) goes through RowConverter row-by-row.

Alpha and tier_depth always read source bytes directly, never through RowStream — a load-bearing detail for HDR (RowConverter doesn't tonemap; its narrowing clips PQ / HLG into sRGB-display).

Performance

Release build, AVX2, no target-cpu=native, full FeatureSet::SUPPORTED:

Input 4 MP RowStream path
RGB8 / Rgbx8 with sRGB / wide-gamut primaries 9.5 ms Native (zero-copy slice subindex)
RGBA8 10.9 ms StripAlpha8 (garb SIMD strip)
BGRA8 12.0 ms StripAlpha8 (garb SIMD strip + swap)
RGB16 24.7 ms Convert (zenpixels-convert RowConverter)
RGBA16 28.6 ms Convert (zenpixels-convert handles strip + narrow)

The RGBA8 strip uses garb::bytes::rgba_to_rgb / bgra_to_rgb (SIMD-dispatched via archmage::incant!) — measured 7× faster than the previous in-tree scalar strip on a 2048-px row, dropping the RGBA8 overhead vs RGB8 baseline from +5.8 ms to +1.4 ms. The 16-bit input paths cost more because RowConverter does transfer-function- aware narrowing — that's the correct tool for genuinely heterogeneous input.

Per-call working-set memory is ~265 KB across ~7 allocations (largest single chunk is the Tier 1 stripe scratch at 9 × width × 3 = 108 KB at 4 K). All allocations are infallible today; the OutOfMemory variant exists so a future minor can flip them without API breakage.

Empirical operating thresholds

Picked on a 219-image labeled corpus from coefficient (benchmarks/classifier-eval/labels.tsv, spanning cid22-train/val, clic2025-1024, gb82, gb82-sc, imageflow, kadid10k, qoi-benchmark). 174 photo, 36 screen, 9 illustration, 44 marked synthetic. F1 / AUC are for binary screen-vs-photo classification.

For codec-orchestrator dispatch ("is this a screen or a photo?"):

signal threshold F1 AUC notes
line_art_score > 0 any nonzero 0.978 0.750 near-deterministic — line art ⇒ screen-like
natural_likelihood >= 0.06 photo detection 0.924 0.814 high precision photo classifier
patch_fraction >= 0.27 screen detection 0.769 0.880 strongest single screen discriminator
edge_slope_stdev >= 35 screen detection 0.844 second-strongest screen discriminator — photos cluster 15–32, screens 32–58
screen_content_likelihood >= 0.60 screen detection 0.750 0.831 derived from flat blocks + palette + chroma
flat_color_block_ratio >= 0.53 screen detection 0.750 0.838 raw — same F1 as the derived _likelihood
skin_tone_fraction >= 0.05 photo detection 0.824 0.799 one-direction (presence ⇒ photo); pigmentation-invariant Chai-Ngan YCbCr
text_likelihood >= 0.30 text detection 0.682 0.774 weaker but real
grayscale_score >= 0.99 grayscale dispatch encoder gap-filler, near-binary on real grayscale

Note: the three *_likelihood features empirically saturate at ~0.70 (not 1.0) on real content, because each is a weighted sum of clamped sub-components that don't simultaneously max on real images. Don't threshold them at >= 0.8 — nothing will fire. Operating points are in the 0.3–0.6 band. The exact corpus maxes are:

  • text_likelihood max 0.71
  • screen_content_likelihood max 0.70
  • natural_likelihood max 0.69

For descriptor-gap detection the thresholds are content-physical (see the "Descriptor-gap detection" table above): GrayscaleScore >= 0.99, GamutCoverageSrgb >= 0.99, etc. Those are spec-driven, not corpus-fit.

The full per-class distributions, ROC-AUC ranking for every feature, Spearman redundancy matrix, and the recalibration findings that were considered and rejected are recorded in docs/calibration-corpus-2026-04-27.md. That file is the pre-0.1.0-ship empirical baseline; subsequent 0.1.x patches that drift numerics should compare against it.

Threshold contract

Numeric thresholds and normalisation scales drift during 0.1.x. Downstream consumers that compile-in fitted models (oracle decision trees, content selectors) must pin to a specific zenanalyze patch version and re-validate when they bump it.

There is no 0.2.x. Every change in 0.1.x is additive — new variants on #[non_exhaustive] enums, new parallel functions, never a signature change to a shipped item. See CLAUDE.md.

Test surface

130+ tests covering math invariants on synthetic inputs (solid colours, horizontal bands, uniform luma distribution, palette-locked images, two-tone line drawings, smooth gradients, pure noise), the full 16-arm dispatch matrix, every supported pixel format (3 channel-types × 6 transfers × 4 primaries × 2 alpha = 144 sanity-matrix combinations), tier sizes from 1×1 to 4096×4096, deterministic-input bit-equality (catches accumulator non-determinism), u8-promotion bit-equality across u16 / f32 sources, HDR-survival (PQ ~1000-nit content preserved end-to-end where standard tiers would have clipped to SDR), gamut-coverage projections (saturated Rec.2020 green correctly fails sRGB coverage), and AnalyzeError Display / source coverage. Math locks use absolute tolerances chosen to clear ULP-level f32 noise from SIMD tree reductions but catch any genuine architecture divergence.

Note on coverage tooling: the SIMD kernels in tier1.rs, palette.rs, tier2_chroma.rs, and tier3.rs use #[magetypes(... v4, v3, neon, wasm128, scalar)] to generate one source-level monomorphisation per architecture tier. At runtime archmage's incant! dispatches to whichever the CPU supports; the other variants stay compiled but unreachable. Line- coverage tools count each variant separately, so the raw percentage on these files looks ≈30 % on x86_64. Real coverage of executable code paths (counted on the dispatched variant only) is ≥95 % across every module.

Companion crates in this repo

The repository hosts three sibling pieces that the codecs in imazen/zenjpeg, imazen/zenwebp, etc. compose with zenanalyze:

Path Identity Status
zenpredict/ Rust runtime — zero-copy MLP loader (ZNPR v2 binary format), forward pass, masked argmin, typed metadata, OOD bounds, two-shot rescue policy. Used by codec pickers (zenjpeg/zenwebp/zenavif/zenjxl) and by zensim V0.4 perceptual scoring Crate, 0.1.x
zenpicker/ Codec-family meta-picker — given features + target quality + an allowed-family mask, picks {jpeg, webp, jxl, avif, png, gif}; per-codec pickers then resolve the family into a concrete encoder config. Wraps zenpredict::Predictor Crate, 0.1.x
zentrain/ Python training pipeline — pareto sweep harness, teacher fit, distill, ablation, holdout probes, safety reports, bake to ZNPR v2 (via tools/bake_picker.pyzenpredict-bake). Produces both meta-picker and per-codec bakes Tooling, in-repo

The runtime crates (zenpredict, zenpicker) and the trainer (zentrain) version independently; the binary format (ZNPR v2) is the contract between them. See MIGRATION.md for the path from the previous (unpublished) zenpicker Rust shell to the current layout.

Cross-codec defaults + data discipline: read zentrain/PRINCIPLES.md before adopting or re-baking a picker for any codec (zenjpeg / zenwebp / zenavif / zenjxl / zenpng / zengif / zenpicker / zensim). It's the source of truth for what's invariant — corpus shape, argmin objectives, time- budget patterns, multi-metric bakes, OOD / reach gates, validation gates that block release.

License

AGPL-3.0-only OR LicenseRef-Imazen-Commercial. Commercial licensing available from imazen — contact lilith@imazen.io.

About

Image content analyzers for adaptive codec decisions: variance, edges, chroma sharpness, DCT energy, derived likelihoods. Published as zenanalyze on crates.io.

Resources

License

AGPL-3.0, Unknown licenses found

Licenses found

AGPL-3.0
LICENSE-AGPL3
Unknown
LICENSE-COMMERCIAL

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors