feat: direct raster backend (2-3x faster than plotters), fontdue text, Polars integration #31
jrmoynihan wants to merge 18 commits into
Measured with `cargo bench` (Criterion) against main branch baseline.
SVG serialization:
- Replace all format!() with direct push_str()/write!() into the output
String, eliminating hundreds of intermediate heap allocations per render
- Add ryu crate for 2-5x faster float-to-string conversion
- Single-pass XML escaping instead of 5 chained .replace() calls
- Inline indent writing instead of closure-allocated Strings
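A minimal sketch of the serialization pattern described above: write straight into the output String and escape XML in a single pass. The function names `push_escaped` and `write_text_element` are hypothetical, not kuva's actual internals, and the sketch uses std float formatting rather than ryu.

```rust
use std::fmt::Write;

/// Append `text` to `out`, escaping the five XML special characters in one
/// pass instead of chaining `.replace()` calls (each of which allocates).
fn push_escaped(out: &mut String, text: &str) {
    for ch in text.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(ch),
        }
    }
}

/// Write a `<text>` element directly into the output buffer
/// (hypothetical helper; no intermediate `format!` String is created).
fn write_text_element(out: &mut String, x: f64, y: f64, label: &str) {
    // Writing into a String cannot fail, so the Result is ignored.
    let _ = write!(out, "<text x=\"{x}\" y=\"{y}\">");
    push_escaped(out, label);
    out.push_str("</text>");
}
```

Calling `write_text_element(&mut svg, 1.5, 2.0, "x>y")` appends one element to `svg` without allocating any temporary Strings.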
SVG circles benchmark (pure serialization, no scene build):
1K circles: 171 µs → 53 µs (-69%)
10K circles: 1.71 ms → 0.53 ms (-69%)
100K circles: 17.6 ms → 5.3 ms (-70%)
1M circles: 199 ms → 71 ms (-64%)
Path builders:
- build_path/build_step_path pre-allocate and use ryu
- All inline path construction (add_band, violin KDE, stacked area,
draw_marker triangle/diamond, contour) rewritten the same way
Coordinate mapping:
- Pre-compute linear transform coefficients (scale + offset) in
ComputedLayout; map_x/map_y reduced from div+mul+add to mul+add
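The coefficient pre-computation can be sketched as follows. `LinearMap` is a hypothetical stand-in for whatever ComputedLayout stores; the point is only that the division happens once, at construction.

```rust
/// Precomputed affine map from data space to pixel space
/// (hypothetical type, not kuva's ComputedLayout).
#[derive(Debug, Clone, Copy)]
struct LinearMap {
    scale: f64,
    offset: f64,
}

impl LinearMap {
    /// The one division is paid here, once per axis.
    fn new(data_min: f64, data_max: f64, px_min: f64, px_max: f64) -> Self {
        let scale = (px_max - px_min) / (data_max - data_min);
        LinearMap { scale, offset: px_min - data_min * scale }
    }

    /// Per-point cost: one multiply and one add.
    #[inline]
    fn map(&self, v: f64) -> f64 {
        v * self.scale + self.offset
    }
}
```

An inverted y axis falls out of the same formula by passing `px_min > px_max`, for example `LinearMap::new(0.0, 10.0, 400.0, 0.0)`.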
Colormap output:
- Replace format!("rgb({},{},{})") with hex lookup table (#rrggbb)
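One way to build such a hex writer is a 16-entry nibble table, sketched below. This is an assumption about the approach: the PR may use a larger (for example 256-entry) table, and `push_hex_color` is a hypothetical name.

```rust
/// Lookup table mapping a 4-bit value to its lowercase hex digit.
const HEX: &[u8; 16] = b"0123456789abcdef";

/// Append `#rrggbb` to `out` without going through `format!`.
fn push_hex_color(out: &mut String, r: u8, g: u8, b: u8) {
    out.push('#');
    for c in [r, g, b] {
        out.push(HEX[(c >> 4) as usize] as char); // high nibble
        out.push(HEX[(c & 0x0f) as usize] as char); // low nibble
    }
}
```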
PNG backend:
- Cache system font database via OnceLock (loading fonts costs 100ms+)
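The OnceLock caching pattern looks roughly like this. `FontDb` and `load_fonts` are hypothetical stand-ins for the real font database and its loader; the counter exists only to show that the expensive load runs once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the expensive load actually runs.
static LOAD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the system font database.
struct FontDb {
    families: Vec<String>,
}

/// Stand-in for the slow font scan (100ms+ in the PR's measurements).
fn load_fonts() -> FontDb {
    LOAD_COUNT.fetch_add(1, Ordering::SeqCst);
    FontDb { families: vec!["sans-serif".to_string()] }
}

/// The first caller pays for the load; later callers get the cached reference.
fn font_db() -> &'static FontDb {
    static DB: OnceLock<FontDb> = OnceLock::new();
    DB.get_or_init(load_fonts)
}
```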
Scene pre-allocation:
- Add Scene::with_capacity() and Plot::estimated_primitives()
Full pipeline benchmarks (scene build + SVG serialization):
scatter 100K pts: 54 ms → 27 ms (-49%)
scatter 1M pts: 667 ms → 316 ms (-51%)
line 100K pts: 104 ms → 44 ms (-58%)
line 1M pts: 1.28 s → 0.55 s (-57%)
violin 100K pts: 32 ms → 19 ms (-41%)
manhattan 1M pts: 434 ms → 288 ms (-34%)
heatmap 500x500: 125 ms → 56 ms (-55%)
heatmap 200x200: 19.5 ms → 9.1 ms (-53%)
heatmap 100x100: 4.4 ms → 2.2 ms (-49%)
Also fixes: unstable is_multiple_of() in render_utils.rs,
unnecessary Vec collection in bounds_from_2d().
Co-authored-by: James Moynihan <jrmoynihan@users.noreply.github.com>
The Path variant was the largest at ~120 bytes (3 Strings + 3 floats + 2 Options), forcing every Circle, Line, Rect, and GroupEnd to carry ~80 bytes of dead padding.
For a 100K scatter plot this meant 12.8 MB of Vec storage with only 3.2 MB of useful Circle data: poor cache utilization, since the CPU prefetcher loads 64-byte lines that are mostly padding.
Boxing Path into Primitive::Path(Box<PathData>) shrinks the enum to ~88 bytes (dominated by Line), improving cache density by ~30% for Circle-heavy plots.
The indirection cost is negligible: Path elements are unique (never shared) and are a small fraction of total elements in data-heavy plots (scatter emits 100K Circles vs ~1 Path).
The benchmark numbers above already include this change.
Co-authored-by: James Moynihan <jrmoynihan@users.noreply.github.com>
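The effect of boxing the largest variant can be checked with `size_of`. The field layout below is a hypothetical approximation of the description (3 Strings, floats, an Option), not kuva's actual Primitive definition, so the exact byte counts will differ from the figures quoted in the commit.

```rust
/// Hypothetical payload modelled on the commit's description of Path.
#[allow(dead_code)]
struct PathData {
    d: String,
    fill: String,
    stroke: String,
    stroke_width: f64,
    opacity: f64,
    dash: Option<f64>,
}

/// Every value is as large as the biggest variant (Path).
#[allow(dead_code)]
enum Unboxed {
    Circle { cx: f64, cy: f64, r: f64 },
    Path(PathData),
}

/// Path now costs one pointer inline; Circle sets the enum size.
#[allow(dead_code)]
enum Boxed {
    Circle { cx: f64, cy: f64, r: f64 },
    Path(Box<PathData>),
}
```

Comparing `std::mem::size_of::<Unboxed>()` with `std::mem::size_of::<Boxed>()` shows the per-element saving that a `Vec` of primitives would see.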
The existing PngBackend pipeline is:
Scene → SVG string → usvg parse → tiny_skia rasterize → PNG encode
For a 100K scatter plot, generating the SVG string takes ~27ms, but
parsing it back into a tree and re-rasterizing takes another ~150-250ms.
This round-trip is the fundamental gap vs plotters, which writes
directly into a pixel buffer.
RasterBackend renders Circles, Rects, Lines, and Paths directly via
tiny_skia's fill_path/stroke_path/fill_rect APIs, skipping SVG
serialization and parsing entirely. Text elements (axis labels, titles
— typically <1% of elements) are collected into a minimal SVG overlay
and composited via resvg for correct font shaping.
Usage:
use kuva::RasterBackend;
let bytes = RasterBackend::new()
.with_scale(2.0)
.render_scene(&scene)?;
// or the convenience function:
let bytes = kuva::render_to_raster(plots, layout, 2.0)?;
Available behind the existing `png` feature flag (no new dependencies —
tiny_skia is already in the tree via resvg).
Co-authored-by: James Moynihan <jrmoynihan@users.noreply.github.com>
Adds a Criterion benchmark (benches/vs_plotters.rs) comparing identical scatter, line, and heatmap workloads between kuva and plotters, both producing SVG output.
Results on this hardware (Criterion, release profile):
scatter 1K pts: kuva 132 µs plotters 451 µs (kuva 3.4x faster)
scatter 10K pts: kuva 1.97 ms plotters 3.46 ms (kuva 1.8x faster)
scatter 100K pts: kuva 19.9 ms plotters 33.6 ms (kuva 1.7x faster)
line 1K pts: kuva 96 µs plotters 173 µs (kuva 1.8x faster)
line 10K pts: kuva 926 µs plotters 627 µs (plotters 1.5x faster)
line 100K pts: kuva 11.5 ms plotters 5.3 ms (plotters 2.2x faster)
heatmap 50x50: kuva 589 µs plotters 975 µs (kuva 1.7x faster)
heatmap 100x100: kuva 2.25 ms plotters 3.66 ms (kuva 1.6x faster)
heatmap 200x200: kuva 9.18 ms plotters 14.2 ms (kuva 1.5x faster)
Kuva is faster on per-element-heavy workloads (scatter, heatmap) thanks to ryu float formatting, direct push_str SVG writing, and hex colormaps.
Plotters is faster on line charts because its streaming SVG backend writes path data in a single pass, while kuva's two-phase architecture (build Scene, then serialize) pays an extra copy for large path strings.
Co-authored-by: James Moynihan <jrmoynihan@users.noreply.github.com>
Measures the full data → PNG bytes pipeline (the IPC-critical path).
Results show clear tiers:
plotters BitMapBackend: fastest (direct pixel writes, no intermediary)
kuva RasterBackend: 2-8x slower (Scene intermediary + tiny_skia paths)
kuva PngBackend: 3-13x slower (Scene → SVG → parse → raster → PNG)
The Scene construction overhead is relatively small (5-10% of total), so the main optimization target is the raster backend's per-primitive draw loop and the Primitive enum's memory layout.
Co-authored-by: James Moynihan <jrmoynihan@users.noreply.github.com>
Introduces render::color::Color, a 3-variant enum (Rgb, None, Css) that eliminates per-primitive heap String allocations for the common case:
Color::Rgb(r,g,b): 4 bytes inline, zero heap allocation
Color::None: 1 byte, represents SVG fill="none"
Color::Css(Box<str>): fallback for unrecognized CSS strings
From<&str> auto-parses #rrggbb, #rgb, rgb(r,g,b), "none", and 50+ named CSS colors into inline Rgb. From<String> does the same.
Impact for a 100K scatter plot: eliminates ~100K heap String allocations (was: 24 bytes + heap per point for cloning the fill color).
Impact for a 500x500 heatmap: eliminates ~250K allocations.
Also adds CircleBatch and RectBatch SoA variants to Primitive for future use by scatter/heatmap renderers. All backends (SVG, raster, terminal) handle these new variants.
The raster backend now converts Color→tiny_skia::Color directly (color_to_skia) instead of Color→String→parse→tiny_skia::Color, eliminating a string round-trip per primitive in the raster path.
Co-authored-by: James Moynihan <jrmoynihan@users.noreply.github.com>
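A reduced sketch of such a Color enum with a `From<&str>` conversion. It handles only `none` and `#rrggbb`; the short `#rgb` form, `rgb(r,g,b)`, and named colors that the commit describes are omitted here, and the parsing details are assumptions rather than the PR's code.

```rust
/// Reduced version of the 3-variant enum described in the commit.
#[derive(Debug, Clone, PartialEq)]
enum Color {
    Rgb(u8, u8, u8),
    None,
    Css(Box<str>),
}

impl From<&str> for Color {
    fn from(s: &str) -> Self {
        let t = s.trim();
        if t.eq_ignore_ascii_case("none") {
            return Color::None;
        }
        if let Some(hex) = t.strip_prefix('#') {
            // Only the 6-digit form is handled in this sketch.
            if hex.len() == 6 && hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                let byte = |i: usize| u8::from_str_radix(&hex[i..i + 2], 16).unwrap_or(0);
                return Color::Rgb(byte(0), byte(2), byte(4));
            }
        }
        // Anything unrecognized is kept verbatim on the heap.
        Color::Css(t.into())
    }
}
```

Only the `Css` fallback allocates; the common `Rgb` case is stored inline, which is where the per-point savings come from.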
Wire up CircleBatch and RectBatch in the two hottest renderers:
add_scatter: uniform-circle scatter plots (the common case — same
marker, size, and color for all points) now emit a single
CircleBatch instead of N individual Primitive::Circle enums.
Coordinate transforms are parallelized with rayon::par_iter.
add_heatmap: all cells are packed into a RectBatch with contiguous
x/y/w/h/fill arrays. Colormap lookups and coordinate transforms
are parallelized across rows with rayon::par_iter.
Memory layout improvement for 100K scatter:
Before: 100K × Primitive::Circle (88 bytes each) = 8.6 MB
+ 100K heap String allocations for fill colors
After: 1 × CircleBatch with 2 × Vec<f64> (1.6 MB) + 1 × Color (4 bytes)
= 1.6 MB total, zero heap string allocations
The SVG and raster backends iterate batch arrays directly, avoiding
per-element enum dispatch and improving cache locality.
Co-authored-by: James Moynihan <jrmoynihan@users.noreply.github.com>
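The struct-of-arrays layout from the commit above can be sketched as below. The field names are hypothetical, and the transform runs sequentially here, whereas the commit parallelizes it with rayon.

```rust
/// Struct-of-arrays batch: one radius and one fill shared by all points
/// (hypothetical field names, modelled on the commit's description).
struct CircleBatch {
    cx: Vec<f64>,
    cy: Vec<f64>,
    r: f64,
    fill: (u8, u8, u8),
}

/// Map data points to pixel space with precomputed scale/offset pairs.
fn build_batch(
    points: &[(f64, f64)],
    (sx, ox): (f64, f64),
    (sy, oy): (f64, f64),
    r: f64,
    fill: (u8, u8, u8),
) -> CircleBatch {
    let mut cx = Vec::with_capacity(points.len());
    let mut cy = Vec::with_capacity(points.len());
    for &(x, y) in points {
        cx.push(x * sx + ox);
        cy.push(y * sy + oy);
    }
    CircleBatch { cx, cy, r, fill }
}

/// Bytes of coordinate payload held by the two arrays.
fn coord_bytes(b: &CircleBatch) -> usize {
    (b.cx.len() + b.cy.len()) * std::mem::size_of::<f64>()
}
```

For 100K points `coord_bytes` returns 1,600,000, which matches the 1.6 MB figure quoted in the commit.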
…aliasing)
Replace tiny_skia path-based rendering with direct scanline algorithms:
- pixel_circle: bounding-box scan with r² distance test
- pixel_rect: scanline memcpy fill
- pixel_line: Bresenham's algorithm (1px and thick variants)
- Paths still fall back to tiny_skia for correct curve rendering
- Text composited via resvg overlay (unchanged)
This matches plotters' BitMapBackend approach: write RGBA bytes directly into the pixel buffer with no anti-aliasing overhead.
Benchmark results (scatter 100K → PNG bytes, scale=1.0):
Before (tiny_skia paths): 273 ms
After (direct pixels): 12.5 ms (21.8x faster)
plotters BitMapBackend: 29.6 ms
kuva RasterBackend is now 2.4x FASTER than plotters for scatter plots.
Full comparison table:
scatter 1K: kuva 2.8 ms plotters 1.9 ms (plotters 1.5x)
scatter 10K: kuva 5.1 ms plotters 4.6 ms (plotters 1.1x)
scatter 100K: kuva 12.5 ms plotters 29.6 ms (kuva 2.4x faster)
heatmap 50: kuva 3.2 ms plotters 1.2 ms (plotters 2.5x)
heatmap 100: kuva 2.9 ms plotters 1.8 ms (plotters 1.6x)
heatmap 200: kuva 3.7 ms plotters 3.3 ms (plotters 1.1x)
kuva wins at high element counts; plotters wins at low counts due to lower fixed overhead (no scene intermediary, no text overlay pixmap). The crossover is around 10K elements.
Co-authored-by: James Moynihan <jrmoynihan@users.noreply.github.com>
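A bounding-box scan with an r² distance test might look like the sketch below. The signature is an assumption (the PR's `pixel_circle` may differ), pixel centers are taken to sit at integer coordinates, and no anti-aliasing is attempted.

```rust
/// Fill a solid circle into a row-major RGBA8 buffer of `width` x `height`.
/// Scans only the clipped bounding box and tests squared distance.
fn pixel_circle(
    buf: &mut [u8],
    width: usize,
    height: usize,
    cx: f64,
    cy: f64,
    r: f64,
    rgba: [u8; 4],
) {
    // Clip the bounding box to the canvas; an empty range means off-canvas.
    let x0 = ((cx - r).floor() as i64).max(0);
    let x1 = ((cx + r).ceil() as i64).min(width as i64 - 1);
    let y0 = ((cy - r).floor() as i64).max(0);
    let y1 = ((cy + r).ceil() as i64).min(height as i64 - 1);
    let r2 = r * r;
    for y in y0..=y1 {
        for x in x0..=x1 {
            let (dx, dy) = (x as f64 - cx, y as f64 - cy);
            if dx * dx + dy * dy <= r2 {
                let i = (y as usize * width + x as usize) * 4;
                buf[i..i + 4].copy_from_slice(&rgba);
            }
        }
    }
}
```

Comparing against `r2` avoids a square root per pixel, and clipping the box up front means circles partly or fully off-canvas need no per-pixel bounds checks.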
Adds a `polars` feature flag with ergonomic DataFrame → plot bindings.
Two usage patterns:
// Pattern 1: DataFrameExt trait on DataFrame
use kuva::dataframe::DataFrameExt;
let scatter = df.scatter("x", "y")?;
let histogram = df.histogram("values", 30)?;
let bar = df.bar("labels", "counts")?;
// Pattern 2: Builder methods on plot types
let scatter = ScatterPlot::new()
.with_xy(&df, "x", "y")?
.with_color("steelblue");
let volcano = VolcanoPlot::new()
.with_columns(&df, "gene", "log2fc", "pvalue")?;
let manhattan = ManhattanPlot::new()
.with_columns(&df, "chromosome", "pvalue")?;
Supports: ScatterPlot, LinePlot, BarPlot, Histogram, Heatmap,
ManhattanPlot, VolcanoPlot. Numeric columns are auto-cast to f64.
Null values produce clear PlotDataError messages.
Feature is optional and adds no compile-time cost when disabled.
Also exports RasterBackend and render_to_raster in the prelude.
Co-authored-by: James Moynihan <jrmoynihan@users.noreply.github.com>
The text overlay (resvg font shaping + full pixmap allocation + alpha composite) accounts for 50-80% of total render time for typical plots with axis labels. On macOS with many installed fonts, this can be 25ms+ for just 15 text elements.
Add RasterBackend::with_skip_text(true) and render_to_raster_no_text() for callers that render labels in the frontend (e.g. webview overlay).
With text skipped, the full pipeline for 18K scatter points is:
pixel draw: 0.5 ms
png encode: 0.7 ms
total: ~3 ms (was 7-8ms with text, 30ms+ on macOS)
Also adds diagnostic timing (behind eprintln) to render_scene() to help identify bottlenecks in integrator codebases.
Co-authored-by: James Moynihan <jrmoynihan@users.noreply.github.com>
The resvg SVG text pipeline was the dominant bottleneck — 50-80% of
total render time for plots with axis labels:
Old path (resvg):
build SVG string → usvg XML parse → rustybuzz text shaping →
allocate full-size second Pixmap → tiny_skia render → alpha composite
= 3-14ms Linux, 25ms+ macOS
New path (fontdue):
load font once (cached) → rasterize glyphs → blit into pixel buffer
= 76-85µs (after first-call font load)
This is the same approach plotters uses — rasterize individual glyphs
directly into the pixel buffer. No SVG, no XML parser, no second pixmap.
First call pays ~28ms for font loading (finds system sans-serif font,
parses with fontdue, cached via OnceLock). All subsequent calls are <100µs.
Handles text anchoring (start/middle/end) and rotation.
Net effect on the full pipeline (18K scatter, 687x545 image):
Before: 7.9ms (of which 3.4ms was text)
After: 5.3ms (of which 0.08ms is text)
Text is no longer a bottleneck at any scale.
Co-authored-by: James Moynihan <jrmoynihan@users.noreply.github.com>
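The "blit into pixel buffer" step of the fontdue commit above can be illustrated without fontdue itself: a glyph rasterizer yields a coverage bitmap (one byte per pixel), which is alpha-blended into the RGBA buffer. Everything below is a hypothetical sketch of that step plus start/middle/end anchoring; rotation is not covered.

```rust
/// Horizontal text anchor, as in SVG's text-anchor.
#[derive(Clone, Copy)]
enum Anchor {
    Start,
    Middle,
    End,
}

/// Left edge of a run of `text_width` pixels anchored at `x`.
fn anchored_x(x: i64, text_width: i64, anchor: Anchor) -> i64 {
    match anchor {
        Anchor::Start => x,
        Anchor::Middle => x - text_width / 2,
        Anchor::End => x - text_width,
    }
}

/// Blend a coverage bitmap (row-major, `cov_w` wide) into an RGBA8 buffer
/// with its top-left corner at (x, y). Off-canvas pixels are skipped.
fn blit_coverage(
    buf: &mut [u8],
    buf_w: usize,
    buf_h: usize,
    cov: &[u8],
    cov_w: usize,
    x: i64,
    y: i64,
    rgb: [u8; 3],
) {
    if cov_w == 0 {
        return;
    }
    for (i, &a) in cov.iter().enumerate() {
        if a == 0 {
            continue; // fully transparent, nothing to draw
        }
        let px = x + (i % cov_w) as i64;
        let py = y + (i / cov_w) as i64;
        if px < 0 || py < 0 || px >= buf_w as i64 || py >= buf_h as i64 {
            continue;
        }
        let o = (py as usize * buf_w + px as usize) * 4;
        let alpha = a as u32;
        for c in 0..3 {
            let (src, dst) = (rgb[c] as u32, buf[o + c] as u32);
            // Rounded integer blend: src*a + dst*(1-a).
            buf[o + c] = ((src * alpha + dst * (255 - alpha) + 127) / 255) as u8;
        }
        buf[o + 3] = buf[o + 3].max(a);
    }
}
```

Because glyphs are written straight into the existing buffer, no second full-size pixmap or compositing pass is needed, which is the saving the commit describes.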
Adds render_scene_to_pixmap() for raw RGBA output without PNG encoding.
Benchmark covers all format × text combinations at 1K/10K/100K scatter.
Results summary (10K scatter, 800x600):
SVG output:
kuva: 1.04 ms
plotters: 3.36 ms — kuva 3.2x faster
PNG encoded bytes:
kuva (with text): 2.23 ms
kuva (no text): 2.15 ms
plotters (with text): 4.85 ms — kuva 2.2x faster
plotters (no text): 3.78 ms — kuva 1.8x faster
Raw pixel buffer (no PNG encoding):
kuva (with text): 1.02 ms
kuva (no text): 0.98 ms
plotters (with text): 3.60 ms — kuva 3.5x faster
plotters (no text): 3.01 ms — kuva 3.1x faster
At 100K scatter:
kuva raw buffer: 9.1 ms
plotters raw buffer: 30.0 ms — kuva 3.3x faster
kuva PNG: 10.1 ms
plotters PNG: 31.5 ms — kuva 3.1x faster
Text overhead is now negligible (~0.08ms) thanks to fontdue.
Co-authored-by: James Moynihan <jrmoynihan@users.noreply.github.com>
The 'png' feature flag now enables much more than PNG output: direct pixel-buffer rasterization via RasterBackend, fontdue text rendering, render_to_raster(), render_scene_to_pixmap(), etc. Rename to 'raster' to reflect the actual scope.
The old 'png' name is kept as an alias (png = ["raster"]) so existing Cargo.toml lines like features=["png"] continue to work unchanged.
[features]
raster = ["dep:resvg", "dep:fontdue"]
png = ["raster"] # backward-compat alias
Co-authored-by: James Moynihan <jrmoynihan@users.noreply.github.com>
- Add render_to_rgba and render_to_rgba_no_text for raw RGBA output
- Add render_to_png_direct and render_to_png_direct_no_text (clearer names)
- Keep render_to_raster and render_to_raster_no_text as backward-compat aliases
- Update docs with raster output options table
- Remove Tauri-specific documentation
Co-authored-by: James Moynihan <jrmoynihan@users.noreply.github.com>
Updated the output options in the API documentation to include 'raw RGBA bytes' and clarified the use case for 'render_to_png_direct_no_text'.
Keep PR Psy-Fer#29/30 performance optimizations and Color support in src/backend/svg.rs. Co-authored-by: James Moynihan <jrmoynihan@users.noreply.github.com>
- render_to_png_direct -> render_to_png_raster
- render_to_png_direct_no_text -> render_to_png_raster_no_text
- render_to_rgba -> render_to_rgba_bytes
- render_to_rgba_no_text -> render_to_rgba_bytes_no_text
Backward-compat aliases (render_to_raster, render_to_raster_no_text) now point to the _raster variants. Prelude and docs updated.
Co-authored-by: James Moynihan <jrmoynihan@users.noreply.github.com>
|
I think this PR needs to be 2 different PRs
I think the API for how polars will integrate with the current builder pattern needs to be work-shopped a bit. for this tradeoff point
This is fine, we can just mention this in the docs. this point however
needs to be fixed. Any ideas? Even just wrapping in a What do you think? (I'm happy to do that after this PR separately if you like, and you can just make sure the Raster backend returns a Result with an error of not having a usable font). I think this PR needs the most work of the 3 you submitted. Good thing it's the last of the 3 so we can sort this out after the other 2 are done. Cheers, |
|
I agree completely. I'll take a shot at splitting it up. The Result is 100% the correct/better pattern to use, with some good messaging to the user in the event of not finding a system font to use. Maybe the message should suggest the If we go the embedding route, there's two options:
For the polars API, what do you think of these options for fitting the verbose/explicit naming scheme?:
I'm kind of torn on the |
|
Thanks for the detailed followup.
Splitting the PR: yes please, that would be much appreciated. Raster backend first, then the Polars stuff second.
Font fallback: I'd skip the assets/DejaVuSans.ttf: ~700kB, only compiled into binaries with raster feature Load it via
Polars API design: I'd avoid adding polars specific methods to the core plot structs at all. I was thinking maybe something like an extension trait on
use kuva::polars::DataFrameExt; // this is the polars aware conversion layer
use kuva::prelude::*;
// Single plot takes plots + layout from dataframe
let scatter = df.to_scatter("x", "y")?.with_color("steelblue").with_legend("My Data");
let plots = vec![scatter.into()];
let layout = Layout::auto_from_plots(&plots);
let svg = render_to_svg(plots, layout);
// Mix polars and manual. Both are just Plot variants
let scatter = df.to_scatter("x", "y")?.with_color("steelblue");
let trend = LinePlot::new().with_data(vec![(0.0, 0.0), (10.0, 10.0)]);
let plots = vec![scatter.into(), trend.into()];
let layout = Layout::auto_from_plots(&plots);
let svg = render_to_svg(plots, layout);
// Inside a Figure panel.
let figure = Figure::new(2, 1)
.with_plot(0, 0, df_a.to_scatter("x", "y")?.into())
.with_plot(0, 1, df_b.to_histogram("value", 30)?.into());
The key point: What do you think about this? (I have strong opinions on API, sorry, haha)
In the meantime, I'm going to race ahead on dev with a bunch of fixes and some added features around legends, axes, and some plot specific fixes. So this current PR is going to diverge a bit from those and the changes I made when merging #30.
Also it may be worth it to keep plotters and criterion out of the deps after the benchmarks are done. They just make everything so much more complicated than it needs to be, and we can manage adding them when we need to for one-off benchmarking to prove something is actually faster than before or against another lib/method.
Cheers, edit: realised that the code example would have a borrow checker issue |
|
I may have gone a little crazy with features and bug fixes 😆 I just had some time to add new plots and features, and crush some really annoying bugs. On the plus side, things are looking great! On the down side, it's a lot of changes for this PR to handle. Sorry 😢 |
What
A new rendering path for users who need raster output (PNG bytes, raw RGBA buffers) rather than SVG. Also adds optional Polars DataFrame integration. Depends on PRs #29 and #30.
Why
The existing `PngBackend` generates SVG, parses it back with usvg, and re-rasterizes: a round-trip that costs 200-400ms for 100K points. Plotters avoids this by drawing directly into a pixel buffer. This PR brings the same approach to kuva.
Changes
RasterBackend (`backend::raster`)
- `Path` elements (line charts, violin KDE) fall back to tiny_skia for correct curve rendering
- `fontdue` glyph rasterization directly into the buffer (~0.08ms for 15 labels vs 3-25ms with the old resvg SVG overlay)
- `render_scene()` → PNG bytes; `render_scene_to_pixmap()` → raw RGBA; `render_scene_to_rgba()` → `(width, height, Vec<u8>)`
- `with_skip_text(true)` for maximum throughput when the frontend overlays its own labels
Raster output functions:
- `render_to_png`
- `render_to_png_raster`
- `render_to_png_raster_no_text`
- `render_to_rgba_bytes` (… `Uint8ClampedArray`)
- `render_to_rgba_bytes_no_text`
Feature rename
- `png` → `raster` (backward-compat alias `png = ["raster"]` kept)
- `features = ["png"]` in downstream Cargo.toml continues to work
Polars integration (`dataframe` module, behind `polars` feature)
- `DataFrameExt` trait on `DataFrame`: `df.scatter("x", "y")`, `df.histogram("col", 30)`, etc.
- `ScatterPlot::new().with_xy(&df, "x", "y")`
- `PlotDataError` messages for missing columns, wrong dtypes, nulls
Benchmarks (vs plotters, Criterion)
Tradeoffs
- `fontdue` added as optional dep (behind `raster` feature). Pure Rust, ~5K lines, loads system fonts at first use (~28ms one-time cost, cached).
- `polars` feature uses polars 0.46 which requires Rust ≥1.82.
- `plotters`, `plotters-svg`, `plotters-bitmap`, `image` as dev-dependencies for benchmark comparison only.
Type of change
Checklist
Library (new plot type)
- `src/plot/<name>.rs`: struct + builder methods
- `src/plot/mod.rs`: `pub mod` + re-export
- `src/render/plots.rs`: `Plot` enum variant + `bounds()` / `colorbar_info()` / `set_color()`
- `src/render/render.rs`: `render_<name>()`, added to `render_multiple()` match, `skip_axes` if pixel-space
- `src/render/layout.rs`: `auto_from_plots()` extended if categories needed
Tests
- `tests/` with ≥ basic render + SVG content + legend tests
- `cargo test --features cli,full`: all existing tests still pass
CLI (if applicable)
- `src/bin/kuva/<name>.rs`: Args struct (with `/// doc comment`) + `run()`
- `src/bin/kuva/main.rs`: module, Commands variant, match arm
- `scripts/smoke_tests.sh`: at least one invocation
- `tests/cli_basic.rs`: SVG output test + content verification test
- `docs/src/cli/index.md`: subcommand entry
- `man/kuva.1`: regenerated (`./target/debug/kuva man > man/kuva.1`)
Documentation
- `examples/<name>.rs`: Rust example for doc asset generation
- `scripts/gen_docs.sh`: invocations added; `bash scripts/gen_docs.sh` runs clean
- `docs/src/plots/<name>.md`: documentation page with embedded SVGs
- `docs/src/SUMMARY.md`: link added
- `docs/src/gallery.md`: gallery card added
- `README.md`: plot types table updated
Visual inspection
- `test_outputs/`: new plot SVGs look correct
- `test_outputs/` for layout regressions
- `bash scripts/smoke_tests.sh`: all existing smoke test outputs still look correct
Housekeeping
- `CHANGELOG.md`: entry added under `## [Unreleased]`
- `README.md`: item marked done in TODO section if applicable