
Port to Slug algorithm#62

Merged
behdad merged 69 commits into master from slug
Mar 21, 2026

@behdad
Owner

@behdad behdad commented Mar 21, 2026

See: https://terathon.com/blog/decade-slug.html

On my setup, this runs >2.5x faster than the GLyphy algorithm and seems to consume slightly more shader memory. All rendering artifacts are gone!

This was done in two hours with Claude, from SLUG PDFs & shader code, to working demo. Insane.

@behdad behdad force-pushed the slug branch 2 times, most recently from 23b6098 to 69250ee, March 21, 2026 05:04
behdad and others added 13 commits March 20, 2026 23:22
Strip all arc, SDF, blob encoding, and shader API from glyphy.h.
Add glyphy_curve_t (quadratic Bezier with p1/p2/p3) and
glyphy_curve_accumulator_t that directly stores quadratic curves
from glyph outlines without arc approximation.

This is the first step toward Slug-style winding number rendering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Encodes quadratic Bezier curves into a RGBA16UI blob with:
- Horizontal and vertical bands for spatial indexing
- Curves sorted by descending max coordinate for early exit
- Em-space quantized int16 coordinates (configurable scale)
- Horizontal/vertical line filtering (can't intersect parallel rays)

Blob layout: [band headers] [curve index lists] [curve data]
All offsets are 1D from blob start; shader converts to 2D via atlas width.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
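
As a rough illustration of the "Em-space quantized int16 coordinates (configurable scale)" item above, here is a minimal sketch of quantizing one coordinate to a signed 16-bit texel component and decoding it again. The names `quantize_coord`, `dequantize_coord` and `units_per_step`, as well as the round-to-nearest and saturating behaviour, are assumptions made for the example, not the encoder's actual API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper: map an em-space coordinate to one int16 texel
// component. `units_per_step` stands in for the "configurable scale".
inline int16_t quantize_coord(double value, double units_per_step)
{
  double q = std::round(value / units_per_step);
  // Saturate so that out-of-range coordinates clamp instead of wrapping.
  q = std::max(-32768.0, std::min(32767.0, q));
  return static_cast<int16_t>(q);
}

// Hypothetical inverse, as a shader might apply after fetching the texel.
inline double dequantize_coord(int16_t q, double units_per_step)
{
  return static_cast<double>(q) * units_per_step;
}
```

With a step of 0.25 units, a coordinate of 100.0 is stored as 400 and decodes back to 100.0; anything beyond the int16 range saturates.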
Port SlugPixelShader.hlsl to GLSL 1.30 for our single isampler2D atlas:
- CalcRootCode: equivalence class via sign-bit extraction (floatBitsToUint)
- SolveHorizPoly/SolveVertPoly: quadratic root finding with discriminant clamping
- CalcCoverage: weighted combination of horizontal and vertical ray coverage
- glyphy_slug_render: main entry point, iterates H-bands and V-bands

Adapted for single-texture blob layout with 1D offsets via CalcBandLoc.
Curve data decoded from int16 quantized em-space coordinates.

Also: cap bands at 16, use signed int16 texels, update meson.build
to remove old arc/sdf/blob sources and add new files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New vertex shader passes em-space texcoords and flat per-glyph data
(band transform, atlas location, band counts) to fragment shader.
Fragment shader calls glyphy_slug_render() for coverage.

Updated glyph_info_t: replaced nominal_w/h with num_hbands/num_vbands.
Updated glyph_vertex_t: new layout with position, texcoord, band
transform, and glyph data fields for Slug rendering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Atlas: RGBA16I texture with isampler2D, texelFetch upload
- Font: use curve accumulator + blob encoder instead of arc pipeline
- Shader: assemble slug GLSL + demo fragment shader, new vertex layout
  with position, texcoord, bandTransform (flat), glyphData (flat)
- Buffer: set up 4 vertex attributes including glVertexAttribIPointer
  for integer glyph data
- GLState: simplified, removed old SDF uniforms
- HarfBuzz/FreeType headers: updated for curve accumulator,
  use hb_font_draw_glyph, stub cubic_to for now
- Removed demo-atlas.glsl (no longer needed)

Builds and links successfully.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use 16384 texels (vs 4096) to handle complex glyphs with many curves.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Bump to #version 330 (needed for floatBitsToUint)
- Remove #version from demo-fshader.glsl (prepended by demo-shader.cc)
- Remove demo-atlas.glsl from Makefile.am (no longer used)
- Update src/Makefile.am for new source files and shader
- Slug shader by Eric Lengyel, ported to GLSL by Behdad Esfahbod

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Delete all arc geometry, bezier-to-arc approximation, SDF calculation,
grid-based blob encoding, outline manipulation, and glyphy-validate.
Fix glyphy-extents.cc to not depend on deleted glyphy-common.hh.
Update both autotools and meson build files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add close_path callback to HarfBuzz draw funcs. Without it, the
  last contour's closing segment was never emitted, causing winding
  number errors (horizontal streak artifacts).
- Use floor() instead of (int) cast for band assignment to avoid
  off-by-one at band boundaries due to floating-point truncation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
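
The floor() versus (int) difference mentioned above only shows up for negative inputs, because a cast truncates toward zero while floor() rounds toward negative infinity. A minimal sketch, where `v` stands for an already scaled band coordinate (how the encoder computes that value is not shown in the commit, so the function names are illustrative):

```cpp
#include <cmath>

// Truncating cast: rounds toward zero, so -0.25 lands in band 0.
inline int band_by_cast(float v) { return (int) v; }

// floor(): rounds toward negative infinity, so -0.25 lands in band -1.
inline int band_by_floor(float v) { return (int) std::floor(v); }
```

For non-negative values the two agree; just below zero the cast folds a value into band 0 that floor() correctly places in the band before it.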
Lines encoded as degenerate quadratics with p2=midpoint(p1,p3) had
a=p1-2*p2+p3=0 mathematically, but float32 rounding of the
intermediate p12.zw*2.0 in the shader made a nonzero (~1e-7).
This triggered the quadratic path with 1/a producing Inf,
corrupting winding numbers across entire scanlines.

Fix: encode lines as (p1,p1,p3) instead. This gives a=p3-p1
(full line span, well-conditioned) and b=0. The standard quadratic
solver handles this cleanly with no near-zero divisions and no
shader changes needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
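
To make the (p1,p1,p3) encoding concrete, the sketch below computes the polynomial coefficients named in the commit message and finds where such a line crosses y = 0. It assumes the polynomial form B(t) = a*t^2 - 2*b*t + c with a = p1 - 2*p2 + p3, b = p1 - p2, c = p1; the type and function names are made up for the example and this is not the shader's solver.

```cpp
#include <cmath>

struct Vec2 { float x, y; };

// Coefficients of B(t) = a*t^2 - 2*b*t + p1 for a quadratic Bezier.
struct Poly { Vec2 a, b; };

inline Poly coefficients(Vec2 p1, Vec2 p2, Vec2 p3)
{
  return { { p1.x - 2.f * p2.x + p3.x, p1.y - 2.f * p2.y + p3.y },
           { p1.x - p2.x, p1.y - p2.y } };
}

// x coordinate where a line stored as (p1, p1, p3) crosses y = 0.
// Assumes the endpoints straddle the axis, so a.y != 0 and the
// ratio -p1.y / a.y lies in [0, 1].
inline float line_crossing_x(Vec2 p1, Vec2 p3)
{
  Poly q = coefficients(p1, p1, p3);       // a = p3 - p1, b = 0
  float t = std::sqrt(-p1.y / q.a.y);      // root of a.y*t^2 + p1.y = 0
  return q.a.x * t * t - 2.f * q.b.x * t + p1.x;
}
```

For the line from (0, -1) to (4, 3) this gives a = (4, 4), b = (0, 0), t = 0.5 and a crossing at x = 1, with the only division being by the full line span.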
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These features were part of the old SDF pipeline and are not
implemented in the Slug renderer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
behdad and others added 14 commits March 20, 2026 23:46
Replace the 2D RGBA16I texture atlas with a 1D buffer texture
(GL_TEXTURE_BUFFER). This eliminates:
- CalcBandLoc wrapping logic in the shader
- Atlas width constant (GLYPHY_ATLAS_WIDTH, GLYPHY_LOG_ATLAS_WIDTH)
- 2D coordinate math for atlas lookups
- item_w / item_h_quantum parameters

The shader now uses isamplerBuffer with plain integer offsets:
texelFetch(u_atlas, glyphLoc + offset) instead of
texelFetch(u_atlas, glyphy_calc_blob_loc(glyphLoc, offset), 0).

glyphLoc is a single int (1D offset) instead of ivec2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
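
For context, the 2D coordinate math that this commit removes looks roughly like the following when the atlas width is a power of two. The constant values and function names here are hypothetical; the commit only names the removed GLYPHY_ATLAS_WIDTH and GLYPHY_LOG_ATLAS_WIDTH constants without giving their values.

```cpp
#include <cstdint>

// Hypothetical atlas width of 2^11 texels, for illustration only.
constexpr uint32_t kLogAtlasWidth = 11;
constexpr uint32_t kAtlasWidth = 1u << kLogAtlasWidth;

struct TexelCoord { uint32_t x, y; };

// Old scheme: wrap a 1D texel offset into a 2D atlas coordinate.
inline TexelCoord offset_to_2d(uint32_t offset)
{
  return { offset & (kAtlasWidth - 1), offset >> kLogAtlasWidth };
}

// New scheme: a buffer texture is addressed by the 1D offset directly.
inline uint32_t offset_1d(uint32_t glyph_loc, uint32_t offset)
{
  return glyph_loc + offset;
}
```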
Store extents and band counts in a 2-texel header at the start of
each glyph's blob. The shader reads the header and computes the
band transform itself.

Client no longer needs to pass bandTransform (vec4) or band counts.
Per-glyph vertex data is now just: position, texcoord, and one
flat int (atlas_offset). The encoder API drops num_hbands/num_vbands
output parameters.

~1% performance cost for much simpler client integration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per-glyph data is now just one uint (atlas offset) instead of an
ivec4. Vertex size drops from 48 to 20 bytes. The shader API
takes uint for the glyph location.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
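
One layout that is consistent with the 20-byte figure is sketched below, assuming position and texcoord are two 32-bit floats each (the commit does not spell out the component types, and the struct name is made up).

```cpp
#include <cstddef>
#include <cstdint>

// Assumed layout: 2 floats + 2 floats + 1 uint = 8 + 8 + 4 = 20 bytes.
struct GlyphVertex {
  float x, y;             // position
  float u, v;             // texcoord
  uint32_t atlas_offset;  // per-glyph location in the atlas buffer
};

static_assert(sizeof(GlyphVertex) == 20, "vertex should be 20 bytes");
static_assert(offsetof(GlyphVertex, atlas_offset) == 16,
              "integer attribute starts after the four floats");
```

An integer attribute like `atlas_offset` would be bound with glVertexAttribIPointer rather than glVertexAttribPointer so that it reaches the shader unconverted.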
Meson is the only build system now. Remove all autotools files
(configure.ac, Makefile.am, autogen.sh, git.mk) and stale
generated headers that caused shader compilation issues when
both build systems were used.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The encoder outputs extents in font design units. Previously the demo
divided by upem then the shader multiplied back. Now everything stays
in font units. Screen positions use font_size/upem as the scale factor.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adjacent curves in a contour share p3/p1. Instead of storing 2
texels per curve (2N total), store N+1 texels per contour:
  (p1,p2) (p3/p1,p2_next) ... (p3,0)

The shader reads curveLoc and curveLoc+1, unchanged. Band indices
point to each curve's first texel. Nearly halves curve data storage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
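
The shared-endpoint layout can be sketched on the CPU side as follows. This assumes each texel holds four int16 components and that the curves of a contour are already chained (each curve's p3 equals the next curve's p1); the type and function names are illustrative, not the encoder's.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Texel = std::array<int16_t, 4>;  // one RGBA16I texel
struct Pt { int16_t x, y; };
struct Curve { Pt p1, p2, p3; };

// Append one contour as N+1 texels: (p1,p2) per curve, then (p3,0).
inline void pack_contour(const std::vector<Curve>& contour,
                         std::vector<Texel>& out)
{
  if (contour.empty()) return;
  for (const Curve& c : contour)
    out.push_back(Texel{ c.p1.x, c.p1.y, c.p2.x, c.p2.y });
  const Pt end = contour.back().p3;
  out.push_back(Texel{ end.x, end.y, 0, 0 });
}

// Decode a curve from texels loc and loc+1, as the shader is said to do.
inline Curve read_curve(const std::vector<Texel>& t, std::size_t loc)
{
  return { { t[loc][0], t[loc][1] },
           { t[loc][2], t[loc][3] },
           { t[loc + 1][0], t[loc + 1][1] } };
}
```

A two-curve contour packs into three texels instead of four, and reading at index 1 recovers the second curve with its p1 taken from the shared texel.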
Ubuntu 24.04 has harfbuzz >= 4.0.0, no need to build from source.
Remove autotools steps, autotools deps, and harfbuzz cache.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
For each band, store curves sorted in both directions: descending
max (for rightward/upward ray) and ascending min (for leftward/
downward ray). A split value per band determines which direction
to use based on pixel position.

Pixels on the left side of the glyph use leftward ray with ascending
sort; pixels on the right use rightward ray with descending sort.
Each side exits early after processing only the nearby curves.

Coverage formula adjusted for leftward ray: saturate(0.5 - r)
instead of saturate(r + 0.5).

~15% speedup (1725 -> 1980 fps).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
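
A CPU-side sketch of the idea, for one horizontal band: pick the ray direction from the split value, walk the matching sort order, and stop at the first curve that lies entirely beyond the pixel. The `Span` type and function names are made up, the sort is done at query time only to keep the example short (the encoder stores both orders in the blob), and the exit test omits the half-pixel margin a real shader would need. The two coverage helpers mirror the formulas quoted above, with `r` taken to be the signed distance from the pixel to the crossing, in pixels.

```cpp
#include <algorithm>
#include <vector>

struct Span { float min_x, max_x; };  // horizontal extent of one curve

inline float saturate(float v) { return std::min(std::max(v, 0.f), 1.f); }
inline float coverage_rightward(float r) { return saturate(r + 0.5f); }
inline float coverage_leftward(float r)  { return saturate(0.5f - r); }

// Number of curves a pixel at x has to examine before exiting early.
inline int curves_visited(std::vector<Span> band, float x, float split)
{
  int visited = 0;
  if (x < split) {
    // Leftward ray: ascending min, stop once a curve starts right of x.
    std::sort(band.begin(), band.end(),
              [](const Span& a, const Span& b) { return a.min_x < b.min_x; });
    for (const Span& s : band) { if (s.min_x > x) break; ++visited; }
  } else {
    // Rightward ray: descending max, stop once a curve ends left of x.
    std::sort(band.begin(), band.end(),
              [](const Span& a, const Span& b) { return a.max_x > b.max_x; });
    for (const Span& s : band) { if (s.max_x < x) break; ++visited; }
  }
  return visited;
}
```

With curves spanning [0,1], [2,3], [8,9] and [9,10] and a split at 5, a pixel at x = 0.5 and a pixel at x = 9.5 each examine a single curve instead of all four.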
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of using the glyph midpoint for all bands, compute the
optimal split per band by finding the x/y value that minimizes
max(left_count, right_count). This balances the work between
leftward and rightward rays more evenly, especially for
asymmetric glyphs.

~2% speedup (1980 -> 2020 fps).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
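
One way to read "minimizes max(left_count, right_count)" is sketched below. The cost model is an assumption: `left` counts curves a leftward-ray pixel just below the split could still visit, and `right` counts curves a rightward-ray pixel at the split could visit. Candidate splits are taken from the curve extents, and all names are illustrative.

```cpp
#include <algorithm>
#include <initializer_list>
#include <limits>
#include <vector>

struct Span { float min_x, max_x; };  // horizontal extent of one curve

// Assumed worst-case work for either side when the band is split at s.
inline int worst_case_work(const std::vector<Span>& band, float s)
{
  int left = 0, right = 0;
  for (const Span& c : band) {
    if (c.min_x < s)  ++left;   // reachable by a leftward ray left of s
    if (c.max_x >= s) ++right;  // reachable by a rightward ray at s
  }
  return std::max(left, right);
}

// Try every curve extent as a split and keep the cheapest one.
inline float best_split(const std::vector<Span>& band)
{
  float best = 0.f;
  int best_cost = std::numeric_limits<int>::max();
  for (const Span& c : band)
    for (float s : { c.min_x, c.max_x }) {
      int cost = worst_case_work(band, s);
      if (cost < best_cost) { best_cost = cost; best = s; }
    }
  return best;
}
```

For an asymmetric band with spans [0,1], [1,2], [2,3], [3,4] and [4,20], the midpoint (10) has a worst case of 5 curves under this model, while the chosen split (3) brings it down to 3.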
Dilate glyph quads by exactly half a pixel on screen, regardless
of zoom level. The vertex shader computes the dilation in clip
space and corrects the texcoord using the inverse 2x2 MVP Jacobian.

Per-vertex data adds normal direction and inverse Jacobian for the
object-to-em-space mapping. Viewport size passed as uniform.

Based on SlugDilate from the Slug reference vertex shader, simplified
for z=0 geometry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add glyphy-vertex.glsl with glyphy_dilate() that computes exact
half-pixel dilation using the MVP matrix and viewport size.
Client vertex shader just calls one function.

Rename shader files and API:
  glyphy-slug.glsl -> glyphy-fragment.glsl
  glyphy_slug_shader_source -> glyphy_fragment_shader_source
  glyphy_slug_render -> glyphy_render
  New: glyphy_vertex_shader_source, glyphy_dilate

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Drop "(MIT license)" from Slug attribution -- the algorithm is
referenced, not the code. Remove glyphy_*_shader_source_path()
functions and PKGDATADIR/SHADER_PATH since shaders are embedded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
behdad and others added 17 commits March 21, 2026 11:14
Replace GLUT/FreeGLUT windowing with GLFW3:
- Explicit main loop instead of glutMainLoop
- GLFW native vsync via glfwSwapInterval (removes platform-specific
  CGLSetParameter/wglSwapIntervalEXT/glXSwapIntervalSGI code)
- Proper core profile context hints (no GLUT_3_2_CORE_PROFILE ifdef)
- Proper scroll callback instead of GLUT_WHEEL_UP/DOWN hack
- Split keyboard handling into key callback (special keys) and char
  callback (printable characters)
- Dirty-flag based redraw with glfwWaitEvents when idle

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
glewInit() calls glGetString(GL_EXTENSIONS) which generates
GL_INVALID_ENUM in a core profile context.  Ignore the return
value and clear the GL error; the GL_VERSION_3_3 check that
follows is the real gate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set initial viewport from framebuffer size (not window size)
- Use window size for mouse coordinate math since GLFW cursor
  positions are in screen coordinates, not framebuffer pixels
- Poll events before first render

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Render an initial frame, poll events to let the Wayland compositor
configure the surface at the correct content scale, then render
a second frame at the right resolution.

Also set viewport from framebuffer size every frame to avoid
stale viewport state on HiDPI displays.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Track cursor position continuously from the cursor callback
instead of querying glfwGetCursorPos in the mouse button handler.
On Wayland, the two can disagree, causing a position jump on the
first drag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Integer millisecond timing caused jitter at high frame rates
(vsync off) because frame deltas alternated between 0 and 1ms.
Switch to double-precision seconds throughout the animation
and FPS reporting code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
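
The jitter is easy to reproduce in isolation. The sketch below takes frame timestamps in seconds and compares the deltas an integer-millisecond clock would report with the deltas kept in double-precision seconds; the function names are illustrative and the demo's real timing code is not shown here. GLFW's glfwGetTime() already returns seconds as a double, which makes it a natural source for such timestamps.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Frame deltas as seen through a clock truncated to whole milliseconds.
inline std::vector<long> deltas_int_ms(const std::vector<double>& t_sec)
{
  std::vector<long> d;
  for (std::size_t i = 1; i < t_sec.size(); ++i)
    d.push_back((long) std::floor(t_sec[i] * 1000.0) -
                (long) std::floor(t_sec[i - 1] * 1000.0));
  return d;
}

// Frame deltas kept in double-precision seconds.
inline std::vector<double> deltas_sec(const std::vector<double>& t_sec)
{
  std::vector<double> d;
  for (std::size_t i = 1; i < t_sec.size(); ++i)
    d.push_back(t_sec[i] - t_sec[i - 1]);
  return d;
}
```

At 2000 fps (0.5 ms per frame) the integer deltas alternate between 0 and 1 ms, while the double deltas stay at 0.0005 s.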
@behdad
Owner Author

behdad commented Mar 21, 2026

As for encoding speed, the new benchmark shows that this encodes NotoSansCJK in 750ms on my machine. Not bad... For reference, this is 8x slower than HarfBuzz extracting the outlines. Shader memory is bad though: over 400MB for this font.

glyphs: 65535 x 1 repeats = 65535 processed (65524 non-empty)
total blob size: 435617984 bytes (425408.19kb)
avg curves per glyph: 83.72
avg blob size per glyph: 6.49kb
outline:   91.773ms total, 1.400us/glyph, 714100 glyphs/s
encode:   764.360ms total, 11.663us/glyph, 85738 glyphs/s, 543.51 MiB/s
wall:     858.766ms total (outline + encode + loop overhead)

Roboto-Regular:

glyphs: 1294 x 1 repeats = 1294 processed (1275 non-empty)
total blob size: 3191680 bytes (3116.88kb)
avg curves per glyph: 21.36
avg blob size per glyph: 2.41kb
outline:    0.694ms total, 0.536us/glyph, 1864704 glyphs/s
encode:     4.303ms total, 3.325us/glyph, 300755 glyphs/s, 707.45 MiB/s
wall:       5.051ms total (outline + encode + loop overhead)

@behdad
Owner Author

behdad commented Mar 21, 2026

This PR also modernizes the codebase, ports the demo to GLFW, and makes many other improvements. I suggest we merge it after the license is sorted out.

@behdad
Owner Author

behdad commented Mar 21, 2026

Okay, I received written (email) permission from Eric to use the shaders under HarfBuzz's Old MIT license:

Hi --

You have permission to use the Slug reference shaders (https://github.com/EricLengyel/Slug) under the Old MIT license.

-- Eric Lengyel

I'll go ahead and cherry-pick and update the license change from master, then this can be merged.

@behdad behdad merged commit 926b96d into master Mar 21, 2026
@wipfli

wipfli commented Mar 21, 2026

Great! Don't forget to update the repo description:

GLyphy is a signed-distance-field (SDF) text renderer using OpenGL ES2 shading language. 

@behdad
Owner Author

behdad commented Mar 21, 2026

Thanks. Done.
