feat: add flag-gated TOON compression (~10-39% payload reduction) + validate on Ollama Cloud GLM-5 #54

Merged
veerareddyvishal144 merged 6 commits into Fast-Editor:main from Plaidmustache:codex/toon-pr-clean
Feb 19, 2026

Conversation

@Plaidmustache
Contributor

@Plaidmustache Plaidmustache commented Feb 17, 2026

Summary

This PR adds TOON-based prompt-context compression behind safe flags and validates behavior on Ollama Cloud (glm-5:cloud).

TOON project: https://github.com/toon-format/toon

What’s included

  • TOON compression integration (flag-gated, default OFF)
  • Safe fallback behavior (TOON_FAIL_OPEN=true)
  • No protocol payload mutation (tool schemas/call envelopes remain unchanged)
  • Unit tests covering:
    • flag-off no-op behavior
    • fail-open on encode errors
    • protocol-safety boundaries
  • Runtime documentation updates with validation results
  • Startup bug fix in budget manager (dbPath scope issue)
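
The flag-gated, fail-open behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `env_flag`, `compress_context`, and the injected `encoder` callable are hypothetical names, and only the `TOON_ENABLED` and `TOON_FAIL_OPEN` flag names come from the description.

```python
import json
from typing import Any, Callable, Mapping


def env_flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean flag (e.g. TOON_ENABLED) from an environment-like mapping."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def compress_context(
    json_text: str,
    encoder: Callable[[Any], str],  # hypothetical stand-in for a TOON encoder
    env: Mapping[str, str],
) -> str:
    """Return the text to place in the prompt for a JSON context blob."""
    # Flag off (the default): strict no-op, the original JSON passes through.
    if not env_flag(env, "TOON_ENABLED", default=False):
        return json_text
    try:
        return encoder(json.loads(json_text))
    except Exception:
        # Fail open (assumed default here): keep the original JSON on any error.
        if env_flag(env, "TOON_FAIL_OPEN", default=True):
            return json_text
        raise
```

Passing the encoder in as a parameter keeps the sketch independent of any particular TOON library and makes the flag-off and fail-open paths easy to unit test.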

Why

  • Reduce prompt payload size for large structured JSON context while preserving behavior.
  • Measured repeatedly in validation on a one-shot "task planning run": 6416 -> 5854 bytes (~8.76% reduction, i.e. roughly 9–10%).
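
For reference, the reduction figure follows directly from the two byte counts. A small sketch of the arithmetic (the helper names are illustrative, not taken from the PR):

```python
def utf8_size(text: str) -> int:
    """Payload size in bytes, as it would be sent over the wire."""
    return len(text.encode("utf-8"))


def percent_reduction(before_bytes: int, after_bytes: int) -> float:
    """Relative size reduction, in percent of the original size."""
    return (before_bytes - after_bytes) / before_bytes * 100


# Byte counts reported for the "task planning run" above.
print(f"{percent_reduction(6416, 5854):.2f}%")  # prints 8.76%
```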

Validation notes

Tested on:

  • MODEL_PROVIDER=ollama
  • OLLAMA_ENDPOINT=http://127.0.0.1:11434
  • OLLAMA_MODEL=glm-5:cloud

Observed:

  • TOON OFF: no TOON conversion logs
  • TOON ON: TOON conversion logs present + byte reduction stats
  • Provider routing remained ollama

Why TOON (external benchmark context)

From the TOON project’s published benchmarks:

TOON benchmark snapshot (from TOON docs)

Format         Efficiency (acc% / 1K tok)   Accuracy   Tokens
TOON           26.9                         73.9%      2,744
JSON compact   22.9                         70.7%      3,081
YAML           18.6                         69.0%      3,719
JSON           15.3                         69.7%      4,545
XML            13.0                         67.1%      5,167

Efficiency score formula: (Accuracy % ÷ Tokens) × 1,000 (higher is better).
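
As a sanity check, the efficiency column can be recomputed from the accuracy and token columns with the formula above. The snippet only re-derives numbers already quoted from the TOON docs; the function name is illustrative.

```python
# (format, accuracy %, tokens) as quoted in the benchmark snapshot above.
ROWS = [
    ("TOON", 73.9, 2744),
    ("JSON compact", 70.7, 3081),
    ("YAML", 69.0, 3719),
    ("JSON", 69.7, 4545),
    ("XML", 67.1, 5167),
]


def efficiency(accuracy_pct: float, tokens: int) -> float:
    """Accuracy points per 1,000 tokens (higher is better)."""
    return accuracy_pct / tokens * 1000


for name, accuracy, tokens in ROWS:
    print(f"{name:<13}{efficiency(accuracy, tokens):5.1f}")

# Token savings of TOON relative to pretty-printed JSON.
print(f"{(4545 - 2744) / 4545 * 100:.1f}% fewer tokens")  # prints 39.6% fewer tokens
```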

Key takeaway from TOON docs: TOON reached 73.9% accuracy vs JSON’s 69.7% while using ~39.6% fewer tokens.

  • Reported mixed-structure result: ~39.6% fewer tokens vs pretty JSON, with higher retrieval accuracy (73.9% vs 69.7%).
  • Higher efficiency score (accuracy per 1K tokens): TOON 26.9 vs JSON 15.3.
  • Designed as a lossless JSON representation (round-trip safe), so app logic can keep JSON while prompt payload is compressed.
  • Structure hints ([N] row counts and {fields} headers) are intended to improve schema-following reliability.
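
To make the `[N]` and `{fields}` hints concrete, here is a deliberately simplified toy encoder for one case only: a non-empty, uniform list of flat objects. It is not the TOON library and skips quoting, escaping, nesting, and type handling, all of which a real encoder must cover; `encode_table` is a hypothetical name.

```python
from typing import Any, Dict, List


def encode_table(name: str, rows: List[Dict[str, Any]]) -> str:
    """Toy TOON-style tabular block: header with row count and field names,
    then one comma-separated line per row."""
    if not rows:
        raise ValueError("this toy sketch only handles non-empty lists")
    fields = list(rows[0].keys())
    for row in rows:
        if list(row.keys()) != fields:
            raise ValueError("rows are not uniform; a tabular layout does not apply")
    field_list = ",".join(fields)
    header = f"{name}[{len(rows)}]{{{field_list}}}:"
    lines = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)


print(encode_table("users", [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]))
# users[2]{id,name}:
#   1,Alice
#   2,Bob
```

The field names appear once in the header instead of once per row, which is where the savings on uniform arrays come from.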

Practical caveats (also from TOON docs):

  • Deeply nested or non-uniform data may favor compact JSON.
  • For latency-critical paths (especially some local/quantized setups), benchmark both formats and choose the faster one.
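
One way to act on these caveats is to compare candidate encodings per payload and keep whichever is smaller. The sketch below uses UTF-8 byte size as a crude proxy; token counts and latency depend on the model and setup and would need to be measured separately. `pick_smaller` and the injected `alt_encoder` are hypothetical names, not part of this PR.

```python
import json
from typing import Any, Callable, Tuple


def utf8_size(text: str) -> int:
    return len(text.encode("utf-8"))


def pick_smaller(data: Any, alt_encoder: Callable[[Any], str]) -> Tuple[str, str]:
    """Return (label, text) for the smaller of compact JSON and the alternative."""
    compact = json.dumps(data, separators=(",", ":"), ensure_ascii=False)
    try:
        alt = alt_encoder(data)
    except Exception:
        return "json", compact  # encoder failed: stay on compact JSON
    if utf8_size(alt) < utf8_size(compact):
        return "alt", alt
    return "json", compact  # ties and regressions keep compact JSON
```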

Safety

  • Default remains TOON_ENABLED=false
  • With fail-open enabled, the original JSON is preserved on conversion errors
  • No personal paths, usernames, or secrets included in committed diff

@Plaidmustache Plaidmustache changed the title feat: add flag-gated TOON compression (~9-10% payload reduction) + validate on Ollama Cloud GLM-5 feat: add flag-gated TOON compression (~10-39% payload reduction) + validate on Ollama Cloud GLM-5 Feb 17, 2026
@veerareddyvishal144

veerareddyvishal144 commented Feb 19, 2026

Thanks @Plaidmustache for this PR.
This works really well.

@veerareddyvishal144 veerareddyvishal144 merged commit 4562795 into Fast-Editor:main Feb 19, 2026
6 checks passed