Summary
Task 02 described the tokenizer as universal across modalities. Only text has been exercised. Until a second modality is tested, the claim is aspirational.
What to do
- Pick one non-text modality (time-series numeric, audio frames, byte-level binary blobs, pixel deltas) that is small enough to run in a unit test.
- Feed it through tokenize_with_silence or an equivalent adapter.
- Verify the byte-trie builds meaningful depth and that concept binding across modalities behaves plausibly.
- Either:
  - promote Task 02 to "universal (text + X)" with test evidence, or
  - narrow the claim in README and docs to "text-only for PoC".
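Below is a minimal sketch of an "equivalent adapter" for the time-series option. It assumes the tokenizer can consume a plain byte slice. That is an assumption, because the real signature of tokenize_with_silence is not shown here. The function name quantize_to_bytes is hypothetical. It linearly quantizes f32 samples into u8 buckets over a fixed range. As a result, a repeating waveform becomes a repeating byte sequence that a byte-trie can share.

```rust
/// Hypothetical adapter (not part of src/trie/tokenizer.rs): maps f32 samples
/// in [min, max] linearly onto u8 buckets 0..=255.
/// Out-of-range values are clamped; NaN is mapped to 0 (an arbitrary choice).
pub fn quantize_to_bytes(samples: &[f32], min: f32, max: f32) -> Vec<u8> {
    assert!(max > min, "quantization range must be non-empty");
    let span = max - min;
    samples
        .iter()
        .map(|&x| {
            if x.is_nan() {
                return 0;
            }
            // Normalize to [0, 1], clamp outliers, then scale to a byte.
            let t = ((x - min) / span).clamp(0.0, 1.0);
            (t * 255.0).round() as u8
        })
        .collect()
}
```

For example, quantizing [0.0, 1.0, 0.0, -1.0] over [-1.0, 1.0] yields [128, 255, 128, 0]. The sketch uses a fixed range rather than per-signal min/max. With a fixed range, the same shape at different amplitudes produces different bytes. Whether that is desirable for concept binding is an open design question.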
Acceptance
- Test file tests/universal_tokenizer_<modality>.rs exists, or README/docs narrowed to reflect reality.
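One way to make the "meaningful depth" check concrete is sketched below. This reading is an assumption, not the project's stated criterion. The sketch would live in a hypothetical tests/universal_tokenizer_timeseries.rs. The real byte-trie API is not shown here, so it substitutes a swapped-in technique: an LZ78-style phrase trie, built by the hypothetical lz78_max_depth. The test then requires that a periodic byte stream (such as a quantized waveform) grows a deeper trie than a stream of all-distinct bytes of the same length.

```rust
use std::collections::HashMap;

/// Stand-in byte trie (LZ78-style phrase trie), NOT the trie in src/trie/tokenizer.rs.
#[derive(Default)]
struct Node {
    children: HashMap<u8, Node>,
}

/// Parses `input` into LZ78 phrases, inserting each phrase into the trie,
/// and returns the maximum depth reached (the length of the longest phrase).
fn lz78_max_depth(input: &[u8]) -> usize {
    let mut root = Node::default();
    let mut max_depth = 0;
    let mut i = 0;
    while i < input.len() {
        let mut node = &mut root;
        let mut depth = 0;
        // Follow existing edges as long as the input matches a known phrase.
        while i < input.len() && node.children.contains_key(&input[i]) {
            node = node.children.get_mut(&input[i]).unwrap();
            i += 1;
            depth += 1;
        }
        // Extend the matched phrase by one new byte, if any input remains.
        if i < input.len() {
            node.children.insert(input[i], Node::default());
            i += 1;
            depth += 1;
        }
        max_depth = max_depth.max(depth);
    }
    max_depth
}

#[test]
fn periodic_input_builds_deeper_trie_than_distinct_bytes() {
    // 256 bytes, each value exactly once: no shared prefixes, so depth stays 1.
    let distinct: Vec<u8> = (0..=255u8).collect();
    // 256 bytes of a repeating 4-byte pattern, e.g. a quantized periodic signal.
    let periodic: Vec<u8> = [128u8, 255, 128, 0].iter().cycle().take(256).copied().collect();
    assert!(lz78_max_depth(&periodic) > lz78_max_depth(&distinct));
}
```

The real test would call the project's tokenizer and trie instead of this stand-in. The comparison against a structureless baseline makes "meaningful" falsifiable without fixing an absolute depth threshold.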
Links
docs/design/honest_agent/tasks/02_universal_tokenizer.md
src/trie/tokenizer.rs