loxc is a small text compressor built around lookup-table encoding. It favors
fast decode paths, simple data structures, and generated modules that can be
embedded at build time or loaded at runtime.
The stream layer provides bit-level readers and writers on top of byte arrays. Everything above it is built in terms of:
- loxc_writer_t for emitting packed bits
- loxc_reader_t for consuming packed bits
- loxc_write_bits() and loxc_read_bits() as the basic primitives
This keeps the encoder and decoder independent from the final container format.
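As an illustration of what such a bit-level layer can look like, here is a minimal sketch. The `sketch_*` names, the struct fields, the return conventions, and the LSB-first bit order are all assumptions made for this example; the real loxc_writer_t, loxc_reader_t, loxc_write_bits(), and loxc_read_bits() may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts; not the actual loxc_writer_t / loxc_reader_t. */
typedef struct {
    uint8_t *buf;    /* caller-owned, zero-initialized output buffer */
    size_t   cap;    /* capacity in bytes */
    size_t   bitpos; /* index of the next bit to write */
} sketch_writer_t;

typedef struct {
    const uint8_t *buf;
    size_t         len;    /* length in bytes */
    size_t         bitpos; /* index of the next bit to read */
} sketch_reader_t;

/* Append the low `nbits` bits of `value`, least significant bit first
 * (assumed order). Returns 0 on success, -1 if the buffer is too small. */
static int sketch_write_bits(sketch_writer_t *w, uint32_t value, unsigned nbits)
{
    assert(nbits <= 32);
    if (nbits > w->cap * 8 - w->bitpos)
        return -1;
    for (unsigned i = 0; i < nbits; i++) {
        if ((value >> i) & 1u)
            w->buf[w->bitpos >> 3] |= (uint8_t)(1u << (w->bitpos & 7));
        w->bitpos++;
    }
    return 0;
}

/* Read `nbits` bits in the same order. Returns 0 on success, -1 when
 * fewer than `nbits` bits remain. */
static int sketch_read_bits(sketch_reader_t *r, unsigned nbits, uint32_t *out)
{
    assert(nbits <= 32);
    if (nbits > r->len * 8 - r->bitpos)
        return -1;
    uint32_t v = 0;
    for (unsigned i = 0; i < nbits; i++) {
        uint32_t bit = (uint32_t)(r->buf[r->bitpos >> 3] >> (r->bitpos & 7)) & 1u;
        v |= bit << i;
        r->bitpos++;
    }
    *out = v;
    return 0;
}
```

Because the reader and writer only see a byte array and a bit position, nothing in this sketch needs to know whether the bytes end up in a file, a generated C array, or a network buffer.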
loxc_train measures several candidate encodings and picks the cheapest one:
- FLAT_FIXED_WIDTH
- HIERARCHICAL_8
- HIERARCHICAL_4
The selector compares estimated bit cost using the actual symbol distribution. The chosen strategy is the one that minimizes total encoded size for the training corpus.
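The comparison can be sketched as follows. This is an illustrative cost model, not the actual loxc_train implementation: it assumes symbol frequencies are already sorted in descending order, that each hierarchical level of chunk width w holds 2^w - 1 direct slots plus one escape, and that table storage cost is ignored. The `sk_*` names and the enum are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    SK_FLAT_FIXED_WIDTH,
    SK_HIERARCHICAL_8,
    SK_HIERARCHICAL_4
} sk_strategy_t;

/* Bits needed to index n distinct symbols with one fixed-width code. */
static unsigned sk_flat_width(size_t n)
{
    unsigned w = 1;
    while (w < 32 && ((size_t)1 << w) < n)
        w++;
    return w;
}

/* Flat cost: every occurrence pays the same fixed width. */
static uint64_t sk_flat_cost(const uint32_t *freq, size_t n)
{
    uint64_t count = 0;
    for (size_t r = 0; r < n; r++)
        count += freq[r];
    return count * sk_flat_width(n);
}

/* Hierarchical cost: the symbol at rank r sits on level r / (2^w - 1)
 * and pays one w-bit chunk per level visited (assumed model). */
static uint64_t sk_hier_cost(const uint32_t *freq, size_t n, unsigned w)
{
    size_t direct = ((size_t)1 << w) - 1;
    uint64_t total = 0;
    for (size_t r = 0; r < n; r++)
        total += (uint64_t)freq[r] * (r / direct + 1) * w;
    return total;
}

/* Pick the strategy with the lowest estimated bit count. */
static sk_strategy_t sk_pick(const uint32_t *freq, size_t n, uint64_t *best_bits)
{
    assert(freq != NULL && n > 0);
    sk_strategy_t best = SK_FLAT_FIXED_WIDTH;
    uint64_t bits = sk_flat_cost(freq, n);
    uint64_t h8 = sk_hier_cost(freq, n, 8);
    if (h8 < bits) { best = SK_HIERARCHICAL_8; bits = h8; }
    uint64_t h4 = sk_hier_cost(freq, n, 4);
    if (h4 < bits) { best = SK_HIERARCHICAL_4; bits = h4; }
    if (best_bits != NULL)
        *best_bits = bits;
    return best;
}
```

Under this model a small, evenly used alphabet tends to favor the flat encoding, while a large alphabet dominated by a few symbols tends to favor the 4-bit hierarchy.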
Hierarchical strategies model the symbol space as a sequence of fixed-width levels. Each level reserves an escape value that tells the decoder to continue to the next level.
level 0 -> [direct symbols ... | ESCAPE]
level 1 -> [direct symbols ... | ESCAPE]
level 2 -> [direct symbols ... | ESCAPE]
The direct slots are encoded with 4-bit or 8-bit chunks depending on the selected strategy (HIERARCHICAL_4 or HIERARCHICAL_8). This gives a compact representation for common symbols while still supporting larger symbol tables.
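The escape mechanism can be sketched as follows. The example assumes that the escape is the all-ones chunk value (the last slot of each level, as in the diagram) and that symbols are addressed by frequency rank. The `sk_*` function names are hypothetical, and chunks are stored one per byte here for clarity rather than bit-packed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode `rank` as w-bit chunks: one ESCAPE per skipped level, then the
 * slot index within the final level. Returns the number of chunks
 * written, or 0 if `cap` is too small. */
static size_t sk_encode_rank(size_t rank, unsigned w, uint8_t *chunks, size_t cap)
{
    assert(w >= 1 && w <= 8);
    size_t escape = ((size_t)1 << w) - 1; /* also the direct slots per level */
    size_t level = rank / escape;
    if (level + 1 > cap)
        return 0;
    for (size_t i = 0; i < level; i++)
        chunks[i] = (uint8_t)escape;
    chunks[level] = (uint8_t)(rank % escape);
    return level + 1;
}

/* Decode one rank from `chunks`. On success returns 0 and stores the rank
 * and the number of chunks consumed; returns -1 if the input ends while
 * still escaping. */
static int sk_decode_rank(const uint8_t *chunks, size_t n, unsigned w,
                          size_t *rank, size_t *used)
{
    assert(w >= 1 && w <= 8);
    size_t escape = ((size_t)1 << w) - 1;
    size_t level = 0;
    for (size_t i = 0; i < n; i++) {
        if (chunks[i] == escape) {
            level++;            /* continue to the next level */
            continue;
        }
        *rank = level * escape + chunks[i];
        *used = i + 1;
        return 0;
    }
    return -1;
}
```

With 4-bit chunks, ranks 0 to 14 cost one chunk, ranks 15 to 29 cost two, and so on, which is why frequent symbols should occupy the lowest ranks.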
loxc_train first collects candidate substrings, then evaluates whether each
dictionary entry reduces output size globally.
The filter accepts only entries with positive net gain after accounting for:
- symbol table overhead
- dictionary metadata
- actual usage frequency
This keeps the final module small and avoids dictionary bloat.
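One way to express such a filter is sketched below. The gain formula is a simplifying assumption for illustration: it treats each entry independently, whereas the real loxc_train evaluation is described as global. The struct fields and `sk_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-candidate statistics gathered during training. */
typedef struct {
    size_t   length;       /* substring length in bytes */
    uint32_t uses;         /* occurrences actually replaced in the corpus */
    unsigned literal_bits; /* estimated cost of one occurrence without the entry */
    unsigned symbol_bits;  /* estimated cost of one reference to the entry */
} sk_candidate_t;

/* Net gain in bits: savings from actual usage minus the cost of storing
 * the entry (its bytes, symbol table overhead, dictionary metadata). */
static int64_t sk_net_gain(const sk_candidate_t *c,
                           unsigned table_overhead_bits,
                           unsigned metadata_bits)
{
    assert(c != NULL);
    int64_t saved = (int64_t)c->uses *
                    ((int64_t)c->literal_bits - (int64_t)c->symbol_bits);
    int64_t cost = (int64_t)c->length * 8 +
                   (int64_t)table_overhead_bits + (int64_t)metadata_bits;
    return saved - cost;
}

/* Keep only entries with positive net gain, compacting them to the front
 * of the array. Returns the number of entries kept. */
static size_t sk_filter(sk_candidate_t *c, size_t n,
                        unsigned table_overhead_bits, unsigned metadata_bits)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (sk_net_gain(&c[i], table_overhead_bits, metadata_bits) > 0)
            c[kept++] = c[i];
    }
    return kept;
}
```

A long substring that appears only once usually fails this test, because storing it costs more than the single replacement saves.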
- .loxctab is the portable runtime table format.
- .loxc is the compressed container format.
Both formats are byte-packed and explicitly little-endian. They do not depend on C struct layout or platform alignment.
The registry maps module names to loxc_module_t instances. Compression and
decompression look up the module by name unless the .loxc file contains an
embedded table.
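The lookup rule above can be sketched as a name search with an embedded-table override. The `sk_module_t` struct and both functions are hypothetical stand-ins; the real loxc_module_t and registry API are not shown in this document and may look different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for loxc_module_t. */
typedef struct {
    const char *name;  /* registry key */
    const void *table; /* opaque pointer to the module's table data */
} sk_module_t;

/* Linear search by name; returns NULL when the module is not registered. */
static const sk_module_t *sk_registry_find(const sk_module_t *mods, size_t n,
                                           const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(mods[i].name, name) == 0)
            return &mods[i];
    }
    return NULL;
}

/* Prefer a table embedded in the container; otherwise fall back to the
 * registry entry with the given name. */
static const void *sk_resolve_table(const void *embedded,
                                    const sk_module_t *mods, size_t n,
                                    const char *name)
{
    if (embedded != NULL)
        return embedded;
    const sk_module_t *m = sk_registry_find(mods, n, name);
    return (m != NULL) ? m->table : NULL;
}
```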
Runtime-loaded modules are represented with a private context object that owns the decoded table data.
training text
-> loxc_train
-> .h / .c / .loxctab
-> module registry or runtime loader
-> loxc_compress / loxc_decompress
- fast decode
- predictable memory use
- portable runtime-loaded tables
- generated modules for embedded deployments
- simple file formats that can be inspected and debugged with standard tools