An embedding engine built for the sovereign web.
Emlet is a fast, fully self-contained semantic embedding engine designed to run anywhere JavaScript runs—browser, Node, edge, offline. No dependencies, no GPU, no network calls. Just load and embed.
The entire engine fits in 1 MB and produces deterministic vector embeddings suitable for similarity search, clustering, retrieval, tagging, or downstream ML workflows.
- 100M parameters, ~1MB total size
- 7K tokens/sec throughput (in the browser)
- Deterministic output (same input → same vector)
- Out-of-vocabulary synthesis (no missing tokens)
- Unicode-aware (text, emoji, symbols, ZWJ)
- Configurable vector size (1-1568D)
- Offline-first, zero dependencies
- Vanilla JavaScript, edge-ready
- No GPU. No cloud. No API.
- Self-extracting runtime
- Neuro-symbolic core
- A digital familiar
```sh
npm install emlet
```
Or load directly via CDN:
```html
<script src="https://unpkg.com/emlet"></script>
```
This exposes both `emlet` (a preloaded instance) and `Emlet` (the class) globally.
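For example, here is a minimal in-page sketch using those globals (it assumes the script tag above has already loaded; the 96 and 128 lengths follow from the defaults and options described below):
```js
// `emlet` is the preloaded instance; `Emlet` is the class
const vec = emlet.embed('hello')            // default instance → 96D
const wide = new Emlet(128).embed('hello')  // custom instance → 128D
console.log(vec.length, wide.length)        // → 96 128
```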
In Node or other module environments, import it with CommonJS or ESM:
```js
// CommonJS
const emlet = require('emlet')
const { emlet, Emlet } = require('emlet')

// ESM
import emlet from 'emlet'
import { emlet, Emlet } from 'emlet'
```
The default export is a ready-to-use model instance:
```js
const vec = emlet.embed('Hello, world!')
console.log(vec)
// → [0.08, -0.01, ...] (96-dimensional vector by default)
```
You can create your own instance with a different output size:
```js
const modelA = new Emlet()          // 96D default
const modelB = new Emlet(128)       // 128D output
const modelC = new Emlet(256, true) // 256D head + 32D tail = 288D
```
`new Emlet(dim = 96, useTail = false)`
- `dim`: Number of dimensions to emit from the primary embedding space.
- `useTail`: When `true`, appends a 32-dimensional “glimpse” of the full 1536D semantic space to every vector.

This allows output sizes from 1 up to 1536 dimensions, or 1568 when the tail is enabled.
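As a quick sanity check, the vector length should track the configured size; this is a sketch assuming `embed` returns a plain array of that length, as in the examples above:
```js
import { Emlet } from 'emlet'

const wide = new Emlet(128)         // primary space only
const tailed = new Emlet(256, true) // primary head plus the 32D tail

console.log(wide.embed('hello').length)   // → 128
console.log(tailed.embed('hello').length) // → 288
```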
Tokens not present in the internal vocabulary are synthesized deterministically:
```js
emlet.embed('quantaflux')
```
There are no unknown tokens and no fallbacks to zero vectors.
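Because output is deterministic, re-embedding the same out-of-vocabulary token should reproduce the vector exactly; a minimal sketch, assuming the default instance:
```js
import emlet from 'emlet'

const a = emlet.embed('quantaflux')
const b = emlet.embed('quantaflux')

// Same input, same vector; OOV tokens are synthesized, not zeroed out
console.log(a.every((v, i) => v === b[i])) // → true
console.log(a.some((v) => v !== 0))        // → true
```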
Emlet natively handles Unicode symbols, emoji, modifiers, and ZWJ sequences:
```js
emlet.embed('🦄')
emlet.embed('👩🏽‍🚀')
```
These are embedded consistently and can be compared using standard vector similarity.
Punctuation is normally stripped during tokenization. If the input is a single character, it is embedded as-is:
```js
emlet.embed('.')
emlet.embed('[')
```
This allows punctuation-level modeling when needed without polluting normal text embeddings.
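For instance, single punctuation characters still come back as ordinary fixed-size vectors (a sketch, assuming the default 96D instance):
```js
import emlet from 'emlet'

// Single-character inputs, punctuation included, get full embeddings
const dot = emlet.embed('.')
const bracket = emlet.embed('[')

console.log(dot.length, bracket.length) // → 96 96
```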
Emlet intentionally exposes a minimal API:
```ts
embed(text: string): number[]
new Emlet(dim?: number, useTail?: boolean)
```
Everything else—chunking, similarity, indexing, clustering—is left to userland.
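For example, cosine similarity over Emlet vectors takes only a few lines of userland code; `cosine` below is a hypothetical helper, not part of Emlet's API:
```js
import emlet from 'emlet'

// Hypothetical userland helper: cosine similarity between two vectors
function cosine(a, b) {
  let dot = 0
  let na = 0
  let nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

// Rank a few snippets against a query by similarity
const query = emlet.embed('vector search in the browser')
const ranked = ['offline semantic search', 'banana bread recipe']
  .map((text) => ({ text, score: cosine(query, emlet.embed(text)) }))
  .sort((a, b) => b.score - a.score)

console.log(ranked) // higher score ≈ more similar to the query
```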
See test.js for example usage including batch encoding, similarity math, and vector inspection.
Emlet includes a test suite built with testr.
To run the tests, first clone the repository:
```sh
git clone https://github.com/basedwon/emlet.git
```
Install the dependencies, then run `npm test`:
```sh
npm install
npm test
```
If Emlet sparks something useful in your work, consider sending some coin to support further development.
Bitcoin (BTC):
1JUb1yNFH6wjGekRUW6Dfgyg4J4h6wKKdF
Monero (XMR):
46uV2fMZT3EWkBrGUgszJCcbqFqEvqrB4bZBJwsbx7yA8e2WBakXzJSUK8aqT4GoqERzbg4oKT2SiPeCgjzVH6VpSQ5y7KQ
Emlet License v1.0 (based on Apache 2.0)
Use is permitted with attribution. Redistribution, rebranding, resale, and reverse engineering are prohibited without written permission.
See LICENSE for full terms.
Contact: basedwon@tuta.com for commercial or licensing inquiries.