Tiny zero‑dependency tokenizer for simple DSLs and config/query languages in Rust.
Generic: drop it into parsers, rule engines, interpreters, or build tooling.
Supports identifiers, numbers, strings, operators, and a small set of keywords.
Clear spans, a streaming iterator, and a zero‑copy mode when you want pure speed.
Designed for speed, clarity, and easy embedding.
Currently ranked #35 in Parser Tooling on lib.rs. ⭐ If you find this crate useful, a star helps visibility and adoption!
Install:

```toml
[dependencies]
sentience-tokenize = "0.2.3"
```

Basic usage:
```rust
use sentience_tokenize::tokenize;

fn main() {
    let toks = tokenize("let x = 1").unwrap();
    for t in toks {
        println!("{:?} @{}..{}", t.kind, t.span.start, t.span.end);
    }
}
```

Streaming iterator (no allocation of a full token vec):
```rust
use sentience_tokenize::tokenize_iter;

for item in tokenize_iter("let x = 1") {
    let t = item.unwrap();
    println!("{:?} @{}..{}", t.kind, t.span.start, t.span.end);
}
```

Zero-copy tokens (borrow `&str` slices from the source):
```rust
use sentience_tokenize::{tokenize_borrowed, BorrowedTokenKind};

let toks = tokenize_borrowed("\"hi\" 123").unwrap();
assert!(matches!(toks[0].kind, BorrowedTokenKind::String("hi")));
assert!(matches!(toks[1].kind, BorrowedTokenKind::Number("123")));
```

Features:

- Zero dependencies (only std).
- Token kinds: identifiers, numbers, strings, parens/brackets/braces, `= + - * / -> , . .. @`.
- Keywords: `true false if then else let rule and or`.
- Spans included for each token.
- Iterator API: `tokenize_iter` yields `Result<Token, LexError>`.
- Zero-copy API: `tokenize_borrowed` returns `BorrowedToken<'_>` / `BorrowedTokenKind<'_>` with `&str` slices (strings keep raw escapes).
- Whitespace & `//` comments skipped.
- serde: derive `Serialize`/`Deserialize` for tokens and errors (see the sketch after this list).
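A minimal sketch of how the serde integration could be used, assuming it is exposed as an optional `serde` cargo feature (e.g. `sentience-tokenize = { version = "0.2.3", features = ["serde"] }`) and that `serde_json` is pulled in separately just for the JSON output:

```rust
use sentience_tokenize::tokenize;

fn main() {
    let toks = tokenize("let x = 1").unwrap();
    // With the (assumed) `serde` feature enabled, Token/TokenKind derive
    // Serialize, so the whole token vector can be dumped as JSON.
    // `serde_json` is an extra dependency used only for this sketch.
    let json = serde_json::to_string_pretty(&toks).unwrap();
    println!("{json}");
}
```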
Lexical rules:

| Aspect | Rules |
|---|---|
| Identifiers | ASCII: `[A-Za-z_][A-Za-z0-9_]*` |
| Numbers | Decimal integers/decimals; optional exponent (`e`/`E`, sign, digits). A single dot is allowed once; `..` is not consumed by numbers. |
| Strings | Double-quoted. Escapes: `\n`, `\t`, `\r`, `\"`, `\\`. Unknown escapes are an error. Raw newlines are accepted. |
| Comments | `//` to end of line. |
| Delimiters | `( ) { } [ ] , : ;` |
| Operators | `=`, `+`, `-`, `*`, `/`, `->`, `.`, `..`, `@` |
| Keywords | `true`, `false`, `if`, `then`, `else`, `let`, `rule`, `and`, `or` |
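For example, because a number never consumes `..`, an input like `1..3` should lex as a number, the `..` operator, and a second number, while the dot and exponent in `1.2e-3` stay inside a single number token. A small sketch that just prints each token's debug form:

```rust
use sentience_tokenize::tokenize;

fn main() {
    // "1..3": number, `..` operator, number.
    // "1.2e-3": one number token (single dot plus exponent).
    for src in ["1..3", "1.2e-3"] {
        println!("{src}:");
        for t in tokenize(src).unwrap() {
            println!("  {:?} @{}..{}", t.kind, t.span.start, t.span.end);
        }
    }
}
```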
The enum `TokenKind`, the types `Token` and `Span`, the functions `tokenize` and `tokenize_iter`, `LineMap`, and the error types `LexError`/`LexErrorKind` are part of the stable API.
Note: new `TokenKind` variants may be added in minor releases; avoid exhaustive `match`es without a `_` catch-all.
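A minimal sketch of such a forward-compatible match (only the `Ident` variant is handled; everything else falls through to the catch-all):

```rust
use sentience_tokenize::{tokenize, TokenKind};

fn main() {
    for t in tokenize("let x = 1").unwrap() {
        match &t.kind {
            // Handle only the variants this code cares about...
            TokenKind::Ident(name) => println!("identifier: {name}"),
            // ...and keep a `_` arm so TokenKind variants added in a future
            // minor release don't turn into compile errors here.
            _ => {}
        }
    }
}
```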
Lexing errors return a `LexError` with a kind and a span. Example with `LineMap`:

```rust
use sentience_tokenize::{tokenize, LineMap};

let src = "\"abc\\x\"";
let map = LineMap::new(src);
let err = tokenize(src).unwrap_err();
let (line, col) = map.to_line_col(err.span.start);
println!("{}:{}: {}", line, col, err.kind.as_str());
```

Output:

```text
1:5: invalid escape sequence
```

API overview:

- Types: `TokenKind`, `Token`, `Span`, `BorrowedTokenKind<'a>`, `BorrowedToken<'a>`
- Functions: `tokenize(&str) -> Result<Vec<Token>, LexError>`, `tokenize_iter(&str)`, `tokenize_borrowed(&str) -> Result<Vec<BorrowedToken<'_>>, LexError>`
- Utilities: `LineMap` for byte → (line, col) conversion (see the sketch after this list)
- Errors: `LexError`, `LexErrorKind`
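`LineMap` is also useful outside of error reporting; a small sketch that pairs the streaming iterator with `LineMap` to print each token's line and column:

```rust
use sentience_tokenize::{tokenize_iter, LineMap};

fn main() {
    let src = "let x = 1\nlet y = 2";
    let map = LineMap::new(src);
    for item in tokenize_iter(src) {
        let t = item.unwrap();
        // Convert the token's starting byte offset into a (line, col) pair.
        let (line, col) = map.to_line_col(t.span.start);
        println!("{line}:{col} {:?}", t.kind);
    }
}
```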
Filtering identifiers with the streaming iterator:

```rust
use sentience_tokenize::{tokenize_iter, TokenKind};

fn main() {
    for tok in tokenize_iter("let x = 1.2e-3") {
        let t = tok.unwrap();
        if let TokenKind::Ident(name) = &t.kind {
            println!("ident: {} @{}..{}", name, t.span.start, t.span.end);
        }
    }
}
```

Matching a borrowed string token:

```rust
use sentience_tokenize::{tokenize_borrowed, BorrowedTokenKind};

fn main() {
    let toks = tokenize_borrowed("let x = \"hi\" 123").unwrap();
    match toks[3].kind {
        BorrowedTokenKind::String(s) => assert_eq!(s, "hi"),
        _ => unreachable!(),
    }
}
```

Add to Cargo.toml:
```toml
[dependencies]
sentience-tokenize = "0.2.3"
```

```rust
use sentience_tokenize::tokenize;

fn main() {
    let code = r#"
// sample
let rule greet(name) = "hi, " + name
if true and false then x = 1 else x = 2;
"#;
    let toks = tokenize(code).unwrap();
    for t in toks {
        println!("{:?} @{}..{}", t.kind, t.span.start, t.span.end);
    }
}
```

Output:

```text
Let @18..21
Rule @22..26
Ident("greet") @27..32
LParen @32..33
Ident("name") @33..37
RParen @37..38
Eq @39..40
String("hi, ") @41..47
Plus @48..49
Ident("name") @50..54
...
```

Run the tests, examples, and benchmarks:

```sh
cargo test
cargo run --example basic
cargo run --example pretty
echo 'let rule greet(n) = "hi, " + n' | cargo run --example stream
cargo bench
```

Fuzzing is supported via cargo-fuzz (optional).
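A sketch of what the `tokenize` fuzz target might look like (the actual `fuzz/fuzz_targets/tokenize.rs` in the repository may differ):

```rust
// fuzz/fuzz_targets/tokenize.rs (illustrative sketch)
#![no_main]
use libfuzzer_sys::fuzz_target;
use sentience_tokenize::tokenize;

fuzz_target!(|data: &[u8]| {
    // The lexer takes &str, so only feed it valid UTF-8.
    if let Ok(src) = std::str::from_utf8(data) {
        // Tokenizing must never panic; returning a LexError is fine.
        let _ = tokenize(src);
    }
});
```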
To set up and run the fuzzer:

```sh
cargo install cargo-fuzz
cargo fuzz run tokenize -- -runs=1000
```

- Small, standalone lexer: no macros, no regexes.
- Useful as a foundation for parsers, DSLs, or interpreters.
- Explicit spans for better error reporting.
For more context and design motivation, see my blog post:
Designing a zero-dependency tokenizer in Rust
MIT © 2025 Nenad Bursać