A fast, accurate CLI tool for counting tokens in LLM model inputs
token-count is a POSIX-style command-line tool that counts tokens for various LLM models. It supports exact tokenization for OpenAI and Google Gemini models (offline), and adaptive estimation for Claude models (with optional API mode for exact counts). Pipe any text in, get token counts out—fast, offline, and accurate.
# OpenAI models (exact, offline)
echo "Hello world" | token-count --model gpt-4
2
# Google Gemini models (exact, offline)
echo 'Hello, Gemini!' | token-count --model gemini
5
# Claude models (estimation, offline)
echo 'Hello, Claude!' | token-count --model claude
4
# From file
token-count --model gpt-4 < document.txt
1842
# With context info
cat prompt.txt | token-count --model claude-sonnet-4-6 -vv
Model: claude-sonnet-4-6 (anthropic-claude)
Tokens: 142
Context window: 1000000 tokens (0.0142% used)
✅ Accurate - Exact tokenization for OpenAI and Google Gemini, adaptive estimation for Claude
✅ Fast - Optimized for speed with embedded tokenizers
✅ Offline - Zero runtime dependencies for OpenAI and Gemini; optional API for Claude
✅ Simple - POSIX-style interface, works like wc or grep
Linux / macOS:
curl -sSfL https://raw.githubusercontent.com/shaunburdick/token-count/main/install.sh | bash
Homebrew (macOS / Linux):
brew install shaunburdick/tap/token-count
Cargo (All Platforms):
cargo install token-count
Manual Download:
Download pre-built binaries from GitHub Releases.
For detailed installation instructions, troubleshooting, and platform-specific guidance, see INSTALL.md.
- Platform: Linux x86_64, macOS (Intel/Apple Silicon), Windows x86_64
- Runtime: No dependencies (static binary)
- Build from source: Rust 1.86.0 or later, CMake 3.10+ (for gemini-tokenizer SentencePiece dependency)
# Default model (gpt-3.5-turbo)
echo "Hello world" | token-count
2
# Specific model
echo "Hello world" | token-count --model gpt-4
2
# From file
token-count --model gpt-4 < input.txt
1842
# Piped from another command
cat README.md | token-count --model gpt-4o
3521
# Use canonical name
token-count --model gpt-4 < input.txt
# Use alias (case-insensitive)
token-count --model gpt4 < input.txt
token-count --model GPT-4 < input.txt
# With provider prefix
token-count --model openai/gpt-4 < input.txt
# Level 0 (default) - just the token count
echo "Hello world" | token-count
2
# Level 1 (-v) - model info and token count
echo "Hello world" | token-count -v
Model: gpt-3.5-turbo (cl100k_base)
Tokens: 2
# Level 2 (-vv) - add context window usage percentage
echo "Hello world" | token-count -vv
Model: gpt-3.5-turbo (cl100k_base)
Tokens: 2
Context window: 16385 tokens (0.0122% used)
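The context-window figure is a straightforward ratio of tokens counted to the model's window size. A minimal sketch of the computation (the helper name is illustrative, not part of the tool):

```python
def context_usage(tokens: int, context_window: int) -> str:
    # Percentage of the model's context window consumed by this input
    pct = tokens / context_window * 100
    return f"Context window: {context_window} tokens ({pct:.4f}% used)"

print(context_usage(2, 16385))
```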
# Level 3 (-vvv) - add token IDs and decoded text (debug mode)
echo "Hello world" | token-count -vvv
Model: gpt-3.5-turbo (cl100k_base)
Tokens: 2
Context window: 16385 tokens (0.0122% used)
Token IDs: [9906, 1917]
Decoded tokens:
[0] 9906 → "Hello"
[1] 1917 → " world"
Debug Mode Features (-vvv):
- Shows token IDs for the first 10 tokens
- Displays decoded text for each token
- Works with OpenAI and Gemini models (exact tokenization)
- Claude models show "estimation-based" message (no real token IDs)
- Input size limited to 50KB in debug mode (prevents stack overflow)
- Larger inputs gracefully degrade with a warning message
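The graceful-degradation logic can be pictured as follows; this is a sketch in Python, not the actual Rust implementation, with the constant mirroring the documented 50KB cap:

```python
DEBUG_INPUT_LIMIT = 50 * 1024  # documented 50KB cap for -vvv mode

def debug_mode_allowed(input_bytes: bytes) -> bool:
    # Over the cap: warn and fall back to count-only output
    if len(input_bytes) > DEBUG_INPUT_LIMIT:
        print("warning: input exceeds 50KB; showing token count only")
        return False
    return True
```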
# List all supported models
token-count --list-models
# Output:
# Supported models:
#
# gpt-3.5-turbo
# Encoding: cl100k_base
# Context window: 16385 tokens
# Aliases: gpt-3.5, gpt35, gpt-35-turbo, openai/gpt-3.5-turbo
#
# gpt-4
# Encoding: cl100k_base
# Context window: 128000 tokens
# Aliases: gpt4, openai/gpt-4
# ...
# Show help
token-count --help
# Show version
token-count --version
| Model | Encoding | Context Window | Aliases |
|---|---|---|---|
| gpt-3.5-turbo | cl100k_base | 16,385 | gpt-3.5, gpt35, gpt-35-turbo |
| gpt-4 | cl100k_base | 128,000 | gpt4 |
| gpt-4-turbo | cl100k_base | 128,000 | gpt4-turbo, gpt-4turbo |
| gpt-4o | o200k_base | 128,000 | gpt4o |
| Model | Context Window | Aliases | Estimation Mode |
|---|---|---|---|
| claude-opus-4-6 | 1,000,000 | opus, opus-4-6, opus-4.6 | ±10% accuracy |
| claude-sonnet-4-6 | 1,000,000 | claude, sonnet, sonnet-4-6, sonnet-4.6 | ±10% accuracy |
| claude-haiku-4-5 | 200,000 | haiku, haiku-4-5, haiku-4.5 | ±10% accuracy |
| Model | Encoding | Context Window | Aliases |
|---|---|---|---|
| gemini-2.5-pro | gemini-gemma3 | 1,000,000 | gemini-pro, gemini-2-pro, gemini-2.5 |
| gemini-2.5-flash | gemini-gemma3 | 1,000,000 | gemini, gemini-flash, gemini-2-flash |
| gemini-2.5-flash-lite | gemini-gemma3 | 1,000,000 | gemini-lite, gemini-2-lite, gemini-2.5-lite |
| gemini-3-pro-preview | gemini-gemma3 | 1,000,000 | gemini-3-pro, gemini-3 |
Note: The gemini alias defaults to gemini-2.5-flash, the recommended general-purpose model.
Claude Tokenization Modes:
Offline Estimation (Default) - No API key needed:
# Fast offline estimation using adaptive content-type detection
echo 'Hello, Claude!' | token-count --model claude
4
Exact API Mode (Optional) - Requires ANTHROPIC_API_KEY:
# Exact count via Anthropic API (requires consent)
export ANTHROPIC_API_KEY="sk-ant-..."
echo 'Hello, Claude!' | token-count --model claude --accurate
# Prompts: "This will send your input to Anthropic's API... Proceed? (y/N)"
# Output: 8
# Skip prompt for automation
cat file.txt | token-count --model claude --accurate -y
How Claude Estimation Works:
- Detects content type (code vs. prose) using punctuation and keyword analysis
- Code: 3.0 chars/token (lots of {}[](); and keywords)
- Prose: 4.5 chars/token (natural language)
- Mixed: 3.75 chars/token (markdown + code blocks)
- Target: ±10% accuracy for typical inputs
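The heuristic above can be sketched roughly as follows. Only the chars-per-token ratios come from the documentation; the punctuation-density thresholds here are illustrative guesses, not the tool's actual detection logic:

```python
def estimate_claude_tokens(text: str) -> int:
    # Count code-style punctuation to classify the content type
    code_chars = sum(text.count(c) for c in "{}[]();")
    density = code_chars / max(len(text), 1)
    if density > 0.05:    # punctuation-heavy: treat as code
        ratio = 3.0
    elif density > 0.02:  # some code markers: treat as mixed
        ratio = 3.75
    else:                 # natural-language prose
        ratio = 4.5
    return max(1, round(len(text) / ratio))
```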
All models support:
- Case-insensitive names (e.g., GPT-4, gpt-4, Gpt-4, GEMINI)
- Provider prefix (e.g., openai/gpt-4, anthropic/claude-sonnet-4-6, google/gemini)
token-count provides helpful error messages with suggestions:
# Unknown model with fuzzy suggestions
$ echo "test" | token-count --model gpt5
Error: Unknown model: 'gpt5'. Did you mean: gpt-4, gpt-4o?
# Typo correction
$ echo "test" | token-count --model gpt4-tubro
Error: Unknown model: 'gpt4-tubro'. Did you mean: gpt-4-turbo?
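The tool uses Levenshtein distance (via strsim) to pick these suggestions. A comparable sketch using Python's standard library, with a trimmed model list for illustration:

```python
import difflib

KNOWN = ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo", "gpt-4o"]

def suggest(name: str) -> list[str]:
    # Return the closest known model names by string similarity
    return difflib.get_close_matches(name.lower(), KNOWN, n=2, cutoff=0.6)

print(suggest("gpt4-tubro"))
```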
# Invalid UTF-8
$ token-count < invalid.bin
Error: Input contains invalid UTF-8 at byte 0
Exit codes:
- 0 - Success
- 1 - I/O error or invalid UTF-8
- 2 - Unknown model name
# Clone repository
git clone https://github.com/shaunburdick/token-count
cd token-count
# Run tests
cargo test
# Run benchmarks
cargo bench
# Build release binary
cargo build --release
# Check code quality
cargo clippy -- -D warnings
cargo fmt --check
# Security audit
cargo audit
# All tests (181 tests)
cargo test
# Specific test suite
cargo test --test model_aliases
cargo test --test verbosity
cargo test --test performance
# With output
cargo test -- --nocapture
token-count/
├── src/
│ ├── lib.rs # Public library API
│ ├── main.rs # Binary entry point
│ ├── cli/ # CLI argument parsing
│ │ ├── args.rs # Clap definitions
│ │ ├── input.rs # Stdin reading
│ │ └── mod.rs
│ ├── tokenizers/ # Tokenization engine
│ │ ├── openai.rs # OpenAI tokenizer (tiktoken)
│ │ ├── claude/ # Claude tokenizer
│ │ │ ├── mod.rs # Main tokenizer
│ │ │ ├── estimation.rs # Adaptive estimation
│ │ │ ├── api_client.rs # Anthropic API
│ │ │ └── models.rs # Model definitions
│ │ ├── registry.rs # Model registry
│ │ └── mod.rs
│ ├── api/ # API utilities
│ │ ├── consent.rs # Interactive consent prompt
│ │ └── mod.rs
│ ├── output/ # Output formatters
│ │ ├── simple.rs # Simple formatter (level 0)
│ │ ├── basic.rs # Basic formatter (level 1)
│ │ ├── verbose.rs # Verbose formatter (level 2)
│ │ ├── debug.rs # Debug formatter (level 3+)
│ │ └── mod.rs
│ └── error.rs # Error types
├── tests/ # Integration tests
│ ├── fixtures/ # Test data
│ ├── model_aliases.rs
│ ├── verbosity.rs
│ ├── performance.rs
│ ├── error_handling.rs
│ ├── end_to_end.rs
│ ├── claude_estimation.rs # Claude estimation tests
│ ├── claude_api.rs # Claude API tests
│ └── ...
├── benches/ # Performance benchmarks
│ └── tokenization.rs
└── .github/
└── workflows/
└── ci.yml # CI configuration
- Maximum input size: 100MB per invocation
- Debug mode input limit: 50KB (for -vvv flag with token ID display)
- Memory usage: Typically <100MB, peaks at ~2x input size
- CPU usage: Single-threaded, 100% of one core during processing
Stack Overflow with Large Inputs in Debug Mode: The underlying tiktoken-rs library can experience stack overflow when processing large inputs in debug mode (-vvv flag). To prevent crashes, debug mode automatically limits input size to 50KB and gracefully degrades to token-count-only mode for larger inputs.
- Mitigation: 50KB input size limit in debug mode with user-friendly warning
- Impact: Only affects -vvv flag; normal tokenization works fine with large files
- Status: Protection implemented; tracked upstream in tiktoken-rs (#327, #245, #400)
For CI/CD Pipelines:
# Limit concurrent processes to avoid resource exhaustion
ulimit -n 1024 # Limit file descriptors
ulimit -v $((500 * 1024)) # Limit virtual memory to 500MB
echo "text" | token-count --model gpt-4
For Untrusted Input:
# Use timeout to prevent hangs
timeout 30s token-count --model gpt-4 < input.txt
For Large Files:
# Monitor memory usage
/usr/bin/time -v token-count --model gpt-4 < large-file.txt
- Last audit: 2026-03-14
- Findings: 0 critical, 0 high, 0 medium vulnerabilities
- Dependencies: Audited with cargo audit
Run security checks:
cargo audit # Check for known vulnerabilities
cargo clippy -- -D warnings # Strict linting
If you discover a security vulnerability, please email hello@burdick.dev (or open a private security advisory on GitHub). Do not open public issues for security concerns.
From our Constitution:
- POSIX Simplicity - Behaves like standard Unix utilities
- Accuracy Over Speed - Exact tokenization for supported models
- Zero Runtime Dependencies - Single offline binary
- Fail Fast with Clear Errors - No silent failures
- Semantic Versioning - Predictable upgrade paths
- Language: Rust 1.86.0+ (stable)
- CLI Parsing: clap 4.6.0+ (derive API)
- Tokenization:
- tiktoken-rs 0.9.1+ (OpenAI models - offline)
- Adaptive estimation algorithm (Claude models - offline)
- Anthropic API via reqwest 0.12+ (Claude accurate mode - optional)
- Async Runtime: tokio 1.0+ (for API calls)
- Error Handling: anyhow 1.0.102+, thiserror 1.0+
- Fuzzy Matching: strsim 0.11+ (Levenshtein distance)
- Testing: 181 tests with criterion benchmarks
- Library-first design: Core logic in lib.rs, thin binary wrapper
- Trait-based abstractions: Extensible for future tokenizers
- Strategy pattern: Multiple output formatters
- Registry pattern: Model configuration with lazy initialization
- Streaming support: 64KB chunks for large inputs
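The streaming approach can be sketched as reading input in fixed-size chunks rather than one oversized read call; this Python sketch mirrors the documented 64KB chunk size but is not the tool's Rust code:

```python
import io

CHUNK = 64 * 1024  # 64KB chunks, per the architecture notes

def read_all_chunked(stream) -> str:
    # Accumulate input chunk by chunk instead of a single large read
    parts = []
    while True:
        chunk = stream.read(CHUNK)
        if not chunk:
            break
        parts.append(chunk)
    return "".join(parts)

# quick self-check with an in-memory stream larger than one chunk
sample = read_all_chunked(io.StringIO("token " * 20000))
```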
Contributions are welcome! This project follows specification-driven development.
See CONTRIBUTING.md for detailed instructions.
Quick start:
git clone https://github.com/shaunburdick/token-count
cd token-count
cargo test
cargo clippy
- No disabled lint rules - Fix code to comply, don't silence warnings
- 100% type safety - No any types or suppressions
- All public APIs documented - With examples
- Test coverage - All user stories covered
- Zero clippy warnings - Strict linting enforced
MIT License - see LICENSE for details.
Built with:
- tiktoken-rs - Rust tiktoken implementation
- clap - Command line argument parser
- spec-kit - Specification-driven development
Special thanks to:
- OpenAI for open-sourcing tiktoken
- The Rust community for excellent tooling
Status: ✅ v0.4.0 Complete (Debug Mode) | Version: 0.4.0
Author: Shaun Burdick