A high-performance, embedded document database written from scratch in Rust.
No server. No config files. No migrations. Open a file and go.
FerrumDB is an embedded key-value database engine built in Rust, designed for applications that need fast local persistence without the overhead of a server process. Inspired by Bitcask, it implements a custom binary log format, in-memory indexing, AES-256-GCM encryption at rest, atomic transactions, and a live web dashboard, all in ~1,000 lines of safe, async Rust.
It ships Python bindings via PyO3 (pip install ferrumdb) and Node.js bindings via NAPI-RS (npm install ferrumdb).
| Feature | Detail |
|---|---|
| O(1) reads & writes | Append-only log + in-memory HashMap index rebuilt on startup |
| Native JSON documents | Store any structured data; values are serde_json::Value |
| Secondary indexing | O(1) field lookups via create_index(), maintained live on writes |
| AES-256-GCM encryption | Per-block encryption with random nonces; data is protected at rest |
| Atomic transactions | All-or-nothing batches written as a single log entry |
| Configurable fsync policy | Always / Periodic(ms) / Never: tune durability vs. throughput |
| Ferrum Studio | Built-in web dashboard (Axum) at localhost:7474 |
| Python bindings | pip install ferrumdb; no Rust toolchain required |
| Crash resilience | Log compaction via atomic rename(); incomplete records are skipped |
| Observability | Lock-free atomic metrics: ops/sec, uptime, GET/SET/DELETE counts |
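To make the "maintained live on writes" idea concrete, here is a minimal, std-only sketch of a secondary index. It is not FerrumDB's implementation: the `Store` type, its methods, and the `doc` helper are hypothetical, and documents are flat string maps rather than `serde_json::Value` so the example needs no external crates.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory store: a primary map plus secondary indexes.
/// Documents are flat field -> string maps to keep the sketch std-only.
struct Store {
    docs: HashMap<String, HashMap<String, String>>,
    // indexed field name -> (field value -> keys holding that value)
    indexes: HashMap<String, HashMap<String, HashSet<String>>>,
}

impl Store {
    fn new() -> Self {
        Store { docs: HashMap::new(), indexes: HashMap::new() }
    }

    /// Build an index for `field` over the documents already stored.
    fn create_index(&mut self, field: &str) {
        let mut idx: HashMap<String, HashSet<String>> = HashMap::new();
        for (key, doc) in &self.docs {
            if let Some(v) = doc.get(field) {
                idx.entry(v.clone()).or_default().insert(key.clone());
            }
        }
        self.indexes.insert(field.to_string(), idx);
    }

    /// Insert or overwrite a document, keeping every index in sync.
    fn set(&mut self, key: &str, doc: HashMap<String, String>) {
        self.remove_from_indexes(key);
        for (field, idx) in self.indexes.iter_mut() {
            if let Some(v) = doc.get(field) {
                idx.entry(v.clone()).or_default().insert(key.to_string());
            }
        }
        self.docs.insert(key.to_string(), doc);
    }

    /// Remove a document and its index entries.
    fn delete(&mut self, key: &str) {
        self.remove_from_indexes(key);
        self.docs.remove(key);
    }

    /// Drop index entries that point at the document currently stored at `key`.
    fn remove_from_indexes(&mut self, key: &str) {
        if let Some(old) = self.docs.get(key) {
            for (field, idx) in self.indexes.iter_mut() {
                if let Some(v) = old.get(field) {
                    let now_empty = match idx.get_mut(v) {
                        Some(keys) => {
                            keys.remove(key);
                            keys.is_empty()
                        }
                        None => false,
                    };
                    if now_empty {
                        idx.remove(v);
                    }
                }
            }
        }
    }

    /// Exact-match lookup: one hash probe, then the matching keys (sorted).
    fn find(&self, field: &str, value: &str) -> Vec<String> {
        let mut keys: Vec<String> = self
            .indexes
            .get(field)
            .and_then(|idx| idx.get(value))
            .map(|set| set.iter().cloned().collect())
            .unwrap_or_default();
        keys.sort();
        keys
    }
}

/// Helper to build a flat document from string pairs.
fn doc(pairs: &[(&str, &str)]) -> HashMap<String, String> {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

fn main() {
    let mut db = Store::new();
    db.set("user:1", doc(&[("name", "alice"), ("role", "admin")]));
    db.create_index("role");
    db.set("user:2", doc(&[("name", "bob"), ("role", "user")]));
    println!("admins: {:?}", db.find("role", "admin"));
}
```

Because the index maps each value to a set of keys, only exact matches are cheap; this is the same property that rules out range queries in the limitations listed later.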
FerrumDB was built from the ground up without an existing storage library. Every layer is custom:
┌─────────────────────────────────────────┐
│              FerrumDB API               │  ← High-level Rust & Python interface
├─────────────────────────────────────────┤
│              StorageEngine              │  ← Core engine: index + log management
│  ┌─────────────────┐  ┌──────────────┐  │
│  │ In-Memory Index │  │  Secondary   │  │
│  │  HashMap<K,V>   │  │   Indexes    │  │
│  │  RwLock async   │  │ HashMap<F,V> │  │
│  └────────┬────────┘  └──────────────┘  │
│           │ append / reads              │
│  ┌────────▼────────────────────────┐    │
│  │     Append-Only Log (AOF)       │    │  ← Bitcask-inspired binary format
│  │  [len: u64][JSON bytes]...      │    │    length-prefixed, sequential
│  └────────┬────────────────────────┘    │
├───────────┼─────────────────────────────┤
│  ┌────────▼────────────────────────┐    │
│  │     AsyncFileSystem trait       │    │  ← Pluggable I/O abstraction
│  │  ┌──────────┐  ┌─────────────┐  │    │
│  │  │  Disk    │  │  Encrypted  │  │    │  ← Decorator pattern
│  │  │ (tokio)  │  │  (AES-GCM)  │  │    │    random nonce per block
│  │  └──────────┘  └─────────────┘  │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘
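The bottom layer of the diagram (one trait, interchangeable backends, a wrapper that decorates another backend) can be sketched roughly as follows. This is a simplified, synchronous, std-only illustration: the `FileSystem` trait, `MemoryFs`, `XorFs`, and `store_and_load` are hypothetical names, not FerrumDB's actual `AsyncFileSystem` API, and the XOR mask is only a placeholder for the real AES-256-GCM layer. It provides no security.

```rust
use std::io;

/// Hypothetical, synchronous stand-in for an I/O abstraction trait.
trait FileSystem {
    fn append(&mut self, data: &[u8]) -> io::Result<()>;
    fn read_all(&self) -> io::Result<Vec<u8>>;
}

/// In-memory backend, so engine tests never have to touch disk.
#[derive(Default)]
struct MemoryFs {
    bytes: Vec<u8>,
}

impl FileSystem for MemoryFs {
    fn append(&mut self, data: &[u8]) -> io::Result<()> {
        self.bytes.extend_from_slice(data);
        Ok(())
    }

    fn read_all(&self) -> io::Result<Vec<u8>> {
        Ok(self.bytes.clone())
    }
}

/// Decorator: transforms bytes, then delegates to any inner FileSystem.
/// XOR with a fixed byte is a placeholder and NOT encryption.
struct XorFs<F: FileSystem> {
    inner: F,
    mask: u8,
}

impl<F: FileSystem> FileSystem for XorFs<F> {
    fn append(&mut self, data: &[u8]) -> io::Result<()> {
        let masked: Vec<u8> = data.iter().map(|b| b ^ self.mask).collect();
        self.inner.append(&masked)
    }

    fn read_all(&self) -> io::Result<Vec<u8>> {
        let raw = self.inner.read_all()?;
        Ok(raw.iter().map(|b| b ^ self.mask).collect())
    }
}

/// Engine-side code depends only on the trait, never on a concrete backend.
fn store_and_load<F: FileSystem>(fs: &mut F, payload: &[u8]) -> io::Result<Vec<u8>> {
    fs.append(payload)?;
    fs.read_all()
}

fn main() -> io::Result<()> {
    let mut plain = MemoryFs::default();
    let mut wrapped = XorFs { inner: MemoryFs::default(), mask: 0x5a };

    let a = store_and_load(&mut plain, b"hello")?;
    let b = store_and_load(&mut wrapped, b"hello")?;

    // Both backends round-trip the same payload...
    assert_eq!(a, b);
    // ...but the wrapped backend never stores the plaintext bytes.
    assert_ne!(wrapped.inner.bytes, b"hello".to_vec());
    println!("round-trip ok through both backends");
    Ok(())
}
```

The point of the pattern is that `store_and_load` is written once and works unchanged whether the bytes end up in memory, on disk, or behind a transforming wrapper.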
Key design decisions:
- Bitcask AOF: Writes are append-only (fast, sequential I/O). The in-memory index is the source of truth for reads. On startup, the engine replays the log to rebuild state, which makes recovery deterministic and crash-safe.
- Pluggable `AsyncFileSystem` trait: The I/O layer is fully abstracted. `DiskFileSystem` and `EncryptedFileSystem` implement the same trait and are swapped via the decorator pattern. This makes the storage engine testable without touching disk.
- AES-256-GCM per block: Each binary record is individually encrypted with a cryptographically random 12-byte nonce. The nonce is stored alongside the ciphertext. GCM authentication tags detect any file tampering.
- Tokio async throughout: Reads use `RwLock` (many concurrent readers); writes serialize via the write lock. Metrics use `AtomicU64`, so there is no lock contention on the hot path.
- Log compaction: A background `compact()` rewrites only live (non-expired, non-deleted) records to a temp file, then swaps it in via `rename()`. The rename is atomic on POSIX filesystems, so a crash during compaction leaves either the old log or the new one intact.
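The replay-on-startup step in the first bullet can be sketched as follows. This is a minimal, std-only illustration and not FerrumDB's actual on-disk format: the little-endian length prefix, the tab-separated `S`/`D` payload encoding, and the function names are all assumptions made for the example (the real engine stores JSON bytes).

```rust
use std::collections::HashMap;

/// Append one length-prefixed record: [len: u64 little-endian][payload].
fn append_record(log: &mut Vec<u8>, payload: &[u8]) {
    log.extend_from_slice(&(payload.len() as u64).to_le_bytes());
    log.extend_from_slice(payload);
}

/// Hypothetical payload encoding for this sketch: "S\tkey\tvalue" or "D\tkey".
fn set_payload(key: &str, value: &str) -> Vec<u8> {
    format!("S\t{key}\t{value}").into_bytes()
}

fn delete_payload(key: &str) -> Vec<u8> {
    format!("D\t{key}").into_bytes()
}

/// Replay the log from the start, rebuilding the key -> value index.
/// A truncated trailing record (for example from a crash mid-write) ends the replay.
fn replay(log: &[u8]) -> HashMap<String, String> {
    let mut index = HashMap::new();
    let mut pos = 0usize;
    while log.len() - pos >= 8 {
        let mut len_bytes = [0u8; 8];
        len_bytes.copy_from_slice(&log[pos..pos + 8]);
        let len = u64::from_le_bytes(len_bytes) as usize;
        pos += 8;
        if log.len() - pos < len {
            break; // incomplete record: ignore it
        }
        let payload = String::from_utf8_lossy(&log[pos..pos + len]).into_owned();
        pos += len;

        let mut parts = payload.splitn(3, '\t');
        match (parts.next(), parts.next(), parts.next()) {
            // Later records win, so the newest value for a key survives.
            (Some("S"), Some(k), Some(v)) => {
                index.insert(k.to_string(), v.to_string());
            }
            (Some("D"), Some(k), None) => {
                index.remove(k);
            }
            _ => {} // unknown record type: skip
        }
    }
    index
}

fn main() {
    let mut log = Vec::new();
    append_record(&mut log, &set_payload("user:1", "{\"name\":\"alice\"}"));
    append_record(&mut log, &set_payload("user:2", "{\"name\":\"bob\"}"));
    append_record(&mut log, &delete_payload("user:2"));

    // Simulate a crash: a header promising 100 bytes, followed by only 3.
    log.extend_from_slice(&100u64.to_le_bytes());
    log.extend_from_slice(b"S\tx");

    let index = replay(&log);
    println!("{} live key(s): {:?}", index.len(), index.get("user:1"));
}
```

Replaying the same bytes always yields the same index, which is what makes recovery deterministic: the log is the only durable state, and the index is derived from it.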
| Component | Technology |
|---|---|
| Language | Rust (2021 edition) |
| Async runtime | Tokio |
| Serialization | serde + serde_json |
| Encryption | aes-gcm (AES-256-GCM) |
| Web dashboard | Axum |
| Python bindings | PyO3 (via maturin) |
| Benchmarking | Criterion |
| Testing | tokio::test + tempfile |
Benchmarked with Criterion on an append-only log with FsyncPolicy::Never (max throughput):
| Operation | Performance |
|---|---|
| Single SET | ~1–3 µs |
| Single GET (in-memory) | < 1 µs |
| 1,000 sequential SETs | ~2–5 ms |
| 100 concurrent SETs (Tokio tasks) | ~3–8 ms |
| Secondary index query (100 docs) | < 1 µs |
Run benchmarks yourself:
cargo bench
FerrumDB is available on PyPI. Install it using pip:
pip install ferrumdb
from ferrumdb import FerrumDB
# Zero-setup: creates myapp.db if it doesn't exist
db = FerrumDB.open("myapp.db")
# Store any JSON-serializable value
db.set("user:1", '{"name": "alice", "role": "admin", "score": 99}')
db.set("user:2", '{"name": "bob", "role": "user", "score": 45}')
# Read back
print(db.get("user:1")) # {"name": "alice", "role": "admin", "score": 99}
print(db.count()) # 2
print(db.keys()) # ["user:1", "user:2"]
# Secondary indexing: O(1) field lookups
db.create_index("role")
admins = db.find("role", '"admin"') # => ["user:1"]
# Delete
db.delete("user:2")
FerrumDB is available on crates.io. Add it to your project:
cargo add ferrumdb
cargo add tokio -F full
cargo add serde_json
Or manually add to your Cargo.toml:
[dependencies]
ferrumdb = "0.1.1"
tokio = { version = "1", features = ["full"] }
serde_json = "1"
use ferrumdb::{FerrumDB, Config, Transaction, FsyncPolicy};
use serde_json::json;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
// Standard open (zero-setup, uses ferrum.db)
let db = FerrumDB::open_default().await?;
// Store documents
db.set("user:1".into(), json!({"name": "alice", "role": "admin"})).await?;
// Secondary index query
db.create_index("role").await?;
let admins = db.find("role", &json!("admin")).await;
// Atomic transaction
let tx = Transaction::new()
.set("k1".into(), json!({"tag": "blue"}))
.set("k2".into(), json!({"tag": "red"}))
.delete("k1".into());
db.commit(tx).await?;
// Encrypted database (AES-256-GCM, random nonce per block)
let key: [u8; 32] = *b"my_super_secret_key_32_bytes_!!?";
let db_enc = FerrumDB::open(
Config::new()
.with_encryption(key)
.with_fsync_policy(FsyncPolicy::Periodic(std::time::Duration::from_millis(100)))
).await?;
Ok(())
}
Ferrum Studio is a built-in web dashboard to browse, query, and inspect your database with real-time metrics.
Option 1: Via the REPL (auto-launches when you cargo run):
cargo run --release
# Ferrum Studio → http://localhost:7474
Option 2: Standalone CLI (works with any .db file, any language):
cargo install ferrumdb-cli
ferrumdb web myapp.db # opens http://localhost:7474
ferrumdb web myapp.db --port 8080 # custom port
ferrumdb info myapp.db # show key count & file size
ferrumdb compact myapp.db # remove deleted/expired entries
The CLI works regardless of whether you use the Rust, Python, or Node.js bindings: just point it at your .db file.
cargo run
cargo run -- --fsync=always # strongest durability
| Command | Description |
|---|---|
| `SET <key> <json>` | Store a document |
| `GET <key>` | Retrieve and pretty-print |
| `DELETE <key>` | Remove a key |
| `KEYS` | List all keys |
| `COUNT` | Total number of entries |
| `INDEX <field>` | Create secondary index on JSON field |
| `FIND <field> <value>` | Query by indexed field |
| `HELP` | Show commands + live session metrics |
Full working examples for each language are in the examples/ directory:
| Example | Language | Description | Run |
|---|---|---|---|
| rust-example | Rust | Task Manager: CRUD, secondary indexes, transactions, TTL | cd examples/rust-example && cargo run |
| python-example | Python | Contact Book: CRUD, secondary indexes, transactions | cd examples/python-example && python main.py |
| node-example | Node.js | Note Taker: CRUD, secondary indexes, transactions | cd examples/node-example && node main.mjs |
Each example is self-contained and demonstrates the core FerrumDB API in its respective language.
FerrumDB optimizes for simplicity and embedded use cases. Understand the trade-offs:
| Limitation | Reason | Workaround |
|---|---|---|
| Entire index in RAM | O(1) reads require full HashMap in memory | Best for databases < 1 GB |
| Single-writer only | Append-only log has no cross-process lock protocol | One process per DB file |
| No range queries | Secondary indexes store exact value matches | Use Tantivy for range scans |
| No nested field indexes | Indexes only top-level JSON keys | Flatten documents before storing |
| Blocking compaction | Rewrites the entire log while holding the write lock | Schedule during low-traffic periods |
| No WAL / MVCC | Simpler append-only design | Accept occasional contention |
| No replication | Single-file, embedded design | Handle replication at app level |
Best for: local-first apps, desktop tools, embedded caching, session/config stores, write-heavy workloads.
Not for: large datasets (> 1 GB), complex queries (JOINs, aggregations), multi-writer or distributed scenarios.
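The "Blocking compaction" row follows from how a rewrite-and-swap compaction works. The sketch below shows the general technique under stated assumptions: the `compact` signature, the tab-separated record layout, and the `.compact.tmp` suffix are hypothetical and not taken from FerrumDB's source. In a real engine the write lock would be held for the whole call, which is why writers stall while it runs.

```rust
use std::collections::HashMap;
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Rewrite `log_path` so it holds only the live entries in `index`.
/// The record layout ([len: u64 LE]["key\tvalue"]) is an assumption of this sketch.
fn compact(log_path: &Path, index: &HashMap<String, String>) -> io::Result<()> {
    // Keep the temp file in the same directory so the rename stays on one filesystem.
    let tmp_path = log_path.with_extension("compact.tmp");
    {
        let mut tmp = File::create(&tmp_path)?;
        for (key, value) in index {
            let payload = format!("{key}\t{value}");
            tmp.write_all(&(payload.len() as u64).to_le_bytes())?;
            tmp.write_all(payload.as_bytes())?;
        }
        // Flush the new file to disk before it replaces the old one.
        tmp.sync_all()?;
    }
    // After the rename, the path refers to the compacted log; a crash before
    // this point leaves the old log untouched.
    fs::rename(&tmp_path, log_path)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("ferrum_sketch_{}.db", std::process::id()));

    // Stand-in for an old log full of overwritten and deleted records.
    fs::write(&path, vec![0u8; 4096])?;

    let mut index = HashMap::new();
    index.insert("user:1".to_string(), "{\"name\":\"alice\"}".to_string());

    compact(&path, &index)?;
    println!("compacted log is {} bytes", fs::metadata(&path)?.len());
    fs::remove_file(&path)
}
```

The cost is proportional to the amount of live data, not to the number of stale records, so compaction is cheapest right after a burst of overwrites or deletes.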
set FERRUMDB_FSYNC=always # sync every write (safest)
set FERRUMDB_FSYNC=never # never sync (fastest)
set FERRUMDB_FSYNC=periodic:200 # sync every 200 ms
let db = FerrumDB::open_from_env().await?;
See CHANGELOG.md for a full list of changes per version.
MIT. See LICENSE for details.
Built by Muhammad Usman