MuhammadUsmanGM/FerrumDB

⚑ FerrumDB

A high-performance, embedded document database written from scratch in Rust.
No server. No config files. No migrations. Open a file and go.


What is FerrumDB?

FerrumDB is an embedded key-value database engine built in Rust, designed for applications that need fast local persistence without the overhead of a server process. It is inspired by Bitcask and implements a custom binary log format, in-memory indexing, AES-256-GCM encryption at rest, atomic transactions, and a live web dashboard, all in roughly 1,000 lines of safe, async Rust.

It ships Python bindings via PyO3 (pip install ferrumdb) and Node.js bindings via NAPI-RS (npm install ferrumdb).


🌟 Features

| Feature | Detail |
| --- | --- |
| ⚡ O(1) reads & writes | Append-only log + in-memory HashMap index rebuilt on startup |
| 📄 Native JSON documents | Store any structured data; values are serde_json::Value |
| 🔍 Secondary indexing | O(1) field lookups via create_index(), maintained live on writes |
| 🔐 AES-256-GCM encryption | Per-block encryption with random nonces; data is protected at rest |
| ⚛️ Atomic transactions | All-or-nothing batches written as a single log entry |
| ⏱️ Configurable fsync policy | Always / Periodic(ms) / Never: tune durability vs. throughput |
| 🖥️ Ferrum Studio | Built-in web dashboard (Axum) at localhost:7474 |
| 🐍 Python bindings | pip install ferrumdb, no Rust toolchain required |
| 🛡️ Crash resilience | Log compaction via atomic rename(); incomplete records are skipped |
| 📊 Observability | Lock-free atomic metrics: ops/sec, uptime, GET/SET/DELETE counts |
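
To make the "maintained live on writes" idea concrete, here is a minimal sketch of a secondary index kept next to the primary map. It is not FerrumDB's implementation: `IndexedStore`, `Doc`, and the string-only field values are assumptions chosen for brevity (the feature list says real values are `serde_json::Value`).

```rust
use std::collections::{HashMap, HashSet};

/// A flat document: top-level field name -> value (strings only, for brevity).
type Doc = HashMap<String, String>;

/// Hypothetical store: primary map plus per-field secondary indexes.
#[derive(Default)]
struct IndexedStore {
    docs: HashMap<String, Doc>,
    // field name -> (field value -> keys whose document has that value)
    indexes: HashMap<String, HashMap<String, HashSet<String>>>,
}

impl IndexedStore {
    /// Start indexing `field`, back-filling from documents already stored.
    fn create_index(&mut self, field: &str) {
        let mut by_value: HashMap<String, HashSet<String>> = HashMap::new();
        for (key, doc) in &self.docs {
            if let Some(value) = doc.get(field) {
                by_value.entry(value.clone()).or_default().insert(key.clone());
            }
        }
        self.indexes.insert(field.to_string(), by_value);
    }

    /// Insert or overwrite a document, keeping every index in step.
    fn set(&mut self, key: &str, doc: Doc) {
        self.unindex(key);
        for (field, by_value) in self.indexes.iter_mut() {
            if let Some(value) = doc.get(field) {
                by_value.entry(value.clone()).or_default().insert(key.to_string());
            }
        }
        self.docs.insert(key.to_string(), doc);
    }

    /// Remove a document and its index entries.
    fn delete(&mut self, key: &str) {
        self.unindex(key);
        self.docs.remove(key);
    }

    /// Keys whose `field` equals `value`, sorted for stable output.
    fn find(&self, field: &str, value: &str) -> Vec<String> {
        let mut keys: Vec<String> = self
            .indexes
            .get(field)
            .and_then(|by_value| by_value.get(value))
            .map(|set| set.iter().cloned().collect::<Vec<String>>())
            .unwrap_or_default();
        keys.sort();
        keys
    }

    /// Drop the index entries that point at the current version of `key`.
    fn unindex(&mut self, key: &str) {
        if let Some(old) = self.docs.get(key) {
            for (field, by_value) in self.indexes.iter_mut() {
                if let Some(value) = old.get(field) {
                    if let Some(set) = by_value.get_mut(value) {
                        set.remove(key);
                    }
                }
            }
        }
    }
}
```

A lookup is then two hash probes (field, then value), independent of how many documents are stored; the cost is paid on each write, which has to remove the old index entry before adding the new one.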

🏗️ Architecture

FerrumDB was built from the ground up without an existing storage library. Every layer is custom:

┌─────────────────────────────────────────┐
│              FerrumDB API               │  ← High-level Rust & Python interface
├─────────────────────────────────────────┤
│              StorageEngine              │  ← Core engine: index + log management
│  ┌─────────────────┐  ┌──────────────┐  │
│  │ In-Memory Index │  │ Secondary    │  │
│  │ HashMap<K,V>    │  │ Indexes      │  │
│  │ RwLock async    │  │ HashMap<F,V> │  │
│  └────────┬────────┘  └──────────────┘  │
│           │ append / reads              │
│  ┌────────▼────────────────────────┐    │
│  │   Append-Only Log (AOF)         │    │  ← Bitcask-inspired binary format
│  │   [len: u64][JSON bytes]...     │    │     length-prefixed, sequential
│  └────────┬────────────────────────┘    │
├───────────┼─────────────────────────────┤
│  ┌────────▼────────────────────────┐    │
│  │  AsyncFileSystem trait          │    │  ← Pluggable I/O abstraction
│  │  ┌──────────┐  ┌─────────────┐  │    │
│  │  │   Disk   │  │  Encrypted  │  │    │  ← Decorator pattern
│  │  │  (tokio) │  │  (AES-GCM)  │  │    │     random nonce per block
│  │  └──────────┘  └─────────────┘  │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘

Key design decisions:

  • Bitcask AOF: Writes are append-only (fast, sequential I/O). The in-memory index is the source of truth for reads. On startup, the engine replays the log to rebuild state, which makes recovery deterministic and crash-safe.
  • Pluggable AsyncFileSystem trait: The I/O layer is fully abstracted. DiskFileSystem and EncryptedFileSystem implement the same trait and are swapped via the decorator pattern, so the storage engine can be tested without touching disk.
  • AES-256-GCM per block: Each binary record is individually encrypted with a cryptographically random 12-byte nonce, which is stored alongside the ciphertext. GCM authentication tags detect any tampering with the file.
  • Tokio async throughout: Reads take the RwLock read guard (many concurrent readers); writes serialize on the write lock. Metrics use AtomicU64, so counters add no lock contention on the hot path.
  • Log compaction: A background compact() rewrites only live (non-expired, non-deleted) records to a temp file, then swaps it in via rename(). Because rename() is atomic on POSIX filesystems, a crash leaves either the old log or the new one intact.
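
The append-and-replay idea behind the first bullet can be sketched in a few lines. This is an illustration only, not the crate's code: the little-endian length prefix and the `key=value` / bare-`key` payload layout are assumptions (the README only states that records are `[len: u64][JSON bytes]`), and `append_record`, `read_records`, and `replay` are hypothetical names.

```rust
use std::collections::HashMap;

/// Frame one payload as [len: u64 little-endian][payload bytes].
fn append_record(log: &mut Vec<u8>, payload: &[u8]) {
    log.extend_from_slice(&(payload.len() as u64).to_le_bytes());
    log.extend_from_slice(payload);
}

/// Walk the log front to back, returning every complete payload.
/// A truncated tail (for example after a crash mid-write) ends the scan.
fn read_records(log: &[u8]) -> Vec<&[u8]> {
    let mut records = Vec::new();
    let mut pos = 0usize;
    while log.len() - pos >= 8 {
        let mut len_bytes = [0u8; 8];
        len_bytes.copy_from_slice(&log[pos..pos + 8]);
        let len = u64::from_le_bytes(len_bytes) as usize;
        let start = pos + 8;
        if log.len() - start < len {
            break; // incomplete record: stop replaying here
        }
        records.push(&log[start..start + len]);
        pos = start + len;
    }
    records
}

/// Rebuild the key -> value index by replaying records in order.
/// Assumed payload layout: "key=value" sets a key, "key" alone deletes it.
fn replay(log: &[u8]) -> HashMap<String, String> {
    let mut index = HashMap::new();
    for payload in read_records(log) {
        let text = String::from_utf8_lossy(payload);
        match text.split_once('=') {
            Some((key, value)) => {
                index.insert(key.to_string(), value.to_string());
            }
            None => {
                index.remove(&*text);
            }
        }
    }
    index
}
```

Because later records simply overwrite earlier ones during replay, the log never needs in-place updates; the price is that superseded records stay on disk until compaction removes them.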

⚙️ Technical Stack

| Component | Technology |
| --- | --- |
| Language | Rust (2021 edition) |
| Async runtime | Tokio |
| Serialization | serde + serde_json |
| Encryption | aes-gcm (AES-256-GCM) |
| Web dashboard | Axum |
| Python bindings | PyO3 (via maturin) |
| Benchmarking | Criterion |
| Testing | tokio::test + tempfile |

📊 Performance

Benchmarked with Criterion on an append-only log with FsyncPolicy::Never (max throughput):

| Operation | Performance |
| --- | --- |
| Single SET | ~1–3 µs |
| Single GET (in-memory) | < 1 µs |
| 1,000 sequential SETs | ~2–5 ms |
| 100 concurrent SETs (Tokio tasks) | ~3–8 ms |
| Secondary index query (100 docs) | < 1 µs |

Run benchmarks yourself: cargo bench


🐍 Python Installation & Usage

FerrumDB is available on PyPI. Install it using pip:

pip install ferrumdb

from ferrumdb import FerrumDB

# Zero-setup: creates myapp.db if it doesn't exist
db = FerrumDB.open("myapp.db")

# Store any JSON-serializable value
db.set("user:1", '{"name": "alice", "role": "admin", "score": 99}')
db.set("user:2", '{"name": "bob",   "role": "user",  "score": 45}')

# Read back
print(db.get("user:1"))       # {"name": "alice", "role": "admin", "score": 99}
print(db.count())             # 2
print(db.keys())              # ["user:1", "user:2"]

# Secondary indexing: O(1) field lookups
db.create_index("role")
admins = db.find("role", '"admin"')   # => ["user:1"]

# Delete
db.delete("user:2")

🦀 Rust Installation & Usage

FerrumDB is available on crates.io. Add it to your project:

cargo add ferrumdb
cargo add tokio -F full
cargo add serde_json

Or manually add to your Cargo.toml:

[dependencies]
ferrumdb = "0.1.1"
tokio = { version = "1", features = ["full"] }
serde_json = "1"

use ferrumdb::{FerrumDB, Config, Transaction, FsyncPolicy};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Standard open (zero-setup, uses ferrum.db)
    let db = FerrumDB::open_default().await?;

    // Store documents
    db.set("user:1".into(), json!({"name": "alice", "role": "admin"})).await?;

    // Secondary index query
    db.create_index("role").await?;
    let admins = db.find("role", &json!("admin")).await;

    // Atomic transaction
    let tx = Transaction::new()
        .set("k1".into(), json!({"tag": "blue"}))
        .set("k2".into(), json!({"tag": "red"}))
        .delete("k1".into());
    db.commit(tx).await?;

    // Encrypted database (AES-256-GCM, random nonce per block)
    let key: [u8; 32] = *b"my_super_secret_key_32_bytes_!!?";
    let db_enc = FerrumDB::open(
        Config::new()
            .with_encryption(key)
            .with_fsync_policy(FsyncPolicy::Periodic(std::time::Duration::from_millis(100)))
    ).await?;

    Ok(())
}
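
For readers curious how an all-or-nothing batch like the transaction above can work, here is a minimal in-memory sketch. It does not use the ferrumdb crate and is not its implementation: `Op`, the local `Transaction` builder, and `Store` are hypothetical stand-ins, with string values in place of JSON. The point it illustrates is that the whole batch becomes one log entry, so a replay sees either every operation or none of them.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Op {
    Set(String, String),
    Delete(String),
}

/// Builder that only collects operations; nothing is applied until commit.
#[derive(Default)]
struct Transaction {
    ops: Vec<Op>,
}

impl Transaction {
    fn new() -> Self {
        Self::default()
    }

    fn set(mut self, key: &str, value: &str) -> Self {
        self.ops.push(Op::Set(key.to_string(), value.to_string()));
        self
    }

    fn delete(mut self, key: &str) -> Self {
        self.ops.push(Op::Delete(key.to_string()));
        self
    }
}

/// Hypothetical store whose log holds one entry per committed batch.
#[derive(Default)]
struct Store {
    log: Vec<Vec<Op>>,
    index: HashMap<String, String>,
}

impl Store {
    /// Append the whole batch as a single log entry, then apply it in order.
    fn commit(&mut self, tx: Transaction) {
        self.log.push(tx.ops.clone());
        for op in tx.ops {
            match op {
                Op::Set(key, value) => {
                    self.index.insert(key, value);
                }
                Op::Delete(key) => {
                    self.index.remove(&key);
                }
            }
        }
    }
}
```

Operations apply in the order they were added, so a batch that sets `k1` and later deletes it leaves `k1` absent after commit.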

🖥️ Ferrum Studio

Ferrum Studio is a built-in web dashboard to browse, query, and inspect your database with real-time metrics.

Option 1: via the REPL (auto-launches when you cargo run):

cargo run --release
# 🔥 Ferrum Studio → http://localhost:7474

Option 2: standalone CLI (works with any .db file, any language):

cargo install ferrumdb-cli
ferrumdb web myapp.db              # opens http://localhost:7474
ferrumdb web myapp.db --port 8080  # custom port
ferrumdb info myapp.db             # show key count & file size
ferrumdb compact myapp.db          # remove deleted/expired entries

The CLI works regardless of whether you use the Rust, Python, or Node.js bindings; just point it at your .db file.
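
What a compact step does can be sketched with the standard library alone. The following is a simplified illustration, not the CLI's code: it assumes a line-based text log ("key=value" sets, a bare "key" deletes) instead of FerrumDB's binary format, and `compact` here is a hypothetical free function.

```rust
use std::collections::BTreeMap;
use std::fs::{self, File};
use std::io::{self, BufRead, BufReader, Write};
use std::path::Path;

/// Rewrite `path` so it holds one record per live key; returns the live count.
/// Assumed text format, one record per line: "key=value" sets, "key" deletes.
fn compact(path: &Path) -> io::Result<usize> {
    // 1. Replay the old log to find the latest value of every live key.
    let mut live: BTreeMap<String, String> = BTreeMap::new();
    for line in BufReader::new(File::open(path)?).lines() {
        let line = line?;
        match line.split_once('=') {
            Some((key, value)) => {
                live.insert(key.to_string(), value.to_string());
            }
            None => {
                live.remove(&line);
            }
        }
    }

    // 2. Write the survivors to a temp file next to the log, so the
    //    rename below stays on one filesystem.
    let tmp = path.with_extension("compact.tmp");
    let mut out = File::create(&tmp)?;
    for (key, value) in &live {
        writeln!(out, "{}={}", key, value)?;
    }
    out.sync_all()?; // flush the new contents to disk before the swap

    // 3. Swap the new file into place in a single step.
    fs::rename(&tmp, path)?;
    Ok(live.len())
}
```

Readers that open the path see either the old log or the compacted one, never a half-written file. A production version would likely also sync the parent directory so the rename itself survives a power loss; that step is omitted here.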


🖥️ CLI REPL

cargo run
cargo run -- --fsync=always   # strongest durability

| Command | Description |
| --- | --- |
| SET <key> <json> | Store a document |
| GET <key> | Retrieve and pretty-print |
| DELETE <key> | Remove a key |
| KEYS | List all keys |
| COUNT | Total number of entries |
| INDEX <field> | Create secondary index on JSON field |
| FIND <field> <value> | Query by indexed field |
| HELP | Show commands + live session metrics |
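
As a sketch of how a command grammar like this can be parsed, the function below turns one input line into a typed command. It is illustrative only and not taken from the REPL's source: the `Command` enum and `parse_command` are hypothetical, and matching command names case-insensitively is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum Command {
    Set { key: String, json: String },
    Get(String),
    Delete(String),
    Keys,
    Count,
    Index(String),
    Find { field: String, value: String },
    Help,
}

/// Parse one REPL line. The trailing JSON argument keeps its inner spaces.
fn parse_command(line: &str) -> Result<Command, String> {
    let line = line.trim();
    // Command name, then everything after it.
    let (name, rest) = match line.split_once(char::is_whitespace) {
        Some((name, rest)) => (name, rest.trim()),
        None => (line, ""),
    };
    // First argument, then whatever follows it.
    let (first, tail) = match rest.split_once(char::is_whitespace) {
        Some((first, tail)) => (first, tail.trim()),
        None => (rest, ""),
    };
    match name.to_ascii_uppercase().as_str() {
        "SET" if !first.is_empty() && !tail.is_empty() => Ok(Command::Set {
            key: first.to_string(),
            json: tail.to_string(),
        }),
        "GET" if !first.is_empty() && tail.is_empty() => Ok(Command::Get(first.to_string())),
        "DELETE" if !first.is_empty() && tail.is_empty() => {
            Ok(Command::Delete(first.to_string()))
        }
        "KEYS" if rest.is_empty() => Ok(Command::Keys),
        "COUNT" if rest.is_empty() => Ok(Command::Count),
        "INDEX" if !first.is_empty() && tail.is_empty() => Ok(Command::Index(first.to_string())),
        "FIND" if !first.is_empty() && !tail.is_empty() => Ok(Command::Find {
            field: first.to_string(),
            value: tail.to_string(),
        }),
        "HELP" if rest.is_empty() => Ok(Command::Help),
        _ => Err(format!("unrecognized or malformed command: {}", line)),
    }
}
```

Splitting only on the first two runs of whitespace means a JSON document with spaces, such as `{"name": "alice"}`, arrives intact as the final argument.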

📂 Examples

Full working examples for each language are in the examples/ directory:

| Example | Language | Description | Run |
| --- | --- | --- | --- |
| rust-example | Rust | Task Manager: CRUD, secondary indexes, transactions, TTL | cd examples/rust-example && cargo run |
| python-example | Python | Contact Book: CRUD, secondary indexes, transactions | cd examples/python-example && python main.py |
| node-example | Node.js | Note Taker: CRUD, secondary indexes, transactions | cd examples/node-example && node main.mjs |

Each example is self-contained and demonstrates the core FerrumDB API in its respective language.


⚠️ Known Limitations

FerrumDB optimizes for simplicity and embedded use cases. Understand the trade-offs:

| Limitation | Reason | Workaround |
| --- | --- | --- |
| Entire index in RAM | O(1) reads require the full HashMap in memory | Best for databases < 1 GB |
| Single-writer only | Append-only log has no cross-process lock protocol | One process per DB file |
| No range queries | Secondary indexes store exact value matches | Use Tantivy for range scans |
| No nested field indexes | Indexes cover only top-level JSON keys | Flatten documents before storing |
| Blocking compaction | Rewrites the entire log while holding the write lock | Schedule during low-traffic periods |
| No WAL / MVCC | Simpler append-only design | Accept occasional contention |
| No replication | Single-file, embedded design | Handle replication at app level |

Best for: local-first apps, desktop tools, embedded caching, session/config stores, write-heavy workloads.

Not for: large datasets (> 1 GB), complex queries (JOINs, aggregations), multi-writer or distributed scenarios.


Environment Config

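set FERRUMDB_FSYNC=always        # sync every write (safest)
set FERRUMDB_FSYNC=never         # never sync (fastest)
set FERRUMDB_FSYNC=periodic:200  # sync every 200 ms

The set form is Windows cmd syntax, where the trailing # notes must be left out; in a POSIX shell, use export FERRUMDB_FSYNC=always instead.

let db = FerrumDB::open_from_env().await?;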
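
To make the accepted values concrete, here is a minimal sketch of turning a FERRUMDB_FSYNC string into a policy. It is an assumption about how such parsing could look, not the crate's code: the `FsyncPolicy` enum below is a local stand-in mirroring the three policies named in this README, and `parse_fsync_policy` and `fsync_policy_from_env` are hypothetical helpers.

```rust
use std::time::Duration;

/// Local stand-in for the three policies named in this README
/// (not the crate's own type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum FsyncPolicy {
    Always,
    Periodic(Duration),
    Never,
}

/// Parse values such as "always", "never", or "periodic:200" (milliseconds).
fn parse_fsync_policy(raw: &str) -> Result<FsyncPolicy, String> {
    let value = raw.trim().to_ascii_lowercase();
    match value.as_str() {
        "always" => Ok(FsyncPolicy::Always),
        "never" => Ok(FsyncPolicy::Never),
        other => match other.strip_prefix("periodic:") {
            Some(ms) => ms
                .parse::<u64>()
                .map(|n| FsyncPolicy::Periodic(Duration::from_millis(n)))
                .map_err(|e| format!("invalid interval {:?}: {}", ms, e)),
            None => Err(format!("unknown fsync policy {:?}", raw)),
        },
    }
}

/// Read FERRUMDB_FSYNC, falling back to a caller-chosen default when unset.
fn fsync_policy_from_env(default: FsyncPolicy) -> Result<FsyncPolicy, String> {
    match std::env::var("FERRUMDB_FSYNC") {
        Ok(raw) => parse_fsync_policy(&raw),
        Err(_) => Ok(default),
    }
}
```

Returning an error for unknown values, instead of silently falling back, keeps a typo such as `periodc:200` from quietly weakening durability.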

📋 Changelog

See CHANGELOG.md for a full list of changes per version.


📝 License

MIT. See LICENSE for details.

Built with 🦀 by Muhammad Usman
