S3 compatible object storage in a single binary. Stuff it in, pull it out later.
Hamster is a self-hosted, S3-compatible object store built around one idea: object storage should be simple to run and safe with your data, without a heavyweight distributed system or a restrictive license.
Status: early development (v0). Not production ready. The design is settled and the core is being built in the open. Please don't trust real data to Hamster yet. Star or watch the repo to follow progress toward v1.
When MinIO archived its community edition in 2026 and steered users toward a commercial product, the open source S3 stores that remain split into two camps: feature-rich systems that bring real operational weight, and admirably simple ones that leave out the features regulated data can't live without. Hamster aims for the missing middle:
- A single binary you can run anywhere — no ZooKeeper, no etcd, no external database.
- Erasure-coded durability, so storage stays cheap without giving up safety.
- The compliance controls simpler stores skip: versioning, object lock, and WORM retention — what retention and audit regimes (HIPAA, SEC 17a-4 territory) actually ask for.
- A permissive Apache 2.0 license, so you can build on it without legal friction.
- Single binary, no external dependencies. Laptop, VPS, or cluster — nothing else to operate.
- S3 compatible. Works with existing S3 SDKs, CLIs, and tools.
- Durable by default. Reed–Solomon erasure coding spreads each object across independent failure domains, so you can lose drives or whole nodes without losing data.
- Grows smoothly. Partitioned placement rebalances as you add capacity — add a node and data redistributes without reshaping the cluster.
- Safe to upgrade. Additively versioned on-disk and on-wire formats, built for backwards-compatible, zero-downtime rolling upgrades validated by end-to-end tests.
- Trustworthy. Durability and consistency run under a deterministic simulation harness that injects partitions, disk failures, and reordering — correctness tested, not hoped for.
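The durability bullet comes down to simple arithmetic on a k+m profile (k data shards plus m parity shards). The sketch below is illustrative only: the `Profile` type and its methods are hypothetical names, not Hamster's API, and the profiles listed are examples rather than the ones Hamster selects.

```go
package main

import "fmt"

// Profile is a hypothetical k+m erasure-coding layout: K data shards plus
// M parity shards. The name and fields are illustrative, not Hamster's API.
type Profile struct {
	K, M int
}

// Overhead is the raw bytes stored per byte of user data: (k+m)/k.
func (p Profile) Overhead() float64 {
	return float64(p.K+p.M) / float64(p.K)
}

// Tolerates is how many shards can be lost while the object stays readable.
// Any K of the K+M shards are enough to reconstruct, so the answer is M.
func (p Profile) Tolerates() int {
	return p.M
}

func main() {
	for _, p := range []Profile{{2, 1}, {4, 2}, {8, 3}} {
		fmt.Printf("%d+%d: %.2fx raw storage, survives %d lost shard(s)\n",
			p.K, p.M, p.Overhead(), p.Tolerates())
	}
}
```

For comparison, three-way replication also survives two losses but stores 3x the data, where a 4+2 profile stores 1.5x.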
High level and honest: a check mark means shipped and tested, not promised. Versions without a check mark are the roadmap's plan and may shift as the code pushes back. On-disk and on-wire formats may change between v0 releases.
| Version | Features | Status |
|---|---|---|
| v0.1 | | ✅ |
| v0.2 | Clustering — Raft-replicated metadata, mTLS between nodes, token-based join | ✅ |
| v0.3 | Erasure-coded durability with self-healing repair, the S3 endpoint served from the cluster | ✅ |
| v0.4 | Partitioned placement (failure-domain spread, capacity weighting) and online rebalancing — drain, replace, remove, grow, downsize — plus a continuous background scrubber that self-heals bitrot and lost shards | ✅ |
| v0.5 | Object versioning: per-bucket versioning config, version IDs, delete markers, ListObjectVersions, by-version GET/DELETE, on the single node and the cluster | ✅ |
| v0.6 | Object lock and WORM retention — GOVERNANCE and COMPLIANCE modes, legal holds, bucket default retention — on the single node and the cluster | ✅ |
| v0.7 | Encryption at rest (SSE-S3) — envelope encryption, per-object keys wrapped by a cluster master key from a pluggable source | ✅ |
| v0.8 | Key and CA rotation: master-key rewrap and CA custody/rotation — both no-downtime, metadata- or trust-only | ✅ |
| v0.9 | Zero-downtime rolling upgrades: cluster version advertisement, the health interlock (cluster can-stop), the end-to-end upgrade test suite, and the supported per-node roll | ✅ |
| v0.10 | Observability: one metrics registry rendered many ways (a Prometheus /metrics endpoint, a typed snapshot for the CLI and web console, and a durability summary on status) | ✅ |
| v0.11 | One clustered path — one flat CLI (a node is a one-node cluster), S3 on every node by default, proposal forwarding so any node accepts writes, and streaming PUT / Range GET / server-side copy / erasure-coded multipart so the cluster surface is a strict superset of single-node | ✅ |
| v0.12 | Adaptive load shedding — latency-gradient concurrency limiting that sheds with 429 at the node's self-discovered capacity, request-latency histograms, and degradation (bad-drive) detection, with no OS primitives | 🚧 in progress |
| v0.13 | Web console | planned |
| TBD | TBD prior to v1 | planning |
| v1.0 | Software updates and migrations supported from v1 | planned |
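The v0.7 and v0.8 rows describe envelope encryption and a metadata-only master-key rewrap. The sketch below shows why a rewrap never has to touch object bytes. It assumes AES-256-GCM from Go's standard library and random in-memory keys; the roadmap does not say which cipher or key source Hamster uses, and every function name here is hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// must unwraps a ([]byte, error) pair and panics on error, to keep the sketch short.
func must(b []byte, err error) []byte {
	if err != nil {
		panic(err)
	}
	return b
}

// newGCM builds an AES-256-GCM AEAD from a 32-byte key (cipher choice is an assumption).
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext under key and returns nonce || ciphertext.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open splits off the nonce, then authenticates and decrypts the rest.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("sealed data shorter than nonce")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

// newKey returns 32 random bytes.
func newKey() []byte {
	k := make([]byte, 32)
	if _, err := rand.Read(k); err != nil {
		panic(err)
	}
	return k
}

func main() {
	master := newKey()  // stands in for the cluster master key
	dataKey := newKey() // fresh key for this one object

	body := must(seal(dataKey, []byte("object bytes")))
	wrapped := must(seal(master, dataKey)) // stored with the object's metadata

	// Master-key rotation: unwrap and rewrap the small data key only.
	// The encrypted body is never read or rewritten.
	nextMaster := newKey()
	wrapped = must(seal(nextMaster, must(open(master, wrapped))))

	plain := must(open(must(open(nextMaster, wrapped)), body))
	fmt.Println(string(plain)) // prints "object bytes"
}
```

Because only the wrapped key changes, the cost of a rotation in this scheme scales with the number of objects, not with the bytes stored.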
Grab a binary from the releases page (or go build ./cmd/hamster — no cgo, no build tricks), then found a node and serve it. The HAMSTER_* variables define the credentials it will accept:
export HAMSTER_ACCESS_KEY_ID=hamster
export HAMSTER_SECRET_ACCESS_KEY=keep-this-one-secret
hamster init -data-dir ./data # found a one-node cluster
hamster serve -data-dir ./data # serve the S3 API

That's a standard S3 endpoint on 127.0.0.1:9000, and any S3 client works as is. A single node is just a one-node cluster: add nodes later and the same objects spread across them, no reformat. The client sends its own credentials (the standard AWS_* variables) set to the same values:
export AWS_ACCESS_KEY_ID=hamster
export AWS_SECRET_ACCESS_KEY=keep-this-one-secret
aws --endpoint-url http://127.0.0.1:9000 s3 mb s3://stash
aws --endpoint-url http://127.0.0.1:9000 s3 cp video.mp4 s3://stash/

aws s3, rclone, restic, and s3cmd all work: a compatibility suite runs all four against every change.
A cluster is Raft-replicated metadata (v0.2) plus an erasure-coded data path (v0.3): mutual TLS between nodes with zero TLS configuration, single-use join tokens, and the full S3 API on every node — streaming PUT, Range GET, server-side copy, and erasure-coded multipart — with objects spread k+m across the cluster and reconstructed from any k. Every node accepts writes: a non-leader runs the data plane locally and forwards only the small metadata commit to the leader, so object bytes never cross the leader hop.
Three terminals, sharing the credentials each node's S3 endpoint accepts:
export HAMSTER_ACCESS_KEY_ID=hamster HAMSTER_SECRET_ACCESS_KEY=keep-this-one-secret
# terminal 1 — found the cluster, serve S3 on :9000
hamster init -data-dir ./n1 -node n1 -listen 127.0.0.1:7946
hamster serve -data-dir ./n1 -s3 127.0.0.1:9000
# terminal 2 — mint a single-use token and join in one command, serve S3 on :9001
TOKEN=$(hamster token -data-dir ./n1)
hamster serve -data-dir ./n2 -node n2 -listen 127.0.0.1:7956 -token "$TOKEN" -s3 127.0.0.1:9001
# terminal 3 — same again, serve S3 on :9002
TOKEN=$(hamster token -data-dir ./n1)
hamster serve -data-dir ./n3 -node n3 -listen 127.0.0.1:7966 -token "$TOKEN" -s3 127.0.0.1:9002

hamster status -data-dir ./n1 shows every member and who leads. Point any S3 client at any node: writes land wherever they arrive (a non-leader forwards the commit), and the data is erasure-coded across all three. Kill a node and the object still reads, reconstructed from the survivors.
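To see why an object still reads after a node dies, here is the smallest possible erasure code: k data shards plus one XOR parity shard. This is deliberately not Reed-Solomon, which Hamster uses and which generalizes the idea to m parity shards. It is a simplified stand-in, and the function names are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// split cuts data into k equal-length shards, zero-padding the tail.
func split(data []byte, k int) [][]byte {
	size := (len(data) + k - 1) / k
	shards := make([][]byte, k)
	for i := range shards {
		shards[i] = make([]byte, size)
		if start := i * size; start < len(data) {
			copy(shards[i], data[start:]) // copies at most size bytes
		}
	}
	return shards
}

// xorAll returns the byte-wise XOR of equal-length shards.
// XOR of the data shards is the parity shard; XOR of the parity shard and
// all but one data shard is the missing data shard.
func xorAll(shards [][]byte) []byte {
	out := make([]byte, len(shards[0]))
	for _, s := range shards {
		for i, b := range s {
			out[i] ^= b
		}
	}
	return out
}

func main() {
	data := []byte("stuff it in, pull it out later")
	shards := split(data, 3) // k = 3 data shards
	parity := xorAll(shards) // m = 1 parity shard

	// Simulate losing the node that held shard 1, then rebuild it from survivors.
	rebuilt := xorAll([][]byte{shards[0], shards[2], parity})
	fmt.Println("shard 1 recovered:", bytes.Equal(rebuilt, shards[1]))
}
```

A single XOR parity shard survives exactly one loss. Tolerating two or more, as a 4+2 profile does, is what Reed-Solomon coding adds.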
One way to run Hamster. A node is a one-node cluster: hamster init founds it — the CA is minted for you — and hamster serve runs it and serves S3. There is no separate single-node mode and no reformat to grow: a single node stores objects with no redundancy (its durability is one disk's), and the moment you add nodes the same objects spread across them, erasure-coded k+m, weighted by each node's capacity and spread across failure domains. The lifecycle below is online: no downtime, durability preserved throughout. A continuous background scrubber heals bitrot and lost shards on its own, before any read trips over them.
| Operation | How | What happens |
|---|---|---|
| Add a node | serve -token … | joins as a learner, auto-promoted to voter (five-voter cap); existing data migrates onto it at its current width, with no reshape |
| Grow into the new size | optimize | re-encodes existing data up to the larger cluster's profile, spreading objects written when it was smaller across the new nodes (run after adding nodes; never automatic) |
| Reboot for maintenance | just reboot it | erasure coding tolerates a node briefly down (a 4+2 object survives two); repair rebuilds whatever was written during the outage when it returns — no drain needed |
| Take a node out of service | drain <node> | new writes steer off it and its shards migrate away; reversible with undrain |
| Replace a node | serve -token … -replaces <old> | swaps a fresh node in for an existing one at the same cluster size: same profile, no re-encode |
| Remove a node | remove <node> | evicts a drained, empty node for good (its ID is tombstoned, so a return needs a fresh join) |
| Shrink the cluster | drain <node> past a profile boundary | re-encodes every object down to the smaller profile (with a [y/N] showing the durability/efficiency trade), then remove |
| Recover from quorum loss | recover | rebuilds a cluster from one surviving node; the last resort |
Drain is reversible (undrain) and pairs with remove to decommission — the same split as kubectl drain/uncordon and delete node. A quick reboot needs neither: the erasure coding already covers a node being briefly down. (Two voters is a valid but failure-intolerant cluster; three is the first size that survives losing one.)
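The parenthetical about two versus three voters follows from Raft's majority rule. A minimal sketch of that arithmetic (the function names are illustrative, not part of Hamster):

```go
package main

import "fmt"

// quorum is the majority a Raft group of n voters needs to commit: floor(n/2) + 1.
func quorum(voters int) int {
	return voters/2 + 1
}

// tolerated is how many voters can fail while a quorum can still form.
func tolerated(voters int) int {
	return voters - quorum(voters)
}

func main() {
	for n := 1; n <= 5; n++ {
		fmt.Printf("%d voter(s): quorum %d, tolerates %d failure(s)\n",
			n, quorum(n), tolerated(n))
	}
}
```

Two voters need both present to commit, so losing either stalls metadata writes. Three voters need two, so one can fail. Five voters, the cap mentioned in the table, tolerate two.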
- Glossary — the vocabulary (object, version, shard, stripe, partition, node, cluster, layout, …), grouped by layer. Start here if a term is unfamiliar.
- Architecture — the system design narrative: request paths, metadata/data separation, erasure coding, placement, upgrades.
- How Hamster compares — an honest map next to Ceph, the MinIO origin story, the breadth-first Apache rewrites, and simpler single-node stores — including where Hamster is behind.
- Upgrading: the no-downtime, one-node-at-a-time rolling-upgrade procedure (can-stop, swapping the binary your own way, confirming the roll, and rollback).
- Architecture Decision Records: one decision per file, with the reasoning and the rejected alternatives.
- Roadmap — the v0.x and v1.0 milestones.
Early, but contributions are welcome. Hamster is Apache 2.0 licensed, and contributions are accepted under a Developer Certificate of Origin (DCO). Sign off your commits with git commit -s, which adds the Signed-off-by trailer the DCO requires.
Apache License 2.0. See LICENSE.