hamster-storage/hamster

Hamster

S3 compatible object storage in a single binary. Stuff it in, pull it out later.

Hamster is a self-hosted, S3-compatible object store built around one idea: object storage should be simple to run and safe with your data, without a heavyweight distributed system or a restrictive license.

Status: early development (v0). Not production ready. The design is settled and the core is being built in the open. Please don't trust real data to Hamster yet. Star or watch the repo to follow progress toward v1.

Why Hamster

After MinIO archived its community edition in 2026 and steered users toward a commercial product, the open source S3 stores that remain fall into two camps: feature-rich systems that carry real operational weight, and admirably simple ones that leave out the features regulated data can't live without. Hamster aims for the missing middle:

  • A single binary you can run anywhere — no ZooKeeper, no etcd, no external database.
  • Erasure-coded durability, so storage stays cheap without giving up safety.
  • The compliance controls simpler stores skip: versioning, object lock, and WORM retention — what retention and audit regimes (HIPAA, SEC 17a-4 territory) actually ask for.
  • A permissive Apache 2.0 license, so you can build on it without legal friction.

Design principles

  • Single binary, no external dependencies. Laptop, VPS, or cluster — nothing else to operate.
  • S3 compatible. Works with existing S3 SDKs, CLIs, and tools.
  • Durable by default. Reed–Solomon erasure coding spreads each object across independent failure domains, so you can lose drives or whole nodes without losing data.
  • Grows smoothly. Partitioned placement rebalances as you add capacity — add a node and data redistributes without reshaping the cluster.
  • Safe to upgrade. Additively versioned on-disk and on-wire formats, built for backwards-compatible, zero-downtime rolling upgrades validated by end-to-end tests.
  • Trustworthy. Durability and consistency run under a deterministic simulation harness that injects partitions, disk failures, and reordering — correctness tested, not hoped for.
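
To put numbers on the "durable by default" point, here is a minimal sketch of the arithmetic behind a k+m erasure-coding profile. The Profile type and its methods are hypothetical names for illustration, not part of Hamster's code. The 4+2 figure is the example profile this README mentions, and three full copies are included only for comparison.

```go
package main

import "fmt"

// Profile is a hypothetical k+m erasure-coding layout:
// K data shards plus M parity shards per stripe.
type Profile struct {
	K, M int
}

// Tolerated returns how many shards can be lost while the data stays readable.
func (p Profile) Tolerated() int { return p.M }

// Overhead returns raw bytes stored per logical byte, (K+M)/K.
func (p Profile) Overhead() float64 {
	return float64(p.K+p.M) / float64(p.K)
}

func main() {
	ec := Profile{K: 4, M: 2}
	copies := Profile{K: 1, M: 2} // three full copies, modelled as 1+2

	fmt.Printf("4+2:       survives %d losses at %.2fx raw storage\n",
		ec.Tolerated(), ec.Overhead())
	fmt.Printf("3x copies: survives %d losses at %.2fx raw storage\n",
		copies.Tolerated(), copies.Overhead())
}
```

Both layouts survive two losses, but 4+2 stores 1.5 bytes per logical byte where three copies store 3.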

Features

A high-level, honest view: a check mark means shipped and tested, not promised. Versions without one are the roadmap's plan and may shift as the code pushes back. On-disk and on-wire formats may change between v0 releases.

| Version | Features | Status |
|---------|----------|--------|
| v0.1 | Core S3 API: buckets, objects, listings, multipart, presigned URLs, SigV4 auth (verified with aws, rclone, restic, s3cmd); durable single-node store with streaming uploads | |
| v0.2 | Clustering: Raft-replicated metadata, mTLS between nodes, token-based join | |
| v0.3 | Erasure-coded durability with self-healing repair, the S3 endpoint served from the cluster | |
| v0.4 | Partitioned placement (failure-domain spread, capacity weighting) and online rebalancing (drain, replace, remove, grow, downsize), plus a continuous background scrubber that self-heals bitrot and lost shards | |
| v0.5 | Object versioning (per-bucket versioning config, version IDs, delete markers, ListObjectVersions, by-version GET/DELETE) on the single node and the cluster | |
| v0.6 | Object lock and WORM retention (GOVERNANCE and COMPLIANCE modes, legal holds, bucket default retention) on the single node and the cluster | |
| v0.7 | Encryption at rest (SSE-S3): envelope encryption, per-object keys wrapped by a cluster master key from a pluggable source | |
| v0.8 | Key and CA rotation: master-key rewrap and CA custody/rotation, both no-downtime and metadata- or trust-only | |
| v0.9 | Zero-downtime rolling upgrades: cluster version advertisement, the health interlock (cluster can-stop), the end-to-end upgrade test suite, and the supported per-node roll | |
| v0.10 | Observability: one metrics registry rendered many ways (a Prometheus /metrics endpoint, a typed snapshot for the CLI and web console, and a durability summary on status) | |
| v0.11 | One clustered path: one flat CLI (a node is a one-node cluster), S3 on every node by default, proposal forwarding so any node accepts writes, and streaming PUT / Range GET / server-side copy / erasure-coded multipart so the cluster surface is a strict superset of single-node | |
| v0.12 | Adaptive load shedding: latency-gradient concurrency limiting that sheds with 429 at the node's self-discovered capacity, request-latency histograms, and degradation (bad-drive) detection, with no OS primitives | 🚧 in progress |
| v0.13 | Web console | planned |
| TBD | TBD prior to v1 | planning |
| v1.0 | Software updates and migrations supported from v1 | planned |
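
The v0.7 and v0.8 rows describe envelope encryption and a metadata-only master-key rewrap. The sketch below shows that general pattern using Go's standard library. It assumes AES-256-GCM and a nonce-prefixed layout; every function name here (newKey, seal, open, rewrap) is hypothetical, and none of it is taken from Hamster's implementation or describes its actual format.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// must unwraps a (value, error) pair and panics on error, to keep main short.
func must[T any](v T, err error) T {
	if err != nil {
		panic(err)
	}
	return v
}

// newKey returns a random 256-bit key.
func newKey() []byte {
	k := make([]byte, 32)
	must(rand.Read(k))
	return k
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts msg with AES-GCM and prepends the random nonce.
func seal(key, msg []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, msg, nil), nil
}

// open reverses seal; it fails if the key is wrong or the data was altered.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("sealed data shorter than nonce")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

// rewrap moves a wrapped object key from oldMaster to newMaster.
// Only the small wrapped key changes; the object ciphertext is untouched.
func rewrap(oldMaster, newMaster, wrapped []byte) ([]byte, error) {
	objectKey, err := open(oldMaster, wrapped)
	if err != nil {
		return nil, err
	}
	return seal(newMaster, objectKey)
}

func main() {
	data := []byte("video bytes")
	master, objectKey := newKey(), newKey()

	body := must(seal(objectKey, data))      // object encrypted under its own key
	wrapped := must(seal(master, objectKey)) // object key wrapped by the master key

	rotated := newKey()
	wrapped = must(rewrap(master, rotated, wrapped)) // rotation rewrites metadata only

	key := must(open(rotated, wrapped))
	plain := must(open(key, body))
	fmt.Println("round trip ok:", bytes.Equal(plain, data))
}
```

Because each object has its own key, rotating the master key in this pattern only re-encrypts the small wrapped keys, which is why such a rotation can be a metadata-only operation.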

Quick start

Grab a binary from the releases page (or build one with go build ./cmd/hamster; no cgo, no build tricks), then found (initialize) a node and serve it. The HAMSTER_* variables define the credentials the node will accept:

export HAMSTER_ACCESS_KEY_ID=hamster
export HAMSTER_SECRET_ACCESS_KEY=keep-this-one-secret
hamster init -data-dir ./data    # found a one-node cluster
hamster serve -data-dir ./data   # serve the S3 API

That's a standard S3 endpoint on 127.0.0.1:9000 — any S3 client works as is. A single node is just a one-node cluster: add nodes later and the same objects spread across them, no reformat. The client sends its own credentials (the standard AWS_* variables) set to the same values:

export AWS_ACCESS_KEY_ID=hamster
export AWS_SECRET_ACCESS_KEY=keep-this-one-secret
aws --endpoint-url http://127.0.0.1:9000 s3 mb s3://stash
aws --endpoint-url http://127.0.0.1:9000 s3 cp video.mp4 s3://stash/

aws s3, rclone, restic, and s3cmd all work — a compatibility suite runs all four against every change.

Running a cluster

A cluster is Raft-replicated metadata (v0.2) plus an erasure-coded data path (v0.3): mutual TLS between nodes with zero TLS configuration, single-use join tokens, and the full S3 API on every node — streaming PUT, Range GET, server-side copy, and erasure-coded multipart — with objects spread k+m across the cluster and reconstructed from any k. Every node accepts writes: a non-leader runs the data plane locally and forwards only the small metadata commit to the leader, so object bytes never cross the leader hop.
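
The "reconstructed from any k" property is easiest to see with the simplest erasure code: k data shards plus a single XOR parity shard. The sketch below is a teaching toy, not Hamster's data path. Reed–Solomon generalizes the same idea to any number of parity shards m, whereas XOR parity only handles m = 1. The encode and reconstruct functions are hypothetical names.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// encode splits data into k equal shards (zero-padded) and appends one
// XOR parity shard, giving k+1 shards in total.
func encode(data []byte, k int) [][]byte {
	size := (len(data) + k - 1) / k
	shards := make([][]byte, k+1)
	for i := range shards {
		shards[i] = make([]byte, size)
	}
	for i := 0; i < k; i++ {
		start := i * size
		if start < len(data) {
			end := start + size
			if end > len(data) {
				end = len(data)
			}
			copy(shards[i], data[start:end])
		}
		// Fold this data shard into the parity shard.
		for j, b := range shards[i] {
			shards[k][j] ^= b
		}
	}
	return shards
}

// reconstruct rebuilds the single missing (nil) shard by XORing the survivors.
func reconstruct(shards [][]byte) error {
	missing, size := -1, 0
	for i, s := range shards {
		if s == nil {
			if missing >= 0 {
				return errors.New("XOR parity can repair only one lost shard")
			}
			missing = i
		} else {
			size = len(s)
		}
	}
	if missing < 0 {
		return nil // nothing to repair
	}
	rebuilt := make([]byte, size)
	for i, s := range shards {
		if i == missing {
			continue
		}
		for j, b := range s {
			rebuilt[j] ^= b
		}
	}
	shards[missing] = rebuilt
	return nil
}

func main() {
	shards := encode([]byte("stuff it in, pull it out later"), 4)
	lost := shards[2]
	shards[2] = nil // simulate a dead node

	if err := reconstruct(shards); err != nil {
		panic(err)
	}
	fmt.Println("shard recovered:", bytes.Equal(shards[2], lost))
}
```

Losing two shards makes reconstruct return an error here, which is the gap a Reed–Solomon profile such as 4+2 closes.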

Three terminals, sharing the credentials each node's S3 endpoint accepts:

export HAMSTER_ACCESS_KEY_ID=hamster HAMSTER_SECRET_ACCESS_KEY=keep-this-one-secret

# terminal 1 — found the cluster, serve S3 on :9000
hamster init -data-dir ./n1 -node n1 -listen 127.0.0.1:7946
hamster serve -data-dir ./n1 -s3 127.0.0.1:9000

# terminal 2 — mint a single-use token and join in one command, serve S3 on :9001
TOKEN=$(hamster token -data-dir ./n1)
hamster serve -data-dir ./n2 -node n2 -listen 127.0.0.1:7956 -token "$TOKEN" -s3 127.0.0.1:9001

# terminal 3 — same again, serve S3 on :9002
TOKEN=$(hamster token -data-dir ./n1)
hamster serve -data-dir ./n3 -node n3 -listen 127.0.0.1:7966 -token "$TOKEN" -s3 127.0.0.1:9002

Run hamster status -data-dir ./n1 to see every member and the current leader. Point any S3 client at any node: writes land wherever they arrive (a non-leader forwards the commit), and the data is erasure-coded across all three. Kill a node and the object still reads, reconstructed from the survivors.

Operations

One way to run Hamster. A node is a one-node cluster: hamster init founds it — the CA is minted for you — and hamster serve runs it and serves S3. There is no separate single-node mode and no reformat to grow: a single node stores objects with no redundancy (its durability is one disk's), and the moment you add nodes the same objects spread across them, erasure-coded k+m, weighted by each node's capacity and spread across failure domains. The lifecycle below is online: no downtime, durability preserved throughout. A continuous background scrubber heals bitrot and lost shards on its own, before any read trips over them.

| Operation | How | What happens |
|-----------|-----|--------------|
| Add a node | serve -token … | joins as a learner, auto-promoted to voter (five-voter cap); existing data migrates onto it at its current width, with no reshape |
| Grow into the new size | optimize | re-encodes existing data up to the larger cluster's profile, spreading objects written when it was smaller across the new nodes (run after adding nodes; never automatic) |
| Reboot for maintenance | just reboot it | erasure coding tolerates a node briefly down (a 4+2 object survives two); repair rebuilds whatever was written during the outage when it returns, so no drain is needed |
| Take a node out of service | drain <node> | new writes steer off it and its shards migrate away; reversible with undrain |
| Replace a node | serve -token … -replaces <old> | swaps a fresh node in for an existing one at the same cluster size: same profile, no re-encode |
| Remove a node | remove <node> | evicts a drained, empty node for good (its ID is tombstoned, so a return needs a fresh join) |
| Shrink the cluster | drain <node> past a profile boundary | re-encodes every object down to the smaller profile (with a [y/N] showing the durability/efficiency trade), then remove |
| Recover from quorum loss | recover | rebuilds a cluster from one surviving node, as a last resort |

Drain is reversible (undrain) and pairs with remove to decommission — the same split as kubectl drain/uncordon and delete node. A quick reboot needs neither: the erasure coding already covers a node being briefly down. (Two voters is a valid but failure-intolerant cluster; three is the first size that survives losing one.)
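
The note about two and three voters follows from majority-quorum arithmetic, sketched below. The function names are hypothetical. The only assumption is the standard Raft rule that a commit needs a strict majority of voters.

```go
package main

import "fmt"

// quorum is the strict majority a group of n voters needs to commit.
func quorum(voters int) int { return voters/2 + 1 }

// tolerated is how many voters can fail while a majority remains.
func tolerated(voters int) int { return voters - quorum(voters) }

func main() {
	// Five is the voter cap mentioned in the operations table.
	for n := 1; n <= 5; n++ {
		fmt.Printf("%d voters: quorum %d, tolerates %d failure(s)\n",
			n, quorum(n), tolerated(n))
	}
}
```

Two voters need both present to commit, so they tolerate no failures. Three voters tolerate one, and five tolerate two.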

Documentation

  • Glossary — the vocabulary (object, version, shard, stripe, partition, node, cluster, layout, …), grouped by layer. Start here if a term is unfamiliar.
  • Architecture — the system design narrative: request paths, metadata/data separation, erasure coding, placement, upgrades.
  • How Hamster compares — an honest map next to Ceph, the MinIO origin story, the breadth-first Apache rewrites, and simpler single-node stores — including where Hamster is behind.
  • Upgrading — the no-downtime, one-node-at-a-time rolling-upgrade procedure: can-stop, swapping the binary your own way, confirming the roll, and rollback.
  • Architecture Decision Records — one decision per file, with the reasoning and the rejected alternatives.
  • Roadmap — the v0.x and v1.0 milestones.

Contributing

Early, but contributions are welcome. Hamster is Apache 2.0 licensed, and contributions are accepted under a Developer Certificate of Origin (DCO). Sign your commits with git commit -s.

License

Apache License 2.0. See LICENSE.
