An MIT 6.824-style distributed systems lab, rebuilt in C++. It consists of a series of labs in which you build a transactional, sharded, fault-tolerant key/value storage system (like Google Spanner, MongoDB, etc.).
This project implements a Replicated State Machine using the Raft consensus algorithm. The project focuses on leader election, log replication, and fault tolerance for a distributed key-value storage system.
- Implemented leader election using `RequestVote` RPCs.
- Periodically initiated elections when no leader was detected.
- Implemented heartbeats (empty `AppendEntries` RPCs) to maintain leadership.
- Managed election timeouts to prevent split votes.
- Added log entries to the leader using the `Start()` function and replicated them across nodes with the `AppendEntries` RPC.
- Advanced the commit index when a majority of nodes replicated the logs.
- Handled network failures and ensured followers rejoin and synchronize with the leader.
- Maintained log consistency across nodes during disconnections.
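As a rough illustration of the commit-index rule in the list above, the sketch below finds the highest log index stored on a majority of servers. It is not taken from this repository: `AdvanceCommitIndex`, `LogEntry`, and the 1-indexed log with a sentinel entry at index 0 are assumptions made for the example, and `match_index` is assumed to include the leader's own last log index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical log entry; only the term matters for the commit rule.
struct LogEntry {
  uint64_t term;
};

// Returns the (possibly unchanged) commit index for a leader.
// match_index[i] is the highest log index known to be stored on server i,
// with the leader itself included. log[0] is a sentinel entry.
uint64_t AdvanceCommitIndex(std::vector<uint64_t> match_index,
                            const std::vector<LogEntry>& log,
                            uint64_t current_term,
                            uint64_t commit_index) {
  assert(!match_index.empty());
  // After sorting in descending order, the value at position n/2 is
  // stored on at least n/2 + 1 servers, i.e. on a majority.
  std::sort(match_index.begin(), match_index.end(),
            std::greater<uint64_t>());
  const uint64_t majority_index = match_index[match_index.size() / 2];
  // Raft only commits by counting replicas for entries of the current term.
  if (majority_index > commit_index && majority_index < log.size() &&
      log[majority_index].term == current_term) {
    return majority_index;
  }
  return commit_index;
}
```

Because terms never decrease along the log, checking only the largest majority-replicated index is enough: if that entry is from an older term, so is every entry before it.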
All tests passed, including:
- Leader election and re-election after failure.
- Log replication consistency across nodes.
- Handling follower disconnections and leader recovery.
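Split votes are usually avoided by randomizing each node's election timeout, so that one node tends to time out first and win the election. The helper below is a minimal sketch of that idea, not this project's code: the name `RandomElectionTimeout` is hypothetical, and the 150-300 ms default range comes from the Raft paper and may differ from what the lab expects.

```cpp
#include <cassert>
#include <chrono>
#include <random>

// Hypothetical helper: picks a fresh election timeout uniformly in
// [min, max]. A node would call this every time it resets its timer.
std::chrono::milliseconds RandomElectionTimeout(
    std::mt19937& rng,
    std::chrono::milliseconds min = std::chrono::milliseconds(150),
    std::chrono::milliseconds max = std::chrono::milliseconds(300)) {
  assert(min <= max);
  std::uniform_int_distribution<long long> dist(min.count(), max.count());
  return std::chrono::milliseconds(dist(rng));
}
```

Each node should seed its generator differently (for example from `std::random_device`), otherwise all nodes would draw the same sequence of timeouts.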
This project extends the Raft-based replicated state machine to implement a fault-tolerant key-value storage service. The service supports Put, Append, and Get operations, ensuring strong consistency across distributed servers.
- Implemented `Put(key, value)`, `Append(key, arg)`, and `Get(key)` functions for the key-value store.
- Each operation is replicated via Raft to ensure consistency across servers.
- Clients retry requests if sent to the wrong server or if the leader changes.
- Achieved strong consistency (linearizability): every client observes the same, most recent state.
- Clients retry failed operations due to server changes or network issues.
- Operations proceed as long as a majority of servers are alive, even during leader re-elections.
- Handled concurrent client operations with Raft ensuring all servers execute commands in the same order.
- Integrated key-value operations into Raft’s log, ensuring all servers apply operations consistently.
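One way to picture how committed log entries become key-value state is the sketch below. It is an illustration under assumptions, not the repository's implementation: `Op`, `KvStateMachine`, and the `client_id`/`seq` fields are hypothetical. The duplicate check is one common way to keep a retried `Put` or `Append` from being applied twice; the text above does not say how this project handles that case.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

enum class OpType { kGet, kPut, kAppend };

// Hypothetical command stored in the Raft log.
struct Op {
  OpType type;
  std::string key;
  std::string value;   // unused for Get
  uint64_t client_id;  // assumption: unique per client
  uint64_t seq;        // assumption: per-client counter, starting at 1
};

class KvStateMachine {
 public:
  // Applies one committed operation. Returns the value for Get (empty if
  // the key is missing) and an empty string for Put and Append.
  std::string Apply(const Op& op) {
    uint64_t& last = last_seq_[op.client_id];
    if (op.seq > last) {  // first time this request is seen
      last = op.seq;
      if (op.type == OpType::kPut) {
        data_[op.key] = op.value;
      } else if (op.type == OpType::kAppend) {
        data_[op.key] += op.value;
      }
    }
    // Reads have no side effects, so a retried Get is simply re-evaluated.
    if (op.type == OpType::kGet) {
      auto it = data_.find(op.key);
      return it == data_.end() ? std::string() : it->second;
    }
    return std::string();
  }

 private:
  std::unordered_map<std::string, std::string> data_;
  std::unordered_map<uint64_t, uint64_t> last_seq_;
};
```

Since every server applies the same log in the same order, every server's `KvStateMachine` ends up with the same contents.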
Successfully passed all tests, including:
- Basic key-value operations.
- Concurrent operations with multiple clients.
- Leader re-election and continued operation with a majority of servers alive.
- Progress and consistency under unreliable network conditions.
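The client-side retry behaviour described in this section could look roughly like the sketch below. Everything in it is hypothetical: `Clerk`, `Reply`, `Status`, and the `GetRpc` callable stand in for whatever RPC stubs the lab framework provides, and the attempt limit exists only to keep the example bounded (a real client would typically keep retrying).

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class Status { kOk, kWrongLeader, kTimeout };

struct Reply {
  Status status;
  std::string value;
};

// Hypothetical stand-in for an RPC stub that sends Get to one server.
using GetRpc = std::function<Reply(const std::string& key)>;

class Clerk {
 public:
  explicit Clerk(std::vector<GetRpc> servers)
      : servers_(std::move(servers)) {}

  // Tries servers round-robin, starting with the last known leader.
  // Returns true and fills *value on success, false if every attempt
  // failed (wrong leader or timeout).
  bool Get(const std::string& key, int max_attempts, std::string* value) {
    for (int i = 0; i < max_attempts && !servers_.empty(); ++i) {
      Reply reply = servers_[leader_](key);
      if (reply.status == Status::kOk) {
        *value = reply.value;
        return true;
      }
      // Wrong server or no answer: move on to the next one.
      leader_ = (leader_ + 1) % servers_.size();
    }
    return false;
  }

 private:
  std::vector<GetRpc> servers_;
  std::size_t leader_ = 0;  // index of the server that answered last
};
```

Remembering the last server that answered means later requests usually reach the leader on the first try.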
This project extends the Raft-based key-value system by implementing a sharded key-value storage system. The system partitions keys across multiple replica groups to improve performance and handles shard reconfiguration when groups join or leave.
- Implemented a shard master to manage configurations for replica groups.
- The shard master assigns shards to replica groups and supports reconfiguration via `Join`, `Leave`, `Move`, and `Query` RPCs.
- Replica groups use Raft to replicate the key-value data for their assigned shards.
- Each replica group is responsible for a subset of shards and handles operations (`Get`, `Put`, and `Append`) for its assigned shards.
- Clients query the shard master to determine which group is responsible for a particular key.
- Handled shard reconfiguration when replica groups join or leave, ensuring smooth shard migration.
- Shards are transferred between replica groups while ensuring that clients always interact with the correct group.
- Implemented polling of the shard master to detect configuration changes and trigger shard migrations.
- Ensured fault tolerance by replicating each shard's data within its replica group, allowing continued operation even if some servers fail.
- Implemented linearizability, ensuring that operations across shards are consistent and appear in a global order.
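To make the key-to-group lookup concrete, here is a small sketch. The shard count of 10 and the first-byte mapping follow the original 6.824 Go lab and are assumptions here; this C++ lab may use different values. `Config`, `KeyToShard`, and `GroupForKey` are hypothetical names.

```cpp
#include <array>
#include <string>

constexpr int kNumShards = 10;  // assumption, as in the original Go lab

// Maps a key to a shard using its first byte (empty keys go to shard 0).
int KeyToShard(const std::string& key) {
  int shard = key.empty() ? 0 : static_cast<unsigned char>(key[0]);
  return shard % kNumShards;
}

// Hypothetical view of one shard master configuration.
struct Config {
  int num;                                   // configuration number
  std::array<int, kNumShards> shard_to_gid;  // shard -> group id
};

// Returns the id of the replica group that serves the given key.
int GroupForKey(const Config& cfg, const std::string& key) {
  return cfg.shard_to_gid[KeyToShard(key)];
}
```

A client would call `GroupForKey` with its cached configuration and query the shard master again if the chosen group reports that it no longer owns the shard.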
Passed all tests:
- Basic and concurrent shard operations.
- Minimal shard transfers after groups join or leave.
- Correct handling of configuration changes, shard transfers, and client requests.
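The "minimal shard transfers" requirement suggests a rebalancing step along the following lines. This is a sketch of one greedy approach, not necessarily what this project does: `Rebalance`, `Assignment`, and the shard count of 10 are assumptions, and group id 0 is used to mean "unassigned".

```cpp
#include <algorithm>
#include <array>
#include <map>
#include <set>
#include <utility>
#include <vector>

constexpr int kNumShards = 10;  // assumption
using Assignment = std::array<int, kNumShards>;  // shard -> group id

// Returns a balanced assignment over the live groups `gids`, leaving
// shards where they are whenever possible.
Assignment Rebalance(Assignment shards, const std::set<int>& gids) {
  if (gids.empty()) {
    shards.fill(0);  // no groups left: everything is unassigned
    return shards;
  }
  std::map<int, int> load;  // group id -> number of shards it holds
  for (int gid : gids) load[gid] = 0;
  std::vector<int> orphans;  // shards whose group is gone or unassigned
  for (int s = 0; s < kNumShards; ++s) {
    auto it = load.find(shards[s]);
    if (it == load.end()) orphans.push_back(s);
    else ++it->second;
  }
  auto by_load = [](const std::pair<const int, int>& a,
                    const std::pair<const int, int>& b) {
    return a.second < b.second;
  };
  // Orphaned shards must move anyway, so give them to the emptiest group.
  for (int s : orphans) {
    auto lo = std::min_element(load.begin(), load.end(), by_load);
    shards[s] = lo->first;
    ++lo->second;
  }
  // Move one shard at a time from the fullest to the emptiest group
  // until the loads differ by at most one.
  for (;;) {
    auto lo = std::min_element(load.begin(), load.end(), by_load);
    auto hi = std::max_element(load.begin(), load.end(), by_load);
    if (hi->second - lo->second <= 1) break;
    *std::find(shards.begin(), shards.end(), hi->first) = lo->first;
    --hi->second;
    ++lo->second;
  }
  return shards;
}
```

Iterating over a `std::map` keeps tie-breaking deterministic, which matters because every shard master replica must compute the same assignment from the same log.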
A modern Linux environment (e.g., Ubuntu 22.04) is recommended for the labs. If you do not have access to one, consider using a virtual machine.
Get source code:
git clone --recursive [repo_address]
Install dependencies:
sudo apt-get update
sudo apt-get install -y \
    git \
    pkg-config \
    build-essential \
    clang \
    libapr1-dev libaprutil1-dev \
    libboost-all-dev \
    libyaml-cpp-dev \
    libjemalloc-dev \
    python3-dev \
    python3-pip \
    python3-wheel \
    python3-setuptools \
    libgoogle-perftools-dev
sudo pip3 install -r requirements.txt
For next steps, check out the guidelines on the course web page.