Raft Consensus

Overview

You may see this lab referred to as P3 or Project 3 for historical reasons. The goal of this lab is to implement two parts of the Raft consensus protocol in Java. You can find a detailed discussion of Raft in this paper and in the lecture slides. Use these discussions to guide your implementation. In particular, Figure 2 in the Raft paper will be an invaluable guide to completing the project.

In the first part of the project, you implement the leader election protocol. In the second part, you implement the log repair protocol. Note that there are no clients, so you do not have to handle new user requests. All actions are present in the set of initial server logs on each of the replicas.

Background and setup

Code. To complete the project, add code to RaftNode.java to implement the Raft replica behavior, and test it with the supplied test programs and other scenarios of your own. Just fill in RaftNode.java: do not modify any other files. Why? Because the AG grades your RaftNode.java under its own test scenarios: it ignores any of your code outside of RaftNode.java.

Test scenarios. Each AG test scenario calls the RaftNode constructor to instantiate a RaftNode object for each replica in the scenario, in the same way as the supplied test programs. All replica objects run with the test program in a single-threaded process. The test harness starts your RaftNode replicas in any mode of its choosing according to the scenario. The replicas communicate through the test harness, which simulates failures by selectively blocking messages. The test harness interrogates the RaftNode.get* methods to check that your replicas function correctly, so your RaftNode must implement those as well.

Three event handlers. A RaftNode must implement handler methods for three kinds of input events: RequestVotes requests from other replicas acting as candidate, AppendEntry requests from other replicas acting as leader, and timeouts from an internal timer. These inputs are generated by the test harness. How you handle these inputs depends on the RaftNode's current mode, which specifies whether the replica is acting as a candidate, follower, or leader.

Timers and heartbeats. Your RaftNode may invoke Timer.resetTimer() at any time to reset its timer and schedule the next timeout event. For example, a leader must ensure that its followers know it is up and running: it sends out periodic AppendEntry requests (heartbeats), even if there are no new log entries to send. You also use the timer in follower mode to detect a lack of heartbeats from the leader.

RPC rounds and responses. A RaftNode in leader mode or candidate mode uses its AppendEntryRequester and VoteRequester objects to send RPCs and retrieve the responses. Each Requester object has a method to send an RPC of its type to a specific receiver, and methods to retrieve or query received responses for RPCs that were sent during a specified term. Leaders and candidates implement group message rounds by sending an RPC to each of its peers, waiting for one timeout, and then calling the Requester object to check (poll) the responses. Polling for responses via a timer is not the most efficient approach, but it simplifies the code.

Switching replica modes. Your RaftNode must also switch replica modes correctly. For example, if the replica is a follower and the leader is silent (or appears to be, i.e., its heartbeats are lost in the network), then the replica switches to candidate mode when its timer fires. A replica in leader mode steps down and switches to follower mode if it learns of a peer with a higher term. Any replica can learn of a peer with a higher term via an incoming request or a response that rejects an RPC request.

Task 1: Leader election

The first part of th project is to implement the leader-election protocol.

When a server first becomes a candidate, it increments its term, and sends a requestVotes RPC to its peers. Check periodically (in response to timer firings) to see if a majority of replicas (including itself) accepts the candidate as leader. If a replica wins a majority, then it transitions to leader mode.

Of course, elections should not last forever, and if a candidate cannot collect a majority of votes before an election-timeout interval, then the candidate should increment its term and call another election. On the other hand, if a candidate receives a message (a response or an incoming RPC call) from a server with a term that is equal to or larger than its own, then it should cancel its election and transition to follower mode.

Task 2: Log repair

Once your servers have chosen a leader, they execute the log repair protocol. All server logs should converge to the same state by the end of a scenario simulation. More precisely, all servers agree on a stable prefix of their logs, and the length of this prefix advances monotonically on all servers as they run: this is consensus.

The leader's job during log repair is to figure out the earliest log entry that it has in common with each follower, and then roll the followers forward until their logs are identical to the leader's. Remember that in Raft the log entries flow in only one direction: from the leader to the followers.

Each RaftNode receives its initial RaftLog with log contents in its constructor. Each log entry is an Entry object with a term label and an action, both integers. Of course, this exercise is only a simulation: in a real RSM implementation, the action is an operation method ID and parameters for a call to the server application state machine.

Use the RaftLog methods (getEntry, getLastEntryIndex, and getLastEntryTerm) to query the log during repair. To install updates from a leader, use RaftLog.insert.