The goal of Project 3 is to implement two parts of the Raft consensus
protocol. You can find a detailed discussion of Raft in this
paper and in the lecture
slides. In particular, Figure 2 in the Raft paper will be an invaluable guide
to completing Project 3.
In the first part of the project, you will implement the leader-election
protocol. In the second part, you will implement the log-repair protocol. Note
that you do not have to handle new user requests. All actions are present in the
set of initial server logs.
Background and setup
The code for Project 3 is available in the course git repo.
In the p3 directory, will find three subdirectories. The bin directory contains
two scripts, including runtest.sh, which can be used to run a Raft
simulation. You should also dump your .class files into the bin directory (the
Makefile will do this for you).
The serverlogs directory is where server instances will look for their
persistent state. This include the action log as well as some additional
configuration state.
The src directory contains all the Java source you will need for the project.
The tests directory contains testing scripts. There is a simple sampletest.txt
there that can be used in combination with bin/runtest.sh.
Simulated servers run in separate processes and communicate with each other via
the rmiregistry. The runtest.sh script will do this for you. To implement
leader election and log replication, you will need to modify three java classes
in the edu.duke.raft package: CandidateMode.java, FollowerMode.java, and
LeaderMode.java. Limit your code changes to these files. You can only submit
these three files.
A server must handle three kinds of inputs: vote requests from other servers,
append-entry requests from other servers, and a timeout started by the server
itself. How you handle these inputs will depend on whether the server is acting
as a candidate, follower, or leader.
To simplify your implementation, we provide the RaftResponses class for storing
RPC responses from other servers. RPC requests can be made using the
remoteRequestVote and remoteAppendEntries methods in RaftMode. Note that these
"remote" methods have no return value. This is because your code can only access
responses to RPCs via the RaftResponse class. Responses are collected for a
given term. The term must be set to begin a collection period, and incoming
responses that do not match the internal term will be ignored. You will need to
be careful to synchronize accesses to RaftResponses.
To schedule a timer, you can use the protected final Timer scheduleTimer(long
millis, int timerID) method that all RaftMode implementations inherit. This
method will create a thread that will execute the public void handleTimeout(long
timerID) function in your RaftMode class. If you do not want the handler to
execute (e.g., because a heartbeat message arrived from the leader), then you
can cancel the Timer using its public void cancel() method.
Servers must maintain three pieces of persistent state: the current term, the
candidate the server voted for in the current term (or zero, if none), and the
action log. We provide the RaftConfig and RaftLog classes for ensuring that this
state is written to disk. You can get and set this state through the class
methods, and both implementations will ensure that updates are synchronously
written to disk before returning. These values are stored in the logs in the
serverlogs directory and are re-read whenever a server re-starts.
You are also responsible for making sure that the server correctly switches
between modes. The correct way to change the server's mode is via the method
public static void RaftServerImpl.setMode(RaftMode mode). In StartServer, you
can see that all servers begin as followers.
Task 1: Leader election
The first part of Project 3 is to implement the leader-election protocol
described in the lecture slides and the Raft paper.
Every server starts as a follower. One of a follower's main jobs is to detect
when the current leader has failed. Followers detect failure by waiting for a
period of time until they have not received any messages. You will implement
this logic by setting timers to trigger after a period of time. The time to wait
before deciding that the leader has failed is called the election timeout
and servers should set their timers to a random value between 150ms and 300ms
(see RaftMode.ELECTION_TIMEOUT_MIN and RaftMode.ELECTION_TIMEOUT_MAX).
When testing your implementation, you will likely find it useful to override the
default randomized election-timeouts. You can do this by adding an
"ELECTION_TIMEOUT_OVERRIDE" value to your server's config file. When this is
set, your server must wait for exactly this much time before calling for
a new election. For example, by setting one server's timeout override
to a value that is much
smaller than the other servers, you can ensure that this server is more likely
to become the leader.
If a follower receives no communication after an election-timeout period, then
it assumes that the leader has failed, and it transitions to candidate
mode. Followers also vote in elections, and your code should implement the
refined election criterion discussed in class (i.e., a server should only vote
for another if it has a more up-to-date log than its own).
When a server first becomes a candidate, it increments its term, and sends a
requestVotes RPC to all other servers. These RPCs are sent in parallel and
responses will arrive at different times. The threads responsible for making
each RPC will store a server's response in the RaftResponses class. Your
candidate code should set a timer to periodically check if a majority of servers
voted for it. If a server wins a majority, then it should transition to leader
mode. While polling for election results via a timer is not the most efficient
approach, it makes your code simpler.
Of course, elections should not last forever, and if a candidate cannot collect
a majority of votes before an election-timeout period of time, then the
candidate should increment its term and call another election. On the other
hand, if a candidate receives a heartbeat message from a server with a term that
is equal to or larger than its own, then it should cancel its election and
transition to follower mode.
A leader must ensure that the rest of the cluster knows it is up and running. It
does this by sending out periodic heartbeat messages to the followers. Heartbeat
messages are empty append-entry requests, and should be sent often enough to
prevent a follower's election timer from triggering. This period is called the
heartbeat interval, and it should be set to half of the minimum possible
election timeout (i.e., 75ms). See RaftMode.HEARTBEAT_INTERVAL. A server remains
in leader mode until it fails or learns that another server has a higher
term. If a server discovers someone with a higher term, the server changes to
follower mode.
Handling timers is critical for correctly implementing the election
protocol. Your server may have one or more timers set at any point in time. When
an input must be processed (whether a network message or a timer firing), you
may need to cancel outstanding timers and/or set up new ones. If you find that
you need to have more than one timer set at the same time, use the timerID
parameter to figure out which timer fired in handleTimeout.
Whenever a server switches modes, it should print a message to stdout. For
example, if server one becomes a candidate for term five, it should print,
"S1.5: switched to candidate mode.", to stdout. If server one becomes the leader
for term five, it should print, "S1.5: switched to leader mode."
My advice is to implement leader election using the simple criterion first
(i.e., followers vote for the candidate, regardless of their log's state), and
implement the refined criterion in Task 2.
Task 2: Log repair
Once your servers have chosen a leader, you must implement the log repair
protocol described in the lecture slides and the Raft paper. The leader's job is
to figure out the earliest log entry that it has in common with each follower,
and then roll the followers forward until their logs are identical to the
leader's. Remember that in Raft log entries only flow in one direction: from the
leader to the followers.
Logs are stored in plaintext files. The runtest.sh script reads and writes logs
stored in the serverlogs directory. Each log entry is a stored as two integers
on a single line. The first integer is the term and the second is the
action. For the purposes of implementing Raft, the meaning of the action
integers is irrelevant.
As followers catch up, you can monitor the state of their logs by looking at the
log files. For a sufficiently long simulation and sufficiently infrequent
failures, all server logs should converge to the same state by the end of a
simulation. Note that there is no guarantee that logs will converge, since
exactly the wrong failure at a critical moment can prevent an election from
happening or a log entry from propagating. However, all servers will
agree on a stable prefix of their logs, and the length of this prefix
advances monotonically on all servers as they run: this is consensus.
Testing
Testing your Raft implementation will be a bit tricky due to the asynchronous
nature of handling timeouts. However, runtest.sh is meant to help you
debug. runtest.sh can read a test file from the tests directory like
sampletest.txt. A test file will allow you to start servers, fail servers, pause
servers, resume servers, and sleep the testing script (note that sleep does not
cause the servers themselves to sleep). As in Project 1, you should write a
suite of simple test cases to make sure that your implementation is correct. As
mentioned earlier, overriding the election timeout for a server in its .config
file will also be very useful for testing.
runtest.sh appends the output of a test run to the file
serverlogs/server.output. The command below (run from directory p3) will run the
sample test for 10 seconds.
$> bin/runtest.sh 10 tests/sampletest.txt
At the top of a test file, you must set the variable NUM_SERVERS, which
indicates how many servers will run. Each server will be assigned a unique
numeric identifier, between 1 and NUM_SERVERS. You need to make sure that the
NUM_SERVERS field in each server's .config file matches the NUM_SERVERS
specified in the test file.
Look through sampletest.txt to see how to manipulate the events in your test.
In addition to running, pausing, and failing servers, you can also manipulate
servers' initial states by modifying the .log and .config files in the
serverlogs directory. runtest.sh will look for Server N's log and config
information in serverlogs/N.log and serverlogs/N.config, respectfully. If
runtest.sh cannot find the right .log and .config files, it will create them by
copying serverlogs/init.config and serverlogs/init.log into serverlogs/N.config
and serverlogs/N.log, respectively, before starting the server.
Submitting
Please submit your code through the course website. As before, the auto-grader will grade the master branch of your repo (i.e., CandidateMode.java, FollowerMode.java, and LeaderMode.java) and email your results. You are entitled to one submission per day (using the AG's clock) and three bonus submissions.
Please be patient while waiting for your feedback. Grading Project 3 is slow, so don't be surprised if your feedback arrives a half hour (or more) after you submit. Here are some hints for the test cases:
Tests 0 - 5 only test leader election.
Tests 6 - 14 test leader election and log reconciliation
Servers must have the same log by the end of the test. You are given a reasonable amount of time for this to happen.
Servers must elect a leader at least once during a test. Again, you are given more than enough time for this to happen.
The AG performs a number of correctness checks on your output, such as whether two servers enter leader mode in the same term.
The AG runs your code multiple times for each test, and your code must produce correct output on each run to pass the test.