HashiCorp’s Raft is a mature implementation of the Raft protocol, but its codebase has some readability challenges. This article covers the cleanup process I went through to improve clarity and maintainability. The code is published on GitHub.
The first notable issue is the absence of the state pattern. Given Raft’s finite-state-machine nature, it is well suited to this design, yet HashiCorp’s implementation does not follow it. Instead, it starts and stops separate loops for the follower, candidate, and leader states, duplicating the main loop logic across them. By adopting a proper state pattern, where each state directly handles its input, I was able to make the main loop more distinct and eliminate the redundancy.
State transitions in Raft can be triggered by multiple goroutines, making them prone to race conditions. Previously, a node transitioned states by directly updating its state and term, forcing the current state's loop to exit and a new one to start. I found that this deviates from the state pattern and makes transitions harder to track.
In my design, transitions are explicitly managed through `dispatchTransition`, which sends a transition message to the `receiveTransitions` loop. This loop then delegates the transition to the current state. Each state implements its own `HandleTransition` method, which directly follows Raft’s state transition diagram, ensuring that only valid transitions occur.
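To make the flow concrete, here is a minimal sketch of how such a design could look in Go. The `Transition` fields, channel names, and the `State` interface shown here are simplified illustrations, not the exact types in the repository:

```go
package raft

// Transition describes a requested state change; the fields are
// deliberately simplified for this sketch.
type Transition struct {
	To   string // "follower", "candidate", or "leader"
	Term uint64
}

// State is implemented by the follower, candidate, and leader states.
type State interface {
	HandleTransition(t Transition)
}

type Raft struct {
	state        State
	transitionCh chan Transition
	shutdownCh   chan struct{}
}

// dispatchTransition can be called from any goroutine; it only sends a
// message, so no state is mutated outside the receiving loop.
func (r *Raft) dispatchTransition(t Transition) {
	select {
	case r.transitionCh <- t:
	case <-r.shutdownCh:
	}
}

// receiveTransitions serializes all transitions and delegates each one to
// the current state, which accepts or rejects it per Raft's diagram.
func (r *Raft) receiveTransitions() {
	for {
		select {
		case t := <-r.transitionCh:
			r.state.HandleTransition(t)
		case <-r.shutdownCh:
			return
		}
	}
}
```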
Figure 1: Raft consensus algorithm.
The second issue is the confusion between `Configuration` and `Config`. The concept of `Configuration` is entirely about `Membership`. To clarify this, I replaced `Configuration` with the `Membership` struct, which manages both the latest and committed memberships. It also encapsulates all membership-related operations, such as checking, updating, and creating new memberships.

The `HandleMembershipChange` method in the leader state now utilizes these methods to either reject membership changes or accept, dispatch, and apply them. This eliminates the scattered handling logic found in functions like `configurationChangeChIfStable`.
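A rough sketch of what such a struct might look like, assuming simplified `Server` fields and an illustrative method set (the real code tracks more than this):

```go
package raft

import "sync"

// ServerID and Server are simplified stand-ins for the real types.
type ServerID string

type Server struct {
	ID      ServerID
	Address string
	IsVoter bool
}

// Membership keeps both the latest and the last committed membership,
// together with the log indices they were created at. The method set below
// is illustrative: validation, lookup, and creation of new memberships live
// here instead of being scattered across the Raft struct.
type Membership struct {
	mu             sync.Mutex
	latest         []Server
	latestIndex    uint64
	committed      []Server
	committedIndex uint64
}

// IsStable reports whether the latest membership has been committed, which
// is the precondition for accepting another membership change.
func (m *Membership) IsStable() bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.latestIndex == m.committedIndex
}

// IsVoter reports whether the given server currently has a vote.
func (m *Membership) IsVoter(id ServerID) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	for _, s := range m.latest {
		if s.ID == id {
			return s.IsVoter
		}
	}
	return false
}
```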
The third issue is the lack of encapsulation in the leader state's replication logic. For example, Raft's `replicate` method takes a `followerReplication` instance and runs the replication loop, while `replicateTo` attempts to send logs to a follower up to the latest index.

To address this, I refined `followerReplication` into `peerReplication`, encapsulating all replication-related methods within it. Additionally, the leader state now directly manages its peer replication map rather than delegating it to Raft.

Here's a summary of the re-encapsulation of these methods, where `pr` stands for `peerReplication`:
| original | cleanup | intent |
| --- | --- | --- |
| `Raft.replicate` | `pr.run` | run the replication loop for a peer |
| `Raft.replicateTo` | `pr.replicate` | send the latest logs |
| `Raft.heartbeat` | `pr.heartbeat` | run the heartbeat loop |
| `Raft.sendLatestSnapshot` | `pr.sendLatestSnapshot` | send the latest snapshot to the peer |
| `Raft.pipelineReplicate` | `pr.runPipeline` | run replication in pipeline mode |
| `Raft.pipelineSend` | `pr.pipelineReplicate` | send the latest logs in pipeline mode |
| `Raft.pipelineDecode` | `pr.receivePipelineResponses` | loop to handle peer responses in pipeline mode |
| `Raft.startStopReplication` | `Leader.startReplication` | start a `peerReplication` for each peer, skipping those already running |
| | `Leader.startPeerReplication` | directly start a `peerReplication` |
| | `Leader.stopPeerReplication` | directly stop a `peerReplication` |
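As a rough illustration of this encapsulation, here is a sketch of a `peerReplication` that owns its own loops. The fields and bodies are placeholders; only the shape of the method set mirrors the table above:

```go
package raft

import "time"

// peerReplication owns everything needed to replicate to a single follower.
type peerReplication struct {
	peerID        string
	nextIndex     uint64
	triggerCh     chan struct{} // signals that new logs are ready
	stopCh        chan struct{}
	heartbeatTick time.Duration
}

// run is the replication loop for one peer (formerly Raft.replicate).
func (pr *peerReplication) run() {
	for {
		select {
		case <-pr.triggerCh:
			pr.replicate()
		case <-pr.stopCh:
			return
		}
	}
}

// replicate sends the latest logs up to the leader's last index
// (formerly Raft.replicateTo). The body is elided in this sketch.
func (pr *peerReplication) replicate() {}

// heartbeat runs the heartbeat loop (formerly Raft.heartbeat).
func (pr *peerReplication) heartbeat() {
	ticker := time.NewTicker(pr.heartbeatTick)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			// send an empty AppendEntries to assert leadership (elided)
		case <-pr.stopCh:
			return
		}
	}
}
```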
Helper methods like `waitForReplicationTrigger`, `waitForHeartbeatTrigger`, and `waitForBackoff` were added to eliminate repetitive `select` statements, making the replication and heartbeat logic clearer. Additionally, key methods were rewritten to eliminate `goto` statements and labels, resulting in a more linear flow that improves readability and makes the logic easier to follow.
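A sketch of what one of these helpers might look like, with illustrative channel parameters rather than the actual ones:

```go
package raft

import "time"

// waitForReplicationTrigger blocks until there are new logs to send, the
// periodic commit sync fires, or replication is stopped. It returns false
// only when the replication loop should exit. The channel parameters are
// illustrative; the real helper likely hangs off peerReplication.
func waitForReplicationTrigger(triggerCh, stopCh <-chan struct{}, syncInterval time.Duration) bool {
	timer := time.NewTimer(syncInterval)
	defer timer.Stop()
	select {
	case <-triggerCh:
		return true
	case <-timer.C:
		return true // periodic sync of the leader's commit index
	case <-stopCh:
		return false
	}
}
```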
A small but potentially confusing detail is `CommitTimeout`. Despite its name, it is actually an interval that periodically triggers replication to synchronize the leader's commit index on followers; it has nothing to do with commit timeouts. To clarify its purpose, I renamed it to `CommitSyncInterval`.
With the more powerful `Membership` and `peerReplication` structs, I was able to fully implement the staging feature. Previously, the staging step was skipped, and a new peer became a voter immediately. In my implementation, a dedicated `staging` struct ensures that only one peer can be staged at a time. It waits for `peerReplication` to sync logs and for the new membership to stabilize before promoting the peer to a voter. When a new leader takes over, it checks for any ongoing staging and completes it.
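A minimal sketch of such a `staging` struct, assuming illustrative field names and a channel-based sync signal:

```go
package raft

// staging tracks the single peer currently being staged. Only one peer may
// be staged at a time; once its logs are caught up and the new membership
// has stabilized, the leader promotes it to a voter. Fields are illustrative.
type staging struct {
	peerID   string
	active   bool
	syncedCh chan struct{} // closed by peerReplication once logs are caught up
}

// tryStage starts staging a peer, rejecting the request if another peer is
// already being staged.
func (s *staging) tryStage(peerID string) bool {
	if s.active {
		return false
	}
	s.peerID = peerID
	s.active = true
	s.syncedCh = make(chan struct{})
	return true
}

// finish clears the staging slot after the peer has been promoted, or when a
// new leader takes over and completes the in-flight staging.
func (s *staging) finish() {
	s.active = false
	s.peerID = ""
	s.syncedCh = nil
}
```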
The fourth issue is the confusion between the application state and the FSM. The actual FSM is `Raft` itself, which follows a state transition diagram. Raft has a loop that receives commits and requests from other loops, applying them to the application state. However, the application state is unlikely to be another FSM.
To address this, the `FSM` interface (responsible for applying commits, handling snapshots, and restoring state) has been refined into the `CommandsState` interface. Additionally, a `MembershipApplier` interface has been introduced to handle membership-related commits. The `AppState` struct now manages `CommandsState`, `MembershipApplier`, supporting channels, and relevant indices.

Furthermore, the `runFSM` loop has been refined into the `receiveMutations` method, simplifying its logic and making it more explicit.
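The interfaces could be sketched roughly like this; the method signatures are simplified assumptions, not the exact ones in the cleanup:

```go
package raft

// CommandsState is the application-facing interface previously called FSM:
// it applies committed commands and handles snapshot and restore. The
// signatures are simplified for this sketch.
type CommandsState interface {
	ApplyCommand(index uint64, data []byte) interface{}
	Snapshot() ([]byte, error)
	Restore(snapshot []byte) error
}

// MembershipApplier handles membership-related commits separately from
// ordinary commands.
type MembershipApplier interface {
	ApplyMembership(index uint64, membership []byte)
}

// AppState owns the application state plus the channels and indices needed
// to feed it; receiveMutations would drain mutationCh and apply each entry.
type AppState struct {
	commands     CommandsState
	memberships  MembershipApplier
	mutationCh   chan []byte // committed entries waiting to be applied
	appliedIndex uint64
}
```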
The fifth issue is the poor organization of Raft’s core code. The Raft struct is declared in `api.go`, while `raft.go` is overloaded with various responsibilities, including the follower, candidate, and leader state loops, leader step-up and step-down, the candidate's election run, and request handling.
In the refined design, the code is cleanly split based on functionality:
- `raft_api.go` – Contains only the exposed API methods that clients use to interact with Raft.
- `raft.go` – Focuses solely on defining the Raft struct and managing its critical loops, such as the main loop, the `receiveHeartbeat` loop, the `receiveTransitions` loop, and the `receiveSnapshotRequest` loop.
- `raft_builder.go` – Implements the builder pattern to facilitate flexible object creation and streamline the initialization of a new Raft node.
- State-specific files – No more state loops in `raft.go`. Each state has its own file and implements the methods that satisfy the `State` interface. Furthermore, state-specific code is defined there:
  - `state_candidate.go` – contains `runElection`.
  - `state_leader.go` – contains `stepUp` and `stepDown`.
- `raft_internals.go` – Houses all internal Raft methods that support the states and loops. These methods have been renamed and refactored for clarity. Some notable method renamings include:
| original | cleanup | intent |
| --- | --- | --- |
| `processRPC` | `handleRPC` | check the RPC type and delegate handling to the matching handler |
| `appendEntries` | `handleAppendEntries` | handle the appendEntries request |
| `requestVote` | `handleRequestVote` | handle the vote request |
| `installSnapshot` | `handleInstallSnapshot` | handle the install snapshot request |
| `timeoutNow` | `handleCandidateNow` | handle the request to transition Raft to candidate immediately |
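The dispatch shape of `handleRPC` might look roughly like this, with placeholder request types and empty handler stubs:

```go
package raft

// The request types and RPC envelope below are simplified stand-ins; the
// point is the dispatch shape of handleRPC.
type AppendEntriesRequest struct{}
type RequestVoteRequest struct{}
type InstallSnapshotRequest struct{}

type RPC struct {
	Command interface{}
}

type Raft struct{}

// handleRPC checks the RPC type and delegates to the matching handler.
func (r *Raft) handleRPC(rpc RPC) {
	switch req := rpc.Command.(type) {
	case *AppendEntriesRequest:
		r.handleAppendEntries(rpc, req)
	case *RequestVoteRequest:
		r.handleRequestVote(rpc, req)
	case *InstallSnapshotRequest:
		r.handleInstallSnapshot(rpc, req)
	default:
		// unknown command: the real code responds with an error
	}
}

// Handler bodies are elided in this sketch.
func (r *Raft) handleAppendEntries(rpc RPC, req *AppendEntriesRequest)     {}
func (r *Raft) handleRequestVote(rpc RPC, req *RequestVoteRequest)         {}
func (r *Raft) handleInstallSnapshot(rpc RPC, req *InstallSnapshotRequest) {}
```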
The sixth issue is the overly complicated transport. I found that TCP-based network transport is sufficient for both testing and real use. As a result, redundant interfaces and functions in `transport.go` and `tcp_transport.go` have been removed. Additionally, `inmem_transport.go` is dropped, as testing is now done directly with `netTransport`.

To simulate network partitions, I introduced the `ConnGetter` interface:
- `TransparentConnGetter` (used in real deployments) does not interfere with how `netTransport` connects nodes.
- `BlockableConnGetter` (used in tests) can block connections to simulate network partitions.
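A sketch of how such an interface could be defined, assuming a simple dial-based signature (the real one may carry more context):

```go
package transport

import (
	"net"
	"sync"
	"time"
)

// ConnGetter abstracts how netTransport obtains a connection to a peer.
// The interface and both implementations below are an illustrative sketch.
type ConnGetter interface {
	GetConn(target string, timeout time.Duration) (net.Conn, error)
}

// TransparentConnGetter is used in real deployments: it simply dials.
type TransparentConnGetter struct{}

func (TransparentConnGetter) GetConn(target string, timeout time.Duration) (net.Conn, error) {
	return net.DialTimeout("tcp", target, timeout)
}

// BlockableConnGetter is used in tests: connections to blocked targets fail,
// simulating a network partition.
type BlockableConnGetter struct {
	mu      sync.Mutex
	blocked map[string]bool
}

func (b *BlockableConnGetter) Block(target string)   { b.set(target, true) }
func (b *BlockableConnGetter) Unblock(target string) { b.set(target, false) }

func (b *BlockableConnGetter) set(target string, v bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.blocked == nil {
		b.blocked = map[string]bool{}
	}
	b.blocked[target] = v
}

func (b *BlockableConnGetter) GetConn(target string, timeout time.Duration) (net.Conn, error) {
	b.mu.Lock()
	isBlocked := b.blocked[target]
	b.mu.Unlock()
	if isBlocked {
		return nil, &net.OpError{Op: "dial", Net: "tcp", Err: net.ErrClosed}
	}
	return net.DialTimeout("tcp", target, timeout)
}
```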
The following redundant components have been removed: `serverAddressProvider`, `getConnFromAddressProvider`, and `getProviderAddressOrFallback`.
Several refinements were made after consolidating transport logic:
- A single `NewNetTransport` replaces multiple scattered constructors (`NewNetworkTransport`, `NewNetworkTransportWithLogger`, `NewNetworkTransportWithConfig`).
- The `backoff` struct now encapsulates exponential backoff logic, simplifying the `listen` loop (a sketch follows this list).
- `handleCommand` is split into `handleMessage` and dedicated handlers for different message types. Helper functions (`dispatchWaitRespond`, `sendUntilStop`, `receiveUntilStop`) eliminate repetitive code.
- `netConn` is renamed to `peerConn`, encapsulating related methods (`SetDeadline`, `sendMsg`, `readResp`).
- `genericRPC` is renamed to `unaryRPC`, and `sendRPC` is refactored into `streamRPC` to better reflect their behavior.
- `netPipeline` is renamed to `replicationPipeline`, making its role explicit: it only sends requests and reads responses. Running the loops is now handled by `peerReplication`.
- The transport now sends heartbeat RPCs to a dedicated heartbeat channel. Raft explicitly processes them in a loop, aligning with how RPCs are handled in `mainLoop`. The previous `heartbeatFn` and `heartbeatFnLock` fields were removed, as they added unnecessary complexity and obscured the logic.
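Here is a minimal sketch of what the `backoff` struct could look like, with illustrative constants and method names:

```go
package transport

import "time"

// backoff encapsulates the exponential backoff used by the listen loop when
// Accept fails repeatedly. The field and method names are illustrative.
type backoff struct {
	min     time.Duration
	max     time.Duration
	current time.Duration
}

// next returns how long to sleep before the next Accept attempt and doubles
// the delay, capped at max.
func (b *backoff) next() time.Duration {
	if b.current == 0 {
		b.current = b.min
	}
	d := b.current
	b.current *= 2
	if b.current > b.max {
		b.current = b.max
	}
	return d
}

// reset is called after a successful Accept.
func (b *backoff) reset() { b.current = 0 }
```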
The seventh issue is that the leader verification process is unnecessarily complex.
- When a client sends a verification request, it goes to the `mainLoop`.
- The leader registers it in each `peerReplication` and triggers an immediate heartbeat.
- After the heartbeat, `peerReplication` notifies pending verification requests of success or failure.
- Requests count the votes, and if verification fails or has enough votes, they resend themselves to Raft's `mainLoop`.
Resending verification requests to `verifyCh` complicates the logic. This mechanism was initially designed to implicitly force a leader stepdown when the heartbeat receives a response with a higher term during verification, but it is unnecessary. In the cleanup code, the leader always steps down if a heartbeat receives a higher-term response, regardless of the verification process, and as soon as a request succeeds or fails, the result is sent immediately to the client. Removing the resend logic makes `verifyRequest` handling cleaner, and `leaderState.notify` and `cleanNotify` are removed. Furthermore, the vague `notifyAll` is renamed to `verifyAll` for clarity.
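A sketch of the simplified vote counting, where the quorum math and field names are assumptions for illustration; `resultCh` is buffered so a vote never blocks:

```go
package raft

import "sync"

// verifyRequest tracks heartbeat confirmations needed to prove the node is
// still the leader. The field names and quorum math here are illustrative.
type verifyRequest struct {
	mu        sync.Mutex
	total     int // number of voters, including the leader itself
	required  int // votes needed for a quorum
	granted   int
	responded int
	done      bool
	resultCh  chan bool // buffered with capacity 1 so vote never blocks
}

// vote is called by each peerReplication after its heartbeat round trip.
// As soon as the outcome is known, the result goes straight to the client
// instead of being resent through the main loop.
func (v *verifyRequest) vote(confirmed bool) {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.done {
		return
	}
	v.responded++
	if confirmed {
		v.granted++
	}
	switch {
	case v.granted >= v.required:
		v.done = true
		v.resultCh <- true
	case v.responded-v.granted > v.total-v.required:
		// enough failures that a quorum can no longer be reached
		v.done = true
		v.resultCh <- false
	}
}
```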
Previously, leader self-verification was handled via `Raft.checkLeaderLease` in `mainLoop`. In the cleanup, I refactored it into `selfVerify` and `checkFollowerContacts`, encapsulated within `Leader`. The `selfVerify` loop now starts when a leader steps up and exits when it steps down. It continuously monitors follower contact and forces a stepdown if the leader loses contact with the majority.
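A sketch of how this loop could be structured, with an illustrative `Leader` struct and lease-based ticker; the real contact check consults each `peerReplication`:

```go
package raft

import "time"

// Leader is trimmed down to what selfVerify needs; the field names and the
// lease-based ticker are assumptions for this sketch.
type Leader struct {
	leaseTimeout time.Duration
	stepDownCh   chan struct{}
	stopCh       chan struct{} // closed when the leader steps down
}

// selfVerify runs for the lifetime of a leadership term: started on stepUp,
// exited on stepDown. It forces a stepdown when contact with a majority of
// followers is lost.
func (l *Leader) selfVerify() {
	ticker := time.NewTicker(l.leaseTimeout)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			if !l.checkFollowerContacts() {
				select {
				case l.stepDownCh <- struct{}{}:
				case <-l.stopCh:
				}
				return
			}
		case <-l.stopCh:
			return
		}
	}
}

// checkFollowerContacts reports whether a majority of followers were
// contacted within the lease window; the real check would consult each
// peerReplication's last-contact timestamp. Elided here.
func (l *Leader) checkFollowerContacts() bool {
	return true
}
```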
Aside from the refinements mentioned above, improvements were also made to other areas, including the observer, voting, and testing. Every detail has been polished to enhance readability and eliminate friction and confusion.
Tests now run in parallel to improve speed, allowing them to run more frequently.
Thanks for reading through! Hope you find this useful.