Consensus is at the heart of distributed systems. When multiple nodes need to agree on a single source of truth despite failures, network partitions, or delays, a consensus algorithm ensures that they make consistent decisions. Without consensus, distributed systems risk data inconsistency, split-brain scenarios, or service failures.

This blog explores what consensus is, why it's necessary, and the most important consensus algorithms used in modern systems—such as Paxos, Raft, and Viewstamped Replication. We'll explain each with diagrams, real-world analogies, and practical implications to help you choose the right algorithm for your system.


🌎 What Is Consensus?

In a distributed system, consensus refers to the process by which multiple nodes agree on a single, unified value or decision that will be adopted by all participants. This agreement must hold even when some nodes crash, go offline temporarily, or send delayed messages.

Why is Consensus Important?

Consensus is vital in situations where systems must remain fault-tolerant and highly available. It ensures consistent replication of data across nodes, enables leader election in clustered services, and guarantees correct execution of distributed transactions.

Key Challenges:

Distributed systems face several challenges that make consensus hard to achieve. These include crash faults where nodes silently stop responding, network partitions that isolate subsets of nodes, and message delays or reordering due to unreliable communication links. Furthermore, there is no universal clock, making it difficult to sequence events accurately.


🔑 Properties of Consensus Protocols

A consensus algorithm must satisfy several critical properties to be considered correct:

  1. Agreement: All non-faulty nodes must agree on the same value, even if multiple proposals are made.
  2. Validity: If all participating nodes propose the same value, the protocol should select that exact value.
  3. Termination: Every correct (non-faulty) node must eventually reach a decision and stop waiting.
  4. Integrity: The chosen value should never change once agreed upon, and the process must avoid selecting the same value multiple times.

⚖️ Paxos: The Foundational Algorithm

Paxos, designed by Leslie Lamport, is one of the earliest and most academically validated consensus algorithms. It forms the theoretical basis for many modern protocols, but is often criticized for its complexity.

Roles:

In Paxos, three roles exist: Proposers, who initiate proposals; Acceptors, who vote on proposals and ensure consistency; and Learners, who observe the outcome of consensus and apply the decision to their local state.

Phases:

Paxos operates in two main phases. First, during the Prepare phase, a proposer sends a proposal with a unique number to acceptors to solicit promises. In the Accept phase, if enough promises are received, the proposer sends an actual value to be accepted. If a quorum of acceptors accepts this proposal, the decision is made.

Strengths:

Paxos is mathematically proven to be correct and resilient to crash failures, making it a trustworthy choice in theory. It guarantees safety even when messages are lost or reordered, as long as a majority of nodes are functioning.

Drawbacks:

Despite its robustness, Paxos is notoriously difficult to understand and implement correctly. It also suffers from performance limitations under high contention, since multiple proposers may conflict and cause repeated retries.


🧬 Raft: Understandable and Practical

Raft was introduced as a more understandable alternative to Paxos, aiming to provide the same safety guarantees while improving developer comprehension and implementation simplicity.

Raft structures the consensus process into well-defined stages and uses a clear leader-based model. At any given time, a node can be a leader, follower, or candidate. The leader is responsible for handling client requests and replicating logs to followers.

Phases:

Raft begins with Leader Election, where followers timeout and become candidates, soliciting votes from peers. Once a leader is elected, it handles Log Replication, sending new commands to followers and ensuring consistency. The third component is Safety, where the protocol guarantees that committed entries are never overwritten, even after leader changes.

Real-World Usage:

Raft is widely used in modern infrastructure components like etcd, which serves as the key-value store for Kubernetes, and Consul, a service discovery and configuration system. It’s also used in systems like HashiCorp Vault for secret storage.

Benefits:

Raft stands out for its clarity and practical design. It provides built-in leader election, clearly separates protocol concerns, and is accompanied by excellent reference materials and academic papers, making it a go-to choice for production-grade systems.


🎲 Viewstamped Replication (VSR)

Viewstamped Replication is another leader-based consensus approach similar to Raft. It organizes time into views, each associated with a primary replica responsible for processing client requests.

If the primary fails, a view change is triggered, electing a new leader. During normal operation, the primary sends updates to backup nodes, which acknowledge and apply them. If a quorum of acknowledgments is received, the operation is committed.

VSR is implemented in systems such as HDFS JournalNodes for log replication and inspired some internal components of Google Spanner.


📊 Paxos vs Raft vs VSR: Comparison Table

Feature Paxos Raft Viewstamped Replication
Readability ❌ Complex ✅ Simple ✅ Moderate
Leader election ❌ Ad hoc ✅ Built-in ✅ Built-in
Log replication ❌ Add-on ✅ Integrated ✅ Integrated
Performance ❌ Low ✅ High ✅ High
Production Use Cases Chubby, ZooKeeper etcd, Consul Spanner, HDFS

🛋️ Final Thoughts

Consensus is a fundamental building block for building resilient, fault-tolerant, and distributed applications. While Paxos is a theoretical cornerstone and ideal for understanding core concepts, Raft has emerged as the de facto standard due to its clarity, modularity, and strong open-source ecosystem. VSR provides an alternate model that blends the practicality of Raft with a more formal view-based framework.

When selecting a consensus protocol for your architecture, consider your team’s familiarity with distributed concepts, the complexity you're willing to manage, and your system's tolerance for latency and throughput bottlenecks. Ultimately, mastering consensus allows you to build systems that don’t just survive failures—but thrive in them.


Inspired by Chapter 9 of "Designing Data-Intensive Applications" by Martin Kleppmann