Reaching Consensus in Ripple

Written by: Dave Cohen, David Schwartz and Arthur Britto.

Introduction

Ripple is a universal payment system enabling users to transfer funds (including fiat currencies, digital currencies and other forms of value) across national boundaries as seamlessly as sending an email. Like other digital currencies such as Bitcoin, the Ripple Protocol enables peer-to-peer transaction settlement across a decentralized network of computers. Unlike other digital currency protocols, however, the Ripple Protocol is currency agnostic and users are not required to transact in the Protocol’s native currency (XRP). In addition, the technology enables near real-time settlement (three to six seconds) and is built to route each international transaction to the cheapest FX bid/ask spread available in the network.

Mechanics

At the core, Ripple has a “ledger” that records information such as accounts, balances, and offers to trade assets. Operations such as creating accounts, making payments, and converting assets are performed through signed instructions called “transactions.”

Because Ripple is a cryptographic system, accounts are identified by cryptographic identities. Transactions are authorized by cryptographic signatures matching these identities. Every server processes every transaction according to the same deterministic, known rules.

The Double Spend Problem

Transactions that are fundamentally invalid can be recognized as such by all participants, for example sending $20 from an account that holds only $10. However, it is also possible to construct two transactions such that either is valid but not both. This is called the “double spend” problem.

Suppose Alice has $10 in some payment system. For it to be a useful payment system, she has to be able to send it to Bob if she wishes to. And she must be able to sent it to Charlie if she wishes to. However, if she can send the same $10 to both Bob and Charlie, the payment system ceases to be useful. So somehow, sending $10 to Bob must prevent her from sending the same $10 to Charlie.

Conventionally, this is solved with a central authority. For example, a bank will clear checks based on the issuer’s available balance, which the bank is the sole custodian of. However, distributed systems, like Ripple, by definition have no central authority. They must solve the double spend problem some other way.

Simplifying The Problem

Much of the double spend problem can be solved by well-known rules such as prohibiting an account from spending funds it does not have. In fact, the double spend problem can be reduced to putting transactions in order.

Consider the example of Alice trying to send the same $10 to both Bob and Charlie. If the payment to Bob is known to be first, then everyone will agree that she has the funds to pay Bob. And if the payment to Charlie is known to be second, then everyone will agree that she cannot send those funds to Charlie because they are now Bob’s.

We can also order transactions by deterministic rules. Because transactions are collections of digital information, they can trivially be sorted.

This would be sufficient to solve the double spend problem without a central authority, but it would require us to have every transaction that would ever occur (so that we could sort them) before we could be certain of the results of any transaction. Obviously, this is impractical.

However, if we could bunch transactions into groups and agree on those groupings, we could sort the transactions within that group. That is, so long as every participant agrees on which transactions are to be processed as a unit, they can use deterministic rules to solve the double spend problem without any need for a central authority. The participants simply each sort the transactions and apply them in a deterministic way following the known rules.

Ripple does not rely on one and only one of two conflicting transactions being in the agreed group. The group of transactions is executed according to deterministic rules, and there is no harm in having two conflicting transactions in the group. The one that is first according to the sorting rules will succeed and the one that is second will fail.

The primary role of consensus is for participants in the process to agree on which transactions are to be processed as a unit to resolve the double spend problem. There are four reasons this agreement is easier to achieve than might be expected:

  1. If there is no reason a transaction should not be included in such a group of transactions, all honest participants will agree to include it. If all participants already agree, consensus has no work to do.
  2. If there is any reason at all a transaction should not be included in such a group of transactions, all honest participants are willing to exclude it. If the transaction is still valid, there is no reason not to include it in the next round, and they should all agree to include it then.
  3. It is extremely rare for a participant to particularly care how the transactions were grouped. Agreement is easiest when everyone’s priority is reaching agreement and only challenging when there are diverging interests.
  4. Deterministic rules can be used even to form the groupings, leading to disagreement only in edge cases. For example, if there are two conflicting transactions in a round, deterministic rules can be used to determine which is included in the next round.

Every participant’s top priority is correctness. That is, the rules that people rely on to preserve the integrity of the network must not be violated. For example, a transaction that is not properly signed must never be processed. However, every honest participant’s second priority is agreement. A network with possible double spends has no utility at all. Agreement is facilitated by the fact that every honest participant values it above everything but correctness.

How Consensus Works

A consensus round starts with each participant who wishes to do so taking an initial position. This is the set of valid transactions they have witnessed.

Participants then “avalanche” to consensus: If a particular transaction does not have majority support, participants will agree to defer that transaction. If a particular transaction does have majority support, participants will agree to include the transaction. Thus slight majorities rapidly become full support and slight minorities rapidly become universal rejection from the current round.

To prevent consensus from stalling near 50% and to reduce the overlap required for reliable convergence, the required threshold to include a transaction increases over time. Initially, participants will continue to agree to include a transaction if 50% or more of other participants agree. This will then jump to 60% and continue to increase until disputed transactions are deferred to the next round.

When a participant sees a supermajority that agrees on the set of transactions to next be processed, it declares a consensus to have been reached.

Consensus Can Fail

It is not practical to develop a consensus algorithm that will never fail to achieve perfect consensus. One way to understand why this is so is to consider how the consensus process terminates. At some point, each participant must declare that a consensus has been reached and that some set of transactions is known to be the result of the process. This declaration commits that participant irrevocably to some particular set of transactions as the result of the consensus process.

Clearly, some participant must do this first or no participant will ever do it, and consensus will never be reached. Now, consider the participant that does this first. When this participant decides that consensus is finished, other participants have not yet made that decision. If they were incapable of changing the agreed set from their point of view, they would have already decided consensus was finished. So they must be still capable of changing their agreed set.

In other words, for the consensus process to ever terminate, some participant must declare that consensus has been reached on a set of transactions even though every other participant is theoretically still capable of changing the agreed upon set of transactions.

The probability of this kind of failure can be made very low, but it cannot be reduced to zero. The engineering tradeoffs are such that driving this probability down below about one in a thousand makes consensus significantly slower, and less able to tolerate network and endpoint failures.

How Ripple Handles Consensus Failure

After a consensus round completes, each participant applies the set of transactions that they believe were agreed to. This results in constructing what they believe the next state of the ledger should be.

Participants that are also validators then publish a cryptographic fingerprint of this next ledger called a “validation”. If the consensus round succeeded, the vast majority of honest validators should be publishing the same fingerprint.

Participants then collect these validations. From the validations, they can determine whether the previous consensus round resulted in a supermajority of participants agreeing on a set of transactions or not.

Participants then find themselves in one of three cases, in order of probability:

  1. They built the same ledger a supermajority agreed to. In this case, they can consider that ledger fully validated and rely on its contents.
  2. They built a different ledger than a supermajority agreed on. In this case, they must build and accept the supermajority ledger. This typically indicates that they declared a consensus early and many other participants changed after that. They must “jump” to the super-majority ledger to resume operation.
  3. No supermajority is clear from the received validations. In this case, the previous consensus round was wasted and a new round must occur before any ledger can be validated.

Of course, case 1 is the most common. Case 2 does no harm to the network whatsoever. A small percentage of the participants could even fall into case 2 every round, and the network would operate with no issues.

Case 3 results in the network losing a few seconds in which it could have made forward progress, but is extremely rare. In this case, the subsequent consensus round is much less likely to fail because disagreements are resolved in the consensus process and only remaining disagreements can cause a failure.

On rare occasions, the network as a whole will fail to make forward progress for a few seconds. In exchange, average transaction confirmation times are low.

Philosophy

One form of reliability is the ability of a system to provide results even under conditions where some components have failed, some participants are malicious, and so on. While this is important, there is another form of reliability that is much more important in cryptographic payment systems — the ability of a system to produce results that can be relied upon. That is, when a system reports a result to us as reliable, we should be able to rely on that result.

Real-world systems, however, face operational conditions in which both kinds of reliability can be compromised. These include hardware failures, communication failures, and even dishonest participants. Part of Ripple’s design philosophy is to detect conditions where the reliability of results are impaired and report them, rather than providing results that must not be relied on.

Ripple’s consensus algorithm provides a robust alternative to proof of work systems, without consuming computational resources needlessly. Byzantine failures are possible, and do happen, but the consequence is only minor delays. In all cases, Ripple’s consensus algorithm reports results as reliable only when they in fact are.

Further Resources