Why private blockchains should not be eager to run code
I’m not a fan of the term “smart contracts”. For a start, it has been used by so many people for so many different things that we should probably just ban it completely. For example, the first known reference is from 1997, when Nick Szabo used it to describe physical objects that change their behavior based on some data. More recently, the term has been used for the exact opposite: to describe computation on a blockchain which is influenced by external events such as the weather. For now let’s put both of these meanings aside.
I want to focus here on “smart contracts” in the sense of general purpose computation that takes place on a blockchain. This meaning was popularized by Ethereum, whose white paper is subtitled “A Next-Generation Smart Contract and Decentralized Application Platform”. As a result of the attention that Ethereum has received, this meaning has become the dominant one, with banks (and others) working away on smart contract proofs-of-concept. Of course, since we’re talking about regulated financial institutions, this is mostly in the context of private or permissioned blockchains, which have a limited set of identified participants. For reasons that are now well understood, public blockchains, for all of their genius, are not yet suited for enterprise purposes.
So is the future bright for smart contracts in private blockchains? Well, kind of, but not really. You see, the problem is:
In private blockchains, smart contracts combine four good ideas with one bad one.
So what are the good ideas? (a) expressing business logic as a computer program, (b) representing the events which trigger that logic as messages to the program, (c) using digital signatures to prove who sent the messages, and (d) putting all of the above on a blockchain.
And the bad one? Executing every program for every message on every blockchain node. In other words, making the execution of all programs the job of the blockchain, instead of just using it as storage for the programs and messages. And yet this global execution is the entire reason why Ethereum was developed.
If you’re aware of the deterministic nature of computation, know about the halting problem, and understand how data dependencies prevent concurrency, then you may already be convinced. But if not, make yourself a coffee, take a deep breath, and follow me down the rabbit hole…
In order to understand Ethereum-style smart contracts, we need to start with bitcoin, the first (and still most popular) public blockchain. The bitcoin blockchain was originally designed for one thing only: moving the bitcoin currency from one owner to another. But once it was up and running, people started embedding “metadata” in transactions to serve other purposes, such as digital assets and document notarization. While some bitcoiners fought these applications, an official mechanism for metadata was introduced in March 2014, with usage growing exponentially ever since.
As well as projects built on the bitcoin blockchain, many next-generation public blockchains were developed and launched, such as Nxt, Bitshares, Ripple and Stellar. These were designed from the ground up to support a broader range of activities, such as user-created assets, decentralized exchange and collateralized borrowing. Each of these blockchains has a different set of features, as decided upon by its developers, and each must be upgraded by all of its users when a new feature is added. Things started to get rather messy.
Having been involved in some of these projects, Vitalik Buterin posed a simple but brilliant question: Instead of lots of application-specific blockchains, why not have a single public blockchain that can be programmed to do whatever we might want? This über-blockchain would be infinitely extendible, limited only by the imagination of those using it. The world of crypto-enthusiasts was almost unanimously convinced by this powerful idea. And so, with $18 million in crowdfunding and to great excitement, Ethereum was born.
Ethereum is a new public blockchain with an associated cryptocurrency called “ether”, like hundreds which came before it. But unlike other blockchains, Ethereum enables anybody to create a “contract” inside the blockchain. A contract is a computer program with an associated miniature database, which can only be modified by the program that owns it. If a blockchain user wants to change a database, they must send a digitally signed message to its contract. The code in the contract examines this message to decide whether and how to react. (This “encapsulation” of code and data is also a foundation of object-oriented programming.)
Ethereum contracts can be written in one of several new programming languages, such as Solidity and Serpent. Like most programming languages, these are Turing complete, meaning that they can express any general purpose computation. A key feature of Turing complete languages is the loop structure, which performs an operation repeatedly until some condition is fulfilled. For example, a loop might be used to print the numbers from one to a million, without requiring a million lines of code. For the sake of efficiency, programs written for Ethereum are compiled (i.e. converted) into more compact bytecode before being stored on the chain. Ethereum nodes then execute this bytecode within a virtual machine, which is essentially a simulated computer running inside a real one.
When an Ethereum contract is created on the blockchain, it sets up the initial state of its database. Then it stops, waiting politely until it’s called upon. When a user of the blockchain (or another contract) sends it a message in a transaction, the contract leaps into action. Depending on the code within, it can identify the source of the message, trigger other contracts, modify its database and/or send back a response to the caller. All of these steps are performed independently on every node in the network, with identical results.
To give an example, a simple Ethereum subcurrency contract maintains a database of user balances for a particular asset. If it receives a message to transfer funds from Alice to Bob, it will (a) check the message was signed by Alice, (b) check that Alice has sufficient funds, (c) transfer funds from Alice’s to Bob’s account in the database and (d) respond that the operation was successful. Of course, we don’t need Ethereum for that, because a simple bitcoin-style blockchain with native asset support can do the same thing. Ethereum really comes into its own for complex multi-stage business logic, such as crowdfunding, decentralized exchanges, and hierarchical governance structures. Or so, at least, the promise goes.
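To make the subcurrency example concrete, here is a minimal Python sketch of the contract's logic. It is not Solidity and not how Ethereum actually stores contract state. `TransferMessage`, `SubcurrencyContract` and `handle` are hypothetical names, and the signature check in step (a) is assumed to have already been performed by the platform, so `sender` is treated as authenticated.

```python
from dataclasses import dataclass, field


@dataclass
class TransferMessage:
    sender: str      # assumed already authenticated by the signature check, step (a)
    recipient: str
    amount: int


@dataclass
class SubcurrencyContract:
    # The contract's private "miniature database": one balance per user.
    balances: dict[str, int] = field(default_factory=dict)

    def handle(self, msg: TransferMessage) -> bool:
        # (b) reject non-positive amounts and overdrafts
        if msg.amount <= 0 or self.balances.get(msg.sender, 0) < msg.amount:
            return False
        # (c) move the funds inside the contract's own database
        self.balances[msg.sender] -= msg.amount
        self.balances[msg.recipient] = self.balances.get(msg.recipient, 0) + msg.amount
        # (d) report success to the caller
        return True


contract = SubcurrencyContract({"alice": 100})
print(contract.handle(TransferMessage("alice", "bob", 30)))   # True
print(contract.handle(TransferMessage("bob", "carol", 50)))   # False: bob only holds 30
print(contract.balances)                                      # {'alice': 70, 'bob': 30}
```

Only the contract's own `handle` method ever writes to `balances`, which mirrors the encapsulation described above.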
Breaking it down
Now that we know how Ethereum smart contracts work, we can break them down into five constituent parts:
1. Expressing business logic as computer programs.
2. Representing the events which trigger that logic as messages to the programs.
3. Using digital signatures to prove who sent the messages.
4. Putting the programs, messages and signatures on a blockchain.
5. Executing every program for every message on every node.
To repeat what I said at the start, I believe that parts 1 through 4 are very good ideas. Let’s start with the first two (which, by the way, are not new). Unlike legal contracts which can have differences of interpretation, computer programs are unambiguous. For any given program in a well-defined programming language, the same input always leads to the same output. So if some business logic is expressed as a computer program, and events are represented as messages to that program, then the business outcome is set in stone. Indeed, this deterministic property of computation makes randomness a sticky problem in computer science, and even the geeks at Google can get it wrong.
What about digital signatures and blockchains? These avoid the need for a central authority to determine which messages were sent, in what order, and by whom. Instead, each participant creates a pair of private and public keys, and distributes its public key once to the other participants. Following that, they sign every message with their private key before distributing that message across the network. The other participants can then verify the message’s source using the sender’s public key only. It’s clever cryptographic stuff. Finally, by putting the program and signed messages on a blockchain, we can ensure that every participant has an identical view of who did what and when. Combined with deterministic computation, this means participants cannot disagree over the final business outcome.
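As an illustration of signing with a private key and verifying with only the public key, here is a sketch of Lamport one-time signatures, chosen because they need nothing beyond a hash function from Python's standard library. Bitcoin and Ethereum actually use ECDSA over the secp256k1 curve, not Lamport signatures, and a Lamport key pair must only ever sign a single message. All function names here are illustrative.

```python
import hashlib
import secrets


def keygen():
    # Private key: 256 pairs of random 32-byte secrets, one pair per digest bit.
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    # Public key: the SHA-256 hash of every secret; safe to distribute to everyone.
    pk = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in sk]
    return sk, pk


def digest_bits(message: bytes) -> list[int]:
    d = hashlib.sha256(message).digest()
    return [(d[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(sk, message: bytes) -> list[bytes]:
    # Reveal one secret from each pair, selected by the corresponding digest bit.
    return [sk[i][bit] for i, bit in enumerate(digest_bits(message))]


def verify(pk, message: bytes, signature: list[bytes]) -> bool:
    # Only the public key is needed: hash each revealed secret and compare.
    return len(signature) == 256 and all(
        hashlib.sha256(s).digest() == pk[i][bit]
        for i, (s, bit) in enumerate(zip(signature, digest_bits(message)))
    )


sk, pk = keygen()
msg = b"transfer 30 units from alice to bob"
sig = sign(sk, msg)
print(verify(pk, msg, sig))                                     # True
print(verify(pk, b"transfer 300 units from alice to bob", sig)) # False
```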
But what about the last idea, of every node executing every program for every message? Here we come to the contentious part. Because while this global execution might be nice to have, it’s also not necessary. Because computation is deterministic, it makes no difference whether a program is executed by one node, every node, or some external process. It also doesn’t matter whether this happens in real time, on demand, or 10 years later. The computation’s result will always be the same. And if for some reason this isn’t the case, this can only be due to a problem in the blockchain software or network.
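The claim that it makes no difference who executes a program, or when, can be sketched as folding a deterministic function over the ordered message log. `apply_message` is a hypothetical transfer program; what matters is that it uses no clocks, randomness or I/O, so any process replaying the same log from the same starting state reaches the same result.

```python
from functools import reduce


def apply_message(state: dict, message: tuple) -> dict:
    # Purely deterministic: the output depends only on (state, message).
    sender, recipient, amount = message
    if amount <= 0 or state.get(sender, 0) < amount:
        return state                      # invalid transfer: state unchanged
    new_state = dict(state)               # never mutate the input state
    new_state[sender] = new_state.get(sender, 0) - amount
    new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


def replay(genesis: dict, log: list) -> dict:
    # Apply every message in the order fixed by the blockchain.
    return reduce(apply_message, log, genesis)


genesis = {"alice": 100}
log = [("alice", "bob", 30), ("bob", "carol", 50), ("bob", "carol", 10)]

node_today = replay(genesis, log)
node_in_ten_years = replay(dict(genesis), list(log))   # another process, much later
print(node_today == node_in_ten_years)   # True
print(node_today)                        # {'alice': 70, 'bob': 20, 'carol': 10}
```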
The trouble with computation
If it doesn’t matter where a computation takes place, why not do it everywhere? Well, it turns out that computer programs are unpredictable. However innocent they may look, they can take a long time to run. And sometimes, they go on running forever. Consider the following classic example (known as an LCG):
1. Set x to a single-digit number of your choice
2. Set y to 123*x+567
3. Set x to the last two digits of y, i.e. y modulo 100
4. If x is more than 2 then go back to step 2
5. Otherwise stop and output the value of x
Simple enough, right? So here’s a question for you: Will this program ever finish? Or will it get stuck in an infinite loop? Not so sure? Well let me put you out of your misery: It depends on the initial value of x.
If x is 0, 1, 2, 5, 6, 7 or 8, the program stops fairly quickly. But if x is 3, 4 or 9, it continues indefinitely. Don’t believe me? Open up Excel and try for yourself (you’ll need the “MOD” function).
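Instead of Excel, the five-step program can be checked in a few lines of Python. This sketch only gives an exact answer because x is reduced modulo 100, so there are at most 100 possible states and a repeated state proves an infinite loop. General programs have no such bound, which is exactly where the halting problem bites.

```python
def run_lcg(x: int):
    """Run the five-step program; return its output, or None if it never stops.

    With x confined to 0..99 after step 3, revisiting a value of x means the
    program is cycling and will never reach the stopping condition.
    """
    seen = set()
    while True:
        y = 123 * x + 567      # step 2
        x = y % 100            # step 3
        if x <= 2:             # step 4 does not loop back, so step 5: stop
            return x
        if x in seen:          # same state as before: the loop can never exit
            return None
        seen.add(x)


for start in range(10):
    print(start, run_lcg(start))
```

Running it prints an output of 0, 1 or 2 for starting values 0, 1, 2, 5, 6, 7 and 8, and `None` for 3, 4 and 9, matching the claim above.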
If you couldn’t predict that just by looking at the code, don’t feel too bad. Because not only is this hard for people, it’s impossible for computers. The problem of determining whether a given program will finish executing is called the halting problem. In 1936, Alan Turing, of “Turing complete” and The Imitation Game fame, proved that it cannot be solved for the general case. Barring trivial exceptions, the only way to find out if a program will finish running is to run it for as long as it takes, and that could be forever.
For those of us who’d prefer to live without blue screens of death and spinning beach balls, it’s all rather inconvenient. But live with it we do and, remarkably, most software works smoothly most of the time. And if not, modern operating systems like Windows protect us against runaway code by letting us terminate programs manually. However, the same thing can’t be done on a blockchain like Ethereum. If we allowed individual nodes to terminate computations at will, different nodes would have different opinions about the outcome of those computations. In other words, the network consensus would break down. So what’s a blockchain to do?
Ethereum’s answer is based on transaction fees, also known as gas. The sender of each transaction pays for the computations it triggers, and this payment is collected by the miner who confirms it in a block. To be more precise, every Ethereum transaction states up front how much of the sender’s “ether” can be spent on processing it. The fee is gradually spent as the contract executes, step-by-step, within the Ethereum Virtual Machine. If a transaction runs out of fees before it finishes executing, any database changes are reverted and the fee is not returned. If a transaction completes successfully, any remaining fee is returned to its sender. In this way, transactions can only burden the network to the extent that they’re willing to pay for it. It’s undoubtedly a neat economic solution, but it requires a native blockchain currency in order to work.
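Here is a toy version of gas metering under simplifying assumptions: one unit of gas per loop iteration, a fixed price of one currency unit per gas, and no explicit miner account (the returned `fee_paid` is what the miner would collect). `GasMeter`, `execute` and the two sample programs are hypothetical and far simpler than the Ethereum Virtual Machine's per-opcode costs, but they show the three behaviors described above: prepay, revert on exhaustion, refund on success.

```python
class OutOfGas(Exception):
    pass


class GasMeter:
    def __init__(self, limit: int):
        self.remaining = limit

    def charge(self, amount: int = 1) -> None:
        # Every execution step burns gas; running dry aborts the transaction.
        if amount > self.remaining:
            self.remaining = 0
            raise OutOfGas
        self.remaining -= amount


def execute(storage, balances, sender, gas_limit, program):
    """Run program(scratch_storage, meter); return (storage_after, succeeded, fee_paid)."""
    if balances[sender] < gas_limit:
        raise ValueError("sender cannot cover the gas limit")
    balances[sender] -= gas_limit          # prepay the maximum fee up front
    meter = GasMeter(gas_limit)
    scratch = dict(storage)                # writes go to a copy until success
    try:
        program(scratch, meter)
    except OutOfGas:
        return storage, False, gas_limit   # changes reverted, fee not returned
    balances[sender] += meter.remaining    # refund the unused gas
    return scratch, True, gas_limit - meter.remaining


def count_to_ten(db, meter):
    for _ in range(10):
        meter.charge()
        db["counter"] = db.get("counter", 0) + 1


def lcg_from_three(db, meter):
    x = 3                                  # a starting value that never halts
    while x > 2:
        meter.charge()
        x = (123 * x + 567) % 100
        db["x"] = x


balances = {"alice": 1000}
state, ok, fee = execute({}, balances, "alice", 50, count_to_ten)
print(ok, fee, state, balances)   # True 10 {'counter': 10} {'alice': 990}
state, ok, fee = execute(state, balances, "alice", 50, lcg_from_three)
print(ok, fee, state, balances)   # False 50 {'counter': 10} {'alice': 940}
```

The runaway program costs its sender the full 50 units and leaves the database untouched, so the network's burden is bounded by what the sender agreed to pay.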
Smart contracts vs concurrency
If gas can prevent runaway computation, do smart contracts get the green light? Well, not so fast, because there’s another problem with smart contracts that we need to talk about:
Smart contracts work poorly for high transaction throughput.
Concurrency is one of the most fundamental issues in computer architecture. A system has good concurrency if it allows several processes to happen simultaneously and in any order. Concurrent systems reduce delays and enable much higher throughput overall, by making optimal use of technologies such as process scheduling, parallel processing and data partitioning. That’s how Google searches 30 trillion web pages almost 100,000 times per second.
In any computer system, a set of transactions can only be processed simultaneously if they don’t depend on, or interfere with, each other. Otherwise, different processing orders might lead to completely different outcomes. Now recall that a smart contract has an associated database, and that it performs general-purpose computation including loops. This means that, in response to a particular message, a smart contract might read or write every single piece of information in its database. For example, if it is managing a sub-currency, it might decide to pay some interest to every holder of that currency. Of course, this won’t always be the case. But the problem is: before running the contract’s program for a particular message, a blockchain node cannot predict which subset of the contract’s database it’s going to use. Nor can it tell whether this subset might have been different under different circumstances. And if one contract can trigger any other, this problem extends to the entire content of every database of every contract. So every transaction must be treated as if it could interfere with every other. In database terms, each transaction requires a global lock.
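The interest example can be made concrete with two hypothetical operations on a balance table. `transfer` touches two entries, while `pay_interest` touches every entry, and a node cannot know which kind of operation a message will trigger without running the contract. Applying the same two transactions in opposite orders produces different databases, which is why they cannot safely be processed concurrently. Balance checks are omitted to keep the sketch short.

```python
def transfer(balances, sender, recipient, amount):
    # Reads and writes exactly two entries of the database.
    out = dict(balances)
    out[sender] -= amount
    out[recipient] = out.get(recipient, 0) + amount
    return out


def pay_interest(balances, rate_percent):
    # Reads and writes every entry: the set of keys depends on the data.
    return {holder: bal + bal * rate_percent // 100 for holder, bal in balances.items()}


start = {"alice": 100, "bob": 0}
transfer_first = pay_interest(transfer(start, "alice", "bob", 50), 10)
interest_first = transfer(pay_interest(start, 10), "alice", "bob", 50)
print(transfer_first)   # {'alice': 55, 'bob': 55}
print(interest_first)   # {'alice': 60, 'bob': 50}
```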
Now think about the world a blockchain node lives in. Transactions come in from different peers, in no particular order, since there is no centrally managed queue. In addition, at average intervals of between 12 seconds (Ethereum) and 10 minutes (bitcoin), a new block comes in, confirming a set of transactions in a specific order. A node will probably have seen most of a block’s transactions already, but some may be new. Either way, the order of the transactions in the block is unlikely to reflect the order in which they arrived individually. And since the order of transactions might affect the outcome, this means transactions cannot be processed until their order in the blockchain is confirmed.
Now, it’s true that an unconfirmed bitcoin transaction might need to be reversed because of a double spend. But an unconfirmed Ethereum transaction has no predictable outcome at all. Indeed, current implementations of Ethereum don’t even process unconfirmed transactions. But if an Ethereum node were to process transactions immediately, it would still need to rewind and replay them in the correct order when a block comes in. This reprocessing is a huge waste of effort, and prevents external processes from concurrently reading the Ethereum database while it goes on. (To be fair, it should be noted that bitcoin’s reference implementation also rewinds and replays transactions when a block comes in, but this is due only to a lack of optimization.)
So what is it about bitcoin’s transaction model that makes out-of-order execution possible? In bitcoin, each transaction explicitly states its relationship to other transactions. It has a set of inputs and outputs, in which each input is connected to the output of a previous transaction which it “spends”. There are no other dependencies to worry about. So long as (a) two bitcoin transactions don’t attempt to spend the same output, and (b) the output of one doesn’t lead to the input of another, a bitcoin node can be sure that the transactions are independent, and it can process them in any order. Their final positions in the blockchain don’t matter at all.
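The independence test for bitcoin-style transactions can be sketched directly from conditions (a) and (b). The `Tx` type is a simplified, hypothetical model (real transactions also carry amounts, scripts and signatures), and this version only checks direct dependencies; a full check would also follow chains through intermediate transactions.

```python
from dataclasses import dataclass

OutPoint = tuple[str, int]   # (txid of an earlier transaction, output index)


@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: tuple[OutPoint, ...]   # earlier outputs that this transaction spends
    n_outputs: int


def independent(a: Tx, b: Tx) -> bool:
    # (a) the two transactions must not spend the same output
    if set(a.inputs) & set(b.inputs):
        return False
    # (b) neither may directly spend an output created by the other
    if any(txid == b.txid for txid, _ in a.inputs):
        return False
    if any(txid == a.txid for txid, _ in b.inputs):
        return False
    return True


t1 = Tx("t1", (("coinbase", 0),), 2)
t2 = Tx("t2", (("coinbase", 1),), 1)
t3 = Tx("t3", (("t1", 0),), 1)          # spends t1's first output
t4 = Tx("t4", (("coinbase", 0),), 1)    # double-spends the output t1 spends
print(independent(t1, t2), independent(t1, t3), independent(t1, t4))   # True False False
```

Only the explicit `inputs` are inspected; no program has to run, which is what lets a node process independent transactions in any order.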
To use formal computer science terminology, Ethereum transactions must be strictly totally ordered, meaning that the relative order between every pair of transactions must be defined. By contrast, bitcoin transactions form a directed acyclic graph which is only partially ordered, meaning that some ambiguity in transaction ordering is allowed. When it comes to concurrency, this makes all the difference in the world.
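To see what partial ordering buys, the sketch below enumerates every total order that respects a small, hypothetical spend graph. A bitcoin-style node may process the transactions in any of these orders, whereas an Ethereum-style chain must commit to exactly one. Brute-force enumeration is exponential and is used here only for illustration.

```python
from itertools import permutations

# Hypothetical spend graph: each transaction maps to the transactions whose outputs it spends.
spends = {
    "t1": set(),
    "t2": set(),
    "t3": {"t1"},
    "t4": {"t1", "t2"},
}


def valid_orders(graph):
    """Yield every total order in which each transaction follows all those it spends from."""
    for order in permutations(graph):
        position = {tx: i for i, tx in enumerate(order)}
        if all(position[p] < position[tx] for tx, parents in graph.items() for p in parents):
            yield order


orders = list(valid_orders(spends))
print(len(orders))   # 5
for order in orders:
    print(order)
```

Five of the 24 possible orderings are acceptable, and any of them leaves the ledger in the same final state.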
To look at it in practical terms, there’s been a lot of talk about private blockchains in the enterprise. But a private blockchain is just a distributed database with some additional features. And if you tried selling an enterprise-class database today that does not support concurrency, you’d be laughed out of the room. Equally ludicrous would be the suggestion that an individual node has to wait 12 seconds before seeing the result of its own transactions. As Vitalik himself recently tweeted:
Key pt about dapp dev that I’ve underappreciated: main prob is not tx cost; ppl can handle $0.001. Prob is latency; ppl want 500ms, not 17s.
— Vitalik Buterin (@VitalikButerin) September 27, 2015
For us at Coin Sciences this is not just an academic issue, because we need to decide whether and how to incorporate smart contracts into MultiChain. Strangely enough, despite the hundreds of feature requests and questions we’ve received so far, only two have been related to smart contracts, and even then in a weaker form than Ethereum provides. So while we’re keeping an open mind, it may turn out that smart contracts don’t solve any real problems for our users.
In favor of Ethereum
If you’re interested in only one side of the argument, you can stop reading here. But you may be wondering: Are the creators of Ethereum stupid? Why on earth would they require global execution in a public distributed database, if each node could simply choose which programs it cared about running? Are there any good reasons for the Ethereum way?
Actually, if we’re talking about public blockchains, I believe there are. But in order to understand these reasons, we need to think about the dynamics of the Ethereum network itself.
Preventing transaction spam
A blockchain is maintained by a peer-to-peer network, in which each node is connected to a random subset of the other nodes. When a new transaction is created on one node, it spreads rapidly and haphazardly to the others through a process called “relaying”. In an open public network, anyone can create transactions, so we need a way to protect ourselves against transaction spam which could overwhelm the system. Since the network is decentralized, this can only be achieved by individual nodes assessing new transactions as they come in, and deciding whether or not to relay them. While this mechanism can’t prevent a spammer from overwhelming an individual node, it does protect the network as a whole.
In a public network, when a node decides whether to relay a new transaction, one key criterion is the ratio between its fee and its cost to the network. In the case of bitcoin, this cost is based mainly on the transaction’s raw size in bytes. In Ethereum, a more complex formula is used, based on the computational effort the transaction will consume. Either way, fees act as a market-based mechanism for the prevention of transaction spam.
But how does a node know if the sender has sufficient funds to cover the fee they’re offering? In the case of Ethereum, each user’s balance of “ether” is affected by the outcome of previous transactions, since contracts can both spend and pay out ether. So without actually executing all programs for all previous messages, an Ethereum node has no way of knowing a user’s up-to-date balance. Therefore, it cannot assess whether a transaction should be relayed to other nodes. And without that, an open network could be trivially destroyed.
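A hypothetical relay policy combining both checks: the fee must be high enough relative to the transaction's cost to the network, and the sender must be able to pay it. The field names and thresholds are illustrative, and real bitcoin and Ethereum clients apply different, more detailed rules. The key point is the `sender_balance` argument: in Ethereum's account model it can only be known after executing every earlier transaction that might have touched that account.

```python
from dataclasses import dataclass


@dataclass
class PendingTx:
    fee: int          # offered fee, in the chain's smallest currency unit
    size_bytes: int   # bitcoin-style cost measure: raw size
    gas_limit: int    # Ethereum-style cost measure: maximum computation paid for


def should_relay(tx: PendingTx, sender_balance: int,
                 min_fee_per_unit: float, ethereum_style: bool) -> bool:
    # Fee per unit of burden placed on the network.
    cost = tx.gas_limit if ethereum_style else tx.size_bytes
    if cost <= 0 or tx.fee / cost < min_fee_per_unit:
        return False
    # The sender must actually be able to pay. (A bitcoin node instead checks
    # that the inputs are unspent outputs, which needs no contract execution.)
    return sender_balance >= tx.fee


tx = PendingTx(fee=5000, size_bytes=250, gas_limit=21000)
print(should_relay(tx, sender_balance=10_000, min_fee_per_unit=10, ethereum_style=False))  # True
print(should_relay(tx, sender_balance=1_000, min_fee_per_unit=10, ethereum_style=False))   # False
```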
Compact data proofs
In a blockchain, blocks are filled mostly by the transactions that they confirm. However each block also has a compact “header”, which contains important information such as a timestamp and a link to the previous block. For public blockchains based on proof of work hashing, the input for the hashing algorithm is this block header alone. This means that the authority of a chain can be assessed by a “lightweight client” without downloading most of its content. For example, as of November 2015, the complete set of bitcoin headers is 30 MB in size, compared to 45 GB for the full chain. That’s a ratio of 1500:1, making a crucial difference to mobile devices with limited bandwidth and storage.
The header of each Ethereum block contains a “state root”, which fingerprints the state of the chain after processing the transactions in that block. Among other things, this state covers the content of every contract’s database, with the fingerprint calculated efficiently using a tree of one-way hash functions. The slightest change to any contract’s database would lead to a completely different state root, so the root “ties down” the database’s contents. (An equivalent notion of “UTXO commitments” for bitcoin has been discussed but not yet implemented.)
The tree-like method of calculating state roots has an important property: Given a known state root, the value of a particular entry in a contract database can be proven efficiently. The size of this proof is proportional to the depth of a binary tree whose leaves are the individual database entries, i.e. log2 of the total number of entries. That means that, for an individual entry, the proof only doubles in length when the database size is squared – the kind of scalability that computer scientists kill for. Now recall that the state root of each block is in its header, which a lightweight client can verify. As a result, lightweight clients can safely and efficiently query any full node in the network for individual database entries, and full nodes cannot lie.
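The logarithmic proof size can be demonstrated with a plain binary Merkle tree. This is a simplification: Ethereum's state root is actually computed from a Merkle Patricia trie hashed with Keccak-256, whereas this sketch uses SHA-256 from Python's standard library and assumes the number of entries is a power of two. A lightweight client holding only `root` can check a single entry against a proof of log2(n) sibling hashes.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Build every level of a binary hash tree; assumes len(leaves) is a power of two."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels: list[list[bytes]], index: int) -> list[bytes]:
    # One sibling hash per level below the root: log2(number of entries) hashes.
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])
        index //= 2
    return proof


def verify(root: bytes, leaf: bytes, index: int, proof: list[bytes]) -> bool:
    # Recompute the path to the root; any change to the leaf changes the result.
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


entries = [f"account{i}:balance{i * 10}".encode() for i in range(16)]
levels = merkle_levels(entries)
root = levels[-1][0]                                   # what a block header commits to
proof = prove(levels, 5)
print(len(proof))                                      # 4, i.e. log2(16)
print(verify(root, entries[5], 5, proof))              # True
print(verify(root, b"account5:balance999", 5, proof))  # False
```

Squaring the number of entries from 16 to 256 only doubles the proof from 4 to 8 hashes.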
But if our blockchain headers include a state root, and the state root depends on the contents of the database, then every node must keep the blockchain’s database up to date. In turn this means running every contract for every message it has received so far. Without this, a mining node wouldn’t know the state root to place in a block header, nor could other nodes verify the blocks that they receive. The bottom line is: if we want lightweight clients to safely retrieve compact data proofs from the network, full nodes must perform all the computations described by the data in the chain.
The verdict for private blockchains
Let’s revisit these two arguments in the context of private blockchains. The first thing to note about private chains is that they tend not to have a native token or cryptocurrency. This is for several reasons:
- The sort of entities interested in private chains don’t want to deal with a new asset class.
- The consensus model for private chains is based on agreement between a set of closed miners, rather than proof of work. So the cost of mining is minimal and miners don’t need much reward.
- Since all the participants in a private chain are vetted, there is less concern over spam and abuse.
Recall that the first argument for global execution was to enable each Ethereum node to decide whether to relay an incoming transaction, based on the fee it offers. Well, the lack of a native token renders this reason irrelevant because, if a blockchain has no native token, transactions cannot pay fees. If for some reason spam remains an issue, it has to be controlled another way, e.g. by revoking the sender’s permissions.
Now let’s consider the second argument, to enable compact data proofs. A public blockchain is likely to have end users on mobile or other lightweight wallets. But this is less likely with private chains, whose primary function is sharing a database between larger companies. And if the blockchain is accessed from a mobile device, the mobile user is likely to be a customer of one of these companies, and can trust what that company tells them.
Instead, in private blockchains, the problems of global execution are especially acute. If a private blockchain has no native token, we have no gas-like market mechanism for preventing runaway code. Instead we would need to introduce some kind of fixed limit in terms of computational steps per transaction. But in order to allow transactions to intentionally perform a lot of processing, this limit would need to be high. As a result, the network could still end up wasting a lot of energy on unintended loops before finally shutting them down.
As for concurrency, private blockchains are far more likely to see the sort of transaction volumes that make concurrency essential. The capacity of public blockchains is limited by the fact that, in order to be meaningfully decentralized, they need thousands of nodes run by enthusiasts with limited budgets. By contrast private chains are far more likely to connect just a few dozen enterprises together, in which case capacity and speed are essential.
Double decker blockchains
So if everything about smart contracts makes sense in private chains, apart from global execution, where does this leave us? What type of blockchain will give us the performance and flexibility we need? To be honest, I’m still thinking about it. But one answer might be: a blockchain with two tiers.
The lower tier would be built on bitcoin-style transactions which are processed instantly and concurrently, and don’t need to wait for block confirmations. These transactions could perform simple movements of assets, including safe atomic exchanges, without resorting to smart contracts. But this lower tier would also be used as a blind storage layer for the programs and messages that represent more complex business processes, embedded as transaction metadata.
As for the upper tier, each network participant would choose which programs they want to run. Some might choose to run none at all, because they are only interested in simple asset movements. Others might execute a small group of programs that are relevant to their internal processes (with the knowledge that this group exchanges no messages with programs outside). A few might even opt for global execution, processing every message for every program, just like Ethereum. But the key thing would be that every node runs only the code it needs to. In computer science this technique is called lazy evaluation, because it entails doing as little work as possible, without omitting anything crucial. With lazy evaluation, if a blockchain computation goes awry, only those nodes which actually execute that program will notice. The network itself won’t feel a thing.
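The two-tier idea can be sketched as a blind log plus nodes that evaluate programs lazily, on demand. `BlindLog`, `LazyNode` and the sample programs are hypothetical, and the sketch assumes, as described above, that the programs a node runs exchange no messages with programs outside that group.

```python
class BlindLog:
    """Lower tier: stores (program_id, message) pairs as opaque metadata, never executes them."""

    def __init__(self):
        self.entries = []

    def append(self, program_id: str, message) -> None:
        self.entries.append((program_id, message))


class LazyNode:
    """Upper tier: runs only the programs this participant has chosen to follow."""

    def __init__(self, programs: dict):
        self.programs = programs   # program_id -> step(state, message) -> new state

    def state_of(self, program_id: str, log: BlindLog, initial=0):
        # Evaluated on demand by replaying only this program's messages, in log order.
        step = self.programs[program_id]
        state = initial
        for pid, message in log.entries:
            if pid == program_id:
                state = step(state, message)
        return state


def add(state, message):
    return state + message


def runaway(state, message):
    while True:   # a buggy program: only nodes that choose to run it will hang
        state += 1


log = BlindLog()
log.append("counter", 5)
log.append("buggy", 1)       # stored blindly; the lower tier never runs it
log.append("counter", 7)

careful_node = LazyNode({"counter": add})
print(careful_node.state_of("counter", log))   # 12
```

The `runaway` program sits in the log alongside everything else, yet a node that never subscribes to it pays nothing for it, which is the lazy evaluation property described above.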
As for MultiChain, if we do end up supporting Turing complete computation, I doubt we’ll implement global execution. Perhaps we’ll go for this kind of lazy two-tiered approach, or perhaps we’ll think of something better.
Smart contracts in public blockchains
As I argued earlier, in public Turing complete blockchains like Ethereum, there are good reasons for global execution. But here’s another question: what is the enterprise use case for these chains? Let’s imagine some future time when enterprises have sufficient confidence in public blockchains to use them for real business processes. If a group of companies wants to embed some computational logic in a public blockchain, they have two choices: (1) using an Ethereum-style blockchain with global execution, or (2) using any blockchain as a simple storage layer and executing the code themselves. And given these options, why would they choose (1)?
This discussion is related to the notion of abstraction layers, made famous by the OSI networking model. For optimal reliability and flexibility, each layer of a system should be as abstracted (i.e. independent) from the other layers as possible. For example, we wouldn’t want our hard disk controllers to contain code for rendering JPEG images. So why would we want a blockchain to execute the programs that it stores? For the majority of use cases, we derive no benefit from this, and it comes at a significant cost.
If global execution doesn’t make sense in private blockchains, why is everyone working on this stuff? I think this can be explained, at least in part, by a misunderstanding over what blockchains can do in the real world. You see, public blockchains like bitcoin can directly move a real asset (namely, their native currency), because the blockchain defines the ownership of that currency. This conflates two aspects of assets which are usually distinct: (a) a ledger which records who owns the asset, and (b) its actual physical location. This makes cryptocurrencies the ultimate bearer instrument, creating a brave new world or a money launderers’ paradise, depending on who you ask.
But for other assets which exist independently of a blockchain, the only thing a chain can do is hold a record of who they should belong to. This will remain the case until we see the primary issuance of assets onto a blockchain, with legal ownership of that asset defined in terms of the chain’s database. For the institutional finance sector, I believe this day is still a long way off, not least because of the regulatory changes required. Until then, there will always be an extra step, contractual and procedural, between what the blockchain says and what happens in the real world. This step might as well include some Turing complete code, lazily executed at the last possible moment.
This problem is highlighted by the case of “smart bonds” that we’ve heard so much about. A smart bond is directly issued onto a blockchain, with the blockchain ensuring that coupon payments are made to the bond holders at the appropriate times. All well and good. But what happens if the bond issuer has insufficient funds in their blockchain account to cover a payment which is due? The blockchain can certainly set a flag to say that something is amiss, but it can’t do anything else. We still need an army of lawyers and accountants to sort the whole mess out, whether by a haircut, debt restructuring, forfeiture or outright bankruptcy. In short:
If smart contracts can’t deliver their promise, why are we paying their price?
Thank you for reading.