How to spot a half-baked blockchain

Posted December 14, 2016 by Gideon Greenspan in Private blockchains.

When chains and blocks serve no useful purpose

About 18 months have passed since the finance sector woke up, en masse, to the possibilities of permissioned blockchains, or to use the more general term, “distributed ledgers”. The period since has seen a tsunami of activity, including research reports, strategic investments, pilot projects, and the formation of many consortia. No one can accuse the banking world of not taking the potential of this technology seriously.

Naturally, the explosive growth in blockchain projects has driven the development of permissioned blockchain platforms, on which those projects are built. For example, our product MultiChain has tripled in usage over the past year, whether we measure web traffic, monthly downloads or commercial inquiries. And of course, there are many other platforms, such as BigChainDB, Chain, Corda, Credits, Elements, Eris, Fabric, Ethereum (deployed in a closed network), HydraChain and Openchain. Not to mention still more startups who have developed some kind of blockchain platform but have not made it publicly available.

For companies wishing to explore and understand a new technology, an abundance of choice is generally a good thing. However, in the case of blockchains, which still remain loosely defined and poorly understood, this cornucopia comes with a significant downside: many of the available “blockchain” platforms don’t actually address the core problem they are meant to solve. And what is that problem? Allow me to quote the succinct video definition by Richard Gendal Brown, CTO of R3, in full:

A distributed ledger is a system that allows parties who don’t fully trust each other to come to consensus about the existence, nature and evolution of a set of shared facts without having to rely on a fully trusted centralized third party.

To take an extreme example, consider a bunch of Lego bricks tied together with string. If we use the term “block chain” to describe this fashion item, who’s to say that we’re not describing it accurately? And yet, that particular chain of blocks will not help multiple parties to safely and directly share a database without a central intermediary. Similarly, many “blockchain” platforms do something related to chains of blocks, but also lack the necessary properties to serve as the basis for a peer-to-peer database.

[Image: another chain of blocks that does not help with database sharing.]

Minimum viable blockchain

In order to understand the basic requirements of a distributed ledger, it helps to clarify how these systems differ from regular databases, which are controlled by a single entity. For example, let’s consider a simple system for tracking who owns a particular company’s shares. The ledger, as implemented in a database, has one row for each owner containing two columns: the owner’s identifier, such as their name, and the corresponding quantity of shares.

Here are six crucial ways in which this system could fail its users:

Forgery: Transferring shares from one person to another without the sender’s permission.
Censorship: Refusing to fulfill someone’s request to transfer some shares elsewhere.
Reversal: Undoing a transfer that took place at some point in the past.
Illegitimacy: Changing the total quantity of shares in the system without a corresponding action by the issuer.
Inconsistency: Giving different responses to inquiries from different users.
Downtime: Not responding to incoming requests for information at all.

Because of all these possibilities, the shareholders must maintain a high level of trust in whoever is managing this ledger on their behalf. Building and running an organization worthy of that trust comes with substantial hassle and cost.
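To make the example concrete, here is a minimal Python sketch of such a centrally operated share register. The class and its names are hypothetical and are not taken from any real product. It only illustrates that nothing in the data structure itself stops whoever runs the database from forging a transfer or inflating the supply.

```python
class CentralLedger:
    """A single-operator share register: owner identifier -> quantity of shares."""

    def __init__(self, initial_holdings):
        self.rows = dict(initial_holdings)

    def transfer(self, sender, recipient, quantity):
        """A legitimate transfer, carried out at the sender's request."""
        if quantity <= 0 or self.rows.get(sender, 0) < quantity:
            raise ValueError("invalid transfer")
        self.rows[sender] -= quantity
        self.rows[recipient] = self.rows.get(recipient, 0) + quantity

    def total(self):
        """Total number of shares currently recorded."""
        return sum(self.rows.values())


ledger = CentralLedger({"alice": 60, "bob": 40})
ledger.transfer("alice", "bob", 10)   # requested by alice, so legitimate

# The operator has direct write access to the rows, so nothing prevents:
ledger.rows["bob"] -= 25              # forgery: shares moved without bob's consent
ledger.rows["carol"] = 25
ledger.rows["mallory"] = 1000         # illegitimacy: shares created from thin air
```

The shareholders only ever see the answers the operator chooses to give, which is why the operator has to be trusted.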

Blockchains or distributed ledgers remove the need for this kind of central database operator, by allowing the users of a database to interact directly with each other on a peer-to-peer basis. In our example, the stockholders could safely hold their shares on a blockchain which they collectively manage, and make transfers to each other instantly over that chain. (The disadvantage is a significant loss of confidentiality between the chain’s users, which we won’t address here but I’ve previously discussed at length.)

All this brings us back to the question of blockchain platforms. In order to serve as a viable basis for peer-to-peer database sharing, a blockchain has to protect its participants against all six types of database failure – forgery, censorship, reversal, illegitimate transactions, inconsistency and downtime. While many products in the market fulfill these requirements, quite a few of them come up short. I call these blockchains “half-baked” because they may address some of these risks, but not all. In some respects at least, the database’s users remain dependent on the good behavior of a single participant, which is precisely the scenario we want to avoid.

These half-baked blockchains come in any number of varieties, but three archetypes stand out as the most common or obvious. I’m not going to name individual products because, well, I don’t want to offend. The blockchain startup community is small enough that most of us know each other through conferences and other meetings, and the interactions tend to be positive. Nevertheless, if blockchains (in the sense of useful peer-to-peer databases) are ever going to emerge as a coherent product category, it’s important to distinguish between half-baked and real solutions.

The one validator blockchain

One pattern we’ve seen a few times is a blockchain in which only one participant can generate the blocks in which transactions are confirmed. Transactions are sent to this one node instead of being broadcast to the network as a whole, so their acceptance is subject to this party’s whims rather than some kind of majority consensus. Still, once a block has been built by this central party, it is broadcast to the other nodes in the network, who can independently confirm the validity of the transactions within, and record the new block locally and permanently.

To return to our six forms of database malfunction, this type of blockchain is far from useless. Transactions must be digitally signed by the entity whose assets they move, so they cannot be forged by the central party. They cannot be reversed because each node maintains its own copy of the chain. And transactions cannot perform illegal operations like creating assets out of thin air, because every node independently validates each transaction for correctness. Finally, since each node holds its own copy of the database, its content is always available for reading.
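As a rough illustration of what "independently validates each transaction" can mean, here is a minimal sketch of the check a node might run against its own copy of the ledger. The transaction shape (`debits` and `credits`) and the function name are assumptions made for this example, and signature verification, which a real node would also perform, is left out.

```python
def validate_and_apply(balances, tx):
    """Check a transaction against local state and return the updated balances.

    tx has the hypothetical shape {"debits": {owner: qty}, "credits": {owner: qty}}.
    Signature checks are omitted from this sketch.
    """
    debits, credits = tx["debits"], tx["credits"]
    if any(q <= 0 for q in list(debits.values()) + list(credits.values())):
        raise ValueError("quantities must be positive")
    if sum(debits.values()) != sum(credits.values()):
        raise ValueError("transaction creates or destroys assets")
    for owner, qty in debits.items():
        if balances.get(owner, 0) < qty:
            raise ValueError(f"{owner} has insufficient balance")
    updated = dict(balances)
    for owner, qty in debits.items():
        updated[owner] -= qty
    for owner, qty in credits.items():
        updated[owner] = updated.get(owner, 0) + qty
    return updated


state = {"alice": 60, "bob": 40}
state = validate_and_apply(state, {"debits": {"alice": 10}, "credits": {"bob": 10}})

# A block generator that tries to mint shares is rejected by every honest node.
try:
    validate_and_apply(state, {"debits": {}, "credits": {"mallory": 1000}})
except ValueError as err:
    rejected = str(err)
```

Because every node runs the same check locally, the block generator cannot push an illegitimate transaction past the others, even though it alone decides which transactions get confirmed.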

Unfortunately, four out of six is not enough. The validating node can easily censor individual transactions, by refusing to include them in the blocks it creates. Even if the operators of this node are honest, a system or communications failure can render it unavailable, causing all transaction processing to come to a halt. In addition, depending on the setup, the validating node may be able to transmit different versions of the blockchain to different participants. In terms of censorship and consistency, the database still contains a single point of failure, on which all the other nodes rely.

One platform offers a twist on this scheme, in which blocks are centrally generated by a single node, but a quorum of other designated nodes signs them to indicate consensus. In terms of the risk of inconsistency, this certainly helps. The nodes in the quorum will only lend their signatures to a single version of the blockchain, which can therefore be considered as authoritative. Nonetheless, the quorum nodes cannot help if the block generator censors transactions, or loses its connection to the Internet. Ultimately, this type of blockchain still uses a hub-and-spoke architecture, rather than a peer-to-peer network.
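The quorum idea can be sketched in a few lines. This is only an illustration under stated assumptions: the strict-majority threshold and all names are hypothetical, real signatures are replaced by a plain mapping, and each quorum node is assumed to endorse at most one block per height.

```python
def is_authoritative(candidate_hash, endorsements, quorum_nodes):
    """Accept a block only if a strict majority of quorum nodes endorsed it.

    endorsements maps node id -> the one block hash that node endorsed at this
    height. Cryptographic signatures are omitted from this sketch.
    """
    votes = sum(1 for node in quorum_nodes if endorsements.get(node) == candidate_hash)
    return votes > len(quorum_nodes) // 2


quorum_nodes = ["q1", "q2", "q3"]
# The generator shows block "A" to most of the quorum and block "B" to one node.
endorsements = {"q1": "A", "q2": "A", "q3": "B"}

accepted_a = is_authoritative("A", endorsements, quorum_nodes)  # majority reached
accepted_b = is_authoritative("B", endorsements, quorum_nodes)  # minority only
```

Two conflicting blocks cannot both collect a majority, which addresses inconsistency. Note that nothing in this check forces the generator to include a given transaction or to produce a block at all.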

The shared state blockchain

Technically speaking, there are many similarities between blockchains and more traditional distributed databases such as Cassandra and MongoDB. In both cases, transactions can be initiated by any node in the network, and must reach all the other nodes as part of a consensus about the database’s developing state. Both blockchains and distributed databases have to cope with latency (communication delays which stem from the distance between nodes) and the possibility of some nodes and/or communication links intermittently failing.

Distributed databases have been around for a while, so any blockchain platform developer would do well to understand their consensus algorithms and the strategies they use to globally order transactions and resolve conflicts. Nonetheless, it’s important not to take the comparison too far, because blockchains must contend with a crucial additional challenge – an absence of trust between the database’s nodes. Whereas distributed databases focus on providing scalability, robustness and high performance within a single organization’s boundaries, blockchains must be designed to operate safely across those boundaries.

To return to our six types of database risk, a node in a distributed database need only worry about downtime, i.e. the possibility of other nodes becoming unavailable. Nodes can safely assume that every transaction and message on the network is valid, and are not concerned with forgery, censorship, reversal, illegitimacy or inconsistency. Their worst problem is dealing with two simultaneous but valid transactions, initiated on different nodes, which affect the same piece of data. Solving these conflicts is by no means trivial, but it’s still a lot easier than worrying about “Byzantine faults”, in which some nodes deliberately act to disrupt the functioning of others.

A database can only be shared safely across trust boundaries if nodes treat all activity on the network with a certain degree of suspicion. For example, every transaction which modifies the database must be individually digitally signed since, in a peer-to-peer architecture, there is no other way to know its true point of origin. Similarly, every incoming message, such as the announcement of a new block, has to be critically assessed for its content and context. Unlike in distributed databases, nodes must not be able to immediately and directly modify another node’s state.

Some “blockchain” platforms have been developed by starting with a distributed database, and sprinkling some features on top to make them more blockchainy. For example, by grouping transactions into blocks and storing hashes (digital fingerprints) of those blocks in the database, they aim to add a form of immutability. But unless each node can be sure that its list of hashes cannot be modified by another node, this type of immutability is easily gamed. The standard response to these criticisms is that every security problem can be solved with sufficient time and coding. But this is rather like holding some prisoners in an open field, and trying to stop them escaping with tripwires and ditches. It’s far safer to use a purpose-built concrete structure, whose doors are locked and whose windows are barred.
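The weakness is easy to demonstrate. Below is a minimal sketch, using only Python's standard `hashlib` and `json` modules, of blocks whose hashes are recorded in a shared store. The block layout and function names are assumptions for illustration and do not describe any particular platform.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 fingerprint of a block's canonical JSON encoding."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()


def build_chain(tx_batches):
    """Group transactions into blocks, each linked to the previous block's hash."""
    chain, prev = [], "0" * 64
    for txs in tx_batches:
        block = {"prev": prev, "txs": txs}
        chain.append(block)
        prev = block_hash(block)
    return chain


def matches_hashes(chain, stored_hashes):
    """True if every block hashes to the value recorded for it."""
    return [block_hash(b) for b in chain] == stored_hashes


chain = build_chain([["alice->bob:10"], ["bob->carol:5"]])
stored = [block_hash(b) for b in chain]          # hashes kept in the shared database

# An attacker rewrites history: the recorded hashes expose the change...
forged = build_chain([["alice->mallory:10"], ["bob->carol:5"]])
detected = not matches_hashes(forged, stored)

# ...unless the attacker can also overwrite the recorded hashes.
stored = [block_hash(b) for b in forged]
undetected = matches_hashes(forged, stored)
```

The hashes only provide immutability if each node keeps its own record of them, one that no other node can write to.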

The one cloud blockchain

By far the strangest phenomenon I’ve seen is blockchain platforms which can only be accessed through their developer’s cloud-based platform-as-a-service. To be clear, we’re not talking about some of a blockchain’s participants choosing to host their nodes on their cloud provider of choice, such as Microsoft Azure or Amazon Web Services. Rather, this is a blockchain which can only be accessed through APIs exposed by the servers of a company “hosting” it.

Let us grant, for argument’s sake, that a centralized blockchain provider genuinely has a group of nodes running under its control. What difference does this make to the users of the system who are sending API requests and receiving responses? The participants have no way of assessing if everyone’s transactions have been processed without omission or error. Perhaps the central service is malfunctioning, or perhaps it is censoring or reversing some transactions deliberately. And if you believe the blockchain provider has no reason to do this, why not use them to host a regular centralized database instead? You’ll get a more mature product with better performance, and suffer none of the risks of working with new technologies. In short, centralized blockchains are about as useful as Lego on a string.

Solving the mystery

We’ve now seen three types of platform which market themselves as “blockchains”, and indeed make some use of a chain of blocks, but which don’t solve the fundamental problem for which these systems are designed. To recap, this is to enable a single database to be safely and directly shared across trust boundaries, without a central intermediary.

Apart from pointing at this peculiar phenomenon, I believe it’s instructive to consider what might underlie it. Why are so many blockchain startups building products which don’t fulfill the promise of this technology, often achieving no more than traditional centralized or distributed databases? Why are so many talented people wasting so much of their time?

I can see two main classes of explanation – technical and commercial. To start with the technical, it is rather tricky to create distributed consensus systems which can tolerate one or more nodes behaving maliciously in unpredictable ways. In the case of MultiChain, we cheated somewhat, by using bitcoin’s battle-hardened reference implementation as a starting point, and then replacing proof of work with a structurally similar consensus algorithm called “mining diversity”. Teams developing a blockchain node from scratch have to think deeply about asynchronous and adversarial processes – a combination with which few programmers have experience. I can certainly understand the temptation to take a shortcut, such as using a single node to generate blocks, or piggybacking on an existing distributed database, or only running nodes in a trusted environment. Choosing any of these undoubtedly makes life easier for developers, even if this undermines the entire point.

As for commercial reasons, every startup seems to be approaching the blockchain opportunity from a different angle. Here at Coin Sciences, we’re focused on becoming a (database) software vendor, so we’re distributing MultiChain for free while developing a premium node with additional features. Other startups want to sell subscription services, so they will naturally build a platform which customers cannot host themselves. Some are hoping to centrally control a blockchain or help their partners to do so (an odd ambition for a disintermediation technology!) and are naturally drawn to consensus algorithms that rely on a single node. And finally, there are companies whose primary goal is to sell consulting services, in which case their platform need not function at all, so long as its website brings in some large customers.

Perhaps another issue is that some blockchain companies are being run by people who are undoubtedly bursting with talent, but lack a deep understanding of the technology itself. In startups carving out a new field, it’s probably vital for strategic decisions to be taken by people who understand the nature of that field and how it differs from what came before. Not a few blockchain startups appear to have painted themselves into a corner by pursuing a product vision which is attractive to their customers, but cannot actually be built.

As a user of blockchains, how can you avoid being caught by these fallacies? When evaluating a particular blockchain platform, be sure to ask whether it fulfills the six requirements of safe peer-to-peer database sharing: prevention of downtime and inconsistency, as well as transaction forgery, censorship, reversal and illegitimacy. And beware of explanations that consist of too much mumbling or hand waving – they probably mean that the answer is no.

