Taking a break from our series, David Beberman wrote this post after viewing the Scaling Ethereum 2019 LIVE — Day Two presentation.
A lot of research has gone into how to make blockchains scale through sharding. From what I can tell, the main concept is to enable parallel execution of multiple transactions on separate shards, without compromising the immutability and security of the blockchain, including all the shards. Most of the research I’ve been able to find focuses on algorithms for consensus on shards. Although all of this research looks very promising, I want to take a look at sharding from a different angle.
For the sake of argument, let’s say that a consensus algorithm for sharding exists. Further, let’s say that this algorithm runs on an open permissionless blockchain, and is available today. My question is, even with this, do we get the scaling that is hoped for with sharding. To get a feel for this, lets take a quick look at Amdahl’s Law:
This shows that the theoretical speedup of the execution of the whole task increases with the improvement of the resources of the system and that regardless of the magnitude of the improvement, the theoretical speedup is always limited by the part of the task that cannot benefit from the improvement. This essentially states that the throughput of a system, once all the parallelizable portions are maximized, is limited to the throughput of the serialized portions.
For blockchain sharding, I interpret this as meaning that the potential increase in throughput is currently limited to the quantity of transactions that can be executed simultaneously on separate shards. That is, if a transaction on a shard needs data from another shard it has to synchronize the transfer of the data from the other shard. This is a point of serialization and given a large pool of shards, according to Amdahl’s Law dominates the throughput.
Individual Native Coin Transactions
A transaction between two accounts limited explicitly to only transferring coin balances between the accounts, such as sending coin between two accounts, does not need data from any other account. Therefore, provided the data for both accounts is available on a particular shard, the transaction can be executed asynchronously with other transactions on other accounts. This scales with the number of shards and the number of disjoint transactions between pairs of accounts. As the number of accounts grows, one would expect that the opportunity for sharding such independent transactions grows as well. In the limit the throughput is dominated by the time it takes to execute a single transaction regardless of how many transactions are being executed at any given point in time. This is exactly the situation we want for improving blockchain throughput.
Smart Contracts For Sharding
Things do not workout so well for smart contracts. A smart contract is implemented on the blockchain as a single ledger account with data state associated with the program code. Each transaction runs through the code changing the data state. Each change is recorded on the blockchain and verified with a hash representing the state. To maintain a deterministic, consistent state of the blockchain, each smart contract can only be executed on one shard at a time, unless the state of the smart contract account itself can be sharded. In general, this makes the smart contract account execution the limiting factor for all of the accounts that are sending transactions to the smart contract. As smart contracts are used for tokenization, it is highly likely that as a given token increases in circulation, its smart contract becomes a bottleneck for throughput, regardless of how many shards exists, as predicted by Amdahl’s Law.
With the current implementation model for smart contracts I see only two possible ways for scaling with sharding:
Use multiple smart contracts segregated on shards
Use deterministic multi-threaded smart contracts, (aka SIMD)
Multiple smart contracts can take advantage of sharding. For example, if each smart contract representing a token is assigned to a separate shard, then transactions for a given token do not affect transactions on other tokens. Although each individual token’s transactions are limited to the throughput of the smart contract on its specific shard, with a large and growing number of tokens, the number of shards can grow linearly with them. This doesn’t solve the problem of the throughput of an individual smart contract, but it is an improvement over all the smart contracts on a single blockchain without sharding. Even with this approach, issues crop up if any state is needed from a user account that is shared among the shards (e.g. native coin to pay for the transactions).
A technique from supercomputing called vectorization enables a program to execute sections of its code in parallel. This is known as single instruction multiple data (“SIMD”). Programs written for SIMD are written to be deterministic. In essence SIMD machines, which today are general purpose graphics processor units (“GPGPU”), shard their data across an array of processors. This works very well for certain classes of applications, such as matrix operations for graphics and similar.
This technique could be applied to blockchain smart contracts theoretically. That is, a smart contract could be written explicitly to support parallel execution of transactions in some manner. I don’t know exactly how this would be implemented. Perhaps blockchain researches will come up with something innovative in this area. However, even with such a solution, writing a smart contract that is SIMD capable becomes significantly more complex, one would expect.
Neither multiple smart contracts nor SIMD smart contracts are ideal solutions, although both may provide some opportunity for scale.
Smart Object Assets Help Enable Sharding
Limiting points for sharding with smart contracts is both sharding of state and code execution. If there were a means to avoid transactions serializing on a single smart contract state and code execution, sharding could increase throughput scaling. To put this another way, if there was a means for multiple instruction multiple data (“MIMD”) execution, the opportunity for blockchain sharding would be significantly improved.
As was described in “Rethinking The Blockchain Account Concept”, if each user account had its own state, instead of using separate smart contracts, then each user account could contain objects that represent assets, whether as tokens or other types of entities. As described in “Extensible Smart Object Assets, Smart Object Asset Ownership and Fractional Smart Object Asset OwnershipWith the DataGrid Blockchain Extensible Blockchain Object Model”, (XSOA)’s and references to XSOA’s could be used to transfer ownership between accounts with transactions directly between the account states.
For example, given two sets of transactions, where each transaction is between different accounts, that is: one transaction is from account A to account B; and another transaction is between account C to account D, then the transactions can be executed on different shards simultaneously. Further, because the code for the XSOA’s is independent of any of the accounts, and may be different code for each of the transactions we can realize sharding for a MIMD model. That is different code on each shard and different data on each shard.
The limiting point for scale here is the number of transactions that can take place simultaneously between disjoint account sets. We would expect that as the quantity of accounts grows, the opportunity for disjoint account sets within any group of transactions grows as well, which in turn would result in a growing opportunity for sharding.
Using as a given the availability of a sharding consensus algorithm, an outstanding question is how to make use of such technology. Smart contracts inherently serialize transactions and other than a complex SIMD type solution, only offer scaling by using multiple separate isolated smart contracts. Even with that, each smart contract’s throughput is limited to a single shard’s throughput. By rethinking the user account to include state information, and using the XBOM model, the DataGrid Blockchain offers a solution to sharding scalability that scales with the number of accounts and disjoint transactions among the accounts. In addition to enabling inheritance and live code reuse, we believe that this is a significant solution to the blockchain scaling problem.