ResilientDB: High Throughput Yielding Permissioned Blockchain Fabric
Ongoing research at UC Davis.
A key concentration of my doctoral thesis is towards the design and analysis of efficient
Byzantine Fault-Tolerant (BFT) protocols with a special emphasis on blockchain systems.
I envision blockchain's use case much more than just a crypto-currency (Bitcoin!).
For me the guiding principles of blockchain are democracy and decentralization.
An efficient blockchain application can not only help achieve the above guiding principles
but also sustain against malicious attacks.
Existing research on blockchain technology largely focusses on the design of a "fast"
Although this is a good goal to achieve, this goal limits the capability within the blockchain
We believe blockchain can serve as a solution to problems in several distinct domains.
To realize this potential, I have led the design of our high-throughput yielding permissioned
blockchain fabric, ResilientDB.
Through ResilientDB we want to present Blockchain as a Service (BaaS).
Permissionless blockchain designs (such as Bitcoin and Ethereum) need to incentivize the participating
replicas (miners) to act honest and follow the protocol.
Such requirements are not required by a permissioned blockchain application.
However, existing permissioned blockchain systems such as Corda, MultiChain, and Hyperledger all
achieve very low throughput and have poor scalability.
ResilientDB, on the other hand, not only achieves high throughput (of the order 100K txns/s) but also
scales across continents.
My doctoral research has focussed in designing several such efficient BFT consensus protocols
that can help a permissioned blockchain scale to a large set of replicas without affecting their
Some of our recent works in this direction are part of the proceedings of ICDE'21, VLDB'20, ICDCS'20 and DISC'19.
Transaction commit protocols help in reaching an agreement among the participating
nodes when a transaction has to be committed or aborted.
One of the earliest and most popular commitment protocol is the two-phase commit (2PC) protocol.
Prior works have shown that 2PC is blocking.
This blocking characteristics of the 2PC protocol endangers database availability and
makes it unsuitable for use with the partitioned databases.
The inherent shortcomings of the 2PC protocol led towards the design of resilient
three-phase commit (3PC) protocol.
The 3PC protocol introduces an additional state, which ensures there is no direct
transition between the non-committable and committable states.
Although this simple modification makes the 3PC protocol non-blocking under node failures,
it acts as the major performance suppressant in the design of efficient distributed databases.
We design the EasyCommit (a.k.a EC) protocol, which requires two phases of communication
and is non-blocking under node failures.
We associate two key insights with the design of our EC protocol:
(i) delaying the commitment of updates to the database until the transmission of global decision
to all the participating nodes, and
(ii) inducing message redundancy in the network.
Further, we have noted that these commit protocols are employed in
geographically large scale systems.
Such systems, adhering to the philosophy of partitioned databases, require complex agreement
protocols that are both non-blocking and topology-aware.
A Geo-scale aware system should take advantage of relative distance between the nodes, that is,
Moreover, the communication across these clusters should be limited to reap benefits.
Hence, even EC protocol may not suit the Geo-Scale settings.
This led us to the design of a novel topology-aware agreement protocol for the
geo-scale systems – Geo-scale EasyCommit (GEC).
EasyCommit (EC) was published at EDBT'18, while GEC was part of DAPD'19.
Testing Weakly Consistent Distributed System
Work done at Purdue University.
DistCheck facilitates applying Litmus testing to a given distributed application.
Litmus testing is a key technique used for validation of memory models.
Prior works have employed the seminal tool, Herd7 to perform Litmus testing on their applications.
Herd7 although comprehensive, lacks a clean documentation and necessary support for a new user.
Litmus testing when applied to a distributed application helps to study
the application's semantics with different consistency models.
Such a testing can help reveal bugs and developer's misinterpretation
of the underlying store semantics.
Prior works, including Herd7, do not provide intuitive ways of expressing and testing
the desired level of consistency on a distributed application.
DistCheck fills this gap by allowing programmers to specify a Litmus testcase that consists of:
(a) a set of sessions,
(b) an initial state of the program variables, and
(c) the condition to be validated.
DistCheck also allows developer to specify the semantics of the underlying store
using relations such as session order (so), same object (same) and visibility (vis).
Once the developer has given the required inputs, DistCheck
performs a simple symbolic execution and employs the Z3 SMT solver to check the
validity of the condition.
In the most simplest form, each session is just a set of read and write operations.
Existing task parallel languages allow the programmer to express
the desired amount of parallelism (a.k.a ideal parallelism), while delegating
the task of extracting the useful parallelism to the compiler and/or runtime.
Recursive Task Parallel (RTP) programs constitute an important subset
of such task parallel programs.
In RTP programs, each task can recursively create newer tasks and wait for
those respective tasks to terminate.
This leads to the execution of a large number of redundant task-creation and-termination operations.
Importantly, the structure of RTP programs makes it quite challenging to
identify and eliminate such redundant operations.
We design a new optimization DECAF that helps to eliminate redundant task-creation and-termination operations.
Further, we show that DECAF leads to improved execution times
(geometric mean of 2.14xon the Intel and 2.53x on the AMD system, with
respect to the state-of-the-art optimized code.
This work was part of ICS'17
IMSuite: IIT Madras Benchmark Suite for Simulating Distributed Algorithms
Work done at IIT Madras.
Prior benchmark suites in the distributed systems community includes either
large applications or micro benchmarks.
Moreover, these benchmarks neither had sufficient amount of parallelism nor
exploited different parallel constructs.
This led to design of our IMSuite benchmark suite.
IMSuite implements some of the key classical distributed algorithms.
These classical algorithms cover important characteristics of distributed
systems such as communication (broadcast, unicast, or multicast), timing
(synchronous, asynchronous or partially synchronous) and failure.
These algorithms have been implemented as kernel benchmarks.