NoSQL And NewSQL: A Comparison Of Distributed Database Systems

SCYLLADB WHITE PAPER

CONTENTS
NOSQL AND NEWSQL: TWO TAKES ON DISTRIBUTED DATABASE ARCHITECTURE
COCKROACHDB AND SCYLLA: NOSQL AND NEWSQL
    INTRODUCTION TO SCYLLA
    PERFORMANCE
    CONSISTENCY AND AVAILABILITY
    CASSANDRA QUERY LANGUAGE AND ACID GUARANTEES
    DATA DISTRIBUTION
    DATA MODELS
NOSQL VERSUS NEWSQL: MEASURING PERFORMANCE
    THE INITIAL DATA LOAD: 1 BILLION KEYS VS 100 MILLION KEYS
    YCSB: RESULTS FROM WORKLOAD A
    RESULTS FROM YCSB WORKLOADS A THROUGH F
        SCYLLA
        COCKROACHDB
CONCLUSION

NOSQL AND NEWSQL: TWO TAKES ON DISTRIBUTED DATABASE ARCHITECTURE

Many IT organizations have become familiar with database tradeoffs, often wrestling with the fundamental decision between relational and non-relational databases. Organizations that needed a highly available distributed database chose non-relational NoSQL, while those that needed strong consistency and transactions chose relational SQL.

Non-relational databases introduced a spate of novel query languages, the most prominent of which is the Cassandra Query Language (CQL). Common CQL-compliant databases include Apache Cassandra, Scylla, DataStax Enterprise, and even Microsoft’s cloud-native Azure Cosmos DB and Amazon Keyspaces.

As cloud computing became the norm, non-relational databases proliferated. But developers began to wonder if it might be possible to re-introduce features of relational databases that had been sacrificed by the generation of non-relational, NoSQL databases. Could a distributed, highly available database also provide strong consistency? Perhaps it was possible to have one’s cake and eat it too.

A team at Google took up the challenge and established a new generation of distributed databases that came to be known as “NewSQL.” The first NewSQL database, Spanner, is a “scalable, multi-version, globally distributed, and synchronously replicated database.” Spanner was designed specifically to address limitations in Google’s Bigtable database. According to the Spanner paper, Spanner was conceived in part for applications that require “strong consistency in the presence of wide-area replication.” How does Spanner accomplish this? The paper states that “the linchpin of Spanner’s feature set is TrueTime.” Spanner is able to consistently order transactions across distributed servers because servers in Google datacenters around the world all share precisely the same high-precision clocks (HPC).
Spanner leverages TrueTime to execute transactions more efficiently.

Concepts from the Spanner paper were also incorporated in another distributed database, CockroachDB. CockroachDB is an open source alternative to Google’s implementation of Spanner. In contrast to Google Spanner, CockroachDB runs on commodity hardware, which lacks the high-precision clocks that help Spanner to scale consistent transactions. Instead, CockroachDB uses different, though similar, algorithms and techniques to provide distributed transactions like those of Spanner.

The model traditionally used for comparing tradeoffs in distributed systems is known as the “CAP theorem.” While not developed specifically for describing databases, the CAP theorem does provide a useful model for articulating the tradeoffs between NoSQL and NewSQL databases. According to the CAP theorem, a distributed system can provide only two of the following three attributes: Consistency, Availability, and Partition tolerance.

NoSQL databases are generally considered to be AP systems, providing Availability and Partition tolerance at the expense of Consistency. In contrast, NewSQL databases aim to provide Consistency, Availability, and Partition tolerance. (According to Eric Brewer’s painstaking analysis, Google Spanner is technically a CP system that can claim to be an “effectively CA” system. Such nuances, while important, are beyond the scope of this paper.)

The fusion of strong consistency with distributed architecture is undeniably attractive. The question is whether NewSQL can deliver on this promise without compromising in other critical areas, primarily performance. Notably, the traditional CAP theorem makes no provision for performance or latency. For example, according to the CAP theorem, a database can be considered Available even if a query returns a response after 30 days.
Obviously, such latency would be unacceptable for any real-world application.

A newer way to model databases, called the PACELC theorem, extends beyond the CAP theorem model, showing that systems can tend towards either latency sensitivity or strong consistency. In this way, a NewSQL system such as CockroachDB is defined as PC/EC (focusing on being strongly consistent), whereas a NoSQL system like Scylla is defined as PA/EL (highly available and latency sensitive).

Figure 1: This “family tree” of modern distributed databases shows how both NewSQL systems like CockroachDB (to the left) and NoSQL systems like Scylla (to the right) stemmed from scalability challenges addressed in the original Google whitepaper for Bigtable (center).

The technical team at ScyllaDB decided to evaluate the performance characteristics of NoSQL versus NewSQL to highlight these differences. In this paper, we compare the performance characteristics of Scylla, a highly available, distributed, best-of-breed NoSQL database, against the best-in-class NewSQL database, CockroachDB. Such a comparison is admittedly “apples to oranges,” as the two databases utilize radically different architectures.

COCKROACHDB AND SCYLLA: NOSQL AND NEWSQL

INTRODUCTION TO SCYLLA

Like CockroachDB, Scylla has been under development since 2014. Scylla is a wide-column NoSQL database that uses a tunable eventual consistency model to provide fast and efficient reads and writes along with multi-datacenter high availability.

In Scylla all nodes are equal; there is no single point of coordination and each node can serve any request. All nodes work together to provide service, even if one or more of the nodes become unavailable due to some failure. This enables Scylla to scale linearly to many nodes without performance degradation.

Built on the same underlying principles as Cassandra, Scylla is fully compatible with CQL and provides all of the functionality of Cassandra. Scylla also provides a DynamoDB-compatible API known as Project Alternator.

PERFORMANCE

As a high-performance, next-generation NoSQL database, Scylla focuses on extracting every drop of available CPU and networking power to maximize performance. The computing model that underpins Scylla is completely asynchronous, running a native task-switching mechanism developed to run a million lambda functions (“continuations”) per second, per core.

In a high-performance database, control is more important than efficiency. In Scylla, every computation and I/O operation is a member of a priority class. CPU and I/O schedulers control the execution of continuations. Latency-sensitive classes, such as read and write operations, are dynamically afforded a higher priority than background maintenance tasks, such as compaction, repair, and streaming. To provide such tight control, Scylla optimizes I/O operations to be executed in the most efficient way possible on available hardware, pinning threads to CPU cores, threads to shards, and limiting filesystem/disk load to I/O that matches the hardware’s capacity.

CockroachDB’s primary focus is to provide consistency and availability, along with flexibility and support for SQL. It provides high availability the same way Scylla does, with redundancy, but it maintains consistency across replicas while serving concurrent operations. Maintaining consistency inevitably imposes additional overhead. To mitigate the overhead of consistency, CockroachDB uses a number of innovative approaches designed to provide high-performance quality of service. For example, CockroachDB leverages modern consensus algorithms, such as Raft. Other innovations include a hybrid transaction scheduler that makes read linearization cost-free.
CockroachDB is written in Go and is, like applications written in Java, susceptible to garbage collection spikes.

CONSISTENCY AND AVAILABILITY

As noted, a defining feature of NewSQL databases is support for consistent transactions within distributed topologies. The drawback is that CockroachDB currently supports only one transaction execution mode: serializable isolation. This is a powerful feature when an application requires strong isolation guarantees that are free of anomalies. However, it also precludes choosing a weaker isolation model when higher performance is preferred over strong transactional guarantees.

All reads and writes in CockroachDB execute in the context of transactions. A transaction is an interactive session in which a client sends requests and then finalizes them with a commit or abort keyword. To serialize transactions, CockroachDB offers a parallel two-phase commit variant along with a novel hybrid serialization scheduler. In the best case, CockroachDB is capable of committing a transaction in one round-trip time (RTT). In general, though, CockroachDB requires a serialization check on every read and write, as well as waiting until all writes have been replicated. CockroachDB uses indirection in the read path to atomically switch data visibility. Overall, according to Jepsen’s analysis, CockroachDB provides near strong-1SR consistency.

Like most non-relational databases, Scylla implements a model known as “eventual consistency.” Eventual consistency supports the rapidly growing number of modern workloads that depend heavily on availability and are less dependent on guarantees of strong consistency. For example, during partitioning caused by an outage, it is often preferable for an isolated data center to continue to accept reads and writes. While Scylla does lean strongly towards availability over consistency, it also offers an API for stronger consistency that leverages lightweight transactions (LWT).

Unlike CockroachDB, Scylla enables consistency to be tuned per operation.
For example, consistency level ONE ensures that queries succeed even if only one node acknowledges a read or write. A stronger consistency level (QUORUM) ensures that a majority of replica nodes acknowledge a read or write. Where every replica node must acknowledge that the operation completed successfully, Scylla supports a consistency level that encompasses all nodes (ALL). In other words, Scylla supports consistency, albeit only within a single partition.

As we’ve noted, CockroachDB favors consistency over availability. Yet it leverages a variety of innovative approaches to enhance availability. Nevertheless, CockroachDB’s usage of the Raft consensus protocol dictates that a quorum of nodes within a ReplicaSet must be alive and accessible. While it enables consistency to span partitions, this requirement also places a limit on availability and performance per partition within a CockroachDB cluster.

CASSANDRA QUERY LANGUAGE AND ACID GUARANTEES

The Cassandra Query Language (CQL), employed by Scylla, can be deceptively similar to SQL. Consider the following CQL queries:

    SELECT * FROM my_table;
    UPDATE my_table SET a = 1, b = 2, c = 3 WHERE id = 0;

When using SQL, developers typically expect ACID guarantees. However, Scylla does not provide full ACID semantics for its operations. By definition, the ACID guarantees provided in the context of a single transaction are:

• Atomicity: Transactions with multiple statements are treated as a single unit.
• Consistency: Transactions can move database state only from one valid state to another valid state.
• Isolation: Concurrent transactions are executed as if they had been submitted sequentially, and all individual transactions have a “before and after” relationship.
• Durability: The results of transaction execution are preserved, even in the event of system failures.

What Scylla does not provide, from an ACID perspective, is isolation.
Isolation is typically required only by applications that perform multi-statement, cross-partition transactions. For a detailed analysis of this topic, you can read the Jepsen analysis of Scylla (see section 3.4, Normal Writes Are Not Isolated) and our accompanying blog post.

DATA DISTRIBUTION

An even distribution of data across nodes in a cluster ensures that load can be evenly distributed and processed by all nodes. In Scylla, data is distributed as uniformly as possible using a hash function (partitioner). Token ring and cluster topology configuration is shared with client applications, enabling an efficient choice of the closest nodes, and even reaching the specific CPU core that handles the partition within the node. This capability minimizes network hops and maximizes load balancing.

CockroachDB implements a two-level indexing structure, analogous to Spanner’s and also used by Bigtable and HBase, that they refer to as “order-preserving data distribution.” While this design adds complexity to CockroachDB’s internals, it enables a full implementation of SQL. Transactions are used to insert data into and delete data from ranges. Each range in CockroachDB is a Raft group. The unit of replication is a range.

DATA MODELS

Scylla offers a natural RDBMS-like model where data is organized in tables that consist of rows and columns on top of the wide-column storage. A row key consists of a partition key and an optional clustering key. A clustering key defines row ordering inside the partition. A partition key determines partition placement.

Users define table schemas, insert data into rows, and then read the data. The usual concepts, such as Secondary Indexes, Materialized Views, Complex Data Types, Lightweight Transactions, and other features, are built on top.
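
The placement rule described above, in which a partition key is hashed to a token and the token selects nodes on the ring, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Scylla’s implementation: MD5 stands in for the real partitioner hash, the TokenRing class, token values, and node names are hypothetical, and replicas are simply the next distinct nodes clockwise on the ring.

```python
import bisect
import hashlib


def token_for(partition_key):
    """Map a partition key to a 64-bit token (MD5 is a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    """Hypothetical ring: each node owns one or more tokens."""

    def __init__(self, node_tokens):
        # Flatten {node: [tokens]} into a list of (token, node) sorted by token.
        self._ring = sorted((t, n) for n, ts in node_tokens.items() for t in ts)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key, rf=3):
        """Return up to rf distinct nodes, walking clockwise from the key's token."""
        token = token_for(partition_key)
        # First ring position whose token is >= the key's token, wrapping to 0.
        start = bisect.bisect_left(self._tokens, token) % len(self._ring)
        owners = []
        for i in range(len(self._ring)):
            node = self._ring[(start + i) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == rf:
                break
        return owners


ring = TokenRing({
    "node-a": [0, 2**62],
    "node-b": [2**61, 2**63],
    "node-c": [2**60, 3 * 2**62],
})
print(ring.replicas("user:42", rf=3))
```

Because the ring layout is shared with clients, a token-aware client can perform the same lookup locally and send each request straight to a node that owns the partition, which is the behavior the text credits with minimizing network hops.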

The wide-column data model differs from the classical RDBMS-style model in that rows are not first-class citizens but the cells are. (Rows consist of cells.)

CockroachDB offers a classic relational data model with tables and rows built on top of LSM-based key-value storage. CockroachDB is wire-compatible with PostgreSQL.

NOSQL VERSUS NEWSQL: MEASURING PERFORMANCE

To summarize, Scylla and CockroachDB have been designed and engineered to address different problems and use cases. Following different design principles, the two databases naturally take different approaches to serving a variety of workloads. With these divergent purposes in mind, it is still worthwhile to evaluate their relative performance characteristics. Again, we are comparing “apples to oranges.” Our goal is simply to highlight the performance impact of the broader tradeoff between availability and consistency.

To measure database performance under various workloads, we executed a series of benchmarks defined in the Yahoo! Cloud Serving Benchmark (YCSB) test suite. The YCSB is a widely used, industry-standard benchmarking framework that measures scalability and performance (defined as latency). The YCSB tests encompass a variety of workloads (A through F), each of which represents a different access pattern. The six benchmarks represent the following workloads:

• Workload A: Update Heavy, 50/50 read/write ratio
• Workload B: Read Mostly, 95/5 read/write ratio
• Workload C: Read Only, 100/0 read/write ratio
• Workload D: Read Latest, 95/0/5 read/update/insert ratio
• Workload E: Short Range, 95/5 scan/insert ratio
• Workload F: Read-Modify-Write, 50/50 read/read-modify-write ratio

To execute these benchmarks, we provisioned Scylla and CockroachDB on AWS public cloud infrastructure.
For both databases, we created similar clusters, each consisting of 3 nodes running on storage-optimized i3.4xlarge AWS EC2 instances within a single geographic region (eu-north-1), evenly spread across three availability zones with the standard replication factor (RF=3) and default configuration.

To measure CockroachDB 20.1.6 performance, we used the brianfrankcooper/YCSB 0.17.0 benchmark with the PostgreNoSQL binding, as well as the CockroachDB v20.1.6 port of YCSB to the Go programming language.

For Scylla 4.2.0, we used the brianfrankcooper/YCSB 0.18.0 SNAPSHOT with a Scylla-native binding and a Token Aware load-balancing policy.

THE INITIAL DATA LOAD: 1 BILLION KEYS VS 100 MILLION KEYS

Initially, we aimed to measure load time and throughput under stress, populating both databases with a dataset of 1 billion keys.

The official CockroachDB documentation states that the storage capacity limit is set at 150GB per vCPU and up to 2.5TB per node total. Even so, we were unable to successfully load 1 billion keys into the CockroachDB cluster. The cluster became unresponsive after 3-5 hours of loading, generating critical errors in the logs. Similar behavior was observed during a subsequent 30-minute sustained workload test.

We observed that the load throughput for the CockroachDB cluster degraded from 12K operations per second (OPS) down to 2.5K OPS within 3-5 hours. Loading 1 billion keys at a rate of 2.5K keys per second was projected to take about 111 hours, or just over 4.5 days. Notably, similar issues were observed by YugaByte: [1 billion trial], [slowdown], and [results].

These issues led us to reduce the dataset size for CockroachDB to 100 million keys. With the smaller dataset, loading took 7 hours and resulted in 1.1TB of data, which was later compacted to 450GB. The latency graph over this 7-hour period can be seen in Figure 2.
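
The load-time projection quoted above is simple arithmetic on the reported throughput figures. The short sketch below reproduces it; the function name is only illustrative, and the inputs are the 1 billion key target and the 12K and 2.5K OPS rates given in this section.

```python
def projected_load_hours(total_keys, keys_per_second):
    """Hours needed to load total_keys at a constant rate."""
    return total_keys / keys_per_second / 3600


TOTAL_KEYS = 1_000_000_000

degraded_hours = projected_load_hours(TOTAL_KEYS, 2_500)   # degraded rate
initial_hours = projected_load_hours(TOTAL_KEYS, 12_000)   # initial rate

print(f"At 2.5K OPS: {degraded_hours:.0f} hours ({degraded_hours / 24:.1f} days)")
print(f"At 12K OPS:  {initial_hours:.0f} hours")
```

At the degraded rate the load works out to roughly 111 hours, a little over four and a half days, whereas the initial 12K OPS rate would have finished in about a day had it been sustained.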

Figure 2: P99 latency for loading 100 million keys into CockroachDB.

In contrast, Scylla easily loaded 1 billion keys, which represented about 4.8TB of data, in about three hours, exhibiting the same performance characteristics as with the smaller dataset. It took just 20 minutes for Scylla to load 100 million keys, which translated into just 300GB of data.

OF NOTE: It took Scylla 20 minutes to load 100 million keys, compared to 7 hours for CockroachDB. That’s 20x more efficient at loading a comparable dataset. In fact, loading 1 billion keys took Scylla less than half the time it took CockroachDB to load 100 million keys.

YCSB: RESULTS FROM WORKLOAD A

With the initial data load complete, we were able to evaluate results produced by YCSB Workload A, which represents a 50/50 mix of read and write workloads. This benchmark tests database performance when clients actively write and read at the same time.

Under this workload, Scylla achieved 120K OPS with P99 latency under 4.6ms, while CPU utilization remained under 60%, and 150K OPS with P99 under 12ms for the 1 billion key dataset size.

The dashboard in Figure 3, taken from Scylla Monitoring Stack, displays the performance of the Scylla cluster for 1 billion keys serving 120,000 OPS at 600 µsec P99 latency for writes and below 4ms of P99 latency for reads.

The dashboard in Figure 4 displays CockroachDB’s results for Workload A, running against a dataset of 100 million keys. CockroachDB produced at best about 16K OPS with P99 latency at 52ms, generating an intermediate spike of 200ms at CPU utilization that varied between 50% and 75%.

OF NOTE: Compared with CockroachDB, Scylla handled 10x the amount of data while providing 9.3x the throughput at 1/4 the latency.

Figure 3: Performance of Scylla cluster for 1 billion keys under Workload A.
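
The comparative ratios quoted in the two “OF NOTE” callouts of this section can be checked directly against the reported measurements. The sketch below does only that arithmetic; the dictionary keys are illustrative names, and the values are the load times, throughput, and P99 latencies stated in the text.

```python
# Measurements reported in this paper (Scylla: 1B keys, CockroachDB: 100M keys).
scylla = {"load_minutes_100m": 20, "load_hours_1b": 3, "ops": 150_000, "p99_ms": 12}
crdb = {"load_hours_100m": 7, "ops": 16_000, "p99_ms": 52}

ratios = {
    # How many times longer CockroachDB took to load 100 million keys.
    "load_time_100m": crdb["load_hours_100m"] * 60 / scylla["load_minutes_100m"],
    # Scylla's 1B-key load time as a fraction of CockroachDB's 100M-key load time.
    "load_1b_vs_100m": scylla["load_hours_1b"] / crdb["load_hours_100m"],
    # Workload A throughput and P99 latency, Scylla relative to CockroachDB.
    "throughput": scylla["ops"] / crdb["ops"],
    "p99_latency": scylla["p99_ms"] / crdb["p99_ms"],
}

for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

The results line up with the rounded figures in the callouts: a load-time ratio of about 21x, a 1 billion key load in under half of CockroachDB’s 100 million key load time, a bit over 9x the throughput, and slightly under a quarter of the P99 latency.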

Figure 4: CockroachDB results for Workload A.

RESULTS FROM YCSB WORKLOADS A THROUGH F

The graph in Figure 5 displays throughput and latency for a 100 million key dataset on CockroachDB versus a 1 billion key dataset on Scylla. By emphasizing performance over consistency, Scylla can provide 10x better throughput and 4x better latency for a dataset that is 10x larger in size.

Figure 5: Workload A for CockroachDB (100M keys) vs Scylla (1B keys), showing throughput [OPS] and P99 latency [ms] per workload class.

Scylla

For the 1 billion key dataset, Scylla successfully managed to serve 150K-200K OPS on most of the workloads at 75-80% utilization with reasonable latency. On average, Scylla achieved 180K OPS with P99 latencies under 5.5ms at 75% load.

The charts in Figure 6 display the throughput and latency achieved by Scylla for each YCSB workload. For Workloads A-D, Scylla delivered predictable throughput between 150K and 180K OPS at 5.5ms to 12ms P99 latency.

Similarly, the Figure 7 chart demonstrates that Scylla delivers predictable performance at large scale, while approaching maximum system utilization.

For example, for Workload D with a dataset of 1 billion keys, Scylla demonstrated 180K OPS with P99 latency below 5.5ms and CPU utilization of only 75%. It is worth noting that such performance and scale are rare for any open source distributed database system.

Figure 6: Scylla throughput [OPS] and P99 latency [ms] per YCSB workload (A through F).

Figure 7: ScyllaDB metrics.

Workload E defines operations that are short range scans, rather than queries against individual records. According to the YCSB benchmark, the workload involves “threaded conversations, where each scan is for the posts in a given thread assumed to be clustered by thread ID.” As expected, this workload produced the worst performance: only 10K operations per second.

YCSB Workload E is implemented in a way that does not play to the strengths of either Scylla or CockroachDB. In Scylla, range partition scans are token-based, and tokens are randomly distributed throughout the cluster. As such, many random reads across multiple nodes are required to satisfy a single scan request. Unsurprisingly, this is not an efficient way to do range scans. Yet while range scans are not the main advantage of Scylla, Scylla still did well enough and outperformed CockroachDB by 5x.

CockroachDB

Even with the smaller dataset of 100 million keys, CockroachDB ran up against performance scalability limits much earlier than Scylla, as shown in Figure 8. For Workload A, CockroachDB recorded 16K OPS with P99 under 52ms. With Workload D, CockroachDB achieved its best result, demonstrating 40K OPS with P99 below 176ms, while running at 80% CPU utilization.
Further increasing the load did not increase throughput. Instead, the higher load increased latency.

The graph in Figure 9 shows Workload E (range scans) results, with 2K OPS and P99 latency of 537ms.

Figure 8: CockroachDB throughput [OPS] and P99 latency [ms] per YCSB workload (A through F).

Figure 9: CockroachDB results for Workload E.

CONCLUSION

NoSQL and NewSQL models are clearly converging, each providing more functionality along with better performance and availability than traditional relational database offerings. It is not a surprise that a highly available NoSQL database such as Scylla outperforms a distributed NewSQL database such as CockroachDB by a wide margin. The results are not meant to indicate whether one should select Scylla, or even NoSQL, for every workload.

To summarize, with a dataset of 1 billion keys, Scylla demonstrated:

• Throughput up to 10x better than CockroachDB
• Predictable low latencies, even while handling 10x the data

CockroachDB demonstrated throughput degradation during data loading. During the YCSB workloads, we measured throughput that closely matched the CockroachDB whitepaper, albeit with larger and more variable latencies.

Modern, cloud-native applications often require high availability and predictable low latency. Such workloads are ideal for Scylla. Requirements for strong consistency are less common. Workloads characterized by modest dataset size, and which require strong consistency guarantees and transactions along with a relational database model involving JOINs, are ideal for a NewSQL database such as CockroachDB. While SQL transactions can be a convenient solution to various business cases, they may end up incurring significant performance and maintenance costs as the system scales, although sometimes they are a necessary evil.

ABOUT SCYLLADB

Scylla is the real-time big data database. API-compatible with Apache Cassandra and Amazon DynamoDB, Scylla embraces a shared-nothing approach that increases throughput and storage capacity as much as 10X. Comcast, Discord, Disney Hotstar, Grab, Medium, Starbucks, Ola Cabs, Samsung, IBM, Investing.com and many more leading companies have adopted Scylla to realize order-of-magnitude performance improvements and reduce hardware costs. Scylla’s database is available as an open source project, an enterprise edition and a fully managed database as a service. ScyllaDB was founded by the team responsible for the KVM hypervisor.

For more information: ScyllaDB.com

United States Headquarters
2445 Faber Place, Suite 200
Palo Alto, CA 94303 U.S.A.
Email: info@scylladb.com

Israel Headquarters
11 Galgalei Haplada
Herzelia, Israel

Copyright 2020 ScyllaDB Inc. All rights reserved. All trademarks or registered trademarks used herein are property of their respective owners.
