Lecture-19-NoSQL-ColStore - Duke University

Transcription

11/5/17AnnouncementsCompSci 516 HW3 released on SakaiDatabase Systems– Due on Monday, Nov 20, 11:55 pm (in 2 weeks)– Start soon, finish soon!Lecture 19NoSQLandColumn Store– You can learn about conceptual questions fromonline material, but must write your own answer Keep working on your project too!Instructor: Sudeepa RoyDuke CS, Fall 2017CompSci 516: Database Systems1Duke CS, Fall 2017CompSci 516: Database Systems2Reading MaterialNOSQL: “Scalable SQL and NoSQL Data Stores”Rick Cattell, SIGMOD Record, December 2010 (Vol. 39, No. 4) see webpage http://cattell.net/datastores/ for updates and more pointers MongoDB manual: https://docs.mongodb.com/manual/NoSQLColumn Store: D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos and S. Madden. The Design and Implementation ofModern Column-Oriented Database Systems. Foundations and Trends in Databases, vol. 5, no.3, pp. 197–280, 2012. See VLDB 2009 tutorial: http://nms.csail.mit.edu/ stavros/pubs/tutorial2009column stores.pdfOptional: “Dynamo: Amazon’s Highly Available Key-value Store” By Giuseppe DeCandia et. al. SOSP2007 “Bigtable: A Distributed Storage System for Structured Data” Fay Chang et. al. OSDI 2006Duke CS, Fall 2017CompSci 516: Database Systems3Duke CS, Fall 2017CompSci 516: Database Systems4So far -- RDBMS Relational Data Model Relational Database Systems (RDBMS) RDBMSs have– a complete pre-defined fixed schema– a SQL interface– and ACID transactionsDuke CS, Fall 2017CompSci 516: Database Systems5Duke CS, Fall 2017CompSci 516: Database Systems61

11/5/17TodayWarnings! NoSQL: ”new” database systems Material from Cattell’s paper (2010-11) –some info will be outdated– not typically RDBMS– relax on some requirements, gain efficiency andscalability– see webpage http://cattell.net/datastores/ forupdates and more pointers New systems choose to use/not use severalconcepts we learnt so far– e.g. “System ---” does not use locks but uses multiversion CC (MVCC) or,– “System ---” uses asynchronous replication therefore, it is important to understand the basics(Lectures 1-18) even if they are not used in somenew systems!Duke CS, Fall 2017CompSci 516: Database Systems7 We will focus on the basic ideas of NoSQLsystems Optional reading slides at the end– there are also comparison tables in the Cattell’spaper if you are interestedDuke CS, Fall 2017OLAP vs. OLTP We will examine a number of SQL and so- called“NoSQL” systems or “data stores” Designed to scale simple OLTP-style applicationloadsRecall transactions!Multiple concurrent read-write requestsCommercial applications (banking, online shopping)Data changes frequentlyACID properties, concurrency control, recovery OLAP (OnLine Analytical Processing)– Many aggregate/group-by queries – multidimensional data– Data mostly static– Will study OLAP Cube soonDuke CS, Fall 2017CompSci 516: Database Systems9– to do updates as well as reads– in contrast to traditional DBMSs and data warehouses– to provide good horizontal scalability (?) for simpleread/write database operations distributed over manyservers Originally motivated by Web 2.0 applications– these systems are designed to scale to thousands ormillions of usersDuke CS, Fall 2017New Systems vs. RDMS NoSQL stands for “Not Only SQL” or “NotRelational”– not entirely agreed upon These systems typically sacrifice some of thesedimensions– e.g. database-wide transaction consistency, in order toachieve others, e.g. higher availability and scalabilityCompSci 516: Database Systems10 Many of the new systems are referred to as“NoSQL” data storesdata modelconsistency mechanismsstorage mechanismsdurability guaranteesavailabilityquery supportDuke CS, Fall 2017CompSci 516: Database SystemsNoSQL When you study a new system, compare it withRDBMS-s on its––––––8New Systems OLTP (OnLine Transaction Processing)–––––CompSci 516: Database Systems11 Next: six key features of NoSQL systemsDuke CS, Fall 2017CompSci 516: Database Systems122

11/5/17NoSQL: Six Key FeaturesImportant Examples of New Systems1. the ability to horizontally scale “simple operations”throughput over many servers2. the ability to replicate and to distribute (partition) data overmany servers3. a simple call level interface or protocol (in contrast to SQLbinding)4. a weaker concurrency model than the ACID transactions ofmost relational (SQL) database systems5. efficient use of distributed indexes and RAM for data storage6. the ability to dynamically add new attributes to data recordsDuke CS, Fall 2017CompSci 516: Database Systems131. Memcached: main features Three systems provided a “proof of concept” andinspired many other data stores1. Memcached2. Amazon’s Dynamo3. Google’s BigTableDuke CS, Fall 2017CompSci 516: Database Systems142. Dynamo : main features pioneered the idea of eventual consistency as away to achieve higher availability and scalability popular open source cache supports distributed hashing (later) data fetched are not guaranteed to be up-todate demonstrated that in-memory indexes can behighly scalable, distributing and replicatingobjects over multiple nodes but updates are guaranteed to be propagatedto all nodes eventuallyDuke CS, Fall 2017Duke CS, Fall 2017CompSci 516: Database Systems153. BigTable : main features Recall ACID for RDBMS desired properties oftransactions:– Atomicity, Consistency, Isolation, and Durability NOSQL systems typically do not provide ACID https://cloud.google.com/bigtable/ i 516: Database Systems16BASE (not ACID J) demonstrated that persistent record storagecould be scaled to thousands of nodes “column families”Duke CS, Fall 2017CompSci 516: Database Systems17 Basically Available Soft state Eventually consistentDuke CS, Fall 2017CompSci 516: Database Systems183

11/5/17ACID vs. BASE“CAP” Theorem The idea is that by giving up ACID constraints, onecan achieve much higher performance and scalability Often Eric Brewer’s CAP theorem cited for NoSQL A system can have only two out of three of the following properties: Consistency The systems differ in how much they give up– e.g. most of the systems call themselves “eventuallyconsistent”, meaning that updates are eventuallypropagated to all nodes– but many of them provide mechanisms for some degree ofconsistency, such as multi-version concurrency control(MVCC)– do all clients see the same data? Availability– is the system always on? Partition-tolerance– even if communication is unreliable, does the systemfunction? The NoSQL systems generally give up consistency– However, the trade-offs are complexDuke CS, Fall 2017CompSci 516: Database Systems19Duke CS, Fall 2017Two foci for NoSQL systemsCompSci 516: Database Systems201. “Simple” Operations Reading or writing a small number of related records ineach operation1. “Simple” operations– e.g. key lookups– reads and writes of one record or a small number of records2. Horizontal Scalability This is in contrast to complex queries, joins, or read-mostlyaccess Inspired by web, where millions of users may both read andwrite data in simple operations– e.g. search and update multi-server databases of electronicmail, personal profiles, web postings, wikis, customer records,online dating records, classified ads, and many other kinds ofdataDuke CS, Fall 2017CompSci 516: Database Systems212. Horizontal ScalabilityCompSci 516: Database Systems222. Horizontal vs. Vertical Scaling Shared-Nothing Horizontal Scaling Effective use of multiple cores (vertical scaling) isimportant The ability to distribute both the data and the load ofthese simple operations over many servers– but the number of cores that can share memory islimited– with no RAM or disk shared among the servers horizontal scaling generally is less expensive Not “vertical” scaling– where a database system utilizes many cores and/or CPUsthat share RAM and disks Some of the systems we describe provide both verticaland horizontal scalabilityDuke CS, Fall 2017Duke CS, Fall 2017CompSci 516: Database Systems23– can use commodity servers Note: horizontal and vertical partitioning are not relatedto horizontal and vertical scaling (Lecture 18)– except that they are both useful for horizontal scalingDuke CS, Fall 2017CompSci 516: Database Systems244

11/5/17Choices in NOSQL systems:1. Concurrency ControlWhat is different in NOSQL systemsa) Locks When you study a new NOSQL system, noticehow it differs from RDBMS in terms of––some systems provide one-user-at-a-time read or update locksMongoDB provides locking at a field levelb) MVCC1.2.3.4.Concurrency ControlData Storage MediumReplicationTransactionsDuke CS, Fall 2017CompSci 516: Database Systemsc) None–––do not provide atomicitymultiple users can edit in parallelno guarantee which version you will readd) ACID––25pre-analyze transactions to avoid conflictsno deadlocks and no waits on locksDuke CS, Fall 2017Choices in NOSQL systems:2. Data Storage Medium26Choices in NOSQL systems:3. Replication whether mirror copies are always in synca) Synchronousb) Asynchronousa) Storage in RAM– snapshots or replication to disk– poor performance when overflows RAM– faster, but updates may be lost in a crashb) Disk storagec) Both– caching in RAMDuke CS, Fall 2017CompSci 516: Database Systems– local copies synchronously, remote copiesasynchronouslyCompSci 516: Database Systems27Choices in NOSQL systems:4. Transaction MechanismsDuke CS, Fall 2017CompSci 516: Database Systems28Comparison from Cattell’s paper (2011)a) supportb) do not supportc) in between– support local transactions only within a singleobject or “shard”– shard a horizontal partition of data in adatabaseDuke CS, Fall 2017CompSci 516: Database Systems29Duke CS, Fall 2017CompSci 516: Database Systems305

11/5/17Data Model Terminology for NoSQL Unlike SQL/RDBMS, the terminology for NoSQLis often inconsistentData Model Terminology for NoSQL The systems all store sets of attribute-value pairs– but use four different data structures– we are following notations in Cattell’s paper All systems provide a way to store scalar values– e.g. numbers and strings Some of them also provide a way to store morecomplex nested or reference valuesDuke CS, Fall 2017CompSci 516: Database Systems311.2.3.4.TupleDocumentExtensible RecordObjectDuke CS, Fall 20171. TupleCompSci 516: Database Systems322. Document Allows values to be nested documents or lists aswell as scalar values Same as before A “tuple” is a row in a relational table– think about XML or JSON– attribute names are pre-defined in a schema– the values must be scalar– the values are referenced by attribute name– in contrast to an array or list, where they arereferenced by ordinal position The attribute names are dynamically defined foreach document at runtime A document differs from a tuple in that theattributes are not defined in a global schema– and a wider range of values are permittedDuke CS, Fall 2017CompSci 516: Database Systems33Duke CS, Fall 20173. Extensible RecordCompSci 516: Database Systems344. Object A hybrid between a tuple and a document families of attributes are defined in a schema but new attributes can be added (within anattribute family) on a per-record basis Attributes may be list-valuedDuke CS, Fall 2017CompSci 516: Database Systems Analogous to an object in programminglanguages– but without the procedural methods Values may be references or nested objects35Duke CS, Fall 2017CompSci 516: Database Systems366

11/5/17Data Store CategoriesExample NOSQL systems The data stores are grouped according to their data model Key-value Stores:– store values and an index to find them– based on a programmer- defined key– Project Voldemort, Riak, Redis, Scalaris, TokyoCabinet, Memcached/Membrain/Membase Document Stores: Document Stores:– store documents– The documents are indexed and a simple query mechanism isprovided– Amazon SimpleDB, CouchDB, MongoDB, Terrastore Extensible Record Stores: Extensible Record Stores:– Hbase, HyperTable, Cassandra, Yahoo’s PNUTS– store extensible records that can be partitioned vertically andhorizontally across nodes– Some papers call these “wide column stores” Relational Databases:– MySQL Cluster, VoltDB, Clustrix, ScaleDB, ScaleBase,NimbusDB, Google Megastore (a layer on BigTable) Relational Databases:– store (and index and query) tuples– e.g. the new RDBMSs that provide horizontal scalingDuke CS, Fall 2017CompSci 516: Database Systems37Use Case : Key-value store where you need to look up objects based onmultiple fields– e.g., a driver’s name, license number, ownedvehicle, or birth date39Duke CS, Fall 2017CompSci 516: Database Systems40Use case: Scalable RDBMS uses cases similar to those for document stores: If your application requires many tables withdifferent types of data– multiple kinds of objects, with lookups based on any field. However, aimed at higher throughput, and may providestronger concurrency guarantees,– a relational schema centralizes and simplifies datadefinition and SQL simplifies operations– or for projects with many programmers– at the cost of slightly more complexity than the document stores Suppose storing customer information for an eBay-styleapplication, and you want to partition your data bothhorizontally and vertically:– cluster customers by country, so that you can efficiently search all ofthe customers in one country– separate the rarely-changed “core” customer information such ascustomer addresses and email addresses in one place, and– put certain frequently-updated customer information (such as currentbids in progress) in a different place, to improve performanceCompSci 516: Database Systems38– e.g. in a Department of Motor Vehiclesapplication, with vehicles and driversUse case: Extensible Record StoreDuke CS, Fall 2017CompSci 516: Database Systems application with multiple different kinds ofobjects– that does many RDBMS queries to create a tailoredpage when a user logs in– Suppose it takes several seconds to execute thosequeries, and the user’s data is rarely changed– you might want to store the user’s tailored page as asingle object in a key-value storeCompSci 516: Database SystemsDuke CS, Fall 2017Use case: Document Store if you have a simple application with only onekind of object, and you only need to look upobjects up based on one attribute Suppose you have a web applicationDuke CS, Fall 2017 Key-value Stores:41 However, more useful if the application does notrequire– updates or joins that span many nodes– transaction coordination– or, data movementDuke CS, Fall 2017CompSci 516: Database Systems427

11/5/17Consistent Hashing (CH) Recall dynamic hashing schemes If the #of slots (directory size) changes, then almostall keys had to be remapped In consistent hashing (CH), with #keys K and #slots N, only K/N keys need to be remapped on average Applies to the design of Distributed Hash Table(DHTs) for Uniform Load DistributionConsistent Hashingin DynamoDB– partition a keyspace among a set of sites/nodes– additionally provide an overlay network that connectsnodes such that the nodes responsible for any key can beefficiently locatedDuke CS, Fall 2017CompSci 516: Database Systems43Duke CS, Fall 2017DynamoDB : CH 1/2 Data item identified by a key Assign to a node by hashing the key to yield its positionon the ring Walk the ring clockwise to find the first node with aposition larger than the item’s position Each node is responsible for the region in the ringbetween it and its predecessor node on the ring– represents the “position” on the ring Note: departure or arrival of a node onlyaffects its immediate neighbor The other nodes remain unaffected K/N on average! Data item identified by a key Assign to a node by hashingthe key toCompSci 516: Database Systems45Duke CS, Fall 2017DynamoDB: Replication46 Proposed by CS theoreticians from MIT:– Karger-Lehman-Leighton-Panigrahy-Levine-Lewin– “Consistent Hashing and Random Trees: Distributed Caching Protocolsfor Relieving Hot Spots on the World Wide Web” – STOC 1997– coordinator handles all keys in its range– Coordinator replicates each key it is in charge of Consistent hashing gave birth to Akamai Technologies– Founded by Danny Lewin and Tom Leighton in 1998– Akamai’s content delivery network is one of the largest distributedcomputing platforms– Now market cap 12B and 6200 employees– Managing web-presence of many major companies by storing it locally replicating it at the N-1 clockwise succesor nodes in the ring Each node is in charge of region of the ring between it and itsN-th predecessor 2001: The concept of Distributed Hash Table (DHT) isproposed (how to look for a file) and CH was re-purposed Now used in Dynamo, Couchbase, Cassandra, Voldemort,Riak, .Node B replicates key K at nodes C and DNode D will store keys in the range (A, B], (B, C], (C, D]Note: there may be N “physical” nodes, uses “virtualnodes”CompSci 516: Database SystemsCompSci 516: Database SystemsCH History Dynamo replicates its data on multiple (N) hosts for highavailability and durability Each key k is assigned to a coordinator which is in charge ofreplicationDuke CS, Fall 201744DynamoDB : CH 2/2 [ref. the DynamoDB paper, sec 4.3] Must scale incrementally Consistent hashing is used to dynamically distributedata around a “ring” of nodes ( sites) The output of a hash function is treated as a circularring Each node is assigned a random value in this spaceDuke CS, Fall 2017CompSci 516: Database Systems47Duke CS, Fall 2017CompSci 516: Database Systems488

11/5/17Why choose RDBMS over NoSQL : 1/31. If new relational systems can do everythingthat a NoSQL system can, with analogousperformance and scalability (?), and with theconvenience of transactions and SQL, NoSQLis not neededSQL vs. NOSQLArguments for both sidesstill a controversial topic2. Relational DBMSs have taken and retainedmajority market share over othercompetitors in the past 30 years– (network, object, and XML DBMSs)Duke CS, Fall 2017CompSci 516: Database Systems49Why choose RDBMS over NoSQL : 2/33. Successful relational DBMSs have been builtto handle other specific application loads inthe past:–––––CompSci 516: Database Systems51Why choose NoSQL over RDBMS : 1/31. We haven’t yet seen good benchmarks showingthat RDBMSs can achieve scaling comparable withNoSQL systems like Google’s BigTable– then a key-value store is adequate and probably easier to understandthan a relational DBMS– Likewise for a document store on a simple application: you only paythe learning curve for the level of complexity you requireCompSci 516: Database Systems50Why choose RDBMS over NoSQL : 3/3Duke CS, Fall 2017CompSci 516: Database Systems52Why choose NoSQL over RDBMS : 2/33. Some applications require a flexible schema2. If you only require a lookup of objects based on asingle keyDuke CS, Fall 2017CompSci 516: Database Systems4. While no “one size fits all” in the SQLproducts themselves, there is a commoninterface with SQL, transactions, andrelational schema that give advantages intraining, continuity, and data interchangeread-only or read-mostly data warehousingOLTP on multi-core multi-disk CPUsin-memory databasesdistributed databases, andnow horizontally scaled databasesDuke CS, Fall 2017Duke CS, Fall 201753– allowing each object in a collection to have differentattributes– While some RDBMSs allow efficient “packing” of tupleswith missing attributes, and some allow adding newattributes at runtime, this is uncommonDuke CS, Fall 2017CompSci 516: Database Systems549

11/5/17Why choose NoSQL over RDBMS : 3/34. A relational DBMS makes “expensive” (multi- nodemulti-table) operations “too easy”– NoSQL systems make them impossible or obviouslyexpensive for programmersColumn Store5. While RDBMSs have maintained majority marketshare over the years, other products haveestablished smaller but non-trivial markets in areaswhere there is a need for particular capabilities– e.g. indexed objects with products like BerkeleyDB, or graph-followingoperations with object-oriented DBMSsDuke CS, Fall 2017CompSci 516: Database Systems55Duke CS, Fall 2017CompSci 516: Database Systems57Duke CS, Fall 2017CompSci 516: Database SystemsDuke CS, Fall 2017CompSci 516: Database Systems56Row vs. Column Store Row store– store all attributes of a tuple together– storage like “row-major order” in a matrix Column store– store all rows for an attribute (column) together– storage like “column-major order” in a matrix e.g.– MonetDB, Vertica (earlier, C-store), SAP/Sybase IQ,Google Bigtable (with column groups)Ack: Slide from VLDB 2009 tutorial on Column storeDuke CS, Fall 2017CompSci 516: Database SystemsDuke CS, Fall 2017CompSci 516: Database SystemsAck: Slide from VLDB 2009 tutorial on Column store5958Ack: Slide from VLDB 2009 tutorial on Column store6010

11/5/17Ack: Slide from VLDB 2009 tutorial on Column storeAck: Slide from VLDB 2009 tutorial on Column storeDuke CS, Fall 2017CompSci 516: Database Systems61Duke CS, Fall 2017CompSci 516: Database SystemsMongoDB Additional and Optional Slides onMongoDBMongoDB is an open source document store written in C provides indexes on collectionslocklessprovides a document query mechanismsupports automatic shardingReplication is mostly used for failoverdoes not provide the global consistency of a traditional DBMS supports dynamic queries with automatic use of indices, likeRDBMSs also supports map-reduce – helps complex aggregations acrossdocs provides atomic operations on fieldsCompSci 516: Database Systems63Duke CS, Fall 2017Optional slide: Read yourself The update command supports “modifiers” that facilitate atomicchanges to individual valuesCompSci 516: Database Systems64Optional slide: Read yourselfMongoDB: Atomic Ops on Fields–––––Optional slide: Read yourself– but you can get local consistency on the up-to-date primary copy of adocument(May be useful for HW3)https://docs.mongodb.comDuke CS, Fall 201762MongoDB: Index MongoDB indices are explicitly defined usingan ensureIndex call set sets a value inc increments a value push appends a value to an array pushAll appends several values to an array pull removes a value from an array, and pullAll removes severalvalues from an array– any existing indices are automatically used forquery processing Since these updates normally occur “in place”, they avoid theoverhead of a return trip to the server There is an “update if current” convention for changing a documentonly if field values match a given previous value MongoDB supports a findAndModify command to perform anatomic update and immediately return the updated document To find all products released last year (2015)or later costing under 100 you could write: db.products.find({released: { gte: new Date(2015, 1, 1,)}, price{‘ lte’: 100},})Duke CS, Fall 2017Duke CS, Fall 2017– useful for implementing queues and other data structures requiringatomicityCompSci 516: Database Systems65CompSci 516: Database Systems6611

11/5/17Optional slide: Read yourselfOptional slide: Read yourselfMongoDB: Data MongoDB stores data in a binary JSON-likeformat called BSON MongoDB supports master-slave replicationwith automatic failover and recovery– BSON supports boolean, integer, float, date, stringand binary types– MongoDB can also support large binary objects,eg. images and videos– These are stored in chunks that can be streamedback to the client for efficient deliveryDuke CS, Fall 2017CompSci 516: Database SystemsMongoDB: Replication67– Replication (and recovery) is done at the level ofshards– Replication is asynchronous for higherperformance, so some updates may be lost on acrashDuke CS, Fall 2017CompSci 516: Database Systems6812

column_stores.pdf Optional: "Dynamo: Amazon's Highly Available Key-value Store" By Giuseppe DeCandiaet. al. SOSP 2007 "Bigtable: A Distributed Storage System for Structured Data" Fay Chang et. al. OSDI 2006 Duke CS, Fall 2017 CompSci 516: Database Systems 3 NoSQL Duke CS, Fall 2017 CompSci 516: Database Systems 4