A Comparison of SQL and NoSQL Databases


ISO/IEC JTC1/SC32/WG2 N1537
A Comparison of SQL and NoSQL Databases
Keith W. Hare
JCC Consulting, Inc.
Convenor, ISO/IEC JTC1 SC32 WG3
13 May 2011, Metadata Open Forum

Abstract
NoSQL databases (either no-SQL or Not Only SQL) are currently a hot topic in some parts of computing. In fact, one website lists over a hundred different NoSQL databases.
This presentation reviews the features common to the NoSQL databases and compares those features to the features and capabilities of SQL databases.

Who Am I?
- Muskingum College, 1980, BS in Biology and Computer Science
- Senior Consultant with JCC Consulting, Inc. since 1985 – high performance database systems
- Ohio State – Masters in Computer & Information Science, 1985
- SQL Standards committees since 1988
- Vice Chair, INCITS H2 since 2003
- Convenor, ISO/IEC JTC1 SC32 WG3 since 2005

Topics
- SQL Databases
  - SQL Standard
  - SQL Characteristics
  - SQL Database Examples
- NoSQL Databases
  - NoSQL Definition
  - General Characteristics
  - NoSQL Database Types
  - NoSQL Database Examples

Standard SQL
The following is a short, incomplete history of the SQL Standards – ISO/IEC 9075:
- 1987 – Initial ISO/IEC Standard
- 1989 – Referential Integrity
- 1992 – SQL2
- 1995 – SQL/CLI (ODBC)
- 1996 – SQL/PSM – Procedural Language extensions
- 1999 – User Defined Types
- 2003 – SQL/XML
- 2008 – Expansions and corrections
- 2011 (or 2012) – System Versioned and Application Time Period Tables

SQL Characteristics
- Data stored in columns and tables
- Relationships represented by data
- Data Manipulation Language
- Data Definition Language
- Transactions
- Abstraction from physical layer

SQL Physical Layer Abstraction
- Applications specify what, not how
- Query optimization engine
- Physical layer can change without modifying applications
  - Create indexes to support queries
  - In Memory databases
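The point about changing the physical layer without touching the application can be sketched with Python's built-in sqlite3 module. This is only an illustration: the employee table, its columns, and the index name are hypothetical, and SQLite stands in for any SQL database. The query text stays the same while the optimizer's plan changes once an index exists.

```python
import sqlite3

def plan_for(conn, sql, params=()):
    """Return the optimizer's plan for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan detail.
    return "; ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
# Hypothetical table, used only for this illustration.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee (name, dept) VALUES (?, ?)",
    [("Ann", "Sales"), ("Bob", "Research"), ("Cy", "Sales")],
)

# The application states what it wants, not how to find it.
QUERY = "SELECT name FROM employee WHERE dept = ?"

plan_before = plan_for(conn, QUERY, ("Sales",))
names_before = sorted(r[0] for r in conn.execute(QUERY, ("Sales",)))

# Physical change only: the application's query text is untouched.
conn.execute("CREATE INDEX employee_dept_idx ON employee (dept)")

plan_after = plan_for(conn, QUERY, ("Sales",))
names_after = sorted(r[0] for r in conn.execute(QUERY, ("Sales",)))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording differs between SQLite versions, so the sketch prints it rather than relying on a fixed string; the result rows are the same before and after.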

Data Manipulation Language (DML)
- Data manipulated with Select, Insert, Update, & Delete statements
    Select T1.Column1, T2.Column2
    From Table1 T1, Table2 T2
    Where T1.Column1 = T2.Column1
- Data Aggregation
- Compound statements
- Functions and Procedures
- Explicit transaction control
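A minimal runnable sketch of the join and of data aggregation, using Python's sqlite3 module. Table1 and Table2 follow the slide; the Label column and the sample rows are assumptions added so that the statements have something to return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Column1 INTEGER, Label TEXT);     -- Label is a made-up column
    CREATE TABLE Table2 (Column1 INTEGER, Column2 INTEGER);
    INSERT INTO Table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO Table2 VALUES (1, 10), (1, 20), (2, 5);
""")

# Join: rows are matched on equal Column1 values.
joined = conn.execute("""
    SELECT T1.Column1, T2.Column2
    FROM Table1 T1, Table2 T2
    WHERE T1.Column1 = T2.Column1
    ORDER BY T1.Column1, T2.Column2
""").fetchall()

# Aggregation: one total per Column1 value.
totals = conn.execute("""
    SELECT T1.Column1, SUM(T2.Column2)
    FROM Table1 T1, Table2 T2
    WHERE T1.Column1 = T2.Column1
    GROUP BY T1.Column1
    ORDER BY T1.Column1
""").fetchall()

print(joined)
print(totals)
```

Row 3 of Table1 has no match in Table2, so it appears in neither result.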

Data Definition Language
- Schema defined at the start
    Create Table Table1 (Column1 Datatype1, Column2 Datatype2, ...)
- Constraints to define and enforce relationships
  - Primary Key
  - Foreign Key
  - Etc.
- Triggers to respond to Insert, Update, & Delete
- Stored Modules
- Alter
- Drop
- Security and Access Control
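As a rough illustration of constraints defining and enforcing relationships, the sketch below declares a primary key and a foreign key in SQLite and shows the database rejecting rows that violate them. The department and employee tables are hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, which is specific to that product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department (dept_id)
    );
    INSERT INTO department VALUES (1, 'Research');
    INSERT INTO employee VALUES (100, 'Ann', 1);
""")

def try_insert(sql):
    """Run one insert; report whether the schema accepted it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

fk_result = try_insert("INSERT INTO employee VALUES (101, 'Bob', 99)")  # no department 99
pk_result = try_insert("INSERT INTO employee VALUES (100, 'Cy', 1)")    # duplicate primary key
ok_result = try_insert("INSERT INTO employee VALUES (102, 'Di', 1)")    # satisfies both constraints
employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

print(fk_result, pk_result, ok_result, employee_count)
```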

Transactions – ACID Properties
- Atomic – All of the work in a transaction completes (commit) or none of it completes
- Consistent – A transaction transforms the database from one consistent state to another consistent state. Consistency is defined in terms of constraints.
- Isolated – The results of any changes made during a transaction are not visible until the transaction has committed.
- Durable – The results of a committed transaction survive failures
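The Atomic and Consistent properties can be sketched with a two-step transfer in SQLite. The account table, the names, and the CHECK constraint are assumptions made for the illustration. When the second update would break the constraint, the first update is rolled back with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- the consistency rule
    );
    INSERT INTO account VALUES ('alice', 100), ('bob', 50);
""")

def transfer(conn, src, dst, amount):
    """Move amount between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

first_ok = transfer(conn, "alice", "bob", 30)    # fits within alice's balance
after_first = balances(conn)
second_ok = transfer(conn, "alice", "bob", 500)  # would drive alice below zero
after_second = balances(conn)

print(first_ok, after_first)
print(second_ok, after_second)
```

In the failed transfer the credit to bob has already been applied when the debit violates the constraint, so the unchanged balances afterwards show the rollback rather than a lucky ordering.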

SQL Database Examples
- Commercial
  - IBM DB2
  - Oracle RDBMS
  - Microsoft SQL Server
  - Sybase SQL Anywhere
- Open Source (with commercial options)
  - MySQL
  - Ingres
Significant portions of the world’s economy use SQL databases!

NoSQL Definition
From www.nosql-database.org:
Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontal scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge data amount, and more.

NoSQL
- www.nosql-database.org lists 122 NoSQL Databases
  - Cassandra
  - CouchDB
  - Hadoop & HBase
  - MongoDB
  - StupidDB
  - Etc.

NoSQL Distinguishing Characteristics
- Large data volumes
  - Google’s “big data”
- Scalable replication and distribution
  - Potentially thousands of machines
  - Potentially distributed around the world
- Queries need to return answers quickly
- Mostly query, few updates
- Asynchronous Inserts & Updates
- Schema-less
- ACID transaction properties are not needed – BASE
- CAP Theorem
- Open source development

BASE Transactions
- Acronym contrived to be the opposite of ACID
  - Basically Available,
  - Soft state,
  - Eventually Consistent
- Characteristics
  - Weak consistency – stale data OK
  - Availability first
  - Best effort
  - Approximate answers OK
  - Aggressive (optimistic)
  - Simpler and faster
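To make "stale data OK" and "eventually consistent" concrete, here is a toy in-memory simulation. It is not modeled on any particular NoSQL product. The class names and the explicit replicate() step are assumptions; a real system would propagate updates in the background.

```python
from collections import deque

class Replica:
    """One copy of the data."""
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Toy store: a write lands on one replica and reaches the others later."""
    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = deque()  # (replica_index, key, value) updates not yet applied

    def write(self, key, value):
        # Acknowledge as soon as the first replica has the value (availability first).
        self.replicas[0].data[key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def read(self, key, replica_index):
        # Any replica may answer, even one that has not caught up yet.
        return self.replicas[replica_index].data.get(key)

    def replicate(self):
        # Apply the queued updates in order; afterwards all replicas agree.
        while self.pending:
            i, key, value = self.pending.popleft()
            self.replicas[i].data[key] = value

store = EventuallyConsistentStore()
store.write("title", "draft 1")
store.replicate()
store.write("title", "draft 2")

fresh = store.read("title", 0)  # replica that took the write
stale = store.read("title", 2)  # replica that has not caught up
store.replicate()
converged = [store.read("title", i) for i in range(3)]

print(fresh, stale, converged)
```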

Brewer’s CAP Theorem
A distributed system can support only two of the following characteristics:
- Consistency
- Availability
- Partition tolerance
The slides from Brewer’s July 2000 talk do not define these characteristics.

Consistency
- all nodes see the same data at the same time – Wikipedia
- client perceives that a set of operations has occurred all at once – Pritchett
  - More like Atomic in ACID transaction properties

Availability
- node failures do not prevent survivors from continuing to operate – Wikipedia
- Every operation must terminate in an intended response – Pritchett

Partition Tolerance
- the system continues to operate despite arbitrary message loss – Wikipedia
- Operations will complete, even if individual components are unavailable – Pritchett

NoSQL Database Types
Discussing NoSQL databases is complicated because there are a variety of types:
- Column Store – Each storage block contains data from only one column
- Document Store – stores documents made up of tagged elements
- Key-Value Store – Hash table of keys

Other Non-SQL Databases
- XML Databases
- Graph Databases
- Codasyl Databases
- Object Oriented Databases
- Etc.
Will not address these today

NoSQL Example: Column Store
- Each storage block contains data from only one column
- Example: Hadoop/HBase
  - http://hadoop.apache.org/
  - Yahoo, Facebook
- Example: Ingres VectorWise
  - Column Store integrated with an SQL database
  - http://www.ingres.com/products/vectorwise

Column Store Comments
- More efficient than row (or document) store if:
  - Multiple row/record/documents are inserted at the same time so updates of column blocks can be aggregated
  - Retrievals access only some of the columns in a row/record/document
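The retrieval argument can be sketched by storing the same data both ways in plain Python and counting how many stored values a single-column total has to touch. The records and field names are made up, and real column stores add compression and block layout that this ignores.

```python
# Row layout: one complete record per entry (sample data is hypothetical).
rows = [
    {"id": 1, "name": "Ann", "dept": "Sales", "salary": 50},
    {"id": 2, "name": "Bob", "dept": "Research", "salary": 60},
    {"id": 3, "name": "Cy", "dept": "Sales", "salary": 70},
]

def to_columns(rows):
    """Pivot row records into one list (a "block") per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# Row store: every whole record is read to total one field.
row_total = sum(row["salary"] for row in rows)
row_values_touched = sum(len(row) for row in rows)

# Column store: only the salary block is read.
col_total = sum(columns["salary"])
col_values_touched = len(columns["salary"])

print(row_total, row_values_touched)
print(col_total, col_values_touched)
```

Both layouts give the same total; the column layout reads 3 values instead of 12 because the other three columns are never touched.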

NoSQL Example: Document Store
- Example: ...ache.org/
  - BBC
- Example: ...rg/
  - Foursquare, Shutterfly
- JSON – JavaScript Object Notation

CouchDB JSON Example
    {
      "_id": "guid goes here",
      "_rev": "314159",
      "type": "abstract",
      "author": "Keith W. Hare",
      "title": "SQL Standard and NoSQL Databases",
      "body": "NoSQL databases (either no-SQL or Not Only SQL) are currently a hot topic in some parts of computing.",
      "creation_timestamp": "2011/05/10 13:30:00 0004"
    }

CouchDB JSON Tags
- "_id"
  - GUID – Global Unique Identifier
  - Passed in or generated by CouchDB
- "_rev"
  - Revision number
  - Versioning mechanism
- "type", "author", "title", etc.
  - Arbitrary tags
  - Schema-less
  - Could be validated after the fact by user-written routine
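The idea of validating a schema-less document after the fact with a user-written routine might look like the sketch below. It uses only Python's json module and does not call CouchDB. The REQUIRED_TAGS rule and the validate_abstract function are hypothetical: in a schema-less store the application decides what a valid "abstract" document is.

```python
import json

# Hypothetical application rule: tags an "abstract" document must carry, with types.
REQUIRED_TAGS = {"_id": str, "type": str, "author": str, "title": str}

def validate_abstract(doc):
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for tag, expected_type in REQUIRED_TAGS.items():
        if tag not in doc:
            problems.append("missing tag: " + tag)
        elif not isinstance(doc[tag], expected_type):
            problems.append("wrong type for tag: " + tag)
    return problems

raw = """{
  "_id": "guid goes here",
  "_rev": "314159",
  "type": "abstract",
  "author": "Keith W. Hare",
  "title": "SQL Standard and NoSQL Databases"
}"""

doc = json.loads(raw)
good = validate_abstract(doc)
# The store would accept this document too; only the routine notices the gaps.
bad = validate_abstract({"_id": "another guid", "title": 42})

print(good)
print(bad)
```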

NoSQL Examples: Key-Value Store
- Hash tables of Keys
- Values stored with Keys
- Fast access to small data values
- Example – www.project-voldemort.com/
  - LinkedIn
- Example – ...rg/
  - Backend storage is Berkeley-DB

Map Reduce
- Technique for indexing and searching large data volumes
- Two Phases, Map and Reduce
  - Map
    - Extract sets of Key-Value pairs from underlying data
    - Potentially in Parallel on multiple machines
  - Reduce
    - Merge and sort sets of Key-Value pairs
    - Results may be useful for other searches
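The two phases can be sketched as a single-process word count in Python. This is a generic illustration, not any product's API: the function names and sample documents are made up, and the map calls run in a plain loop where a real system could spread them over many machines.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map: extract a (key, value) pair for every word in one document."""
    return [(word.lower(), 1) for word in text.split()]

def group_by_key(pairs):
    """Collect the values emitted for each key between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: merge all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = []
    for doc_id, text in documents.items():  # each call is independent of the others
        pairs.extend(map_phase(doc_id, text))
    groups = group_by_key(pairs)
    # Keys are processed in sorted order, matching the "merge and sort" step.
    return dict(reduce_phase(key, values) for key, values in sorted(groups.items()))

documents = {
    "d1": "SQL and NoSQL",
    "d2": "NoSQL databases and SQL databases",
}
counts = word_count(documents)
print(counts)
```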

Map Reduce
- Map Reduce techniques differ across products
- Implemented by application developers, not by underlying software

Map Reduce Patent
Google granted US Patent 7,650,331, January 2010
System and method for efficient large-scale data processing
A large-scale data processing system and method includes one or more application-independent map modules configured to read input data and to apply at least one application-specific map operation to the input data to produce intermediate data values, wherein the map operation is automatically parallelized across multiple processors in the parallel processing environment. A plurality of intermediate data structures are used to store the intermediate data values. One or more application-independent reduce modules are configured to retrieve the intermediate data values and to apply at least one application-specific reduce operation to the intermediate data values to provide output data.

Storing and Modifying Data
- Syntax varies
  - HTML
  - JavaScript
  - Etc.
- Asynchronous – Inserts and updates do not wait for confirmation
- Versioned
- Optimistic Concurrency
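Versioning and optimistic concurrency fit together roughly as in this in-memory sketch: a writer supplies the revision it last read, and the store rejects the write if someone else has updated the document since. The VersionedStore class and its integer revisions are assumptions for illustration, not a real product's interface.

```python
class ConflictError(Exception):
    """Raised when a writer's revision is no longer the current one."""

class VersionedStore:
    def __init__(self):
        self._docs = {}  # key -> (revision, value)

    def get(self, key):
        """Return (revision, value) for a key."""
        return self._docs[key]

    def put(self, key, value, expected_rev=None):
        """Store value only if expected_rev matches the current revision."""
        current_rev = self._docs[key][0] if key in self._docs else None
        if expected_rev != current_rev:
            # No locks were held; the conflict is detected at write time.
            raise ConflictError("expected revision %r, found %r" % (expected_rev, current_rev))
        new_rev = 1 if current_rev is None else current_rev + 1
        self._docs[key] = (new_rev, value)
        return new_rev

store = VersionedStore()
rev1 = store.put("abstract", "first draft")  # new document, revision 1

# Two clients read the same revision, then both try to update it.
rev_a, _ = store.get("abstract")
rev_b, _ = store.get("abstract")
rev2 = store.put("abstract", "edit from A", expected_rev=rev_a)
try:
    store.put("abstract", "edit from B", expected_rev=rev_b)
    conflict = False
except ConflictError:
    conflict = True  # client B must re-read and retry

print(rev1, rev2, conflict, store.get("abstract"))
```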

Retrieving Data
- Syntax Varies
  - No set-based query language
  - Procedural program languages such as Java, C, etc.
- Application specifies retrieval path
- No query optimizer
- Quick answer is important
- May not be a single “right” answer

Open Source
- Small upfront software costs
- Suitable for large scale distribution on commodity hardware

NoSQL Summary
- NoSQL databases reject:
  - Overhead of ACID transactions
  - “Complexity” of SQL
  - Burden of up-front schema design
  - Declarative query expression
  - Yesterday’s technology
- Programmer responsible for
  - Step-by-step procedural language
  - Navigating access path

Summary
- SQL Databases
  - Predefined Schema
  - Standard definition and interface language
  - Tight consistency
  - Well defined semantics
- NoSQL Database
  - No predefined Schema
  - Per-product definition and interface language
  - Getting an answer quickly is more important than getting a correct answer


Questions?

Web References
- “NoSQL -- Your Ultimate Guide to the Non - ..., nosql-database.org/links.html
- “NoSQL ..., en.wikipedia.org/wiki/NoSQL
- PODC Keynote, July 19, 2000. Towards Robust Distributed Systems. Dr. Eric A. Brewer. Professor, UC Berkeley. Co-Founder & Chief Scientist, Inktomi. www.eecs.berkeley.edu/ ...
- “Brewer's CAP Theorem”, posted by Julian Browne, January 11, 2009. ...cap-theorem
- “How to write a CV”, Geek & Poke. ...2011/01/nosql.html

Web References
- “Exploring CouchDB: A document-oriented database for Web applications”, Joe Lennon, Software developer, ...
- “...raph Databases, NOSQL and Neo4j”, posted by Peter Neubauer on May 12, 2010 ...
- “...dra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison”, Kristóf ... hdbcouchdb-vs-redis
- “Distinguishing Two Major Types of Column-Stores”, posted by Daniel Abadi on March 29, ... stinguishingtwo-major-types-of 29.html

Web References
- “MapReduce: Simplified Data Processing on Large Clusters”, Jeffrey Dean and Sanjay Ghemawat, December ...
- “...able SQL”, ACM Queue, Michael Rys, April 19, 2011. http://queue.acm.org/detail.cfm?id=1971597
- “a practical guide to noSQL”, posted by Denise Miura on March 17, 2011 at ... deguide-to-nosql/

Books
- “CouchDB: The Definitive Guide”, J. Chris Anderson, Jan Lehnardt and Noah Slater. O’Reilly Media Inc., Sebastopol, CA, USA. 2010
- “Hadoop: The Definitive Guide”, Tom White. O’Reilly Media Inc., Sebastopol, CA, USA. 2011
- “MongoDB: The Definitive Guide”, Kristina Chodorow and Michael Dirolf. O’Reilly Media Inc., Sebastopol, CA, USA. 2010
