Big Fast Data With MongoDB

Transcription

Big Fast Data with MongoDB
Clint Combs
Thursday, April 12, 2012

Polyglot Persistence
- Same idea as Polyglot Programming
- DB variety is like data structure variety
- Don't use a list when we need a map. The same goes for data persistence.
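The list-versus-map point can be shown with a tiny sketch in plain JavaScript, the language of the mongo shell. The data and variable names are made up for illustration.

```javascript
// Hypothetical data: the same users held in a list and in a map.
const userList = [
  { login: "mary2012", role: "admin" },
  { login: "bob", role: "user" },
];
const userMap = new Map(userList.map((u) => [u.login, u]));

// List lookup scans elements until one matches.
const fromList = userList.find((u) => u.login === "bob");
// Map lookup goes straight to the key.
const fromMap = userMap.get("bob");

console.log(fromList.role, fromMap.role);
```

Both lookups return the same object; the difference is how much work each structure does to find it.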

Many Options
Oracle, Logs, DB2, Redis, BigTable, MongoDB, Files, Datomic, Excel, Neo4j, MySQL, CSV, CouchDB

MongoDB

MongoDB
A Scalable, High-Performance, Document-Oriented Database

When is MongoDB a Good Fit?
- when you have a large data set
- when you have a dynamic schema
- when you need speed
- when you can tolerate eventual consistency

Wordnik
- Migrated from MySQL
- Primary reason: performance
- 500k requests/hour, 4x that during peak
- 12 billion documents in MongoDB
- 3 TB per node

Wordnik (cont.)
- insert 8k documents per second, with bursts to 50k per second
- each Java client can sustain 10 MB/second reading from a mongod

Wordnik (cont.)
- Every type of retrieval is faster
- sample fetch: 400 ms to 60 ms
- dictionary entries: 20 ms to 1 ms
- document metadata: 30 ms to 0.1 ms
- spelling suggestions: 10 ms to 1.2 ms

Wordnik (cont.)
- MongoDB built-in caching benefits
- removed memcached, freed GBs of RAM
- average speed increase of 1-2 ms per request under load
- not all data in RAM, so 60 ms avg. includes disk access

Wordnik (cont.)
- full story: http://blog.wordnik.com/12months-with-mongodb
- others: http://www.mongodb.org/display/DOCS/Production Deployments

MongoDB Applications
- archiving
- event logging
- document and CMS systems
- geospatial indexing
- CQRS: Command Query Responsibility Separation

CQRS
[Diagram: Client, Command Model, Query Model, System of Record (Oracle), Change Events, Query Data (MongoDB)]
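The slide gives only the diagram labels, so the wiring below is an assumption based on the usual CQRS pattern: commands change the system of record, change events are published, and a query model keeps a read-optimized copy. This is a minimal in-memory sketch in JavaScript; no Oracle or MongoDB is involved, and every function and event name is hypothetical.

```javascript
// In-memory stand-ins for the boxes in the diagram.
const systemOfRecord = [];   // command side (Oracle in the diagram)
const queryData = new Map(); // query side (MongoDB in the diagram)
const subscribers = [];      // listeners for change events

// Command model: validates and applies a change, then publishes an event.
function createUser(loginName) {
  if (!loginName) throw new Error("loginName is required");
  systemOfRecord.push({ loginName });
  const event = { type: "UserCreated", loginName };
  subscribers.forEach((handle) => handle(event));
}

// Query model: keeps a read-optimized copy up to date from change events.
subscribers.push((event) => {
  if (event.type === "UserCreated") {
    queryData.set(event.loginName, { loginName: event.loginName });
  }
});

createUser("mary2012");
console.log(queryData.get("mary2012"));
```

Clients would send commands to `createUser` and read from `queryData`, so reads never touch the system of record.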

What does it store?
- JSON documents up to 16 MB in size
- stored in BSON format
- larger documents can be stored with GridFS
- all documents stored in collections

Data Types
- JSON types: string, integer, boolean, double, null, array, and object
- other: date, object id, binary data, regular expressions, code
- timestamps are for internal use only
- JavaScript code goes into system.js

Java Type Mappings
MongoDB Type                    Java Type
string, int, boolean, double    String, int, boolean, double
Object ID                       com.mongodb.ObjectId
Regular Expression              java.util.regex.Pattern
Dates/Times                     java.util.Date
Database References             com.mongodb.DBRef
Binary Data                     byte[]
Timestamp Data                  BSONTimestamp
Code Data                       Code, CodeWScope
Embedded Documents              BasicDBObject
Arrays                          anything that extends List

mongo shell
- 'mongo' starts the command-line interface
- use database-name
- show collections: displays a list of collections in the current database

Collections
- collections of BSON documents
- schema-free
- can be organized in namespaces for user convenience (really a flat model in DB)

Object IDs
- every document in a collection has a unique _id field
- _id can be any type
- _id created as a BSON ObjectID by the database if not supplied

Collections
- create: db.createCollection("users")
- rename: db.users.renameCollection("allUsers")
- drop: db.users.drop()
- count: db.users.count()

newhart
- example docs in this talk are from newhart
- newhart is an open source project that provides audit tracking with storage in MongoDB
- uses capped collections for audit trails
- will be available at github.org/ClintCombs/newhart

newhart document
{
  "_id" : ObjectId("4f84b0473004b2a7bf9dcdad"),
  "origin" : "org.newhart.example.AuditUsers",
  "originKey" : "mary2012-create-1334095943422",
  "originKeyType" : "User",
  "criticality" : "major",
  "auditTS" : ISODate("2012-04-10T22:12:23.426Z"),
  "createTS" : ISODate("2012-04-10T22:12:23.424Z"),
  "updateTS" : ISODate("2012-04-10T22:12:23.424Z"),
  "msg" : "new user created: mary2012",
  "data" : { "loginName" : { "text" : "mary2012" } },
  "errors" : [ ],
  "warnings" : [ ],
  "labels" : [ "security", "create" ]
}
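The first four bytes of an ObjectId hold its creation time in seconds since the Unix epoch. The sketch below decodes the ObjectId from the document above in plain JavaScript. The helper name `objectIdToDate` is made up for illustration; it is not a shell or driver function.

```javascript
// The first 4 bytes (8 hex characters) of an ObjectId are a big-endian
// count of seconds since the Unix epoch.
function objectIdToDate(hex) {
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdToDate("4f84b0473004b2a7bf9dcdad");
console.log(created.toISOString()); // 2012-04-10T22:12:23.000Z
```

The decoded second matches the millisecond value at the end of `originKey` (1334095943422) and the `createTS` field of the same document.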

Insert
use newhart
show collections
db.users.insert({origin:"shell"})

Queries
db.users.find();
db.users.findOne();

Queries (cont.)
db.users.find({origin:"shell"})
Field Selection:
ers"},{labels:1})
Arrays
db.users.find({labels:"authenticate"});
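The three query forms on this slide follow simple rules: a field condition matches on equality, a scalar condition on an array field matches if any element is equal, and the second argument selects which fields come back. The JavaScript sketch below models those rules in memory for top-level fields only. `matches` and `project` are hypothetical helpers, not MongoDB code, and the sample documents are made up.

```javascript
// Hypothetical helpers that model the matching rules; not MongoDB's code.
function matches(doc, query) {
  return Object.entries(query).every(([field, wanted]) => {
    const actual = doc[field];
    // A scalar condition on an array field matches if any element is equal.
    return Array.isArray(actual) ? actual.includes(wanted) : actual === wanted;
  });
}

// Keep only the selected fields, plus _id, which is returned by default.
function project(doc, fields) {
  const out = { _id: doc._id };
  for (const name of Object.keys(fields)) {
    if (fields[name] && name in doc) out[name] = doc[name];
  }
  return out;
}

const userDocs = [
  { _id: 1, origin: "shell", labels: [] },
  { _id: 2, origin: "org.newhart.example.AuditUsers", labels: ["security", "authenticate"] },
];

const found = userDocs.filter((d) => matches(d, { labels: "authenticate" }));
console.log(found.map((d) => project(d, { labels: 1 })));
```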

Skip and Limit
imit(3)
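The command on this slide is truncated in the transcript, so the sketch below only models the general idea in plain JavaScript: skip drops the first results and limit caps how many are returned. The `page` helper and the skip value of 2 are assumptions for illustration.

```javascript
// Model of cursor paging: drop the first `skip` results, then keep `limit`.
function page(results, skip, limit) {
  return results.slice(skip, skip + limit);
}

const docs = ["a", "b", "c", "d", "e", "f", "g"];
console.log(page(docs, 2, 3)); // [ 'c', 'd', 'e' ]
```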

Updates
db.users.save({"_id" : ObjectId("4f84fecae2f8776e5cdc3ba7"), "origin" : "Clint"})
update if exists, otherwise insert
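The "update if exists, otherwise insert" behavior can be modeled in a few lines of JavaScript. This is an in-memory sketch, not MongoDB code: the `Map` stands in for a collection, and the generated id string is a placeholder for a real ObjectId.

```javascript
// In-memory model of save(): replace the document with the same _id,
// or insert it when no such document exists. Not MongoDB code.
const collection = new Map();
let nextId = 1; // stand-in for a database-generated ObjectId

function save(doc) {
  const id = doc._id !== undefined ? doc._id : `generated-${nextId++}`;
  const existed = collection.has(id);
  collection.set(id, { ...doc, _id: id });
  return existed ? "updated" : "inserted";
}

console.log(save({ _id: "4f84fecae2f8776e5cdc3ba7", origin: "shell" })); // inserted
console.log(save({ _id: "4f84fecae2f8776e5cdc3ba7", origin: "Clint" })); // updated
```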

Delete
remove all items:
db.users.remove()
same as above:
db.users.remove({})
remove all items with minor criticality:
db.users.remove({criticality:"minor"})

Capped Collections
- Fixed-size, FIFO collections
e, size:1000000});
- size is number of bytes, including database headers
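The fixed-size, first-in-first-out behavior can be sketched in JavaScript as below. This is an in-memory model only. It approximates document size by JSON string length, which is not how MongoDB measures BSON documents, and the class name and 40-byte budget are made up for illustration.

```javascript
// In-memory model of a capped collection: a fixed byte budget, with the
// oldest documents evicted first. Sizes are approximated by JSON length.
class CappedCollection {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.usedBytes = 0;
    this.docs = [];
  }
  insert(doc) {
    const size = JSON.stringify(doc).length;
    if (size > this.maxBytes) throw new Error("document exceeds collection size");
    this.docs.push({ doc, size });
    this.usedBytes += size;
    while (this.usedBytes > this.maxBytes) {
      this.usedBytes -= this.docs.shift().size; // evict the oldest
    }
  }
  find() {
    return this.docs.map((entry) => entry.doc);
  }
}

const audit = new CappedCollection(40);
audit.insert({ msg: "first" });
audit.insert({ msg: "second" });
audit.insert({ msg: "third" });
console.log(audit.find()); // "first" has been evicted
```

This is why capped collections suit audit trails: inserts never fail for lack of space, and the newest entries are always kept.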

Sorting
ext":1}).sort({"data.loginName.text":1})
db.users.find().sort({$natural:-1}).limit(1)
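Sorting on "data.loginName.text" means walking a dotted path into embedded documents and ordering by the value found there. A minimal in-memory sketch in JavaScript follows. `getPath` and `sortDocs` are hypothetical helpers, and the second login name is made up.

```javascript
// Resolve a dotted path such as "data.loginName.text" against a document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Sort a copy of the documents by one field; direction is 1 or -1.
function sortDocs(docs, path, direction) {
  return [...docs].sort((a, b) => {
    const x = getPath(a, path);
    const y = getPath(b, path);
    return (x < y ? -1 : x > y ? 1 : 0) * direction;
  });
}

const people = [
  { data: { loginName: { text: "mary2012" } } },
  { data: { loginName: { text: "alice" } } },
];
const ascending = sortDocs(people, "data.loginName.text", 1);
console.log(ascending.map((d) => getPath(d, "data.loginName.text")));
```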

Indexes
db.users.ensureIndex({origin:1})
Unique Index
Indexing Embedded Fields
Compound Keys
db.contacts.ensureIndex({last:1, first:1})
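A compound key such as {last:1, first:1} orders entries by the first field and uses later fields only to break ties. The JavaScript sketch below models that ordering in memory. It is not how MongoDB builds its B-tree, and the contact names are made up.

```javascript
// Compare two documents by a compound key spec such as {last:1, first:1}:
// the first field decides, later fields only break ties.
function compareByKeys(spec) {
  return (a, b) => {
    for (const [field, direction] of Object.entries(spec)) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0;
  };
}

const contacts = [
  { last: "Smith", first: "Zoe" },
  { last: "Jones", first: "Amy" },
  { last: "Smith", first: "Al" },
];
const ordered = [...contacts].sort(compareByKeys({ last: 1, first: 1 }));
console.log(ordered.map((c) => `${c.last}, ${c.first}`));
```

Because the order is by `last` first, such an index helps queries on `last` alone or on `last` and `first` together, but not queries on `first` alone.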

Optimization with Explain
explain()
{
  "cursor" : "BtreeCursor data.loginName.text_1",
  "nscanned" : 6,
  "nscannedObjects" : 6,
  "n" : 6,
  "millis" : 0,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : false,
  "indexOnly" : false,
  "indexBounds" : {
    "data.loginName.text" : [ [ "mary2012", "mary2012" ] ]
  }
}

Fault Tolerance & Scaling
Replica Sets and Sharding

Replica Sets
[Diagram sequence: a Client and a replica set with members R1, R2, R3, R4, and an Arbiter; one frame shows a mongod with its logs, data, journal, and file lock]

slaveOk
- used when querying a replica set
- drivers route requests to master by default
- set slaveOk to query a secondary member of the replica set
- rs.slaveOk()
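The routing rule can be sketched in JavaScript as a small function: reads go to the primary unless slaveOk is set. This is an in-memory illustration, not driver code. The member list, the choice of R1 as primary, and the `routeRead` name are assumptions.

```javascript
// Hypothetical member list; real drivers discover members from the set.
const members = [
  { name: "R1", role: "primary" },
  { name: "R2", role: "secondary" },
  { name: "R3", role: "secondary" },
];

// Pick the member that should serve a read.
function routeRead(memberList, slaveOk) {
  const primary = memberList.find((m) => m.role === "primary");
  if (!slaveOk) return primary;
  const secondary = memberList.find((m) => m.role === "secondary");
  return secondary || primary; // fall back if no secondary is available
}

console.log(routeRead(members, false).name); // R1
console.log(routeRead(members, true).name);  // R2
```

Secondaries can lag behind the primary, so reads routed this way may return slightly stale data.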

Read from Secondary
[Diagram sequence: Client, R1, R2, R3, R4, Arbiter]

Sharding
[Diagram: Client; three mongos routers; config servers mongod c1 and mongod c2; four shards, each a replica set of R1, R2, R3: Shard 1 (A-F), Shard 2 (G-L), Shard 3 (M-R), Shard 4 (T-Z)]
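The range labels on the shards suggest how a router maps a shard key to a shard. The JavaScript sketch below models that lookup in memory using the first letter of the key. It keeps the ranges exactly as labeled, so keys starting with "S" match no shard here, whereas real MongoDB chunk ranges cover the whole key space. The `routeByKey` name and the sample keys are assumptions.

```javascript
// Key ranges as labeled in the diagram. Shard 4 is labeled "T-Z", which
// leaves "S" unassigned; this sketch keeps the labels as given.
const shards = [
  { name: "Shard 1", from: "A", to: "F" },
  { name: "Shard 2", from: "G", to: "L" },
  { name: "Shard 3", from: "M", to: "R" },
  { name: "Shard 4", from: "T", to: "Z" },
];

// Conceptually what a router such as mongos does: map a key to a shard.
function routeByKey(key) {
  const first = key.charAt(0).toUpperCase();
  const shard = shards.find((s) => first >= s.from && first <= s.to);
  return shard ? shard.name : null;
}

console.log(routeByKey("mary2012")); // Shard 3
console.log(routeByKey("Clint"));    // Shard 1
```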

UI Clients
- MongoVUE
- JMongoBrowser
- others

Drivers
- Standard Java Driver: http://github.com/mongodb/mongo-java-driver
- Hammersmith: high-performance async Java driver (Akka 2.0 durable mailboxes) https://github.com/bwmcadams/hammersmith
- Casbah Scala Driver: http://github.com/mongodb/casbah

mongodb.org Drivers
C, C++, Erlang, Haskell, JavaScript, .NET, Perl, PHP, Python, Ruby

Community Drivers
ActionScript, Clojure, ColdFusion, D, Dart, Fantom, F#, Go, Groovy, Lisp, Smalltalk, and more

Quick Reference Cards and Cookbook
- http://www.10gen.com/reference
  Commands, Queries, Indexing, Replica Sets
- http://cookbook.mongodb.org/

Summary
- MongoDB is one of many persistence alternatives
- Document-Oriented
- Big and Fast
- Flexible: no schema

Contact Me
- Twitter: @ClintCombs
- Presentation posted at: http://ccombs.net/presentations

MongoDB Type                    Java Type
string, int, boolean, double    String, int, boolean, double
Object ID                       com.mongodb.ObjectId
Regular Expression              java.util.regex.Pattern
Dates/Times                     java.util.Date
Database References             com.mongodb.DBRef
Binary Data                     byte[]
Timestamp Data                  BSONTimestamp
Code Data                       Code, CodeWScope
Embedded Documents              BasicDBObject