MongoDB: The Definitive Guide

MongoDB: The Definitive Guide
Kristina Chodorow and Michael Dirolf

Beijing, Cambridge, Farnham, Köln, Sebastopol, Tokyo

MongoDB: The Definitive Guide
by Kristina Chodorow and Michael Dirolf

Copyright 2010 Kristina Chodorow and Michael Dirolf. All rights reserved.
Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Editor: Julie Steele
Production Editor: Teresa Elsey
Copyeditor: Kim Wimpsett
Proofreader: Apostrophe Editing Services
Production Services: Molly Sharp
Indexer: Ellen Troutman Zaig
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano

Printing History:
September 2010: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. MongoDB: The Definitive Guide, the image of a mongoose lemur, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-38156-1
[M]
1283534198

Table of Contents

Foreword
Preface

1. Introduction
    A Rich Data Model
    Easy Scaling
    Tons of Features Without Sacrificing Speed
    Simple Administration
    But Wait, That’s Not All

2. Getting Started
    [...]ting and Starting MongoDB
    MongoDB Shell
    Running the Shell
    A MongoDB Client
    Basic Operations with the Shell
    Tips for Using the Shell
    Data Types
    Basic Data Types
    Numbers
    Dates
    Arrays
    Embedded Documents
    _id and [...]

3. Creating, Updating, and Deleting Documents
    Inserting and Saving Documents
    Batch Insert
    Inserts: Internals and Implications
    Removing Documents
    Remove Speed
    Updating Documents
    Document Replacement
    Using Modifiers
    Upserts
    Updating Multiple Documents
    Returning Updated Documents
    The Fastest Write This Side of Mississippi
    Safe Operations
    Catching “Normal” Errors
    Requests and Connections

4. Querying
    Introduction to find
    Specifying Which Keys to Return
    Limitations
    Query Criteria
    Query Conditionals
    OR Queries
    $not
    Rules for Conditionals
    Type-Specific Queries
    null
    Regular Expressions
    Querying Arrays
    Querying on Embedded Documents
    $where Queries
    Cursors
    Limits, Skips, and Sorts
    Avoiding Large Skips
    Advanced Query Options
    Getting Consistent Results
    Cursor [...]

5. Indexing
    Introduction to Indexing
    Scaling Indexes
    Indexing Keys in Embedded Documents
    Indexing for Sorts
    Uniquely Identifying Indexes
    Unique Indexes
    Dropping Duplicates
    Compound Unique Indexes
    Using explain and hint
    Index Administration
    Changing Indexes
    Geospatial Indexing
    Compound Geospatial Indexes
    The Earth Is Not a 2D Plane

6. Aggregation
    count
    distinct
    group
    Using a Finalizer
    Using a Function as a Key
    MapReduce
    Example 1: Finding All Keys in a Collection
    Example 2: Categorizing Web Pages
    MongoDB and MapReduce

7. Advanced Topics
    Database Commands
    How Commands Work
    Command Reference
    Capped Collections
    Properties and Use Cases
    Creating Capped Collections
    Sorting Au Naturel
    Tailable Cursors
    GridFS: Storing Files
    Getting Started with GridFS: mongofiles
    Working with GridFS from the MongoDB Drivers
    Under the Hood
    Server-Side Scripting
    db.eval
    Stored JavaScript
    Security
    Database References
    What Is a DBRef?
    Example [...]
    Driver Support for DBRefs
    When Should DBRefs Be Used?

8. Administration
    Starting and Stopping MongoDB
    Starting from the Command Line
    File-Based Configuration
    Stopping MongoDB
    Monitoring
    Using the Admin Interface
    serverStatus
    mongostat
    Third-Party Plug-Ins
    Security and Authentication
    Authentication Basics
    How Authentication Works
    Other Security Considerations
    Backup and Repair
    Data File Backup
    mongodump and mongorestore
    fsync and Lock
    Slave [...]

9. Replication
    Master-Slave Replication
    Options
    Adding and Removing Sources
    Replica Sets
    Initializing a Set
    Nodes in a Replica Set
    Failover and Primary Election
    Performing Operations on a Slave
    Read Scaling
    Using Slaves for Data Processing
    How It Works
    The Oplog
    Syncing
    Replication State and the Local Database
    Blocking for Replication
    Administration
    Diagnostics
    Changing the Oplog Size
    Replication with Authentication

10. Sharding
    Introduction to Sharding
    Autosharding in MongoDB
    When to Shard
    The Key to Sharding: Shard Keys
    Sharding an Existing Collection
    Incrementing Shard Keys Versus Random Shard Keys
    How Shard Keys Affect Operations
    Setting Up Sharding
    Starting the Servers
    Sharding Data
    Production Configuration
    A Robust Config
    Many mongos
    A Sturdy Shard
    Physical Servers
    Sharding Administration
    config Collections
    Sharding [...]

11. Example Applications
    Chemical Search Engine: Java
    Installing the Java Driver
    Using the Java Driver
    Schema Design
    Writing This in Java
    Issues
    News Aggregator: PHP
    Installing the PHP Driver
    Using the PHP Driver
    Designing the News Aggregator
    Trees of Comments
    Voting
    Custom Submission Forms: Ruby
    Installing the Ruby Driver
    Using the Ruby Driver
    Custom Form Submission
    Ruby Object Mappers and Using MongoDB with Rails
    Real-Time Analytics: Python
    Installing PyMongo
    Using [...]
    MongoDB for Real-Time Analytics
    Schema
    Handling a Request
    Using Analytics Data
    Other Considerations

A. Installing MongoDB
B. mongo: The Shell
C. MongoDB Internals

Index

Foreword

In the last 10 years, the Internet has challenged relational databases in ways nobody could have foreseen. Having used MySQL at large and growing Internet companies during this time, I’ve seen this happen firsthand. First you have a single server with a small data set. Then you find yourself setting up replication so you can scale out reads and deal with potential failures. And, before too long, you’ve added a caching layer, tuned all the queries, and thrown even more hardware at the problem.

Eventually you arrive at the point when you need to shard the data across multiple clusters and rebuild a ton of application logic to deal with it. And soon after that you realize that you’re locked into the schema you modeled so many months before.

Why? Because there’s so much data in your clusters now that altering the schema will take a long time and involve a lot of precious DBA time. It’s easier just to work around it in code. This can keep a small team of developers busy for many months. In the end, you’ll always find yourself wondering if there’s a better way, or why more of these features are not built into the core database server.

Keeping with tradition, the Open Source community has created a plethora of “better ways” in response to the ballooning data needs of modern web applications. They span the spectrum from simple in-memory key/value stores to complicated SQL-speaking MySQL/InnoDB derivatives. But the sheer number of choices has made finding the right solution more difficult. I’ve looked at many of them.

I was drawn to MongoDB by its pragmatic approach. MongoDB doesn’t try to be everything to everyone. Instead it strikes the right balance between features and complexity, with a clear bias toward making previously difficult tasks far easier. In other words, it has the features that really matter to the vast majority of today’s web applications: indexes, replication, sharding, a rich query syntax, and a very flexible data model. All of this comes without sacrificing speed.

Like MongoDB itself, this book is very straightforward and approachable. New MongoDB users can start with Chapter 1 and be up and running in no time. Experienced users will appreciate this book’s breadth and authority. It’s a solid reference for advanced administrative topics such as replication, backups, and sharding, as well as popular client APIs.

Having recently started to use MongoDB in my day job, I have no doubt that this book will be at my side for the entire journey, from the first install to production deployment of a sharded and replicated cluster. It’s an essential reference to anyone seriously looking at using MongoDB.

Jeremy Zawodny
Craigslist Software Engineer
August 2010

Preface

How This Book Is Organized

Getting Up to Speed with MongoDB

In Chapter 1, Introduction, we provide some background about MongoDB: why it was created, the goals it is trying to accomplish, and why you might choose to use it for a project. We go into more detail in Chapter 2, Getting Started, which provides an introduction to the core concepts and vocabulary of MongoDB. Chapter 2 also provides a first look at working with MongoDB, getting you started with the database and the shell.

Developing with MongoDB

The next two chapters cover the basic material that developers need to know to work with MongoDB. In Chapter 3, Creating, Updating, and Deleting Documents, we describe how to perform those basic write operations, including how to do them with different levels of safety and speed. Chapter 4, Querying, explains how to find documents and create complex queries. This chapter also covers how to iterate through results and options for limiting, skipping, and sorting results.

Advanced Usage

The next three chapters go into more complex usage than simply storing and retrieving data. Chapter 5, Indexing, explains what indexes are and how to use them with MongoDB. It also covers tools you can use to examine or modify the indexes used to perform a query, and it covers index administration. Chapter 6, Aggregation, covers a number of techniques for aggregating data with MongoDB, including counting, finding distinct values, grouping documents, and using MapReduce. Chapter 7, Advanced Topics, is a mishmash of important tidbits that didn’t fit into any of the previous categories: file storage, server-side JavaScript, database commands, and database references.

Administration

The next three chapters are less about programming and more about the operational aspects of MongoDB. Chapter 8, Administration, discusses options for starting the database in different ways, monitoring a MongoDB server, and keeping deployments secure. Chapter 8 also covers how to keep proper backups of the data you’ve stored in MongoDB. In Chapter 9, Replication, we explain how to set up replication with MongoDB, including standard master-slave configuration and setups with automatic failover. This chapter also covers how MongoDB replication works and options for tweaking it. Chapter 10, Sharding, describes how to scale MongoDB horizontally: it covers what autosharding is, how to set it up, and the ways in which it impacts applications.

Developing Applications with MongoDB

In Chapter 11, Example Applications, we provide example applications using MongoDB, written in Java, PHP, Python, and Ruby. These examples illustrate how to map the concepts described earlier in the book to specific languages and probl
