MongoDB Applied Design Patterns - Pepa.holla.cz


MongoDB Applied Design Patterns
Rick Copeland

MongoDB Applied Design Patterns
by Rick Copeland

Copyright 2013 Richard D. Copeland, Jr. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editors: Mike Loukides and Meghan Blanchette
Production Editor: Kristen Borg
Copyeditor: Kiel Van Horn
Proofreader: Jasmine Kwityn
Indexer: Jill Edwards
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Kara Ebrahim

March 2013: First Edition

Revision History for the First Edition:
2013-03-01: First release

See http://oreilly.com/catalog/errata.csp?isbn=9781449340049 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. MongoDB Applied Design Patterns, the image of a thirteen-lined ground squirrel, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-34004-9
[LSI]

Table of Contents

Preface

Part I. Design Patterns

1. To Embed or Reference
    Relational Data Modeling and Normalization
    What Is a Normal Form, Anyway?
    So What’s the Problem?
    Denormalizing for Performance
    MongoDB: Who Needs Normalization, Anyway?
    MongoDB Document Format
    Embedding for Locality
    Embedding for Atomicity and Isolation
    Referencing for Flexibility
    Referencing for Potentially High-Arity Relationships
    Many-to-Many Relationships
    Conclusion

2. Polymorphic Schemas
    Polymorphic Schemas to Support Object-Oriented Programming
    Polymorphic Schemas Enable Schema Evolution
    Storage (In-)Efficiency of BSON
    Polymorphic Schemas Support Semi-Structured Domain Data
    Conclusion

3. Mimicking Transactional Behavior
    The Relational Approach to Consistency
    Compound Documents
    Using Complex Updates
    Optimistic Update with Compensation
    Conclusion

Part II. Use Cases

4. Operational Intelligence
    Storing Log Data
    Solution Overview
    Schema Design
    Operations
    Sharding Concerns
    Managing Event Data Growth
    Pre-Aggregated Reports
    Solution Overview
    Schema Design
    Operations
    Sharding Concerns
    Hierarchical Aggregation
    Solution Overview
    Schema Design
    MapReduce
    Operations
    Sharding Concerns

5. Ecommerce
    Product Catalog
    Solution Overview
    Operations
    Sharding Concerns
    Category Hierarchy
    Solution Overview
    Schema Design
    Operations
    Sharding Concerns
    Inventory Management
    Solution Overview
    Schema
    Operations
    Sharding Concerns

6. Content Management Systems
    Metadata and Asset Management
    Solution Overview
    Schema Design
    Operations
    Sharding Concerns
    Storing Comments
    Solution Overview
    Approach: One Document per Comment
    Approach: Embedding All Comments
    Approach: Hybrid Schema Design
    Sharding Concerns

7. Online Advertising Networks
    Solution Overview
    Design 1: Basic Ad Serving
    Schema Design
    Operation: Choose an Ad to Serve
    Operation: Make an Ad Campaign Inactive
    Sharding Concerns
    Design 2: Adding Frequency Capping
    Schema Design
    Operation: Choose an Ad to Serve
    Sharding
    Design 3: Keyword Targeting
    Schema Design
    Operation: Choose a Group of Ads to Serve

8. Social Networking
    Solution Overview
    Schema Design
    Independent Collections
    Dependent Collections
    Operations
    Viewing a News Feed or Wall Posts
    Commenting on a Post
    Creating a New Post
    Maintaining the Social Graph
    Sharding

9. Online Gaming
    Solution Overview
    Schema Design
    Character Schema
    Item Schema
    Location Schema
    Operations
    Load Character Data from MongoDB
    Extract Armor and Weapon Data for Display
    Extract Character Attributes, Inventory, and Room Information for Display
    Pick Up an Item from a Room
    Remove an Item from a Container
    Move the Character to a Different Room
    Buy an ...

Afterword

Index

Preface

Whether you’re building the newest and hottest social media website or developing an internal-use-only enterprise business intelligence application, scaling your data model has never been more important. Traditional relational databases, while familiar, present significant challenges and complications when trying to scale up to such “big data” needs. Into this world steps MongoDB, a leading NoSQL database, to address these scaling challenges while also simplifying the process of development.

However, in all the hype surrounding big data, many sites have launched their business on NoSQL databases without an understanding of the techniques necessary to effectively use the features of their chosen database. This book provides the much-needed connection between the features of MongoDB and the business problems that it is suited to solve. The book’s focus on the practical aspects of the MongoDB implementation makes it an ideal purchase for developers charged with bringing MongoDB’s scalability to bear on the particular problem you’ve been tasked to solve.

Audience

This book is intended for those who are interested in learning practical patterns for solving problems and designing applications using MongoDB. Although most of the features of MongoDB highlighted in this book have a basic description here, this is not a beginning MongoDB book. For such an introduction, the reader would be well-served to start with MongoDB: The Definitive Guide by Kristina Chodorow and Michael Dirolf (O’Reilly) or, for a Python-specific introduction, MongoDB and Python by Niall O’Higgins (O’Reilly).

Assumptions This Book Makes

Most of the code examples used in this book are implemented using either the Python or JavaScript programming languages, so a basic familiarity with their syntax is essential to getting the most out of this book. Additionally, many of the examples and patterns are contrasted with approaches to solving the same problems using relational databases, so basic familiarity with SQL and relational modeling is also helpful.

Contents of This Book

This book is divided into two parts, with Part I focusing on general MongoDB design patterns and Part II applying those patterns to particular problem domains.

Part I: Design Patterns

Part I introduces the reader to some generally applicable design patterns in MongoDB. These chapters include more introductory material than Part II, and tend to focus more on MongoDB techniques and less on domain-specific problems. The techniques described here tend to make use of MongoDB distinctives, or generate a sense of “hey, MongoDB can’t do that” as you learn that yes, indeed, it can.

Chapter 1: To Embed or Reference
This chapter describes what kinds of documents can be stored in MongoDB, and illustrates the trade-offs between schemas that embed related documents within related documents and schemas where documents simply reference one another by ID. It will focus on the performance benefits of embedding, and when the complexity added by embedding outweighs the performance gains.

Chapter 2: Polymorphic Schemas
This chapter begins by illustrating that MongoDB collections are schemaless, with the schema actually being stored in individual documents. It then goes on to show how this feature, combined with document embedding, enables a flexible and efficient polymorphism in MongoDB.

Chapter 3: Mimicking Transactional Behavior
This chapter is a kind of apologia for MongoDB’s lack of complex, multidocument transactions. It illustrates how MongoDB’s modifiers, combined with document embedding, can often accomplish in a single atomic document update what SQL would require several distinct updates to achieve. It also explores a pattern for implementing an application-level, two-phase commit protocol to provide transactional guarantees in MongoDB when they are absolutely required.

Part II: Use Cases

In Part II, we turn to the “applied” part of Applied Design Patterns, showing several use cases and the application of MongoDB patterns to solving domain-specific problems. Each chapter here covers a particular problem domain and the techniques and patterns used to address the problem.

Chapter 4: Operational Intelligence
This chapter describes how MongoDB can be used for operational intelligence, or “real-time analytics” of business data. It describes a simple event logging system, extending that system through the use of periodic and incremental hierarchical aggregation. It then concludes with a description of a true real-time incremental aggregation system, the Mongo Monitoring Service (MMS), and the techniques and trade-offs made there to achieve high performance on huge amounts of data over hundreds of customers with a (relatively) small amount of hardware.

Chapter 5: Ecommerce
This chapter begins by describing how MongoDB can be used as a product catalog master, focusing on the polymorphic schema techniques and methods of storing hierarchy in MongoDB. It then describes an inventory management system that uses optimistic updating and compensation to achieve eventual consistency even without two-phase commit.

Chapter 6: Content Management Systems
This chapter describes how MongoDB can be used as a backend for a content management system. In particular, it focuses on the use of polymorphic schemas for storing content nodes, the use of GridFS and Binary fields to store binary assets, and various approaches to storing discussions.

Chapter 7: Online Advertising Networks
This chapter describes the design of an online advertising network. The focus here is on embedded documents and complex atomic updates, as well as making sure that the storage engine (MongoDB) never becomes the bottleneck in the ad-serving decision. It will cover techniques for frequency capping ad impressions, keyword targeting, and keyword bidding.

Chapter 8: Social Networking
This chapter describes how MongoDB can be used to store a relatively complex social graph, modeled after the Google+ product, with users in various circles, allowing fine-grained control over what is shared with whom. The focus here is on maintaining the graph, as well as categorizing content into various timelines and news feeds.

Chapter 9: Online Gaming
This chapter describes how MongoDB can be used to store data necessary for an online, multiplayer role-playing game. We show how character and world data can be stored in MongoDB, allowing for concurrent access to the same data structures from multiple players.
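
The Chapter 1 summary above contrasts schemas that embed related documents with schemas where documents reference one another by ID. A minimal sketch of that contrast follows, using plain Python dictionaries in place of MongoDB documents. The contact and phone-number data, the field names, and both helper functions are hypothetical illustrations, not taken from the book, and no MongoDB driver is involved.

```python
# Hypothetical data: plain dicts and lists stand in for MongoDB
# documents and collections. Nothing here talks to a real database.

# Embedded: the phone numbers live inside the contact document itself.
contact_embedded = {
    "_id": 1,
    "name": "Jenny",
    "numbers": [
        {"label": "home", "number": "555-0100"},
        {"label": "work", "number": "555-0101"},
    ],
}

# Referenced: each number is its own document pointing back to the
# contact by ID.
contacts = {2: {"_id": 2, "name": "Jenny"}}
numbers = [
    {"_id": 10, "contact_id": 2, "label": "home", "number": "555-0100"},
    {"_id": 11, "contact_id": 2, "label": "work", "number": "555-0101"},
]


def numbers_embedded(contact):
    """A single document fetch already contains every number."""
    return [n["number"] for n in contact["numbers"]]


def numbers_referenced(contact_id):
    """Two steps: fetch the contact, then find documents referencing it."""
    contact = contacts[contact_id]
    return [n["number"] for n in numbers if n["contact_id"] == contact["_id"]]
```

In the embedded form a reader gets the contact and its numbers in one lookup; in the referenced form the numbers can be queried and updated independently, at the cost of a second lookup.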

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
    Shows commands or other text that should be typed literally by the user.

Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.

This icon signifies a tip, suggestion, or general note.

This icon indicates a warning or caution.

Using Code Examples

This book is here to help you get your job done. In general, if this book includes code examples, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code
