Slick Data Sharding - Drupal


Slick Data Sharding
Tobby Hagler, Phase2 Technology

Don't Forget
- Official DrupalCon London Party
- Batman Live World Arena Tour
- Buses leave main entrance, Fairfield Halls, at 4pm

Overview
- Purpose: reasons for sharding
- Problems/examples of a need for sharding
- Types of scaling and sharding
- Sharding options in Drupal

Scale: Horizontal vs. Vertical
- Horizontal scale: add more machines of the same type
- Vertical scale: bigger and badder machines

Sharding
- What is sharding?
- Types of sharding: partitioning and federation
- How sharding helps
- Vs. typical monolithic Drupal database

What Is Sharding?
Simply put, sharding is physically breaking large data into smaller pieces (shards) of data. The trick is putting them back together again.

Reasons for Sharding
- Sharding for scaling your application
- Sharding for shared application data
- Leveraging specialized technologies
- Caching is a form of federated sharding

How Sharding Helps
- Scale your applications by reducing data sets in any single database
- Secure sensitive data by isolating it elsewhere
- Segregates data

Be Sure You've Tried Everything Else
- Memcached
- Boost module
- Load-balanced web servers
- MySQL master/slave replication
- Turning Views into custom queries

More Things To Try
- Moar memory!
- Move .htaccess to vhost config
- Apache tunes
- MySQL tunes
- Replace search with Apache Solr
- Optimizing PHP (custom compile)
- Apache Drupal module
- Replace Apache with nginx
- Switch to third-party services for comments
- Replace contrib modules with custom development

Typical Balanced Environment

Types of Sharding
Partitioning
- Horizontal
- Divides something into two parts
- Unshuffle
- Reduced index size
- Hard to do
Federation
- Vertical
- A set of things
- Uses logical divisions
- Split up across physically different machines

Horizontal Partitioning
- Scaling your application's performance
- Distributed data load
- This is the Shard of Last Resort

Even/Odd Partitions
- This is not master/master replication
- Rows are divided between physical databases
- Will require a custom database API to properly achieve split rows
- Applies to node loads, entity loads, etc.
- Achieved by auto-increment by N with different starting offsets; the application distributes writes in round-robin fashion and uses keyed mechanisms to distribute reads and reassemble data
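The even/odd scheme can be sketched outside Drupal. The following is a minimal in-memory illustration, not the custom database API the slide calls for: the class names Shard and ShardedTable are hypothetical, and each Shard stands in for one physical MySQL database whose auto-increment step and offset are set per server.

```python
from itertools import cycle


class Shard:
    """In-memory stand-in for one physical database (hypothetical)."""

    def __init__(self, offset, step):
        # Comparable to MySQL's auto_increment_offset / auto_increment_increment.
        self.next_id = offset
        self.step = step
        self.rows = {}

    def insert(self, row):
        row_id = self.next_id
        self.rows[row_id] = row
        self.next_id += self.step
        return row_id


class ShardedTable:
    """Distributes writes round-robin and routes reads by key."""

    def __init__(self, shard_count):
        self.shards = [
            Shard(offset=i + 1, step=shard_count) for i in range(shard_count)
        ]
        self._writer = cycle(self.shards)

    def insert(self, row):
        # Round-robin write: each insert goes to the next shard in turn.
        return next(self._writer).insert(row)

    def shard_for(self, row_id):
        # Keyed read: the ID itself identifies the shard that issued it.
        return self.shards[(row_id - 1) % len(self.shards)]

    def load(self, row_id):
        return self.shard_for(row_id).rows.get(row_id)

    def load_all(self):
        # Reassemble the full data set from every shard, ordered by ID.
        merged = {}
        for shard in self.shards:
            merged.update(shard.rows)
        return [merged[row_id] for row_id in sorted(merged)]


table = ShardedTable(shard_count=2)
ids = [table.insert({"title": title}) for title in ("a", "b", "c", "d")]
```

With two shards, one database holds the odd IDs and the other the even IDs, so a single-row load touches only one database while a listing has to merge results from both.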

Horizontally Partitioned Databases

Federation
- Vertically partitioning data by logical affiliation
- Sharding for shared application data
- Manageability: distributing data sets
- Security: allows for exposing certain bits of data to other applications without exposing all

Vertically Scaled Databases

Application Sharding
- Not just sharding data
- Shard the components of your site

Sample Use Cases
- Collecting resumes within your existing site
- Building an ideation tool

Sharding Resume Data
- Accepting resumes for a large corporation
- Users submit resume via Webform
- Submit and process data into separate database
- Resume data is processed by internal HR software to evaluate potential employees

Sharding Schemas
Same physical database, different schemas
- Uses database prefixing in settings.php
or
Different physical databases
- Uses db_set_active() to switch db connections

Database Prefixes
- Handled in settings.php
- Uses MySQL's dot separator to target different schemas
- Requires that the MySQL user used by Drupal has proper permissions
- Ex: db_1.users and db_2.users
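To make the dot-separator trick concrete, here is a small sketch of how a per-table prefix map turns a query into one that spans schemas. It assumes Drupal's convention of wrapping table names in curly braces inside query strings; the function prefix_tables and the DB_PREFIX map are illustrative Python stand-ins, not Drupal code.

```python
import re

# Hypothetical prefix map: shared tables live in another schema,
# everything else falls back to the 'default' (empty) prefix.
DB_PREFIX = {
    "default": "",
    "users": "shared_.",
    "sessions": "shared_.",
}


def prefix_tables(sql, prefixes):
    """Replace each {table} placeholder with its prefixed table name."""

    def replace(match):
        table = match.group(1)
        prefix = prefixes.get(table, prefixes.get("default", ""))
        return prefix + table

    return re.sub(r"\{(\w+)\}", replace, sql)


query = "SELECT n.title, u.name FROM {node} n JOIN {users} u ON u.uid = n.uid"
resolved = prefix_tables(query, DB_PREFIX)
```

Because the prefix ends in a dot, MySQL reads `shared_.users` as "table users in schema shared_", which is why the database user needs permissions on both schemas.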

Database Prefixes: Drupal 6

    $db_prefix = array(
      'default' => '',
      'users' => 'shared_.',
      'sessions' => 'shared_.',
      'role' => 'shared_.',
      'authmap' => 'shared_.',
      'users_roles' => 'shared_.',
      'profile_fields' => 'shared_.',
      'profile_values' => 'shared_.',
    );

Database Prefixes: Drupal 7

    $databases = array(
      'default' => array(
        'default' => array(
          'prefix' => array(
            'default' => '',
            'users' => 'shared_.',
            'sessions' => 'shared_.',
            'role' => 'shared_.',
            'authmap' => 'shared_.',
            'users_roles' => 'shared_.',
          ),
        ),
      ),
    );

Database Prefixes: Tips, Tricks, and Caveats
- Can share user data between Drupal 6 and Drupal 7 with table alters and strict prevention of Drupal 7 logins or user saves
- Should log in with the lower version of Drupal

Different Physical Databases
- Set up additional connections in settings.php
- Change connections using db_set_active()
- Use db_set_active() to switch back when done
- Watch for schema caching and watchdog errors

Different Databases: Drupal 6

    $db_url = array(
      'default' => 'mysql://user:pass@host1/db1',
      'second' => 'mysql://user:pass@host2/db2',
      'third' => 'mysql://user:pass@host3/db3',
    );

Different Databases: Drupal 7

    $other_database = array(
      'database' => '...',
      'username' => '...',
      'password' => '...',
      'host' => '...',
      'driver' => 'mysql',
    );
    Database::addConnectionInfo('moduleKey', 'default', $other_database);
    db_set_active('moduleKey');
    // Execute queries
    db_set_active();

Switching Databases

    $schema = drupal_get_schema('table_name');
    db_set_active('database_key');
    // Execute queries
    drupal_write_record('table_name', $data);
    db_set_active();
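The risk in the switch-and-restore pattern is an error between the two db_set_active() calls, which would leave the rest of the request pointed at the wrong database. One way to guard against that is sketched below in Python; the Connections class and its using() helper are hypothetical and only model the idea of an "active" connection key.

```python
from contextlib import contextmanager


class Connections:
    """Tracks which named connection is active (hypothetical model)."""

    def __init__(self, targets):
        self.targets = targets
        self.active = "default"

    def set_active(self, key="default"):
        # Switch the active connection and report the one that was active.
        if key not in self.targets:
            raise KeyError(f"unknown connection: {key}")
        previous = self.active
        self.active = key
        return previous

    @contextmanager
    def using(self, key):
        # Switch for the duration of the block, then always switch back,
        # even if a query inside the block raises.
        previous = self.set_active(key)
        try:
            yield self.targets[key]
        finally:
            self.set_active(previous)


conns = Connections({"default": "db1", "second": "db2"})

with conns.using("second") as target:
    inside = (conns.active, target)
after = conns.active
```

The same guarantee can be had in PHP by wrapping the queries in try/finally (or try/catch on older PHP versions) so that the closing db_set_active() call always runs.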

Saving Data in Another Database
- hook_install() / hook_schema()
- drupal_write_record()
- Keeps web site database smaller
- Can keep sensitive data offsite
- Partitioned tables can limit/protect your website database from internal users

Saving Data in Another Database
- Resume data is submitted via form
- Form's submit function accepts final data
- Schema loads table definition
- Connects to the HR instance of MySQL
- Writes new record
- Uploads any files to private file space
- Switches database back
- HR Director can query new resumes
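The flow above can be approximated end to end with two SQLite in-memory databases standing in for the website database and the HR instance of MySQL. This is a sketch under those assumptions: write_record, resume_form_submit, RESUME_SCHEMA, and the resume table layout are all hypothetical, and file uploads are left out.

```python
import sqlite3

# Hypothetical table definition, playing the role of the loaded schema.
RESUME_SCHEMA = {
    "table": "resume",
    "fields": ["first_name", "last_name", "title"],
}

# Two separate databases: the website's own, and the HR instance.
web_db = sqlite3.connect(":memory:")
hr_db = sqlite3.connect(":memory:")
hr_db.execute(
    "CREATE TABLE resume ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, title TEXT)"
)


def write_record(conn, schema, data):
    """Insert only the fields the schema knows about; return the new row ID."""
    fields = [name for name in schema["fields"] if name in data]
    columns = ", ".join(fields)
    placeholders = ", ".join("?" for _ in fields)
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        schema["table"], columns, placeholders
    )
    cursor = conn.execute(sql, [data[name] for name in fields])
    conn.commit()
    return cursor.lastrowid


def resume_form_submit(form_values):
    # The submit handler writes to the HR database only; the website
    # database never sees the resume data.
    return write_record(hr_db, RESUME_SCHEMA, form_values)


new_id = resume_form_submit(
    {"first_name": "John", "last_name": "Smith", "title": "Web Developer"}
)
```

Passing the HR connection explicitly, rather than switching a global active connection, is a design choice of this sketch; it avoids the need to switch back afterwards.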

Using MongoDB
- MongoDB is a NoSQL database
- "Schema-less": data schema defined in code
- Fast
- Document-based
- Simpler to scale vertically than MySQL

MongoUK
- 10gen conference in London, UK
- September 19, 2011
- 10gen.com/conferences/mongouk-sept-2011

MongoDB and Drupal
- drupal.org/project/mongodb
- 7.x allows for field storage, cache, sessions, and blocks to be stored in MongoDB
- Allows for connections to your own collections

MongoDB Data
- Four levels of objects
  - Connection
  - Database (schema)
  - Collection
  - Cursor (query results)
- Non-relational database
- Collections tend to be denormalized

MongoDB Documents

    Resumes.Resume: {
      first_name: "John",
      last_name: "Smith",
      title: "Web Developer",
      address: {
        city: "London",
        country: "UK"
      },
      skills: ['PHP', 'Drupal', 'MySQL'],
      ssn: 123456789,
    }

Querying MongoDB Documents

    $applicant = $applicants->find(
      array(
        'username' => 'Smith',
        'ssn' => 1,
      ),
      array(
        'first_name' => 1,
        'last_name' => 1,
      )
    );
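The find() call on this slide takes two arrays: a query that selects documents and a field list that limits what comes back. The sketch below imitates that behaviour over a plain Python list so it runs without a MongoDB server. It is a stand-in, not the MongoDB driver: the find function here is hypothetical, handles only exact top-level matches, and the sample query on last_name is an assumption chosen to match the resume document shown earlier.

```python
applicants = [
    {
        "first_name": "John",
        "last_name": "Smith",
        "title": "Web Developer",
        "address": {"city": "London", "country": "UK"},
        "skills": ["PHP", "Drupal", "MySQL"],
        "ssn": 123456789,
    },
    {
        "first_name": "Jane",
        "last_name": "Jones",
        "title": "Designer",
        "skills": ["CSS"],
        "ssn": 987654321,
    },
]


def find(collection, query, fields=None):
    """Return documents matching every key in `query`.

    If `fields` is given, keep only the keys flagged with a truthy value.
    (A real MongoDB query would also return `_id` unless it is excluded.)
    """
    results = []
    for doc in collection:
        if all(doc.get(key) == value for key, value in query.items()):
            if fields:
                wanted = [key for key, include in fields.items() if include]
                doc = {key: doc[key] for key in wanted if key in doc}
            results.append(doc)
    return results


matches = find(
    applicants,
    {"last_name": "Smith"},
    {"first_name": 1, "last_name": 1},
)
```

Limiting the returned fields is what keeps sensitive values such as ssn out of the result when the caller only needs a name.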

MongoDB Sharing via REST
- Simple REST: included as part of MongoDB
- Sleepy Mongoose: REST interface for MongoDB (Python)
- MongoDB REST (Node.js)

Ideation REST Interface
- Get a list of all idea documents:
  http://127.0.0.1:28017/ideation/ideas/
- Get all comments for a specific idea:
  http://127.0.0.1:28017/ideation/comments/?filter_id=4a8acf6e7fbadc242de5b4f3&limit=10&offset=20
- Will likely need a dedicated MongoDB REST interface
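URLs of this shape are easy to assemble programmatically. The helper below is a hypothetical sketch that only builds the URL string; it makes no request, and the parameter names (a filter_ prefix on field names, limit, offset) are taken from the slide at face value rather than checked against a particular MongoDB REST interface.

```python
from urllib.parse import urlencode

# Host and port as shown on the slide.
BASE_URL = "http://127.0.0.1:28017"


def collection_url(database, collection, filters=None, limit=None, offset=None):
    """Build a REST URL for a collection, with optional filter and paging."""
    params = {}
    for field, value in (filters or {}).items():
        params["filter_" + field] = value
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset

    url = "{}/{}/{}/".format(BASE_URL, database, collection)
    if params:
        url += "?" + urlencode(params)
    return url


all_ideas = collection_url("ideation", "ideas")
idea_comments = collection_url(
    "ideation",
    "comments",
    filters={"id": "4a8acf6e7fbadc242de5b4f3"},
    limit=10,
    offset=20,
)
```

Using urlencode rather than string concatenation means filter values containing spaces or ampersands are escaped correctly.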

Applications on Separate Web Tiers
- Application sharding is data sharding
- Separate Drupal instances
- Use mod_proxy as a pass-through
- Can use multiple load-balanced environments

Proxied Web Clusters

Questions?

Tobby Hagler, Phase2 Technology
6050
d.o: tobby
Slides: agileapproach.com