Transcription
PhoenixWe put the SQL back in NoSQLhttps://github.com/forcedotcom/phoenixJames com/
Agendal What/why HBase?Completed
AgendaWhat/why HBase?l What/why Phoenix?l Completed
AgendaWhat/why HBase?l What/why Phoenix?l How does PhoenixCompleted work?l
AgendaWhat/why HBase?l What/why Phoenix?l How does PhoenixCompleted work?l Demol
AgendaWhat/why HBase?l What/why Phoenix?l How does PhoenixCompleted work?l Demol Roadmapl
AgendaWhat/why HBase?l What/why Phoenix?l How does PhoenixCompleted work?l Demol Roadmapl Q&Al
What is HBase?l Developed as part of Apache HadoopCompleted
What is HBase?Developed as part of Apache Hadoopl Runs on top of HDFSl Completed
What is HBase?Developed as part of Apache Hadoopl Runs on top of HDFSl Key/value storeCompletedl
What is HBase?Developed as part of Apache Hadoopl Runs on top of HDFSl Key/value storeCompletedl Map
What is HBase?Developed as part of Apache Hadoopl Runs on top of HDFSl Key/value storeCompletedl MapDistributed
What is HBase?Developed as part of Apache Hadoopl Runs on top of HDFSl Key/value storeCompletedl MapDistributedSparse
What is HBase?Developed as part of Apache Hadoopl Runs on top of HDFSl Key/value storeCompletedl MapSortedDistributedSparse
What is HBase?Developed as part of Apache Hadoopl Runs on top of HDFSl Key/value storeCompletedl MapSortedDistributed ConsistentSparse
What is HBase?Developed as part of Apache Hadoopl Runs on top of HDFSl Key/value storeCompletedl MapSortedDistributed ConsistentSparseMultidimensional
Cluster Architecture
Sharding
Why Use HBase?l If you have lots of dataCompleted
Why Use HBase?l If you have lots of datal Scales linearlyCompleted
Why Use HBase?l If you have lots of datal Scales linearlyl Shards automaticallyCompleted
Why Use HBase?If you have lots of datal Scales linearlyl Shards automaticallyCompletedl If you can live without transactionsl
Why Use HBase?If you have lots of datal Scales linearlyl Shards automaticallyCompletedl If you can live without transactionsl If your data changesl
Why Use HBase?If you have lots of datal Scales linearlyl Shards automaticallyCompletedl If you can live without transactionsl If your data changesl If you need strict consistencyl
What is Phoenix?Completed
What is Phoenix?l SQL skin for HBaseCompleted
What is Phoenix?SQL skin for HBasel Alternate client APIl Completed
What is Phoenix?SQL skin for HBasel Alternate client APIl Embedded JDBCCompleteddriverl
What is Phoenix?SQL skin for HBasel Alternate client APIl Embedded JDBCCompleteddriverl Runs at HBase native speedl
What is Phoenix?SQL skin for HBasel Alternate client APIl Embedded JDBCCompleteddriverl Runs at HBase native speedl Compiles SQL into native HBase callsl
What is Phoenix?SQL skin for HBasel Alternate client APIl Embedded JDBCCompleteddriverl Runs at HBase native speedl Compiles SQL into native HBase callsl So you don’t have to!l
Cluster Architecture
Cluster ArchitecturePhoenix
Cluster ArchitecturePhoenixPhoenix
Phoenix Performance
Why Use Phoenix?
Why Use Phoenix?l Give folks an API they already knowCompleted
Why Use Phoenix?Give folks an API they already knowl Reduce the amount of code neededl Completed
Why Use Phoenix?Give folks an API they already knowl Reduce the amount of code neededl CompletedSELECT TRUNC(date,'DAY’),AVG(cpu)FROM web statWHERE domain LIKE 'Salesforce%’GROUP BY TRUNC(date,'DAY’)
Why Use Phoenix?Give folks an API they already knowl Reduce the amount of code neededl Perform optimizations transparentlyl Completed
Why Use Phoenix?Give folks an API they already knowl Reduce the amount of code neededl Perform optimizations transparentlyCompletedl Aggregationl Skip Scanl Secondary indexing (soon!)l
Why Use Phoenix?Give folks an API they already knowl Reduce the amount of code neededl Perform optimizations transparentlyCompletedl Leverage existing toolingl
Why Use Phoenix?Give folks an API they already knowl Reduce the amount of code neededl Perform optimizations transparentlyCompletedl Leverage existing toolingl SQL client/terminall OLAP enginel
How Does Phoenix Work?Overlays on top of HBase Data Modell Keeps Versioned Schema Respositoryl Query ProcessorCompletedl
Phoenix Data ModelPhoenix maps HBase data model to the relational worldHBase Table
Phoenix Data ModelPhoenix maps HBase data model to the relational worldHBase TableColumn Family AColumn Family B
Phoenix Data ModelPhoenix maps HBase data model to the relational worldHBase TableColumn Family AQualifier 1Qualifier 2Column Family BQualifier 3
Phoenix Data ModelPhoenix maps HBase data model to the relational worldHBase TableColumn Family AQualifier 1Row Key 1ValueQualifier 2Column Family BQualifier 3
Phoenix Data ModelPhoenix maps HBase data model to the relational worldHBase TableColumn Family AQualifier 1Row Key 1Row Key 2Column Family BQualifier 2Qualifier 3ValueValueValue
Phoenix Data ModelPhoenix maps HBase data model to the relational worldHBase TableColumn Family AQualifier 1Row Key 1Qualifier 2Qualifier 3ValueValueValueRow Key 2Row Key 3Column Family BValue
Phoenix Data ModelPhoenix maps HBase data model to the relational worldHBase TableHBase TableColumn Family AColumn Family AQualifier 1Qualifier 2Qualifier 1Qualifier 2Row Key 1Row Key 1Row Key 2Row Key 2Row Key 3Row Key 3Column Family BColumn Family BQualifier 3Qualifier 3ValueValueValueValueValueValueValueValue
Phoenix Data ModelPhoenix maps HBase data model to the relational worldHBase TableHBase TableHBaseColumn FamilyA TableRow Key 1Row Key 1Row Key 1Row Key 2Row Key 2Row Key 2Row Key 3Row Key 3Row Key 3Column Family AColumn Family AQualifier 1Qualifier 2Qualifier 1Qualifier 2Qualifier1Qualifier mn Family BColumn Family BColumn Family BQualifier 3Qualifier 3Qualifier 3ValueValueValue
Phoenix Data ModelPhoenix maps HBase data model to the relational worldHBase TableHBase TableHBaseColumn FamilyA TableRow Key 1Row Key 1Row Key 1Row Key 2Row Key 2Row Key 2Row Key 3Row Key 3Row Key 3Column Family AColumn Family AQualifier 1Qualifier 2Qualifier 1Qualifier 2Qualifier1Qualifier 2ValueValueValueValueValueValueMultiple VersionsValueValueValueColumn Family BColumn Family BColumn Family BQualifier 3Qualifier 3Qualifier 3ValueValueValue
Phoenix Data ModelPhoenix maps HBase data model to the relational worldPhoenix TableHBase TableColumn Family AQualifier 1Row Key 1Qualifier 2Qualifier 3ValueValueValueRow Key 2Row Key 3Column Family BValue
Phoenix Data ModelPhoenix maps HBase data model to the relational worldPhoenix TableHBase TableColumn Family AQualifier 1Row Key 1Qualifier 2Qualifier 3ValueValueValueRow Key 2Row Key 3Column Family BValueKey Value Columns
Phoenix Data ModelPhoenix maps HBase data model to the relational worldPhoenix TableHBase TableColumn Family AQualifier 1Row Key 1Qualifier 2Qualifier 3ValueValueValueRow Key 2Row Key 3Column Family BValueRow Key ColumnsKey Value Columns
Phoenix Metadatal Stored in a Phoenix HBase tableCompleted
Phoenix Metadatal Stored in a Phoenix HBase tablel SYSTEM.TABLECompleted
Phoenix Metadatal l Stored in a Phoenix HBase tableUpdated through DDL commandsCompleted
Phoenix Metadatal l Stored in a Phoenix HBase tableUpdated through DDL commandsl CREATE TABLEl ALTER TABLECompletedl DROP TABLEl CREATE INDEXl DROP INDEX
Phoenix Metadatal l l Stored in a Phoenix HBase tableUpdated through DDL commandsKeeps older versions as schema evolvesCompleted
Phoenix Metadatal l l l Stored in a Phoenix HBase tableUpdated through DDL commandsKeeps older versions as schema evolvesCorrelates timestamps between schema and dataCompleted
Phoenix Metadatal l l l Stored in a Phoenix HBase tableUpdated through DDL commandsKeeps older versions as schema evolvesCorrelates timestamps between schema and dataCompletedl Flashback queries useschema that was in-place then
Phoenix Metadatal l l l l Stored in a Phoenix HBase tableUpdated through DDL commandsKeeps older versions as schema evolvesCorrelates timestamps between schema and dataCompletedAccessible via JDBC metadataAPIs
Phoenix Metadatal l l l l Stored in a Phoenix HBase tableUpdated through DDL commandsKeeps older versions as schema evolvesCorrelates timestamps between schema and dataCompletedAccessible via JDBC metadataAPIsl java.sql.DatabaseMetaDatal Through Phoenix queries!
ExampleOver metrics data for clusters of servers with a schema like this:SERVER METRICSHOSTDATERESPONSE TIMEVARCHARDATEINTEGERGC TIMECPU TIMEIO TIME INTEGERINTEGERINTEGERRow Key
ExampleOver metrics data for clusters of servers with a schema like this:SERVER METRICSHOSTDATERESPONSE TIMEVARCHARDATEINTEGERGC TIMECPU TIMEIO TIME INTEGERINTEGERINTEGERKey Values
ExampleWith 90 days of data that looks like this:SERVER METRICSHOSTDATERESPONSE TIMEsf1.s1sf1.s1 sf3.s1sf3.s1sf7.s9 Jun 5 10:10:10.234Jun 5 11:18:28.4561234Jun 5 10:10:10.234Jun 6 12:46:19.123Jun 4 08:23:23.4562345GC TIME8012500223401234
ExampleWalk through query processing for three scenarios
ExampleWalk through query processing for three scenarios1. Chart Response Time Per Cluster
ExampleWalk through query processing for three scenarios1. Chart Response Time Per Cluster2. Identify 5 Longest GC Times
ExampleWalk through query processing for three scenarios1. Chart Response Time Per Cluster2. Identify 5 Longest GC Times3. Identify 5 Longest GC Times again and again
Scenario 1Chart Response Time Per ClusterSELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)
Scenario 1Chart Response Time Per ClusterSELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)
Scenario 1Chart Response Time Per ClusterSELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)
Scenario 1Chart Response Time Per ClusterSELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)
Scenario 1Chart Response Time Per ClusterSELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)
Step 1: ClientIdentify Row Key Ranges from QuerySELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3’, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)Row Key RangesHOSTDATE
Step 1: ClientIdentify Row Key Ranges from QuerySELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3’, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)Row Key RangesHOSTDATE
Step 1: ClientIdentify Row Key Ranges from QuerySELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3’, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)Row Key RangesHOSTDATE
Step 1: ClientIdentify Row Key Ranges from QuerySELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3’, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)Row Key RangesHOSTDATEsf1
Step 1: ClientIdentify Row Key Ranges from QuerySELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3’, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)Row Key RangesHOSTDATEsf1sf3
Step 1: ClientIdentify Row Key Ranges from QuerySELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3’, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)Row Key RangesHOSTDATEsf1sf3sf7
Step 1: ClientIdentify Row Key Ranges from QuerySELECT substr(host,1,3), trunc(date,’DAY’), avg(response time)FROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3’, ‘sf7’)CompletedGROUP BY substr(host, 1, 3), trunc(date,’DAY’)Row Key RangesHOSTDATEsf1t1 - *sf3sf7
Step 2: ClientOverlay Row Key Ranges with RegionsR1sf1R2Completedsf4sf1sf3R3sf6R4sf7
Step 3: ClientExecute Parallel R4sf7
Step 4: ServerFilter using Skip Scansf1.s1SKIPCompletedt0
Step 4: ServerFilter using Skip ScanINCLUDECompletedsf1.s1t1
Step 4: ServerFilter using Skip Scansf1.s2SKIPCompletedt0
Step 4: ServerFilter using Skip ScanINCLUDECompletedsf1.s2t1
Step 4: ServerFilter using Skip ScanSKIPsf1.s3t0
Step 4: ServerFilter using Skip ScanINCLUDEsf1.s3t1
Step 5: ServerIntercept Scan in CoprocessorSERVER METRICSSERVER METRICSHOSTDATEHOSTDATEAGGsf1.s1Jun 2 10:10:10.234sf1Jun 1 sf1.s2Jun 3 23:05:44.975sf1Jun 2 sf1.s2Jun 9 08:10:32.147sf1Jun 3 sf1.s3Jun 1 11:18:28.456sf1Jun 8 sf1.s3Jun 3 22:03:22.142sf1Jun 9 sf1.s4Jun 1 10:29:58.950sf1.s4Jun 2 14:55:34.104sf1.s4Jun 3 12:46:19.123sf1.s5Jun 8 08:23:23.456sf1.s6Jun 1 10:31:10.234
Step 6: ClientPerform Final Merge SortR1R2SERVER METRICSscan1Completedscan2R3R4scan3HOSTDATE AGGsf1Jun 5 sf1Jun 9 sf3Jun 1 sf3Jun 2 sf7Jun 1 sf7Jun 8
Scenario 2Find 5 Longest GC TimesSELECT host, date, gc timeFROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3, ‘sf7’)CompletedORDER BY gc time DESCLIMIT 5
Scenario 2Find 5 Longest GC Times Same client parallelization and server skip scan filtering
Scenario 2Find 5 Longest GC Times Same client parallelization and server skip scan filteringServer holds 5 longest GC TIME value for each scanCompletedR1SERVER METRICSHOSTDATEGC TIMEsf1.s1Jun 2 10:10:10.23422123sf1.s1Jun 3 23:05:44.97519876sf1.s1Jun 9 08:10:32.14711345sf1.s2Jun 1 11:18:28.45610234sf1.s2Jun 3 22:03:22.14210111
Scenario 2Find 5 Longest GC Times Same client parallelization and server skip scan filteringServer holds 5 longest GC TIME value for each scanClient performs final merge sort among parallel scansSERVER METRICSScan1Scan2Scan3HOSTDATEGC TIMEsf1.s1Jun 2 10:10:10.23422123sf1.s1Jun 3 23:05:44.97519876sf1.s1Jun 9 08:10:32.14711345sf1.s2Jun 1 11:18:28.45610234sf1.s2Jun 3 22:03:22.14210111
Scenario 3Find 5 Longest GC TimesCREATE INDEX gc time indexON server metrics (gc time DESC, date DESC)INCLUDE (host, response time)Completed
Scenario 3Find 5 Longest GC TimesCREATE INDEX gc time indexON server metrics (gc time DESC, date DESC)INCLUDE (host, response time)Completed
Scenario 3Find 5 Longest GC TimesCREATE INDEX gc time indexON server metrics (gc time DESC, date DESC)INCLUDE (host, response time)Completed
Scenario 3Find 5 Longest GC TimesCREATE INDEX gc time indexON server metrics (gc time DESC, date DESC)INCLUDE (host, response time)CompletedGC TIME INDEXGC TIMEINTEGERDATEDATEHOSTVARCHARRESPONSE TIME INTEGERRow Key
Scenario 3Find 5 Longest GC TimesCREATE INDEX gc time indexON server metrics (gc time DESC, date DESC)INCLUDE (host, response time)CompletedGC TIME INDEXGC TIMEINTEGERDATEDATEHOSTVARCHARRESPONSE TIME INTEGERKey Value
Scenario 3Find 5 Longest GC TimesSELECT host, date, gc timeFROM server metricsWHERE date CURRENT DATE() – 7AND substr(host, 1, 3) IN (‘sf1’,‘sf3, ‘sf7’)CompletedORDER BY gc time DESCLIMIT 5
DemoPhoenix Stock Analyzerl Fortune 500 companiesl 10 years of historical stock pricesCompletedl Demonstrates Skip Scan in actionl Running locally on my single nodelaptop clusterl
Phoenix RoadmapSecondary Indexingl Count distinct and percentilel Derived tablesCompletedl Hash Joinsl Apache Drill integrationl Cost-based query optimizerl OLAP extensionsl Transactionsl
Thank you!Questions/comments?
Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse Multidimensional. Cluster Architecture . Sharding . Why Use HBase? Completed ! If you have lots of data . Why Use HBase?