Deepgreen MPP With FPGA: A Supercharged Greenplum Data Warehouse Solution

Transcription

Deepgreen MPP with FPGA: A supercharged Greenplum Data Warehouse solution
Presented by Feng Tian, Founder

Agenda
  Building Next Gen Data Analytics Platform: Why NOW?
  How to Accelerate Data Warehouse with FPGA
  Use Cases

It’s Time for a Complete Rewrite: New Application Landscape
  Rich Data
    Text
    IoT, Geospatial
    Media
  Intelligent Data
    Query getting more complex
    Geospatial
    Machine learning/Data mining
    AI/Deep learning
  ACM Turing Award Winner

Time for a Complete Rewrite: Hardware Trend
  Storage Hierarchy
    Big Memory
    SSD
    Plenty of Bandwidth
  Network
    10, 100 GigE is common
    Plenty of Bandwidth
  Today, most data workloads are bottlenecked on CPU
  FPGA can relieve the CPU

A New Golden Age for Computer Architecture
  Domain Specific Hardware/Software Co-Design
  Enhanced Security
  Open Instruction Set
  Agile Chip Development

Data, Data Everywhere

Actionable Insight

Deepgreen DB: a better Greenplum Data Warehouse
  Greenplum
    Field tested, with widespread adoption in Telco, Financial, Government, Retail
  Deepgreen
    We squeezed every bit of juice out of x86 CPUs
    100% compatible
    Zero code change to switch
    Complete rewrite of Query Execution Engine
    ‒ LLVM JIT
    ‒ SIMD
    ‒ Switch binary (without reloading data) to get a 3-5x performance boost
    ‒ Now even faster with FPGA!

Deepgreen: FPGA Hardware Acceleration
  LLVM JIT, SIMD: we squeezed the CPU dry
  Next frontier: FPGA

FPGA in Deepgreen
  Challenges
    Memory is big, but not big enough
    Throughput vs latency
    Multi-CPU/core
    Multiuser environment
  Our Approach
    Identify the bottleneck
    New algorithm tuned for FPGA
    Offload to FPGA, non-preemptive
    XLIW: eXtra Long Instruction Word

XLIW: eXtra Long Instruction Word
  [Diagram: XLIW instructions, each labeled "XLIW: Hasher" and followed by "Data, Data, Data", shown alongside kernels (K = Kernel)]
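The slides do not give the XLIW wire format, so the following is only a sketch of the batching idea they suggest: one operation code travels with many records, so the cost of a single offload is amortized over the whole batch. The opcode value, the header layout, and the function names `pack_xliw` and `unpack_xliw` are all assumptions made for illustration.

```python
import struct

# Hypothetical opcode; the real XLIW encoding is not described in the slides.
OP_HASH = 1


def pack_xliw(opcode, keys):
    """Pack one opcode and many unsigned 64-bit keys into a single buffer.

    Assumed layout: 4-byte opcode, 4-byte record count, then the keys,
    all little-endian with no padding.
    """
    header = struct.pack("<II", opcode, len(keys))
    payload = struct.pack(f"<{len(keys)}Q", *keys)
    return header + payload


def unpack_xliw(buf):
    """Inverse of pack_xliw: recover the opcode and the list of keys."""
    opcode, count = struct.unpack_from("<II", buf, 0)
    keys = list(struct.unpack_from(f"<{count}Q", buf, 8))
    return opcode, keys
```

A single call such as `pack_xliw(OP_HASH, keys)` produces one buffer for the whole batch, which favors throughput over per-record latency.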

Use Case 1: Hash Join
  SELECT count(*)
  FROM lineitem L1, orders O
  WHERE O.o_orderkey = L1.l_orderkey
    AND EXISTS (SELECT *
                FROM lineitem L2
                WHERE L2.l_orderkey = L1.l_orderkey
                  AND L2.l_suppkey <> L1.l_suppkey)
  [Plan diagram: two HashJoin nodes over L2, L1, and O]

Use Case 1: Hash Join Implementation
  Hash Join (HJ) algorithm, expressed trivially:
    1. Scan left side and build hash table
    2. Scan right side and probe hash table
    3. Output all hits
  Lots of records joined
  Hash table is not cache friendly
  XLIW for Hash Join
    Pack a lot of records from the left side, send to FPGA to compute hashes
    Instead of using a hash table, sort the hashes using a very fast radix sort (10x faster than quicksort)
    Pack a lot of records from the right side, send to FPGA to compute hashes
    Sort with radix sort
    Merge
  It is a hybrid hash/sort merge join
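The hybrid hash/sort merge join described on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Deepgreen's implementation: `hash_key` is a simple multiplicative hash standing in for whatever the FPGA computes, the radix sort is a plain byte-wise LSD sort, and the function names are hypothetical. Keys are assumed to be integers.

```python
def hash_key(key, bits=32):
    """Stand-in for the hash computed on the FPGA (multiplicative hash)."""
    return (key * 2654435761) & ((1 << bits) - 1)


def radix_sort(pairs, bits=32, radix_bits=8):
    """LSD radix sort of (hash, row) pairs, ordered by hash."""
    mask = (1 << radix_bits) - 1
    for shift in range(0, bits, radix_bits):
        buckets = [[] for _ in range(1 << radix_bits)]
        for pair in pairs:
            buckets[(pair[0] >> shift) & mask].append(pair)
        pairs = [pair for bucket in buckets for pair in bucket]
    return pairs


def hybrid_join(left, right, left_key, right_key):
    """Hash both sides, sort each side by hash, then merge the sorted runs."""
    ls = radix_sort([(hash_key(left_key(r)), r) for r in left])
    rs = radix_sort([(hash_key(right_key(r)), r) for r in right])
    out = []
    i = j = 0
    while i < len(ls) and j < len(rs):
        if ls[i][0] < rs[j][0]:
            i += 1
        elif ls[i][0] > rs[j][0]:
            j += 1
        else:
            h = ls[i][0]
            i_end, j_end = i, j
            while i_end < len(ls) and ls[i_end][0] == h:
                i_end += 1
            while j_end < len(rs) and rs[j_end][0] == h:
                j_end += 1
            for _, l_row in ls[i:i_end]:
                for _, r_row in rs[j:j_end]:
                    # Equal hashes may be collisions, so recheck the real keys.
                    if left_key(l_row) == right_key(r_row):
                        out.append((l_row, r_row))
            i, j = i_end, j_end
    return out
```

For example, `hybrid_join(orders, lineitems, lambda o: o[0], lambda l: l[0])` joins two lists of tuples on their first field. Both sorted arrays are scanned sequentially during the merge, which avoids the random memory access pattern of probing a hash table.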

Use Case 1: Hash Join Performance
  [Bar chart: TPC-H Q17 and Q20 on AWS F1, time in seconds, comparing Greenplum, Deepgreen, and Deepgreen XLIW. Speedup labels on the chart: 2.7X and 13X faster; 2X and 7X faster]

Use Case 2: GeoSpatial Join
  /* count devices covered by each cell tower */
  SELECT t.tower_id, count(*)
  FROM towers t, devices d
  WHERE ST_Intersects(t.area, d.location)
  GROUP BY t.tower_id

Use Case 2: GeoSpatial Join
  Greenplum PostGIS
    PostGIS is the GeoSpatial extension of PostgreSQL/Greenplum/Deepgreen
    Naïve join will never finish
    Build index (R-tree)
    Index Nestloop Join
      o For each polygon, use the index to look up points nearby
      o Check the intersects condition
  GeoSpatial Join XLIW
    Do not use index
    Scan outer loop, build an in-memory data structure
      Still an expensive operation, but cheaper than computing intersections (like building an R-tree)
    Scan inner loop, probing the in-memory data structure (like probing an R-tree)
    Check intersection
      This step dominates execution time
      Build/pack XLIW instruction, send to FPGA
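The index-free plan above (build an in-memory structure over the outer side, probe it with the inner side, then run the exact intersection check on the candidates) can be sketched as follows. The slides do not say which in-memory structure is used, so this sketch swaps in a uniform grid over polygon bounding boxes. The exact check, which the slides say is offloaded to the FPGA, is done here on the CPU with a ray-casting point-in-polygon test. All function names are hypothetical, and points lying exactly on a polygon boundary are not handled the way `ST_Intersects` handles them.

```python
from collections import defaultdict


def point_in_polygon(pt, poly):
    """Ray-casting test; boundary points are not treated specially."""
    x, y = pt
    inside = False
    n = len(poly)
    for k in range(n):
        x1, y1 = poly[k]
        x2, y2 = poly[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_grid(towers, cell=1.0):
    """Map each grid cell to the towers whose bounding box overlaps it."""
    grid = defaultdict(list)
    for tower_id, poly in towers.items():
        xs = [p[0] for p in poly]
        ys = [p[1] for p in poly]
        for gx in range(int(min(xs) // cell), int(max(xs) // cell) + 1):
            for gy in range(int(min(ys) // cell), int(max(ys) // cell) + 1):
                grid[(gx, gy)].append(tower_id)
    return grid


def count_devices(towers, devices, cell=1.0):
    """Count devices covered by each tower polygon."""
    grid = build_grid(towers, cell)
    counts = defaultdict(int)
    for pt in devices:
        key = (int(pt[0] // cell), int(pt[1] // cell))
        for tower_id in grid.get(key, []):
            # Exact check on the few candidates the grid lets through.
            if point_in_polygon(pt, towers[tower_id]):
                counts[tower_id] += 1
    return dict(counts)
```

Here `towers` maps a tower id to a list of `(x, y)` vertices and `devices` is a list of `(x, y)` points. The grid only prunes candidates; every surviving pair still goes through the exact test, which is the step the slide identifies as dominating execution time.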

Use Case 2: Performance
  [Bar chart: Geospatial JOIN, time in seconds, CPU only vs CPU + FPGA. Label on the chart: 50X faster]

Use Case 3: Adding Intelligence (Available Soon)
  An XLIW for data mining/machine learning
  Deepgreen Transducer Framework
    Allows users to embed C/Java/Go/Python code in SQL
    Interleaved with SQL Engine code
    First-class citizen: optimized by the query optimizer, executed in parallel, streaming data to/from SQL query operators like Sort/Join/Aggregate
  ML libraries, TensorFlow
    For example, Deep Neural Network in FPGA

Current Status and Future Directions
  Deepgreen DB on AWS F1
    See our demo
    On AWS Marketplace (2018)
  On-prem deployment with Alveo Accelerator Card
    Looking for early customers
  We are just scratching the surface
    More use cases, endless opportunities
    More to squeeze

Adaptable. Intelligent.
