PostgreSQL As GPU Database For Real-Time Analytics

Transcription

Talk, Swiss PUG, Zürich, 9 November 2017
PostgreSQL as GPU Database for Real-Time Analytics
Prof. Stefan Keller, IFS / Geometa Lab HSR
(Slides CC-BY)

About Scalability
Scale-up:
- Vertical
- Add more HW components (homogeneous or heterogeneous)
- Expensive(?)
- No open source, platform lock-in(?)
Scale-out:
- Horizontal
- Cheap commodity HW as "nodes"
- Flexibly add more nodes
- Open source
- Need to relax constraints, even ACID (BASE)?
Stefan Keller, "PostgreSQL as GPU Database."

GPU Databases

GPU Databases
More and more GPUs and memory bandwidth
Use cases:
- Analytical, not transactional (OLAP rather than OLTP)
- Hybrid transactional/analytical processing (HTAP)
- No need to move data to a warehouse
Setting:
- Single-node: much simpler to maintain
- Discrete GPU (rather than FPGA, speciality chips)
CPU vs. GPU:
- CPU is suited for low-latency, complex data ops
- GPU is suited for throughput of homogeneous ops

GPU Database Reference Architecture
Master of Science in Engineering (MSE)

Paper
Paper by Breß, Heimel, Siegmund, Bellatreche, Saake (Universities of Magdeburg, Berlin, Passau, Futuroscope/France) on "GPU-accelerated database systems: Survey and open challenges" in Transactions on Large-Scale Data- and Knowledge-Centered Systems XV. Springer Berlin Heidelberg, 2014. Pages 1-35.
Weblink: http://bit.ly/1rMOuZC (pdf)
Contents:
- Design choices
- Evaluation of 8 GDBMS
- Reference architecture
- Insights for all co-processors

Overview
Figure: Exemplary architecture of a system with a graphics card.

Architecture of GPU-aware DBMSs
Figure: Design choices/space of GPU-aware DBMSs.

PG-Strom / PostgreSQL

PostgreSQL - www.postgresql.org
"The world's most advanced open source database".
- Open source (PostgreSQL License, similar to BSD/MIT)
- PostgreSQL 10 released October 2017 (since 2002)
- Fully ACID-compliant object-relational database system
- Reputation for reliability, data integrity, and correctness
- Broad community
- Runs on all major operating systems
- Broad support of SQL and data types
- Scalable in quantity of data and concurrent users
- Extensible: Modules (EXTENSION, Network), Foreign Data Wrappers (SQL/MED), Language APIs

PG-Strom
PG-Strom - http://strom.kaigai.gr.jp/ - Version 1.0
"Limit breaker of PostgreSQL"
- Extension module to accelerate SQL workloads using multi-thousand cores and high-bandwidth memory
- Open source, GPLv2
- Requirements: PostgreSQL 9.5, CUDA
Main use cases:
- In-database analytics: real-time statistics
- Rapid batch processing: ETL/ELT
Main SW architecture design decisions:
- Heterogeneous scale-up
- On-the-fly native GPU code generation
- Asynchronous pipeline execution model
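The "asynchronous pipeline execution" idea named on this slide can be illustrated with a minimal Python sketch. This is not PG-Strom's implementation: the function names (`load_chunks`, `process_chunk`, `run_pipeline`), the chunk size, and the use of a thread with a bounded queue are assumptions chosen only to show how loading the next chunk can overlap with processing the current one.

```python
import queue
import threading

SENTINEL = None  # marks the end of the chunk stream


def load_chunks(rows, chunk_size, out_queue):
    """Producer: slice the input into chunks and hand them to the consumer."""
    for start in range(0, len(rows), chunk_size):
        out_queue.put(rows[start:start + chunk_size])
    out_queue.put(SENTINEL)


def process_chunk(chunk):
    """Stand-in for a device kernel: the same operation applied to every row."""
    return sum(value * value for value in chunk)


def run_pipeline(rows, chunk_size=4):
    """Overlap loading and processing through a bounded queue (hypothetical design)."""
    chunks = queue.Queue(maxsize=2)  # at most two chunks waiting at a time
    producer = threading.Thread(target=load_chunks, args=(rows, chunk_size, chunks))
    producer.start()
    total = 0
    while True:
        chunk = chunks.get()
        if chunk is SENTINEL:
            break
        total += process_chunk(chunk)
    producer.join()
    return total
```

The bounded queue keeps the producer from running arbitrarily far ahead, which is one simple way to cap the memory held by in-flight chunks.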

PG-Strom: SW architecture

PG-Strom: Overview
Source: http://strom.kaigai.gr.jp/

PG-Strom: Overview (continued)

PG-Strom: Features - Data types
Data types:
- Numeric: ; Date/Time: ; Others: bool, money
- Text: Limits on text and varchar(x)
  - "GPU cannot process compressed or TOAST'ed data"
  - "ALTER TABLE . SET STORAGE PLAIN" or MAIN
- Not supported: geometry, geography (PostGIS)
- See Reference: Data Types for details
Internals: Custom Scan Provider, can.html

PG-Strom: Features - SQL workloads
- Full Table Scan: For scans with scan qualifiers, the GPU evaluates the qualifiers and filters out invisible rows.
- Tables Join: A parallel version of the hash-join algorithm and a simple (non-parameterized) nested-loop algorithm are supported.
- Group By/Aggregation: The GPU pre-processes aggregate operations to reduce the number of rows the CPU has to process.
- Projection: When a SQL query contains complicated mathematical formulas, the GPU calculates these expressions on the device, and the CPU just references the calculated results.
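The scan and Group By/Aggregation items describe a two-stage pattern: rows are filtered and pre-aggregated in bulk, and only the small partial results reach the final step. The sketch below simulates that split in plain Python. It runs entirely on the CPU and is not PG-Strom code; the function names, the chunk size, and the choice of an average as the aggregate are assumptions made for illustration.

```python
from collections import defaultdict


def partial_aggregate(chunk, predicate):
    """Device-side step (simulated): filter rows, then pre-aggregate per key."""
    partial = defaultdict(lambda: [0, 0.0])  # key -> [row count, sum of values]
    for key, value in chunk:
        if predicate(key, value):  # scan qualifier
            state = partial[key]
            state[0] += 1
            state[1] += value
    return dict(partial)


def final_aggregate(partials):
    """Host-side step: merge the small partial results into per-key averages."""
    merged = defaultdict(lambda: [0, 0.0])
    for partial in partials:
        for key, (count, total) in partial.items():
            merged[key][0] += count
            merged[key][1] += total
    return {key: total / count for key, (count, total) in merged.items()}


def grouped_average(rows, predicate, chunk_size=3):
    """Roughly: SELECT key, avg(value) FROM rows WHERE predicate GROUP BY key."""
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    return final_aggregate(partial_aggregate(c, predicate) for c in chunks)
```

An average is kept as a (count, sum) pair in the partial stage because averages of chunks cannot be merged directly, whereas counts and sums can.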

PG-Strom: Limits
- Latency: 0.2-0.3 sec to initialize the GPU device
- Max. concurrent sessions: up to 3-5
- Database size: 10 GB of data in the shared buffer of PostgreSQL, or the disk cache of the operating system
- Tip: Use pg_prewarm
See http://strom.kaigai.gr.jp/install.html
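The initialization latency implies a break-even point: offloading only pays off for queries that run long enough on the CPU. As a rough worked estimate, assume a fixed initialization cost t_init and a hypothetical speedup factor s for the offloaded work. Offloading is faster when t_init + T/s < T, that is, when T > t_init * s / (s - 1). The model and both function names below are assumptions, not measurements from the talk.

```python
def gpu_runtime(cpu_seconds, speedup, init_seconds=0.3):
    """Estimated GPU runtime: fixed device initialization plus accelerated work."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return init_seconds + cpu_seconds / speedup


def break_even_cpu_seconds(speedup, init_seconds=0.3):
    """CPU runtime above which offloading is faster under this simple model."""
    if speedup <= 1:
        raise ValueError("no break-even point unless speedup > 1")
    return init_seconds * speedup / (speedup - 1)
```

For an assumed speedup of 3 and 0.3 sec of initialization, the break-even CPU runtime under this model is about 0.45 sec; shorter queries would be better left on the CPU.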

PG-Strom: Performance
Estimations:
- RDBMS GPU: factor 3
- Columnar In-Memory: factor 10
- Pure GPU: factor 100
Benchmarks:
- See next slides
- See Seminar, 22 January 2018, 14-16h, HSR Rapperswil

PG-Strom: Further development
Version 1.x:
- More concurrent sessions
- Data size: SSD collaboration feature at v2.0
- PostGIS?
Where is it compared to the Reference Architecture?

GPU Databases - public presentations in the Database Systems seminar at HSR
Master of Science in Engineering (MSE)

Seminar
SW:
- PG-Strom 1.0 / PostgreSQL 9.5
- MapD Open Source Edition
- PostgreSQL 10, tuned
HW:
- Commodity server ("pizza box")
- IBM Power8 server ("pizza box")
Data, Benchmarks, Docker-Files: rDatenbanksystemeHS1718

Seminar
Benchmarks:
- Cold start: PG-Strom, MapD, PostgreSQL (3x)
- Warm start: PG-Strom, MapD, PostgreSQL (3x)
Presentations:
- 4 students
- Spoken in German, report in English
Final (public) presentations:
- 22 January 2018, 14-16h
- HSR Rapperswil, Room 8.125
- Registration: http://techup.ch/tag/htap
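A cold-start versus warm-start comparison can be driven by a small timing harness like the sketch below. It is not the seminar's benchmark code: `run_query` and `reset_caches` are hypothetical callables supplied by the caller (for example, a function that sends one query to the database and one that restarts the server or drops caches), and reading "(3x)" as three repetitions per measurement is an assumption.

```python
import statistics
import time


def benchmark(run_query, reset_caches=None, repetitions=3):
    """Time run_query() several times; return the individual timings in seconds.

    If reset_caches is given, it is called before every run (cold start).
    Otherwise one untimed warm-up run precedes the measurements (warm start).
    """
    if reset_caches is None:
        run_query()  # untimed warm-up so caches are populated
    timings = []
    for _ in range(repetitions):
        if reset_caches is not None:
            reset_caches()
        started = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - started)
    return timings


def summarize(timings):
    """Report the median, which is less sensitive to a single outlier run."""
    return statistics.median(timings)
```

Calling `benchmark(run_query)` gives warm-start timings, and `benchmark(run_query, reset_caches=...)` gives cold-start timings for the same query.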

Discussion
Credits: Kohei KaiGai
Stefan Keller
Geometa Lab at Institute for Software
HSR Hochschule für Technik Rapperswil
www.hsr.ch/geometalab
@sfkeller
