Presto: The Definitive Guide


Presto: The Definitive Guide
SQL at Any Scale, on Any Storage, in Any Environment
Matt Fuller, Manfred Moser & Martin Traverso


Praise for Presto: The Definitive Guide

This book provides a great introduction to Presto and teaches you everything you need to know to start your successful usage of Presto.
- Dain Sundstrom and David Phillips, Creators of the Presto Projects and Founders of the Presto Software Foundation

Presto plays a key role in enabling analysis at Pinterest. This book covers the Presto essentials, from use cases through how to run Presto at massive scale.
- Ashish Kumar Singh, Tech Lead, Bigdata Query Processing Platform, Pinterest

Presto has set the bar in both community-building and technical excellence for lightning-fast analytical processing on stored data in modern cloud architectures. This book is a must-read for companies looking to modernize their analytics stack.
- Jay Kreps, Cocreator of Apache Kafka, Cofounder and CEO of Confluent

Presto has saved us all, both in academia and industry, countless hours of work, allowing us all to avoid having to write code to manage distributed query processing. We’re so grateful to have a high-quality open source distributed SQL engine to start from, enabling us to focus on innovating in new areas instead of reinventing the wheel for each new distributed data system project.
- Daniel Abadi, Professor of Computer Science, University of Maryland, College Park


Presto: The Definitive Guide
by Matt Fuller, Manfred Moser, and Martin Traverso

Copyright 2020 Matt Fuller, Martin Traverso, and Simpligility Technologies Inc. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisition Editor: Jonathan Hassell
Development Editor: Michele Cronin
Production Editor: Elizabeth Kelly
Copyeditor: Sharon Wilkey
Proofreader: Piper Editorial
Indexer: Potomac Indexing, LLC
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

April 2020: First Edition

Revision History for the First Edition
2020-04-03: First release

See http://oreilly.com/catalog/errata.csp?isbn=9781492044277 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Presto: The Definitive Guide, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the authors, and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

This work is part of a collaboration between O’Reilly and Starburst. See our statement of editorial independence.

978-1-49208-403-7
[LSI]

Table of Contents

Foreword
Preface

Part I. Getting Started with Presto

1. Introducing Presto
  The Problems with Big Data
  Presto to the Rescue
  Designed for Performance and Scale
  SQL-on-Anything
  Separation of Data Storage and Query Compute Resources
  Presto Use Cases
  One SQL Analytics Access Point
  Access Point to Data Warehouse and Source Systems
  Provide SQL-Based Access to Anything
  Federated Queries
  Semantic Layer for a Virtual Data Warehouse
  Data Lake Query Engine
  SQL Conversions and ETL
  Better Insights Due to Faster Response Times
  Big Data, Machine Learning, and Artificial Intelligence
  Other Use Cases
  Presto Resources
  Website
  Documentation
  Community Chat
  Source Code, License, and Version
  Contributing
  Book Repository
  Iris Data Set
  Flight Data Set
  A Brief History of Presto
  Conclusion

2. Installing and Configuring Presto
  Trying Presto with the Docker Container
  Installing from Archive File
  Java Virtual Machine
  Python
  Installation
  Configuration
  Adding a Data Source
  Running Presto
  Conclusion

3. Using Presto
  Presto Command-Line Interface
  Getting Started
  Pagination
  History
  Additional Diagnostics
  Executing Queries
  Output Formats
  Ignoring Errors
  Presto JDBC Driver
  Downloading and Registering the Driver
  Establishing a Connection to Presto
  Presto and ODBC
  Client Libraries
  Presto Web UI
  SQL with Presto
  Concepts
  First Examples
  Conclusion

Part II. Diving Deeper into Presto

4. Presto Architecture
  Coordinator and Workers in a Cluster
  Coordinator
  Discovery Service
  Workers
  Connector-Based Architecture
  Catalogs, Schemas, and Tables
  Query Execution Model
  Query Planning
  Parsing and Analysis
  Initial Query Planning
  Optimization Rules
  Predicate Pushdown
  Cross Join Elimination
  TopN
  Partial Aggregations
  Implementation Rules
  Lateral Join Decorrelation
  Semi-Join (IN) Decorrelation
  Cost-Based Optimizer
  The Cost Concept
  Cost of the Join
  Table Statistics
  Filter Statistics
  Table Statistics for Partitioned Tables
  Join Enumeration
  Broadcast Versus Distributed Joins
  Working with Table Statistics
  Presto ANALYZE
  Gathering Statistics When Writing to Disk
  Hive ANALYZE
  Displaying Table

5. Production-Ready Deployment
  Configuration Details
  Server Configuration
  Logging
  Node Configuration
  JVM Configuration
  Launcher
  Cluster Installation
  RPM Installation
  Installation Directory Structure
  Configuration
  Uninstall Presto
  Installation in the Cloud
  Cluster Sizing Considerations
  Conclusion

6. Connectors
  Configuration
  RDBMS Connector Example PostgreSQL
  Query Pushdown
  Parallelism and Concurrency
  Other RDBMS Connectors
  Security
  Presto TPC-H and TPC-DS Connectors
  Hive Connector for Distributed Storage Data Sources
  Apache Hadoop and Hive
  Hive Connector
  Hive-Style Table Format
  Managed and External Tables
  Partitioned Data
  Loading Data
  File Formats and Compression
  MinIO Example
  Non-Relational Data Sources
  Presto JMX Connector
  Black Hole Connector
  Memory Connector
  Other

7. Advanced Connector Examples
  Connecting to HBase with Phoenix
  Key-Value Store Connector Example: Accumulo
  Using the Presto Accumulo Connector
  Predicate Pushdown in Accumulo
  Apache Cassandra Connector
  Streaming System Connector Example: Kafka
  Document Store Connector Example: Elasticsearch
  Overview
  Configuration and Usage
  Query Processing
  Full-Text Search
  Summary
  Query Federation in Presto
  Extract, Transform, Load and Federated Queries
  Conclusion

8. Using SQL in Presto
  Presto Statements
  Presto System Tables
  Catalogs
  Schemas
  Information Schema
  Tables
  Table and Column Properties
  Copying an Existing Table
  Creating a New Table from Query Results
  Modifying a Table
  Deleting a Table
  Table Limitations from Connectors
  Views
  Session Information and Configuration
  Data Types
  Collection Data Types
  Temporal Data Types
  Type Casting
  SELECT Statement Basics
  WHERE Clause
  GROUP BY and HAVING Clauses
  ORDER BY and LIMIT Clauses
  JOIN Statements
  UNION, INTERSECT, and EXCEPT Clauses
  Grouping Operations
  WITH Clause
  Subqueries
  Scalar Subquery
  EXISTS Subquery
  Quantified Subquery
  Deleting Data from a

9. Advanced SQL
  Functions and Operators Introduction
  Scalar Functions and Operators
  Boolean Operators
  Logical Operators
  Range Selection with the BETWEEN Statement
  Value Detection with IS (NOT) NULL
  Mathematical Functions and Operators
  Trigonometric Functions
  Constant and Random Functions
  String Functions and Operators
  Strings and Maps
  Unicode
  Regular Expressions
  Unnesting Complex Data Types
  JSON Functions
  Date and Time Functions and Operators
  Histograms
  Aggregate Functions
  Map Aggregate Functions
  Approximate Aggregate Functions
  Window Functions
  Lambda Expressions
  Geospatial Functions
  Prepared Statements
  Conclusion

Part III. Presto in Real-World Uses

10. Security
  Authentication
  Password and LDAP Authentication
  Authorization
  System Access Control
  Connector Access Control
  Encryption
  Encrypting Presto Client-to-Coordinator Communication
  Creating Java Keystores and Java Truststores
  Encrypting Communication Within the Presto Cluster
  Certificate Authority Versus Self-Signed Certificates
  Certificate Authentication
  Kerberos
  Prerequisites
  Kerberos Client Authentication
  Cluster Internal Kerberos
  Data Source Access and Configuration for Security
  Kerberos Authentication with the Hive Connector
  Hive Metastore Thrift Service Authentication
  HDFS Authentication
  Cluster

11. Integrating Presto with Other Tools
  Queries, Visualizations, and More with Apache Superset
  Performance Improvements with RubiX
  Workflows with Apache Airflow
  Embedded Presto Example: Amazon Athena
  Starburst Enterprise Presto
  Other Integration Examples
  Custom Integrations
  Conclusion

12. Presto in Production
  Monitoring with the Presto Web UI
  Cluster-Level Details
  Query List
  Query Details View
  Tuning Presto SQL Queries
  Memory Management
  Task Concurrency
  Worker Scheduling
  Scheduling Splits per Task and per Node
  Local Scheduling
  Network Data Exchange
  Concurrency
  Buffer Sizes
  Tuning Java Virtual Machine
  Resource Groups
  Resource Group Definition
  Scheduling Policy
  Selector Rules

13. Real-World Examples
  Deployment and Runtime Platforms
  Cluster Sizing
  Hadoop/Hive Migration Use Case
  Other Data Sources
  Users and Traffic
  Conclusion

14. Conclusion

Index

Foreword

What a tremendous ride it has been so far! Looking back at the time when we started the Presto project at Facebook in 2012, we certainly thought that we were going to create something useful. We always planned to have a successful open source project and community, and we released Presto in 2013 under the Apache License.

How far Presto has come since then, however, is beyond what we imagined. We are proud of the project community’s accomplishments, but, more importantly, we are very humbled by all the positive feedback and help we have received.

Presto has grown tremendously and provided a lot of value to its large community of users. You can find fellow Presto community members across the globe, and developers in Brazil, Canada, China, Germany, India, Israel, Japan, Poland, Singapore, the United States, the United Kingdom, and other countries.

Launching the Presto Software Foundation in early 2019 was another major milestone. The not-for-profit organization is dedicated to the advancement of the Presto open source distributed SQL engine. The foundation is committed to ensuring that the project remains open, collaborative, and independent for decades to come.

Now, about one year after the launch of the foundation, we can look back at an accelerated rate of impressive contributions from a larger community.

We are pleased that Matt, Manfred, and Martin created this book about Presto with the help of O’Reilly. It provides a great introduction to Presto and teaches you everything you need to know to start using it successfully.

Enjoy the journey into the depths of Presto and the related world of business intelligence, reporting, dashboard creation, data warehousing, data mining, machine learning, and beyond.

Of course, make sure to dive into the additional resources and help we offer on the Presto website at https://prestosql.io, the community chat, the source repository, and beyond.

Welcome to the Presto community!

- Dain Sundstrom and David Phillips
Creators of the Presto Projects and Founders of the Presto Software Foundation

Preface

About the Book

Presto: The Definitive Guide is the first and foremost book about the Presto distributed query engine. The book is aimed at beginners and existing users of Presto alike. Ideally, you have some understanding of databases and SQL, but if not, you can divert from reading and look things up while working your way through this book. No matter your level of expertise, we are sure that you’ll learn something new from this book.

The first part of the book introduces you to Presto and then helps you get up and running quickly so you can start learning how to use it. This includes installation and first use of the command-line interface as well as many client- and web-based applications, such as SQL database management or dashboard and reporting tools, using the JDBC driver.

The second part of the book advances your knowledge and includes details about the Presto architecture, cluster deployment, many connectors to data sources, and a lot of information about the main power of Presto: querying any data source with SQL.

The third part of the book rounds out the content with further aspects you need to know when running and using a production Presto deployment. This includes Web UI usage, security configuration, and some discussion of real-world uses of Presto in other organizations.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
  Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
  Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
  Shows commands or other text that should be typed literally by the user.

Constant width italic
  Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.

Code Examples, Permissions, and Attribution

Supplemental material for the book is documented in greater detail in “Book Repository”.

If you have a technical question, or a problem using the code examples, please contact us on the community chat (see “Community Chat”) or file issues on the book repository.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Presto: The Definitive Guide by Matt Fuller, Manfred Moser, and Martin Traverso (O’Reilly). Copyright 2020 Matt Fuller, Martin Traverso, and Simpligility Technologies Inc., 978-1-492-04427-7.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200 other publishers. For more information, visit http://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/PrestoTDG.

Email bookquestions@oreilly.com to comment or ask technical questions about this book.

To learn more about our books, courses, and news, visit http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube:

Acknowledgments

We would like to thank everyone in the larger Presto community for using Presto, spreading the word, helping other users, contributing to the project, and even committing to the code or documentation. We are excited to be part of the community and look forward to many shared successes in the future.

A critical part of the Presto community is Starburst. We want to thank everyone at Starburst for their help and really appreciate the work, resources, stability, and support Starburst provides to the project, its customers using Presto, and the authors, who are part of the Starburst team.

Specifically related to the book, we would like to thank everyone who helped us with ideas, input, and reviews, including the following, probably incomplete list of people: Anu Sundarsan, Dain Sundstrom, David Phillips, Grzegorz Kokosiński, Jeffrey Breen, Jess Iandiorio, Justin Borgman, Kamil Bajda-Pawlikowski, Karol Sobczak, Kevin Kline, Megan Sifferlen, Neeraj Soparawala, Piotr Findeisen, Raghav Sethi, Thomas Nield, Tom Nats, Will Morrison, and Wojciech Biela.

In addition, the authors want to express their personal gratitude:

Matt would like to thank his wife, Meghan, and his three children, Emily, Hannah, and Liam, for their patience and encouragement while Matt worked on the book. The kids’ excitement about their dad becoming an “author” helped Matt through many long weekends and late nights.

Manfred would like to thank his wife, Yen, and his three sons, Lukas, Nikolas, and Tobias, not only for putting up with the tech-mumbo-jumbo but also for genuinely sharing an interest and passion for technology, writing, learning, and teaching.

Martin would like to thank his wife, Melina, and his four children, Marcos, Victoria, Joaquin, and Martina, for their support and enthusiasm over the past seven years of working on Presto.

PART I
Getting Started with Presto

Presto is a SQL query engine enabling SQL access to any data source. You can use Presto to query very large data sets by horizontally scaling the query processing.

In this first part, you learn about Presto and its use cases. Then you move on to get a simple Presto installation up and running. And finally, you learn about the tools you can use to connect to Presto and query the data. You get to concentrate on a minimal setup so you can start using Presto successfully as quickly as possible.

CHAPTER 1
Introducing Presto

So you heard of Presto and found this book. Or maybe you are just browsing this first section and wondering whether you should dive in. In this introductory chapter, we discuss the problems you may be encountering with the massive growth of data creation, and the value locked away within that data. Presto is a key enabler to working with all the data and providing access to it with proven successful tools around Structured Query Language (SQL).

The design and features of Presto enable you to get better insights, beyond those accessible to you now. You can gain these insights faster, as well as get information that you could not get in the past because it cost too much or took too long to obtain. And for all that, you end up using fewer resources and therefore spending less of your budget, which you can then use to learn even more!

We also point you to more resources beyond this book but, of course, we hope you join us here first.

The Problems with Big Data

Everybody is capturing more and more data from device metrics, user behavior tracking, business transactions, location data, software and system testing procedures and workflows, and much more. The insights gained from understanding that data and working with it can make or break the success of any initiative, or even a company.

At the same time, the diversity of storage mechanisms available for data has exploded: relational databases, NoSQL databases, document databases, key-value stores, object storage systems, and so on. Many of them are necessary in today’s organizations, and it is no longer possible to use just one of them. As you can see in Figure 1-1, dealing with this can be a daunting task that feels overwhelming.

Figure 1-1. Big data can be overwhelming

In addition, all these different systems do not allow you to query and inspect the data with standard tools. Different query languages and analysis tools for niche systems are everywhere. Meanwhile, your business analysts are used to the industry standard, SQL. A myriad of powerful tools rely on SQL for analytics, dashboard creation, rich reporting, and other business intelligence work.

The data is distributed across various silos, and some of them cannot even be queried at the necessary performance for your analytics needs. Other systems, unlike modern cloud applications, store data in monolithic systems that cannot scale horizontally. Without these capabilities, you are narrowing the number of potential use cases and users, and therefore the usefulness of the data.

The traditional approach of creating and maintaining large, dedicated data warehouses has proven to be very expensive in organizations across the globe. Most often, this approach is also found to be too slow and cumbersome for many users and usage patterns.

You can see the tremendous opportunity for a system to unlock all this value.

Presto to the Rescue

Presto is capable of solving all these problems, and of unlocking new opportunities with federated queries to disparate systems, parallel queries, horizontal cluster scaling, and much more. You can see the Presto project logo in Figure 1-2.

Figure 1-2. Presto logo

Presto is an open source, distributed SQL query engine. It was designed and written from the ground up to efficiently query data against disparate data sources of all sizes, ranging from gigabytes to petabytes. Presto breaks the false choice between having fast analytics using an expensive commercial solution, or using a slow “free” solution that requires excessive hardware.

Designed for Performance and Scale

Presto is a tool designed to efficiently query vast amounts of data by using distributed execution. If you have terabytes or even petabytes of data to query, you are likely using tools such as Apache Hive that interact with Hadoop and its Hadoop Distributed File System (HDFS). Presto is designed as an alternative to these tools to more efficiently query that data.

Analysts, who expect SQL response times from milliseconds for real-time analysis to seconds and minutes, should use Presto. Presto supports SQL, commonly used in data warehousing and analytics for analyzing data, aggregating large amounts of data, and producing reports. These workloads are often classified as online analytical processing (OLAP).

Even though Presto understands and can efficiently execute SQL, Presto is not a database, as it does not include its own data storage system. It is not meant to be a general-purpose relational database that serves to replace Microsoft SQL Server, Oracle Database, MySQL, or PostgreSQL. Further, Presto is not designed to handle online transaction processing (OLTP). This is also true of other databases designed and optimized for data warehousing or analytics, such as Teradata, Netezza, Vertica, and Amazon Redshift.

Presto leverages both well-known and novel techniques for distributed query processing. These techniques include in-memory parallel processing, pipelined execution across nodes in the cluster, a multithreaded execution model to keep all the CPU cores busy, efficient flat-memory data structures to minimize Java garbage collection, and Java bytecode generation. A detailed description of these complex Presto internals is beyond the scope of this book. For Presto users, these techniques translate into faster insights into your data at a fraction of the cost of other solutions.

SQL-on-Anything

Presto was initially designed to query data from HDFS. And it can do that very efficiently, as you learn later. But that is not where it ends. On the contrary, Presto is a query engine that can query data from object storage, relational database management systems (RDBMSs), NoSQL databases, and other systems, as shown in Figure 1-3.

Presto queries data where it lives and does not require a migration of data to a single location. So Presto allows you to query data in HDFS and other distributed object storage systems. It allows you to query RDBMSs and other data sources. As such, it can really query data wherever it lives and therefore be a replacement for the traditional, expensive, and heavy extract, transform, and load (ETL) processes. Or at a minimum, it can help you with them and lighten the load. So Presto is clearly not just another SQL-on-Hadoop solution.

Figure 1-3. SQL support for a variety of data sources with Presto

Object storage systems include Amazon Web Services (AWS) Simple Storage Service (S3), Microsoft Azure Blob Storage, Google Cloud Storage, and S3-compatible storage such as MinIO and Ceph. Presto can query traditional RDBMSs such as Microsoft SQL Server, PostgreSQL, MySQL, Oracle, Teradata, and Amazon Redshift. Presto can also query NoSQL systems such as Apache Cassandra, Apache Kafka, MongoDB, or Elasticsearch. Presto can query virtually anything and is truly a SQL-on-Anything system.

For users, this means that suddenly they no longer have to rely on specific query languages or tools to interact with the data in those specific systems. They can simply leverage Presto and their existing SQL skills and their well-understood analytics, dashboarding, and reporting tools. These tools, built on top of using SQL, allow analysis of those additional data sets, which are otherwise locked in separate systems. Users can even use Presto to query across different systems with the SQL they know.
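To make the idea of querying across separate systems with one SQL statement more concrete, here is a minimal sketch in Python. It does not use Presto at all: as a stand-in, it attaches two independent in-memory SQLite databases to a single connection and joins across them. The database names (sales, crm), the tables, and the sample rows are all hypothetical and chosen only for illustration. In Presto the two sides of the join would typically live in entirely different systems, each exposed through its own catalog, while the SQL would look much the same.

```python
import sqlite3

# Analogy only: two separate in-memory SQLite databases stand in for two
# independent source systems. Presto itself is not involved here.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS sales")  # hypothetical "orders" system
conn.execute("ATTACH DATABASE ':memory:' AS crm")    # hypothetical "customers" system

# Each system holds its own table; no data is copied between them.
conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO sales.orders VALUES (?, ?)",
    [(1, 20.0), (1, 5.0), (2, 12.5)],
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)

# One SQL statement joins data that lives in both systems.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM sales.orders AS o
    JOIN crm.customers AS c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

print(rows)
conn.close()
```

The point of the sketch is the shape of the query: the analyst writes an ordinary join and qualifies each table with the system it comes from, instead of exporting data from one system and loading it into the other first.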

Separation of Data Storage and Query Compute Resources

Presto is not a database with storage; rather, it simply queries data where it lives. When using Presto, storage and compute are decoupled and can be scaled independently. Presto represents the compute layer, whereas the underlying data sources represent the storage layer.

This allows Presto to scale up and down its compute resources for query processing, based on analytics
