Mastering Apache Pulsar


Mastering Apache Pulsar
Cloud Native Event Streaming at Scale
Jowanza Joseph

Mastering Apache Pulsar

Every enterprise application creates data, including log messages, metrics, user activity, and outgoing messages. Learning how to move these items is almost as important as the data itself. If you’re an application architect, developer, or production engineer new to Apache Pulsar, this practical guide shows you how to use this open source event streaming platform to handle real-time data feeds.

Jowanza Joseph, staff software engineer at Finicity, explains how to deploy production Pulsar clusters, write reliable event streaming applications, and build scalable real-time data pipelines with this platform. Through detailed examples, you’ll learn Pulsar’s design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the load manager, and the storage layer.

This book helps you:
• Understand how event streaming fits in the big data ecosystem
• Explore Pulsar producers, consumers, and readers for writing and reading events
• Build scalable data pipelines by connecting Pulsar with external systems
• Simplify event-streaming application building with Pulsar Functions
• Manage Pulsar to perform monitoring, tuning, and maintenance tasks
• Use Pulsar’s operational measurements to secure a production cluster
• Process event streams using Flink and query event streams using Presto

DATA
US $69.99   CAN $92.99
ISBN: 978-1-492-08490-7

“Knowing when and how you want to use Pulsar takes experience. Jowanza Joseph clearly has competence borne of it. Reading his book shrinks the amount of time required to build your own practical Pulsar applications and deployments.”
Johnny Nelson, Senior Machine Learning Engineer, @generativist

Jowanza Joseph is a software engineer who leads mesh development on Finicity's Open Banking Platform. He's used Apache Pulsar on several projects to process billions of messages per day on fully managed messaging and stream processing platforms. Jowanza has also worked with streaming and messaging technologies such as Apache Kafka, Akka, and Kubernetes at companies including Pluralsight for nearly a decade. He's given talks at Strange Loop, Abstractions, Open Source Summit, and O'Reilly's Strata Data & AI Conference.


Mastering Apache Pulsar
Cloud Native Event Streaming at Scale
Jowanza Joseph
Beijing  Boston  Farnham  Sebastopol  Tokyo

Mastering Apache Pulsar
by Jowanza Joseph

Copyright 2022 Jowanza Joseph. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Jessica Haberman
Development Editor: Angela Rufino
Production Editor: Christopher Faucher
Copyeditor: Audrey Doyle
Proofreader: Tom Sullivan
Indexer: Judith McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

December 2021: First Edition

Revision History for the First Edition
2021-12-06: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781492084907 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Mastering Apache Pulsar, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

This work is part of a collaboration between O’Reilly and StreamNative. See our statement of editorial independence.

978-1-098-11364-3
[LSI]

Table of Contents

Preface

1. The Value of Real-Time Messaging
   Data in Motion
   Resource Efficiency
   Interesting Applications
   Banking
   Medical
   Security
   Internet of Things
   Summary

2. Event Streams and Event Brokers
   Publish/Subscribe
   Queues
   Failure Modes
   Push Versus Poll
   The Need for

3. Pulsar
   Origins of Pulsar
   Pulsar Design Principles
   Pulsar Ecosystem
   Pulsar Functions
   Pulsar IO
   Pulsar SQL
   Pulsar Success Stories
   Yahoo!

4. Pulsar Internals
   Brokers
   Message Cache
   BookKeeper and ZooKeeper Communication
   Schema Validation
   Inter-Broker Communication
   Pulsar Functions and Pulsar IO
   Apache BookKeeper
   Write-Ahead Logging
   Message Storing
   Object/Blob Storage
   Pravega
   Majordodo
   Apache ZooKeeper
   Naming Service
   Configuration Management
   Leader Election
   Notification System
   Apache Kafka
   Apache Druid
   Pulsar Proxy
   Java Virtual Machine (JVM)
   Netty
   Apache Spark
   Apache Lucene
   Summary

5. Consumers
   What Does It Mean to Be a Consumer?
   Subscriptions
   Exclusive
   Shared
   Key Shared
   Failover
   Acknowledgments
   Individual Ack
   Cumulative Ack
   Schemas
   Consumer Schema Management
   Consumption Modes
   Batching
   Chunking
   Advanced Configuration
   Delayed Messages
   Retention Policy
   Backlog Quota
   Configuring a Consumer
   Replay
   Dead Letter Topics
   Retry Letter

6. Producers
   Synchronous Producers
   Asynchronous Producers
   Producer Routing
   Round-Robin Routing
   Single Partition Routing
   Custom Partition Routing
   Producer
   essionType
   Schema on Write
   Using the Schema Registry
   Nonpersistent Topics
   Use Cases
   Using Nonpersistent

7. Pulsar IO
   Pulsar IO Architecture
   Runtime
   Performance Considerations
   Use Cases
   Simple Event Processing Pipelines
   Change Data Capture
   Considerations
   Message Serialization
   Pipeline Stability
   Failure Handling
   Examples
   Elasticsearch
   Netty
   Writing Your

8. Pulsar Functions
   Stream Processing
   Pulsar Functions Architecture
   Runtime
   Isolation
   Isolation with Kubernetes Function Deployments
   Use Cases
   Creating Pulsar Functions
   Simple Event Processing
   Topic Hygiene
   Topic Accounting
   Summary

9. Tiered Storage
   Storing Data in the Cloud
   Object Storage
   Use Cases
   Replication
   CQRS
   Disaster Recovery
   Offloading Data
   Pulsar Offloaders
   Retrieving Offloaded Data
   Interacting with Object Store Data
   Repopulating Topics
   Utilizing Pulsar

10. Pulsar SQL
   Streams as Tables
   SQL-on-Anything Engines
   Apache Flink: An Alternative Perspective
   Presto/Trino
   How Pulsar SQL Works
   Configuring Pulsar SQL
   Performance Considerations
   Summary

11. Deploying Pulsar
   Docker
   Bare Metal
   Minimum Requirements
   Getting Started
   Deploying ZooKeeper
   Starting BookKeeper
   Starting Pulsar
   Public Cloud Providers
   AWS
   Azure
   Google Cloud Platform
   Kubernetes
   Summary

12. Operating Pulsar
   Apache BookKeeper Metrics
   Server Metrics
   Journal Metrics
   Storage Metrics
   Apache ZooKeeper Metrics
   Server Metrics
   Request Metrics
   Topic Metrics
   Consumer Metrics
   Pulsar Transaction Metrics
   Pulsar Function Metrics
   Advanced Operating Techniques
   Interceptors and Tracing
   Pulsar SQL Metrics
   Metrics

13. The Future
   Programming Language Support
   Extension Interface
   Enhancements to Pulsar Functions
   Architectural Simplification/Expansion
   Messaging Platform Bridges
   Summary

A. Pulsar Admin API
B. Pulsar Admin CLI
C. Geo-Replication
D. Security, Authentication, and Authorization in Pulsar

Index

Preface

Why I Wrote This Book

Throughout my career, I’ve been tasked with learning complex systems as part of my job. Early on I had to learn how to write MapReduce jobs and understand the intricacies of the Hadoop Distributed File System (HDFS) and the Hadoop ecosystem; years later I learned early versions of Apache Spark. Today I’m still tasked with learning about complex systems for my job. Over the years, well-written technical blog posts, articles, and books have been instrumental in my ability to learn and to apply what I learn at work. With this book, I sought to create a resource that would provide a thorough explanation of the value of Apache Pulsar and that could be long-lasting and fun.

Alongside Apache Pulsar as a technology, with its trade-offs and considerations, is a broader ecosystem and set of ideas around event streaming. This book provides a nurturing environment to work through the event streams paradigm and provides the reader with context and a road map for adopting event streaming architectures.

Who This Book Is For

This book is targeted at two audiences: those who want to learn about Apache Pulsar and those who are curious about event streaming architectures. For the first audience, this book provides a thorough overview of Apache Pulsar, all of its components, and code samples for getting started with Pulsar and its ecosystem. For the second audience, it serves as a primer for adding Apache Pulsar, Apache Kafka, or another event streaming technology to your architecture.

How I Organized This Book

I spend Chapters 1 through 3 explaining the motivation for Apache Pulsar and the rise of event streams, as well as providing the reader with more supporting content. In Chapters 4 through 10, I dive deep into the internals of Pulsar, component by component, to give the reader a complete understanding of how Pulsar works. I finish the book by focusing on the operational considerations of Pulsar. Chapters 11 and 12 take a detailed look at deploying Pulsar and operating Pulsar in production. These chapters dive deeper into what Pulsar looks like when deployed on systems like Kubernetes and what metrics are available for use as an operator. In Chapter 13, I imagine what the future of Pulsar will look like in 3 to 5 years, including ways the project can expand to meet the growing needs of the community. Finally, Appendices A through D cover topics like Admin APIs, Security, and Geo-Replication. I believe that organizing the book in this way will give the reader the best experience reading the book end to end as well as using it as a reference manual if needed.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
    Shows commands or other text that should be typed literally by the user.

Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at http://www.github.com/josep2.

If you have a technical question or a problem using the code examples, please send email to bookquestions@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Mastering Apache Pulsar by Jowanza Joseph (O’Reilly). Copyright 2022 Jowanza Joseph, 978-1-098-11364-3.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200 other publishers. For more information, visit http://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/mastering-apache-pulsar.

Email bookquestions@oreilly.com to comment or ask technical questions about this book.

For news and information about our books and courses, visit http://oreilly.com.

Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://youtube.com/oreillymedia

Acknowledgments

First, I want to thank the creators and maintainers of the open source Apache Pulsar project. Their work brought the project to fruition, and without it, this book would not exist. I also want to thank the editorial and content acquisition teams at O’Reilly. They challenged me to make this book as good as possible, and it would only be a fraction as good without their work. I would also like to thank my wife, Bethany. She provided all the illustrations for this book and supported me through the year I spent writing it. Finally, I’d like to thank the technical editors who provided the invaluable feedback that made this book possible.

CHAPTER 1
The Value of Real-Time Messaging

Real-time messaging systems power many of the systems we rely on today for banking, food, transportation, internet service, and communication, among others. They provide the infrastructure to make many of the daily interactions we have with these systems seem like magic. Apache Pulsar is one of these real-time messaging systems, and throughout this book, we’ll dig into what makes it unique. But first we’ll discuss the motivation for building systems like Pulsar so that you have some context for why we would take on the complexity of a real-time system with all of its moving parts.

Data in Motion

When I was 11 years old, I started a small business selling trading cards in my school cafeteria. The business model was simple: I bought packs of Pokémon cards, figured out which ones were the most valuable through cross-checking on internet forums, and then attempted to sell them to other kids during lunch break. What started as an exciting and profitable venture soon turned into a crowded space with many other children entering the market and trying to make some spending money of their own. My daily take-home profit dropped from about $1 to 25 cents, and I thought about quitting the business. I talked to my stepfather about it one evening over dinner, looking for advice from someone who ran a small business too (although one that was much more profitable than mine). After listening to me intently, he absorbed what I said and took a deep breath. He explained that I needed a competitive advantage, something that would make me stick out in a space that was crowded with many other kids. I asked him what kinds of things would give me a competitive advantage, and he chuckled. He told me I needed to figure it out myself, and that when I did, I should come back and talk to him.

For weeks I puzzled over what I could be doing differently. Day after day I watched other children transact in the school cafeteria, and nothing came to me immediately. One day I talked to my friend Edgar, who watched all the Pokémon card transactions more intently than I did. I asked him what he was looking at, and he explained that he was keeping track of all the cards sold that day. He walked from table to table, holding a ledger (see Figure 1-1) and recording all the transactions he witnessed. Edgar let me look through his notebook, and I saw weeks’ worth of Pokémon card transactions. That’s when it dawned on me that I could use the data he collected to augment my selling strategy and figure out where there was an unmet demand for cards! I told Edgar to meet me after school to talk about the next steps and a business partnership.

Figure 1-1. Edgar’s ledger included the price of each card sold.

When school was out, Edgar and I met and came up with a game plan. I pulled out all of my cards, and we went through them and painstakingly created an inventory sheet. I cross-referenced my inventory with the sales Edgar had collected in his notebook. With this information, I felt confident we could be competitive with the other kids and undercut them where it made sense. Thanks to our new inventory and pricing model, Edgar and I spent the next three weeks selling lots of cards and making some money. But although during that time our daily profit slowly rose from 25 cents to around 50 cents, we still weren’t making my original profit of $1 per day, and now we had to share the revenues, which meant we were working much harder and making less money. We decided something had to give, and there had to be another way to make this process easier.

When Edgar and I talked about the limitations of our business, one aspect stuck out. There were only two of us, but there were five tables where kids sold Pokémon cards. Frequently, we would begin selling at the wrong table. Our cards were not the cards the kids at the table wanted to buy. We would miss out on the market opportunity for the day, and often for a week or more, while waiting for new customers. We needed a way to be at all five tables at once. Furthermore, we needed a way to communicate with each other in real time across the tables. Edgar and I schemed for a few days and came up with the plan depicted in Figure 1-2.

Figure 1-2. A diagram of our card-selling scheme. At each table, one member of our company had a walkie-talkie and we communicated the prices of transactions to one another over the walkie-talkie.

We recruited three other students who were trying to break into the Pokémon card selling market. We split the cards we wanted to sell evenly among the five of us, and each of us went to one of the five cafeteria tables attempting to sell the cards in our hand. Each of us also had a notebook and a walkie-talkie. When one of us overheard another kid negotiating the sale of one of their cards, that person would communicate the information to the other four in our group. We would all keep the same ledger of prices, and if someone in our group had the card of interest, the person at that table would offer it to the buyer at a lower price. With this strategy, we could always undercut the competition, and all five of us had a picture of the Pokémon card market for that day. Thanks to the new company strategy, our Pokémon card profits rose from 50 cents a day to $2.50 a day. Our teachers eventually shut down the business, and I haven’t sold Pokémon cards since.

This story illustrates the value of data in motion. Before we began collecting and broadcasting the Pokémon card sales, the data had little value. It did not have a material impact on our ability to sell cards. Our walkie-talkies and ledgers were a simple system that enabled us to communicate bids and asks in real time across the entire market. Armed with that information, we could make informed decisions about our inventory and sell more cards than we were able to before. While our real-time system only enriched me by a few quarters a day, the system’s principles enable rich experiences throughout modern life.

Resource Efficiency

In my trading card business, one of the company’s significant advantages was the ability to collect data once and share it with everyone in the company. That ability enabled us to take advantage of sales at the cafeteria tables. In other words, it gave each person at a table a global outlook. This global outlook decentralized the information about sales and created redundancy in our network. Commonly, if one member of the crew was writing and missed an update, they could ask everyone else in the company what their current state of affairs was and they would be able to update their outlook.

While my trading card business was small and inconsequential in the larger scheme of things, resource efficiency can be a boon for companies of any size. When you consider modern enterprise, many events happen that have downstream consequences. Consider the simple meetings that every company has. Creating a calendar meeting requires scheduling time on multiple people’s calendars, reserving a room, setting up videoconferencing software, and often, ordering refreshments for attendees. With tools like Google Calendar, we can schedule a meeting with multiple people and coordinate it by simply clicking a few buttons and entering some information into a form (see Figure 1-3). Once that event is created, emails are sent, calendars are tentatively booked, pizza is ordered, and the room is reserved.

Figure 1-3. With event-driven architectures, complex tasks like scheduling meeting invites across multiple participants become much easier.

Without the platforms to manage and choreograph the calendar invite, administrative overhead can grow like a tumor. Administrators would have to make phone calls, collect RSVPs, and put a sticky note on the door of a conference room.
Real-time systems provide value in other systems we use every day, from customer relationship management (CRM) to payroll systems.

Interesting Applications

Resource efficiency is one reason to utilize a messaging system, but an enhanced user experience may ultimately be a more compelling reason. While the software we use serves a utilitarian purpose, enhancing the user experience can make it easier to complete our intended task as well as new and unintended tasks. The user experience can be enhanced through several methods. The most notable are 1) improving the design to make interfaces easier to navigate and 2) doing more on behalf of the user. Exploring the second of these methods, programs that perform on the user’s behalf can quickly and accurately take an everyday experience and turn it into something magical. Consider a program that automatically deposits money into your savings account when there is a credit in your checking account. Each time a check clears, the program uses the balance and other context about the account to deposit a certain amount of money in your savings account. Over time, you save money without ever feeling the pain of saving. Messaging systems are the backbone of systems like this one. In this section we will explore a few examples in more detail and discuss precisely how a messaging platform enables rich user experiences.

Banking

Banks provide the capital that powers much of our economy. To buy a home or car and, in many cases, start a business, you will likely need to borrow money from a bank. If I were to be kind, I would best describe the process of borrowing money from a financial institution as excruciating. In many cases, borrowing money requires that you print out your bank statements so that the bank’s loan officers can get an understanding of your monthly expenses. They may also use these printouts to verify your income and tax returns. In many cases, you may provide bank statements, pay stubs, tax returns, and other documents to prequalify for a loan, and then provide the same copies two months later to get the actual loan. While this sounds superfluous in an era of technology, the bank has good reason to be as thorough and intrusive as possible.

For a bank, lending you six figures’ worth of money comes at considerable risk. By performing extensive checks on your bank statements and other documents, the bank reduces the risk of approving you for a loan. Banks also face significant regulations, and without a good understanding of your credit, they may face loss of licensing for failing to conduct due diligence. To modernize this credit approval system, we need to look at the problem through a slightly different lens.
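One way to picture that different lens is as a stream of events the lender reacts to, rather than a stack of documents collected twice. The following is a minimal Python sketch of the idea and does not use Pulsar's API. The `EventBus` class is a toy in-memory stand-in for a message broker, and the `CreditEvent` fields, the event kinds, the `Lender` class, and the rule that each dollar of new card debt lowers the borrowing limit by ten dollars are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class CreditEvent:
    """One event about a customer's credit activity (field names are hypothetical)."""
    customer_id: str
    kind: str      # "card_charge" or "credit_inquiry" in this sketch
    amount: float  # dollars; 0.0 when the event carries no amount


class EventBus:
    """Toy in-memory stand-in for a message broker: each published event
    is delivered to every subscriber, in subscription order."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[CreditEvent], None]] = []

    def subscribe(self, handler: Callable[[CreditEvent], None]) -> None:
        self._handlers.append(handler)

    def publish(self, event: CreditEvent) -> None:
        for handler in self._handlers:
            handler(event)


class Lender:
    """Tracks how much each prequalified customer may still borrow."""

    # Assumed rule, for illustration only: each dollar of new card debt
    # lowers the borrowing limit by ten dollars.
    DEBT_MULTIPLIER = 10.0

    def __init__(self, prequalified: Dict[str, float]) -> None:
        self.limits = dict(prequalified)
        self.alerts: List[str] = []

    def on_event(self, event: CreditEvent) -> None:
        if event.customer_id not in self.limits:
            return  # not one of this lender's prequalified customers
        if event.kind == "card_charge":
            reduced = self.limits[event.customer_id] - self.DEBT_MULTIPLIER * event.amount
            self.limits[event.customer_id] = max(0.0, reduced)
        elif event.kind == "credit_inquiry":
            self.alerts.append(f"{event.customer_id}: new credit inquiry")


if __name__ == "__main__":
    bus = EventBus()
    lender = Lender({"alice": 300_000.0})
    bus.subscribe(lender.on_event)
    bus.publish(CreditEvent("alice", "card_charge", 2_500.0))
    bus.publish(CreditEvent("alice", "credit_inquiry", 0.0))
    print(lender.limits, lender.alerts)
```

Because the bus delivers each event to every subscriber, further consumers of the same stream could be attached with another `subscribe` call without changing the lender.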

When a customer prequalifies for a loan, the bank agrees it will lend up to a certain dollar amount contingent on the applicant having the same creditworthiness when they are ready to act on the loan. Typically, a software system connected to the bank will send notifications to the customer’s credit card companies for predetermined events (such as checking a customer’s credit). The bank is notified in real time if the customer does anything that will jeopardize the closing of the loan. Also, based on the customer’s behavior, the bank can update in real time how much the customer can borrow and have a clear understanding of the probability of a successful close. This end-to-end flow is depicted in Figure 1-4. After the initial data collection for prequalification, a real-time pipeline of transactions and credit card usage is sent to the bank so that there are no surprises.

Figure 1-4. Credit card usage and risk are communicated to many downstream consumers.

This process is superior in many ways to the traditional process of completing a full application at both loan preapproval and approval. It reduces the friction of closing the loan (where the bank would make money) and puts the borrower in control. For the financial institution providing the real-time data, it’s just a matter of routing data already used for another purpose to the lender. The efficiency gained a
