Announcements (Wed., Apr. 17): Parallel Databases and MapReduce

Transcription

Announcements (Wed., Apr. 17)
- Homework #4 is due Monday, April 24, 11:55 pm. No other extra-credit problem.
- Project presentations: a few project groups still have not signed up – please sign up soon.
- Google Cloud code: please redeem your code as soon as possible, by May 11.
- Wednesday, April 19: guest lecture by Prof. Jun Yang on OLAP, data warehousing, and data mining. Included in the final.
(Introduction to Databases, CompSci 316, Spring 2017)

Where are we now?
We have learnt:
- Relational model, query languages, and database design: SQL, RA, E/R diagrams, normalization
- XML: DTD and XML Schema, XPath and XQuery, relational mapping
- DBMS internals: storage, indexing, query evaluation (external sort, join algorithms), query optimization
- Transactions: basic concepts and SQL, concurrency control, recovery
Next:
- Parallel DBMS and MapReduce (today and Wednesday)
- Data warehousing, OLAP, data mining; distributed DBMS; NoSQL (next Monday)

Why Parallel Access To Data?
At 10 MB/s it takes about 1.2 days to scan 1 TB of data; with 1,000-way parallelism the same scan takes about 1.5 minutes (see the short calculation below). Parallelism: divide a big problem into many smaller ones to be solved in parallel.

Parallel DBMS
Parallelism is natural to DBMS processing:
- Pipeline parallelism: many machines each doing one step in a multi-step process.
- Data-partitioned parallelism: many machines doing the same thing to different pieces of data.
Both are natural in a DBMS: any sequential program can have its outputs split N ways and its inputs merged M ways.
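As a quick check of the numbers above, here is a minimal sketch in Scala (matching the course's other snippets) that recomputes the two scan times; the 10 MB/s rate and the 1,000-way parallelism come from the slide, and ideal linear speed-up is assumed:

object ScanTime {
  def main(args: Array[String]): Unit = {
    val dataBytes = 1e12      // 1 TB
    val rateBps   = 10e6      // 10 MB/s sequential scan rate
    val machines  = 1000      // degree of parallelism

    val serialSecs   = dataBytes / rateBps    // one machine
    val parallelSecs = serialSecs / machines  // assuming ideal linear speed-up

    println(f"serial:   ${serialSecs / 86400}%.1f days")    // ~1.2 days
    println(f"parallel: ${parallelSecs / 60}%.1f minutes")  // ~1.7 minutes (the slide rounds to 1.5)
  }
}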

Some Terminology
- Speed-up: more resources means proportionally less time for a given amount of data. Ideal: linear speed-up, i.e., throughput (#ops/sec) grows linearly with the number of CPUs (the degree of parallelism).
- Scale-up: if resources are increased in proportion to the increase in data size, the time stays constant. Ideal: linear scale-up, i.e., response time (sec./op) stays flat as #CPUs and database size grow together.
(A small numeric example appears below, after the architecture overview.)

DBMS: The Parallel Success Story
- DBMSs are the most successful application of parallelism: Teradata (1979), Tandem (1974, later acquired by HP), and today every major DBMS vendor has some parallel server.
- Reasons for success: bulk processing (partition parallelism), natural pipelining, inexpensive hardware can do the trick, and users/application programmers don't need to think in parallel.

Some Terminology (in practice)
In practice, speed-up and scale-up are sub-linear, due to the overhead of parallel processing:
- Start-up cost: starting the operation on many processors takes time.
- Skew: the slowest processor (e.g., one holding a huge fraction of the data) may become the bottleneck.
- Interference: different processors may compete for the same resources.

Architecture for Parallel DBMS
- Units: a collection of processors; we assume each always has a local cache, and may or may not have local memory or disk (next).
- A communication facility to pass information among processors: a shared bus or a switch.
- Architectures differ in what is shared among the computing units: whether memory is shared, and whether disk is shared.

Shared Memory (e.g., SMP server)
Processors share global memory and the disks over an interconnection network.
- Easy to program; low communication overhead thanks to the shared memory.
- Expensive to build; difficult to scale up (memory contention).

Shared Nothing (e.g., Greenplum)
Each processor has its own memory and disk; no two CPUs can access the same storage area, and all communication goes through a network connection.
- Hard to program and to design parallel algorithms for.
- Cheap to build; easy to scale up and speed up.
- Considered to be the best architecture.
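To make the two metrics concrete, here is a small sketch with hypothetical timing numbers (only the definitions of speed-up and scale-up follow the slide; the measurements are invented for illustration):

object ParallelMetrics {
  // Speed-up: same amount of data, n times the resources.
  def speedUp(timeOnOneUnit: Double, timeOnNUnits: Double): Double =
    timeOnOneUnit / timeOnNUnits          // ideal (linear) speed-up: equals n

  // Scale-up: n times the data on n times the resources.
  def scaleUp(timeSmallJobSmallSystem: Double, timeBigJobBigSystem: Double): Double =
    timeSmallJobSmallSystem / timeBigJobBigSystem   // ideal (linear) scale-up: equals 1

  def main(args: Array[String]): Unit = {
    // hypothetical: 100 s on 1 CPU vs 30 s on 4 CPUs, same data
    println(speedUp(100.0, 30.0))   // ~3.3 < 4, i.e. sub-linear speed-up
    // hypothetical: 100 s for 1x data on 1 CPU vs 120 s for 4x data on 4 CPUs
    println(scaleUp(100.0, 120.0))  // ~0.83 < 1, i.e. sub-linear scale-up
  }
}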

Shared Disk (e.g., Oracle RAC)
Each processor has its own local memory, but the disks are shared over the interconnection network.
- A trade-off between the previous two designs, but it still suffers interference like shared memory (contention for memory and network bandwidth).
(Ack: slide by Prof. Dan Suciu)

Different Types of DBMS Parallelism
- Intra-operator parallelism: get all machines working to compute a given operation (scan, sort, join). Used for OLAP (decision support).
- Inter-operator parallelism: each operator may run concurrently on a different site (exploits pipelining). Used for both OLAP and OLTP (transactions).
- Inter-query parallelism: different queries run on different sites. Used for OLTP.
We'll focus on intra-operator parallelism.

Data Partitioning
Ways of horizontally partitioning a table (why horizontal?), illustrated on R(Key, A, B):
- Range-partition (e.g., on A into A-E, F-J, K-N, O-S, T-Z): good for equijoins, range queries, and group-by, but can lead to data skew.
- Hash-partition: good for equijoins, but only if hashed on that attribute; can also lead to data skew.
- Block-partition (round robin): send the i-th tuple to processor i mod n; good for spreading the load, and good when the entire relation is accessed.
Shared-disk and shared-memory systems are less sensitive to partitioning; shared-nothing benefits from "good" partitioning. (A short sketch of the three schemes appears below.)

Example: R(Key, A, B)
- Can block-partitioning be skewed? No, it is uniform.
- Can hash-partitioning be skewed? On the key it is uniform with a good hash function; on A it may be skewed, e.g., when all tuples have the same A-value.

Parallelizing Sequential Evaluation Code
- "Streams" from different disks or from the outputs of other operators are "merged" as needed as input to some operator, and are "split" as needed for subsequent parallel processing.
- Split and merge operations therefore appear in addition to the relational operators; there is no fixed formula for the conversion.
Next: parallelizing individual operations.

Parallel Scans
- Scan in parallel, and merge the results.
- For range or hash partitioning, a selection may not require all sites, but it may lead to skew: suppose the selection is σ_{A=10}(R) and R is partitioned on A; then all matching tuples sit in the same partition, on one processor.
- Indexes can be built at each partition.
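A minimal sketch of the three partitioning schemes on R(Key, A, B); the number of processors and the range split points are assumptions for illustration:

object Partitioning {
  case class R(key: Int, a: String, b: Double)

  val n = 4   // assumed number of processors

  // Block / round-robin partitioning: the i-th tuple goes to processor i mod n
  def roundRobin(i: Long): Int = (i % n).toInt

  // Hash partitioning on attribute A
  def hashOnA(t: R): Int = Math.floorMod(t.a.hashCode, n)

  // Range partitioning on attribute A, with assumed split points
  val splitPoints = Vector("F", "K", "O")   // ranges: below F, F..J, K..N, O and above
  def rangeOnA(t: R): Int = {
    val idx = splitPoints.indexWhere(p => t.a.compareTo(p) < 0)
    if (idx == -1) splitPoints.length else idx
  }
}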

Parallel Sorting
Idea: scan in parallel, and range-partition as you go.
- E.g., with salaries between 10 and 210 and 20 processors, the first processor gets salaries 10-20, the second 21-30, the third 31-40, and so on.
- As tuples come in, begin a "local" sort at each processor.
- The resulting data is sorted and range-partitioned; visit the processors in order to get a fully sorted order.
- Problem: skew! Solution: "sample" the data at the start to determine the partition points.

Parallel Joins
Need to send the tuples that will join to the same machine (the same applies to GROUP BY).
- Nested loop: each outer tuple must be compared with each inner tuple that might join. Easy for range partitioning on the join columns, hard otherwise.
- Sort-merge: sorting gives range partitioning, and merging the partitioned tables is local.

Parallel Hash Join
- Phase 1: the original relations (R, then S) are hashed with a function h into B-1 partitions, using B main-memory buffers, and the partitions are distributed to different sites. A good hash function automatically distributes the work evenly.
- Phase 2: run the join locally at each site.
- Good use of split/merge makes it easier to build parallel versions of sequential join code (see the dataflow network for the parallel join).
- Almost always the winner for equi-joins.

Parallel Aggregates
For each aggregate function, we need a decomposition:
- count(S) = Σ_i count(s(i)); similarly for sum()
- avg(S) = Σ_i sum(s(i)) / Σ_i count(s(i))
- and so on.
For GROUP BY: compute sub-aggregates for the groups close to the source, then pass each sub-aggregate to its group's site, chosen via a hash function (a small sketch follows below).
Which SQL aggregate operators are not good for parallel execution, and why?

Best Serial Plan May Not Be Best Parallel Plan
Trivial counter-example:
- A table is partitioned across two nodes (say A-M on node 1 and N-Z on node 2), each with a local secondary index.
- A range query touches all of node 1 but only 1% of node 2.
- Node 1 should do a table scan of its partition, while node 2 should use its secondary index.
(Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey)
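A minimal sketch of the aggregate decomposition above: each site computes a partial (sum, count), and the global avg is assembled from the partials alone (the data and the three "sites" are made up for illustration):

object ParallelAvg {
  // per-site partial aggregate: (sum, count)
  def partial(site: Seq[Double]): (Double, Long) = (site.sum, site.size.toLong)

  // combine the partial aggregates from all sites
  def combine(parts: Seq[(Double, Long)]): (Double, Long) =
    parts.reduce((p, q) => (p._1 + q._1, p._2 + q._2))

  def main(args: Array[String]): Unit = {
    val sites = Seq(Seq(1.0, 2.0), Seq(3.0), Seq(4.0, 5.0, 6.0))  // made-up partitions
    val (sum, count) = combine(sites.map(partial))
    println(sum / count)   // 3.5, the global average, computed from the partials only
  }
}

An aggregate such as median has no decomposition of this form, which is one answer to the question on the slide.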

Big Data and MapReduce
(Slides by Junghoon Kang)

Big data:
- It cannot be stored on one machine, so store the data sets on multiple machines, e.g., the Google File System (SOSP 2003).
- It cannot be processed on one machine, so parallelize the computation across multiple machines, e.g., MapReduce (OSDI 2004).

Where does Google use MapReduce?
- Input: crawled documents, web request logs.
- Output: inverted indices, the graph structure of web documents, summaries of the number of pages crawled per host, the set of most frequent queries in a day.

What is MapReduce?
It is a programming model that processes large data by:
- applying a function to each logical record in the input (map), and
- categorizing and combining the intermediate results into summary values (reduce).

Google's MapReduce is inspired by the map and reduce functions in functional programming languages. For example, in the Scala functional programming language:

scala> val lst = List(1, 2, 3, 4, 5)
scala> lst.map( x => x + 1 )
res0: List[Int] = List(2, 3, 4, 5, 6)

And reduce, again in Scala:

scala> val lst = List(1, 2, 3, 4, 5)
scala> lst.reduce( (a, b) => a + b )
res0: Int = 15

OK, this makes sense on one machine. Then how does Google extend the functional idea to multiple machines in order to process large data?

Understanding MapReduce (by example)
An English teacher asks you: "Could you count the number of occurrences of each word in this book?" (The book's pages read something like "I am a class president ...".) Um... OK. Let's divide the workload among classmates: each classmate counts the words on their own portion of the pages. This is the map step.
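A minimal sketch of the classroom example, assuming the "book" is just a string split into chunks, one per classmate; each classmate counts locally (the map step) and the local counts are then merged (the reduce step):

object ClassroomWordCount {
  // one classmate counts the words in their chunk of the book
  def countChunk(chunk: String): Map[String, Int] =
    chunk.toLowerCase.split("\\W+").filter(_.nonEmpty)
      .groupBy(identity).map { case (w, ws) => (w, ws.length) }

  // merge two classmates' counts
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (w, c)) => acc.updated(w, acc.getOrElse(w, 0) + c) }

  def main(args: Array[String]): Unit = {
    val chunks = Seq("I am a class president", "I am a student")   // made-up book chunks
    println(chunks.map(countChunk).reduce(merge))
    // counts: i -> 2, am -> 2, a -> 2, class -> 1, president -> 1, student -> 1 (in some order)
  }
}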

And let a few classmates combine the intermediate results; this is the reduce step. For example, one classmate says "I will collect the counts for words starting with A through G", another takes H through Q, and another R through Z. Two partial results such as (A: 6, B: 2, ..., G: 12) and (A: 5, B: 7, ..., G: 2) combine into (A: 11, B: 9, ..., G: 14).

Why did MapReduce become so popular? Is it because Google uses it?

Distributed computation before MapReduce, things to consider:
- How to divide the workload among multiple machines?
- How to distribute data and programs to other machines?
- How to schedule tasks?
- What happens if a task fails while running?
- ... and so on.

Distributed computation after MapReduce, things to consider:
- How to write the Map function?
- How to write the Reduce function?

MapReduce lowered the knowledge barrier in distributed computation. (Figure: developers needed before MapReduce vs. after MapReduce.)

Map-Reduce Steps
Input key-value pairs -> Map -> Shuffle (sort by key) -> Reduce -> output lists.
1. Read data. The input is typically (key, value) pairs, but could be objects of any type.
2. Map: extract some information of interest, in (key, value) form.
3. Shuffle and sort: send the same keys to the same reduce process.
4. Reduce: operate on the values of the same key, e.g., transform, aggregate, summarize, filter.
5. Output the results as (key, final-result).
Map and Reduce are performed by a number of processes physically located on some processors.

Simple Example: Map-Reduce
- Word counting
- Inverted indexes

Map Function
- Each map process works on a chunk of data.
- Input: (input-key, value). Output: (intermediate-key, value), which may not use the same key as the input.
- Example: list all doc ids containing a word. The map output emits a (word, docid) pair for each occurrence: the word is the key, the docid is the value. Duplicate elimination can be done in the reduce phase.
(Ack: slide by Prof. Shivnath Babu)

Reduce Function
- Each map process can leave its output on the local disk; each reduce process then retrieves its portion.
- Input: (intermediate-key, list-of-values-for-this-key); the list can include duplicates. Output: (output-key, final-value).
- Example: list all doc ids containing a word. The output will be a list of (word, [doc-id1, doc-id5, ...]); if only the count is needed, reduce counts the docs and the output will be a list of (word, count). (A code sketch of both functions appears below.)

Key Players in MapReduce
- A Map-Reduce "job", e.g., count the words in all docs; complex queries can require multiple MR jobs.
- Map or Reduce "tasks": a group of map or reduce "functions" scheduled on a single "worker".
- Worker: a process that executes one task at a time; one per processor, so 4-8 per machine; it follows whatever the Master asks it to do.
- One Master coordinates the many workers and assigns a task to each.
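A minimal sketch of the map and reduce functions for the inverted-index example above, with the shuffle simulated on a single machine via groupBy (the two documents are made up for illustration):

object InvertedIndexMR {
  // Map: for each document, emit (word, docId) pairs
  def mapFn(docId: String, text: String): Seq[(String, String)] =
    text.toLowerCase.split("\\W+").filter(_.nonEmpty).map(w => (w, docId)).toSeq

  // Reduce: for one word, combine its doc ids (dropping duplicates)
  def reduceFn(word: String, docIds: Seq[String]): (String, List[String]) =
    (word, docIds.distinct.toList)

  def main(args: Array[String]): Unit = {
    val docs     = Seq("d1" -> "map reduce at google", "d2" -> "parallel map reduce")
    val emitted  = docs.flatMap { case (id, text) => mapFn(id, text) }                // map phase
    val shuffled = emitted.groupBy(_._1).map { case (w, kvs) => (w, kvs.map(_._2)) }  // shuffle
    val index    = shuffled.map { case (w, ids) => reduceFn(w, ids) }                 // reduce phase
    index.foreach(println)   // e.g. (map,List(d1, d2)), (google,List(d1)), ...
  }
}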

How does MapReduce handle machine failures?

Fault Tolerance
Although the probability of any single machine failing is low, the probability of some machine failing among thousands of machines is high.
- Worker failure: the master sends a heartbeat to each worker node; if a worker node fails, the master reschedules the tasks handled by that worker.
- Master failure: the whole MapReduce job gets restarted through a different master.

Example Problem: Parallel DBMS (in class)
R(a, b) is horizontally partitioned across N = 3 machines. Each machine locally stores approximately 1/N of the tuples in R. The tuples are randomly organized across machines (i.e., R is block-partitioned across machines). Consider the query:

SELECT a, max(b) AS topb
FROM R
WHERE a > 0
GROUP BY a

Show an RA plan for this query and how it will be executed across the N = 3 machines. Pick an efficient plan that leverages the parallelism as much as possible.
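One reasonable plan, sketched here only as a hint rather than the official in-class answer: each machine applies the selection and a partial GROUP BY a with max(b) on its local partition, the partial results are hash-partitioned on a, and each machine computes the final max for the groups it receives. A small single-machine simulation in Scala, with made-up data:

object ParallelGroupByMax {
  // R(a, b), block-partitioned across N = 3 machines (made-up tuples)
  val partitions: Seq[Seq[(Int, Int)]] =
    Seq(Seq((1, 5), (2, 3), (-1, 9)), Seq((1, 7), (3, 2)), Seq((2, 8), (3, 1)))

  val n = 3
  def machineFor(a: Int): Int = Math.floorMod(a, n)   // shuffle key: attribute a

  def main(args: Array[String]): Unit = {
    // Step 1 (local): sigma_{a > 0}, then partial GROUP BY a with max(b), at each machine
    val partials = partitions.map { part =>
      part.filter(_._1 > 0).groupBy(_._1).map { case (a, ts) => (a, ts.map(_._2).max) }
    }
    // Step 2 (shuffle): send each partial (a, max) pair to machine hash(a)
    val shuffled = partials.flatten.groupBy { case (a, _) => machineFor(a) }
    // Step 3 (local): final max per group at each receiving machine
    val result = shuffled.values.flatMap(
      _.groupBy(_._1).map { case (a, ts) => (a, ts.map(_._2).max) })
    result.toSeq.sorted.foreach(println)   // (1,7), (2,8), (3,2)
  }
}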
