MapReduce - Cleveland State University

Transcription

MapReduce
Sunnie S Chung
CIS 612

MapReduce: Challenges in Distributed Processing/Analysis

- Extremely large data sets (petabytes of data)
- Data is distributed across many systems (potentially thousands)
- Individual systems are not aware of all of the data
- The data is too large to be held or processed by any single system

How do we distribute and process a task that works over petabytes of data?
- How to parallelize the computation
- How to distribute the data
- How to handle failures

Word Count over a Given Set of Web Pages

see bob throw
see spot run

Can we do word count in parallel?

Expected output: (bob, 1), (run, 1), (see, 2), (spot, 1), (throw, 1)

MapReduce

- MapReduce was introduced as a programming model by Google in 2004*
- Spreads the task of processing data out over many systems
- Key-value based system
- Elements of divide and conquer
- Makes use of the concept of hashing
- Used for extremely parallel data processing/analysis
- Highly scalable
- Failure tolerant

* MapReduce: Simplified Data Processing on Large Clusters, OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December 2004

MapReduce

How does MapReduce work? Three primary steps are used to run a MapReduce job:

- Map: data is read in a parallel fashion across many different nodes in the cluster.
- Shuffle: groups are identified for processing the input data, and the output data is then shuffled into these groups. All data with a common group identifier (key) is sent to a single location.
- Reduce: each group is then processed atomically across the nodes in parallel. Each node receives one or more groups to process, and the union of the results of all groups is the result of the entire job.

A minimal sketch of this flow appears below.
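For concreteness, here is a minimal in-memory sketch of the three-step flow in Python. It is illustrative only: run_mapreduce is a hypothetical name, everything runs in one process, and the shuffle is simulated with a dictionary instead of a network transfer.

    from collections import defaultdict

    def run_mapreduce(inputs, map_fn, reduce_fn):
        # Map: each (key, value) input record emits zero or more intermediate pairs
        intermediate = []
        for key, value in inputs:
            intermediate.extend(map_fn(key, value))

        # Shuffle: group all intermediate values by key, so every value
        # sharing a key ends up in a single place
        groups = defaultdict(list)
        for k, v in intermediate:
            groups[k].append(v)

        # Reduce: process each (key, list-of-values) group atomically
        return [reduce_fn(k, vs) for k, vs in groups.items()]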

MapReduce Example 1: Map Function for Word Count

Map Function
- Input: key/value pair
- Output: intermediate key/value pairs

map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1")

MapReduce: Programming Model

Example input document (key: document name; value: document contents):

Declaration of Independence
When in the Course of human events it becomes necessary for one people to dissolve the political bands which have connected them with another and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation.

map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1")

MapReduce Example 1: Reduce Function for Word Count

Reduce Function
- Input: intermediate key and list of values
- Output: results

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
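A direct Python rendering of the two pseudocode functions above might look as follows. This is a sketch (the function names are ours); a real framework would invoke these once per input record and once per reduce group.

    def map_word_count(doc_name, contents):
        # key: document name; value: document contents
        for word in contents.split():
            yield (word, 1)          # EmitIntermediate(w, "1")

    def reduce_word_count(word, counts):
        # key: a word; values: a list of counts
        return (word, sum(counts))   # Emit(AsString(result))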

MapReduce Programming Model

Output: (word, count) pairs

MapReduce Example 2: Word Count

- Reads in files from HDFS
- Returns the counts of all words located in the files

Map:
Mapper(line number, line contents)
    for each word in line contents:
        emit(word, 1)

Reduce:
Reducer(word, values[])
    sum = 0
    for each value in values:
        sum = sum + value
    emit(word, sum)
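Running the word-count functions through the run_mapreduce sketch from earlier, on the two example "web pages" from the opening slide, gives the expected counts (a toy harness; a real job would read its input from HDFS):

    docs = [("page1", "see bob throw"), ("page2", "see spot run")]
    print(sorted(run_mapreduce(docs, map_word_count, reduce_word_count)))
    # [('bob', 1), ('run', 1), ('see', 2), ('spot', 1), ('throw', 1)]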

Word-Counting (diagram)

- Input files from a distributed file system (DFS), e.g. GFS: "You jump, I jump." and "Both jump."
- The master assigns map and reduce tasks to the workers.
- Worker map instances emit intermediate pairs: (You, 1), (jump, 1), (I, 1), (jump, 1) and (Both, 1), (jump, 1), stored on each mapper's local disk.
- Worker reduce instances pull the intermediate data and write the final output to the DFS: (You, 1), (I, 1), (Both, 1), (jump, 3).

Another Example: Count of URL Access Frequency

Map
- Input: log of web page requests
- Output: (URL, 1) intermediate pairs

Reduce
- Input: intermediate pairs
- Output: (URL, total count) pairs
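A sketch of this job in the same style, assuming each log line's first whitespace-separated field is the requested URL (the log layout is an assumption):

    def map_url_access(log_name, log_contents):
        # key: log file name; value: log contents, one request per line
        for line in log_contents.splitlines():
            fields = line.split()
            if fields:                    # skip blank lines
                yield (fields[0], 1)      # emit (URL, 1)

    def reduce_url_access(url, counts):
        return (url, sum(counts))         # emit (URL, total count)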

MapReduce

A MapReduce (MR) program generally consists of two user-defined stages:

Map: Map(Key_p, Value_p) -> list(Key_i, Value_i)
- Takes in a data element
- Outputs key-value pairs

Reduce: Reduce(Key_i, list(Value_i)) -> list(Key_s, Value_s)
- Takes in a key and a collection of values
- Outputs results
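These signatures can be restated with Python type hints; the alias names are ours, but the types mirror the slide:

    from typing import Callable, Iterable, List, Tuple, TypeVar

    Kp, Vp = TypeVar("Kp"), TypeVar("Vp")   # input key/value types
    Ki, Vi = TypeVar("Ki"), TypeVar("Vi")   # intermediate key/value types
    Ks, Vs = TypeVar("Ks"), TypeVar("Vs")   # output key/value types

    # Map(Key_p, Value_p) -> list(Key_i, Value_i)
    MapFn = Callable[[Kp, Vp], Iterable[Tuple[Ki, Vi]]]

    # Reduce(Key_i, list(Value_i)) -> list(Key_s, Value_s)
    ReduceFn = Callable[[Ki, List[Vi]], Iterable[Tuple[Ks, Vs]]]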

Map Phase Process

Map Phase:
- Many different map tasks are created across the different nodes in the distributed cluster.
- Each task receives a block of the data to be processed.
- For each input element in that block, the task makes a call to Map(Key_p, Value_p).
- Each call to Map(Key_p, Value_p) emits a list of key-value pairs derived from the input element, in the form list(Key_i, Value_i).
- All of the list(Key_i, Value_i) output from the different map tasks and calls to Map is then shuffled across the network to the different reduce tasks.
- This shuffling produces the (Key_i, list(Value_i)) collections that are the input to the Reduce phase.

MapReduce Process Flow

Diagram of a MapReduce job flow:
- Input splits (S01-S13) are read from HDFS.
- Map phase: each map node takes input splits and outputs (key, value) pairs.
- Shuffle phase: data is transferred over the network and sorted into groups defined by key.
- Reduce phase: each reduce node takes (key, value[]) groups and writes output values (F01-F09) to HDFS.

Reduce Phase Process

Reduce Phase:
- The Reduce phase of MapReduce takes in a group defined by a unique key (Key_i) and a list of values (list(Value_i)).
- Many different reduce tasks are created across the different nodes in the distributed cluster.
- Each task makes a call to Reduce(Key_i, list(Value_i)) for every reduce group that it receives.
- Each call processes that reduce group atomically.
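Since the framework delivers key-sorted pairs to each reducer, the grouping can be modeled with itertools.groupby; this single-process sketch sorts the pairs itself for self-containment:

    from itertools import groupby
    from operator import itemgetter

    def reduce_phase(pairs, reduce_fn):
        # Sort by key (the real framework guarantees this after the shuffle),
        # then hand each (key, list-of-values) group to the Reduce function.
        for key, group in groupby(sorted(pairs, key=itemgetter(0)),
                                  key=itemgetter(0)):
            yield reduce_fn(key, [v for _, v in group])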

The MapReduce Framework (by Google)

Overall MapReduce Word Count Process

Automatic Parallel Execution in MapReduce (Google)

Handles failures automatically, e.g., restarts tasks if a node fails; runs multiple copies of the same task to avoid a slow task slowing down the whole job.

MapReduce: Word Count

Data in HDFS:
- Input File 1: "This is line one"
- Input File 2: "But this is line two"
- Input File 3: "And I am line three"

Map (one task per node):
- Node 1 input: "This is line one" -> output: ("this", 1), ("is", 1), ("line", 1), ("one", 1)
- Node 2 input: "But this is line two" -> output: ("but", 1), ("this", 1), ("is", 1), ("line", 1), ("two", 1)
- Node 3 input: "And I am line three" -> output: ("and", 1), ("i", 1), ("am", 1), ("line", 1), ("three", 1)

Shuffle, then Reduce (output written to HDFS):
- Node 1 input: ("this", [1, 1]), ("but", [1]), ("am", [1]), ("two", [1]) -> output: ("this", 2), ("but", 1), ("am", 1), ("two", 1)
- Node 2 input: ("is", [1, 1]), ("one", [1]), ("and", [1]) -> output: ("is", 2), ("one", 1), ("and", 1)
- Node 3 input: ("line", [1, 1, 1]), ("i", [1]), ("three", [1]) -> output: ("line", 3), ("i", 1), ("three", 1)

MapReduce in Hadoop (1)

MapReduce in Hadoop (2)

MapReduce in Hadoop (3)

Data Flow in a MapReduce Program in Hadoop

- InputFormat
- Map function
- Partitioner
- Sorting & Merging
- Combiner
- Shuffling
- Merging
- Reduce function
- OutputFormat

(1:many)
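Two of these stages are easy to sketch. The partitioner routes each intermediate key to one of the R reducers (Hadoop's default HashPartitioner does this by hashing the key), and the combiner is a local "mini-reduce" run on a mapper's output to cut shuffle traffic, which is safe for word count because addition is associative and commutative. A hedged sketch:

    from collections import defaultdict

    def partition(key, num_reducers):
        # Route a key to one of R reducers, in the spirit of Hadoop's
        # default HashPartitioner: hash(key) mod R.
        return hash(key) % num_reducers

    def combine(map_output):
        # Local mini-reduce over one mapper's (word, 1) pairs before the shuffle.
        sums = defaultdict(int)
        for key, value in map_output:
            sums[key] += value
        return list(sums.items())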

Execution Overview

Step 1: User program invokes the MapReduce library
- Splits the input file into M pieces
- Starts up copies of the program on a cluster of machines

Step 2: Master / workers
- Designate one master
- Designate M map workers
- Designate R reduce workers

Step 3: Execute the Map function
- The map workers read the contents of the input split and generate intermediate (key, value) pairs
- The intermediate (key, value) pairs are buffered in memory

Step 4: Buffer / partition / notify
- Periodically, the buffered pairs are written to the local disk
- The information is partitioned into R regions
- The locations of these partitions are sent to the master
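A toy model of the partitioning in this step: split a mapper's buffered output into R regions by key hash (real workers spill each region to a local disk file and report the file locations to the master):

    def partition_buffer(buffered_pairs, R):
        regions = [[] for _ in range(R)]
        for key, value in buffered_pairs:
            regions[hash(key) % R].append((key, value))
        return regions   # region r is destined for reduce worker r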

Step 5: Execute the Reduce function
- RPCs are used to read the partitioned pairs from their local locations (the map workers' disks)
- Once all data is read, it must be sorted

Step 6: Iterate and reduce
- The reduce worker iterates over the intermediate results
- Unique keys are passed into the user-defined Reduce function
- The output is appended to a final output file for this reduce partition

Step 7: Provide results
- Wake up the user program
- Return back to user code
- Generally these output files will be passed into some other application rather than being combined into a single file

The Master
- Stores the state and identity of all workers
- Acts as the central control for passing the intermediate file regions from the map tasks to the reduce tasks

Fault Tolerance: Monitoring
- Hadoop MapReduce is fault tolerant with respect to hardware/networking failures and input/program errors
- Slave nodes send heartbeat messages to the master node periodically
- The master considers a node dead in the absence of heartbeat messages
- No further requests are sent to dead nodes
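A minimal sketch of the master's liveness check, assuming it records the time of each worker's last heartbeat; the 10-minute timeout is an assumption (it mirrors classic Hadoop's default task-tracker expiry interval):

    import time

    HEARTBEAT_TIMEOUT_S = 600   # assumed timeout; configurable in real clusters

    def dead_nodes(last_heartbeat, now=None):
        # last_heartbeat: dict mapping node id -> timestamp of its last heartbeat
        now = time.time() if now is None else now
        return [node for node, t in last_heartbeat.items()
                if now - t > HEARTBEAT_TIMEOUT_S]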

Fault Tolerance: Monitoring

Three things to recover:
- Data block on a failed machine [by DFS]
- Map work/result on a failed machine [by MapReduce]
- Reduce work/result on a failed machine [by MapReduce]

Data Block Recovery

A data block needs to be recovered (by adding a duplicate copy) when:
- A slave node is down
- A hard disk of a slave is down
- A data block is corrupted
- The replication factor is increased (by an administrator)

Recall that each block has multiple copies on different nodes. As long as some copy is intact, the master node can issue commands to create new copies of the block on available nodes to meet the required number of copies.
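The master's decision can be modeled as a comparison of live replica counts against the target replication factor; this is a sketch with hypothetical names:

    def blocks_needing_copies(live_replicas, replication_factor):
        # live_replicas: dict mapping block id -> number of intact copies.
        # A block is recoverable only while at least one copy survives.
        return {block: replication_factor - n
                for block, n in live_replicas.items()
                if 0 < n < replication_factor}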

Map Work Recovery
- If a node is down, all the map work assigned to it will be executed by other available nodes.
- Finished map work on a failed node also needs to be re-executed, because the map output is stored locally on the failed node and is hence unreachable.
- If a map work instance hangs, it will be tried again on another available node, and the non-responding instance will be killed.
- Reducers only need to load the re-executed map output from the other machines. There is no further impact on reducers, because the real reduce work does not start until all maps finish.

Reduce Work Recovery
- If a node fails, its unfinished reduce work will be assigned to other available nodes.
- There is no impact on maps, i.e., maps need not be re-executed for failed reducers, because the map output is still held on the local hard disks; the new reducers only need to pull the output again.
- Finished reduce work on a failed node does not need to be re-executed, because its final output has already been written to the DFS.

Skipping Bad Records
- MapReduce has an option to skip bad records that cause crashes.
- Sometimes ignoring some bad records is acceptable, e.g., for statistical analysis on large data.
- A signal is sent to the master indicating a crash and on which record (line) it happened.
- If more than one signal on the same record is received, the master can tell the next running node to skip that bad record.
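Conceptually the skipping behaves like this single-process sketch; the real mechanism is coordinated through the master across task retries, and max_skips is an invented knob:

    def map_with_skipping(records, map_fn, max_skips=10):
        skipped = []
        for record in records:
            try:
                yield from map_fn(*record)
            except Exception:
                # Report and skip the crashing record instead of failing the job
                skipped.append(record)
                if len(skipped) > max_skips:
                    raise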

Fault Tolerance

Data replication across nodes:
- If one machine is dead, other machines can still get access to the data.
- As long as the data is accessible, any machine can do the job ("push computation to data"), preferably one that holds a copy.

Strict staging:
- Reduce cannot start until all maps finish (though copying and sorting can start in advance).
- Chained MR programs do not overlap: a depended-on job must finish before a depending job starts.
- Effect: failure at one stage only leads to re-execution at that stage.
