- Published 2025-09-02, Sichuan
MapReduce Lecture Slides
01 MapReduce Overview
02 Discussion Questions
03 MapReduce Outline
Motivation
- 200+ processors
- 200+ terabyte database
- 10^10 total clock cycles
- 0.1 second response time
- 5¢ average advertising revenue
From: /~bryant/presentations/DISC-FCRC07.ppt
Motivation: Large Scale Data Processing
- Want to process lots of data (1 TB)
- Want to parallelize across hundreds/thousands of CPUs
- …
- Want to make this easy
Google Earth uses 70.5 TB: 70 TB for the raw imagery and 500 GB for the index data.
MapReduce
- Automatic parallelization and distribution
- Clean abstraction for programmers
- Fault-tolerant
- Provides status and monitoring tools
Programming Model
- Borrows from functional programming
- Users implement an interface of two functions:
    map(in_key, in_value) -> (out_key, intermediate_value) list
    reduce(out_key, intermediate_value list) -> out_value list
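The two signatures above can be written down as Python types to make the data flow concrete. This is a minimal sequential sketch, not the actual MapReduce library: the names `Mapper`, `Reducer`, and `run_mapreduce`, and the weather-station example, are assumptions introduced for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

K1 = TypeVar("K1")                   # in_key
V1 = TypeVar("V1")                   # in_value
K2 = TypeVar("K2", bound=Hashable)   # out_key
V2 = TypeVar("V2")                   # intermediate_value
V3 = TypeVar("V3")                   # out_value

# map(in_key, in_value) -> (out_key, intermediate_value) list
Mapper = Callable[[K1, V1], Iterable[Tuple[K2, V2]]]
# reduce(out_key, intermediate_value list) -> out_value list
Reducer = Callable[[K2, List[V2]], List[V3]]


def run_mapreduce(records, mapper, reducer) -> Dict:
    """Hypothetical single-process driver: map, group by out_key, reduce."""
    groups = defaultdict(list)
    for in_key, in_value in records:
        for out_key, intermediate in mapper(in_key, in_value):
            groups[out_key].append(intermediate)
    return {out_key: reducer(out_key, values) for out_key, values in groups.items()}


def city_temperature(station_id, reading):
    """Mapper: re-key a (city, temperature) reading by city."""
    city, temperature = reading
    yield (city, temperature)


def hottest(city, temperatures):
    """Reducer: keep the maximum temperature seen for a city."""
    return [max(temperatures)]


records = [("s1", ("Paris", 20)), ("s2", ("Oslo", 5)), ("s3", ("Paris", 25))]
result = run_mapreduce(records, city_temperature, hottest)
print(result)
```

The user supplies only `city_temperature` and `hottest`; everything in `run_mapreduce` is what the framework would take over in a real deployment.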
map
- Records from the data source (lines out of files, rows of a database, etc.) are fed into the map function as key/value pairs, e.g., (filename, line).
- map() produces one or more intermediate values, along with an output key, from the input.
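As an illustration of a map function fed with (filename, line) records, the sketch below emits one (word, filename) pair per word, which is the map half of a simple inverted index. The function name `map_line` and the sample input are assumptions, not part of the slides.

```python
from typing import Iterator, Tuple


def map_line(filename: str, line: str) -> Iterator[Tuple[str, str]]:
    """Map one (filename, line) record to (word, filename) pairs.

    The output key is the word; the intermediate value is the file
    in which the word was seen.
    """
    for word in line.split():
        yield (word.lower(), filename)


pairs = list(map_line("a.txt", "To be or"))
print(pairs)
```

Each call sees only its own record, so calls on different lines share no state.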
reduce
- After the map phase is over, all the intermediate values for a given output key are combined together into a list.
- reduce() combines those intermediate values into one or more final values for that same output key.
- (In practice, usually only one final value per key.)
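The grouping step and the reduce call can be sketched as follows. This assumes all intermediate pairs fit in memory and uses sorting plus `itertools.groupby` to build the per-key lists; the names `shuffle` and `reduce_sum` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def shuffle(pairs):
    """Collect all intermediate values for each output key into a list."""
    ordered = sorted(pairs, key=itemgetter(0))  # stable sort: value order is kept
    return {
        key: [value for _, value in group]
        for key, group in groupby(ordered, key=itemgetter(0))
    }


def reduce_sum(out_key, values):
    """Reducer returning a one-element list: the usual one final value per key."""
    return [sum(values)]


pairs = [("a", 1), ("b", 2), ("a", 3)]
grouped = shuffle(pairs)
final = {key: reduce_sum(key, values) for key, values in grouped.items()}
print(grouped)
print(final)
```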
Parallelism
- map() functions run in parallel, creating different intermediate values from different input data sets.
- reduce() functions also run in parallel, each working on a different output key.
- All values are processed independently.
- Bottleneck: the reduce phase can't start until the map phase is completely finished.
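The two parallel phases and the barrier between them can be sketched with a thread pool. This only illustrates the structure: a real system distributes tasks across machines, and Python threads give no CPU speedup for this kind of work. The word-length histogram and the names `map_split`, `reduce_key`, and `parallel_mapreduce` are assumptions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_split(split):
    """Map task: emit (word_length, 1) for every word in one input split."""
    return [(len(word), 1) for line in split for word in line.split()]


def reduce_key(item):
    """Reduce task: sum the values collected for a single output key."""
    key, values = item
    return key, sum(values)


def parallel_mapreduce(splits, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map phase: each split is handled by an independent task.
        # list() waits for every map task, which is the barrier
        # before any reduce task may start.
        map_outputs = list(pool.map(map_split, splits))

        # Group intermediate values by output key.
        groups = defaultdict(list)
        for output in map_outputs:
            for key, value in output:
                groups[key].append(value)

        # Reduce phase: one independent task per output key.
        return dict(pool.map(reduce_key, groups.items()))


histogram = parallel_mapreduce([["map reduce", "a"], ["big data"]])
print(histogram)
```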
Example: Count word occurrences
    map(String input_key, String input_value):
      // input_key: document name
      // input_value: document contents
      for each word w in input_value:
        EmitIntermediate(w, 1);

    reduce(String output_key, Iterator intermediate_values):
      // output_key: a word
      // output_values: a list of counts
      int result = 0;
      for each v in
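The listing breaks off inside the reduce loop. A runnable Python version of the same idea might look like the sketch below. The body of the loop (summing the counts) is an assumption based on the "a list of counts" comment, and the in-memory driver `word_count` is hypothetical, standing in for the framework.

```python
from collections import defaultdict


def map_fn(input_key, input_value):
    # input_key: document name
    # input_value: document contents
    for w in input_value.split():
        yield (w, 1)  # plays the role of EmitIntermediate(w, 1)


def reduce_fn(output_key, intermediate_values):
    # output_key: a word
    # intermediate_values: a list of counts
    result = 0
    for v in intermediate_values:  # assumed completion of the truncated loop
        result += int(v)
    return result


def word_count(documents):
    """Hypothetical driver: run map_fn, group by word, run reduce_fn."""
    groups = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            groups[word].append(count)
    return {word: reduce_fn(word, counts) for word, counts in groups.items()}


counts = word_count({"a.txt": "the cat the dog", "b.txt": "the end"})
print(counts)
```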