MapReduce-Simplified-Data-Processing-on-Large-Clusters course slides (.ppt)

  • About 7,230 characters
  • About 41 pages
  • Published 2023-10-29 (Hebei)

MapReduce: Simplified Data Processing on Large Clusters

These are slides from Dan Weld's class at U. Washington (who in turn based his slides on those by Jeff Dean and Sanjay Ghemawat, Google, Inc.).

Motivation: Large-Scale Data Processing
  • Want to use 1000s of CPUs
  • But don't want the hassle of managing things
  • MapReduce provides:
      • Automatic parallelization and distribution
      • Fault tolerance
      • I/O scheduling
      • Monitoring and status updates

Map/Reduce
  • Map/Reduce is a programming model from Lisp (and other functional languages)
  • Many problems can be phrased this way
  • Easy to distribute across nodes
  • Nice retry/failure semantics

Map in Lisp (Scheme)
  • (map f list [list2 list3 …])
  • (map square '(1 2 3 4)) evaluates to (1 4 9 16); square is a unary operator
  • (reduce + '(1 4 9 16)) expands to (+ 16 (+ 9 (+ 4 1))) and evaluates to 30; + is a binary operator
  • (reduce + (map square (map - l1 l2)))

Map/Reduce à la Google
  • map(key, val) is run on each item in the input set
      • emits new-key / new-val pairs
  • reduce(key, vals) is run for each unique key emitted by map()
      • emits the final output

Count words in docs
  • Input consists of (url, contents) pairs
  • map(key=url, val=contents):
      • For each word w in contents, emit (w, "1")
  • reduce(key=word, values=uniq_counts):
      • Sum all "1"s in the values list
      • Emit result "(word, sum)"

Count, Illustrated
  • Uses the map and reduce functions from the previous slide
  • Input documents: "see bob throw" and "see spot run"
  • Map output, grouped by key: (bob, 1), (run, 1), (see, 1), (see, 1), (spot, 1), (throw, 1)
  • Reduce output: (bob, 1), (run, 1), (see, 2), (spot, 1), (throw, 1)

Grep
  • Input consists of (url+offset, single line) pairs
  • map(key=url+offset, val=line):
      • If line matches regexp, emit (line, "1")
  • reduce(key=line, values=uniq_counts):
      • Don't do anything; just emit line

Reverse Web-Link Graph
  • Map
      • For each URL linking to target, …
      • Output (target, source) pairs
  • Reduce
      • Concatenate the list of all source URLs
      • Outputs: (target, list(source)) pairs

Inverted Index
  • Map
  • Reduce

Example uses
  • distributed grep
  • distributed sort
  • web link-graph reversal
  • term-vector / host
  • web access log stats
  • inverted index construction
  • document clustering
  • machine learning
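The word-count and reverse web-link slides above describe map and reduce only in outline, so a small runnable sketch may help make the data flow concrete. It is a single-process illustration in Python, not Google's implementation. The names run_mapreduce, wc_map, wc_reduce, links_map and links_reduce are hypothetical, as are the document and page identifiers, and the grouping step stands in for the shuffle that a real cluster would perform across machines.

```python
from collections import defaultdict


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy, in-memory stand-in for the framework (an assumption, not the real system).

    Runs map_fn on every (key, val) input pair, groups the emitted values
    by new key, then runs reduce_fn once per unique key.
    """
    groups = defaultdict(list)
    for key, val in inputs:
        for new_key, new_val in map_fn(key, val):
            groups[new_key].append(new_val)
    # Sort keys only so the printed result is deterministic.
    return {k: reduce_fn(k, vals) for k, vals in sorted(groups.items())}


def wc_map(url, contents):
    # For each word w in contents, emit (w, "1").
    for word in contents.split():
        yield word, "1"


def wc_reduce(word, counts):
    # Sum all the "1"s in the values list.
    return sum(int(c) for c in counts)


def links_map(source, targets):
    # For each link from source to target, emit (target, source).
    for target in targets:
        yield target, source


def links_reduce(target, sources):
    # Concatenate all source URLs into one list (sorted for readability).
    return sorted(sources)


# The two documents from the "Count, Illustrated" slide.
docs = [("doc1", "see bob throw"), ("doc2", "see spot run")]
word_counts = run_mapreduce(docs, wc_map, wc_reduce)
print(word_counts)
# {'bob': 1, 'run': 1, 'see': 2, 'spot': 1, 'throw': 1}

# Hypothetical pages with their outgoing links.
pages = [("a.html", ["b.html", "c.html"]), ("b.html", ["c.html"])]
reverse_links = run_mapreduce(pages, links_map, links_reduce)
print(reverse_links)
# {'b.html': ['a.html'], 'c.html': ['a.html', 'b.html']}
```

Because each map call sees only one input pair and each reduce call sees only one key, either phase could in principle be split across many workers and retried independently on failure, which is the property the slides point to.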
