- About 7,230 characters
- About 41 pages
- Published 2023-10-29 in Hebei
MapReduce: Simplified Data Processing on Large Clusters
These are slides from Dan Weld's class at U. Washington (who in turn based his slides on those by Jeff Dean and Sanjay Ghemawat, Google, Inc.)
Motivation: Large-Scale Data Processing
- Want to use 1000s of CPUs
- But don't want the hassle of managing things
- MapReduce provides:
  - Automatic parallelization & distribution
  - Fault tolerance
  - I/O scheduling
  - Monitoring & status updates
Map/Reduce
- Programming model from Lisp (and other functional languages)
- Many problems can be phrased this way
- Easy to distribute across nodes
- Nice retry/failure semantics
Map in Lisp (Scheme)
- (map f list [list2 list3 …])
- (map square '(1 2 3 4)) → (1 4 9 16)
- (reduce + '(1 4 9 16)) → (+ 16 (+ 9 (+ 4 1))) → 30
- (reduce + (map square (map - l1 l2)))
- map takes a unary operator (here, square); reduce takes a binary operator (here, +).
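For readers more at home in Python than Scheme, the same three expressions can be sketched with the built-in `map` and `functools.reduce`. The lists passed as `l1` and `l2` are made-up sample data, and `sum_sq_diff` is a hypothetical name for the last expression.

```python
from functools import reduce
import operator


def square(x):
    # Unary operator: map applies it to each element independently
    return x * x


# (map square '(1 2 3 4))
squares = list(map(square, [1, 2, 3, 4]))
print(squares)  # [1, 4, 9, 16]

# (reduce + '(1 4 9 16)): operator.add is the binary operator
total = reduce(operator.add, squares)
print(total)  # 30


def sum_sq_diff(l1, l2):
    # (reduce + (map square (map - l1 l2)))
    # Python's map, like Scheme's, walks several lists in parallel
    diffs = map(operator.sub, l1, l2)
    # The initializer 0 keeps reduce defined for empty lists
    return reduce(operator.add, map(square, diffs), 0)


print(sum_sq_diff([5, 7, 9], [1, 2, 3]))  # 77
```

`functools.reduce` folds from the left, computing ((1 + 4) + 9) + 16, whereas the slide's trace nests to the right; because + is associative, both orders give 30.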
Map/Reduce à la Google
- map(key, val) is run on each item in the set
  - emits new-key / new-val pairs
- reduce(key, vals) is run for each unique key emitted by map()
  - emits final output
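The contract on this slide (map once per input item, group by intermediate key, reduce once per unique key) can be sketched as a single-process Python driver. `run_mapreduce`, `map_fn`, `reduce_fn`, `parity_map` and `sum_reduce` are hypothetical names, and the sketch shows only the data flow: it leaves out everything the motivation slide lists, such as distribution across machines, fault tolerance and I/O scheduling.

```python
from collections import defaultdict


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Single-process sketch of the map -> group-by-key -> reduce flow.

    inputs:    iterable of (key, val) pairs
    map_fn:    (key, val) -> iterable of (new_key, new_val) pairs
    reduce_fn: (new_key, list of new_vals) -> one output value
    """
    # Map phase: call map_fn on every input item and collect its
    # intermediate pairs, grouped by new_key
    groups = defaultdict(list)
    for key, val in inputs:
        for new_key, new_val in map_fn(key, val):
            groups[new_key].append(new_val)

    # Reduce phase: call reduce_fn once per unique intermediate key,
    # in sorted key order (assumes keys are mutually comparable)
    return {k: reduce_fn(k, groups[k]) for k in sorted(groups)}


# Toy usage with hypothetical data: sum the numbers 1..6 by parity
def parity_map(key, val):
    yield ("even" if val % 2 == 0 else "odd"), val


def sum_reduce(key, vals):
    return sum(vals)


result = run_mapreduce([(i, i) for i in range(1, 7)], parity_map, sum_reduce)
print(result)  # {'even': 12, 'odd': 9}
```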
Count words in docs
- Input consists of (url, contents) pairs
- map(key=url, val=contents):
  - For each word w in contents, emit (w, "1")
- reduce(key=word, values=uniq_counts):
  - Sum all "1"s in values list
  - Emit result "(word, sum)"
Count, Illustrated
- Input documents: "see bob throw", "see spot run"
- map output: (see, 1) (bob, 1) (throw, 1) from the first document; (see, 1) (spot, 1) (run, 1) from the second
- reduce output: (bob, 1) (run, 1) (see, 2) (spot, 1) (throw, 1)
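The word-count map and reduce functions translate almost line for line into Python. This sketch feeds them the two documents from the illustration; the keys `doc1` and `doc2` stand in for URLs and are hypothetical, and splitting on whitespace is an assumed definition of a word.

```python
from collections import defaultdict


def wc_map(url, contents):
    # For each word w in contents, emit (w, "1")
    for w in contents.split():
        yield w, "1"


def wc_reduce(word, values):
    # Sum all "1"s in the values list and emit (word, sum)
    return word, sum(int(v) for v in values)


docs = [("doc1", "see bob throw"), ("doc2", "see spot run")]

# Map every document, grouping the intermediate "1"s by word
intermediate = defaultdict(list)
for url, contents in docs:
    for word, one in wc_map(url, contents):
        intermediate[word].append(one)

# Reduce once per unique word
counts = dict(wc_reduce(w, vals) for w, vals in sorted(intermediate.items()))
print(counts)  # {'bob': 1, 'run': 1, 'see': 2, 'spot': 1, 'throw': 1}
```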
Grep
- Input consists of (url+offset, single line) pairs
- map(key=url+offset, val=line):
  - If line matches regexp, emit (line, "1")
- reduce(key=line, values=uniq_counts):
  - Don't do anything; just emit line
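A Python sketch of the grep job follows. Because the slide's map signature has no slot for the regular expression, this version binds it with a closure from a hypothetical `make_grep_map` helper; the `"url:offset"` keys and the log lines are made-up sample data.

```python
import re
from collections import defaultdict


def make_grep_map(pattern):
    """Hypothetical helper: build a map function bound to one regexp."""
    regex = re.compile(pattern)

    def grep_map(key, line):
        # key is "url:offset"; only the line text is emitted
        if regex.search(line):
            yield line, "1"

    return grep_map


def grep_reduce(line, values):
    # Don't do anything with the values; just emit the line
    return line


lines = [
    ("doc1:0", "see bob throw"),
    ("doc1:14", "error: disk full"),
    ("doc2:0", "see spot run"),
    ("doc2:13", "error: disk full"),
    ("doc2:30", "error: timeout"),
]

grep_map = make_grep_map(r"error")
intermediate = defaultdict(list)
for key, line in lines:
    for out_line, one in grep_map(key, line):
        intermediate[out_line].append(one)

matches = [grep_reduce(ln, vals) for ln, vals in sorted(intermediate.items())]
print(matches)  # ['error: disk full', 'error: timeout']
```

Because reduce is keyed on the line text, a matching line that occurs more than once in the input is emitted only once in this sketch.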
Reverse Web-Link Graph
- Map
  - For each URL linking to target, …
  - Output (target, source) pairs
- Reduce
  - Concatenate list of all source URLs
  - Outputs (target, list(source)) pairs
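The slide leaves the map input elided. This sketch assumes each map call receives a source page URL together with the list of URLs that page links to (the page names and links are hypothetical); map inverts every link into a (target, source) pair and reduce concatenates the sources per target.

```python
from collections import defaultdict


def links_map(source, outlinks):
    # For each link found on page `source`, emit (target, source)
    for target in outlinks:
        yield target, source


def links_reduce(target, sources):
    # Concatenate all source URLs that point at target
    return target, list(sources)


pages = [
    ("a.html", ["b.html", "c.html"]),
    ("b.html", ["c.html"]),
    ("c.html", ["a.html"]),
]

intermediate = defaultdict(list)
for source, outlinks in pages:
    for target, src in links_map(source, outlinks):
        intermediate[target].append(src)

reverse_graph = dict(links_reduce(t, s) for t, s in sorted(intermediate.items()))
print(reverse_graph)
# {'a.html': ['c.html'], 'b.html': ['a.html'], 'c.html': ['a.html', 'b.html']}
```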
Inverted Index
[Figure panels: Map, Reduce]
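The Map and Reduce content of this slide did not survive extraction. One common formulation, the one described in the original MapReduce paper by Dean and Ghemawat, has map emit a (word, document ID) pair for each word in a document and reduce sort the IDs for each word into a posting list. The sketch below assumes that formulation; the document IDs are hypothetical, and de-duplicating words per document with `set` is an added choice.

```python
from collections import defaultdict


def index_map(doc_id, contents):
    # Emit (word, doc_id) once per distinct word in the document
    for word in set(contents.split()):
        yield word, doc_id


def index_reduce(word, doc_ids):
    # Sort the document IDs into a posting list for this word
    return word, sorted(doc_ids)


docs = [("d2", "see spot run"), ("d1", "see bob throw see")]

intermediate = defaultdict(list)
for doc_id, contents in docs:
    for word, d in index_map(doc_id, contents):
        intermediate[word].append(d)

index = dict(index_reduce(w, ids) for w, ids in sorted(intermediate.items()))
print(index)
# {'bob': ['d1'], 'run': ['d2'], 'see': ['d1', 'd2'], 'spot': ['d2'], 'throw': ['d1']}
```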
Example uses:
- distributed grep
- distributed sort
- web link-graph reversal
- term-vector / host
- web access log stats
- inverted index construction
- document clustering
- machine learning