Google Cluster Computing: Hadoop Technical Overview
Google Cluster Computing Faculty Training Workshop
Module V: Hadoop Technical Review

Overview
- Hadoop Technical Walkthrough
- HDFS
- Databases
- Using Hadoop in an Academic Environment
- Performance tips and other tools

You Say, “tomato…”

Some MapReduce Terminology
- Job: a “full program”, i.e. an execution of a Mapper and Reducer across a data set
- Task: an execution of a Mapper or a Reducer on a slice of data, a.k.a. Task-In-Progress (TIP)
- Task Attempt: a particular instance of an attempt to execute a task on a machine

Terminology Example
- Running “Word Count” across 20 files is one job.
- 20 files to be mapped imply 20 map tasks plus some number of reduce tasks.
- At least 20 map task attempts will be performed, and more if a machine crashes, etc.

Task Attempts
- A particular task will be attempted at least once, possibly more times if it crashes.
- If the same input causes crashes over and over, that input will eventually be abandoned.
- Multiple attempts at one task may run in parallel when speculative execution is turned on.
- The task ID from TaskInProgress is not a unique identifier; don’t use it that way.

MapReduce: High Level
[Figure: MapReduce: High Level]

Node-to-Node Communication
- Hadoop uses its own RPC protocol.
- All communication begins in slave nodes. This prevents circular-wait deadlock.
- Slaves periodically poll for “status” messages.
- Classes must provide explicit serialization.

Nodes, Trackers, Tasks
- The master node runs a JobTracker instance, which accepts job requests from clients.
- TaskTracker instances run on slave nodes.
- A TaskTracker forks a separate Java process for each task instance.

Job Distribution
- MapReduce programs are contained in a Java “jar” file plus an XML file containing serialized program configuration options.
- Running a MapReduce job places these files into HDFS and notifies the TaskTrackers where to retrieve the relevant program code.
- … Where’s the data distribution?

Data Distribution
- Implicit in the design of MapReduce!
- All mappers are equivalent, so each one maps whatever data is local to its node in HDFS.
- If lots of data does happen
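The job / task / task-attempt terminology and the “Word Count” example above can be illustrated with a small, self-contained simulation. It is a sketch, not Hadoop code. The retry limit `MAX_ATTEMPTS`, the ID formats, and the `crashes` callback are hypothetical stand-ins chosen for illustration. When every attempt of a task crashes, the sketch simply drops that input file. Note that every attempt of one task shares the same task ID, and only the attempt ID is unique.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

MAX_ATTEMPTS = 4  # hypothetical per-task retry limit for this sketch


@dataclass
class AttemptRecord:
    task_id: str     # identical for every attempt of the same task
    attempt_id: str  # unique for each individual attempt
    succeeded: bool


def map_words(text: str) -> List[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated word."""
    return [(w, 1) for w in text.split()]


def run_word_count_job(files: Dict[str, str], crashes: Callable[[str, int], bool]):
    """One job = one map task per input file, then a single reduce.

    `crashes(file_name, attempt_no)` simulates whether that attempt fails.
    """
    attempts: List[AttemptRecord] = []
    abandoned: List[str] = []
    intermediate: List[Tuple[str, int]] = []
    for i, name in enumerate(sorted(files)):
        task_id = f"task_m_{i:06d}"
        for n in range(MAX_ATTEMPTS):
            ok = not crashes(name, n)
            attempts.append(AttemptRecord(task_id, f"attempt_m_{i:06d}_{n}", ok))
            if ok:
                intermediate.extend(map_words(files[name]))
                break
        else:  # every attempt crashed: this sketch abandons the input
            abandoned.append(name)
    counts: Dict[str, int] = defaultdict(int)
    for word, one in intermediate:  # reducer: sum the 1s per word
        counts[word] += one
    return dict(counts), attempts, abandoned
```

With three files, one of which always crashes and one of which crashes only on its first attempt, the job performs 1 + 2 + 4 = 7 attempts for 3 tasks. That matches the slide’s point that the number of attempts is at least, but often more than, the number of tasks.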
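The point that multiple attempts of one task may run in parallel under speculative execution can be sketched as a deterministic toy model. The real scheduler’s heuristics are more involved. Here the fixed `slow_after` threshold, the per-host `durations`, and the choice of the fastest other host for the backup are simplifying assumptions. Whichever attempt finishes first supplies the result, and the other attempt is killed.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Attempt:
    attempt_id: str
    host: str
    start: float     # simulated launch time
    duration: float  # simulated run time on that host

    @property
    def finish(self) -> float:
        return self.start + self.duration


def run_with_speculation(
    task_id: str, durations: Dict[str, float], primary: str, slow_after: float
) -> Tuple[Attempt, List[Attempt]]:
    """Run a task on `primary`; if it is still running at `slow_after`,
    launch a speculative backup attempt on the fastest other host.
    Returns (winning attempt, attempts that get killed)."""
    attempts = [Attempt(f"{task_id}_0", primary, 0.0, durations[primary])]
    others = [h for h in durations if h != primary]
    if attempts[0].finish > slow_after and others:
        backup = min(others, key=durations.get)
        attempts.append(Attempt(f"{task_id}_1", backup, slow_after, durations[backup]))
    winner = min(attempts, key=lambda a: a.finish)
    killed = [a for a in attempts if a is not winner]
    return winner, killed
```

Both attempts belong to the same task, which is why an attempt-level identifier, rather than the task ID, is needed to tell them apart.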
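Two of the design points above fit in one sketch: all communication starts at the slaves, and map work goes wherever the data is local. In this simplified model, a TaskTracker calls the JobTracker (a heartbeat that also asks for work), and the master only ever replies. When it replies, it prefers a pending map task whose input block has a replica on the caller’s node. The class and method names (`JobTracker.heartbeat`, `MapTask.replica_hosts`) are hypothetical and are not Hadoop’s actual RPC interface.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Set


@dataclass
class MapTask:
    task_id: str
    replica_hosts: Set[str]  # nodes holding an HDFS replica of this task's input


@dataclass
class JobTracker:
    pending: Deque[MapTask] = field(default_factory=deque)
    assigned: Dict[str, List[str]] = field(default_factory=dict)

    def heartbeat(self, tracker_host: str, free_slots: int) -> List[MapTask]:
        """Invoked BY a TaskTracker. The master never initiates contact,
        so no master-to-slave call can ever wait on a slave-to-master call."""
        handed_out: List[MapTask] = []
        while free_slots > 0 and self.pending:
            # Prefer a task whose input is already on the caller's node.
            local = next((t for t in self.pending if tracker_host in t.replica_hosts), None)
            task = local if local is not None else self.pending[0]
            self.pending.remove(task)
            handed_out.append(task)
            self.assigned.setdefault(tracker_host, []).append(task.task_id)
            free_slots -= 1
        return handed_out
```

Falling back to the oldest pending task keeps slaves busy when no local input remains, at the cost of reading that input over the network.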
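“Classes must provide explicit serialization” means each key or value type writes and reads its own fields in a fixed order. Hadoop expresses this through the `Writable` interface’s `write` / `readFields` pair. The Python sketch below only mirrors that idea. The `WordCount` record and its byte layout (a 4-byte big-endian length prefix, UTF-8 bytes, then an 8-byte count) are assumptions for illustration, not Hadoop’s wire format.

```python
import io
import struct
from dataclasses import dataclass
from typing import BinaryIO, List, Tuple


@dataclass
class WordCount:
    """A (word, count) record with hand-written serialization."""
    word: str = ""
    count: int = 0

    def write(self, out: BinaryIO) -> None:
        data = self.word.encode("utf-8")
        out.write(struct.pack(">I", len(data)))   # 4-byte big-endian length prefix
        out.write(data)                           # the word itself
        out.write(struct.pack(">q", self.count))  # 8-byte signed count

    def read_fields(self, inp: BinaryIO) -> None:
        # Read fields back in exactly the order write() emitted them.
        (n,) = struct.unpack(">I", inp.read(4))
        self.word = inp.read(n).decode("utf-8")
        (self.count,) = struct.unpack(">q", inp.read(8))


def roundtrip(records: List[WordCount]) -> Tuple[bytes, List[WordCount]]:
    """Serialize records into one byte stream, then deserialize them."""
    buf = io.BytesIO()
    for r in records:
        r.write(buf)
    payload = buf.getvalue()
    buf.seek(0)
    decoded = []
    for _ in records:
        rec = WordCount()
        rec.read_fields(buf)
        decoded.append(rec)
    return payload, decoded
```

The stream carries no field names or type tags, so the reader must know the exact layout. That is the trade-off explicit serialization makes for compactness.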