Take a Close Look at MapReduce.ppt


Take a Close Look at MapReduce
Xuanhua Shi

Acknowledgement
  Most of the slides are from Dr. Bing Chen, /chengbin/
  Some slides are from Shadi Ibrahim, /shadi/

What is MapReduce
  Originated at Google [OSDI'04]
  A simple programming model
  Functional model
  For large-scale data processing
  Exploits a large set of commodity computers
  Executes processing in a distributed manner
  Offers high availability

Motivation
  Lots of demand for very large-scale data processing
  Certain common themes across these demands
    Lots of machines needed (scaling)
    Two basic operations on the input
      Map
      Reduce

Distributed Grep

Distributed Word Count

Map+Reduce
  Map:
    Accepts an input key/value pair
    Emits an intermediate key/value pair
  Reduce:
    Accepts an intermediate key/value* pair
    Emits an output key/value pair

The design and how it works

Architecture overview

GFS: underlying storage system
  Goal
    Global view
    Make huge files available in the face of node failures
  Master node (meta server)
    Centralized; indexes all chunks on data servers
  Chunk server (data server)
    A file is split into contiguous chunks, typically 16-64MB.
    Each chunk is replicated (usually 2x or 3x).
    Tries to keep replicas in different racks.

GFS architecture

Functions in the Model
  Map
    Processes a key/value pair to generate intermediate key/value pairs
  Reduce
    Merges all intermediate values associated with the same key
  Partition
    By default: hash(key) mod R
    Well balanced

Diagram (1)

Diagram (2)

A Simple Example
  Counting words in a large set of documents

    map(string key, string value):
      // key: document name
      // value: document contents
      for each word w in value:
        EmitIntermediate(w, "1");

    reduce(string key, iterator values):
      // key: word
      // values: list of counts
      int result = 0;
      for each v in values:
        result += ParseInt(v);
      Emit(AsString(result));

How does it work?

Locality issue
  Master scheduling policy
    Asks GFS for locations of replicas of input file blocks
    Map task inputs are typically split into 64MB pieces (== GFS block size)
    Map tasks scheduled so GFS input block replica are on sa
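
The word-count example and the default partition rule (hash(key) mod R) from the slides can be tried out on a single machine. The following is only a minimal sketch in Python, not the Google implementation: the names map_fn, reduce_fn, partition and run_mapreduce are hypothetical, the sample documents are made up, and zlib.crc32 is an assumed stand-in for the unspecified hash function, chosen because it gives the same value on every run.

```python
from collections import defaultdict
import zlib


def map_fn(doc_name, contents):
    """Map: emit an intermediate (word, 1) pair for every word in the document."""
    for word in contents.split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)


def partition(key, num_reducers):
    """Default partitioning from the slides: hash(key) mod R.

    crc32 is an assumption made for this sketch; the slides do not name a hash.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_mapreduce(documents, num_reducers=3):
    # One bucket of {key: [values]} per reduce task (R buckets in total).
    buckets = [defaultdict(list) for _ in range(num_reducers)]

    # Map phase: route every intermediate pair to the bucket chosen by partition().
    for name, contents in documents.items():
        for key, value in map_fn(name, contents):
            buckets[partition(key, num_reducers)][key].append(value)

    # Reduce phase: each bucket is processed independently, keys in sorted order.
    result = {}
    for bucket in buckets:
        for key in sorted(bucket):
            word, total = reduce_fn(key, bucket[key])
            result[word] = total
    return result


# Made-up sample input: document name -> document contents.
docs = {"a.txt": "the quick brown fox", "b.txt": "the lazy dog and the fox"}
counts = run_mapreduce(docs)
print(counts["the"], counts["fox"])  # prints: 3 2
```

Because every occurrence of a word hashes to the same bucket, each reduce task sees all the counts for its keys, which is what lets the R reduce tasks run independently of one another.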
