Take a Close Look at MapReduce
Xuanhua Shi

Acknowledgement
- Most of the slides are from Dr. Bing Chen, /chengbin/
- Some slides are from SHADI IBRAHIM, /shadi/

What is MapReduce
- Originated at Google [OSDI'04]
- A simple programming model
- A functional model
- For large-scale data processing:
  - Exploits a large set of commodity computers
  - Executes processes in a distributed manner
  - Offers high availability

Motivation
- Lots of demand for very large-scale data processing
- Certain common themes run through these demands:
  - Lots of machines are needed (scaling)
  - Two basic operations on the input: Map and Reduce

[Figure: Distributed Grep]

[Figure: Distributed Word Count]

Map+Reduce
- Map:
  - Accepts an input key/value pair
  - Emits intermediate key/value pairs
- Reduce:
  - Accepts an intermediate key together with its list of values (key/value*)
  - Emits output key/value pairs

The design and how it works

[Figure: Architecture overview]

GFS: underlying storage system
- Goals:
  - A global view
  - Make huge files available in the face of node failures
- Master node (meta server):
  - Centralized; indexes all chunks on the data servers
- Chunk server (data server):
  - A file is split into contiguous chunks, typically 16-64 MB
  - Each chunk is replicated (usually 2x or 3x)
  - Replicas are kept in different racks where possible

[Figure: GFS architecture]

Functions in the Model
- Map: processes a key/value pair to generate intermediate key/value pairs
- Reduce: merges all intermediate values associated with the same key
- Partition:
  - By default: hash(key) mod R, where R is the number of reduce tasks
  - Well balanced

[Figure: Diagram (1)]

[Figure: Diagram (2)]

A Simple Example
Counting words in a large set of documents:

    map(string key, string value):
        // key: document name
        // value: document contents
        for each word w in value:
            EmitIntermediate(w, "1");

    reduce(string key, iterator values):
        // key: a word
        // values: a list of counts
        int result = 0;
        for each v in values:
            result += ParseInt(v);
        Emit(AsString(result));

How does it work?

Locality issue
- Master scheduling policy:
  - Asks GFS for the locations of the replicas of the input file blocks
  - Input is typically split into 64 MB pieces, one per map task (== GFS block size)
  - Map tasks are scheduled so that a replica of their GFS input block is on the same machine or in the same rack
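The Distributed Grep slide is diagram-only. As a rough, single-process sketch of how grep fits the Map+Reduce model, the code below has map emit every line that matches a supplied pattern and reduce act as an identity function, which is how the OSDI'04 paper describes distributed grep. The (file_name, text) input splits, the "file:line" intermediate key, and the helper names grep_map, grep_reduce and run_grep are hypothetical; the in-memory dictionary merely stands in for the distributed shuffle.

```python
import re
from typing import Iterator, List, Tuple

def grep_map(file_name: str, contents: str, pattern: str) -> Iterator[Tuple[str, str]]:
    """Map: emit (file_name:line_no, line) for every line matching the pattern.
    The "file:line" key is an illustrative choice, not part of the original slides."""
    regex = re.compile(pattern)
    for line_no, line in enumerate(contents.splitlines(), start=1):
        if regex.search(line):
            yield (f"{file_name}:{line_no}", line)

def grep_reduce(key: str, values: List[str]) -> Iterator[Tuple[str, str]]:
    """Reduce: identity function, copying intermediate pairs straight to the output."""
    for v in values:
        yield (key, v)

def run_grep(splits: List[Tuple[str, str]], pattern: str) -> List[Tuple[str, str]]:
    # Map phase over every input split, grouping values by key (a stand-in for the shuffle).
    grouped = {}
    for name, text in splits:
        for k, v in grep_map(name, text, pattern):
            grouped.setdefault(k, []).append(v)
    # Reduce phase: keys are visited in sorted order, as a reduce task would see them.
    out = []
    for k in sorted(grouped):
        out.extend(grep_reduce(k, grouped[k]))
    return out
```

For example, run_grep([("a.txt", "map reduce\nhello\nmapper")], r"\bmap") keeps lines 1 and 3 of a.txt. Because reduce does no work here, a real job could also skip the reduce phase entirely.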
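The GFS chunk-server bullets (contiguous chunks, 2x or 3x replication, replicas on different racks) can be made concrete with a toy model. The sketch below assumes a 64 MB chunk size (the top of the 16-64 MB range on the slide), a replication factor of 3, and a simple round-robin rule that puts each replica of a chunk on a different rack. The names split_into_chunks and place_replicas are hypothetical, and the placement rule is only an illustration, not the actual GFS placement algorithm.

```python
from typing import Dict, List, Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # assumed 64 MB chunks

def split_into_chunks(file_size: int, chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) for each contiguous chunk of a file; the last chunk may be short."""
    chunks = []
    offset = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        chunks.append((offset, length))
        offset += length
    return chunks

def place_replicas(num_chunks: int, racks: Dict[str, List[str]], replication: int = 3) -> List[List[str]]:
    """Hypothetical policy: replica r of chunk c goes to rack (c + r) mod #racks,
    so the replicas of one chunk always sit on distinct racks."""
    rack_names = sorted(racks)
    if replication > len(rack_names):
        raise ValueError("this policy needs at least as many racks as replicas")
    placement = []
    for c in range(num_chunks):
        replicas = []
        for r in range(replication):
            servers = racks[rack_names[(c + r) % len(rack_names)]]
            replicas.append(servers[c % len(servers)])  # rotate servers within the rack
        placement.append(replicas)
    return placement
```

A 150 MB file, for instance, becomes two full 64 MB chunks plus a 22 MB tail. The resulting list of replica locations per chunk plays the role of the index that the centralized master keeps.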
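The default partition function on the "Functions in the Model" slide, hash(key) mod R, decides which of the R reduce tasks receives each intermediate key. Here is a minimal Python sketch. SHA-256 is an assumed stand-in for "hash", chosen because Python's built-in hash() on strings is randomized per process and would make different map tasks disagree. The names partition and bucket_sizes are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable

def partition(key: str, num_reducers: int) -> int:
    """hash(key) mod R: every occurrence of a key goes to the same reduce task."""
    # A stable hash (SHA-256 here) gives the same answer in every map task.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_reducers

def bucket_sizes(keys: Iterable[str], num_reducers: int) -> Counter:
    """Count how many distinct keys land in each reduce partition."""
    return Counter(partition(k, num_reducers) for k in keys)
```

With 10,000 distinct keys and R = 4, each bucket receives roughly a quarter of the keys. That is the "well balanced" property. It concerns distinct keys only, so a single very frequent key still lands entirely on one reducer.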
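The word-count pseudocode in "A Simple Example" translates almost line for line into runnable Python. In the sketch below, the in-memory grouping step replaces the distributed shuffle. Words are split on whitespace with no lowercasing or punctuation handling, which is an assumption. The names wc_map, wc_reduce and run_word_count are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def wc_map(key: str, value: str) -> Iterator[Tuple[str, str]]:
    """key: document name; value: document contents."""
    for w in value.split():
        yield (w, "1")  # EmitIntermediate(w, "1")

def wc_reduce(key: str, values: Iterable[str]) -> str:
    """key: a word; values: a list of counts, as strings."""
    result = 0
    for v in values:
        result += int(v)  # ParseInt(v)
    return str(result)  # Emit(AsString(result))

def run_word_count(documents: Dict[str, str]) -> Dict[str, int]:
    # Shuffle: collect every intermediate value under its word.
    intermediate: Dict[str, List[str]] = defaultdict(list)
    for name, contents in documents.items():
        for word, count in wc_map(name, contents):
            intermediate[word].append(count)
    # Reduce each word's list of counts into a total.
    return {word: int(wc_reduce(word, counts)) for word, counts in sorted(intermediate.items())}
```

For example, run_word_count({"d1": "the cat sat", "d2": "the cat ran the race"}) counts "the" three times and "cat" twice. Counts travel as strings only to mirror the pseudocode; a real implementation would emit integers.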
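The locality slide describes the master's preference order. It first tries a worker that holds a replica of the map task's input block, then a worker in the same rack as a replica, and only then any idle worker. Below is a small sketch of that choice. The Worker type, the rack_of lookup table, and pick_worker are hypothetical names, and a real master would also track task state and worker load.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class Worker:
    name: str  # machine name (hypothetical)
    rack: str  # rack the machine sits in

def pick_worker(replica_hosts: List[str], idle: List[Worker], rack_of: Dict[str, str]) -> Optional[Worker]:
    """Choose an idle worker for a map task whose input block is replicated on replica_hosts."""
    if not idle:
        return None
    hosts = set(replica_hosts)
    # 1. Same machine: the input block can be read from local disk.
    for w in idle:
        if w.name in hosts:
            return w
    # 2. Same rack: the read stays within the rack switch.
    replica_racks = {rack_of[h] for h in replica_hosts if h in rack_of}
    for w in idle:
        if w.rack in replica_racks:
            return w
    # 3. Otherwise any idle worker; the block is read across racks.
    return idle[0]
```

Splitting the input into 64 MB pieces (one GFS block per map task) is what makes this work. Each map task's input then lives in full on a single replica host.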