- 1、本文档共9页,可阅读全部内容。
- 2、原创力文档(book118)网站文档一经付费(服务费),不意味着购买了该文档的版权,仅供个人/单位学习、研究之用,不得用于商业用途,未经授权,严禁复制、发行、汇编、翻译或者网络传播等,侵权必究。
- 3、本站所有内容均由合作方或网友上传,本站不对文档的完整性、权威性及其观点立场正确性做任何保证或承诺!文档内容仅供研究参考,付费前请自行鉴别。如您付费,意味着您自己接受本站规则且自行承担风险,本站不退款、不进行额外附加服务;查看《如何避免下载的几个坑》。如果您已付费下载过本站文档,您可以点击 这里二次下载。
- 4、如文档侵犯商业秘密、侵犯著作权、侵犯人身权等,请点击“版权申诉”(推荐),也可以打举报电话:400-050-0827(电话支持时间:9:00-18:30)。
- 5、该文档为VIP文档,如果想要下载,成为VIP会员后,下载免费。
- 6、成为VIP后,下载本文档将扣除1次下载权益。下载后,不支持退款、换文档。如有疑问请联系我们。
- 7、成为VIP后,您将拥有八大权益,权益包括:VIP文档下载权益、阅读免打扰、文档格式转换、高级专利检索、专属身份标志、高级客服、多端互通、版权登记。
- 8、VIP文档为合作方或网友上传,每下载1次, 网站将根据用户上传文档的质量评分、类型等,对文档贡献者给予高额补贴、流量扶持。如果你也想贡献VIP文档。上传文档
查看更多
The Hadoop Distributed Filesystem
When a dataset outgrows the storage capacity of a single physical machine, it becomes necessary to partition it across a number of separate machines. Filesystems that manage the storage across a network of machines are called distributed filesystems. Since they are network-based, all the complications of network programming kick in, thus making distributed filesystems more complex than regular disk filesystems. For example, one of the biggest challenges is making the filesystem tolerate node failure without suffering
data loss.
Hadoop comes with a distributed filesystem called HDFS, which stands for Hadoop Distributed Filesystem. (You may sometimes see references to “DFS”—informally or in older documentation or configuration—which is the same thing.) HDFS is Hadoop’s flagship filesystem and is the focus of this chapter, but Hadoop actually has a general purpose filesystem abstraction, so we’ll see along the way how Hadoop integrates with other storage systems (such as the local filesystem and Amazon S3).
The Design of HDFS
HDFS is a filesystem designed for storing very large files with streaming data access patterns, running on clusters on commodity hardware. Let’s examine this statement in more detail:
Very large files
“Very large” in this context means files that are hundreds of megabytes, gigabytes, or terabytes in size. There are Hadoop clusters running today that store petabytes of data.*
Streaming data access
HDFS is built around the idea that the most efficient data processing pattern is a write-once, read-many-times pattern. A dataset is typically generated or copied from source, then various analyses are performed on that dataset over time. Each analysis will involve a large proportion, if not all, of the dataset, so the time to read the whole dataset is more important than the latency in reading the first record.
Commodity hardware
Hadoop doesn’t require expensive, highly reliable hardware to run on. It’s designed to run
您可能关注的文档
最近下载
- 人工智能背景下高校教学模式改革研究.docx VIP
- 部编版三年级语文下册期末《非连续性文本阅读》专项精选试卷-附答案.doc VIP
- 人工智能背景下的职业教育混合教学模式构建与实施.docx VIP
- 湖北省武汉市2025届高中毕业生四月调研考试政治试题及答案(武汉四调).pdf
- Unit5大单元教学设计 人教版八年级英语下册.pdf
- 陕西财政云单位核算系统使用手册.docx
- 【营销策划】2024奥迪品牌技巧分析中英双版.pptx
- 肥胖诊疗门诊建设专家指导意见(2024年版).pdf
- 2024奥迪品牌技巧分析中英双版.pdf
- 2024 奥迪分析 : 奥迪品牌、技巧、平台等奥迪分析师日 2024.pdf
文档评论(0)