PPT slides: ppt-thestanforduniversityinfolab.ppt

  • About 6,580 words
  • About 28 pages
  • Published 2017-01-18 in Hunan

CS 345A Data Mining
MapReduce

Single-node architecture

Commodity Clusters
  • Web data sets can be very large
      • Tens to hundreds of terabytes
  • Cannot mine on a single server (why?)
  • Standard architecture emerging:
      • Cluster of commodity Linux nodes
      • Gigabit Ethernet interconnect
  • How to organize computations on this architecture?
      • Mask issues such as hardware failure

Cluster Architecture

Stable storage
  • First order problem: if nodes can fail, how can we store data persistently?
  • Answer: Distributed File System
      • Provides global file namespace
      • Google GFS; Hadoop HDFS; Kosmix KFS
  • Typical usage pattern
      • Huge
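
One way to approach the "(why?)" on the Commodity Clusters slide is a back-of-envelope estimate of how long a single full pass over the data takes. The sketch below is illustrative only: the 100 MB/s sequential read rate and the 1,000-node cluster size are assumptions chosen for round numbers, not figures from the slides, and `scan_seconds` is a hypothetical helper.

```python
# Back-of-envelope scan time. The disk throughput and cluster size below are
# illustrative assumptions, not figures taken from the slides.
TB = 10**12  # bytes in one decimal terabyte


def scan_seconds(data_bytes, bytes_per_second, nodes=1):
    """Seconds to read the data once, assuming it is split evenly across
    nodes and every node reads its share in parallel at the same rate."""
    return data_bytes / (bytes_per_second * nodes)


data = 100 * TB          # inside the "tens to hundreds of terabytes" range
disk_rate = 100 * 10**6  # assumed sequential read rate: 100 MB/s

single = scan_seconds(data, disk_rate)
cluster = scan_seconds(data, disk_rate, nodes=1000)

print(f"single server: {single / 86400:.1f} days")    # 11.6 days
print(f"1,000 nodes:   {cluster / 60:.1f} minutes")   # 16.7 minutes
```

Under these assumptions one server needs well over a week just to read the data once, before doing any computation, while an evenly partitioned cluster finishes the same pass in minutes. The estimate ignores network transfer, stragglers, and node failures, which are the issues the slides say the architecture has to mask.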
