培乐园 (Peileyuan): Architecture and Processing of Massive Data, Part 6

  • Published 2017-06-04 (Henan)

5. Technology

  • Hardware
  • Data structure
  • Algorithm
  • Distribution Cloud

5. Technology: remember

5. Technology: computing

  Platform           Communication Scheme   Data size
  Peer-to-Peer       TCP/IP                 Petabytes
  Virtual Clusters   MapReduce / MPI        Petabytes, Terabytes
  HPC Clusters       MPI / MapReduce        Terabytes
  Multicore          Multithreading         Gigabytes
  GPU                CUDA                   Gigabytes
  FPGA               HDL                    Gigabytes

5. Technology: storage

  • Change:
    – Tape is Dead
    – Disk is Tape
    – Flash is Disk
    – RAM Locality is King
  • Distributed:
    – Distributed DB
    – Distributed Memory System
    – DFS

5. Technology: network

  • 1000Mb Ethernet
  • 1Gb Ethernet
  • 10Gb Ethernet as the backbone network
  • Network Switch?

5. Technology: more

  • Hadoop Stack
  • NoSQL, NewSQL
  • MPI, Spark, Mesos
  • HadoopDB, Storm, S4, Kafka, R on Hadoop
  • FLASH SSD, Memory, GPU

References

  • GFS / MapReduce / Bigtable
  • Hadoop / Hive
  • Google, Facebook, Amazon, …
  • Data warehouse, Machine learning, …
  • ……
  • Many of the diagrams and architecture figures come from academic sources, talks, and the internet and are not individually credited; apologies.
  • Thanks ☺

Question

  • How to process:
    – 100 Billion Web pages
  • Extracting Features
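
One possible way to approach the question above is the MapReduce scheme listed on the computing slide. The sketch below is only an illustration, not the method from the slides: it assumes the "features" are plain term counts, runs the map, shuffle, and reduce steps in a single process, and uses hypothetical names (map_page, shuffle, reduce_term, extract_features) that do not come from the source. On a real cluster the framework would run the map and reduce steps on many machines and perform the shuffle itself.

```python
import re
from collections import defaultdict


def map_page(url, html):
    """Map step: emit (term, 1) for every word-like token in one page."""
    # Crude tag stripping; a real pipeline would use a proper HTML parser.
    text = re.sub(r"<[^>]+>", " ", html)
    for term in re.findall(r"[a-z0-9]+", text.lower()):
        yield term, 1


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_term(term, counts):
    """Reduce step: total occurrences of one term across all pages."""
    return term, sum(counts)


def extract_features(pages):
    """Run map, shuffle, and reduce in-process over (url, html) pairs."""
    pairs = (pair for url, html in pages for pair in map_page(url, html))
    return dict(reduce_term(term, counts)
                for term, counts in shuffle(pairs).items())


if __name__ == "__main__":
    # Two made-up pages standing in for the crawl.
    sample = [
        ("http://example.com/a", "<p>Big data, big clusters</p>"),
        ("http://example.com/b", "<p>Data on disk</p>"),
    ]
    print(extract_features(sample))
```

Because map_page looks at one page at a time and reduce_term looks at one term at a time, both steps can be spread across machines without shared state, which is what makes this scheme a candidate for the petabyte and terabyte rows of the computing table.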
