Research on GPU Scheduling Based on the Spark Platform (.pptx)

Published 2019-01-10 · about 40 pages · about 8.56k characters
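Su Pengfei (苏鹏飞), ICT-HPC

[Figure: MapReduce data flow: Input (disk) → Map → Reduce → Output (disk)]

The existing programming model (MapReduce) is an acyclic data-flow abstraction, so it is ill-suited to applications that reuse a dataset across several operations:
  • iterative computation (machine learning)
  • interactive data-mining tools (R, Excel, Python)

Spark provides:
  • a distributed in-memory abstraction
  • lazy (deferred) evaluation of queries over the data
  • more operators than just map and reduce
  • interactive Scala and Python shells
  • MapReduce's strengths, retained: fault tolerance, data locality, scalability

[Figure: layered stack of data storage, resource management and computation]

[Figure: Spark runtime. The Spark client (app master/driver) holds the program, the RDD graph, the scheduler, the block tracker and the shuffle tracker; a cluster manager; Spark workers, each with task threads and a block manager; storage such as HDFS, HBase, …]
Driver program shown in the figure:
  val sc = new SparkContext
  val f = sc.textFile("…")
  f.filter(…).count()
  ...

RDD (full name: Resilient Distributed Datasets): fault-tolerant, read-only, distributed datasets.
  • users can specify the storage level of the data (memory or external storage)
  • a rich set of parallel operations: reduce, collect, count, …

Operation types:
  • Transformation: computes nothing; it only returns a new RDD.
  • Action: triggers the computation and returns a value to the driver program.

An example: log mining
  val lines = spark.textFile("hdfs://...")          // base RDD
  val errors = lines.filter(_.startsWith("ERROR"))  // transformed RDD
  val messages = errors.map(_.split('\t')(2))
  val cachedMsgs = messages.cache()                 // cached RDD

  cachedMsgs.filter(_.contains("foo")).count()      // parallel operation (action)
  cachedMsgs.filter(_.contains("bar")).count()
  . . .

[Figure: the driver sends tasks to three workers holding Blocks 1, 2, 3 with caches Cache 1, 2, 3; the workers return results to the driver]

Full-text search of Wikipedia takes 1 s (versus 20 s for searching the on-disk data).

Benefits?

Dependency types:
  • Narrow dependency: each partition of the parent RDD is used by at most one partition of the child RDD.
  • Wide dependency: a single partition of the parent RDD is used by multiple partitions of the child RDD.

Storage levels (Storage Level: Meaning):
  • MEMORY_ONLY: Store the RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, some partitions will not be cached and will be recomputed on the fly each time they're needed. This is the default level.
  • MEMORY_AND_DISK: Store the RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, store the partitions that don't fit on disk, and read them from there when they're needed.
  • MEMORY_ONLY_SER: Store the RDD as serialized Java objects (one byte array per partition). This is generally more space-efficient than deserialized objects, especially when using a fast serializer, but more CPU-intensive to read.
  • MEMORY_AND_DISK_SER: Similar to MEMORY_ONLY_SER, but spill partitions that don't fit in memory to disk instead of recomputing them on the fly each time they're needed.
  • DISK_ONLY: Store the RDD partitions only on disk.
  • MEMORY_ONLY_2, MEMORY_AND_DISK_2: Same as the levels above, but replicate each partition on two cluster nodes.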
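The transformation/action split and `cache()` in the log-mining example can be illustrated without a cluster. The sketch below is a hypothetical single-machine toy, not Spark's API: `ToyRDD`, `DemoResult`, `LogMiningDemo` and the sample log lines are invented for illustration. Transformations only wrap a function, an action forces evaluation, and `cache()` memoizes the first evaluation so that the second query does not re-read the source. Unlike Spark, where `cache()` marks the RDD itself for persistence, the toy returns a new wrapper.

```scala
// Hypothetical single-machine toy, NOT Spark's API: it only mimics lazy
// transformations, eager actions and cache() from the log-mining example.
final class ToyRDD[A](compute: () => Seq[A]) {
  // Transformations: wrap the parent's computation, run nothing yet.
  def filter(p: A => Boolean): ToyRDD[A] = new ToyRDD(() => compute().filter(p))
  def map[B](f: A => B): ToyRDD[B] = new ToyRDD(() => compute().map(f))

  // cache(): evaluate at most once, on the first action, then reuse the result.
  def cache(): ToyRDD[A] = {
    lazy val materialized: Seq[A] = compute()
    new ToyRDD(() => materialized)
  }

  // Actions: force evaluation and hand a plain value back to the "driver".
  def count(): Long = compute().size.toLong
  def collect(): Seq[A] = compute()
}

final case class DemoResult(readsBeforeAction: Int, fooCount: Long, barCount: Long, totalReads: Int)

object LogMiningDemo {
  // Invented sample log, tab-separated: level, component, message.
  val logLines: Seq[String] = Seq(
    "ERROR\tdisk\tfoo failed",
    "INFO\tnet\tall good",
    "ERROR\tnet\tbar timeout",
    "ERROR\tdisk\tfoo retry"
  )

  def run(): DemoResult = {
    var sourceReads = 0 // how many times the "file" has been scanned
    val lines = new ToyRDD(() => { sourceReads += 1; logLines }) // base RDD
    val errors = lines.filter(_.startsWith("ERROR"))             // transformed RDD
    val messages = errors.map(_.split('\t')(2))
    val cachedMsgs = messages.cache()                            // cached RDD
    val readsBeforeAction = sourceReads                          // nothing has run yet

    val fooCount = cachedMsgs.filter(_.contains("foo")).count()  // first action: scans the source
    val barCount = cachedMsgs.filter(_.contains("bar")).count()  // served from the memoized result
    DemoResult(readsBeforeAction, fooCount, barCount, sourceReads)
  }

  def main(args: Array[String]): Unit =
    println(run()) // prints DemoResult(0,2,1,1)
}
```

Building the pipeline reads nothing (`readsBeforeAction` is 0), and the two queries together scan the source once; dropping the `.cache()` call would raise `totalReads` to 2.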
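The narrow/wide distinction in the slides can be checked mechanically by recording, for each parent partition, which child partitions read from it. The sketch below is a hypothetical toy over in-memory partitions; `DependencyDemo`, `mapWithDeps`, `shuffleWithDeps` and `isNarrow` are invented names, not Spark's API. A `map` keeps every record in its own partition, while a hash-partitioned shuffle (the kind of repartitioning a key-grouping operation needs) routes each record to child partition `floorMod(hashCode, n)`.

```scala
// Hypothetical toy over in-memory partitions, not Spark's API.
object DependencyDemo {
  type Part[A] = Vector[A]
  // usedBy(i) = set of child partitions that read parent partition i
  type UsedBy = Map[Int, Set[Int]]

  // Narrow: map works partition by partition, so parent i feeds only child i.
  def mapWithDeps[A, B](parent: Vector[Part[A]])(f: A => B): (Vector[Part[B]], UsedBy) = {
    val child = parent.map(_.map(f))
    val usedBy = parent.indices.map(i => i -> Set(i)).toMap
    (child, usedBy)
  }

  // Wide: a hash-partitioned shuffle sends each (key, value) to child floorMod(hash, n).
  def shuffleWithDeps[K, V](parent: Vector[Part[(K, V)]], n: Int): (Vector[Part[(K, V)]], UsedBy) = {
    def target(k: K): Int = java.lang.Math.floorMod(k.hashCode, n)
    val child = Vector.tabulate(n)(j => parent.flatMap(p => p.filter { case (k, _) => target(k) == j }))
    val usedBy = parent.indices.map(i => i -> parent(i).map { case (k, _) => target(k) }.toSet).toMap
    (child, usedBy)
  }

  // Narrow iff every parent partition is used by at most one child partition.
  def isNarrow(usedBy: UsedBy): Boolean = usedBy.values.forall(_.size <= 1)

  def main(args: Array[String]): Unit = {
    val parent = Vector(Vector(0 -> "a", 1 -> "b"), Vector(2 -> "c", 3 -> "d"), Vector(4 -> "e", 5 -> "f"))
    val (_, narrow) = mapWithDeps(parent)(_._2.toUpperCase)
    val (_, wide) = shuffleWithDeps(parent, 2)
    println(s"map:     $narrow, narrow = ${isNarrow(narrow)}")
    println(s"shuffle: $wide, narrow = ${isNarrow(wide)}")
  }
}
```

With integer keys 0 to 5 spread over three parent partitions and two child partitions, every parent partition of the shuffle feeds both children, which is the wide case; the `map` result keeps a one-to-one partition mapping, which is the narrow case.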
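The practical difference between MEMORY_ONLY and MEMORY_AND_DISK is what happens to partitions that do not fit in memory: recomputation versus a disk read. In Spark itself the level is chosen with `rdd.persist(StorageLevel.MEMORY_AND_DISK)`, and `cache()` is shorthand for the default MEMORY_ONLY. The sketch below is only a deliberately simplified, hypothetical cost counter (`StorageLevelDemo`, `simulate` and its parameters are invented; it ignores eviction order, serialization and replication): the first `memSlots` partitions stay in memory, and on every later pass the remaining partitions are either recomputed or read back from disk.

```scala
// Hypothetical, simplified cost model (not Spark code): fixed memory slots,
// no eviction, one unit of work per partition access.
object StorageLevelDemo {
  sealed trait Level
  case object MemoryOnly extends Level    // partitions that don't fit are recomputed
  case object MemoryAndDisk extends Level // partitions that don't fit are spilled to disk

  final case class Stats(computations: Int, diskReads: Int)

  // Run `passes` actions over `numPartitions` partitions with room for `memSlots` in memory.
  def simulate(level: Level, numPartitions: Int, memSlots: Int, passes: Int): Stats = {
    var computations = 0
    var diskReads = 0
    for (pass <- 1 to passes; p <- 0 until numPartitions) {
      val fitsInMemory = p < memSlots
      if (pass == 1) {
        computations += 1 // the first action computes every partition
      } else if (!fitsInMemory) {
        level match {
          case MemoryOnly    => computations += 1 // recomputed on the fly
          case MemoryAndDisk => diskReads += 1    // read back from disk
        }
      } // partitions that fit are served from memory at no counted cost
    }
    Stats(computations, diskReads)
  }

  def main(args: Array[String]): Unit = {
    println(simulate(MemoryOnly, numPartitions = 5, memSlots = 3, passes = 3))    // prints Stats(9,0)
    println(simulate(MemoryAndDisk, numPartitions = 5, memSlots = 3, passes = 3)) // prints Stats(5,4)
  }
}
```

With 5 partitions, room for 3, and 3 passes, MEMORY_ONLY performs 9 computations, while MEMORY_AND_DISK performs 5 computations plus 4 disk reads; which is cheaper depends on whether recomputing a partition costs more than reading it back from disk.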
