Spark Ecosystem Internals
陈 超 @CrazyJvm
Developer Best Practices Day @ 3W Coffee (3W咖啡), Beijing
Show of Hands
How familiar are you with Spark?
A. Heard of it, but haven't used it before.
B. Kicked the tires with some basics.
C. Worked or working on a proof-of-concept deployment.
D. Worked or working on a production deployment.
Outline
• basic internals
• ecosystem
Current Major Release
• released Spark 1.2
Spark: What and Why
• Apache Spark is a fast and general engine for
large-scale data processing.
• Speed
• Ease of Use
• Generality
• Integrated with Hadoop
Hadoop Data Sharing
Spark Data Sharing
DAG in-memory
Why Is Spark Fast?
• Memory-based computation
• DAG
• Thread Model
• Optimization (e.g. delay scheduling)
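Delay scheduling, named in the last bullet, can be illustrated with a minimal sketch. This is a toy model, not Spark's scheduler: the function `pick_task`, the task dictionaries, and the `locality_wait` parameter are all hypothetical names chosen for illustration (Spark exposes a comparable wait through the `spark.locality.wait` setting). The idea is that a non-local launch is accepted only after the scheduler has waited long enough for a data-local slot.

```python
def pick_task(pending, offered_host, now, last_launch_time, locality_wait):
    """Toy delay scheduling: return (task, new_last_launch_time).

    Returns (None, last_launch_time) when the offer is declined.
    All names here are hypothetical; this is not Spark's API.
    """
    # Prefer a task whose input data lives on the offered host.
    for task in pending:
        if offered_host in task["preferred_hosts"]:
            return task, now
    # No data-local task: accept a non-local launch only after waiting
    # at least locality_wait since the last launch.
    if pending and now - last_launch_time >= locality_wait:
        return pending[0], now
    # Otherwise decline the offer and keep waiting for a local slot.
    return None, last_launch_time


pending_tasks = [{"id": 0, "preferred_hosts": {"host1"}}]
declined, _ = pick_task(pending_tasks, "host2", now=1.0,
                        last_launch_time=0.0, locality_wait=3.0)
local, _ = pick_task(pending_tasks, "host1", now=1.0,
                     last_launch_time=0.0, locality_wait=3.0)
```

Here the offer from `host2` is declined at time 1.0 because the wait has not expired, while the offer from `host1` is accepted immediately.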
BDAS
one stack to rule them all
Key Concept-RDD
• A list of partitions
• A function for computing each split
• A list of dependencies on other RDDs
• Optionally, a Partitioner for key-value RDDs
• Optionally, a list of preferred locations to
compute each split on
Immutable!!!
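The five properties above can be mirrored in a small toy model. This is a sketch in plain Python, assuming nothing about Spark's real classes: `ToyRDD` and `parallelize` are hypothetical names, and the frozen dataclass only imitates the immutability point.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional, Sequence, Tuple


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ToyRDD:
    # 1. a list of partitions (here: just partition indices)
    partitions: Tuple[int, ...]
    # 2. a function for computing each split
    compute: Callable[[int], Iterator]
    # 3. a list of dependencies on other RDDs
    dependencies: Tuple["ToyRDD", ...] = ()
    # 4. optionally, a partitioner for key-value RDDs (key -> partition index)
    partitioner: Optional[Callable[[object], int]] = None
    # 5. optionally, preferred locations to compute each split on
    preferred_locations: Callable[[int], Sequence[str]] = lambda split: ()

    def map(self, f):
        # A transformation never mutates self; it returns a new ToyRDD
        # whose compute function wraps the parent's compute function.
        return ToyRDD(
            partitions=self.partitions,
            compute=lambda split: (f(x) for x in self.compute(split)),
            dependencies=(self,),
        )

    def collect(self):
        # Compute every split in order and concatenate the results.
        return [x for split in self.partitions for x in self.compute(split)]


def parallelize(data, num_slices):
    # Hypothetical helper: cut a local list into contiguous slices.
    step = -(-len(data) // num_slices)  # ceiling division
    chunks = [data[i * step:(i + 1) * step] for i in range(num_slices)]
    return ToyRDD(partitions=tuple(range(num_slices)),
                  compute=lambda split: iter(chunks[split]))


numbers = parallelize([1, 2, 3, 4], 2)
doubled = numbers.map(lambda x: x * 2)
```

Calling `doubled.collect()` evaluates both splits through the wrapped compute functions, while `numbers` itself is left unchanged.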
Key Concept-Lineage
unroll partition safely when caching
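Lineage can be sketched as a chain of parent pointers that is replayed when a cached partition is missing. The `Node` class below is a hypothetical toy, not Spark's implementation, and it does not model the unrolling step mentioned above; it only shows recomputation from the parent on a cache miss.

```python
class Node:
    """Toy lineage node: remembers its parent and the function applied."""

    def __init__(self, parent=None, fn=None, source=None):
        self.parent = parent    # lineage pointer to the parent dataset
        self.fn = fn            # per-element function applied to the parent
        self.source = source    # set only on the root: a list of partitions
        self.cache = {}         # partition index -> materialized list

    def compute(self, split):
        if split in self.cache:        # cache hit: reuse the stored partition
            return self.cache[split]
        if self.parent is None:        # root: read the source data
            return list(self.source[split])
        # Cache miss: walk the lineage and recompute from the parent.
        return [self.fn(x) for x in self.parent.compute(split)]

    def persist(self, split):
        # Materialize one partition and keep it in the cache.
        self.cache[split] = self.compute(split)


root = Node(source=[[1, 2], [3, 4]])
squared = Node(parent=root, fn=lambda x: x * x)
squared.persist(0)
squared.persist(1)
del squared.cache[1]              # simulate losing a cached partition
recovered = squared.compute(1)    # rebuilt from lineage instead of the cache
```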
Key Concept-Dependency
Key Concept-ClusterManager
• Local
• Standalone
• Yarn
• Mesos
Cluster Overview
Schedule
Executor
Shuffle
Sort-based shuffle supported
Shuffle
• Pull-based (not push-based)
• Write intermediate files to disk
• Build hash map within each partition
• Can spill across keys
• A single key-value pair must fit in memory
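The pull-based flow above can be sketched end to end in plain Python. This is a toy under stated assumptions, not Spark's shuffle code: the function names, the `shuffle_<map>_<reduce>` file naming, and the use of `pickle` are all hypothetical. Map tasks write one intermediate file per reducer to disk, and each reducer then pulls its files and aggregates them in a hash map.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def map_side_write(map_id, records, num_reducers, shuffle_dir):
    """Bucket (key, value) records by hash(key); write one file per reducer."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[hash(key) % num_reducers].append((key, value))
    for reduce_id in range(num_reducers):
        path = os.path.join(shuffle_dir, f"shuffle_{map_id}_{reduce_id}")
        with open(path, "wb") as f:
            pickle.dump(buckets[reduce_id], f)


def reduce_side_pull(reduce_id, num_maps, shuffle_dir, combine):
    """Pull this reducer's file from every map output; aggregate in a dict."""
    agg = {}
    for map_id in range(num_maps):
        path = os.path.join(shuffle_dir, f"shuffle_{map_id}_{reduce_id}")
        with open(path, "rb") as f:
            for key, value in pickle.load(f):
                agg[key] = combine(agg[key], value) if key in agg else value
    return agg


def word_count(map_inputs, num_reducers=2):
    """Run the toy shuffle over several map inputs and merge reducer outputs."""
    with tempfile.TemporaryDirectory() as shuffle_dir:
        for map_id, records in enumerate(map_inputs):
            map_side_write(map_id, records, num_reducers, shuffle_dir)
        result = {}
        for reduce_id in range(num_reducers):
            result.update(reduce_side_pull(
                reduce_id, len(map_inputs), shuffle_dir, lambda a, b: a + b))
    return result


counts = word_count([[("a", 1), ("b", 1)], [("a", 1), ("c", 1)]])
```

In this sketch the reducer's hash map holds every key of its partition in memory; spilling is not modeled.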
Better Metrics System
• Previously: metrics were collected only after a task completed
• Now : report when task