Research on Hadoop Parallel Computing Optimization and Processing Performance Based on Big Data

  • Approx. 55,700 characters
  • Approx. 70 pages
  • Published 2018-05-18 in Shanghai


Abstract

With the development and spread of new-generation mobile communication, the Internet of Things, and cloud computing, data traffic is growing explosively and putting increasing pressure on data processing. Thanks to its powerful data processing capability, the Hadoop MapReduce programming framework has become a mature solution for text analysis, natural language processing, and business data processing, and it can relieve the data-processing bottleneck of communication systems. However, the lack of cost-based parameter optimization in MapReduce frameworks becomes a major limiting factor as MapReduce usage grows beyond large Web companies to new applications. Of roughly 200 configuration parameters, about 13 have a major effect on cluster performance. To address these problems, this thesis designs a new parameter configuration analysis system based on Hadoop tuning, so that every individual task runs with optimized parameters and better performance.

Building on the MapReduce framework, we propose three new components: the Profiler, the Judge-Engine, and the Cost-based Optimizer. The Profiler collects detailed statistical information from unmodified MapReduce programs. The Judge-Engine performs fine-grained cost estimation. The Cost-based Optimizer provides the best, simplified set of parameters based on the output of the other two components.

By comparing optimized parameters with default parameters in typical MapReduce applications (text analysis, natural language processing, and business data processing), we demonstrate the effectiveness of each component through a comprehensive evaluation across representative MapReduce application domains. The results show that, with the help of these three new components, the new optimization model makes Hadoop parameter optimization much easier.

Keywords: Hadoop, Performance Optimization, Parameters, MapReduce

Contents

Glossary of Terms
Chapter 1 Introduction
1.1 Research Background
1.2 Research Status at Home and Abroad
1.3 Main Contributions and Organization of This Thesis
Chapter 2 Hadoop-related tech…
