A Hadoop Performance Prediction Model Based on Random Forest
MapReduce is a programming model for processing large data sets, and Hadoop is the most popular open-source implementation of MapReduce. To achieve high performance, up to 190 Hadoop configuration parameters must be tuned manually. This is not only time-consuming but also error-prone. In this paper, we propose a new performance model based on random forest, a recently developed machine-learning algorithm. The model, called RFMS, predicts the performance of a Hadoop system from the system's configuration parameters. RFMS is built from 2000 distinct fine-grained performance observations taken under different Hadoop configurations. We test RFMS against the measured performance of representative workloads from the Hadoop Micro-benchmark suite. The results show that RFMS achieves a prediction accuracy of 95% on average and up to 99%. This new, highly accurate prediction model can be used to automatically optimize the performance of Hadoop systems.
big data; cloud computing; MapReduce; Hadoop; random forest; micro-benchmark
1 Introduction
The MapReduce programming model is widely used in big-data applications because it is simple to program and can handle large data sets. A popular open-source implementation of MapReduce is Apache Hadoop, which has been used for web indexing [1], machine learning [2], log file analysis [3], financial analysis [4], and bioinformatics research [5].
With Hadoop, a programmer must manually tune up to 190 parameters to ensure high system performance. Without in-depth knowledge of the Hadoop system, however, the programmer may find this task tedious, and a poor choice of settings may even seriously degrade system performance. This issue has been confirmed by many researchers [6]-[9].
It is therefore desirable to automatically tune the configuration parameters. To this end, a performance prediction model based on historical observation is required. The key to improving p
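To make the idea of a prediction model built from historical observations concrete, the following is a minimal sketch of a random-forest regressor that maps a configuration vector to a runtime. It is not the RFMS implementation, which this excerpt does not describe. The three parameters (`sort_mb`, `reducers`, `compress`), the `synthetic_runtime` formula, and all hyperparameters are assumptions chosen for illustration, and the training data is synthetic rather than measured on Hadoop.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (the split criterion)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def build_tree(rows, targets, rng, n_try, min_leaf=3, max_depth=8):
    """Grow one regression tree; each split considers n_try random features."""
    if max_depth == 0 or len(rows) < 2 * min_leaf:
        return sum(targets) / len(targets)  # leaf: mean target value
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):
        for thr in sorted({r[f] for r in rows})[1:]:
            left = [i for i, r in enumerate(rows) if r[f] < thr]
            right = [i for i, r in enumerate(rows) if r[f] >= thr]
            if min(len(left), len(right)) < min_leaf:
                continue
            cost = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, thr, left, right)
    if best is None:
        return sum(targets) / len(targets)
    _, f, thr, left, right = best

    def grow(idx):
        return build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                          rng, n_try, min_leaf, max_depth - 1)

    return (f, thr, grow(left), grow(right))

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are floats."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] < thr else right
    return node

def fit_forest(rows, targets, n_trees=25, n_try=2, seed=0):
    """Train each tree on a bootstrap sample of the observations."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], rng, n_try))
    return forest

def predict_forest(forest, row):
    """The forest prediction is the average of the per-tree predictions."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)

def synthetic_runtime(cfg, rng):
    """Made-up runtime in seconds; NOT measured Hadoop data."""
    sort_mb, reducers, compress = cfg
    return 200 - 0.2 * sort_mb + 300 / reducers - 20 * compress + rng.gauss(0, 3)

# Hypothetical configuration space: sort buffer (MB), reducer count, compression flag.
rng = random.Random(42)
configs = [(rng.choice(range(50, 401, 50)), rng.randint(1, 16), rng.randint(0, 1))
           for _ in range(300)]
runtimes = [synthetic_runtime(c, rng) for c in configs]
train_x, test_x = configs[:250], configs[250:]
train_y, test_y = runtimes[:250], runtimes[250:]

forest = fit_forest(train_x, train_y, n_trees=25, seed=1)
errors = [abs(predict_forest(forest, x) - y) / y for x, y in zip(test_x, test_y)]
print(f"mean relative error on held-out synthetic configs: {sum(errors) / len(errors):.3f}")
```

Once such a model is trained, an automatic tuner could score many candidate configurations with `predict_forest` instead of running each one on the cluster, which is the use the text motivates.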