Design of a Hadoop-Based Web Log Mining Scheme

Abstract: An approach to mining web log data of exponentially growing volume is proposed, and a highly reliable web log data mining scheme is designed. Using existing public web log datasets, a MapReduce-based filtering algorithm is implemented in the data preprocessing stage, and service information that supports enterprise decision-making is mined. The platform built on this scheme is then optimized, improving its performance by 3.26%. Finally, experiments are conducted on three aspects: the scheme's reliability, the effect of the number of log files on the platform's I/O speed, and the query performance of the platform compared with a single machine. The results show that the scheme is reliable. As the number of log files is successively doubled, the time cost of read operations increases by 52.58% on average and that of write operations by 79.69% on average. As the log volume grows, the query time on a single machine rises sharply, whereas the query time on the platform remains stable. As machine nodes are added, the computation time decreases by 8.87% on average.

Keywords: web log; data mining; data cleaning; Hadoop; MySQL

CLC number: TN711-34; TP391.9    Document code: A    Article ID: 1004-373X(2017)09-0115-06

0 Introduction

With the arrival of the information-explosion era, data is generated at an exponential rate in everyday life, web logs in particular, and this inevitably raises a series of problems. On the one hand, the volume of data to be stored is enormous while storage resources are limited. On the other hand, traditional computing approaches make the computation cycle too long and fail to allocate computing resources efficiently. The advent of Hadoop, developed under the Apache Software Foundation, made powerful computation and massive storage achievable on inexpensive clusters, and applied studies have since appeared in the field of web log mining [1-2]. One goal of this paper is to study the Hadoop framework and the members of its ecosystem and, building on ZooKeeper, one of those members, to construct a highly reliable