Research on Weighted Sequential Pattern Mining in Web Logs (thesis, Computer Application Technology)

  • About 73,900 characters
  • About 68 pages
  • Published 2019-01-30 in Shanghai


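Master's Thesis

摘要 (Abstract)

Sequential pattern mining finds all frequent sequences that satisfy a minimum support threshold in a given sequence dataset, and it is now widely applied in e-commerce. Traditional sequential pattern mining algorithms have two shortcomings. First, they treat all sequences and all items equally, although in practice sequences and items differ in importance. Second, they still generate a large number of candidate sequences on large datasets; the only way they offer to reduce the number of candidates is to raise the minimum support.

Sequential pattern mining in Web logs consists of dataset collection, dataset preprocessing, pattern discovery, and pattern analysis. The result of each step is the input to the next, so every step affects the final patterns. Building on a study of the SPAM and WARM algorithms, this thesis combines the strengths of both and proposes the WSPAM algorithm. WSPAM has two main features. First, it introduces weights so that mining yields a small number of important sequential patterns; because the downward closure property no longer holds once weights are introduced, it uses the weighted support of the WARM algorithm to address this. Second, it represents sequences with the TPV Set (transaction position vector set) structure, which reduces the high memory usage of SPAM. Finally, a prototype mining system was implemented, and WSPAM and SPAM were tested on both synthetic datasets and real log datasets. The experiments show that WSPAM suits scenarios that demand memory efficiency, while SPAM suits scenarios that demand time efficiency.

Keywords: sequential pattern mining, weighted support, transaction position vector set, WSPAM algorithm

Abstract

Sequential pattern mining refers to mining the set of all frequent sequences that satisfy a minimum support constraint in a sequence dataset, and it has been widely applied in electronic business. Traditional sequential pattern mining algorithms have two disadvantages. First, previous algorithms treat sequences and items uniformly, although they differ in importance in the real world. Second, previous algorithms still generate an exponentially large number of candidate sequences when mining large, high-density sequential datasets. There is no alternative but to increase the minimum support to reduce the number of candidate sequences.

The process of mining sequential patterns from a web log dataset includes several steps: web log dataset collection, dataset preprocessing, pattern mining, and pattern analysis. The result of each step is used as the input of the next, so every step affects the final discovered patterns. Based on SPAM, which is the fastest sequential pattern mining algorithm, this paper proposes a weighted sequential pattern mining algorithm called WSPAM. WSPAM differs from SPAM in two aspects. First, our algorithm introduces a weight constraint into sequential pattern mining, focusing the mining process on fewer but significant sequential patterns that include items with larger weights. And our algorithm adopts weighted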
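The role of weighted support in keeping the downward closure property can be illustrated with a small sketch. The excerpt does not give the thesis's exact formula, so the code below assumes one common WARM-style formulation: the weight of a sequence is the mean weight of its items, and the weighted support of a pattern is the total weight of the sequences containing it divided by the total weight of all sequences. The item weights, the page names, and the function names (`sequence_weight`, `contains`, `weighted_support`) are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical page weights; not taken from the thesis.
ITEM_WEIGHTS = {"home": 0.2, "search": 0.5, "cart": 0.8, "checkout": 1.0}

# A toy sequence database: each sequence is a list of itemsets (sets of pages).
DATABASE = [
    [{"home"}, {"search"}, {"cart"}, {"checkout"}],
    [{"home"}, {"search"}],
    [{"search"}, {"cart"}],
    [{"home"}, {"cart"}, {"checkout"}],
]


def sequence_weight(sequence, weights):
    """Assumed definition: mean weight of all items in the sequence."""
    return mean(weights[item] for itemset in sequence for item in itemset)


def contains(sequence, pattern):
    """True if every itemset of `pattern` is included, in order, in `sequence`."""
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def weighted_support(pattern, database, weights):
    """Weight of the sequences containing `pattern` over the weight of all sequences."""
    total = sum(sequence_weight(s, weights) for s in database)
    covered = sum(
        sequence_weight(s, weights) for s in database if contains(s, pattern)
    )
    return covered / total


short = [{"cart"}]
longer = [{"cart"}, {"checkout"}]
print(weighted_support(short, DATABASE, ITEM_WEIGHTS))
print(weighted_support(longer, DATABASE, ITEM_WEIGHTS))
```

Under this assumed definition, every sequence that contains the longer pattern also contains its sub-pattern, so extending a pattern can never increase its weighted support. That is the property a level-wise or depth-first miner needs in order to prune candidates safely once weights are involved.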
