Posted 2018-08-13 from Fujian

A Data Stream Anomaly Detection Algorithm Based on Randomized Space Trees

Abstract: To address the shortcomings of existing data stream anomaly detection algorithms, this paper proposes a data stream anomaly detection algorithm based on randomized space trees. First, a statistical strategy is used to estimate the feature ranges of the data stream, and the resulting space is partitioned into multiple randomized space trees (RS-Trees) that together form an RS-Forest. The RS-Forest then processes the data stream under a single-window strategy, detecting anomalies through scoring and model updating. For the tree node into which an instance falls, a piecewise constant density is defined; the density estimates are averaged over all trees in the forest, and this average is used as the score of each newly arriving instance. Newly arriving instances are ranked by this forest-averaged score. Once the window is full, the model is updated using a dual node profile technique, and the collected node size information is used to score the data arriving in the next window. Simulation experiments on several benchmark datasets show that RS-Forest outperforms the other current baseline algorithms in AUC score and running time on most of the datasets.

Keywords: data stream; anomaly detection; randomized space tree; single-window strategy; AUC score; running time

CLC number: TN915.08-34; TP393    Document code: A    Article ID: 1004-373X(2017)19-0056-06

A data stream anomaly detection algorithm based on randomized space tree

QIN Weirong1, WANG Ning2

(1. School of Electronics and Information Engineering, Qinzhou University, Qinzhou 535000, China; 2. School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China)

Abstract: To address the shortcomings of existing data stream anomaly detection algorithms, a data stream anomaly detection algorithm based on the randomized space tree (RS-Tree) is proposed. A statistical strategy is adopted to estimate the feature ranges of the data stream, and several randomized space trees are obtained by partitioning to form an RS-Forest. The RS-Forest processes the data stream with a single-window policy and performs anomaly detection by means of scoring and model updating. For the tree node that an instance falls into, a piecewise constant density is defined; the average of the density estimates over all trees in the forest is taken as the score of each new instance in the data stream, and new instances are ranked by this average score. When the window is full, the dual node profile technique is used to update the model, and the collected node size information is used to score the data arriving in the next window. Simulation experiments were carried out on a variety of benchmark datasets. The results show that the AUC scoring and run time