Time Series Mining: Clustering (PPT, 61 pages).ppt

About 10,800 characters
About 61 pages
Published 2019-01-25 from Tianjin

Stock data clustering: data normalization

Stock prices differ widely in level from one stock to another, so the data are normalized before clustering. Normalization removes the inconsistency of scale among the series being compared. In clustering stocks, similarity means similarity of price trends, not similarity of price levels, which is why the data are normalized. [The normalization formula is missing from the source text.]

Stock data clustering: clustering results

The hierarchical clustering algorithm is run with the number of clusters initially set to 4 and the time-warping window w set to 3.

Clustering based on the SAX representation

Hierarchical clustering: compute pairwise distances, then merge similar clusters bottom-up. Compared with Euclidean, IMPACTS, and SDA.

Distance based on the SAX representation

PAA distance lower-bounds the Euclidean distance.

[Figure: two plots of the series C and Q (x axis 0 to 120, y axis -1.5 to 1.5), with the label "Euclidean Distance" and the SAX words Ĉ = baabccbc and Q̂ = babcacca.]

dist() can be implemented using a table lookup.

Hierarchical clustering

We can objectively state that SAX is superior, since it correctly assigns each class to its own subtree. The data classes are known in advance: decreasing trend, upward shift, and normal.

Partitional clustering

K-means: optimize the objective function by minimizing the sum of squared intra-cluster errors. Compared with raw data. Scales better than hierarchical clustering.

Partitional (K-means) clustering

Working with an approximation of the data gives better results than working with the original data. It has been shown that initializing the cluster centers on a low-dimensional approximation of the data can improve the quality; this is what clustering with SAX implicitly does.

A comparison of the k-means clustering algorithm using SAX and the raw data. The dataset was Space Shuttle telemetry, 1,000 subsequences of length 512. Surprisingly, working with the symbolic approximation produces better results than working with the original data.

Clustering dynamic time series

Time Series Epenthesis: Clustering Time Series Streams Requires Ignoring Some Data

Streaming data are data sequences that arrive continuously, in time order and at a relatively high rate. They are also called dynamic time series.
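The stock clustering setup described earlier (normalize, then run hierarchical clustering with 4 clusters and a time-warping window w = 3) can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not preserve the normalization formula, so z-normalization is swapped in here; the linkage rule is not stated either, so average linkage is assumed. The function names `znorm`, `dtw`, and `hierarchical` are hypothetical.

```python
from math import inf, sqrt
from statistics import mean, pstdev


def znorm(series):
    """Z-normalize a series (ASSUMPTION: the source formula is not preserved)."""
    mu, sd = mean(series), pstdev(series)
    if sd == 0:
        return [0.0] * len(series)
    return [(x - mu) / sd for x in series]


def dtw(a, b, w):
    """Dynamic time warping distance restricted to a band of half-width w."""
    n, m = len(a), len(b)
    w = max(w, abs(n - m))  # the band must be wide enough to reach the corner
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return sqrt(cost[n][m])


def hierarchical(series, k=4, w=3):
    """Bottom-up clustering into k clusters (ASSUMPTION: average linkage)."""
    data = [znorm(s) for s in series]
    n = len(data)
    # Pairwise DTW distances between the normalized series.
    d = [[dtw(data[i], data[j], w) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                link = mean(d[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or link < best[0]:
                    best = (link, x, y)
        _, x, y = best
        clusters[x] += clusters[y]  # merge the closest pair of clusters
        del clusters[y]
    return clusters
```

Calling `hierarchical(prices, k=4, w=3)` on a list of price series returns lists of series indices, one list per cluster. Because each series is normalized first, two stocks with the same trend but very different price levels end up close together, which matches the motivation given for normalization.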
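The SAX distance mentioned earlier (PAA segments, symbol words such as baabccbc, and a dist() that is a table lookup) can be illustrated with a small sketch. It assumes the usual SAX construction: z-normalize, average over equal-length segments, and cut the value range at equiprobable breakpoints of the standard normal distribution. The function names and the series length n = 128 used in the usage note are assumptions, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Z-normalize a series to zero mean and unit standard deviation."""
    mu, sd = mean(series), pstdev(series)
    if sd == 0:
        return [0.0] * len(series)
    return [(x - mu) / sd for x in series]


def paa(series, w):
    """Piecewise aggregate approximation: the mean of each of w equal segments."""
    if len(series) % w != 0:
        raise ValueError("series length must be a multiple of w")
    seg = len(series) // w
    return [mean(series[i * seg:(i + 1) * seg]) for i in range(w)]


def breakpoints(a):
    """Cut points that split the standard normal into a equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(i / a) for i in range(1, a)]


def sax_word(series, w, a):
    """Convert a series to a SAX word of length w over an alphabet of size a."""
    cuts = breakpoints(a)
    symbols = []
    for v in paa(znorm(series), w):
        idx = sum(1 for c in cuts if v >= c)  # index of the region containing v
        symbols.append(chr(ord("a") + idx))
    return "".join(symbols)


def dist_table(a):
    """Lookup table for dist(): zero for equal or adjacent symbols."""
    cuts = breakpoints(a)
    table = [[0.0] * a for _ in range(a)]
    for r in range(a):
        for c in range(a):
            if abs(r - c) > 1:
                table[r][c] = cuts[max(r, c) - 1] - cuts[min(r, c)]
    return table


def mindist(word_q, word_c, n, a):
    """Lower-bounding distance between two SAX words of series of length n."""
    table = dist_table(a)
    total = sum(table[ord(x) - 97][ord(y) - 97] ** 2 for x, y in zip(word_q, word_c))
    return sqrt(n / len(word_q)) * sqrt(total)


def euclid(x, y):
    """Plain Euclidean distance, for comparison with mindist."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))
```

For instance, `mindist("baabccbc", "babcacca", n=128, a=3)` compares the two words shown on the slide; only the positions where the symbols are `a` versus `c` contribute, since adjacent symbols have a table entry of zero. On z-normalized series this value never exceeds `euclid(znorm(q), znorm(c))`, which is the lower-bounding property the slides refer to.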