Data Mining Overview (数据挖掘概要.ppt)

Published 2017-02-10 (Hubei)

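Any questions?

Why do we want to make this review? There are several motivations. Today's Internet is becoming increasingly complex: many applications use techniques to avoid detection, such as random ports, encrypted data transmission, and proprietary communications. On the other side, there is a large body of classification work in the research area. Usually, researchers collect data and analyze it, but we did not find any systematic results. We therefore try to fill this gap.

Flows

Here is how the k-Nearest Neighbors (k-NN) method works. k-NN classifies objects based on the k closest training instances in an n-dimensional feature space. Suppose we have two application classes in a two-dimensional feature space with features X and Y. Feature X could be the size of the first packet in an application flow, and feature Y could be the size of the second packet. First, k-NN simply memorizes the location of every training instance in the two-dimensional feature space, based on its feature values. Then, when a new testing instance arrives, it assigns that instance to the majority class among its k nearest training instances. The algorithm therefore considers only the Euclidean distance between the feature values of the training and testing instances, and nothing else, which is why it is often used to measure the discriminative power of features.

Data Mining: Concepts and Techniques
Intelligent Information Processing (智能信息处理), 32 class hours
Fei Gaolei (费高雷), fgl@
School of Communication and Information Engineering, University of Electronic Science and Technology of China (电子科技大学通信与信息工程学院)

Instructor information
Fei Gaolei (费高雷)
Phone, email: fgl@
Address: Research Building B325 (科研楼B325)
Research interests: network tomography; inversion theory and methods; complex multidimensional information processing

Outline
Introduction
Data mining: concept and necessity
Main tasks of data mining
Case studies

Teaching Material
Data Mining: Concepts and Techniques (Chinese translation of the 3rd edition), by Jiawei Han et al., translated by Fan Ming (范明) and Meng Xiaofeng (孟小峰), China Machine Press (机械工业出版社).
Features: a large number of illustrations, examples, and exercises.
See also: the 2nd edition.

Reference Books

Assessment
Grade composition: final exam 70%, midterm 5%, coursework 25% (coursework = labs 15% + attendance 10%).
Exam format: in-class open-book midterm; open-book final.
Lab grade: result analysis 50%, report 50%. Labs are explained in class and completed after class.

Content
1. Introduction
2. Getting to know your data
3. Data preprocessing
4. Data warehousing and online analytical processing (self-study)
5. Data cube technology (self-study)
6. Mining association rules (key topic)
7. Advanced pattern mining (self-study)
8. Classification: basic concepts (key topic)
9. Classification: advanced methods (self-study)
10. Cluster analysis: basic concepts and methods (key topic)
11. Advanced cluster analysis (self-study)
12. Outlier detection
13. Frontiers of intelligent information processing

Data mining technology keeps advancing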
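
The k-NN procedure described in the notes earlier in this deck (memorize the training instances, then assign a testing instance to the majority class of its k nearest neighbors by Euclidean distance) can be sketched roughly as follows. This is a minimal illustration and not the deck's own code: the function name `knn_classify`, the class labels "web" and "voip", and all packet sizes are hypothetical values chosen only for the example.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training instances.

    train: list of ((feature_x, feature_y), label) pairs
    query: (feature_x, feature_y) tuple for the testing instance
    """
    # "Training" is just memorizing the instances; all work happens at query time.
    # Sort the stored instances by Euclidean distance to the query, keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of those k neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical flows: (size of 1st packet, size of 2nd packet) in bytes.
train = [
    ((60, 1500), "web"),
    ((64, 1400), "web"),
    ((70, 1450), "web"),
    ((200, 180), "voip"),
    ((210, 200), "voip"),
    ((190, 170), "voip"),
]

print(knn_classify(train, (66, 1420), k=3))   # prints "web"
print(knn_classify(train, (205, 190), k=3))   # prints "voip"
```

With two classes, an odd k avoids tied votes. Because the decision depends only on distances between feature values, how well such a classifier performs gives a rough indication of how discriminative the chosen features are.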