Data Mining and Scalability.ppt
Data Mining and Scalability
Lauren Massa-Lochridge, Nikolay Kojuharov, Hoa Nguyen, Quoc Le

Outline
- Data Mining Overview
- Scalability Challenges and Approaches
- Overview - Association rules
- Case study - BIRCH: An Efficient Data Clustering Method for VLDB
- Case study - Scientific Data Mining
- Q&A

DATA MINING

Data Mining: Rationale
- Data size
  - Data in databases is estimated to double every year.
  - The number of people who look at the data stays constant.
- Complexity
  - The analysis is complex.
  - The characteristics and relationships are often unexpected and unintuitive.
- Knowledge discovery tools and algorithms are needed to make sense and use of data.

Data Mining: Rationale (cont'd)
- As of 2003, France Telecom had the largest decision-support DB, ~30 TB; AT&T was 2nd with a 26 TB database.
- Some of the largest databases on the Web, as of 2003, include:
  - Alexa internet archive: 7 years of data, 500 TB
  - Internet Archive, ~300 TB
  - Google, over 4 billion pages, many, many TB

Applications
- Business - analyze inventory, predict customer acceptance, etc.
- Science - find correlation between genes and diseases, pollution and global warming, etc.
- Government - uncover terrorist networks, predict flu pandemic, etc.

Data Mining: Definition
- Semi-automatic discovery of patterns, changes, anomalies, rules, and statistically significant structures and events in data.
- Nontrivial extraction of implicit, previously unknown, and potentially useful information from data.
- Data mining is often done on targeted, preprocessed, transformed data.
  - Targeted: data fusion, sampling.
  - Preprocessed: noise removal, feature selection, normalization.
  - Transformed: dimension reduction.

Data Mining: Evolution

Data Mining: Approaches
- Clustering - identify natural groupings within the data.
- Classification - learn a function to map a data item into one of several predefined classes.
- Summarization - describe groups, summary statistics, etc.
- Association - identify data items that occur frequently together.
- Prediction - predict values or
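To make the "Association" approach listed above concrete, here is a minimal sketch that counts how often pairs of items occur together and keeps the pairs that meet a support threshold. It is not taken from the slides: the function name `frequent_pairs`, the `min_support` parameter, and the toy baskets are all assumptions made for illustration. It counts every pair by brute force, which is the kind of step that scalable association-rule methods try to avoid on large data.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that co-occur in at least `min_support` transactions.

    Hypothetical helper for illustration only; brute-force pair counting.
    """
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so each pair has a single canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Toy data (made up for this example).
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]
result = frequent_pairs(baskets, min_support=2)
for pair, n in sorted(result.items()):
    print(pair, n)
```

With these four baskets, the pairs that appear together at least twice are kept, and pairs seen only once (such as beer with bread) are dropped. The number of candidate pairs grows quadratically with the number of distinct items, which hints at why scalability is a concern for this kind of analysis.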