- Published 2018-01-25 (Zhejiang)
Chapter 2 Data Preprocessing. Data Mining: Concepts and Techniques (PPT slides, English edition)
* Data Mining: Concepts and Techniques *

Discretization and Concept Hierarchy
- Discretization
  - Reduce the number of values for a given continuous attribute by dividing the range of the attribute into intervals
  - Interval labels can then be used to replace actual data values
  - Supervised vs. unsupervised
  - Split (top-down) vs. merge (bottom-up)
  - Discretization can be performed recursively on an attribute
- Concept hierarchy formation
  - Recursively reduce the data by collecting low-level concepts (such as numeric values for age) and replacing them with higher-level concepts (such as young, middle-aged, or senior)

Discretization and Concept Hierarchy Generation for Numeric Data
- Typical methods (all can be applied recursively):
  - Binning (covered above): top-down split, unsupervised
  - Clustering analysis (covered above): either top-down split or bottom-up merge, unsupervised
  - Entropy-based discretization: supervised, top-down split
  - Interval merging by χ² analysis: supervised (it uses class labels), bottom-up merge
  - Segmentation by natural partitioning: top-down split, unsupervised

Entropy-Based Discretization
- Given a set of samples S, if S is partitioned into two intervals S1 and S2 using boundary T, the class information (weighted entropy) after partitioning is
  $$I(S, T) = \frac{|S_1|}{|S|}\,\mathrm{Entropy}(S_1) + \frac{|S_2|}{|S|}\,\mathrm{Entropy}(S_2)$$
  and the corresponding information gain is $\mathrm{Entropy}(S) - I(S, T)$
- Entropy is calculated based on the class distribution of the samples in the set. Given m classes, the entropy of S1 is
  $$\mathrm{Entropy}(S_1) = -\sum_{i=1}^{m} p_i \log_2 p_i$$
  where $p_i$ is the probability of class i in S1
- The boundary T that minimizes I(S, T) over all possible boundaries is selected as a binary discretization
- The process is applied recursively to the resulting partitions until some stopping criterion is met
- Such a boundary may reduce data size and improve classification accuracy

Interval Merge by χ² Analysis
- Merging-based (bottom-up) vs. splitting-based methods
- Merge: find the best neighboring intervals and merge them to form larger intervals recursively
- ChiMerge [Kerber AAAI 1992
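The entropy-based discretization procedure described in the slides can be illustrated with a minimal Python sketch. It is not the textbook's implementation: the function names (`entropy`, `best_boundary`, `discretize`), the choice of midpoints between distinct sorted values as candidate boundaries, and the stopping criterion (a hypothetical `min_gain` threshold plus a `max_depth` limit) are all assumptions made for illustration, and the sample data is invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    # sum of p_i * log2(1 / p_i), equivalent to -sum p_i * log2(p_i)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def best_boundary(values, labels):
    """Return (T, I(S, T)) for the binary split with minimal weighted entropy.

    Candidate boundaries are midpoints between consecutive distinct values
    (an assumption of this sketch). Returns (None, Entropy(S)) if all
    values are identical and no split exists.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best_t, best_i = None, entropy(labels)
    for k in range(1, n):
        lo, hi = pairs[k - 1][0], pairs[k][0]
        if lo == hi:
            continue  # no boundary between equal values
        s1 = [lab for _, lab in pairs[:k]]
        s2 = [lab for _, lab in pairs[k:]]
        i_st = len(s1) / n * entropy(s1) + len(s2) / n * entropy(s2)
        if best_t is None or i_st < best_i:
            best_t, best_i = (lo + hi) / 2, i_st
    return best_t, best_i


def discretize(values, labels, min_gain=0.1, max_depth=3):
    """Recursively split; return the sorted list of cut points.

    Stops when the information gain Entropy(S) - I(S, T) falls below
    min_gain or max_depth is exhausted (both hypothetical criteria).
    """
    if max_depth == 0:
        return []
    t, i_st = best_boundary(values, labels)
    if t is None or entropy(labels) - i_st < min_gain:
        return []
    left = [(v, lab) for v, lab in zip(values, labels) if v <= t]
    right = [(v, lab) for v, lab in zip(values, labels) if v > t]
    cuts = discretize([v for v, _ in left], [lab for _, lab in left],
                      min_gain, max_depth - 1)
    cuts.append(t)
    cuts += discretize([v for v, _ in right], [lab for _, lab in right],
                       min_gain, max_depth - 1)
    return cuts


# Invented example: ages with a binary class label.
ages = [23, 25, 30, 35, 40, 45, 52, 60]
buys = ["no", "no", "no", "yes", "yes", "yes", "yes", "yes"]
cut_points = discretize(ages, buys)
```

In this invented data set the classes separate cleanly between ages 30 and 35, so the sketch selects a single boundary at 32.5 and stops, because neither resulting interval offers any further gain. Real implementations often use a principled stopping rule instead of a fixed threshold.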