Chapter 3: Data Preprocessing Overview
Chapter 3: Data Preprocessing

- Why preprocess the data?
- Data cleaning
- Data integration and transformation
- Data reduction
- Discretization and concept hierarchy generation
- Summary

Why Data Preprocessing?

- Data in the real world is dirty:
  - incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
  - noisy: containing errors or outliers
  - inconsistent: containing discrepancies in codes or names
- No quality data, no quality mining results!
  - Quality decisions must be based on quality data
  - A data warehouse needs consistent integration of quality data

Major Tasks in Data Preprocessing

- Data cleaning: fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies
- Data integration: integration of multiple databases, data cubes, or files
- Data transformation: normalization and aggregation
- Data reduction: obtains a representation that is reduced in volume but produces the same or similar analytical results
- Data discretization: part of data reduction, but of particular importance, especially for numerical data

Figure: Forms of data preprocessing

Data Cleaning

- Data cleaning tasks:
  - Fill in missing values
  - Identify outliers and smooth out noisy data
  - Correct inconsistent data

Missing Data

- Data is not always available
  - E.g., many tuples have no recorded value for several attributes, such as customer income in sales data
- Missing data may be due to:
  - equipment malfunction
  - data that was inconsistent with other recorded data and thus deleted
  - data not entered due to misunderstanding
  - certain data not being considered important at the time of entry
  - history or changes of the data not being registered
- Missing data may need to be inferred.

How to Handle Missing Data?

- Ignore the tuple: usually done when the class label is missing (assuming the task is classification; not eff
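
The two cleaning steps named above, ignoring tuples whose class label is missing and filling in missing attribute values, can be sketched in a few lines of Python. This is a minimal illustration and not the chapter's own method: the sample records, the field names `income` and `label`, the helper names `drop_unlabeled` and `fill_with_mean`, and the choice of the attribute mean as the fill value are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical sales tuples; None marks a missing value.
records = [
    {"customer": "a", "income": 40000.0, "label": "buys"},
    {"customer": "b", "income": None, "label": "buys"},
    {"customer": "c", "income": 60000.0, "label": None},
    {"customer": "d", "income": 50000.0, "label": "no"},
]


def drop_unlabeled(rows, label_key="label"):
    """Ignore tuples whose class label is missing."""
    return [r for r in rows if r[label_key] is not None]


def fill_with_mean(rows, key):
    """Fill missing values of `key` with the mean of the observed values.

    Mean imputation is an assumed strategy chosen for illustration.
    Returns new dicts; the input rows are left unchanged.
    """
    observed = [r[key] for r in rows if r[key] is not None]
    if not observed:
        # Nothing to infer from; return copies unchanged.
        return [dict(r) for r in rows]
    fill = mean(observed)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]


labeled = drop_unlabeled(records)            # drops customer "c"
cleaned = fill_with_mean(labeled, "income")  # fills customer "b"
```

In this sketch the unlabeled tuple is dropped before the mean is computed, so the fill value depends only on the tuples that remain; applying the steps in the other order would give a different fill value.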