[Computer Science] CHAP3_DATA_EXPLORATION.ppt

  • About 19,600 characters
  • About 56 pages
  • Published 2018-02-26 (Jiangsu)

(C) Vipin Kumar, Parallel Issues in Data Mining, VECPAR 2002

Data Mining: Exploring Data

What is data exploration?
  • Key motivations of data exploration include:
      • helping to select the right tool for preprocessing or analysis;
      • making use of humans’ abilities to recognize patterns, since people can recognize patterns not captured by data analysis tools.
  • Data exploration is related to the area of Exploratory Data Analysis (EDA):
      • EDA was created by the statistician John Tukey.
      • The seminal book is Exploratory Data Analysis by Tukey.
      • A good online introduction can be found in Chapter 1 of the NIST Engineering Statistics Handbook: /div898/handbook/index.htm

Techniques Used in Data Exploration
  • In EDA, as originally defined by Tukey:
      • the focus was on visualization;
      • clustering and anomaly detection were viewed as exploratory techniques.
  • In data mining, clustering and anomaly detection are major areas of interest in their own right, not merely exploratory techniques.
  • In our discussion of data exploration, we focus on:
      • summary statistics;
      • visualization;
      • Online Analytical Processing (OLAP).

Iris Sample Data Set
  • Many of the exploratory data techniques are illustrated with the Iris Plant data set.
      • It can be obtained from the UCI Machine Learning Repository: /~mlearn/MLRepository.html
      • It originates with the statistician Ronald A. Fisher.
  • Three flower types (classes): Setosa, Virginica, Versicolour.
  • Four (non-class) attributes:
      • sepal width and length;
      • petal width and length.

Summary Statistics
  • Summary statistics are numbers that summarize properties of the data.
  • Summarized properties include frequency, location, and spread. Examples:
      • location: mean;
      • spread: standard deviation.
  • Most summary statistics can be calculated in a single pass through the data.

Frequency and Mode
  • The frequency of an attribute value is the percentage of records in the data set in which that value occurs.
  • For example, given the attribute ‘gender’ and a representative population of people, the gender ‘female’ occurs about 5
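
A minimal Python sketch of the frequency definition from the Frequency and Mode slide, assuming the usual definition of the mode as the most frequent value (when several values tie, all of them are returned). The function name `frequencies_and_modes` and the sample values are hypothetical, chosen for illustration only; they are not from the slides.

```python
from collections import Counter
from collections.abc import Hashable, Iterable
from typing import Dict, List, Tuple


def frequencies_and_modes(
    values: Iterable[Hashable],
) -> Tuple[Dict[Hashable, float], List[Hashable]]:
    """Return (percentage frequency of each value, list of modal values).

    Hypothetical helper: frequency is the percentage of records holding a
    value; the mode is assumed to be the most frequent value(s).
    """
    counts = Counter(values)            # value -> number of occurrences
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute frequencies of an empty data set")
    # Percentage of records in which each value occurs.
    freqs = {value: 100.0 * c / total for value, c in counts.items()}
    # Every value whose count equals the maximum is a mode (ties kept).
    top = max(counts.values())
    modes = [value for value, c in counts.items() if c == top]
    return freqs, modes


if __name__ == "__main__":
    # Hypothetical sample of a categorical attribute.
    freqs, modes = frequencies_and_modes(["female", "male", "female", "male", "female"])
    print(freqs)  # {'female': 60.0, 'male': 40.0}
    print(modes)  # ['female']
```

Counting with `Counter` needs only one pass over the records, which fits the slides' remark that most summary statistics can be computed in a single pass.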

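The Summary Statistics slide notes that most summary statistics can be calculated in a single pass through the data. For its location and spread examples (mean and standard deviation), one way to do this is Welford's online algorithm, sketched below in Python. The function name `single_pass_mean_std` is hypothetical, and the sketch assumes the sample standard deviation (divisor n - 1) is wanted.

```python
import math
from typing import Iterable, Tuple


def single_pass_mean_std(values: Iterable[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) in one pass (Welford's algorithm)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean           # deviation from the old mean
        mean += delta / n          # update the running mean
        m2 += delta * (x - mean)   # uses both the old and the new mean
    if n < 2:
        raise ValueError("need at least two values for a sample standard deviation")
    return mean, math.sqrt(m2 / (n - 1))


if __name__ == "__main__":
    # Hypothetical values, not taken from the Iris data set.
    mean, std = single_pass_mean_std([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
    print(f"mean={mean:.3f}, std={std:.3f}")  # prints mean=5.000, std=2.138
```

Because the function reads each value exactly once, it also works on generators or streamed records. Welford's update is often preferred over the textbook "sum of squares minus square of sum" formula because it avoids catastrophic cancellation when the mean is large relative to the spread.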