What is Classification? Why is it required?

  • Published 2017-03-09 in Shanghai


Decision Trees
SLIQ: fast scalable classifier

Group 12
- Vaibhav Chopda
- Tarun Bahadur

Paper by Manish Mehta, Rakesh Agrawal and Jorma Rissanen
Source: http://citeseer.ifi.unizh.ch/mehta96sliq.html
Material includes lecture notes for CSE634, Prof. Anita Wasilewska, /~cse634

Agenda
- What is classification …
- Why decision trees?
- The ID3 algorithm
- Limitations of the ID3 algorithm
- SLIQ: fast scalable classifier for data mining
- SPRINT: the successor of SLIQ

Classification Process: Model Construction

Testing and Prediction (by a classifier)

Classification by Decision Tree Induction
Decision tree (tuples flow along the tree structure):
- An internal node denotes an attribute.
- A branch represents a value of the node attribute.
- Leaf nodes represent class labels or class distributions.

Classification by Decision Tree Induction
Decision tree generation consists of two phases.
Tree construction
- At the start, we choose one attribute as the root and put all its values as branches.
- We recursively choose internal nodes (attributes), with their proper values as branches.
- We stop when all the samples (records) are of the same class, in which case the node becomes a leaf labeled with that class; or when there are no more samples left; or when there are no more new attributes to be put as nodes. In this case we apply MAJORITY VOTING to classify the node.
Tree pruning
- Identify and remove branches that reflect noise or outliers.

Classification by Decision Tree Induction
Where is the challenge?
- A good choice of the root attribute and of the internal node attributes is a crucial point.
- Decision tree induction algorithms differ in their methods of evaluating and choosing the root and internal node attributes.

Basic Idea of the ID3/C4.5 Algorithm
- A greedy algorithm that constructs decision trees in a top-down, recursive, divide-and-conquer manner.
- The tree STARTS as a single node (root) representing the whole training dataset (samples).
- IF the samples are ALL in the same class, THEN the node becomes a LEAF and is l
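The tree construction phase outlined in these slides (start from a root holding all samples, branch on the values of a chosen attribute, stop when the samples share one class or no attributes remain, then fall back to majority voting) can be sketched in Python as below. This is a minimal illustration, not the slides' own code. The function names, the dict-based row format, and the toy data are hypothetical, and the attribute is chosen by entropy-based information gain, the measure ID3 is known for, which is an assumption here since the excerpt does not state the selection measure. Class labels are assumed to be strings, and pruning is not covered.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def majority_class(labels):
    """MAJORITY VOTING: the most frequent class label."""
    return Counter(labels).most_common(1)[0][0]


def best_attribute(rows, labels, attributes):
    """Pick the attribute with the highest information gain (assumed measure)."""
    base = entropy(labels)

    def gain(attr):
        remainder = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder

    return max(attributes, key=gain)


def build_tree(rows, labels, attributes):
    """Top-down, recursive, divide-and-conquer construction.

    A leaf is a class label (string); an internal node is a tuple
    (attribute, {value: subtree}).
    """
    if len(set(labels)) == 1:      # all samples in the same class -> leaf
        return labels[0]
    if not attributes:             # no attributes left -> majority voting
        return majority_class(labels)
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    # Branches are created only for values present in the data, so the
    # "no samples left" case does not arise in this sketch.
    for value in sorted(set(row[attr] for row in rows)):
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     remaining)
    return (attr, branches)


def classify(tree, row, default=None):
    """Let a tuple flow along the tree until it reaches a leaf."""
    while isinstance(tree, tuple):
        attr, branches = tree
        if row[attr] not in branches:
            return default         # attribute value unseen during training
        tree = branches[row[attr]]
    return tree


# Hypothetical toy data, for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "play", "play", "stay"]
tree = build_tree(rows, labels, ["outlook", "windy"])
print(classify(tree, {"outlook": "rain", "windy": "yes"}))
```

Representing leaves as plain labels and internal nodes as tuples keeps the sketch short; a real implementation would more likely use a node class and handle numeric attributes, which ID3 in this basic form does not.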
