Data Mining: Key Points of Basic Classification Methods.ppt

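Confidence Interval for Accuracy

For large test sets (N > 30), acc has a normal distribution with mean p and variance p(1-p)/N:

$$P\left(Z_{\alpha/2} < \frac{acc - p}{\sqrt{p(1-p)/N}} < Z_{1-\alpha/2}\right) = 1 - \alpha$$

[Figure: standard normal curve; the area between $Z_{\alpha/2}$ and $Z_{1-\alpha/2}$ is $1 - \alpha$]

Confidence interval for p:

$$p = \frac{2N \cdot acc + Z_{\alpha/2}^2 \pm Z_{\alpha/2}\sqrt{Z_{\alpha/2}^2 + 4N \cdot acc - 4N \cdot acc^2}}{2\,(N + Z_{\alpha/2}^2)}$$

Confidence Interval for Accuracy (example)

Consider a model that produces an accuracy of 80% when evaluated on 100 test instances:
N = 100, acc = 0.8
Let 1-α = 0.95 (95% confidence)
From the probability table, Z_{α/2} = 1.96

1-α     Z
0.99    2.58
0.98    2.33
0.95    1.96
0.90    1.65

Interval for p at acc = 0.8 and 95% confidence, for different test set sizes:

N         50     100    500    1000   5000
p lower   0.670  0.711  0.763  0.774  0.789
p upper   0.888  0.866  0.833  0.824  0.811

Comparing Performance of 2 Models

Given two models, say M1 and M2, which is better?
M1 is tested on D1 (size n1), found error rate e1
M2 is tested on D2 (size n2), found error rate e2
Assume D1 and D2 are independent.
If n1 and n2 are sufficiently large, then

$$e_1 \sim N(\mu_1, \sigma_1^2), \qquad e_2 \sim N(\mu_2, \sigma_2^2)$$

Approximate:

$$\hat{\sigma}_i^2 = \frac{e_i (1 - e_i)}{n_i}$$

To test if the performance difference is statistically significant, let d = e1 - e2. Then

$$d \sim N(d_t, \sigma_t^2)$$

where d_t is the true difference. Since D1 and D2 are independent, their variances add up:

$$\sigma_t^2 = \sigma_1^2 + \sigma_2^2 \approx \hat{\sigma}_1^2 + \hat{\sigma}_2^2 = \frac{e_1 (1 - e_1)}{n_1} + \frac{e_2 (1 - e_2)}{n_2}$$

At the (1-α) confidence level,

$$d_t = d \pm Z_{\alpha/2}\,\hat{\sigma}_t$$

An Illustrative Example

Given:
M1: n1 = 30, e1 = 0.15
M2: n2 = 5000, e2 = 0.25
d = |e2 - e1| = 0.1 (2-sided test)

$$\hat{\sigma}_t^2 = \frac{0.15\,(1 - 0.15)}{30} + \frac{0.25\,(1 - 0.25)}{5000} = 0.0043$$

At the 95% confidence level, Z_{α/2} = 1.96:

$$d_t = 0.100 \pm 1.96 \times \sqrt{0.0043} = 0.100 \pm 0.128$$

The interval contains 0, so the difference may not be statistically significant.

Comparing Performance of 2 Algorithms

Each learning algorithm may produce k models:
L1 may produce M11, M12, …, M1k
L2 may produce M21, M22, …, M2k
If the models are evaluated on the same test sets D1, D2, …, Dk (e.g., via cross-validation):
For each set, compute d_j = e_{1j} - e_{2j}
d_j has mean d_t and variance σ_t²
Estimate:

$$\hat{\sigma}_t^2 = \frac{\sum_{j=1}^{k} (d_j - \bar{d})^2}{k\,(k - 1)}, \qquad d_t = \bar{d} \pm t_{1-\alpha,\,k-1}\,\hat{\sigma}_t$$

Computing Impurity Measure

The data set has 10 records; one of them has a missing value for Refund.

Before splitting:
Entropy(Parent) = -0.3 log(0.3) - 0.7 log(0.7) = 0.8813 (logarithms are base 2)

Split on Refund, using the 9 records whose Refund value is known:
Entropy(Refund = Yes) = 0
Entropy(Refund = No) = -(2/6) log(2/6) - (4/6) log(4/6) = 0.9183
Entropy(Children) = 0.3 × 0 + 0.6 × 0.9183 = 0.551
Gain = 0.9 × (0.8813 - 0.551) = 0.9 × 0.3303 ≈ 0.2973

Distribute Instances

[Figure: decision tree split on Refund (Yes / No), before and after distributing the record with the missing value]

Probability that Refund = Yes is 3/9
Probability that Refu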
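The numbers in the sections above can be checked with a short script. The following is a minimal sketch, not part of the original slides: the function names `accuracy_interval`, `difference_interval`, and `entropy` are hypothetical, the interval for p is assumed to be the closed form obtained by solving the normal approximation for p, and logarithms are assumed to be base 2. With Z = 1.96 it agrees with the tabulated bounds for acc = 0.8 to within rounding.

```python
import math


def accuracy_interval(acc, n, z=1.96):
    """Interval for the true accuracy p, given accuracy acc on n test instances.

    Assumes the normal approximation (acc - p) / sqrt(p(1-p)/n) ~ N(0, 1),
    solved for p. z is the critical value Z_{alpha/2} (1.96 for 95%).
    """
    centre = 2 * n * acc + z ** 2
    spread = z * math.sqrt(z ** 2 + 4 * n * acc - 4 * n * acc ** 2)
    denom = 2 * (n + z ** 2)
    return (centre - spread) / denom, (centre + spread) / denom


def difference_interval(e1, n1, e2, n2, z=1.96):
    """Interval for the true difference d_t between two error rates.

    Assumes the two test sets are independent, so the variances add.
    """
    d = abs(e2 - e1)
    variance = e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2
    half_width = z * math.sqrt(variance)
    return d - half_width, d + half_width


def entropy(probs):
    """Base-2 entropy of a class distribution; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Accuracy 0.8 at 95% confidence for the test set sizes in the table.
    for n in (50, 100, 500, 1000, 5000):
        lower, upper = accuracy_interval(0.8, n)
        print(f"N={n:5d}  p_lower={lower:.3f}  p_upper={upper:.3f}")

    # Illustrative example: M1 (n1=30, e1=0.15) versus M2 (n2=5000, e2=0.25).
    low, high = difference_interval(0.15, 30, 0.25, 5000)
    print(f"d_t in [{low:.3f}, {high:.3f}]; contains 0: {low < 0 < high}")

    # Impurity example: parent entropy, children entropy, and the product
    # of 0.9 (fraction of records with a known Refund value) and the reduction.
    parent = entropy([0.3, 0.7])
    children = 0.3 * entropy([1.0]) + 0.6 * entropy([2 / 6, 4 / 6])
    print(f"parent={parent:.4f}  children={children:.4f}  "
          f"gain={0.9 * (parent - children):.4f}")
```

Because the M1 test set has only 30 instances, its variance term dominates the interval on the difference, which is why the interval is wide enough to include 0 even though the observed error rates differ by 0.1.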
