Bayesian Classification, kNN, Feature Selection, Evaluation

  • About 21,700 characters
  • About 78 pages
  • Published 2020-03-25 from Zhejiang
Lecture notes for the Seminar on Statistical Machine Learning and Data Mining Techniques and Methods

kNN vs. Naïve Bayes
  • Bias/variance tradeoff
    • Variance ≈ capacity
  • kNN has high variance and low bias
    • Infinite memory
  • NB has low variance and high bias
    • Decision surface has to be linear (a hyperplane)

Summary
  • Categorization
    • Training data
    • Over-fitting
    • Generalize
  • Naïve Bayes
    • Bayesian methods
    • Bernoulli NB classifier
    • Multinomial NB classifier
  • K-Nearest Neighbor
    • Bias vs. variance
  • Feature selection
    • Chi-square test
    • Mutual information

Readings
[1] IIR, Ch. 13 and Ch. 14.2.
[2] Y. Yang and X. Liu, "A re-examination of text categorization methods," in Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99), 1999.

Classification Evaluation

Evaluation: Classic Reuters Data Set
  • Most (over)used data set
  • 21578 documents
  • 9603 training, 3299 test articles (ModApte split)
  • 118 categories
    • An article can be in more than one category
    • Learn 118 binary category distinctions
  • Average document: about 90 types, 200 tokens
  • Average number of classes assigned
    • 1.24 for docs with at least one category
  • Only about 10 out of 118 categories are large
  • Common categories (#train, #test)
    • Earn (2877, 1087)
    • Acquisitions (1650, 179)
    • Money-fx (538, 179)
    • Grain (433, 149)
    • Crude (389, 189)
    • Trade (369, 119)
    • Interest (347, 131)
    • Ship (197, 89)
    • Wheat (212, 71)
    • Corn (182, 56)

Reuters Text Categorization data set (Reuters-21578) document

<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="12981" NEWID="798">
<DATE> 2-MAR-1987 16:51:43.42</DATE>
<TOPICS><D>livestock</D><D>hog</D></TOPICS>
<TITLE>AMERICAN PORK CONGRESS KICKS OFF TOMORROW</TITLE>
<DATELINE> CHICAGO, March 2 - </DATELINE><BODY>The American Pork Congress kicks off tomorrow, March 3, in Indianapolis with 160 of the nation's pork producers from 44 member states determining industry positions on a number of issues, according to the National Pork Producers Council, NPPC. Delegates to the three day Congress will be considering 26 resolutions concerning various issues, including the future directio
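
The sample record above carries everything needed to place a document in the ModApte split and to turn its topic list into binary category labels. The following is a minimal sketch, not the lecture's own code. It assumes the record uses the usual Reuters-21578 SGML layout (attributes on the REUTERS tag, topics wrapped in D elements), and it assumes the ModApte rule is "TOPICS is YES and LEWISSPLIT is TRAIN or TEST". The body text is shortened, and the helper names parse_record, modapte_split, and binary_labels are hypothetical.

```python
import re

# Shortened copy of the sample record; the SGML markup is an assumption
# based on the standard Reuters-21578 layout.
SAMPLE = (
    '<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" '
    'OLDID="12981" NEWID="798">\n'
    '<DATE> 2-MAR-1987 16:51:43.42</DATE>\n'
    '<TOPICS><D>livestock</D><D>hog</D></TOPICS>\n'
    '<TITLE>AMERICAN PORK CONGRESS KICKS OFF TOMORROW</TITLE>\n'
    '<BODY>The American Pork Congress kicks off tomorrow.</BODY>\n'
    '</REUTERS>'
)


def parse_record(sgml):
    """Extract tag attributes, topic list, and title from one record."""
    header = re.search(r"<REUTERS([^>]*)>", sgml).group(1)
    attrs = dict(re.findall(r'(\w+)="([^"]*)"', header))
    block = re.search(r"<TOPICS>(.*?)</TOPICS>", sgml, re.S)
    topics = re.findall(r"<D>(.*?)</D>", block.group(1)) if block else []
    title = re.search(r"<TITLE>(.*?)</TITLE>", sgml, re.S)
    return {
        "attrs": attrs,
        "topics": topics,
        "title": title.group(1) if title else "",
    }


def modapte_split(attrs):
    """Return 'train', 'test', or None (unused) under the assumed rule."""
    if attrs.get("TOPICS") != "YES":
        return None
    return {"TRAIN": "train", "TEST": "test"}.get(attrs.get("LEWISSPLIT"))


def binary_labels(topics, categories):
    """One yes/no label per category, since a document may have several."""
    return {c: int(c in topics) for c in categories}


record = parse_record(SAMPLE)
split = modapte_split(record["attrs"])
labels = binary_labels(record["topics"], ["earn", "livestock", "hog"])
print(split, labels)  # train {'earn': 0, 'livestock': 1, 'hog': 1}
```

Because the record has two topics, it becomes a positive example for two of the binary classifiers and a negative example for all the others, which is what "learn 118 binary category distinctions" amounts to in practice.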
