- About 6.04 thousand characters
- About 27 pages
- Published 2016-02-25 from Jiangsu
Stanford University - Big Data Mining - introduction1.ppt
CS345A: Data Mining on the Web
- Course Introduction
- Issues in Data Mining
- Bonferroni’s Principle

Course Staff
- Instructors:
  - Anand Rajaraman
  - Jeff Ullman
- Reach us as cs345a-win0809-staff @ .
- More info on /class/cs345a.

Requirements
- Homework (Gradiance and other) 20%
  - Go to /pearson
  - Enter class code 83769DC9.
  - If you took CS145 or CS245 in the past year, you should have free access; otherwise you will have to purchase access from Pearson Ed.
- Project 40%
- Final Exam 40%

Project
- Software implementation related to course subject matter.
- Should involve an original component or experiment.
- More later about available data and computing resources.

Possible Projects
- Many past projects have dealt with collaborative filtering (advice based on what similar people do).
  - E.g., Netflix Challenge.
- Others have dealt with engineering solutions to “machine-learning” problems.

ML-Replacement Projects
- ML generally requires a large “training set” of correctly classified data.
  - Example: classifying Web pages by topic.
  - Hard to find well-classified data.
  - Exception: Open Directory works for page topics, because work is collaborative and shared by many.
  - Other good exceptions?

ML-Replacement – (2)
- Many problems require thought rather than ML:
  - Tell important pages from unimportant (PageRank).
  - Tell real news from publicity (how?).
  - Distinguish positive from negative product reviews (how?).
  - Etc., etc.

Team Projects
- Working in pairs OK, but …
  - No more than two per project.
  - We will expect more from a pair than from an individual.
  - The effort should be roughly evenly distributed.

What is Data Mining?
- Discovery of useful, possibly unexpected, patterns in data.
- Subsidiary issues:
  - Data cleaning: detection of bogus data.
    - E.g., age = 150.
    - Entity resolution.
  - Visualization: something better than megabyte files of output.

Cultures
- Databases: concentrate on large-scale (non-main-memory) data.
- AI (machine-learning): concentrate on complex methods, small data.
- Statistics: concentrate on models.

Models vs. Analytic Proce
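
As an illustration of the data-cleaning point on the "What is Data Mining?" slide (bogus data such as age = 150), here is a minimal sketch in Python. It is not taken from the course materials. The bounds `MIN_AGE` and `MAX_AGE`, the record layout, and the helper names `is_bogus_age` and `split_records` are all assumptions made for this example.

```python
# Hypothetical plausibility bounds; the slide only gives "age = 150" as a bogus value.
MIN_AGE = 0
MAX_AGE = 120


def is_bogus_age(age):
    """Return True when an age is missing, non-numeric, or outside the assumed range."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(age, bool) or not isinstance(age, (int, float)):
        return True
    # NaN fails both comparisons, so it is also flagged as bogus.
    return not (MIN_AGE <= age <= MAX_AGE)


def split_records(records):
    """Split a list of dict records into (clean, bogus) based on the 'age' field."""
    clean, bogus = [], []
    for record in records:
        target = bogus if is_bogus_age(record.get("age")) else clean
        target.append(record)
    return clean, bogus


if __name__ == "__main__":
    sample = [
        {"name": "a", "age": 34},
        {"name": "b", "age": 150},   # the slide's example of bogus data
        {"name": "c", "age": None},  # missing value
    ]
    clean, bogus = split_records(sample)
    print("clean:", clean)
    print("bogus:", bogus)
```

A fixed range check like this only catches values that are impossible on their face. Plausible but wrong values, and duplicate records for the same entity, need other techniques.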