- Published 2019-04-18 (Zhejiang)
CS345A: Data Mining on the Web

Course Introduction
Issues in Data Mining
Bonferroni’s Principle

Course Staff
- Instructors: Anand Rajaraman, Jeff Ullman
- Reach us as cs345a-win0809-staff @ .
- More info on /class/cs345a.

Requirements
- Homework (Gradiance and other): 20%
  - Go to /pearson
  - Enter class code 83769DC9.
  - If you took CS145 or CS245 in the past year, you should have free access; otherwise you will have to purchase access from Pearson Ed.
- Project: 40%
- Final Exam: 40%

Project
- Software implementation related to course subject matter.
- Should involve an original component or experiment.
- More later about available data and computing resources.

Possible Projects
- Many past projects have dealt with collaborative filtering (advice based on what similar people do). E.g., Netflix Challenge.
- Others have dealt with engineering solutions to “machine-learning” problems.

ML-Replacement Projects
- ML generally requires a large “training set” of correctly classified data.
  - Example: classifying Web pages by topic.
- Hard to find well-classified data.
  - Exception: Open Directory works for page topics, because work is collaborative and shared by many.
  - Other good exceptions?

ML-Replacement – (2)
- Many problems require thought rather than ML:
  - Tell important pages from unimportant (PageRank).
  - Tell real news from publicity (how?).
  - Distinguish positive from negative product reviews (how?).
  - Etc., etc.

Team Projects
- Working in pairs OK, but …
  - No more than two per project.
  - We will expect more from a pair than from an individual.
  - The effort should be roughly evenly distributed.

What is Data Mining?
- Discovery of useful, possibly unexpected, patterns in data.
- Subsidiary issues:
  - Data cleaning: detection of bogus data. E.g., age = 150.
  - Entity resolution.
  - Visualization: something better than megabyte files of output.

Cultures
- Databases: concentrate on large-scale (non-main-memory) data.
- AI (machine-learning): concentrate on complex methods, small data.
- Statistics: concentrate on models.

Models vs. Analytic Proce