Matrix Factorization and Its Applications

  • About 4,410 characters
  • About 54 pages
  • Published 2018-08-28 (Hubei)
Deng Cai (蔡登)
College of Computer Science, Zhejiang University
dengcai@

Matrix Factorization
  • What Is Matrix Factorization?
  • Why Matrix Factorization?

Image Recovery

Recommendation
  [Figure: rating matrix with entries from 1 to 5 for the movies The Matrix, Star Wars, Roman Holiday, Titanic, Shrek, Madagascar]

Search: Information Retrieval
  • Machine Learning

Language Model Paradigm in IR
  • Probabilistic relevance model
  • Random variables
  • Bayes' rule
  • J. Ponte and W.B. Croft, A Language Model Approach to Information Retrieval, ACM SIGIR, 1998.

Language Model Paradigm
  • First contribution: prior probability of relevance
      • simplest case: uniform (drops out for ranking)
      • popularity: document usage statistics (e.g. library circulation records, download or access statistics, hyperlink structure)
  • Second contribution: query likelihood
      • query terms q are treated as a sample drawn from an (unknown) relevant document

Language Model Paradigm
  • Query generation model: what might a query look like that would ask for a specific document?
  • Maron & Kuhns: the indexer manually assigns probabilities for a pre-specified set of tags/terms
  • Ponte & Croft: statistical estimation problem
  • Think of a relevant document. Formulate a query by picking some of the keywords as query terms.
  • Example document: "Environmentalists are blasting a Bush administration proposal to lift a ban on logging in remote areas of national forests, saying the move ignores popular support for protecting forests."

Query Likelihood

Document-Term Matrix
  • D = document collection
  • W = lexicon/vocabulary

A 100-Millionth of a Typical Document-Term Matrix
  • Typical:
      • Number of documents: 1,000,000
      • Vocabulary: 100,000
      • Sparseness: 0.1%
      • Fraction depicted: 1e-8

Three Problems
  • Image Recovery
  • Search
  • Recommendation
  • Incomplete Matrix
  • Query Likelihood
  • Matrix Factorization

Matrix Factorization
  • Relation to Dimensionality Reduction

Dimensionality Reduction
  • Linear transformation

Algorithms Sin
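
The query-likelihood ranking described in the Language Model Paradigm slides (score each document by a prior times the probability that it generated the query) can be sketched as follows. This is only an illustration under stated assumptions: the unigram model, the add-alpha smoothing, the toy documents, and the names `query_log_likelihood` and `rank` are all hypothetical and do not come from the slides.

```python
import math
from collections import Counter


def query_log_likelihood(query_terms, doc_terms, vocab_size, alpha=1.0):
    """Return log P(q | d) under a unigram model of document d.

    Add-alpha smoothing is an assumption made here so that query terms
    missing from the document do not give zero probability.
    """
    counts = Counter(doc_terms)
    total = len(doc_terms)
    return sum(
        math.log((counts[t] + alpha) / (total + alpha * vocab_size))
        for t in query_terms
    )


def rank(query, docs, log_prior=None):
    """Rank document ids by log P(q | d) + log P(d).

    With log_prior=None the prior is uniform and drops out of the ranking.
    """
    vocab = {t for terms in docs.values() for t in terms} | set(query)
    scores = {}
    for doc_id, terms in docs.items():
        prior = 0.0 if log_prior is None else log_prior[doc_id]
        scores[doc_id] = query_log_likelihood(query, terms, len(vocab)) + prior
    return sorted(scores, key=scores.get, reverse=True)


# Toy collection (hypothetical); d1 paraphrases the example document.
docs = {
    "d1": "environmentalists blast proposal to lift ban on logging in national forests".split(),
    "d2": "new animated film opens in theaters this weekend".split(),
}
ranking = rank(["logging", "forests"], docs)
```

With the uniform prior, `ranking` puts `d1` first because both query terms occur in it. Passing a non-uniform `log_prior` (for example, one derived from download statistics) can change the order, which corresponds to the "first contribution" in the slides.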
