IndexingandRepresentationTheVectorSpaceModel.ppt

  • About 11,200 words
  • About 42 pages
  • Published 2017-01-12 in Liaoning

Slide courtesy Ray Larson

Indexing and Representation: The Vector Space Model

Document represented by a vector of terms
  • Words (or word stems)
  • Phrases (e.g. computer science)
Removes words on “stop list”
  • Documents aren’t about “the”
Often assumed that terms are uncorrelated
  • Correlations between term vectors imply a similarity between documents
For efficiency, an inverted index of terms is often stored

Document Representation

What values to use for terms
  • Boolean (term present/absent)
  • tf (term frequency): count of times term occurs in document
    The more times a term t occurs in document
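
The representation choices above (stop-list removal, Boolean vs. tf term values, and an inverted index) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and not part of the slides: the tiny `STOP_WORDS` set, the whitespace tokenizer, the helper names `tokenize`, `build_vectors`, and `build_inverted_index`, and the two toy documents. Stemming and phrase detection are left out.

```python
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop list (real stop lists are much longer).
STOP_WORDS = {"the", "a", "an", "of", "is", "and", "in", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def build_vectors(docs):
    """Return (vocabulary, Boolean vectors, tf vectors) for a list of texts."""
    tokenized = [tokenize(d) for d in docs]
    # One vector dimension per distinct term, in sorted order.
    vocab = sorted({t for toks in tokenized for t in toks})
    tf_vectors = []
    for toks in tokenized:
        counts = Counter(toks)  # missing terms count as 0
        tf_vectors.append([counts[t] for t in vocab])
    # Boolean weighting: 1 if the term is present, 0 if absent.
    bool_vectors = [[1 if c > 0 else 0 for c in vec] for vec in tf_vectors]
    return vocab, bool_vectors, tf_vectors


def build_inverted_index(docs):
    """Map each term to a {doc_id: term frequency} posting dictionary."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return dict(index)


# Toy documents, invented for this example.
docs = [
    "The computer science of the computer",
    "Science is the study of the world",
]
vocab, bool_vecs, tf_vecs = build_vectors(docs)
index = build_inverted_index(docs)

print(vocab)             # ['computer', 'science', 'study', 'world']
print(bool_vecs)         # [[1, 1, 0, 0], [0, 1, 1, 1]]
print(tf_vecs)           # [[2, 1, 0, 0], [0, 1, 1, 1]]
print(index["science"])  # {0: 1, 1: 1}
```

The inverted index stores only the non-zero entries of the term vectors, which is why it is usually the more efficient structure: looking up a query term returns just the documents that contain it, without scanning every document vector.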
