Multimedia Image Processing_Text (latest).ppt

  • About 2,210 characters
  • About 99 pages
  • Published 2017-04-18 in Hubei
Text Processing; Plan for this lecture; Parsing a document; Complications: Format/language; Text-based search (cross-language / cross-channel / cross-source); Pre-Processing: From token to term

TOKENS AND TERMS; Tokenization; Tokenization; Numbers; Tokenization: language issues; The longest word in the world?; Tokenization: language issues; 我知道你不知道 ("I know that you don't know"); Chinese word segmentation; 2008年8月8日晚举世瞩目的北京第二十九届奥林匹克运动会开幕式在国家体育场隆重举行 ("On the evening of August 8, 2008, the opening ceremony of the Games of the XXIX Olympiad in Beijing, watched by the whole world, was held at the National Stadium"); Tokenization: language issues; Stop words; Stop Words; Search: 独孤天骄的SEO博客 (Dugu Tianjiao's SEO blog); English Stop Words; Chinese Stop Words; Search: “Everything I do, I do it for you”; Normalization to terms; Normalization: other languages; Normalization: other languages; Case folding; Normalization to terms; Thesauri and soundex; Lemmatization; Stemming; Porter’s algorithm; Typical rules in Porter; Other stemmers; Dictionary entries – first cut

PHRASE QUERIES AND POSITIONAL INDEXES; Phrase queries; A first attempt: Biword indexes; Longer phrase queries; Extended biwords; Issues for biword indexes; Solution 2: Positional indexes; Positional index example; Processing a phrase query; Positional index size; Positional index size; Combination schemes

SPELLING CORRECTION; Spell correction; Document correction; Query mis-spellings; Isolated word correction; Isolated word correction; Edit distance; Weighted edit distance; Using edit distances; Edit distance to all dictionary terms?; n-gram overlap; Example with trigrams; One option – Jaccard coefficient; Matching trigrams; Context-sensitive spell correction; Context-sensitive correction; General issues in spell correction

SIMILARITY MEASURE; Binary term-document incidence matrix; Term-document count matrices; Bag of words model; Term frequency tf; Example; Log-frequency weighting; Document frequency; Document frequency, continued; idf weight; idf example, suppose N = 1 million; Effect of idf on ranking; Collection vs. Document frequency; tf-idf weighting; Final ranking of documents for a query; Binary → count → weight matrix; Documents as vectors; Queries as vectors; Formalizing vector space proximity; Why distance is a bad idea; Use angle instead of distance; From angles to cosines; From angle
