Information Retrieval 4: Dictionary.ppt

School of Computer and Communication, Hunan University
Liu Yufeng

Recap of the previous lecture
  Basic inverted indexes
    Structure: dictionary and postings
    Key step in construction: sorting
  Boolean query processing
    Simple optimization
    Linear-time merging

Recall the basic indexing pipeline

Parsing a document
  What format is it in? pdf/word/excel/html?
  What language is it in?
  What character set is in use?

Complications: format/language
  Documents being indexed can include docs from many different languages
    A single index may have to contain terms of several languages.
  Sometimes a document or its components can contain multiple languages/formats
    Fre
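
The "linear-time merging" item in the recap refers to intersecting two sorted postings lists for a Boolean AND query. A minimal sketch in Python follows, assuming each postings list is a sorted list of integer document IDs. The function name `intersect` and the sample postings are illustrative assumptions, not taken from the slides.

```python
def intersect(p1, p2):
    """Intersect two postings lists sorted by ascending docID.

    Walks both lists with one pointer each, so the cost is
    O(len(p1) + len(p2)) comparisons.
    """
    answer = []
    i, j = 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # docID occurs in both lists: keep it and advance both pointers
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # advance the pointer that sits on the smaller docID
            i += 1
        else:
            j += 1
    return answer


# Hypothetical postings lists for two terms (illustrative data only)
postings_a = [2, 4, 8, 16, 32, 64, 128]
postings_b = [1, 2, 3, 5, 8, 13, 21, 34]
print(intersect(postings_a, postings_b))  # [2, 8]
```

The merge depends on both lists being sorted by docID, which is why sorting is the key step in index construction.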
