Open Source Search Engines (Lecture Slides)
Search Engines
Outline
IR Deals with:
Inside the IR Black Box
Structure
Online Part
The Central Problem in IR
Three components of a test collection:
Information Repository: collection of documents
Queries: set of information needs
Relevance Judgments: sets of documents that satisfy the information needs
LM4
Example:
Interpolation
Web Characteristics
And Their Services?
Search Engines
Newest in Google
Search Engine Query Logs
Sponsored Search Results
Term Distribution
Need to Locally Store Data (Documents)
Distributed Data
Visible and Invisible (Hidden) Web
Different Types of Hidden Information Sources:
Some Examples:
The alternative to a single-database model is a multi-database model
Cooperative and Uncooperative
Outline
Open Source Search Engines
Open Source Search Engines: Comparison
Search Engine Measure
Open Source Search Engines: Lemur
"Lemur Toolkit Tutorial"
Paul Ogilvie
Trevor Strohman
Zoology 101
Installation
Building an index
Running queries
Evaluating results
Document Preparation
Indexing Parameters
Time and Space Requirements
TREC Text
TREC Web
Plain Text
Microsoft Word(*)
Microsoft PowerPoint(*)
If your documents are not in a format that the Lemur Toolkit can process natively:
If necessary, extract the text from the document.
Wrap the plaintext in TREC-style wrappers:
<DOC>
<DOCNO>document_id</DOCNO>
<TEXT>
Index this document text.
</TEXT>
</DOC>
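The wrapping step above can be sketched in Python. This is a minimal illustration, not part of the Lemur Toolkit: `to_trec` and `wrap_directory` are hypothetical helper names, the input is assumed to be already-extracted plain text, and using each file's name (without extension) as its DOCNO is an assumption. Any scheme that gives every document a unique ID should serve.

```python
from pathlib import Path


def to_trec(doc_id: str, text: str) -> str:
    """Wrap one plain-text document in TREC-style <DOC> markup (hypothetical helper)."""
    return (
        "<DOC>\n"
        f"<DOCNO>{doc_id}</DOCNO>\n"
        "<TEXT>\n"
        f"{text.strip()}\n"  # trim surrounding whitespace from the body
        "</TEXT>\n"
        "</DOC>\n"
    )


def wrap_directory(src_dir: str, out_file: str) -> int:
    """Write every *.txt file in src_dir into one TREC-style file.

    Assumption: the file stem (name without .txt) is used as the DOCNO.
    Returns the number of documents written.
    """
    count = 0
    with open(out_file, "w", encoding="utf-8") as out:
        for path in sorted(Path(src_dir).glob("*.txt")):
            out.write(to_trec(path.stem, path.read_text(encoding="utf-8")))
            count += 1
    return count


if __name__ == "__main__":
    # Prints the same six-line layout shown in the wrapper template above.
    print(to_trec("doc-001", "Index this document text."), end="")
```

Writing all documents into a single output file, one `<DOC>` block after another, mirrors how TREC-style collections are commonly distributed; splitting them into one file per document would work just as well for a small collection.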
– or –
For more advanced users, write your own pars