Building Your Own Search Engine with Nutch on Windows

Nutch Windows Install Guide
By Liming Liu

1 Install Cygwin
2 Install JDK
3 Install Tomcat
4 Pre-Install nutch
5 Configure and run nutch
6 Begin search
7 Reference

1 Install Cygwin

Download and install the latest version. Be sure to select GCC when choosing packages.

2 Install JDK

Download jdk-1_5_0_06-windows-i586-p.exe and install it (by default to C:/Program Files/Java/jdk1.5.0_06).

Set the environment variables:

  NUTCH_JAVA_HOME: C:/Program Files/Java/jdk1.5.0_06
  JAVA_HOME: C:/Program Files/Java/jdk1.5.0_06

3 Install Tomcat

Download apache-tomcat-6.0.13.exe and install it (by default to C:/Program Files/Apache Software Foundation/Tomcat 6.0). Note the port, account, and password you choose during installation.

4 Pre-Install nutch

Download nutch-0.9.tar.gz and extract it to nutch-0.9 (for example, C:/dev/search/netch/nutch-0.9).

Start the Tomcat service and open http://localhost:8080/manager/html

Go to the "WAR file to deploy" section and upload the file C:/dev/search/netch/nutch-0.9/nutch-0.9.war.

Stop the Tomcat service. In "C:/Program Files/Apache Software Foundation/Tomcat 6.0/webapps", rename the directory "ROOT" to "ROOT-backup", then rename the directory "nutch-0.9" to "ROOT". (Alternatively, skip the renaming.)

5 Configure and run nutch

Create the directory "urls" in "C:/dev/search/netch/nutch-0.9".
Create a file "testurlfile" in the directory "urls".
Add a line containing the URL to crawl to the file "testurlfile".
Open the file "C:/dev/search/netch/nutch-0.9/conf/crawl-urlfilter.txt" and replace "MY.DOMAIN.NAME" with the domain of that URL.

Open the file "C:/dev/search/netch/nutch-0.9/conf/nutch-site.xml" and edit it to read:

  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

  <!-- Put site-specific property overrides in this file. -->

  <configuration>

  <property>
    <name></name>
    <value>nutch</value>
    <description>HTTP User-Agent request header. MUST NOT be empty -
    please set this to a single word uniquely related to your organization.

    NOTE: You should also check other related propertie
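
The file-preparation steps in section 5 (creating "urls/testurlfile" and replacing "MY.DOMAIN.NAME" in crawl-urlfilter.txt) could also be scripted. The following is only a minimal sketch under stated assumptions: the function name `prepare_crawl` is hypothetical, and the seed URL and domain are parameters because the guide's own values are not preserved in this text; `example.com` in the usage note is a placeholder, not the author's value.

```python
from pathlib import Path


def prepare_crawl(nutch_home: Path, seed_url: str, domain: str) -> Path:
    """Create urls/testurlfile and fill in the domain placeholder.

    nutch_home is the extracted Nutch directory (the guide uses
    C:/dev/search/netch/nutch-0.9). seed_url and domain are supplied
    by the caller; the guide's original values are not known.
    """
    # Step 1: create the "urls" directory and the seed file inside it.
    urls_dir = nutch_home / "urls"
    urls_dir.mkdir(parents=True, exist_ok=True)
    seed_file = urls_dir / "testurlfile"
    seed_file.write_text(seed_url + "\n", encoding="utf-8")

    # Step 2: replace the MY.DOMAIN.NAME placeholder in the URL filter.
    filter_file = nutch_home / "conf" / "crawl-urlfilter.txt"
    text = filter_file.read_text(encoding="utf-8")
    filter_file.write_text(text.replace("MY.DOMAIN.NAME", domain), encoding="utf-8")

    return seed_file
```

A call might look like `prepare_crawl(Path("C:/dev/search/netch/nutch-0.9"), "http://example.com/", "example.com")`. The sketch does not touch nutch-site.xml, which still has to be edited by hand as described above.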
