An Exploration of Web Crawler Technology -- Graduation Project.doc

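JIU JIANG UNIVERSITY
Graduation Thesis

Title: An Exploration of Web Crawler Technology
English title: Web Spiders Technology Explore
School: School of Information Science and Technology
Major: Computer Science and Technology
Name: Wen Ze (闻泽)
Class/Student ID: A081129
Advisor: Qiu Xingxing (邱兴兴)
May 2012

Abstract
A web crawler is a program that automatically collects information from the Internet. A crawler can not only gather web content for a search engine but can also act as a focused information collector that harvests specific information from particular websites, such as job postings, rental listings, and the email addresses that online marketing often needs. This thesis implements, in Java, a crawler based on a breadth-first algorithm. It discusses several key issues in implementing a web crawler: why a breadth-first crawling strategy is used and how breadth-first crawling is implemented; how data is stored during system implementation; and how web page information is parsed. With this crawler, one can collect all the URLs of a given site, fetch page content through those URLs, and then extract the needed information from that content, such as email addresses and page titles. The collected URLs and other data are then stored in a database for later retrieval. Starting from the application of search engines, this thesis examines the role and position of web crawlers in search engines and sets out the functional and design requirements for a crawler. Building on an analysis of the crawler's system architecture and working principles, it studies strategies and algorithms for page crawling and parsing, implements a web crawler in Java, and analyzes its results.

Keywords: web crawler, breadth-first

Abstract (English version)
The Web Spider is an automated program that collects information on the Internet. A Web Spider can not only collect network information for a search engine but can also be used for directed information collection, acquiring specific information from particular sites, such as recruitment information, rental information, and the e-mail addresses that network marketing often needs. This paper presents a Java implementation of a Spider program based on a breadth-first algorithm. It describes some of the major questions in implementing a Web Spider: why a breadth-first crawling strategy is used and how breadth-first crawling is implemented; data storage in the system implementation process; and the parsing of web page information. Through this Spider, one can collect all of a site's URLs, use the collected URLs to fetch page content, and extract the needed content from it, such as e-mail addresses and page titles. The collected URLs and other data are then saved to a database for retrieval. Starting from the application of search engines, this paper explores the role and status of the Web Spider in a search engine and presents the Web Spider's functional and design requirements. Based on an analysis of the Web Spider's system structure and working principle, it studies strategies and algorithms for page crawling, parsing, etc., and uses the Java implem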