Web crawler source code
import java.util.ArrayList;
import java.util.HashMap;
import java.util.logging.Logger;

public class Spider implements Runnable
{
    // Assumption: the listing never declares its logger; java.util.logging is used here.
    private static final Logger logger = Logger.getLogger(Spider.class.getName());
    private ArrayList urls;        // URLs waiting to be processed
    private HashMap indexedURLs;   // URLs that have already been retrieved
    private int threads;           // number of worker threads
    private ArrayList threadList;  // worker threads started by go(); used below but missing from the listing
    public static void main(String argv[]) throws Exception
    {
        // Check the length first: reading argv[0] from an empty array would throw.
        if (argv.length == 0 || argv[0] == null)
        {
            System.out.println("Missing required argument: [Site URL]");
            return;
        }
        Spider spider = new Spider(argv[0]);
        spider.go();
    }
    public Spider(String strURL)
    {
        urls = new ArrayList();
        threads = 10;
        urls.add(strURL);
        threadList = new ArrayList();
        indexedURLs = new HashMap();
        if (urls.size() == 0)
            throw new IllegalArgumentException("Missing required argument: -u [start url]");
        if (threads < 1)
            throw new IllegalArgumentException("Invalid number of threads: " + threads);
    }
    // main() calls go() without arguments and the parameter was never used, so it is dropped here.
    public void go() throws Exception
    {
        // Index each entry point URL.
        long start = System.currentTimeMillis();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(this, "Spider " + (i + 1));
            t.start();
            threadList.add(t);
        }
        // Wait for every worker thread to finish.
        while (threadList.size() > 0) {
            Thread child = (Thread) threadList.remove(0);
            child.join();
        }
        long elapsed = System.currentTimeMillis() - start;
    }
    public void run() {
        String url;
        try {
            // Keep indexing until dequeueURL() reports that no work is left.
            while ((url = dequeueURL()) != null) {
                indexURL(url);
            }
        } catch (Exception e) {
            logger.info(e.getMessage());
        }
    }
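    // Checks whether the list still holds unprocessed URLs. If it does, one is
    // returned so the calling thread can continue. Otherwise the thread waits
    // until more URLs arrive, or returns null once every thread is idle.
    public synchronized String dequeueURL() throws Exception {
        while (true) {
            if (urls.size() > 0)
            {
                return (String) urls.remove(0);
            }
            else
            {
                threads--;
                if (threads > 0)
                {
                    // Other threads are still working and may add URLs.
                    wait();
                    threads++;
                }
                else
                {
                    // Last active thread: wake the others so they can exit too.
                    notifyAll();
                    return null;
                }
            }
        }
    }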
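    /*
     * Adds a URL together with its current level, and wakes up the sleeping threads.
     */
    // The name of the int parameter is not preserved in the source; "level" is assumed.
    public synchronized void enqueueURL(String url, int level)
    {
        if (indexedURLs.get(url) == null)
        {
            urls.add(url);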
i
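
The listing breaks off here, but the hand-off between dequeueURL and enqueueURL can be sketched in isolation. The following is a minimal sketch under stated assumptions, not the original Spider class: the names CrawlQueueSketch, enqueue, dequeue, crawl and linksOf are hypothetical, and "links" come from an in-memory map rather than from fetched pages. It shows the same idea as the code above: a worker that finds the queue empty counts itself as idle and waits, and the last worker to go idle wakes the others so that all of them can stop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical illustration of the wait/notifyAll work queue; not the original class.
class CrawlQueueSketch {
    private final LinkedList<String> pending = new LinkedList<>(); // URLs waiting to be processed
    private final Set<String> seen = new HashSet<>();              // URLs that were queued before
    private int active;                                            // workers that are not idle

    CrawlQueueSketch(int workers) {
        this.active = workers;
    }

    // Queues a URL the first time it is seen and wakes any idle workers.
    synchronized void enqueue(String url) {
        if (seen.add(url)) {
            pending.add(url);
            notifyAll();
        }
    }

    // Returns the next URL, or null once the queue is empty and every worker is idle.
    synchronized String dequeue() throws InterruptedException {
        while (true) {
            if (!pending.isEmpty()) {
                return pending.removeFirst();
            }
            active--;
            if (active > 0) {
                wait();      // another worker may still add URLs
                active++;
            } else {
                notifyAll(); // last idle worker releases the rest
                return null;
            }
        }
    }

    // Runs the workers against a fake link function and returns every URL processed.
    static Set<String> crawl(String seed, int workers, Function<String, List<String>> linksOf) {
        CrawlQueueSketch queue = new CrawlQueueSketch(workers);
        Set<String> processed = Collections.synchronizedSet(new TreeSet<String>());
        queue.enqueue(seed);
        List<Thread> threadList = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    String url;
                    while ((url = queue.dequeue()) != null) {
                        processed.add(url);
                        for (String next : linksOf.apply(url)) {
                            queue.enqueue(next);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "worker-" + (i + 1));
            t.start();
            threadList.add(t);
        }
        for (Thread t : threadList) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new TreeSet<>(processed);
    }

    public static void main(String[] args) {
        // A tiny made-up link graph standing in for real pages.
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", Arrays.asList("b", "c"));
        graph.put("b", Arrays.asList("a", "d"));
        Set<String> result = crawl("a", 4, u -> graph.getOrDefault(u, Collections.<String>emptyList()));
        System.out.println(result); // prints [a, b, c, d]
    }
}
```

The idle counter starts at the number of workers, so a worker that has not yet started still counts as active and the queue cannot be declared finished too early. Unlike the listing, the sketch fixes the worker count before any thread starts, so the loop that launches threads never reads a counter that the workers are changing.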