Implementing a Web Crawler (Spider) in Python


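Question: How do I extract the main text of a web page in Python? Thanks.

    import urllib.request

    url = "/"  # the rest of the URL is missing in the source
    response = urllib.request.urlopen(url)
    page = response.read()

Extracting the text from a web page with Python

Note that this second listing targets Python 2: sgmllib was removed in Python 3, and httplib was renamed to http.client.

    import os, sys, datetime
    import httplib, urllib, re
    from sgmllib import SGMLParser

    import types

    class Html2txt(SGMLParser):
        def reset(self):
            # Start with an empty buffer and collect text by default.
            self.text = ''
            self.inbody = True
            SGMLParser.reset(self)

        def handle_data(self, text):
            # Keep character data only while outside <head>.
            if self.inbody:
                self.text += text

        def start_head(self, text):
            # Entering <head>: stop collecting text.
            self.inbody = False

        def end_head(self):
            # ... (the source is truncated at this point)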
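Since sgmllib is not available in Python 3, the same idea (collect character data while outside `<head>`) can be sketched with the standard-library `html.parser` module. This is only a minimal sketch, not the original author's code: the class name `Html2Txt`, the helpers `html_to_text` and `fetch_text`, the choice to also skip `<script>` and `<style>`, and the UTF-8 fallback are all assumptions made for illustration.

```python
from html.parser import HTMLParser
import urllib.request


class Html2Txt(HTMLParser):
    """Collect visible text, skipping <head>, <script> and <style> (assumed set)."""

    SKIPPED = {"head", "script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # text fragments collected so far
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text only while outside every skipped element.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

    def text(self):
        return "\n".join(self.parts)


def html_to_text(html):
    """Return the text content of an HTML string (hypothetical helper)."""
    parser = Html2Txt()
    parser.feed(html)
    parser.close()
    return parser.text()


def fetch_text(url):
    """Download a page and return its text (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)
```

`html_to_text` works on a string that is already in memory, so it can be tried without network access. `fetch_text` needs a complete URL, which the listing above does not provide.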
