Analysis of Web Page Object Automatic Location Technology Based on Multi-Feature Fusion

  • About 34,600 characters
  • About 57 pages
  • Published 2018-05-18 in Shanghai


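Research in Automatic Locating Technologies of Web Page Objects Based on Multi-features

Abstract

Web page object locating is a key technology for web information extraction. With it, valuable information in web pages can be located automatically and precisely, which makes extracting data from those pages straightforward. Web page object locating is therefore fundamental to web data mining, vertical search, search engines, and related areas.

In this paper, a web page object locating method based on multi-feature fusion is proposed. By fusing multiple locating methods, it achieves better precision and stability than any single method. The method has two phases: feature extraction and web object locating. In the feature extraction phase, a feature description language is first defined to express all kinds of web object features. The language is open and extensible, so new features can be added later. Next, a method for extracting the DOM tree path feature of a web object is implemented. Building on it, two further locating methods are proposed: a compressed DOM tree based method and a reference point method. Together, the three methods extract three different features of a web page. To verify the approach, a test was conducted, and the results show that the multi-feature fusion method performs better than the others.

Keywords: Vertical search, Web page, Locating, Information extraction

Contents

Chapter 1 Introduction
1.1 Research background
1.2 Research significance
1.3 History of the field
1.4 Current state of research
1.4.1 Manual analysis methods
1.4.2 Wrapper-based methods
1.4.3 Vision-based methods
1.4.4 Summary of the current state
1.5 Main content of this paper
Chapter 2 A feature description language for web page object locating
2.1 Related concepts
Introduction to HTML
Introduction to XML
Introduction to DOM
2.2 Locating features of web page objects
2.3 A description language for multi-feature fusion locating methods
2.3.1 Feature description file
2.3.2 Feature nodes
2.3.3 Types of feature nodes
2.3.4 Feature node sets
2.3.5 Method nodes
Chapter 3 Extraction of web page object locating features
3.1 Basic workflow of web page object feature extraction
3.2 Selecting the locating target
3.3 Feature extraction
3.3.1 DOM tree path extraction for web page objects
3.3.2 Compressed DOM tree path extraction for web page objects
3.3.3 Reference-point-based locating feature extraction
3.4 Feature verification
3.4.1 Overall approach
3.4.2 Feature verification workflow
3.5 Saving features
3.6 Chapter summary
Chapter 4 Web page object locating based on multi-feature fusion
4.1 Initialization of the multi-feature locating method
4.2 Basic workflow of the multi-feature locating method
4.3 Web page preprocessing
4.4 Web page object locating
4.4.1 Locating based on the DOM tree path
4.4.2 Locating based on the compressed DOM tree
4.4.3 Locating based on reference points
4.5 Updating web page object features
4.6 Chapter summary
Chapter 5 Experiments and verification
5.1 Software overview
5.1.1 Hardware and software environment
5.1.2 Software user interface
5.2 Test plan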
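
The abstract's idea of recording a DOM tree path feature and backing it up with a reference point feature can be illustrated with a small sketch. This is not the paper's implementation: the function names (dom_path, locate_by_path, locate_by_reference, locate_fused) are hypothetical, the reference point is assumed to be a label element sitting immediately before the target, and the "fusion" is assumed to be a simple fallback chain, since the preview does not say how the features are actually combined. The compressed DOM tree feature is omitted. The sample pages are well-formed XML so that Python's standard xml.etree.ElementTree can parse them.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def dom_path(root, target):
    """Return the path from root to target as (tag, same-tag sibling index) steps."""
    def walk(node, path):
        if node is target:
            return path
        seen = Counter()  # how many children with each tag we have passed so far
        for child in node:
            step = (child.tag, seen[child.tag])
            seen[child.tag] += 1
            found = walk(child, path + [step])
            if found is not None:
                return found
        return None
    return walk(root, [])


def locate_by_path(root, path):
    """Follow a recorded path; return None if the page structure no longer matches."""
    node = root
    for tag, index in path:
        same_tag = [child for child in node if child.tag == tag]
        if index >= len(same_tag):
            return None
        node = same_tag[index]
    return node


def locate_by_reference(root, ref_text):
    """Return the sibling right after the first element whose text equals ref_text."""
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children[:-1]):
            if (child.text or "").strip() == ref_text:
                return children[i + 1]
    return None


def locate_fused(root, path, ref_text):
    """Assumed fusion rule: try the path feature first, then the reference point."""
    hit = locate_by_path(root, path)
    if hit is None:
        hit = locate_by_reference(root, ref_text)
    return hit


# Feature extraction on a sample page (hypothetical markup).
old_page = ET.fromstring(
    "<html><body><div><span>Price:</span><b>42</b></div></body></html>"
)
price_path = dom_path(old_page, old_page.find(".//b"))

# Later the layout changes: a new block is inserted before the price block.
new_page = ET.fromstring(
    "<html><body><div><p>ad</p></div>"
    "<div><span>Price:</span><b>43</b></div></body></html>"
)
found = locate_fused(new_page, price_path, "Price:")
print(price_path, found.text if found is not None else None)
```

In this sketch the recorded path stops matching once the page layout changes, and the reference point still finds the price element, which is the kind of stability gain the abstract attributes to combining several features.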
