Analysis of Web Page Object Automatic Location Technology Based on Multi-Feature Fusion.docx
- About 34,600 characters
- About 57 pages
- Published 2018-05-18, Shanghai
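Research in Automatic Locating Technologies of Web Page Objects Based on Multi-features

Abstract

Web page object locating is a key technology for web information extraction. It allows the valuable information in web pages to be located automatically and precisely, which in turn makes it easy to extract data from those pages. Web page object locating is therefore fundamental to web data mining, vertical search, search engines, and related areas. This thesis proposes a web page object locating method based on multi-feature fusion. By fusing several locating methods, it achieves better precision and stability than any single method. The method has two phases: a feature extraction phase and a web page object locating phase. In the feature extraction phase, a feature description language is first defined to express all kinds of web page object features. The language is open and extensible, so new features can be added in the future. Next, a method for extracting the DOM tree path feature of a web page object is implemented. Building on this, two further locating methods are proposed: a compressed-DOM-tree-based method and a reference-point method. Each of these three methods extracts a different kind of feature from a web page. An experiment was conducted to verify the approach, and the results show that the multi-feature fusion method performs better than the individual methods.

Keywords: Vertical search, Web page, Locating, Information extraction

Contents
Chapter 1 Introduction
1.1 Research background
1.2 Significance of the research
1.3 Development history of the topic
1.4 Current state of research
1.4.1 Manual analysis methods
1.4.2 Wrapper-based methods
1.4.3 Vision-based methods
1.4.4 Summary of the current state
1.5 Main content of this thesis
Chapter 2 A Feature Description Language for Web Page Object Locating
2.1 Related concepts
Introduction to HTML
Introduction to XML
Introduction to DOM
2.2 Locating features of web page objects
2.3 A description language for multi-feature fusion locating methods
2.3.1 Feature description files
2.3.2 Feature nodes
2.3.3 Types of feature nodes
2.3.4 Feature node sets
2.3.5 Method nodes
Chapter 3 Extraction of Web Page Object Locating Features
3.1 Basic workflow of web page object feature extraction
3.2 Selecting the locating target
3.3 Feature extraction
3.3.1 Extracting the DOM tree path of a web page object
3.3.2 Extracting the compressed DOM tree path of a web page object
3.3.3 Reference-point-based extraction of web page object locating features
3.4 Feature verification
3.4.1 Overall approach
3.4.2 Feature verification workflow
3.5 Saving features
3.6 Chapter summary
Chapter 4 Web Page Object Locating Based on Multi-Feature Fusion
4.1 Initialization of the multi-feature web page object locating method
4.2 Basic workflow of the multi-feature web page object locating method
4.3 Web page preprocessing
4.4 Web page object locating
4.4.1 Web page object locating based on the DOM tree path
4.4.2 Web page object locating based on the compressed DOM tree
4.4.3 Web page object locating based on reference points
4.5 Updating web page object features
4.6 Chapter summary
Chapter 5 Experiments and Verification
5.1 Software description
5.1.1 Hardware and software environment
5.1.2 Software user interface
5.2 Test plan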
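The abstract describes learning several locating features of a web page object (a DOM tree path, a compressed DOM tree path, and a reference point) and fusing the locators' results. This excerpt does not give the thesis's algorithms, so the Python sketch below only illustrates the general idea and rests on several assumptions. Pages are treated as well-formed XHTML and parsed with `xml.etree.ElementTree`. The DOM path records each step's tag and its index among same-tag siblings. The reference point is taken to be the text of the nearest preceding sibling that has text. A class-attribute locator is a hypothetical stand-in for the compressed-DOM-tree feature. Fusion is a plain majority vote. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def parent_map(root):
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    return {child: parent for parent in root.iter() for child in parent}


def same_tag_children(parent, tag):
    return [child for child in parent if child.tag == tag]


def extract_features(root, target):
    """Record three hypothetical locating features of `target` on a sample page."""
    parents = parent_map(root)
    # Feature 1: DOM tree path, one (tag, index among same-tag siblings) per level.
    path, node = [], target
    while node is not root:
        parent = parents[node]
        path.append((node.tag, same_tag_children(parent, node.tag).index(node)))
        node = parent
    path.reverse()
    # Feature 2 (hypothetical stand-in for the compressed-DOM-tree feature):
    # the target's tag name plus its class attribute.
    attr = (target.tag, target.get("class"))
    # Feature 3: reference point = text of the nearest preceding sibling that has
    # text, stored with the sibling offset from that anchor to the target.
    siblings = list(parents[target])
    pos = siblings.index(target)
    anchor = None
    for i in range(pos - 1, -1, -1):
        text = (siblings[i].text or "").strip()
        if text:
            anchor = (text, pos - i)
            break
    return {"path": path, "attr": attr, "anchor": anchor}


def locate_by_path(root, path):
    node = root
    for tag, idx in path:
        candidates = same_tag_children(node, tag)
        if idx >= len(candidates):
            return None  # layout changed; this feature no longer matches
        node = candidates[idx]
    return node


def locate_by_attr(root, attr):
    tag, cls = attr
    if cls is None:
        return None
    return root.find(f".//{tag}[@class='{cls}']")  # first match in document order


def locate_by_anchor(root, anchor):
    if anchor is None:
        return None
    text, offset = anchor
    parents = parent_map(root)
    for el in root.iter():
        if (el.text or "").strip() == text and el in parents:
            siblings = list(parents[el])
            i = siblings.index(el) + offset
            return siblings[i] if i < len(siblings) else None
    return None


def locate(root, features):
    """Fuse the three locators by majority vote over the elements they return."""
    results = [
        locate_by_path(root, features["path"]),
        locate_by_attr(root, features["attr"]),
        locate_by_anchor(root, features["anchor"]),
    ]
    found = [el for el in results if el is not None]
    if not found:
        return None
    votes = Counter(id(el) for el in found)
    winner = votes.most_common(1)[0][0]
    return next(el for el in found if id(el) == winner)


# Usage: learn features on a sample page, then locate the same object on changed pages.
train = ET.fromstring(
    "<html><body><div><span>ad</span></div>"
    "<div><span>Price:</span><span class='price'>42</span></div></body></html>")
features = extract_features(train, train.findall(".//span")[-1])

# Page A: a banner is inserted, so the absolute DOM path breaks.
page_a = ET.fromstring(
    "<html><body><div><span>banner</span></div><div><span>ad</span></div>"
    "<div><span>Price:</span><span class='price'>57</span></div></body></html>")
# Page B: a promo element reuses class="price", so the attribute locator misfires.
page_b = ET.fromstring(
    "<html><body><div><span>ad</span><span class='price'>9.99 promo</span></div>"
    "<div><span>Price:</span><span class='price'>57</span></div></body></html>")

print(locate(page_a, features).text, locate(page_b, features).text)  # 57 57
```

Each test page defeats one single-feature locator, and in both cases the other two features outvote it. When only two locators return results and they disagree, a plain vote cannot decide between them. A fuller design would need per-feature weights or confidence scores for that case.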
Documents you may be interested in
- Analysis of image fusion technology based on multiresolution analysis.docx
- Multi-focus image fusion analysis based on multiresolution wavelet.docx
- Analysis of secure location algorithm based on multiresolution clustering.docx
- Analysis of remote sensing image classification method based on multi-classifier fusion.docx
- Analysis of stock picking model based on multi-classification support vector machine.docx
- Analysis of remote sensing image fusion technology based on multiresolution analysis.docx
- Analysis of zombie detection method based on multi-alarm source correlation analysis.docx
- Analysis of identification point 3D reconstruction method based on multiple uncalibrated images.docx
- Comparison and research on landscape design of cross-strait cities based on multi-sensory interaction.docx
- Analysis and system implementation of speech activity detection based on multi-observation likelihood ratio.docx
- Residential community landscaping construction agreement.docx
- Wall construction agreement.docx
- Lesson 1: Two classical poems (courseware), unified-edition Chinese, Grade 2, Volume 2, 2025-2026 school year.pptx
- (New spring 2026 edition) Ministry-edition Grade 8 Morality and Rule of Law, Volume 2, 3.1 "Basic Rights of Citizens" PPT courseware.pptx
- (New spring 2026 edition) Ministry-edition Grade 8 Morality and Rule of Law, Volume 2, 4.3 "Fulfilling Obligations in Accordance with the Law" PPT courseware.pptx
- (New spring 2026 edition) Ministry-edition Grade 8 Morality and Rule of Law, Volume 2, 6.2 "Distribution According to Work as the Mainstay, with Multiple Forms of Distribution Coexisting" PPT courseware.pptx
- (New spring 2026 edition) Ministry-edition Grade 8 Morality and Rule of Law, Volume 2, 6.1 "Public Ownership as the Mainstay, with Diverse Forms of Ownership Developing Together" PPT courseware.pptx
- Speech for a Grade 9 teaching management exchange.docx
- Summary of extracurricular reading for primary school students.docx
- Question bank with answers on the writing process for social responsibility reports (night-time contribution) on night-economy operations of catering stores.doc
原创力文档
