Sampling-Based Deep Web Schema Matching Analysis

  • About 41,300 characters
  • About 56 pages
  • Posted 2018-05-18 (Shanghai)

Research on Deep Web Schema Matching Based on Sampling

Abstract: With the rapid growth of e-commerce, the number of Web databases has surged, and the Deep Web has become a new research hotspot. Because the Web contains a very large number of Deep Web schemas, and traditional schema matching methods usually work on only two schemas at a time, those methods are ill-suited to Deep Web schema matching. Multi-source schema matching methods can match many schemas simultaneously and discover the matching relationships among all attributes, which is important for making efficient use of Deep Web resources.

This thesis introduces the background and current state of Deep Web research, discusses Deep Web data integration, schema matching, and the characteristics of Deep Web schema matching, and analyzes two classic multi-source Deep Web schema matching frameworks in detail. The DCM (dual correlation mining) framework is a matching technique designed around the characteristics of Deep Web schemas and can find complex matches among multiple schemas at once; however, its precision is low when the schema set contains anomalous schemas. To address this weakness, this thesis uses sampling to eliminate the influence of anomalous schemas and proposes a sampling-based schema matching framework.

Based on the proposed framework, a Deep Web schema matching algorithm, SMBS, is designed. SMBS mines more complete schema information and builds a generation model for the unified query interface schema (GIS). Whereas traditional Deep Web schema matching algorithms construct the GIS directly, the model built by SMBS can generate a GIS on demand, which improves the generality of the system. Using the BAMM dataset as experimental data, SMBS was tested on both normal schema sets and schema sets containing anomalous schemas, and compared with DCM and MGS. The results show that SMBS achieves higher matching precision than DCM.

Keywords: Deep Web, schema matching, correlation mining, sampling
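The DCM framework mentioned above looks for attributes that are each common across query interfaces yet rarely appear together (negative correlation, suggesting synonyms) and attributes that tend to appear together (positive correlation, suggesting groups). The Python sketch below illustrates only the negative-correlation step. It assumes an H-measure of the form H = f01·f10 / (f+1·f1+) as described for DCM, models each query interface schema as a set of attribute names, and uses hypothetical function names, thresholds, and toy data; it is not the thesis's implementation.

```python
from collections import Counter
from itertools import combinations


def h_measure(schemas, a, b):
    """H-measure (assumed DCM form): H = f01 * f10 / (f+1 * f1+).

    f10: schemas containing a but not b; f01: schemas containing b but not a;
    f1+ = schemas containing a; f+1 = schemas containing b.
    """
    f11 = f10 = f01 = 0
    for s in schemas:
        has_a, has_b = a in s, b in s
        if has_a and has_b:
            f11 += 1
        elif has_a:
            f10 += 1
        elif has_b:
            f01 += 1
    f1_plus, f_plus1 = f11 + f10, f11 + f01
    if f1_plus == 0 or f_plus1 == 0:
        return 0.0
    return (f01 * f10) / (f_plus1 * f1_plus)


def negative_pairs(schemas, threshold=0.8, min_support=2):
    """Hypothetical helper: frequent attribute pairs with a high H-measure,
    i.e. candidate synonyms that seldom appear in the same interface."""
    support = Counter(attr for s in schemas for attr in s)
    frequent = sorted(attr for attr, n in support.items() if n >= min_support)
    return [(a, b) for a, b in combinations(frequent, 2)
            if h_measure(schemas, a, b) >= threshold]


# Toy book-search interfaces, each modeled as a set of attribute names.
schemas = [
    {"author", "title", "isbn"},
    {"author", "title"},
    {"writer", "title"},
    {"writer", "title", "isbn"},
    {"author", "isbn"},
]
print(negative_pairs(schemas))  # [('author', 'writer')]
```

Here author and writer each occur in several interfaces but never together, so H reaches 1.0, while pairs such as author and title co-occur often and score low. Positive-correlation grouping (for example, first name plus last name) is omitted from this sketch.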

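One way to read "using sampling to eliminate the influence of anomalous schemas" is to run a base matcher on many random subsets of the schema set and keep only the matches that win a majority vote, so that a rare anomalous schema, absent from most samples, cannot veto a match by itself. The Python sketch below assumes that interpretation. The toy matcher `never_cooccur`, the parameters `sample_size`, `trials`, `vote`, and `seed`, and the data are all hypothetical and do not reproduce SMBS.

```python
import random
from collections import Counter
from itertools import combinations


def never_cooccur(schemas, min_support=1):
    """Hypothetical toy matcher: frequent attribute pairs that never share a schema."""
    support = Counter(attr for s in schemas for attr in s)
    frequent = sorted(a for a, n in support.items() if n >= min_support)
    return [(a, b) for a, b in combinations(frequent, 2)
            if not any(a in s and b in s for s in schemas)]


def sample_and_vote(schemas, matcher, sample_size, trials=100, vote=0.5, seed=0):
    """Run `matcher` on random samples and keep pairs that win >= `vote` of trials."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    wins = Counter()
    for _ in range(trials):
        sample = rng.sample(schemas, sample_size)
        wins.update(matcher(sample))  # each pair counted at most once per trial
    return sorted(pair for pair, n in wins.items() if n / trials >= vote)


# 20 normal interfaces plus one anomalous interface listing both synonyms.
schemas = ([{"author", "title"}] * 10 + [{"writer", "title"}] * 10
           + [{"author", "writer", "title"}])

print(never_cooccur(schemas))  # []
print(sample_and_vote(schemas, never_cooccur, sample_size=4))
```

On the full set, the single anomalous interface suppresses the author/writer match. With samples of 4 drawn from 21 interfaces, that interface is excluded from a sample with probability 17/21, so the pair is expected to clear the 50% vote; the sample size and vote threshold trade robustness against the chance that a sample is too small to contain both attributes.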