Analysis and Implementation of a Web Information Extraction Method Based on a Hybrid Genetic Annealing Algorithm

Approx. 54,700 characters
Approx. 75 pages
Published 2018-06-05, Shanghai

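Abstract

With the rapid development of network technology, people increasingly rely on the network to obtain information. The Web, as a source of massive amounts of information, is a huge database containing all kinds of valuable information. How to extract latent, useful information from these web sources is a research direction that attracts considerable attention. Web information extraction is a technology that uses data mining techniques to automatically discover and extract information and knowledge from Web documents and services; it is one of the important means of speeding up search and improving search accuracy in network information processing.

This thesis introduces the basics of web information extraction technology and the state of research at home and abroad. It then describes the process of extracting web information with the Hidden Markov Model (HMM). First, the thesis introduces the construction of the HMM and its typical algorithms, then uses an HMM to extract specific information from paper headers on a labeled training set. For unlabeled training data, because the HMM is sensitive to its initial parameters, the Genetic Algorithm (GA) is introduced to optimize them. Because GA tends to converge prematurely, the thesis introduces another optimization algorithm, the Simulated Annealing Algorithm (SA), combines it with the HMM to find the optimal initial HMM parameters, presents the overall framework of SA-HMM-based web information extraction, and then compares the experimental performance of GA and SA. To reduce the effect that the problems of both methods have on the recognition process, and to overcome the inherent shortcomings of the two optimization algorithms, a hybrid genetic/simulated annealing algorithm (HGSA) is used to search for the global optimum of the HMM's initial parameters, which improves the efficiency of the system.

Analysis and comparison of the experimental results show that the web information extraction methods based on GA-HMM and SA-HMM are both very effective, while the HGSA-HMM-based web text information extraction method, which combines the strengths of both optimization algorithms, performs better than the first two methods.

Keywords: web information extraction, Hidden Markov Model, genetic algorithm, annealing algorithm, hybrid genetic annealing algorithm

ABSTRACT

With the booming development of network technology, there is a strong tendency to rely on the network for information. The web, as a source of immense information, is a huge database which contains a variety of valuable information. How to extract potential but useful information from it is a distinctly fascinating research direction. Web information extraction is a technology that uses data mining to discover and extract information and knowledge automatically from web documents and services. It is one of the significant methods for accelerating search and improving the accuracy of search in network information processing.

This thesis introduces the basic knowledge of web information extraction technology, as well as the current state of web information extraction technology at home and abroad. It then uses the Hidden Markov Model to extract web information. In the process, first, this thesis discusses the construction of the Hidden Markov Model and its typical algorithms. Second, it makes use of the Hidden Markov Model to extract specific header information from the labeled training data sets. With regard to the unlabeled training data sets, the Hidden Markov Model can be optimized by a genetic algorithm because of the model's sensitivity to initial parameters. Because the genetic algorithm is prone to premature convergence, this thesis brings in another kind of optimization algorithm, the Simulated Annealing Algo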