


Computer Engineering and Applications (计算机工程与应用), 2016, 52(24)

Research on biological information mining model based on document set

SUN Hongmin (孙红敏), JIANG Nannan (姜楠楠), LI Xiang (李想)

School of Electrical and Information, Northeast Agricultural University, Harbin 150030, China

SUN Hongmin, JIANG Nannan, LI Xiang. Research on biological information mining model based on document set. Computer Engineering and Applications, 2016, 52(24): 102-106.

Abstract: As the volume of biomedical literature grows rapidly, extracting the required information from it manually can no longer keep pace. This paper proposes a new biological information mining model that uses open source tools such as Stanford Parser together with natural language processing and statistical methods, and analyzes the model's key techniques. When tested on the full-text SBQTL (Soybean Quantitative Trait Loci) documents, the model extracted parental-line information with a precision of 93.0% and a recall of 78.4%; when tested on PubMed, the precision and recall were 94.3% and 80.0% respectively. The model addresses the problem of how biomedical researchers can find the information they need in a large body of literature quickly and efficiently, so that biologists can discover hidden biomedical knowledge and verify new scientific findings, which in turn improves the understanding of biomedical phenomena.

Key words: text mining; Stanford Parser; text preprocessing; dependencies; information extraction
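
For reference, the precision and recall figures quoted in the abstract are the standard information-extraction measures: precision is the share of extracted items that are correct, and recall is the share of gold-standard items that were extracted. The sketch below computes both over two sets of items. The function name `precision_recall` and the sample parental-line strings are illustrative assumptions; they are not taken from the paper and do not reproduce its SBQTL or PubMed results.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for extracted items against a gold standard."""
    extracted, gold = set(extracted), set(gold)
    # Items that were extracted and are also in the gold standard.
    true_positives = len(extracted & gold)
    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data (assumption): not from the paper's test sets.
gold = {"parent A", "parent B", "parent C", "parent D"}
extracted = {"parent A", "parent B", "parent X"}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.3f} recall={r:.3f}")
```

Comparing sets rather than lists means duplicate extractions of the same item are counted once; whether the paper's evaluation counts duplicates this way is not stated in the abstract.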