Qnet-BSTM: An Algorithm for Mining Transcription Factor Binding Sites from Literature

Yang Qing, Zheng Guangyong, Xiong Yun, and Zhu Yangyong
(Department of Computing and Information Technology, Fudan University, Shanghai 200433)
(062021131@)

Abstract: Transcription regulation is one of the most active research areas of the post-genome era, and transcription factor binding sites (TFBSs, also called cis-regulatory elements) are a crucial class of functional elements. Building TFBS databases is an important task in transcription regulation research, and mining binding sites from the steadily growing body of related literature is an important way to build them. Drawing on Question Answering (QA) systems, this paper proposes Qnet-BSTM (Qnet transcription factor Binding Site Text Mining), a TFBS text mining algorithm built around the core concept of a "question net" (Qnet, Question Net). A system model is first constructed by training on literature manually annotated by biological experts; QA-system methods are then applied on the basis of this model to mine TFBSs from the full text of articles. Experimental results show that the recall and precision of Qnet-BSTM reach at least 79% and 72%, respectively.

Keywords: transcription factor binding site, text mining, question net, QA system, bioinformatics

Chinese Library Classification: TP391

Introduction

As data in the field of transcription regulation have grown, a number of databases of regulatory regions, regulatory units, and transcription factor binding sites have appeared internationally; well-known examples include TRANSFAC[1], TRRD[2], and Jaspar[3]. However, a large amount of TFBS information still exists mainly in the form of articles held in literature databases, the larger biological ones being PubMed[4] and MedLine[5]. A TFBS is a special kind of gene fragment (pattern). At present, even the well-known TRANSFAC, TRRD, and Jaspar contain relatively little TFBS information, and it is difficult to build a TFBS rule base by manually analyzing the literature in which these known binding sites are reported. Existing gene and protein entity recognition algorithms[6][7] achieve high recall and precision on gene recognition, but few algorithms target TFBSs, this special kind of gene fragment (pattern). Using existing gene