12. Collected from foreign universities and research institutes: Sanger Institute, chapter2.pdf

About 95,000 characters, about 42 pages. Posted 2021-05-26 in Beijing.


Chapter 2
Enhanced Domain Detection Using Approaches From Speech Recognition

Most modern speech recognition techniques use probabilistic models to interpret a sequence of sounds [Cha93, Jel97]. Hidden Markov models, in particular, are used to recognize words. The same techniques have been adapted to find domains in protein sequences of amino acids [KBM+94, DEKM98], as discussed in section 1.2. However, in both cases, detection of the individual constituent domains or words is impeded by noise. One technique that has been used successfully in speech recognition is language modelling. A language model captures the information that certain word combinations are more likely than others, which improves detection based on context.

As discussed in section 1.1, only a limited subset of all possible domain combinations is observed, and the pattern of occurrence is highly non-random [AGT01b, AHT03]. Moreover, particular domain combinations are re-used in many domain architectures [VBB+04]. Language models from speech recognition may therefore also be applicable to the problem of protein domain identification. I have successfully used this approach to improve domain prediction in Pfam [CBD03].

Furthermore, different species have different protein domain repertoires, to the extent that certain protein domain families are kingdom specific. More strikingly, domain combinations are highly kingdom specific [AGT01b, VBB+04]. Taxonomic context by itself may therefore also provide extra information for domain detection. It is likely to be even more useful when used in combination with language models of domain context. I have previously used taxonomic information to improve domain identification
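The idea of using domain context like a language model can be illustrated with a minimal sketch. A bigram "domain grammar" rescores candidate domain hits, much as a word bigram model rescores word hypotheses in speech recognition. This is not the procedure used in this thesis or in [CBD03].

All of the following are hypothetical choices made for illustration:

- the family names (DomA, DomB, DomX);
- the training architectures and the hit scores;
- the add-one smoothing;
- the `lm_weight` parameter;
- the exhaustive search.

Hit scores are assumed to be natural-log odds. A realistic implementation would replace the exhaustive search with dynamic programming.

```python
import math
from collections import Counter
from itertools import combinations

START, END = "<s>", "</s>"  # boundary tokens, as in word bigram models


def train_bigram(architectures, alpha=1.0):
    """Add-alpha smoothed bigram model over domain families (hypothetical helper).

    Each architecture is a list of family names in N- to C-terminal order.
    Returns logp(prev, cur), a natural-log transition probability.
    """
    bigrams, context = Counter(), Counter()
    vocab = {END}
    for arch in architectures:
        seq = [START, *arch, END]
        vocab.update(arch)
        for prev, cur in zip(seq, seq[1:]):
            bigrams[prev, cur] += 1
            context[prev] += 1
    v = len(vocab)

    def logp(prev, cur):
        return math.log((bigrams[prev, cur] + alpha) / (context[prev] + alpha * v))

    return logp


def non_overlapping(hits):
    """True if no two (family, start, end, score) hits share residues."""
    ordered = sorted(hits, key=lambda h: h[1])
    return all(a[2] < b[1] for a, b in zip(ordered, ordered[1:]))


def best_architecture(hits, logp, lm_weight=1.0):
    """Choose the non-overlapping subset of hits maximising
    sum(hit scores) + lm_weight * bigram log-probability of the architecture.
    Exhaustive search: only suitable for a handful of candidate hits."""
    best, best_score = (), -math.inf
    for k in range(len(hits) + 1):
        for subset in combinations(hits, k):
            if not non_overlapping(subset):
                continue
            ordered = sorted(subset, key=lambda h: h[1])
            seq = [START, *(h[0] for h in ordered), END]
            lm = sum(logp(p, c) for p, c in zip(seq, seq[1:]))
            total = sum(h[3] for h in ordered) + lm_weight * lm
            if total > best_score:
                best, best_score = tuple(h[0] for h in ordered), total
    return best, best_score


# Invented toy data: DomA is usually followed by DomB; DomX usually occurs alone.
architectures = [["DomA", "DomB"]] * 8 + [["DomX"]] * 2
logp = train_bigram(architectures)

# Candidate hits (family, start, end, log-odds score); DomB and DomX overlap.
hits = [("DomA", 1, 50, 5.0), ("DomB", 60, 110, 0.5), ("DomX", 60, 110, 1.0)]

print(best_architecture(hits, logp, lm_weight=0.0))  # → (('DomA', 'DomX'), 6.0)
print(best_architecture(hits, logp))                 # → ('DomA', 'DomB'), score ≈ 4.48
```

With `lm_weight=0.0`, only the hit scores count, so the stronger but contextually rare DomX hit wins. With the bigram term switched on, the frequently observed DomA→DomB pairing rescues the weaker DomB hit. This is the kind of context effect the language-model analogy suggests.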

