12. Collected materials from overseas universities and research institutes: Sanger Institute, chapter2.pdf

Chapter 2 Enhanced Domain Detection Using Approaches From Speech Recognition

Most modern speech recognition techniques use probabilistic models to interpret a sequence of sounds [Cha93, Jel97]. Hidden Markov models, in particular, are used to recognize words. The same techniques have been adapted to find domains in protein sequences of amino acids [KBM+94, DEKM98], as discussed in section 1.2. However, in both cases, detection of individual constituent domains or words is impeded by noise.

One technique that has been used successfully in speech recognition is to use language models to capture the information that certain word combinations are more likely than others, thus improving detection based on context. As discussed in section 1.1, only a limited set of all possible domain combinations is observed, and the pattern of occurrence is highly non-random [AGT01b, AHT03]. Moreover, particular domain combinations are re-used in many domain architectures [VBB+04]. Thus, language models from speech recognition may also be applicable to the problem of protein domain identification. I have successfully used this approach to improve domain prediction in Pfam [CBD03].

Furthermore, different species have different protein domain repertoires, even to the extent that certain protein domain families are kingdom specific. More strikingly, domain combinations are highly kingdom specific [AGT01b, VBB+04]. Thus, taxonomic context by itself may also provide extra information for domain detection, and is likely to be even more useful when used in combination with language models of domain context. I have previously used taxonomic information to improve domain identification [...]
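The language-model idea described above, that some domain combinations are more likely than others, can be illustrated with a small bigram model over domain architectures. This is a minimal sketch under stated assumptions, not the method used in this chapter: the training architectures are invented toy data, the domain labels serve only as placeholders, and the add-one smoothing and the function names (`train_bigram_model`, `log_prob`, `context_score`) are hypothetical choices made for illustration.

```python
import math
from collections import Counter

START = "<s>"   # sentinel marking the beginning of a protein
END = "</s>"    # sentinel marking the end of a protein


def train_bigram_model(architectures):
    """Count adjacent domain pairs over a list of domain architectures."""
    bigrams = Counter()    # counts of (previous domain, current domain)
    contexts = Counter()   # counts of each previous domain
    vocab = {END}          # every symbol that can appear as "current"
    for arch in architectures:
        padded = [START, *arch, END]
        vocab.update(arch)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab


def log_prob(model, prev, cur):
    """Add-one smoothed log2 P(cur | prev)."""
    bigrams, contexts, vocab = model
    return math.log2((bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab)))


def context_score(model, arch):
    """Log2 probability of a whole domain architecture under the model."""
    padded = [START, *arch, END]
    return sum(log_prob(model, p, c) for p, c in zip(padded, padded[1:]))


# Toy, made-up training architectures (illustrative only, not real counts).
training = [
    ["SH3", "SH2", "Pkinase"],
    ["SH3", "SH2", "Pkinase"],
    ["SH2", "Pkinase"],
    ["PH", "Pkinase"],
]
model = train_bigram_model(training)

# An ordering seen in training should outscore an unseen ordering.
likely = context_score(model, ["SH3", "SH2", "Pkinase"])
unlikely = context_score(model, ["Pkinase", "SH3", "PH"])
print(likely > unlikely)
```

In a detection setting, a context term of this kind could in principle be added to the per-domain score from a profile hidden Markov model, so that a weak hit in a familiar context is favoured over an equally weak hit in an unseen one. How the two scores are actually combined is left open here.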
