
A probabilistic measure for alignment-free sequence comparison

BIOINFORMATICS Vol. 20 no. 18 2004, pages 3455–3461
doi:10.1093/bioinformatics/bth426

Tuan D. Pham¹,* and Johannes Zuegg²

¹School of Computing and Information Technology, Griffith University, Nathan Campus, QLD 4111, Australia and ²Alchemia Ltd, PO Box 6242, Upper Mount Gravatt, QLD 4122, Australia

Received on March 1, 2004; revised on June 28, 2004; accepted on July 26, 2004
Advance Access publication July 22, 2004

ABSTRACT

Motivation: Alignment-free sequence comparison methods are still in the early stages of development compared to those of alignment-based sequence analysis. In this paper, we introduce a probabilistic measure of similarity between two biological sequences without alignment. The method is based on the concept of comparing the similarity/dissimilarity between two constructed Markov models.

Results: The method was tested against six DNA sequences, which are the thrA, thrB and thrC genes of the threonine operons from Escherichia coli K-12 and from Shigella flexneri; and one random sequence having the same base composition as thrA from E. coli. These results were compared with those obtained from CLUSTAL W algorithm (alignment-based) and the chaos game representation (alignment-free). The method was further tested against a more complex set of 40 DNA sequences and compared with other existing sequence similarity measures (alignment-free).

Availability: All datasets and computer codes written in MATLAB are available upon request from the first author.

Contact: t.pham@.au

INTRODUCTION

There have been a number of computational and statistical methods for the comparison of biological sequences developed over the past decade. It still remains a challenging problem for the research community of computational biology (Ewens and Grant, 2001; Miller, 2001). Two distinct bioinformatic methodologies for studying the similarity/dissimilarity of sequences are known as alignment-based and ali
