
  • Posted 2017-09-01 from Shanghai

Defining Reference Sequences for Nocardia Species by Similarity and Clustering Analyses of 16S rRNA Gene Sequence Data

Manal Helal1,2, Fanrong Kong2, Sharon C. A. Chen1,2, Michael Bain3, Richard Christen4, Vitali Sintchenko1,2*

1 Sydney Medical School, The University of Sydney, Sydney, New South Wales, Australia
2 Centre for Infectious Diseases and Microbiology, Westmead Hospital, Sydney West Area Health Service, Sydney, New South Wales, Australia
3 School of Computer Science and Engineering, University of New South Wales, Sydney, New South Wales, Australia
4 University of Nice Sophia-Antipolis, and CNRS UMR6543, Parc Valrose, Centre de Biochimie, Nice, France

Abstract

Background: The intra- and inter-species genetic diversity of bacteria and the absence of 'reference', or most representative, sequences for individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility and compare the performance of several clustering and classification algorithms in identifying the species of 364 16S rRNA gene sequences with a defined species in GenBank and 110 16S rRNA gene sequences with no defined species, all within the genus Nocardia.

Methods: A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm, the linear mapping (LM) of the distance matrix. Principal Components Analysis was used for dimensionality reduction and visualization.
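As a rough illustration of the kind of pipeline the Methods paragraph describes (a pairwise distance matrix followed by Principal Components Analysis), the sketch below may help. It is not the authors' implementation and does not attempt the LM algorithm, which the abstract does not specify. The toy sequences, the choice of a simple mismatch proportion (p-distance) as the distance measure, and the helper names `p_distance`, `distance_matrix` and `pca_project` are all assumptions made for this example.

```python
import numpy as np


def p_distance(a: str, b: str) -> float:
    """Proportion of mismatched positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs: list[str]) -> np.ndarray:
    """Symmetric matrix of pairwise p-distances with a zero diagonal."""
    n = len(seqs)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = p_distance(seqs[i], seqs[j])
    return d


def pca_project(matrix: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Treat each row as a feature vector and project it onto the
    leading principal components (computed via SVD of the centered data)."""
    centered = matrix - matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Hypothetical, already-aligned toy sequences (not real Nocardia data):
# the first three are near-identical, the last two form a second group.
toy_seqs = [
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACGTTCGTAT",
    "TGCATGCATG",
    "TGCATGCATC",
]

dist = distance_matrix(toy_seqs)
coords = pca_project(dist, n_components=2)

for seq, (pc1, pc2) in zip(toy_seqs, coords):
    print(f"{seq}  PC1={pc1:+.3f}  PC2={pc2:+.3f}")
```

With these toy inputs the two groups should fall on opposite sides of the first principal component, which is the sort of low-dimensional view that can be used to inspect clusters visually. Real 16S rRNA data would first need a multiple sequence alignment and, most likely, an evolutionary distance model rather than a raw mismatch proportion.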
