Using Bayesian Networks to Analyze Expression Data (course).ppt

  • About 11,000 characters
  • About 25 pages
  • Published 2016-12-08 in Jiangsu

Center for Bioinformation Technology (CBIT)

A Probabilistic Learning Approach to Whole Genome Operon Prediction
Mark Craven, David Page, Jude Shavlik, Joseph Bockhorst, Jeremy Glasner
RECOMB 00?
Presenter: Jinsan Yang

Abstract
  • Presents a computational approach for predicting operons in the genomes of prokaryotic organisms.
  • Uses machine learning methods to induce predictive models from sequence data, gene expression data, and functional annotations of genes.
  • Uses multiple models to predict promoters, terminators, and operons.
  • Uses a dynamic programming method to map every known and putative gene to its most probable operon.
  • Data analysed: the E. coli K-12 genome.

Introduction
  • Approach: combine the following two steps to predict an operon map for an entire genome.
  • First step: a model that estimates the probability that an arbitrary sequence of genes constitutes an operon.
  • Second step: a dynamic programming method that assigns every gene in the genome to its most probable operon.

Multilevel-Learning Approach

Problem Domain (1)
  • Primary task: predict operons in the E. coli genome.
  • E. coli genome: sequenced at the University of Wisconsin (Blattner et al. 1997). It consists of a single circular chromosome of double-stranded DNA, with 4,639,221 base pairs and about 4,400 genes.
  • Operon: a sequence of one or more genes that are transcribed as a unit.

Problem Domain (2)
  • Available data:
      • Complete DNA sequence of the genome.
      • Beginning and ending positions of 3,033 genes and 1,372 putative genes.
      • Positions and sequences of 438 known promoters and 289 known terminators.
      • Functional annotation codes characterizing 1,668 genes (a three-level hierarchy with 123 leaves).
      • Gene expression data giving the activity levels of 4,097 genes and putative genes in 39 experiments.
      • 365 known operons.
  • It is estimated that several hundred operons in E. coli remain undiscovered.
  • Negative examples (non-operons) are generated using the fact that most operons do not overlap.

Machine Learning Approach
  • For a candidate gene sequence, the probability that the given sequence is an o
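
The Problem Domain (2) slide generates negative examples (non-operons) from the fact that most operons do not overlap. The following is a minimal Python sketch of that idea. It uses illustrative assumptions and is not the authors' exact procedure:
  • Genes on one strand are indexed in genome order.
  • Known operons are given as half-open index ranges.
  • Any contiguous run of genes that overlaps a known operon, without coinciding with it, is labelled a negative.

The function names and the max_len cap are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: half-open gene-index range [start, end) on one strand.
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open ranges share at least one gene."""
    return a[0] < b[1] and b[0] < a[1]


def negative_examples(n_genes: int, known_operons: List[Span], max_len: int) -> List[Span]:
    """Hypothetical helper: list contiguous gene runs (up to max_len genes)
    that cannot be operons if operons never overlap, i.e. runs that overlap
    a known operon without being identical to it."""
    known = set(known_operons)
    negatives = []
    for start in range(n_genes):
        for end in range(start + 1, min(start + max_len, n_genes) + 1):
            cand = (start, end)
            if cand in known:
                continue  # a known operon is a positive example, not a negative
            if any(overlaps(cand, op) for op in known_operons):
                negatives.append(cand)
    return negatives


if __name__ == "__main__":
    # Toy strand of 5 genes with one known operon covering genes 1 and 2.
    print(negative_examples(5, [(1, 3)], max_len=2))
    # prints [(0, 2), (1, 2), (2, 3), (2, 4)]
```

In the toy example, the run covering genes 0 and 1 is a negative because it would cut across the known operon (genes 1 and 2). The slides say only that most operons do not overlap, so labels produced this way are noisy rather than certain.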

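The second step in the Introduction assigns every gene to its most probable operon by dynamic programming. It can be sketched as a segmentation problem. The sketch rests on three assumptions that do not come from the slides:
  • A run of consecutive genes on one strand is partitioned into contiguous operons.
  • A hypothetical prob(start, end) function returns the first-step model's probability that genes [start, end) form an operon.
  • The best map maximizes the product of those probabilities, computed as a sum of logs.

The names best_operon_map and max_len are illustrative only.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical representation: half-open gene-index range [start, end).
Span = Tuple[int, int]


def best_operon_map(n_genes: int, prob: Callable[[int, int], float], max_len: int) -> List[Span]:
    """Hypothetical helper: partition genes 0..n_genes-1 into consecutive operons
    (each at most max_len genes) that maximize the product of prob(start, end),
    computed as a sum of log-probabilities."""
    best = [-math.inf] * (n_genes + 1)  # best[j]: best log score covering genes [0, j)
    back = [0] * (n_genes + 1)          # back[j]: start of the last operon in that solution
    best[0] = 0.0
    for end in range(1, n_genes + 1):
        for start in range(max(0, end - max_len), end):
            p = prob(start, end)
            if p <= 0.0 or best[start] == -math.inf:
                continue  # skip impossible operons and unreachable prefixes
            score = best[start] + math.log(p)
            if score > best[end]:
                best[end], back[end] = score, start
    if best[n_genes] == -math.inf:
        raise ValueError("no partition with nonzero probability")
    # Follow back-pointers from the last gene to recover the chosen operons.
    operons, end = [], n_genes
    while end > 0:
        operons.append((back[end], end))
        end = back[end]
    return operons[::-1]


if __name__ == "__main__":
    # Hypothetical probabilities for a 3-gene run: genes 0 and 1 look co-transcribed.
    table = {(0, 1): 0.3, (1, 2): 0.3, (2, 3): 0.9, (0, 2): 0.8, (1, 3): 0.1, (0, 3): 0.05}
    print(best_operon_map(3, lambda s, e: table.get((s, e), 0.0), max_len=3))
    # prints [(0, 2), (2, 3)]
```

Working in log space avoids numerical underflow when many probabilities are multiplied. The max_len bound keeps the inner loop short, at the cost of never proposing an operon longer than that bound.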