- About 11,000 characters
- About 25 pages
- Posted 2016-12-08 (Jiangsu)
Center for Bioinformation Technology (CBIT)

A Probabilistic Learning Approach to Whole Genome Operon Prediction
Mark Craven, David Page, Jude Shavlik, Joseph Bockhorst, Jeremy Glasner
RECOMB '00
Speaker: Jinsan Yang

Abstract
- Presents a computational approach to predicting operons in the genomes of prokaryotic organisms.
- Uses machine learning methods to induce predictive models from sequence data, gene expression data, and functional annotations of genes.
- Uses multiple models to predict promoters, terminators, and operons.
- Uses a dynamic programming method to map every known and putative gene to its most probable operon.
- Data analysed: the E. coli K-12 genome.

Introduction
- Approach: combine the following two steps to predict an operon map for an entire genome.
- First step: a model that estimates the probability that an arbitrary sequence of genes constitutes an operon.
- Second step: a dynamic programming method that assigns every gene in the given genome to its most probable operon.
- Multilevel-learning approach.

Problem Domain (1)
- Primary task: predict operons in the E. coli genome.
- E. coli genome: sequenced at the University of Wisconsin (Blattner et al. 1997); it consists of a single circular chromosome of double-stranded DNA, 4,639,221 base pairs long, with about 4,400 genes.
- Operon: a sequence of one or more genes that are transcribed as a unit.

Problem Domain (2)
- Available data:
  - Complete DNA sequence of the genome.
  - Beginning and ending positions of 3,033 genes and 1,372 putative genes.
  - Positions and sequences of 438 known promoters and 289 terminators.
  - Functional annotation codes characterizing 1,668 genes (a 3-level hierarchy with 123 leaves).
  - Gene expression data: activity levels of 4,097 genes and putative genes across 39 experiments.
  - 365 known operons.
- It is estimated that there are several hundred undiscovered operons in E. coli.
- Negative examples (non-operons) are generated by exploiting the fact that most operons do not overlap.

Machine Learning Approach
- For a candidate gene sequence, the probability that the given sequence is an o
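The slides say negative examples (non-operons) are generated from the fact that most operons do not overlap. A minimal Python sketch of that idea, under assumptions of our own: the genes of one strand are indexed in genomic order, a candidate is a contiguous run `(first, last)` of gene indices, and any candidate that shares genes with a known operon without matching it exactly is labeled negative. The names `candidate_runs` and `negative_examples` are hypothetical, not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical representation: a run of genes is (first, last), inclusive indices
# into the genes of one strand, listed in genomic order.
Run = Tuple[int, int]


def overlaps(a: Run, b: Run) -> bool:
    """Return True if two inclusive gene-index ranges share at least one gene."""
    return a[0] <= b[1] and b[0] <= a[1]


def candidate_runs(n_genes: int, max_len: int) -> List[Run]:
    """Enumerate every contiguous run of 1..max_len genes among n_genes ordered genes."""
    return [(i, j) for i in range(n_genes) for j in range(i, min(i + max_len, n_genes))]


def negative_examples(known_operons: List[Run], candidates: List[Run]) -> List[Run]:
    """Return candidates that overlap a known operon without being identical to it.

    Assumption: because operons (mostly) do not overlap, such a run is treated
    as a non-operon.
    """
    negatives: List[Run] = []
    for cand in candidates:
        if any(cand != op and overlaps(cand, op) for op in known_operons):
            negatives.append(cand)
    return negatives


if __name__ == "__main__":
    # Toy strand of 7 genes with one known operon spanning genes 2..4.
    negs = negative_examples([(2, 4)], candidate_runs(7, 3))
    print(len(negs), negs)
```

With one known operon spanning genes 2..4 on a 7-gene strand, 11 of the 18 candidate runs of up to three genes are labeled negative, and the known operon itself is excluded. Because only "most" operons are non-overlapping, labels produced this way may contain some noise.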
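The slide text breaks off just as it begins to describe how the probability that a candidate gene sequence is an operon is computed from multiple sources of evidence (promoter and terminator predictions, expression data, functional annotations). Purely as an illustration, and not as the authors' model, the sketch below combines per-feature evidence with naive Bayes in log-odds form. The function name, feature names, and numbers are hypothetical.

```python
import math
from typing import Dict


def operon_posterior(prior: float, likelihood_ratios: Dict[str, float]) -> float:
    """Naive Bayes estimate of P(operon | evidence) for one candidate gene run.

    prior: P(operon) before looking at any evidence, strictly between 0 and 1.
    likelihood_ratios: for each feature, P(observed value | operon) divided by
        P(observed value | non-operon). Features are assumed conditionally
        independent given the class (the naive Bayes assumption).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    # Accumulate in log-odds so that many ratios do not underflow or overflow.
    log_odds = math.log(prior / (1.0 - prior))
    for name, ratio in likelihood_ratios.items():
        if ratio <= 0.0:
            raise ValueError(f"likelihood ratio for {name!r} must be positive")
        log_odds += math.log(ratio)
    # Numerically stable logistic function: convert log-odds back to a probability.
    if log_odds >= 0.0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


if __name__ == "__main__":
    # Hypothetical feature names and ratios, for illustration only.
    evidence = {
        "promoter_upstream_of_first_gene": 3.0,
        "terminator_after_last_gene": 2.0,
        "expression_correlation": 0.5,
    }
    print(operon_posterior(0.5, evidence))
```

With a prior of 0.5 and the three ratios in the example, the posterior odds are 3 × 2 × 0.5 = 3, so the function returns 0.75 (up to floating-point rounding). Any real feature set and its ratios would have to be estimated from the labeled operons and non-operons.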
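The introduction describes a second step in which dynamic programming assigns every gene to its most probable operon. The slides do not give the recurrence, so the sketch below rests on our own assumptions. It treats one strand's genes in genomic order and ignores the chromosome's circular wrap-around. It takes a hypothetical `operon_prob(i, j)` giving the probability that genes i..j form an operon, and it partitions the strand into contiguous runs that maximize the product of those probabilities (a standard segmentation DP). Multiplying per-run probabilities as if they were independent is a simplification and may differ from the paper's exact objective.

```python
import math
from typing import Callable, List, Tuple

Run = Tuple[int, int]  # inclusive (first, last) gene indices along one strand


def best_operon_map(
    n_genes: int,
    operon_prob: Callable[[int, int], float],
    max_len: int = 10,
) -> Tuple[List[Run], float]:
    """Partition genes 0..n_genes-1 into contiguous runs maximizing the product of operon_prob.

    best[j] holds the best log-score for genes 0..j-1; each step tries every final
    run (i, j-1) of at most max_len genes. Runs with probability 0 are skipped.
    """
    best = [-math.inf] * (n_genes + 1)
    back = [0] * (n_genes + 1)
    best[0] = 0.0
    for j in range(1, n_genes + 1):
        for i in range(max(0, j - max_len), j):
            p = operon_prob(i, j - 1)
            if p <= 0.0 or best[i] == -math.inf:
                continue
            score = best[i] + math.log(p)
            if score > best[j]:
                best[j] = score
                back[j] = i
    if best[n_genes] == -math.inf:
        raise ValueError("no valid segmentation under the given probabilities")
    # Trace back the chosen runs from the end of the strand.
    runs: List[Run] = []
    j = n_genes
    while j > 0:
        i = back[j]
        runs.append((i, j - 1))
        j = i
    runs.reverse()
    return runs, math.exp(best[n_genes])


def toy_prob(i: int, j: int) -> float:
    """Hypothetical operon probabilities for a 5-gene toy strand."""
    table = {(0, 1): 0.9, (2, 2): 0.8, (3, 4): 0.7}
    if (i, j) in table:
        return table[(i, j)]
    return 0.5 if i == j else 0.1


if __name__ == "__main__":
    print(best_operon_map(5, toy_prob, max_len=3))
```

On the toy strand the best map is `[(0, 1), (2, 2), (3, 4)]` with score 0.9 × 0.8 × 0.7 = 0.504. Bounding operon length by `max_len` keeps the work at O(n · max_len) calls to `operon_prob`.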