- Published 2018-06-27, Fujian
Discrete Mathematical Methods in Bioinformatics
Visualization of Real DNA Data and a Few Nice Math Therefrom

Bailin Hao (郝柏林)
T-Life Research Center, Fudan University
Institute of Theoretical Physics, Academia Sinica
Santa Fe Institute
/~hao/ /

- Not much biology
- A little statistics: Poisson distribution, Markovian prediction
- A little discrete mathematics: combinatorics, graph theory, formal language theory
- Everything from real biological data

Biology Produces Huge Amounts of Data
- Genomic sequences and complete genomes
- Protein sequences
- Gene expression data, microarray, gene regulation, protein interaction and network, metabolic network, immune network
We will only look at DNA and protein sequences.

Biological Symbolic Sequences
- DNA (RNA): 1D, directed, unbranching heteropolymers made of 4 kinds of bases (nucleotides). Length: 10^4 to 10^8 nt
- Proteins: 1D, directed, unbranching heteropolymers made of 20 kinds of amino acids. Length: 10^2 to 10^4 AA

Genome Projects World Wide (29 June 2008)
- Published: 827 (700 prokaryotes)
- On-going prokaryotes: 90 A + 1842 B
- On-going eukaryotes: 936
- On-going metagenomes: 130
- Total: 3825

GenBank Rel. 166 (15 June 2008)
- Sequences: 88 554 578
- Nucleotides: 92 008 611 867
- Average length: 1039 (almost the same in the last 8 years)

Necessity to Look at Real Data
Theorem 3 in Shannon's seminal 1948 paper, which laid the foundation of modern information theory: an intuitive explanation. Among the 4^N possible sequences of length N made of 4 kinds of letters, there is a huge subset of typical sequences and a tiny subset of atypical sequences. Biological sequences resulted from billions of years of evolution and natural selection. They must belong to the atypical subset. They should be studied almost individually.

A Simple-Minded Approach: Counting the Number of K-strings
- Different names: K-words, K-grams, K-strings, K-tuples
- The E. coli (strain K12) genome: a loop made of 4639475 letters
- Take K = 8: there are 4^8 = 65536 string types. Do they all appear? If random, each string typ
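The K-string counting described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's program: the function names `count_k_strings`, `missing_k_strings`, and `expected_count_if_random` are hypothetical, the genome is treated as a closed loop so that windows wrap around the end, and the "random" baseline assumes all four letters are equally likely and independent.

```python
from collections import Counter
from itertools import product


def count_k_strings(seq, k, circular=True):
    """Count every overlapping K-string in seq.

    With circular=True the sequence is treated as a loop, so a sequence
    of N letters yields exactly N windows.
    """
    seq = seq.upper()
    n = len(seq)
    if k < 1 or k > n:
        return Counter()
    # For a loop, append the first k-1 letters so windows can wrap around.
    extended = seq + seq[:k - 1] if circular else seq
    windows = n if circular else n - k + 1
    return Counter(extended[i:i + k] for i in range(windows))


def missing_k_strings(counts, k, alphabet="ACGT"):
    """List the K-strings over the alphabet that never appear in counts."""
    return [
        "".join(letters)
        for letters in product(alphabet, repeat=k)
        if "".join(letters) not in counts
    ]


def expected_count_if_random(n, k, alphabet_size=4):
    """Mean count per string type, assuming equiprobable independent letters."""
    return n / alphabet_size ** k


if __name__ == "__main__":
    toy = "ACGTAC"  # toy loop, not real genome data
    counts = count_k_strings(toy, 2)
    print(counts.most_common(3))
    print(len(missing_k_strings(counts, 2)), "of", 4 ** 2, "2-strings are missing")
    # Figures quoted in the text: a loop of 4639475 letters and K = 8.
    print(round(expected_count_if_random(4639475, 8), 2))
```

For a real genome, `seq` would be read from a sequence file; any letters outside A, C, G, T (such as ambiguity codes) would be counted as they appear and are simply not among the strings enumerated by `missing_k_strings`.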