BNFO 602Lecture 2.ppt

下载文档

6
0
约7.67千字
约 30页
2017-08-13 发布于甘肃
举报
版权申诉
保障服务

BNFO 602Lecture 2.ppt

1、原创力文档（book118）网站文档一经付费（服务费），不意味着购买了该文档的版权，仅供个人/单位学习、研究之用，不得用于商业用途，未经授权，严禁复制、发行、汇编、翻译或者网络传播等，侵权必究。。
2、本站所有内容均由合作方或网友上传，本站不对文档的完整性、权威性及其观点立场正确性做任何保证或承诺！文档内容仅供研究参考，付费前请自行鉴别。如您付费，意味着您自己接受本站规则且自行承担风险，本站不退款、不进行额外附加服务；查看《如何避免下载的几个坑》。如果您已付费下载过本站文档，您可以点击这里二次下载。
3、如文档侵犯商业秘密、侵犯著作权、侵犯人身权等，请点击“版权申诉”（推荐），也可以打举报电话：400-050-0827(电话支持时间：9:00-18:30)。

BNFO 602Lecture 2 Usman Roshan Sequence Alignment Widely used in bioinformatics Proteins and genes are of different lengths due to error in sequencing and genetic variation across species Involves identifying evolutionary events: insertions, deletions, and substitutions Goal is to “align” sequences such that number of mutations is minimized DNA Sequence Evolution Sequence alignments They tell us about Function or activity of a new gene/protein Structure or shape of a new protein Location or preferred location of a protein Stability of a gene or protein Origin of a gene or protein Origin or phylogeny of an organelle Origin or phylogeny of an organism And more… Pairwise alignment X: ACA, Y: GACAT Match=8, mismatch=2, gap-5 ACA-- -ACA- --ACA ACA---- GACAT GACAT GACAT G--ACAT 8+2+2-5-5 -5+8+8+8-5 -5-5+2+2+2 2-5-5-5-5-5-5 Score = 2 14 -4 -28 Optimal alignment An alignment can be specified by the traceback matrix. How do we determine the traceback for the highest scoring alignment? Needleman-Wunsch algorithm for global alignment First proposed in 1970 Widely used in genomics/bioinformatics Dynamic programming algorithm Needleman-Wunsch Input: X = x1x2…xn, Y=y1y2…ym (X is seq2 and Y is seq1) Define V to be a two dimensional matrix with len(X)+1 rows and len(Y)+1 columns Let V[i][j] be the score of the optimal alignment of X1…i and Y1…j. Let m be the match cost, mm be mismatch, and g be the gap cost. Dynamic programming Initialization: for i = 1 to len(seq2) { V[i][0] = i*g; } For i = 1 to len(seq1) { V[0][i] = i*g; } Recurrence: for i = 1 to len(seq2){ for j = 1 to len(seq1){ V[i-1][j-1] + m(or mm) V[i][j] = max { V[i-1][j] + g V[i][j-1] + g if(maximum is V[i-1][j-1] + m(or mm)) then T[i][j] = ‘D’ else if (maximum is V[i-1][j] + g) then T[i][j] = ‘U’ else then T[i][j] = ‘L’ } } Example Input: seq2: ACA seq1: GACAT m = 5 mm = -4 gap = -20 seq2 is lined along the rows and seq2 is along the columns Affine gap penalties Affine g