Smith-Waterman (SW) algorithm

1. What is the Smith-Waterman (SW) algorithm?

The Smith-Waterman (SW) algorithm is in essence a derivation of the Needleman-Wunsch (NW) algorithm in which penalties are assigned to mismatched pairs, insertions and deletions. Assigning penalties to mismatches and gaps focuses the scope of the algorithm. Rather than lining up entire sequences, the algorithm is able to examine all subsequences found in the two sequences, and return only the highest scoring subsequence alignment(s) found.

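The paragraph above is the definition of the Smith-Waterman (SW) algorithm. This article uses one very simple, concrete example to show how the SW scoring matrix is computed and how the traceback is carried out.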

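2. The two main steps of the Smith-Waterman (SW) algorithm

1. Compute the scoring matrix.
2. Trace back through the scoring matrix to recover the most similar portions of the two strings.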

3. A worked example

A = "CGATCGATCGATATAGTG"
B = "TAGCTAGATCCGAGAT"

Together the two sequences form the scoring matrix.

The task now is to work out how this scoring matrix is computed.
In the SW system, the score of a cell depends on several user-specified weights: for matches, mismatches, gaps, and gap extensions. By manipulating these weights, the outcome of an alignment can be drastically altered. For example, if a large weight is assigned to the mismatch penalty and a smaller weight to the gap penalties, the resulting alignment will contain few or no mismatches and a large number of gaps. Conversely, maximizing gap penalties and minimizing mismatch penalties can produce alignments with more mismatches and only a few gaps.

Now let us see how the value at the position marked "?" in the figure above is computed.
In the subsequent scoring of A and B, the following weights were used:
Match score = 10
Mismatch score = -5
Gap penalty = 10
Gap extension penalty = 8
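At the "?" position the character on the vertical axis is T and the character on the horizontal axis is G. Since T != G, this cell uses the mismatch score of -5.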
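The weights listed above can be written down as constants with a small lookup for the match/mismatch case. This is a minimal sketch; the names `MATCH_SCORE`, `MISMATCH_SCORE`, and `substitution_score` are illustrative and do not come from the original tutorial.

```python
# Weights quoted in the text; the constant names are assumptions for illustration.
MATCH_SCORE = 10
MISMATCH_SCORE = -5
GAP_PENALTY = 10
GAP_EXTENSION_PENALTY = 8


def substitution_score(x: str, y: str) -> int:
    """Return the match score if the two characters are equal, else the mismatch score."""
    return MATCH_SCORE if x == y else MISMATCH_SCORE


# The "?" cell compares T with G, so the mismatch score applies.
print(substitution_score("T", "G"))
```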

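Compute a candidate value from each of the three regions (the diagonal cell, the column above, and the row to the left), then take the largest as the value at "?".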
Cell M13,13 (labeled with a "?") receives the maximum of the candidate scores given by the equation in Figure 10. The possible scores for this cell are: 22 (diagonal score + mismatch score: 27 - 5), 22 (greatest column gap score: 40 - (10 + (8*1))), 11 (greatest row gap score: 45 - (10 + (8*3))), and 0. The largest of these is 22, so 22 is entered into the cell. Scoring proceeds to the right and down.
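The cell rule described above can be sketched in Python as follows. Figure 10 is not reproduced in this text, so the gap cost used here, `gap_open + gap_extend * k` for a gap of length `k`, is an assumption inferred from the arithmetic `10 + (8*1)` and `10 + (8*3)` in the worked cell. The function names, and the choice to put the first sequence on the rows, are also illustrative.

```python
def gap_cost(k, gap_open=10, gap_extend=8):
    # Assumption: a gap of length k costs gap_open + gap_extend * k,
    # which reproduces the 10 + (8*1) and 10 + (8*3) terms in the text.
    return gap_open + gap_extend * k


def score_matrix(a, b, match=10, mismatch=-5):
    """Fill a local-alignment scoring matrix with a zero border (row 0, column 0)."""
    rows, cols = len(a) + 1, len(b) + 1
    M = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            diag = M[i - 1][j - 1] + s
            # Best score reachable by opening a gap of any length in the column above.
            col_gap = max(M[i - k][j] - gap_cost(k) for k in range(1, i + 1))
            # Best score reachable by opening a gap of any length in the row to the left.
            row_gap = max(M[i][j - l] - gap_cost(l) for l in range(1, j + 1))
            # Scores never drop below zero, which is what makes the alignment local.
            M[i][j] = max(0, diag, col_gap, row_gap)
    return M


# The four candidates quoted for the "?" cell, using the same gap cost.
print(max(27 - 5, 40 - gap_cost(1), 45 - gap_cost(3), 0))
```

For a tiny input such as `score_matrix("GAT", "GAT")`, the bottom-right cell is 30, the result of three consecutive matches along the diagonal.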

4. Tracing back through the scoring matrix

How is the traceback done?

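1. Start the traceback from the position of the largest value. The largest value in the whole scoring matrix is 67, so the traceback starts at 67 and moves toward the upper left. Why does it go to 57? Let the coordinates of 67 be (x, y). To trace back from 67, search the cells in the same row and the same column as the point (x-1, y-1) (the red box in the figure) and take the largest value found as the traceback position for 67. This leads to 57. Continue tracing back from 57 in the same way, and stop at the first position whose value is zero.
2. As further practice, trace back from one more value.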
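The traceback rule described here can be sketched as below. This follows the article's description rather than the pointer-based traceback found in many textbooks. Two details are assumptions, because the figure with the red box is not available: the search only looks left of and above (x-1, y-1), and ties are broken in favour of the diagonal cell first. The function name `traceback_path` is illustrative.

```python
def traceback_path(M):
    """Return the visited cells, from the start of the alignment to the maximum cell."""
    cells = [(r, c) for r in range(len(M)) for c in range(len(M[0]))]
    # Start from the largest value in the whole matrix.
    i, j = max(cells, key=lambda rc: M[rc[0]][rc[1]])
    path = []
    while M[i][j] > 0:
        path.append((i, j))
        if i == 0 or j == 0:
            break  # reached the border, nothing further up or left
        x, y = i - 1, j - 1
        # Candidates: the diagonal cell, then the rest of its row to the left,
        # then the rest of its column above (assumed reading of the red box).
        candidates = [(x, y)]
        candidates += [(x, c) for c in range(y - 1, -1, -1)]
        candidates += [(r, y) for r in range(x - 1, -1, -1)]
        # max() keeps the first maximum, so the diagonal wins ties.
        i, j = max(candidates, key=lambda rc: M[rc[0]][rc[1]])
    path.reverse()
    return path


# Hand-made toy matrix (not the matrix from the article).
toy = [
    [0, 0, 0, 0, 0],
    [0, 10, 0, 0, 0],
    [0, 0, 0, 0, 12],
]
print(traceback_path(toy))
```

In the toy matrix the walk starts at the 12 in cell (2, 4), finds the 10 in the row of (1, 3), and stops when the next step lands on a zero.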

A’ G - - A T C G A T C G - A T A T
B’ G C T A - - G A T C C G A G A T

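Read it like this: going from 10 to 15 takes three steps horizontally (G C T A), but only one step is possible vertically, so the vertical sequence has to wait two steps before it reaches A.
So the result is:
G C T A
G - - A
Each "-" is one step of waiting. The other cases follow in the same way.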
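The "wait" rule above can be sketched as a small routine that turns a list of traceback cells into two gapped strings. The function name `render_alignment` is illustrative. The path in the example is read off the alignment A'/B' shown above by hand, with 1-based positions in A and B. It is an assumption about the figure's coordinates, not output taken from the original matrix.

```python
def render_alignment(a, b, path):
    """Build gapped strings from (row, column) cells; rows index a, columns index b (1-based)."""
    top, bottom = [], []
    prev = None
    for i, j in path:
        if prev is not None:
            pi, pj = prev
            # Rows skipped between two cells: characters of a wait against gaps in b.
            for r in range(pi + 1, i):
                top.append(a[r - 1])
                bottom.append("-")
            # Columns skipped between two cells: characters of b wait against gaps in a.
            for c in range(pj + 1, j):
                top.append("-")
                bottom.append(b[c - 1])
        top.append(a[i - 1])
        bottom.append(b[j - 1])
        prev = (i, j)
    return " ".join(top), " ".join(bottom)


A = "CGATCGATCGATATAGTG"
B = "TAGCTAGATCCGAGAT"
# Cells read off the alignment in the text (hypothetical coordinates).
path = [(2, 3), (3, 6), (6, 7), (7, 8), (8, 9), (9, 10),
        (10, 11), (11, 13), (12, 14), (13, 15), (14, 16)]
a_line, b_line = render_alignment(A, B, path)
print("A'", a_line)
print("B'", b_line)
```

With this path the two printed lines match the A' and B' rows given in the text, including the `G - - A` over `G C T A` segment discussed above.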

0 0