String Similarity

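def levenshtein_distance(first, second):
    """Find the Levenshtein distance between two strings."""
    # Keep the shorter string in `first`; the distance is symmetric.
    if len(first) > len(second):
        first, second = second, first
    # `second` is now the longer string, so if it is empty both are empty.
    if len(second) == 0:
        return len(first)
    first_length = len(first) + 1
    second_length = len(second) + 1
    # Each row starts as 0..len(second); row 0 is the cost of building a
    # prefix of `second` from the empty string. list() is needed because
    # range objects do not support item assignment in Python 3.
    distance_matrix = [list(range(second_length)) for _ in range(first_length)]
    # Column 0 must hold 0..len(first): deleting i characters costs i.
    for i in range(first_length):
        distance_matrix[i][0] = i
    for i in range(1, first_length):
        for j in range(1, second_length):
            deletion = distance_matrix[i-1][j] + 1
            insertion = distance_matrix[i][j-1] + 1
            substitution = distance_matrix[i-1][j-1]
            if first[i-1] != second[j-1]:
                substitution += 1
            distance_matrix[i][j] = min(insertion, deletion, substitution)
    return distance_matrix[first_length-1][second_length-1]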

The function above uses dynamic programming.

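One string can be turned into another through deletions, substitutions, and insertions; the fewer operations required, the more similar the two strings are.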
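To make "fewer operations means more similar" concrete, here is a minimal sketch that turns an edit distance into a score between 0 and 1. The names `edit_distance` and `similarity`, and the choice to divide by the longer string's length, are assumptions made for illustration; they are not from the original post, and other normalizations are equally reasonable.

```python
def edit_distance(a, b):
    """Levenshtein distance keeping only two rows (illustrative helper)."""
    previous = list(range(len(b) + 1))  # cost of building b[:j] from ""
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # cost of reducing a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Hypothetical score in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(a, b) / longest
```

Because the distance can never exceed the length of the longer string, the score stays within 0 and 1 under this assumed normalization.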

Let c(i, j) be the number of steps needed to make the prefix f[:i] of string f equal to the prefix s[:j] of string s.

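Working backwards from c(i, j), there are three ways to reach it, and c(i, j) is the smallest of them: c(i-1, j) + 1 (delete one character), c(i, j-1) + 1 (insert one character), or the diagonal case, which is c(i-1, j-1) if f[i-1] == s[j-1] (the last characters of the two prefixes already match) and c(i-1, j-1) + 1 otherwise (substitute one character). The base cases are c(i, 0) = i and c(0, j) = j, since turning a prefix into the empty string, or the reverse, takes one operation per character.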
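The backward reasoning can also be written almost literally as a memoized recursion. This is only a sketch to show the recurrence; the name `edit_distance_recursive` is made up for illustration, and the base cases c(i, 0) = i and c(0, j) = j are the standard ones for edit distance rather than something stated in the original text.

```python
from functools import lru_cache


def edit_distance_recursive(f, s):
    """Top-down version of the c(i, j) recurrence (illustrative sketch)."""

    @lru_cache(maxsize=None)
    def c(i, j):
        # Base cases: one of the two prefixes is empty.
        if i == 0:
            return j
        if j == 0:
            return i
        # Diagonal step: free if the last characters match, else one substitution.
        diagonal = c(i - 1, j - 1) + (0 if f[i - 1] == s[j - 1] else 1)
        # Take the cheapest of deletion, insertion, and the diagonal step.
        return min(c(i - 1, j) + 1, c(i, j - 1) + 1, diagonal)

    return c(len(f), len(s))


print(edit_distance_recursive("kitten", "sitting"))  # 3
```

The recursion depth grows with the combined length of the two strings, so for long inputs the table-filling version is the safer choice.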


http://blog.csdn.net/dongle2001/article/details/1472235
