最大公共子序列 Longest Common Subsequence
问题描述:
给定两个序列(C中可认为是数组或字符串,python中可认为是list),找到二者的最大公共子序列。子序列中元素相对顺序不变,且不一定连续,例如“abcdef”中,"abc","ace"都算作子序列,当然不难得出一个结论,一个长度为n的序列,子序列成分为2^n个(回想下排列组合)
递归解法:
对于指数复杂度问题,往往不能一步到位(直接穷举当然是不能接受的),所以考虑是否能通过迂回的办法,尝试解决掉它的子问题。对于两个序列x,y,长度分别为n,m,可以发现x,y的LCS结果可以由它的三个子问题其中之一得出:
1. LCS(x1...n-1,y1...m)
2. LCS(x1...n, y1...m-1)
3. LCS(x1...n-1,y1...m-1) + 公共尾元素
PYTHON代码:
def lcs_len(x, y): """This function returns length of longest common sequence of x and y.""" if len(x) == 0 or len(y) == 0: return 0 xx = x[:-1] # xx = sequence x without its last element yy = y[:-1] if x[-1] == y[-1]: # if last elements of x and y are equal return lcs_len(xx, yy) + 1 else: return max(lcs_len(xx, y), lcs_len(x, yy)) def lcs_len(x, y): """This function returns length of longest common sequence of x and y.""" if len(x) == 0 or len(y) == 0: return 0 xx = x[:-1] # xx = sequence x without its last element yy = y[:-1] if x[-1] == y[-1]: # if last elements of x and y are equal return lcs_len(xx, yy) + 1 else: return max(lcs_len(xx, y), lcs_len(x, yy))
动态规划解法 O(n^2):
显然,递归操作引入了很多重复计算。动态规划正好能解决这一问题,它的一个英文解释非常到位:whenever the results of subproblems are needed, they have already been computed, and can simply be looked up in a table。即所有子问题的计算都能由查表来完成!先来看下代码:
def lcs(x, y): n = len(x) m = len(y) table = dict() # a hashtable, but we'll use it as a 2D array here for i in range(n+1): # i=0,1,...,n for j in range(m+1): # j=0,1,...,m if i == 0 or j == 0: table[i, j] = 0 elif x[i-1] == y[j-1]: table[i, j] = table[i-1, j-1] + 1 else: table[i, j] = max(table[i-1, j], table[i, j-1]) # Now, table[n, m] is the length of LCS of x and y. # Let's go one step further and reconstruct # the actual sequence from DP table: def recon(i, j): if i == 0 or j == 0: return "" elif x[i-1] == y[j-1]: return recon(i-1, j-1) + str(x[i-1]) elif table[i-1, j] > table[i, j-1]: return recon(i-1, j) else: return recon(i, j-1) return recon(n, m) def lcs(x, y): n = len(x) m = len(y) table = dict() # a hashtable, but we'll use it as a 2D array here for i in range(n+1): # i=0,1,...,n for j in range(m+1): # j=0,1,...,m if i == 0 or j == 0: table[i, j] = 0 elif x[i-1] == y[j-1]: table[i, j] = table[i-1, j-1] + 1 else: table[i, j] = max(table[i-1, j], table[i, j-1]) # Now, table[n, m] is the length of LCS of x and y. # Let's go one step further and reconstruct # the actual sequence from DP table: def recon(i, j): if i == 0 or j == 0: return "" elif x[i-1] == y[j-1]: return recon(i-1, j-1) + str(x[i-1]) elif table[i-1, j] > table[i, j-1]: return recon(i-1, j) else: return recon(i, j-1) return recon(n, m)
代码中用到了一个2D的表,table(i,j)则表示子问题(i,j)的LCS_LEN,经过分析,它的值只可能是table(i-1,j-1),table(i,j-1),table(i-1,j)之一,所以从上到下,从左到右的赋值方式不会出现table(i,j)无法赋值的情况; 当然求得LCS_LEN并不是我们的最终目的,特别是在应用中,一般都需要得到这个LCS,那么可以通过table来求得结果(见代码)。
动态规划解法——优化 O(NlogN):
查了一些资料,貌似是有nlogn的解法,但需要输入满足一定条件,由于不具有普遍性,所以姑且认为研究它的意义不大,如果后面碰到,我们再做分析吧
补充:软件开发 , C++ ,