当前位置:编程学习 > C/C++ >>

最大公共子序列 Longest Common Subsequence

问题描述:
        给定两个序列(C中可认为是数组或字符串,python中可认为是list),找到二者的最大公共子序列。子序列中元素相对顺序不变,且不一定连续,例如“abcdef”中,"abc","ace"都算作子序列,当然不难得出一个结论,一个长度为n的序列,子序列成分为2^n个(回想下排列组合)

递归解法:
      对于指数复杂度问题,往往不能一步到位(直接穷举当然是不能接受的),所以考虑是否能通过迂回的办法,尝试解决掉它的子问题。对于两个序列x,y,长度分别为n,m,可以发现x,y的LCS结果可以由它的三个子问题其中之一得出:
        1. LCS(x1...n-1,y1...m)
        2. LCS(x1...n,  y1...m-1)
        3. LCS(x1...n-1,y1...m-1) + 公共尾元素

PYTHON代码:

 def lcs_len(x, y): 
    """This function returns length of longest common sequence of x and y."""    
    if len(x) == 0 or len(y) == 0: 
        return 0 
     
    xx = x[:-1]   # xx = sequence x without its last element     
    yy = y[:-1] 
     
    if x[-1] == y[-1]:  # if last elements of x and y are equal 
        return lcs_len(xx, yy) + 1 
    else: 
        return max(lcs_len(xx, y), lcs_len(x, yy)) 

def lcs_len(x, y):
    """This function returns length of longest common sequence of x and y."""  
    if len(x) == 0 or len(y) == 0:
        return 0
   
    xx = x[:-1]   # xx = sequence x without its last element   
    yy = y[:-1]
   
    if x[-1] == y[-1]:  # if last elements of x and y are equal
        return lcs_len(xx, yy) + 1
    else:
        return max(lcs_len(xx, y), lcs_len(x, yy))


动态规划解法 O(n^2):
        显然,递归操作引入了很多重复计算。动态规划正好能解决这一问题,它的一个英文解释非常到位:whenever the results of subproblems are needed, they have already been computed, and can simply be looked up in a table。即所有子问题的计算都能由查表来完成!先来看下代码:

 

 def lcs(x, y): 
    n = len(x) 
    m = len(y) 
    table = dict()  # a hashtable, but we'll use it as a 2D array here 
     
    for i in range(n+1):     # i=0,1,...,n 
        for j in range(m+1):  # j=0,1,...,m 
            if i == 0 or j == 0: 
                table[i, j] = 0 
            elif x[i-1] == y[j-1]: 
                table[i, j] = table[i-1, j-1] + 1 
            else: 
                table[i, j] = max(table[i-1, j], table[i, j-1]) 
                 
                # Now, table[n, m] is the length of LCS of x and y.                 
                # Let's go one step further and reconstruct 
                # the actual sequence from DP table: 
                 
    def recon(i, j): 
        if i == 0 or j == 0: 
            return "" 
        elif x[i-1] == y[j-1]: 
            return recon(i-1, j-1) + str(x[i-1]) 
        elif table[i-1, j] > table[i, j-1]:  
            return recon(i-1, j) 
        else: 
            return recon(i, j-1) 
         
    return recon(n, m) 

def lcs(x, y):
    n = len(x)
    m = len(y)
    table = dict()  # a hashtable, but we'll use it as a 2D array here
   
    for i in range(n+1):     # i=0,1,...,n
        for j in range(m+1):  # j=0,1,...,m
            if i == 0 or j == 0:
                table[i, j] = 0
            elif x[i-1] == y[j-1]:
                table[i, j] = table[i-1, j-1] + 1
            else:
                table[i, j] = max(table[i-1, j], table[i, j-1])
               
                # Now, table[n, m] is the length of LCS of x and y.               
                # Let's go one step further and reconstruct
                # the actual sequence from DP table:
               
    def recon(i, j):
        if i == 0 or j == 0:
            return ""
        elif x[i-1] == y[j-1]:
            return recon(i-1, j-1) + str(x[i-1])
        elif table[i-1, j] > table[i, j-1]:
            return recon(i-1, j)
        else:
            return recon(i, j-1)
       
    return recon(n, m)

        代码中用到了一个2D的表,table(i,j)则表示子问题(i,j)的LCS_LEN,经过分析,它的值只可能是table(i-1,j-1),table(i,j-1),table(i-1,j)之一,所以从上到下,从左到右的赋值方式不会出现table(i,j)无法赋值的情况; 当然求得LCS_LEN并不是我们的最终目的,特别是在应用中,一般都需要得到这个LCS,那么可以通过table来求得结果(见代码)。
动态规划解法——优化 O(NlogN):


        查了一些资料,貌似是有nlogn的解法,但需要输入满足一定条件,由于不具有普遍性,所以姑且认为研究它的意义不大,如果后面碰到,我们再做分析吧

 

 

补充:软件开发 , C++ ,
CopyRight © 2022 站长资源库 编程知识问答 zzzyk.com All Rights Reserved
部分文章来自网络,