Wildcard Matching

Source: Internet; published by: 蜗杆数控车床编程事例; edited by: 程序博客网; time: 2024/05/16 05:09

Problem

Implement wildcard pattern matching with support for '?' and '*'.

'?' matches any single character.
'*' matches any sequence of characters (including the empty sequence).

The matching should cover the entire input string (not partial).

The function prototype should be:

    bool isMatch(const char *s, const char *p)

Some examples:

    isMatch("aa","a") → false
    isMatch("aa","aa") → true
    isMatch("aaa","aa") → false
    isMatch("aa", "*") → true
    isMatch("aa", "a*") → true
    isMatch("ab", "?*") → true
    isMatch("aab", "c*a*b") → false

Approach 1

Recursion. This times out on large inputs.

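    class Solution {
    public:
        bool isMatch(const char *s, const char *p) {
            // Pattern exhausted: match only if the string is exhausted too.
            if (*p == '\0')
                return *s == '\0';
            if (*p != '*') {
                // Single match: same character (*s cannot be '\0' here) ...
                if (*s == *p)
                    return isMatch(s + 1, p + 1);
                // ... or '?' against any real character.
                if (*p == '?' && *s != '\0')
                    return isMatch(s + 1, p + 1);
                // No match.
                return false;
            }
            // Star case: let '*' absorb 0, 1, 2, ... characters of s.
            while (*s != '\0') {
                if (isMatch(s, p + 1))
                    return true;
                s++;
            }
            // '*' has absorbed the rest of s; the remaining pattern must match "".
            return isMatch(s, p + 1);
        }
    };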
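The recursion above times out because it re-solves the same (position in s, position in p) pair many times. One possible remedy, shown here only as a sketch and not as part of the original solution, is to cache each pair's result. The names `matchFrom` and `isMatchMemo` are hypothetical, and the star case is written as "skip the star, or absorb one character and keep the star", which is equivalent to the loop above.

```cpp
#include <cstring>
#include <vector>

// Hypothetical helper: memo[i][j] caches whether s[i..] matches p[j..].
// -1 = not computed yet, 0 = no match, 1 = match.
static bool matchFrom(const char *s, const char *p, int i, int j,
                      std::vector<std::vector<int> > &memo) {
    if (memo[i][j] != -1)
        return memo[i][j] == 1;
    bool ok;
    if (p[j] == '\0') {
        // Pattern exhausted: match only if the string is exhausted too.
        ok = (s[i] == '\0');
    } else if (p[j] == '*') {
        // '*' matches nothing (skip it), or absorbs s[i] and stays in place.
        ok = matchFrom(s, p, i, j + 1, memo) ||
             (s[i] != '\0' && matchFrom(s, p, i + 1, j, memo));
    } else {
        // Single match: '?' or an identical character, then move diagonally.
        ok = s[i] != '\0' && (p[j] == '?' || p[j] == s[i]) &&
             matchFrom(s, p, i + 1, j + 1, memo);
    }
    memo[i][j] = ok ? 1 : 0;
    return ok;
}

// Hypothetical entry point with the same contract as isMatch.
bool isMatchMemo(const char *s, const char *p) {
    int n = static_cast<int>(std::strlen(s));
    int m = static_cast<int>(std::strlen(p));
    std::vector<std::vector<int> > memo(n + 1, std::vector<int>(m + 1, -1));
    return matchFrom(s, p, 0, 0, memo);
}
```

Each of the (n+1)*(m+1) pairs is computed at most once, so the work is bounded by the table size. The recursion depth can still reach n+m, which may be a concern for very long inputs.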

Approach 2

Dynamic programming (DP). Thought process:

The recursive method is intuitive and gives great insight into the matching process. If we neglect the boundary cases for a moment, there are three basic cases when we try to match the characters in s and p:

1. No match. Simply return false. This is the last case in the program.
2. Single match. Either *s == *p, or *p == '?'. The return value in this case depends on the result for the remaining parts of both s and p (a recursive call), which start at the 'diagonal' position reached by advancing both s and p by 1 (++s, ++p).
3. Star case, i.e. when *p == '*'. This is where the complication comes in. A star can match 0, 1, 2, ... characters, up to the end of string s. So, as long as one of the remaining substrings of s matches the pattern after the '*' (a recursive call), it returns true. This case returns false only after exhausting all possible substrings without a match.

Once we have some sense of the dependencies of each step, learned from the recursive function calls, we can set up our dynamic programming framework. For example, take s = "abcdef" and p = "a?c*f".

[Figure: matrix setup]

The strings are indexed from 1 for convenience. Now let's directly apply the rules learned from the recursive method:

The arrow means "depends on" or "recursive call". The cells without a match can be pre-filled with FALSE. The tail cell '\0' '\0' is marked TRUE.
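Since the figure is not available here, the pre-fill step can be illustrated with a small sketch that labels every cell by the case it falls into. The function name `classifyCells` and the layout (one row per pattern character, one column per character of s) are assumptions made for illustration; the original figure may be oriented differently.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: one row per pattern character, one column per
// character of s.
//   '.' = no match (case 1, pre-filled FALSE)
//   'M' = single match (case 2)
//   '*' = star (case 3)
std::vector<std::string> classifyCells(const std::string &s,
                                       const std::string &p) {
    std::vector<std::string> grid(p.size(), std::string(s.size(), '.'));
    for (std::size_t i = 0; i < p.size(); ++i) {
        for (std::size_t j = 0; j < s.size(); ++j) {
            if (p[i] == '*')
                grid[i][j] = '*';
            else if (p[i] == '?' || p[i] == s[j])
                grid[i][j] = 'M';
        }
    }
    return grid;
}
```

For s = "abcdef" and p = "a?c*f" the rows come out as "M.....", "MMMMMM", "..M...", "******" and ".....M": only the 'M' and '*' cells need any further work.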

[Figure: recursion process]

We eventually want to know cell(0,0), but we have to know cell(1) first:

s[1] == p[1] gives case 2, so cell(1) depends on cell(2);
p[2] == '?' gives case 2, so cell(2) depends on cell(3);
s[3] == p[3] gives case 2, so cell(3) depends on cell(4);
p[4] == '*' gives case 3, so cell(4) depends on all the crimson-shaded cells. As long as one of the shaded cells is TRUE, cell(4) is TRUE. Cell(4) also depends on the cells to its right because '*' can match zero characters.

...

p[5] == s[6] gives case 2, so cell(5) depends on the tail '\0','\0' case, which is TRUE. So cell(5) = TRUE.

Then we backtrack, just as the recursive function does: cell(0) = cell(1) = cell(2) = cell(3) = cell(4) = cell(5) = TRUE. Finally the function returns TRUE.

Now for the actual dynamic programming. Note that the problem is symmetric: matching the strings from left to right or from right to left gives the same result. In the recursive method, the result propagates from the bottom-right corner to the upper-left corner. In dynamic programming we want to start with the first row, so we flip the whole dependency graph. Again, the arrows mean "depends on", or "gets its value from".

[Figure: DP process]

All the non-matching cells are pre-filled with FALSE. The only initial TRUE is at cell(0), which is the case of matching two empty strings. So now you just need a matrix of size (size(s)+1) × (size(p)+1), where the extra row and column hold the empty-string cases, and you fill the cells row by row according to three rules:

1. No match: fill FALSE.
2. Match, or '?': copy the value from the previous diagonal cell.
3. '*': look at all cells to the left, the cells to the left in the previous row, and the cell directly above. If at least one of them is TRUE, fill TRUE; otherwise fill FALSE.

Finally return the value of the last cell.
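The three rules can be sketched as a forward DP over the full matrix. This is an illustrative sketch, not the author's code: the name `isMatchForwardDP` is hypothetical, and rows are assumed to follow the pattern while columns follow s. For the '*' rule the sketch checks only the cell directly above and the cell immediately to the left, because the left cell already summarizes every cell further left and every cell to the left in the previous row.

```cpp
#include <string>
#include <vector>

// Hypothetical forward DP. dp[i][j] is true when the first i characters
// of p match the first j characters of s.
bool isMatchForwardDP(const std::string &s, const std::string &p) {
    const std::size_t n = s.size(), m = p.size();
    std::vector<std::vector<bool> > dp(m + 1, std::vector<bool>(n + 1, false));
    dp[0][0] = true;  // two empty strings match
    for (std::size_t i = 1; i <= m; ++i) {
        const char c = p[i - 1];
        // Column 0: empty s matches only while the pattern so far is all '*'.
        dp[i][0] = (c == '*') && dp[i - 1][0];
        for (std::size_t j = 1; j <= n; ++j) {
            if (c == '*') {
                // Rule 3: directly above ('*' matches nothing) or
                // immediately left ('*' absorbs one more character).
                dp[i][j] = dp[i - 1][j] || dp[i][j - 1];
            } else if (c == '?' || c == s[j - 1]) {
                // Rule 2: copy the previous diagonal cell.
                dp[i][j] = dp[i - 1][j - 1];
            }
            // Rule 1: otherwise the cell keeps its pre-filled FALSE.
        }
    }
    return dp[m][n];  // value of the last cell
}
```

With this formulation every cell costs constant time, so filling the matrix takes time proportional to its size.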

There are some more tricks in practice. First, successive '*'s are equivalent to a single '*', so we can collapse each run into one. After doing this, the number of '*'s is at most (size(p)+1)/2. So the worst-case run time is O(m*n + m^2), where m = size(p) and n = size(s).
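Collapsing runs of '*' is a single pass over the pattern. A minimal sketch, with the hypothetical name `collapseStars`:

```cpp
#include <string>

// Hypothetical helper: replaces every run of consecutive '*' in the
// pattern with a single '*'. Other characters are copied unchanged.
std::string collapseStars(const std::string &p) {
    std::string out;
    for (char c : p) {
        // Skip a '*' that directly follows another '*'.
        if (c == '*' && !out.empty() && out.back() == '*')
            continue;
        out.push_back(c);
    }
    return out;
}
```

For example, `collapseStars("a**b***")` yields `"a*b*"`, which matches exactly the same strings as the original pattern.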

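Also consider that, for a match to be possible, the pattern with all '*'s removed must be no longer than s, i.e. the number of non-'*' characters is at most n. Together with the bound on the number of '*'s, this means m is at most about 2n in the worst case, so m = O(n). Thus the worst-case run time is O(m*n).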
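The length bound also gives a cheap rejection test that can run before any matching: every non-'*' pattern character consumes exactly one character of s. A sketch, with the hypothetical name `tooManyLiterals`:

```cpp
#include <cstddef>
#include <string>

// Hypothetical pre-check: returns true when the pattern has more
// non-'*' characters than s has characters, so no match is possible.
bool tooManyLiterals(const std::string &s, const std::string &p) {
    std::size_t needed = 0;
    for (char c : p) {
        if (c != '*')
            ++needed;  // '?' and ordinary characters each need one char of s
    }
    return needed > s.size();
}
```

When this returns true the matcher can return false immediately; when it returns false, the full matching still has to run.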

Second, the matrix is updated row by row, and even the '*' case needs only the two most recent rows. So the problem can be solved space-efficiently using two arrays of length size(s)+1.
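A sketch of the two-array variant follows. The name `isMatchTwoRows` is hypothetical, and the sketch assumes one row per pattern character, with `prev` holding the previous row and `cur` the row being filled.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical space-efficient DP: only the previous and current rows
// are stored, each of length size(s) + 1.
bool isMatchTwoRows(const std::string &s, const std::string &p) {
    const std::size_t n = s.size();
    std::vector<bool> prev(n + 1, false), cur(n + 1, false);
    prev[0] = true;  // empty pattern matches empty string
    for (char c : p) {
        // Empty s matches only while the pattern so far is all '*'.
        cur[0] = (c == '*') && prev[0];
        for (std::size_t j = 1; j <= n; ++j) {
            if (c == '*')
                cur[j] = prev[j] || cur[j - 1];  // above or left
            else
                cur[j] = (c == '?' || c == s[j - 1]) && prev[j - 1];  // diagonal
        }
        std::swap(prev, cur);  // the row just filled becomes the previous row
    }
    return prev[n];
}
```

Every entry of `cur` is overwritten on each pass, so the stale values left in it after the swap never leak into the result.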

Implementation:

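    #include <cstring>
    #include <string>
    #include <vector>

    class Solution {
    public:
        bool isMatch(const char *s, const char *p) {
            int lens = strlen(s);
            int lenp = strlen(p);
            // result[j][i] is true when s[j..] matches p[i..]. One extra row
            // and column hold the empty-suffix cases, so the table must be
            // (lens+1) x (lenp+1); sizing it lens x lenp indexes out of bounds.
            std::vector<std::vector<bool> > result(lens + 1, std::vector<bool>(lenp + 1, false));
            result[lens][lenp] = true;  // s="" and p="", return true
            std::string S(s);
            std::string P(p);
            // Empty s matches p[i..] only if p[i..] consists entirely of '*'.
            for (int i = lenp - 1; i >= 0; i--)
                result[lens][i] = (P[i] == '*' && result[lens][i + 1]);
            for (int i = lenp - 1; i >= 0; i--)
                for (int j = lens - 1; j >= 0; j--) {
                    if (P[i] == '*') {
                        // '*' absorbs s[j..k-1] for some k in [j, lens].
                        result[j][i] = false;
                        for (int k = j; k <= lens; k++) {
                            if (result[k][i + 1]) {
                                result[j][i] = true;
                                break;
                            }
                        }
                    } else if (S[j] == P[i] || P[i] == '?') {
                        // Single match: take the diagonal cell.
                        result[j][i] = result[j + 1][i + 1];
                    }
                    // Otherwise the cell keeps its pre-filled false.
                }
            return result[0][0];
        }
    };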

Wildcard matching algorithms