LC10. Regular Expression Matching

Implement regular expression matching with support for '.' and '*'.

Posted by freeCookie🍪 on December 28, 2016

Regular Expression Matching

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true

正则表达式的实现,自己的初步想法是利用迭代,每次遇到’.’ / ‘*’ 就分情况讨论,如果匹配到了就进行下一步匹配,利用指针记录匹配到的位置,如果匹配到最后一个就返回true。

public class Solution {
    public boolean isMatch(String s, String p) {
        char[] sarr = s.toCharArray();
        char[] parr = p.toCharArray();
        // empty match
        if(parr.length == 0)    return sarr.length == 0;
        // next char is '*' or only one char
        int i = 0, j = 0;
        while(j < parr.length){
            if(j == parr.length - 1 || parr[j+1] != '*'){
                if(i >= sarr.length || (parr[j] != '.' && parr[j] != sarr[i])) return false;
                else{
                    i++;
                    j++;
                }
            }else{
                int m = i;
                while(i < sarr.length && (sarr[i] == parr[j] || parr[j] == '.')){
                    if(i == sarr.length-1 && j == parr.length-2)    return true;
                    i++;
                    if(i == sarr.length){
                        System.out.println(m);
                        int jj = j + 2;
                        while(jj < parr.length){
                            if(jj < parr.length-1 && parr[jj+1] == '*') jj += 2;
                            else if(parr[jj] == parr[j] || parr[jj] == '.') jj++;
                            else    return false;
                        }
                        return jj-j-2 <= sarr.length-m;
                    }
                }
             j+=2;
             } // else
        }
        return i == sarr.length;
    } // isMatch
}

但是只有402/445的test case能通过。当遇到’.*’ 时匹配的数量在不同corner case的情况并没有处理好然后就放弃了

Recursion

用discuss里的迭代方法就可以巧妙的避免’.*‘的各种情况讨论了。

public class Solution {
    public boolean isMatch(String s, String p) {
        char[] sarr = s.toCharArray();
        char[] parr = p.toCharArray();
        // empty match
        if(parr.length == 0)    return sarr.length == 0;
        // next char is '*' or only one char
        if(parr.length == 1 || parr[1] != '*'){
            if(sarr.length < 1 || (parr[0] != '.' && parr[0] != sarr[0]))   return false;
            else return isMatch(s.substring(1), p.substring(1));
        }
        // next char is '*'
        int i = 0;
        while(i < sarr.length && (sarr[i] == parr[0] || parr[0] == '.')){
            if(isMatch(s.substring(i), p.substring(2)))   return true;
            i++;
        }
        return isMatch(s.substring(i), p.substring(2)); 
    } // isMatch
}

但是操作String的速度应该是比较慢的。O(n^2), O(n)

这个问题也可以利用DP来解,当前字符能否匹配根据不同的匹配情况是和之前的字符匹配相关联的。O(n^2), O(n^2)

DP solution

public class Solution {
    public boolean isMatch(String s, String p) {
    // dp solution -> res[i][j] means s[0-i] matches p[0-j]
    char[] sarr = s.toCharArray();
    char[] parr = p.toCharArray();
    boolean[][] dp = new boolean[sarr.length+1][parr.length+1];
    dp[0][0] = true;
    for (int i = 0; i < p.length(); i++) {
        if (p.charAt(i) == '*' && dp[0][i-1]) {
            dp[0][i+1] = true;
        }
    }
    for (int i = 0 ; i < sarr.length; i++) {
        for (int j = 0; j < parr.length; j++) {
            if (parr[j] == '.' || parr[j] == sarr[i]) {
                dp[i+1][j+1] = dp[i][j];
            }else if (parr[j] == '*') {
                if (parr[j-1] != sarr[i] && parr[j-1] != '.') {
                    dp[i+1][j+1] = dp[i+1][j-1];
                } else {
                    dp[i+1][j+1] = (dp[i+1][j] || dp[i][j+1] || dp[i+1][j-1]);
                }
            }
        }
    }
    return dp[sarr.length][parr.length];
    } // isMatch
}

也可以减少对空间的使用。

DP O(n) space

public boolean isMatch(String s, String p) {
		/**
		 * This solution is assuming s has no regular expressions.
		 * 
		 * dp: res[i][j]=is s[0,...,i-1] matched with p[0,...,j-1];
		 * 
		 * If p[j-1]!='*', res[i][j] = res[i-1][j-1] &&
		 * (s[i-1]==p[j-1]||p[j-1]=='.'). Otherwise, res[i][j] is true if
		 * res[i][j-1] or res[i][j-2] or
		 * res[i-1][j]&&(s[i-1]==p[j-2]||p[j-2]=='.'), and notice the third
		 * 'or' case includes the first 'or'.
		 * 
		 * 
		 * Boundaries: res[0][0]=true;//s=p="". res[i][0]=false, i>0.
		 * res[0][j]=is p[0,...,j-1] empty, j>0, and so res[0][1]=false,
		 * res[0][j]=p[j-1]=='*'&&res[0][j-2].
		 * 
		 * O(n) space is enough to store a row of res.
		 */

		int m = s.length(), n = p.length();
		boolean[] res = new boolean[n + 1];
		res[0] = true;

		int i, j;
		for (j = 2; j <= n; j++)
			res[j] = res[j - 2] && p.charAt(j - 1) == '*';

		char pc, sc, tc;
		boolean pre, cur; // pre=res[i - 1][j - 1], cur=res[i-1][j]

		for (i = 1; i <= m; i++) {
			pre = res[0];
			res[0] = false;
			sc = s.charAt(i - 1);

			for (j = 1; j <= n; j++) {
				cur = res[j];
				pc = p.charAt(j - 1);
				if (pc != '*')
					res[j] = pre && (sc == pc || pc == '.');
				else {
					// pc == '*' then it has a preceding char, i.e. j>1
					tc = p.charAt(j - 2);
					res[j] = res[j - 2] || (res[j] && (sc == tc || tc == '.'));
				}
				pre = cur;
			}
		}

		return res[n];
	}

O(n^2), O(n)