A simple regular-expression parsing example (Udacity study notes)

Source: Internet | Published by: 艾默生网络能源 (Emerson Network Power) | Editor: 程序博客网 | Time: 2024/06/04 08:18

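I studied the regular-expression parser from Udacity's course. The program shows how to break a problem into smaller pieces, reuse functions, and build up the solution step by step. I'm writing it down here to reinforce what I learned.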

The simplified regular expressions contain only five special symbols:

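| Symbol | Example pattern | Matches |
|---|---|---|
| `*` | `a*` | `''`, `a`, `aa`, ... |
| `?` | `a?` | `''`, `a` |
| `.` | `.` | `a`, `b`, `c`, `1`, `2`, `3`, ... (any single character) |
| `^` | `^b` | `b`, `ba`, `bb`, ... (starts with b) |
| `$` | `a$` | `ba`, `bba`, ... (ends with a) |
| empty pattern | `''` | `''` |
| literal | `a` | `a` |
| literal sequence | `ba` | `ba` |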
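For these five symbols the table agrees with ordinary regular-expression semantics, so one way to double-check the rows is Python's standard `re` module. This is only a cross-check, not part of the Udacity code. It assumes whole-string matching (`re.fullmatch`) for the `*`, `?`, `.` and literal rows, and unanchored searching (`re.search`) for the `^` and `$` rows; the names `FULL_ROWS`, `ANCHOR_ROWS` and `check_table` are made up for illustration.

```python
import re

# Hypothetical encoding of the table: pattern -> strings it should match in full.
FULL_ROWS = {
    'a*': ['', 'a', 'aa', 'aaa'],
    'a?': ['', 'a'],
    '.': ['a', 'b', 'c', '1', '2', '3'],
    '': [''],
    'a': ['a'],
    'ba': ['ba'],
}

# Anchor rows use search semantics: (strings that should match, strings that should not).
ANCHOR_ROWS = {
    '^b': (['b', 'ba', 'bb'], ['ab']),   # must start with b
    'a$': (['ba', 'bba'], ['ab']),       # must end with a
}


def check_table():
    """Return True if every row of the table agrees with Python's re module."""
    for pattern, strings in FULL_ROWS.items():
        for s in strings:
            if re.fullmatch(pattern, s) is None:
                return False
    for pattern, (hits, misses) in ANCHOR_ROWS.items():
        if any(re.search(pattern, s) is None for s in hits):
            return False
        if any(re.search(pattern, s) is not None for s in misses):
            return False
    return True


print(check_table())  # prints True
```

Note that `a?` allows at most one `a`, so `re.fullmatch('a?', 'aa')` returns `None`.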

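Problem: write two functions, `search` and `match`.

`search(pattern, text)` returns True if pattern matches anywhere in text.

`match(pattern, text)` returns True if pattern matches at the start of text.

The first step is to reduce the search problem to a match problem. Matching then proceeds through the pattern from left to right, one element at a time.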

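```python
def search(pattern, text):
    '''Return True if pattern appears anywhere in text.'''
    # search is built on top of match
    if pattern.startswith('^'):
        # pattern starts with ^: match the rest of the pattern at the start of text
        return match(pattern[1:], text)
    else:
        # no ^: a match may begin at any position, so prepend '.*', which can
        # consume any leading characters. This is equivalent to matching at any
        # position and turns search into a match problem, a neat trick.
        return match('.*' + pattern, text)
```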
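Why does prepending `.*` work? Because `.*` can swallow any prefix of the text, `match('.*' + pattern, text)` succeeds exactly when `match(pattern, text[i:])` succeeds for some start position `i`. The self-contained sketch below restates the matcher compactly and compares the `.*` trick with a hypothetical `search_loop` that tries every start position explicitly; `search_dotstar`, `search_loop` and `CASES` are illustrative names, not from the course.

```python
def match1(p, text):
    """True if text is non-empty and its first character matches p."""
    return bool(text) and (p == '.' or p == text[0])


def match_star(p, pattern, text):
    """Zero copies of p, or one copy followed by another try on the rest."""
    return (match(pattern, text) or
            (match1(p, text) and match_star(p, pattern, text[1:])))


def match(pattern, text):
    """Anchored matcher with the same logic as the article (*, ?, ., $)."""
    if pattern == '':
        return True
    if pattern == '$':
        return text == ''
    if len(pattern) > 1 and pattern[1] in '*?':
        p, op, pat = pattern[0], pattern[1], pattern[2:]
        if op == '*':
            return match_star(p, pat, text)
        return (match1(p, text) and match(pat, text[1:])) or match(pat, text)
    return match1(pattern[0], text) and match(pattern[1:], text[1:])


def search_dotstar(pattern, text):
    """The '.*' trick: unanchored search = anchored match of '.*' + pattern."""
    if pattern.startswith('^'):
        return match(pattern[1:], text)
    return match('.*' + pattern, text)


def search_loop(pattern, text):
    """Hypothetical explicit version: try an anchored match at every start."""
    if pattern.startswith('^'):
        return match(pattern[1:], text)
    # i runs up to len(text) inclusive so the empty suffix is tried too (needed for '$')
    return any(match(pattern, text[i:]) for i in range(len(text) + 1))


CASES = [('baa*!', 'Sheep said baaaa!'), ('def', 'abcdefg'), ('def$', 'abcdef'),
         ('def$', 'abcdefg'), ('^start', 'not the start'), ('t.p', 'stop'),
         ('t.p', 'tp')]

for pattern, text in CASES:
    print(pattern, repr(text),
          bool(search_dotstar(pattern, text)), bool(search_loop(pattern, text)))
```

The `.*` version hides this loop inside `match_star`, which also tries zero skipped characters first, then one, and so on.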

Since `search` already handles `^`, `match` only has to deal with the remaining four special symbols.

```python
def match(pattern, text):
    '''Return True if pattern appears at the start of text.'''
    if pattern == '':
        # an empty pattern always matches: every string starts with the empty string
        return True
    elif pattern == '$':
        # $ on its own matches only when text is empty
        return text == ''
    elif len(pattern) > 1 and pattern[1] in '*?':
        # pattern is not '' or '$', is longer than 1, and its second character is * or ?:
        # split it into p (the character), op (* or ?), and pat (the rest)
        p, op, pat = pattern[0], pattern[1], pattern[2:]
        if op == '*':
            # * matches zero or more copies of p
            return match_star(p, pat, text)
        elif op == '?':
            # ? matches zero or one copy of p
            if match1(p, text) and match(pat, text[1:]):
                # one copy: p matches the first character of text, and the rest
                # of the pattern matches the rest of text (recursion)
                return True
            else:
                # zero copies: match the rest of the pattern against the whole text
                return match(pat, text)
    else:
        # otherwise the first pattern character must match the first character of
        # text, and the rest of the pattern must match the rest of text (recursion again)
        return (match1(pattern[0], text) and
                match(pattern[1:], text[1:]))
```
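To see the recursion at work, the sketch below reruns the same `match` logic but records every `(pattern, text)` pair it is called with. `traced_match` is a hypothetical helper for illustration only; it nests the three functions so the call log stays local.

```python
def traced_match(pattern, text):
    """Hypothetical helper: run the match logic and record each call.

    Returns (result, calls), where calls lists every (pattern, text) pair
    that match() was invoked with, in order.
    """
    calls = []

    def match1(p, text):
        # first character of a non-empty text matches p ('.' matches anything)
        return bool(text) and (p == '.' or p == text[0])

    def match_star(p, pattern, text):
        # zero copies of p, or one copy followed by more copies
        return (match(pattern, text) or
                (match1(p, text) and match_star(p, pattern, text[1:])))

    def match(pattern, text):
        calls.append((pattern, text))          # record this recursive call
        if pattern == '':
            return True
        if pattern == '$':
            return text == ''
        if len(pattern) > 1 and pattern[1] in '*?':
            p, op, pat = pattern[0], pattern[1], pattern[2:]
            if op == '*':
                return match_star(p, pat, text)
            # op == '?': try one copy of p first, otherwise zero copies
            return (match1(p, text) and match(pat, text[1:])) or match(pat, text)
        return match1(pattern[0], text) and match(pattern[1:], text[1:])

    return match(pattern, text), calls


result, calls = traced_match('text?', 'tex')
print(result)            # prints True
for call in calls:
    print(call)
# ('text?', 'tex')
# ('ext?', 'ex')
# ('xt?', 'x')
# ('t?', '')
# ('', '')
```

The last two calls show the `?` branch: `t` cannot match the empty text, so the zero-copy alternative `match('', '')` is tried and succeeds.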

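```python
def match1(p, text):
    '''Return True if the first character of text matches pattern character p.'''
    if not text:
        # empty text (or None) has no first character, so nothing can match
        return False
    # '.' matches any character; otherwise p must equal text[0].
    # These are the only two ways the first character can match.
    return p == '.' or p == text[0]
```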

```python
def match_star(p, pattern, text):
    '''Return True if any number of char p, followed by pattern, matches text.'''
    return (match(pattern, text) or              # zero copies of p: match the rest of the pattern directly
            (match1(p, text) and                 # or p matches the first character of text, and the
             match_star(p, pattern, text[1:])))  # remaining text is matched again under the * rule
```
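One practical limit of the recursive `match_star`: every character consumed by `*` adds a stack frame, so under CPython's default recursion limit (usually 1000) a call such as `search('x', 'a' * 5000)` raises `RecursionError`. Below is a sketch of a variant that is not part of the Udacity code: it replaces that recursion with a loop over the number of copies of `p`. The other functions are restated so the block runs on its own.

```python
def match1(p, text):
    """True if text is non-empty and its first character matches p."""
    return bool(text) and (p == '.' or p == text[0])


def match_star(p, pattern, text):
    """Iterative variant: try 0, 1, 2, ... leading copies of p, then the rest."""
    i = 0
    while True:
        if match(pattern, text[i:]):   # rest of the pattern matches after i copies of p
            return True
        if not match1(p, text[i:]):    # no further copy of p can be consumed
            return False
        i += 1


def match(pattern, text):
    """Anchored match, same logic as in the article."""
    if pattern == '':
        return True
    if pattern == '$':
        return text == ''
    if len(pattern) > 1 and pattern[1] in '*?':
        p, op, pat = pattern[0], pattern[1], pattern[2:]
        if op == '*':
            return match_star(p, pat, text)
        return (match1(p, text) and match(pat, text[1:])) or match(pat, text)
    return match1(pattern[0], text) and match(pattern[1:], text[1:])


def search(pattern, text):
    """Unanchored search via the '.*' prefix, as in the article."""
    if pattern.startswith('^'):
        return match(pattern[1:], text)
    return match('.*' + pattern, text)


print(search('x', 'a' * 5000))        # prints False, without a RecursionError
print(search('x', 'a' * 5000 + 'x'))  # prints True
```

The loop tries the same candidates in the same order as the recursion (zero copies first), so the results do not change; only the stack depth does. `match` itself still recurses once per pattern character, which is harmless for short patterns.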

```python
def test():
    assert search('baa*!', 'Sheep said baaaa!') == True
    assert search('baa*!', 'Sheep said baaaa numbug') == False
    assert match('baa*!', 'Sheep said baaaa!') == False
    assert match('baa*!', 'baaaaaaaaa! said the sheep') == True
    assert search('def', 'abcdefg') == True
    assert search('def$', 'abcdef') == True
    assert search('def$', 'abcdefg') == False
    assert search('^start', 'not the start') == False
    assert match('start', 'not the start') == False
    assert match('a*b*c*', 'just anything') == True
    assert match('x?', 'text') == True
    assert match('text?', 'text') == True
    assert match('text?', 'tex') == True
    def words(text): return text.split()
    assert all(match('aa*bb*cc*$', s)
               for s in words('abc aaabbccc aaaabcccc'))
    assert not any(match('aa*bb*cc*$', s)
                   for s in words('ac aaabbcccd aaaa-b-cccc'))
    assert all(search('^ab.*aca.*a$', s)
               for s in words('abracadabra abacaa about-acacia-fa'))
    assert all(search('t.p', s)
               for s in words('tip top tap atypical tepid stop'))
    assert not any(search('t.p', s)
                   for s in words('TYPE teepee tp'))
    return 'test passes'

print(test())  # prints: test passes
```
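Hand-written asserts only cover the cases someone thought of. A complementary check, sketched here under my own assumptions, is a randomized differential test: generate small patterns in the mini-language, run `search`, and compare the answer with Python's `re.search`. The comparison is only meaningful because the hypothetical generator `random_pattern` keeps `^` at the start and `$` at the end and never puts `*` or `?` right after another operator, so every generated pattern means the same thing to both matchers.

```python
import random
import re


def match1(p, text):
    # first character of a non-empty text matches p ('.' matches anything)
    return bool(text) and (p == '.' or p == text[0])


def match_star(p, pattern, text):
    # zero copies of p, or one copy followed by more copies
    return (match(pattern, text) or
            (match1(p, text) and match_star(p, pattern, text[1:])))


def match(pattern, text):
    # anchored match, same logic as the article
    if pattern == '':
        return True
    if pattern == '$':
        return text == ''
    if len(pattern) > 1 and pattern[1] in '*?':
        p, op, pat = pattern[0], pattern[1], pattern[2:]
        if op == '*':
            return match_star(p, pat, text)
        return (match1(p, text) and match(pat, text[1:])) or match(pat, text)
    return match1(pattern[0], text) and match(pattern[1:], text[1:])


def search(pattern, text):
    # unanchored search via the '.*' prefix
    if pattern.startswith('^'):
        return match(pattern[1:], text)
    return match('.*' + pattern, text)


def random_pattern(rng):
    """Hypothetical generator: 0-4 atoms from 'a', 'b', '.', each optionally followed by * or ?."""
    atoms = [rng.choice('ab.') + rng.choice(['', '', '*', '?'])
             for _ in range(rng.randint(0, 4))]
    prefix = '^' if rng.random() < 0.3 else ''
    suffix = '$' if rng.random() < 0.3 else ''
    return prefix + ''.join(atoms) + suffix


def random_text(rng):
    """Random text of length 0-6 over 'abc' ('c' never appears as a literal in patterns)."""
    return ''.join(rng.choice('abc') for _ in range(rng.randint(0, 6)))


def differential_test(trials=2000, seed=0):
    rng = random.Random(seed)              # fixed seed so any failure is reproducible
    for _ in range(trials):
        pattern, text = random_pattern(rng), random_text(rng)
        expected = re.search(pattern, text) is not None
        actual = bool(search(pattern, text))
        assert actual == expected, (pattern, text, actual, expected)
    return 'differential test passes'


print(differential_test())
```

If an assertion ever fails, the message contains the exact pattern and text, which can be turned into a new hand-written case for `test()`.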
