Regular Expressions

Source: Internet. Editor: 程序博客网. Time: 2024/06/05 14:58

References:

(1) https://docs.python.org/2/library/re.html
(2) http://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html

Note: 'ex' means example.

I: Getting Started

import re
re.findall(r'\d{5}',"i love data 10000 year")
['10000']
pattern=re.compile(r'\d{11}')
pattern.findall("my mobile number is 15967110036")
['15967110036']
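
As a small sketch of the same idea, a compiled pattern can be reused across several input strings. The helper name `find_mobile_numbers` and the sample strings are made up for illustration, and the code is written so that it should also run under Python 3.

```python
import re

# Compile once, reuse many times; the raw string keeps \d intact.
MOBILE = re.compile(r'\d{11}')

def find_mobile_numbers(texts):
    """Return the 11-digit runs found in each input string (hypothetical helper)."""
    return [MOBILE.findall(t) for t in texts]

results = find_mobile_numbers([
    "my mobile number is 15967110036",
    "no digits here",
])
print(results)  # [['15967110036'], []]
```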

II: Character Classes

"." matches any character except a newline by default; in DOTALL mode it also matches a newline
ex:
re.findall('.','\n')
[]
re.findall('.','\n',re.DOTALL)
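['\n']

"\d" digit, [0-9]
"\D" non-digit, [^0-9]
"\s" whitespace, [ \t\n\r\f\v] (remember: \r carriage return, \f form feed, \v vertical tab)
"\S" non-whitespace, [^ \t\n\r\f\v]
"\w" word character, [a-zA-Z0-9_]
"\W" non-word character, [^a-zA-Z0-9_]
"[]" custom character set, e.g. [1-4abc]; items are not separated by commas, because a comma inside the brackets is matched literally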

ex:
re.findall(r'\w{1,5}',"i love data 10000 years")
['i', 'love', 'data', '10000', 'years']
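
The classes above can be compared side by side on one string. This is only an illustrative sketch; the sample text is invented, and it is pure ASCII so that the results do not depend on Unicode handling.

```python
import re

text = "user_01 paid $25.50\ton 2024-06-05"

digits = re.findall(r'\d+', text)      # runs of digits
words = re.findall(r'\w+', text)       # letters, digits and underscore
non_space = re.findall(r'\S+', text)   # runs of non-whitespace characters
# Custom set: digits 1-4 plus a, b, c. The commas in the input are not matched.
custom = re.findall(r'[1-4abc]', "a1,b5,c4")

print(digits)     # ['01', '25', '50', '2024', '06', '05']
print(words)      # ['user_01', 'paid', '25', '50', 'on', '2024', '06', '05']
print(non_space)  # ['user_01', 'paid', '$25.50', 'on', '2024-06-05']
print(custom)     # ['a', '1', 'b', 'c', '4']
```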

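III: Quantifiers

"*" 0 or more repetitions
"+" 1 or more repetitions
"?" 0 or 1 repetition
{m} exactly m repetitions
{m,n} from m to n repetitions

Non-greedy versions, which match as few characters as possible: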
"*?"
"+?"
"??"

ex:
re.findall(r'&.*&',"&i& love data 10000 &years&")
['&i& love data 10000 &years&']
re.findall(r'&.*?&',"&i& love data 10000 &years&")
['&i&', '&years&']
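
The remaining quantifiers can be sketched the same way. The sample strings are invented, and `{m,n}?` is the lazy counterpart of `{m,n}`, which the list above does not mention explicitly.

```python
import re

# '?' makes the preceding item optional: matches "color" and "colour".
optional = re.findall(r'colou?r', "color colour colouur")
# {m,n} is greedy: it takes as many repetitions as it can, up to n.
greedy = re.findall(r'\d{2,4}', "1 12 123 12345")
# {m,n}? is lazy: it takes as few repetitions as it can, here always 2.
lazy = re.findall(r'\d{2,4}?', "1 12 123 12345")

print(optional)  # ['color', 'colour']
print(greedy)    # ['12', '123', '1234']
print(lazy)      # ['12', '12', '12', '34']
```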

IV: Anchors and Boundaries

"^" Matches the start of the string, and in MULTILINE mode also matches immediately after each newline.
"\A" Matches only at the start of the string.
"$" Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline.
"\Z" Matches only at the end of the string.
"\b" Matches a word boundary
"\B" Matches a non-word boundary

Difference between \b and \A:
ex:
re.findall(r'\bdata','.datailovc')
['data']
re.findall(r'\Adata','.datailovc')
[]
re.findall(r'data\b','.ilovcdata')
['data']

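Difference between '^' and '\A', and between '$' and '\Z'

Outside MULTILINE mode, '^' and '\A' behave the same. '$' and '\Z' differ only when the string ends with a newline: '$' also matches just before that final newline, while '\Z' matches only at the very end.
ex: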
re.findall(r'\Aa','abc')
['a']
re.findall(r'^a','abc')
['a']

re.findall(r'^a','abc\nabc',re.M)
['a', 'a']
re.findall(r'\Aa','abc\nabc',re.M)
['a']
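
The examples above cover '^' and '\A'. A minimal sketch of the corresponding behaviour of '$' and '\Z' follows; the sample strings are invented for illustration.

```python
import re

text = "abc\n"  # note the trailing newline

dollar = re.findall(r'c$', text)     # '$' also matches just before a trailing newline
end_only = re.findall(r'c\Z', text)  # '\Z' matches only at the very end of the string

# In MULTILINE mode, '^' and '$' anchor at every line; without it they do not.
per_line = re.findall(r'^\w+$', "ab\ncd", re.M)
whole = re.findall(r'^\w+$', "ab\ncd")

print(dollar)    # ['c']
print(end_only)  # []
print(per_line)  # ['ab', 'cd']
print(whole)     # []
```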

V: Groups and Extension Syntax

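() capturing group
(?iLmsux) inline flags
(?:...) non-capturing group
(?P<name>...) named group
(?P=name) backreference to a named group
(?#...) comment
(?=...) positive lookahead
(?!...) negative lookahead
(?<=...) positive lookbehind
(?<!...) negative lookbehind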

Difference between () and (?:):
ex:
re.search(r'(my) mobile number is (\d{11})','my mobile number is 15967110036').group(0)
'my mobile number is 15967110036'
re.search(r'(my) mobile number is (\d{11})','my mobile number is 15967110036').group(1)
'my'
re.search(r'(my) mobile number is (\d{11})','my mobile number is 15967110036').group(2)
'15967110036'

re.search(r'(?:my) mobile number is (\d{11})','my mobile number is 15967110036').group(1)
'15967110036'

(?#...): a comment; its content is ignored
re.search(r'my(?#this is comment) mobile number is (\d{11})','my mobile number is 15967110036').group(0)
'my mobile number is 15967110036'

(?iLmsux): inline flags; (?i) below turns on case-insensitive matching
re.search(r"(?i)L{3}123","lll123").group(0)
'lll123'

(?P<name>...)
re.search(r'(?P<wode>my) mobile number is (\d{11})','my mobile number is 15967110036').group(1)
'my'
re.search(r'(?P<wode>my) mobile number is (\d{11})','my mobile number is 15967110036').group("wode")
'my'

(?P=name)
ex:
re.search(r"(?P<fuhao>\d{3})daf(?P=fuhao)","123daf123").group(0)
'123daf123'
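
A common use of a named backreference is spotting a word that is repeated twice in a row. The following is a sketch under that assumption; the group name `word` and the sample sentence are invented.

```python
import re

# (?P<word>\w+) captures a word; (?P=word) requires the same text again.
repeated = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b')

m = repeated.search("this is is a test")
print(m.group(0))       # 'is is'
print(m.group('word'))  # 'is'

# No word is doubled here, so search returns None.
none_found = repeated.search("this is a test")
```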

(?!...)
ex:
re.findall(r'1(?!2)\d{10}',"my accountMobile is 15967110036")
['15967110036']
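
The positive lookahead (?=...) has no example here, so a minimal sketch of it next to the negative form may help. The price string and variable names are made up for illustration.

```python
import re

text = "price: 100USD, 200EUR, 300USD"

# (?=USD) requires "USD" to follow the digits, but does not consume it.
usd_amounts = re.findall(r'\d+(?=USD)', text)
# (?!\d|USD) rejects positions followed by another digit or by "USD".
# The \d alternative stops the engine from backtracking to a shorter number.
non_usd = re.findall(r'\b\d+(?!\d|USD)', text)

print(usd_amounts)  # ['100', '300']
print(non_usd)      # ['200']
```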

(?<=...)
ex:
re.findall(r'(?<=\s)\d{11}',"my accountMobile is 15967110036")
['15967110036']

(?<!...)
ex:
re.findall(r'(?<!\s)\d{11}',"my accountMobile is 15967110036")
[]

(?(id/name)yes-pattern|no-pattern)
ex:
print(re.search(r'(\d{2})abc(?(1)\d|abc)',"12abc3").group(0))
12abc3
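
The conditional is easier to see when the referenced group is optional. The sketch below assumes a made-up rule: a three-digit code may be wrapped in parentheses, but only if both parentheses are present. The no-pattern branch is omitted, which makes it match the empty string.

```python
import re

# Group 1 is an optional "(". If it matched, a closing ")" is required.
code = re.compile(r'^(\()?\d{3}(?(1)\))$')

accepted = [s for s in ["(123)", "123", "(123", "123)"] if code.match(s)]
print(accepted)  # ['(123)', '123']
```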

VI: Flags

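re.IGNORECASE: ignore case
re.LOCALE: make \w, \W, \b, \B, \s and \S dependent on the current locale (see /usr/share/i18n/locales)
re.MULTILINE: multi-line mode
re.DOTALL: make "." match newlines as well
re.UNICODE: make \w, \W, \b, \B, \d, \D, \s and \S dependent on the Unicode character properties database (Unicode is a character encoding scheme, defined by an international organization, that can hold all the world's scripts and symbols)
re.VERBOSE: allow whitespace and comments inside the pattern; the two patterns below are equivalent
a = re.compile(r"""\d + # the integral part
\. # the decimal point
\d * # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")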
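
A quick way to check that a verbose pattern and its compact form agree is to run both on the same input. This is a sketch with an invented sample string; it also shows two flags combined with `|`.

```python
import re

verbose = re.compile(r"""\d +  # the integral part
                         \.    # the decimal point
                         \d *  # some fractional digits""", re.X)
compact = re.compile(r"\d+\.\d*")

sample = "pi is about 3.14159 and e is about 2.718"
found = verbose.findall(sample)
same = found == compact.findall(sample)
print(found, same)  # ['3.14159', '2.718'] True

# Flags can be combined: ignore case and anchor '^' at every line.
starts = re.findall(r'^data', "Data\ndata", re.I | re.M)
print(starts)  # ['Data', 'data']
```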

VII: Functions and Examples

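1: Difference between search and match
match only checks whether the pattern matches at the beginning of the string; search has no restriction on position
For example:
if re.match('b','abc'): print(0)
...
if re.search('b','abc'): print(0)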
...
0
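
The same difference can be seen by inspecting the return values directly: both functions return None on failure and a match object on success. A minimal sketch, with variable names chosen only for illustration:

```python
import re

m1 = re.match(r'b', 'abc')   # None: 'b' is not at position 0
m2 = re.search(r'b', 'abc')  # match object for the 'b' at index 1
m3 = re.match(r'a', 'abc')   # match object: 'a' is at position 0

print(m1)          # None
print(m2.span())   # (1, 2)
print(m3.group())  # 'a'
```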

2: Using re.split()
re.split(r'\W+','Words,words,words')
['Words', 'words', 'words']
re.split(r'(\W+)','Words,words,words')
['Words', ',', 'words', ',', 'words']
re.split(r'(\W+)','Words,words,words',maxsplit=1)
['Words', ',', 'words,words']
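
A few further cases, sketched with invented inputs: splitting on several separators at once, the empty string produced by a leading separator, and maxsplit without a capturing group.

```python
import re

# A character set lets one pattern cover several separators at once.
fields = re.split(r'[,;\s]+', "a, b;c  d")
# A separator at the very start yields an empty first element.
leading = re.split(r'\W+', ",a,b")
# Without the capturing group, the separator itself is not returned.
first_only = re.split(r'\W+', "Words,words,words", maxsplit=1)

print(fields)      # ['a', 'b', 'c', 'd']
print(leading)     # ['', 'a', 'b']
print(first_only)  # ['Words', 'words,words']
```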

3: Using re.sub and re.subn

re.sub(r"abc","123","abcabc")
'123123'
re.sub(r"a","123","abcabc")
'123bc123bc'

re.subn(r"a","123","abcabc")
('123bc123bc', 2)
re.subn(r"a","123","aacabc")
('123123c123bc', 3)
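
As a cross-check, the count returned by subn should equal the number of matches findall reports for the same pattern. The sketch below also uses the optional `count` argument of re.sub to limit the number of replacements; the variable names are illustrative only.

```python
import re

text = "aacabc"

replaced, n = re.subn(r'a', '123', text)
matches = len(re.findall(r'a', text))
print(replaced, n, matches)  # 123123c123bc 3 3

# count=1 replaces only the first occurrence.
limited = re.sub(r'a', '123', text, count=1)
print(limited)  # 123acabc
```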

0 0