Regular Expressions
Source: Internet. Edited by: 程序博客网. Date: 2024/06/05 14:58
reference:
(1) https://docs.python.org/2/library/re.html
(2) http://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html
Note:'ex' means example
I: Getting started
import re
re.findall(r'\d{5}',"i love data 10000 year")
['10000']
pattern=re.compile(r'\d{11}')
pattern.findall("my mobile number is 15967110036")
['15967110036']
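A compiled pattern is worth keeping around when the same expression is applied to many inputs. A minimal sketch, assuming a few made-up sample strings (the `texts` list and the `mobile` name are illustrative only):

```python
import re

# Compile the 11-digit pattern once and reuse it for every input.
mobile = re.compile(r'\d{11}')

# Hypothetical sample inputs, not taken from the original notes.
texts = [
    "call 15967110036 now",
    "no number here",
    "two: 13800000000 and 13900000000",
]

# findall returns a (possibly empty) list of matches for each string.
found = [mobile.findall(t) for t in texts]
```

Here `found` holds one list per input string, and a string without an 11-digit run yields an empty list.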
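II: Character classes
"." matches any character except a newline (default); in DOTALL mode it also matches a newline
ex:
re.findall('.','\n')
[]
re.findall('.','\n',re.DOTALL)
['\n']
"\d" digit [0-9]
"\D" non-digit [^0-9]
"\s" whitespace [ \t\n\r\f\v]. Remember: \r is carriage return, \f is form feed, \v is vertical tab
"\S" non-whitespace [^ \t\n\r\f\v]
"\w" word character [a-zA-Z0-9_]
"\W" non-word character [^a-zA-Z0-9_]
"[]" custom character set, e.g. [1-4abc]; characters are listed without separators, so a comma inside the brackets is matched literally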
ex:
re.findall(r'\w{1,5}',"i love data 10000 years")
['i', 'love', 'data', '10000', 'years']
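The character classes can be checked against a small input. This is only a sketch; the `sample` string is an assumption chosen to include an underscore, a hyphen, digits, and a tab:

```python
import re

# Hypothetical input mixing word characters, punctuation, digits and a tab.
sample = "user_name-42\tok"

# \w treats the underscore as a word character, so "user_name" stays whole.
word_runs = re.findall(r'\w+', sample)

# \D matches anything that is not a digit, including "-" and the tab.
non_digits = re.findall(r'\D+', sample)

# A custom set: the commas in the input are not in the set, so they are skipped.
custom = re.findall(r'[1-4abc]', "a,b,5,2")
```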
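III: Quantifiers
"*" 0 or more times
"+" 1 or more times
"?" 0 or 1 time
{m} exactly m times
{m,n} between m and n times (inclusive)
Non-greedy variants (match as few characters as possible)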
"*?"
"+?"
"??"
ex:
re.findall(r'&.*&',"&i& love data 10000 &years&")
['&i& love data 10000 &years&']
re.findall(r'&.*?&',"&i& love data 10000 &years&")
['&i&', '&years&']
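The same greedy versus non-greedy contrast shows up with "+", and "?" and {m,n} can be tried the same way. A small sketch with made-up inputs:

```python
import re

html = "<b>bold</b><i>italic</i>"

# Greedy: ".+" runs to the last ">" in the string, giving one long match.
greedy = re.findall(r'<.+>', html)

# Non-greedy: ".+?" stops at the first ">" it can, giving one match per tag.
lazy = re.findall(r'<.+?>', html)

# "?" makes the preceding "u" optional (0 or 1 occurrence).
optional = re.findall(r'colou?r', "color colour colouur")

# {2,3} matches runs of two or three digits; "4444" yields only its first three.
bounded = re.findall(r'\d{2,3}', "1 22 333 4444")
```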
IV: Anchors and boundaries
"^" Matches the start of the string, and in MULTILINE mode also matches immediately after each newline.
"\A" Matches only at the start of the string.
"$" Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline.
"\Z" Matches only at the end of the string.
"\b" matches a word boundary
"\B" matches a non-word boundary
Difference between \b and \A:
ex:
re.findall(r'\bdata','.datailovc')
['data']
re.findall(r'\Adata','.datailovc')
[]
re.findall(r'data\b','.ilovcdata')
['data']
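Difference between '^' and '\A', and between '$' and '\Z':
Outside MULTILINE mode, '^' and '\A' behave the same. '$' and '\Z' are almost the same, except that '$' also matches just before a newline at the very end of the string, while '\Z' matches only at the very end.
ex: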
re.findall(r'\Aa','abc')
['a']
re.findall(r'^a','abc')
['a']
re.findall(r'^a','abc\nabc',re.M)
['a', 'a']
re.findall(r'\Aa','abc\nabc',re.M)
['a']
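The end-of-string anchors can be compared in the same way. A short sketch, assuming an input that ends with a newline:

```python
import re

text = "abc\n"

# "$" also matches just before a trailing newline, so "c" is found.
dollar = re.findall(r'c$', text)

# "\Z" matches only at the very end of the string, which here follows "\n".
z_anchor = re.findall(r'c\Z', text)

# In MULTILINE mode "$" matches before every newline and at the end.
multi = re.findall(r'c$', "abc\nabc", re.M)
```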
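V: Groups and extension syntax
() capturing group
(?iLmsux) inline flags
(?:...) non-capturing group
(?P<name>...) named group
(?P=name) backreference to a named group
(?#...) comment
(?=...) positive lookahead
(?!...) negative lookahead
(?<=...) positive lookbehind
(?<!...) negative lookbehind
Difference between () and (?:):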
ex:
re.search(r'(my) mobile number is (\d{11})','my mobile number is 15967110036').group(0)
'my mobile number is 15967110036'
re.search(r'(my) mobile number is (\d{11})','my mobile number is 15967110036').group(1)
'my'
re.search(r'(my) mobile number is (\d{11})','my mobile number is 15967110036').group(2)
'15967110036'
re.search(r'(?:my) mobile number is (\d{11})','my mobile number is 15967110036').group(1)
'15967110036'
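The effect of (?:...) is easiest to see through groups(), which returns all captured groups at once. A minimal sketch reusing the sentence from the examples above; the `key=value` input for findall is an assumption added for illustration:

```python
import re

text = 'my mobile number is 15967110036'

# Two capturing groups: both show up in groups().
capturing = re.search(r'(my) mobile number is (\d{11})', text).groups()

# (?:my) still has to match, but it is not captured or numbered.
non_capturing = re.search(r'(?:my) mobile number is (\d{11})', text).groups()

# With more than one capturing group, findall returns a list of tuples.
pairs = re.findall(r'(\w)=(\d)', 'a=1 b=2')
```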
(?#):
re.search(r'my(?#this is comment) mobile number is (\d{11})','my mobile number is 15967110036').group(0)
'my mobile number is 15967110036'
(?iLmsux)
re.search(r"(?i)L{3}123","lll123").group(0)
'lll123'
(?P<name>...)
re.search(r'(?P<wode>my) mobile number is (\d{11})','my mobile number is 15967110036').group(1)
'my'
re.search(r'(?P<wode>my) mobile number is (\d{11})','my mobile number is 15967110036').group("wode")
'my'
(?P=name)
ex:
re.search(r"(?P<fuhao>\d{3})daf(?P=fuhao)","123daf123").group(0)
'123daf123'
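A common use of (?P=name) is spotting an accidentally doubled word. The sketch below is one possible way to do it; the pattern name `doubled` and the sample sentence are assumptions:

```python
import re

# A word, some whitespace, then the same word again (via the named backreference).
doubled = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b')

sentence = "this is is a test"
m = doubled.search(sentence)
repeated = m.group('word')   # the word that was doubled
whole = m.group(0)           # the full doubled span

# In a replacement string, \g<word> refers to the named group.
fixed = doubled.sub(r'\g<word>', sentence)
```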
(?!...)
ex:
re.findall(r'1(?!2)\d{10}',"my accountMobile is 15967110036")
['15967110036']
(?<=...)
ex
re.findall(r'(?<=\s)\d{11}',"my accountMobile is 15967110036")
['15967110036']
(?<!...)
ex
re.findall(r'(?<!\s)\d{11}',"my accountMobile is 15967110036")
[]
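The positive lookahead (?=...) from the list above has no example of its own, so here is a small sketch next to a lookbehind. Both inputs are made up; the point is that the lookaround text is checked but not included in the match:

```python
import re

# Digits that are immediately followed by "kg"; the unit is not part of the match.
weights = re.findall(r'\d+(?=kg)', "10kg 20lb 30kg")

# Digits that are immediately preceded by "$"; the sign is not part of the match.
prices = re.findall(r'(?<=\$)\d+', "cost $15, qty 3")
```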
(?(id/name)yes-pattern|no-pattern)
ex:
print(re.search(r'(\d{2})abc(?(1)\d|abc)',"12abc3").group(0))
12abc3
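In the example above group 1 always participates in the match, so only the yes-pattern is ever used. A conditional becomes more interesting when the group is optional. The sketch below uses a pattern along the lines of the one in the Python re documentation (an address optionally wrapped in angle brackets); the sample addresses are assumptions:

```python
import re

# If group 1 (an opening "<") matched, require a closing ">"; otherwise require end of string.
cond = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')

candidates = ['<user@host.com>', 'user@host.com', '<user@host.com', 'user@host.com>']

# Map each candidate to whether the pattern matches from the start of the string.
results = {s: bool(cond.match(s)) for s in candidates}
```

Only the balanced forms match: with both brackets, or with none.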
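VI: Flags
re.IGNORECASE # ignore case
re.LOCALE #/usr/share/i18n/locales
Make \w, \W, \b, \B, \s and \S dependent on the current locale.
re.MULTILINE: multi-line mode ("^" and "$" also match at line boundaries)
re.DOTALL: makes "." match a newline as well
re.UNICODE: Unicode is a character encoding scheme, defined by an international organization, that can hold all of the world's characters and symbols (Make \w, \W, \b, \B, \d, \D, \s and \S dependent on the Unicode character properties database)
re.VERBOSE: whitespace and # comments inside the pattern are ignored, so a and b below are equivalent
a = re.compile(r"""\d + # the integral part
\. # the decimal point
\d * # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")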
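A quick way to convince yourself that the verbose and compact forms agree is to run both over the same inputs. This is a sketch with assumed sample strings; it also shows why the decimal point should be escaped as \. (an unescaped "." would accept any character there):

```python
import re

verbose = re.compile(r"""\d +  # the integral part
                         \.    # the decimal point (escaped: a literal dot)
                         \d *  # some fractional digits""", re.X)
compact = re.compile(r"\d+\.\d*")

# Hypothetical inputs: two decimals, one word, one bare integer.
samples = ["3.14", "42.", "abc", "7"]
same = all(bool(verbose.match(s)) == bool(compact.match(s)) for s in samples)

# With an unescaped dot, "7x" is accepted because "." matches the "x".
loose_accepts = re.match(r"\d+.\d*", "7x") is not None
strict_accepts = compact.match("7x") is not None
```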
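VII: Functions and examples
1: Difference between search and match
match only checks for a match at the beginning of the string; search scans the whole string for the first position where the pattern matches
For example:
if re.match('b','abc'): print(0)
...
if re.search('b','abc'): print(0)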
...
0
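The return values make the difference explicit: match gives None when the pattern does not match at position 0, while search gives a match object carrying the position it found. A minimal sketch:

```python
import re

# match anchors at the start of the string, so "b" is not found in "abc".
at_start = re.match(r'b', 'abc')

# search scans forward and finds "b" at index 1.
anywhere = re.search(r'b', 'abc')
position = anywhere.start()

# Anchoring a search with \A makes it behave like match here.
anchored = re.search(r'\Ab', 'abc')
```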
2: Using re.split()
re.split(r'\W+','Words,words,words')
['Words', 'words', 'words']
re.split(r'(\W+)','Words,words,words')
['Words', ',', 'words', ',', 'words']
re.split(r'(\W+)','Words,words,words',maxsplit=1)
['Words', ',', 'words,words']
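As shown above, wrapping the separator in a capturing group keeps the separators in the result. Two more behaviours are worth knowing; the inputs in this sketch are assumptions:

```python
import re

# A character class lets several different separators be handled in one call.
fields = re.split(r'[,;\s]+', "a, b;c  d")

# A separator at the very start produces an empty string as the first item.
leading = re.split(r',', ",a,b")
```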
3: Using re.sub and re.subn
re.sub(r"abc","123","abcabc")
'123123'
re.sub(r"a","123","abcabc")
'123bc123bc'
re.subn(r"a","123","abcabc")
('123bc123bc', 2)
re.subn(r"a","123","aacabc")
('123123c123bc', 3)
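re.sub also accepts group references in the replacement, a function as the replacement, and a count limit. A small sketch; the helper name `double` and the sample strings are hypothetical:

```python
import re

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r'(\w+)@(\w+)', r'\2@\1', "user@host")

# The replacement can be a function that receives the match object.
def double(match):
    """Return the matched number multiplied by two, as a string."""
    return str(int(match.group(0)) * 2)

doubled = re.sub(r'\d+', double, "3 apples, 10 pears")

# count limits how many occurrences are replaced.
limited = re.sub(r'a', '123', 'abcabc', count=1)
```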