Regular expressions in Python

Source: Internet; Published by: 禁用3g网络; Editor: 程序博客网; Time: 2024/06/10 19:42

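1. Introduction to re
Python's re module cannot cover every complex matching case, but in the vast majority of cases it is enough to analyze complex strings effectively and extract the relevant information from them. Python compiles a regular expression into bytecode, which is then executed by a matching engine written in C that performs depth-first (backtracking) matching.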

import re
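print(re.__doc__)
This prints the documentation of the re module, which describes its functions; the sections below illustrate them with examples.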

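2. Regular expression syntax in re
The regular expression syntax is summarized below, one entry per line (syntax: meaning; example):
"." : any character except a newline (unless re.S is set)
"^" : start of the string; '^hello' matches 'helloworld' but not 'aaaahellobbb'
"$" : end of the string; 'world$' matches 'helloworld' but not 'worldhello'
"*" : 0 or more repetitions of the preceding item (greedy); '<.*>' matches all of '<title>chinaunix</title>'
"+" : 1 or more repetitions of the preceding item (greedy); works like "*"
"?" : 0 or 1 repetition of the preceding item (greedy); works like "*"
"*?", "+?", "??" : non-greedy versions of the three above, which match as little as possible; '<.*?>' matches only '<title>' in '<title>chinaunix</title>'
"{m,n}" : m to n repetitions of the preceding item; "{m}" means exactly m; 'a{6}' matches six a's, 'a{2,4}' matches two to four a's
"{m,n}?" : m to n repetitions of the preceding item, as few as possible; in 'aaaaaa', 'a{2,4}?' matches only two a's
"\" : escapes a special character or introduces a special sequence; '\.' matches a literal dot
"[]" : a character set; [0-9], [a-z], [A-Z]; [^0] matches any character except 0
"|" : alternation; A|B matches either A or B
"(…)" : matches the expression inside the parentheses and captures it as a group
"(?#…)" : a comment; ignored
"(?=…)" : lookahead; matches if … matches next, but does not consume any of the string; 'hello(?=test)' matches 'hello' in 'hellotest'
"(?!…)" : negative lookahead; matches if … does not match next; 'hello(?!test)' matches 'hello' only when it is not followed by 'test'
"(?<=…)" : lookbehind; matches if the current position is preceded by … (which must be fixed-length); '(?<=hello)test' matches 'test' in 'hellotest'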
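The quantifier and lookaround rows are easiest to check by running them. The sketch below assumes Python 3; the sample strings are made up for illustration. It contrasts greedy and non-greedy quantifiers and exercises the three lookaround forms.

```python
import re

html = "<title>chinaunix</title>"

# Greedy: .* runs as far as it can, so the match spans both tags
greedy = re.search(r"<.*>", html).group()
# Non-greedy: .*? stops at the first '>' that lets the match succeed
lazy = re.search(r"<.*?>", html).group()

# {m,n} versus {m,n}? on the same input
most = re.match(r"a{2,4}", "aaaaaa").group()     # takes as many as allowed
fewest = re.match(r"a{2,4}?", "aaaaaa").group()  # takes as few as allowed

# Lookahead: 'hello' only if 'test' follows; 'test' itself is not consumed
ahead = re.search(r"hello(?=test)", "hellotest").group()
# Negative lookahead: 'hello' only if 'test' does NOT follow
not_ahead = re.search(r"hello(?!test)", "helloworld").group()
blocked = re.search(r"hello(?!test)", "hellotest")  # no match, so None
# Lookbehind: 'test' only if 'hello' precedes it
behind = re.search(r"(?<=hello)test", "hellotest").group()

print(greedy, lazy, most, fewest, ahead, not_ahead, blocked, behind)
```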

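Special sequences are summarized below (sequence: meaning):
"\A" : matches only at the start of the string
"\Z" : matches only at the end of the string
"\b" : matches the empty string at the beginning or end of a word
"\B" : matches the empty string only when it is not at the beginning or end of a word
"\d" : equivalent to [0-9]
"\D" : equivalent to [^0-9]
"\s" : any whitespace character, [ \t\n\r\f\v]
"\S" : any non-whitespace character, [^ \t\n\r\f\v]
"\w" : any letter, digit, or underscore, [a-zA-Z0-9_]
"\W" : any character not matched by \w, [^a-zA-Z0-9_]
The bracketed ASCII equivalents hold for bytes patterns and under the re.ASCII flag; for str patterns in Python 3, \d, \s and \w (and their negations) also match the corresponding Unicode characters.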
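A small sketch of the special sequences in action, assuming Python 3 and made-up sample text. The last lines illustrate the Unicode note: U+0663 (ARABIC-INDIC DIGIT THREE) counts as a digit for \d in a str pattern unless re.ASCII is passed.

```python
import re

text = "Order 66 shipped on 2024-06-10 to user_42"

digits = re.findall(r"\d+", text)        # runs of digits
words = re.findall(r"\b\w+\b", text)     # whole words; note that \w includes '_'
parts = re.split(r"\s+", "a \t b\nc")    # split on any run of whitespace

# \A and \Z anchor to the very start and the very end of the string
starts = re.search(r"\AOrder", text) is not None
ends = re.search(r"user_42\Z", text) is not None

# \B: 'cat' that does NOT start at a word boundary (only the one inside 'concat')
inner = re.findall(r"\Bcat", "cat concat")

# In Python 3 str patterns \d is Unicode-aware; re.ASCII restricts it to [0-9]
arabic_three = "\u0663"
unicode_digit = re.fullmatch(r"\d", arabic_three) is not None
ascii_digit = re.fullmatch(r"\d", arabic_three, re.ASCII) is not None
```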

3. Main functions of re
The commonly used functions are compile, search, match, split, findall (finditer), and sub (subn).
compile
re.compile(pattern[, flags])
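Purpose: compiles a regular expression string into a regular expression object.
flags include:
re.I: ignore case
re.L: make the special sequences \w, \W, \b, \B, \s, \S depend on the current locale (Python 2 behavior; since Python 3.6, re.L can only be used with bytes patterns)
re.M: multiline mode; "^" and "$" also match at the start and end of each line, not only of the whole string
re.S: make "." match any character, including a newline (by default "." does not match a newline)
re.U: make \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character database (already the default for str patterns in Python 3)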
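A minimal sketch of the flags that most often change results (Python 3; sample strings are illustrative). Flags can be passed to the module-level functions or to compile(), and are combined with "|".

```python
import re

# re.I: case-insensitive matching
ci = re.findall(r"python", "Python PYTHON python", re.I)

# re.M: '^' matches at the start of every line, not just of the string
lines = re.findall(r"^\w+", "first line\nsecond line", re.M)

# Without re.S, '.' refuses to match the newline; with it, '.' spans lines
no_s = re.search(r"a.b", "a\nb")
with_s = re.search(r"a.b", "a\nb", re.S).group()

# Flags combine with '|', here compiled into a reusable pattern object
pat = re.compile(r"^hello.world$", re.I | re.S)
combined = pat.match("HELLO\nWORLD") is not None
```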
search
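re.search(pattern, string[, flags])
reobj.search(string[, pos[, endpos]])  (method of a compiled regex object)
Purpose: scans through the string for the first location where the pattern matches and returns a MatchObject instance (re.Match in Python 3); if no position in the string matches, it returns None.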
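A short sketch of search() and the pos/endpos arguments, assuming Python 3 and a made-up input string. Note that pos and endpos exist only on the compiled object's method, not on re.search itself.

```python
import re

pat = re.compile(r"\d+")
s = "abc 123 def 456"

m = pat.search(s)                     # first match anywhere in the string
first = (m.group(), m.start(), m.end())

# pos: start scanning at index 8, which skips past '123'
second = pat.search(s, 8).group()

# endpos: behave as if the string ended at index 6, so only '12' is visible
clipped = pat.search(s, 0, 6).group()

missing = re.search(r"xyz", s)        # no match anywhere, so None
```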

match
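re.match(pattern, string[, flags])
reobj.match(string[, pos[, endpos]])  (method of a compiled regex object)
Purpose: match() only tries to match the regex at the start of the string, i.e. it only reports a match that begins at position 0 (or at pos for the method form), whereas search() scans the whole string for a match. If you want to find a match anywhere in the string, use search().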
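The difference between match() and search() in a few lines (Python 3; the input is illustrative). With the method form, match() anchors at pos, but "^" still refers to the real start of the string, so the last call fails.

```python
import re

s = "say hello"
at_start = re.match(r"hello", s)     # None: 'hello' is not at index 0
anywhere = re.search(r"hello", s)    # found, starting at index 4

# The compiled method accepts pos, and match() then anchors at pos instead of 0
from_pos = re.compile(r"hello").match(s, 4)

# '^' means the real beginning of the string, not pos, so this fails
caret = re.compile(r"^hello").match(s, 4)
```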

Some examples:
1. Test whether the regex matches all or part of a string

regex = r""  # the regular expression
if re.search(regex, subject):
    do_something()
else:
    do_anotherthing()

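2. Test whether the regex matches the entire string

regex = r"\Z"  # put your pattern in front of \Z so the match must reach the end of the string
if re.match(regex, subject):
    do_something()
else:
    do_anotherthing()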
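A sketch of whole-string matching, assuming Python 3. The helper matches_entirely is hypothetical (not part of re); it wraps the pattern in (?:...) before appending \Z, because otherwise a top-level "|" would bind \Z to the last alternative only. Python 3.4+ also offers re.fullmatch for the same purpose. The last two lines show why \Z is stricter than "$".

```python
import re

def matches_entirely(pattern, subject):
    # Hypothetical helper: re.match anchors at the start, \Z at the very end.
    # The (?:...) group keeps a top-level '|' from applying only to \Z.
    return re.match(r"(?:" + pattern + r")\Z", subject) is not None

ok = matches_entirely(r"\d+", "12345")
bad = matches_entirely(r"\d+", "12345abc")
alt = matches_entirely(r"cat|dog", "catdog")   # neither alternative covers it all

# Built-in alternative in Python 3.4+
full = re.fullmatch(r"\d+", "12345") is not None

# '$' also matches just before a trailing newline; \Z does not
dollar = re.match(r"\d+$", "12345\n") is not None
strict = re.match(r"\d+\Z", "12345\n") is not None
```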

3. Create a match object and get the details of the match from it

regex = r""  # the regular expression
match = re.search(regex, subject)
if match:
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    do_something()
else:
    do_anotherthing()

4. Get the part of a string matched by the regex

regex = r""  # the regular expression
match = re.search(regex, subject)
if match:
    result = match.group()
else:
    result = ""

5. Get the part of a string matched by a capturing group

regex = r""  # the regular expression
match = re.search(regex, subject)
if match:
    result = match.group(1)
else:
    result = ""

6. Get the part of a string matched by a named group

regex = r""  # the regular expression
match = re.search(regex, subject)
if match:
    result = match.group("groupname")
else:
    result = ""
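A concrete version of examples 4 to 6, assuming Python 3 and an illustrative date pattern. (?P<name>...) defines a named group; the same group can be read by number or by name.

```python
import re

m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
              "due 2024-06-10 19:42")

whole = m.group()          # the entire match, same as m.group(0)
year = m.group(1)          # first capturing group, by number
month = m.group("month")   # a group by name
parts = m.groupdict()      # every named group as a dict
span = m.span("day")       # (start, end) of one group within the string
```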

7. Get a list of all regex matches in a string

result = re.findall(regex, subject)
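One detail worth knowing about findall(): the shape of its result depends on how many capturing groups the pattern has. A sketch assuming Python 3 and made-up input:

```python
import re

s = "a=1, b=22, c=333"

no_groups = re.findall(r"\w=\d+", s)          # no groups: the whole matches
one_group = re.findall(r"\w=(\d+)", s)        # one group: just that group
two_groups = re.findall(r"(\w)=(\d+)", s)     # several groups: tuples
non_capturing = re.findall(r"(?:\w)=\d+", s)  # (?:...) is not a capturing group
```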

8. Iterate over all matches in a string

for match in re.finditer(r"<(.*?)\s.*?/\1>", subject):
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    do_something()
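Running the tag pattern from example 8 on a sample string (Python 3; the HTML snippet is made up). Group 1 captures the tag name and \1 requires the same name in the closing tag. This is only a rough tag matcher, not an HTML parser: as written it needs whitespace after the tag name, so a bare <p>x</p> would not match.

```python
import re

subject = '<a href="x.html">link</a> and <b class="k">bold</b>'

found = []
for match in re.finditer(r"<(.*?)\s.*?/\1>", subject):
    # Record the tag name plus the start and (exclusive) end of each match
    found.append((match.group(1), match.start(), match.end()))

print(found)
```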

9. Create a regex object from a regex string, so the same regex can be used for many operations

reobj = re.compile(regex)
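A sketch of compiling once and reusing the object across many inputs (Python 3). The address pattern is a deliberately simplified illustration, not a real email validator. The re module also keeps a small internal cache of recently used patterns, so explicit compiling mainly helps readability and reuse.

```python
import re

# Simplified, illustrative address pattern: compiled once, reused below
email = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

lines = ["contact: a.b@example.com", "no address here", "x+y@mail.example.org"]

# Apply the same compiled object to every line and keep the matched text
hits = [m.group() for m in map(email.search, lines) if m]

source = email.pattern  # the original regex string is kept on the object
```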

10. Regex-object version of example 1 (test whether the regex matches all or part of a string)

reobj = re.compile(regex)
if reobj.search(subject):
    do_something()
else:
    do_anotherthing()

11. Regex-object version of example 2 (test whether the regex matches the entire string)

reobj = re.compile(r"\Z")  # put your pattern in front of \Z
if reobj.match(subject):
    do_something()
else:
    do_anotherthing()

12. Create a regex object and get the details of the match through it

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    do_something()
else:
    do_anotherthing()

13. Use a regex object to get the part of a string matched by the regex

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    result = match.group()
else:
    result = ""

14. Use a regex object to get the part of a string matched by a capturing group

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    result = match.group(1)
else:
    result = ""

15. Use a regex object to get the part of a string matched by a named group

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    result = match.group("groupname")
else:
    result = ""

16. Use a regex object to get a list of all regex matches in a string

reobj = re.compile(regex)
result = reobj.findall(subject)

17. Use a regex object to iterate over all matches in a string

reobj = re.compile(regex)
for match in reobj.finditer(subject):
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    do_something()

String replacement

1. Replace all matches

Replace every substring of subject that matches the regex regex with newstring:
result = re.sub(regex, newstring, subject)

2. Replace all matches (using a regex object)

reobj = re.compile(regex)
result = reobj.sub(newstring, subject)
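A few common variations on sub(), assuming Python 3 and illustrative inputs: backreferences in the replacement string, \g<name> for named groups, a function as the replacement, the count argument, and subn(), which also returns the number of replacements made.

```python
import re

s = "2024-06-10 and 2023-01-31"

# Backreferences reorder YYYY-MM-DD into DD/MM/YYYY
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", s)

# Named groups are referenced as \g<name> in the replacement
named = re.sub(r"(?P<y>\d{4})-\d{2}-\d{2}", r"year \g<y>", s)

# A function receives each match object and returns the replacement text
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1 b20")

# count limits how many replacements are made
first_only = re.sub(r"\d", "#", "a1b2c3", count=1)

# subn returns (new_string, number_of_replacements)
result, n = re.subn(r"\d", "#", "a1b2c3")
```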

String splitting

1. Split a string

result = re.split(regex, subject)

2. Split a string (using a regex object)

reobj = re.compile(regex)
result = reobj.split(subject)
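A sketch of split() behavior worth knowing, assuming Python 3 and made-up inputs: a character class splits on several separators at once, a capturing group keeps the separators in the result, and maxsplit caps the number of splits.

```python
import re

# Split on any run of commas, semicolons, or whitespace
plain = re.split(r"[,;\s]+", "one, two;three  four")

# A capturing group in the pattern keeps the separators in the output
with_seps = re.split(r"([,;])", "a,b;c")

# maxsplit caps the number of splits; the remainder stays in the last item
limited = re.split(r"\s+", "a b c d", maxsplit=2)
```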

The content above is reposted from: http://blog.csdn.net/suiyunonghen/article/details/3763261
http://www.xuebuyuan.com/2042477.html

0 0