Python Regular Expressions

Source: Internet. Published by: uvz mac. Editor: 程序博客网. Time: 2024/06/05 11:28

Python provides regular expression support through the re module.

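Greedy and non-greedy modes

Greedy mode always tries to match as many characters as possible;

non-greedy mode does the opposite and always tries to match as few characters as possible.

For example, searching "abbbc" with the regular expression "ab*" finds "abbb", while the non-greedy quantifier "ab*?" finds only "a".

Python matches greedily by default. To get a non-greedy match, append ? to the quantifier: re.match(r'\d+?', '102300').group() returns '1'. (Call group() here, not groups(): the pattern has no capturing groups, so groups() would return an empty tuple.)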
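The difference is easiest to see side by side. The following is a minimal sketch using the strings from the text above; the two-group pattern at the end is an extra illustration chosen here, not something the original prescribes.

```python
import re

text = "abbbc"

# Greedy: b* consumes every available 'b'.
greedy = re.search(r"ab*", text).group()
# Non-greedy: b*? stops as early as it can, here after zero 'b's.
lazy = re.search(r"ab*?", text).group()

digits_greedy = re.match(r"\d+", "102300").group()
digits_lazy = re.match(r"\d+?", "102300").group()

# With two groups, greediness decides which group gets the trailing zeros.
split_greedy = re.match(r"^(\d+)(0*)$", "102300").groups()
split_lazy = re.match(r"^(\d+?)(0*)$", "102300").groups()

print(greedy, lazy)                # abbb a
print(digits_greedy, digits_lazy)  # 102300 1
print(split_greedy, split_lazy)    # ('102300', '') ('1023', '00')
```

In the last pair, the non-greedy \d+? still has to grow until the rest of the pattern can match, which is why it ends up with '1023' rather than '1'.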

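Special handling of backslash escapes (Python raw strings):

Most programming languages use "\" as the escape character in regular expressions, and string literals use the same character, so the two levels of escaping pile up. For example, the Python string literal "A\\1" actually holds the text A\1. To match a literal "\" in the text, a regular expression written as an ordinary string literal needs four backslashes, "\\\\", as in Java.

Python's raw strings (written with the r prefix) simplify this: inside a raw string, a backslash is not treated as a string escape. For example, r"\\" matches a literal "\" in the text; r'\w', which matches one word character, is equivalent to '\\w'; and r'C\1' is equivalent to 'C\\1'.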
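The equivalences above can be checked directly. This is a small sketch; the sample string path is a made-up value used only for illustration.

```python
import re

# Each pair of literals denotes exactly the same pattern text.
same_w = (r"\w" == "\\w")
same_backref = (r"C\1" == "C\\1")

# Hypothetical sample text: the literal below holds C:\temp (one backslash).
path = "C:\\temp"

plain = re.search("\\\\", path)  # ordinary literal: four backslashes
raw = re.search(r"\\", path)     # raw literal: two backslashes

print(same_w, same_backref)        # True True
print(len(path))                   # 7
print(plain.start(), raw.start())  # 2 2
```

Both searches compile the same two-character pattern (backslash, backslash), which the regex engine reads as one literal backslash.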

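Matching regular expressions with re

In Python, eval() and exec() run a code object faster than a string, because a code object does not have to be compiled again.

In the same way, a Python regular expression can be used directly, or precompiled with re.compile() for efficiency. The function names are the same either way (match, search, and so on), and both approaches return Match objects with the same methods, such as group().

1. Direct use (compiled on the fly):

Match the target string directly against the regular expression string; the expression is compiled at the moment it is used.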

import re

test = 'test string for this test'

# r'test' is only an example pattern; put your own regular expression here
if re.match(r'test', test):
    print('ok')
else:
    print('failed')

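2. Precompilation:

If a regular expression is used many times, it can be precompiled and then matched directly on each use, which avoids recompiling it every time and improves efficiency. (The re module also caches recently compiled patterns, so the gain shows up mainly in tight loops.)

Precompilation uses re.compile(pattern[, flags]), the factory function for the Pattern class. It compiles the regular expression given as the string "pattern" into a Pattern object. The optional flags argument changes how matching behaves, for example re.IGNORECASE for case-insensitive matching.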
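As a small illustration of the flags argument, the sketch below compiles the same pattern with and without flags. The sample text is made up for this example; re.IGNORECASE and re.MULTILINE are standard flags of the re module and can be combined with |.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Hello world\nhello again"

# No flags: case-sensitive, and ^ matches only at the start of the string.
strict = re.compile(r"^hello")
# IGNORECASE ignores case; MULTILINE lets ^ match at the start of every line.
relaxed = re.compile(r"^hello", re.IGNORECASE | re.MULTILINE)

strict_hits = strict.findall(text)
relaxed_hits = relaxed.findall(text)

print(strict_hits)   # []
print(relaxed_hits)  # ['Hello', 'hello']
```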

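General steps:

1. Compile the regular expression string into a Pattern instance.

2. Use the Pattern instance to match the text.

3. Use the resulting Match instance to get the information you need.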

# encoding: UTF-8
import re

pattern = re.compile(r'hello')  # compile the regular expression into a Pattern object
match = pattern.match('hello world!')  # match the text; returns None when nothing matches

if match:
    # use the Match object to get group information
    print(match.group())

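Groups

These Match methods extract substrings after a successful match:

1) group([group1, ...]):

Returns the string captured by one or more groups; with several arguments, the result is a tuple. group1 can be a group number or a group name. Number 0 stands for the whole matched substring, and calling group() with no argument is the same as group(0). A group that captured nothing returns None, and a group that captured several times returns its last capture.

2) groups([default]):

Returns a tuple of the strings captured by all groups, equivalent to calling group(1, 2, ..., last). default is the value used for groups that captured nothing; it defaults to None.

3) groupdict([default]):

Returns a dictionary whose keys are the names of the named groups and whose values are the substrings those groups captured. Groups without a name are not included. default has the same meaning as above.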

For example:

m = re.match(r'^(\d{3})-(\d{3,8})$', '010-12345')
m.group(0)  # '010-12345'
m.group(1)  # '010'
m.group(2)  # '12345'
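The example above only exercises group(). The sketch below extends the same phone-number pattern to show groups() and groupdict() as well; the group names area, number, and ext, and the optional extension part, are assumptions added here for illustration.

```python
import re

# Same idea as the pattern above, with named groups and an optional,
# hypothetical "-extension" part that does not take part in this match.
m = re.match(
    r"^(?P<area>\d{3})-(?P<number>\d{3,8})(?:-(?P<ext>\d+))?$",
    "010-12345",
)

all_groups = m.groups()      # unmatched groups show up as None
with_default = m.groups("")  # ... or as the default you pass in
pair = m.group(1, "number")  # numbers and names can be mixed
by_name = m.groupdict()      # only named groups appear here

print(all_groups)    # ('010', '12345', None)
print(with_default)  # ('010', '12345', '')
print(pair)          # ('010', '12345')
print(by_name)       # {'area': '010', 'number': '12345', 'ext': None}
```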

Regular expressions can also recognize runs of consecutive whitespace: re.split(r'\s+', 'a b   c') returns ['a', 'b', 'c'].

For more, see:

http://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html

http://www.liaoxuefeng.com/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000/001386832260566c26442c671fa489ebc6fe85badda25cd000

Core Python Programming (Python核心编程)

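The following are several common ways of matching with Python regular expressions. In the snippets, subject is the string being searched, and do_something() and do_anotherthing() are placeholders for your own code.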

1. Test whether the regex matches (part of) a string

regex = r""  # the regular expression
if re.search(regex, subject):
    do_something()
else:
    do_anotherthing()

2. Test whether the regex matches the entire string

regex = r"\Z"  # end the regular expression with \Z
if re.match(regex, subject):
    do_something()
else:
    do_anotherthing()
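On Python 3.4 and later, re.fullmatch() is an alternative to appending \Z by hand. The sketch below compares the three approaches; the pattern and the sample strings are made up for illustration.

```python
import re

# Hypothetical pattern and inputs, chosen only for illustration.
regex = r"\d{3}-\d{3,8}"
subject_ok = "010-12345"
subject_bad = "010-12345 ext"

# match() anchors only at the start, so trailing text is tolerated.
prefix_only = re.match(regex, subject_bad) is not None
# Appending \Z also requires the match to end where the string ends.
with_anchor = re.match(regex + r"\Z", subject_bad) is not None
# fullmatch() requires the whole string to match.
whole_bad = re.fullmatch(regex, subject_bad) is not None
whole_ok = re.fullmatch(regex, subject_ok) is not None

print(prefix_only, with_anchor, whole_bad, whole_ok)  # True False False True
```

Note that simply appending \Z is only safe when the pattern has no top-level alternation; for a pattern like a|b, wrap it in a non-capturing group first, or use fullmatch().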

3. Create a match object with details about how the regex matches (part of) a string

regex = r""  # the regular expression
match = re.search(regex, subject)
if match:
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    do_something()
else:
    do_anotherthing()
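To make the placeholders concrete, here is a minimal sketch of the same usage with a made-up subject string and pattern.

```python
import re

# Hypothetical subject and pattern, chosen only for illustration.
subject = "order 1234 shipped"
match = re.search(r"\d+", subject)

start = match.start()  # index of the first matched character
end = match.end()      # index just past the last matched character
text = match.group()   # the matched text itself

print(start, end, text)            # 6 10 1234
print(subject[start:end] == text)  # True
print(match.span())                # (6, 10)
```

Because end is exclusive, subject[start:end] is always exactly the matched text.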

4. Get the part of a string matched by the regex

regex = r""  # the regular expression
match = re.search(regex, subject)
if match:
    result = match.group()
else:
    result = ""

5. Get the part of a string matched by a capturing group

regex = r""  # the regular expression
match = re.search(regex, subject)
if match:
    result = match.group(1)
else:
    result = ""

6. Get the part of a string matched by a named group

regex = r""  # the regular expression
match = re.search(regex, subject)
if match:
    result = match.group("groupname")
else:
    result = ""
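Usages 4 to 6 share the same shape: search, then fall back to an empty string. One way to fold them together is a small helper. The function first_match below is a hypothetical convenience written for this sketch, not part of the re module, and the date pattern is likewise made up.

```python
import re

def first_match(regex, subject, group=0, default=""):
    """Return the text of `group` from the first match, or `default`.

    Hypothetical helper: `group` may be 0 (whole match), a group
    number, or a group name, exactly as accepted by Match.group().
    """
    match = re.search(regex, subject)
    if match is None:
        return default
    value = match.group(group)
    # A group that did not take part in the match yields None.
    return default if value is None else value

subject = "date: 2024-06-05"
regex = r"(?P<year>\d{4})-(\d{2})-(\d{2})"

whole = first_match(regex, subject)           # usage 4: whole match
month = first_match(regex, subject, 2)        # usage 5: numbered group
year = first_match(regex, subject, "year")    # usage 6: named group
missing = first_match(regex, "no date here")  # no match: default

print(whole, month, year, repr(missing))  # 2024-06-05 06 2024 ''
```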

7. Get a list of all regex matches in a string

result = re.findall(regex, subject)
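One detail of findall() worth seeing in practice: what it returns depends on how many capturing groups the pattern has. A minimal sketch with a made-up subject string:

```python
import re

# Hypothetical subject, chosen only for illustration.
subject = "a=1, b=22, c=333"

# No groups: a list of the matched substrings.
values = re.findall(r"\d+", subject)
# One group: a list of what that group captured.
names = re.findall(r"(\w)=\d+", subject)
# Several groups: a list of tuples, one tuple per match.
pairs = re.findall(r"(\w)=(\d+)", subject)

print(values)  # ['1', '22', '333']
print(names)   # ['a', 'b', 'c']
print(pairs)   # [('a', '1'), ('b', '22'), ('c', '333')]
```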

8. Iterate over all matches in a string

for match in re.finditer(r"<(.*?)\s*.*?/\1>", subject):
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    pass  # replace with your own handling of each match
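The loop above can be tried on a concrete string. This sketch reuses the tag pattern from the snippet; the subject string is made up for illustration.

```python
import re

# Hypothetical subject, chosen only for illustration.
subject = "<b>bold</b> and <i>it</i>"

found = []
for match in re.finditer(r"<(.*?)\s*.*?/\1>", subject):
    # Record position, full text, and the tag name captured by group 1.
    found.append((match.start(), match.end(), match.group(), match.group(1)))

for item in found:
    print(item)
# (0, 11, '<b>bold</b>', 'b')
# (16, 25, '<i>it</i>', 'i')
```

Unlike findall(), finditer() yields Match objects, so positions and individual groups stay available for each match.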

9. Create a regex object from the regex string, to use the same regex for many operations

reobj = re.compile(regex)

10. Regex-object version of usage 1: if/else branch on whether (part of) a string can be matched

reobj = re.compile(regex)
if reobj.search(subject):
    do_something()
else:
    do_anotherthing()

11. Regex-object version of usage 2: if/else branch on whether a string can be matched entirely

reobj = re.compile(r"\Z")  # end the regular expression with \Z
if reobj.match(subject):
    do_something()
else:
    do_anotherthing()

12. Use a regex object to create a match object with details about how it matches (part of) a string

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    do_something()
else:
    do_anotherthing()

13. Use a regex object to get the part of a string matched by the regex

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    result = match.group()
else:
    result = ""

14. Use a regex object to get the part of a string matched by a capturing group

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    result = match.group(1)
else:
    result = ""

15. Use a regex object to get the part of a string matched by a named group

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    result = match.group("groupname")
else:
    result = ""

16. Use a regex object to get a list of all regex matches in a string

reobj = re.compile(regex)
result = reobj.findall(subject)

17. Use a regex object to iterate over all matches in a string

reobj = re.compile(regex)
for match in reobj.finditer(subject):
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    pass  # replace with your own handling of each match
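The regex-object usages above all follow from compiling once and reusing the object. A compact sketch, with a made-up pattern and subject, puts several of them together:

```python
import re

# Compiled once, reused for every operation below.
# The pattern and subject are hypothetical, chosen only for illustration.
WORD = re.compile(r"[A-Za-z]+")
subject = "regex objects are reusable"

has_word = WORD.search(subject) is not None
first = WORD.match(subject).group()
words = WORD.findall(subject)
spans = [m.span() for m in WORD.finditer(subject)]
# Compiled patterns also accept a start position for the search.
from_six = WORD.search(subject, 6).group()

print(has_word)  # True
print(first)     # regex
print(words)     # ['regex', 'objects', 'are', 'reusable']
print(spans)     # [(0, 5), (6, 13), (14, 17), (18, 26)]
print(from_six)  # objects
```

The optional position argument in the last call is available on the methods of compiled pattern objects, not on the module-level re.search().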

String replacement

1. Replace all matching substrings

# replace every substring of subject that matches regex with newstring
result = re.sub(regex, newstring, subject)

2. Replace all matching substrings (using a regex object)

reobj = re.compile(regex)
result = reobj.sub(newstring, subject)
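The replacement string can refer back to captured groups, and sub() also accepts a count or a function. The sketch below shows these variants on a made-up subject string.

```python
import re

# Hypothetical subject and pattern, chosen only for illustration.
subject = "2024-06-05 and 2023-12-31"
regex = r"(\d{4})-(\d{2})-(\d{2})"

# Backreferences in the replacement reorder the captured groups.
swapped = re.sub(regex, r"\3/\2/\1", subject)
# count limits how many matches are replaced.
first_only = re.sub(regex, "DATE", subject, count=1)
# A function receives each Match and returns the replacement text.
upper = re.sub(r"[a-z]+", lambda m: m.group().upper(), subject)
# subn() also reports how many replacements were made.
replaced, n = re.subn(regex, "DATE", subject)

print(swapped)      # 05/06/2024 and 31/12/2023
print(first_only)   # DATE and 2023-12-31
print(upper)        # 2024-06-05 AND 2023-12-31
print(replaced, n)  # DATE and DATE 2
```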

String splitting

1. Split a string

result = re.split(regex, subject)

2. Split a string (using a regex object)

reobj = re.compile(regex)
result = reobj.split(subject)
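A few concrete splits, as a minimal sketch with made-up input strings, show how the separator pattern, maxsplit, and capturing groups affect the result.

```python
import re

# Split on runs of whitespace.
parts = re.split(r"\s+", "a b   c")
# Split on any mix of whitespace, commas, and semicolons.
mixed = re.split(r"[\s,;]+", "a,b;; c  d")
# maxsplit stops after the given number of splits.
limited = re.split(r"\s+", "a b   c", maxsplit=1)
# A capturing group in the pattern keeps the separators in the result.
kept = re.split(r"(,)", "a,b")

print(parts)    # ['a', 'b', 'c']
print(mixed)    # ['a', 'b', 'c', 'd']
print(limited)  # ['a', 'b   c']
print(kept)     # ['a', ',', 'b']
```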

0 0