Python Regular Expressions: Worked Examples

Source: Internet  Editor: 程序博客网  Time: 2024/05/16 18:55

Reference: http://blog.jobbole.com/74844/

1. To use regular expressions in Python, first import the re module.

>>> import re

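Using the backslash: to match a metacharacter (. * + \ $ ^ and so on) literally, put a \ in front of it. In an ordinary Python string the backslash also starts escape sequences such as \n, so regex patterns are usually written as raw strings with an r prefix: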

>>> string='this is a\nstring'
>>> print(string)
this is a
string

>>> string1 = r'this is a\nstring'  # the r prefix makes a raw string: the backslash stays a literal character
>>> print(string1)
this is a\nstring
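
To make the backslash rule concrete, here is a small sketch written for Python 3. The sample strings and variable names are made up for illustration. It contrasts an unescaped `.`, which matches any character, with an escaped `\.`, and uses `re.escape` to add the backslashes automatically.

```python
import re

text = 'price: 3.50 or 3x50'   # sample text, invented for this example

# Unescaped '.' is a metacharacter: it matches any character except a newline.
any_char = re.findall(r'3.50', text)

# Escaped '\.' matches only a literal dot.
literal_dot = re.findall(r'3\.50', text)

# re.escape() escapes the metacharacters in a string for you.
escaped = re.escape('1+1=2')
plus_match = re.search(escaped, 'so 1+1=2 holds')

print(any_char)             # ['3.50', '3x50']
print(literal_dot)          # ['3.50']
print(plus_match.group(0))  # 1+1=2
```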

2. Next, a look at several commonly used functions in re:

re.match(regex, string) returns a match object only when the regex matches at the beginning of the string being searched; otherwise it returns None.

>>> import re
>>> re.match(r'hello','hello world hello boys!')
<_sre.SRE_Match object at 0x7fcaa1b0c308>   # returns a match object
>>> re.match(r'world','hello world hello boys!')
>>>
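
Because re.match returns None when the string does not start with the pattern, code usually tests the result before using it. A minimal Python 3 sketch follows; `starts_with_hello` is a hypothetical helper name, not part of re.

```python
import re

def starts_with_hello(text):
    """Return True when text begins with 'hello' (hypothetical helper)."""
    # re.match only tries the pattern at position 0 of the string.
    return re.match(r'hello', text) is not None

print(starts_with_hello('hello world hello boys!'))  # True
print(starts_with_hello('world hello'))              # False
```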

re.search(regex, string) is not restricted to the beginning: it scans the whole string, but returns only the first match.

>>> match_obj = re.search(r'world', 'hello world ,hello world!')  # returns a match object
>>> match_obj.group(0)
'world'
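
The match object also records where the first match was found. The Python 3 sketch below reuses the same sample string; the variable names are chosen just for this example.

```python
import re

text = 'hello world ,hello world!'
m = re.search(r'world', text)

if m is not None:
    # start() and end() describe the first occurrence only.
    print(m.group(0), m.start(), m.end())
    # Slicing the original string with span() gives back the matched text.
    start, end = m.span()
    print(text[start:end])
```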

re.findall(regex, string) finds every non-overlapping substring that matches the regex and returns them as a list.

>>> print(re.findall(r'world', 'hello world ,hello world!'))
['world', 'world']
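
Two details of findall are worth seeing in code: it returns an empty list (not None) when nothing matches, and matches do not overlap. A short Python 3 sketch, with made-up sample patterns:

```python
import re

text = 'hello world ,hello world!'

hits = re.findall(r'world', text)
misses = re.findall(r'python', text)   # no match gives an empty list
pairs = re.findall(r'aa', 'aaaa')      # matches never overlap

print(len(hits))   # 2
print(misses)      # []
print(pairs)       # ['aa', 'aa']
```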

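3. A closer look at the group method used above. group(0) returns the entire match, and group(n) returns the text captured by the n-th pair of parentheses.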

>>> contactInfo = 'Doe, John: 555-1212'
>>> match = re.search(r'(\w+), (\w+): (\S+)', contactInfo)
>>> match.group(0)
'Doe, John: 555-1212'
>>> match.group(1)
'Doe'
>>> match.group(2)
'John'
>>> match.group(3)
'555-1212'
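
When all the captured pieces are needed at once, the match object's groups() method returns them as a tuple, and group() accepts several group numbers in one call. A Python 3 sketch using the same contact string (variable names are illustrative):

```python
import re

contact_info = 'Doe, John: 555-1212'
m = re.search(r'(\w+), (\w+): (\S+)', contact_info)

# groups() returns every captured group, in order, as a tuple.
last, first, phone = m.groups()
print(first, last, phone)

# group() with several numbers returns a tuple in the order requested.
print(m.group(2, 1))   # ('John', 'Doe')
```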

>>> re.findall(r'(\w+), (\w+): (\S+)', contactInfo)
[('Doe', 'John', '555-1212')]

>>> re.findall(r'\w+, \w+: \S+', contactInfo)
['Doe, John: 555-1212']

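As this shows, when the pattern contains capturing groups, findall() returns a list of tuples of the captured groups rather than the full matched text; without groups, it returns the full matches.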
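
The effect of groups on findall can be sketched side by side. The example below (Python 3) also uses the non-capturing form `(?:...)`, which groups part of a pattern without changing what findall returns; the variable names are made up for illustration.

```python
import re

contact_info = 'Doe, John: 555-1212'

# Three capturing groups: a list of 3-tuples.
with_groups = re.findall(r'(\w+), (\w+): (\S+)', contact_info)

# Exactly one capturing group: a list of that group's text.
one_group = re.findall(r'\w+, \w+: (\S+)', contact_info)

# Non-capturing groups: a list of full matches.
non_capturing = re.findall(r'(?:\w+), (?:\w+): (?:\S+)', contact_info)

print(with_groups)     # [('Doe', 'John', '555-1212')]
print(one_group)       # ['555-1212']
print(non_capturing)   # ['Doe, John: 555-1212']
```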

4. Next, an example that matches phone numbers

import re

# regexes for a landline number and a mobile phone number
print("the landline regex: ")
landline = '0538-1234567'
r_landline = r'\d{4}-?\d{7}'
MatchObject_landline = re.match(r_landline, landline)
if MatchObject_landline is None:
    print("match fail")
else:
    print("match success")

print("the phone num regex: ")
phone_num = '+8618811112222'
r_phone_num = r'\+\d{2}\d{11}'
MatchObject_phone = re.match(r_phone_num, phone_num)
if MatchObject_phone is None:
    print("match fail")
else:
    print("match success")
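
One caveat with the script above: re.match only anchors the start of the string, so a number with extra trailing digits would still report success. The sketch below is one possible way to validate the whole string, assuming Python 3.4 or later, where `fullmatch` is available. The helper names `is_landline` and `is_mobile` are hypothetical.

```python
import re

# Same two patterns as above, compiled once for reuse.
LANDLINE = re.compile(r'\d{4}-?\d{7}')
MOBILE = re.compile(r'\+\d{2}\d{11}')

def is_landline(number):
    """True only if the whole string is a landline number (hypothetical helper)."""
    return LANDLINE.fullmatch(number) is not None

def is_mobile(number):
    """True only if the whole string is a mobile number (hypothetical helper)."""
    return MOBILE.fullmatch(number) is not None

too_long = '0538-1234567999'
print(re.match(r'\d{4}-?\d{7}', too_long) is not None)  # True: the prefix matches
print(is_landline(too_long))                            # False: trailing digits rejected
print(is_landline('05381234567'))                       # True: the dash is optional
print(is_mobile('+8618811112222'))                      # True
```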

5. One more small example, matching an email address:

import re

# before '@': 3 to 10 word characters; after '@': one or more word characters;
# the address ends with '.com', '.cn' or '.org'
email = r'\w{3,10}@\w+(\.com|\.cn|\.org)'
match_object = re.match(email, 'zhangsan@qq.com')  # returns a match object
print(match_object.group())  # print the matched text
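
The rule described in the comment above (3 to 10 word characters, '@', one or more word characters, then '.com', '.cn' or '.org') can be wrapped in a small checker. This is only a sketch for that toy rule, assuming Python 3.4 or later for `fullmatch`; `is_simple_email` is a hypothetical name, and the test addresses are invented.

```python
import re

# Non-capturing group for the allowed suffixes.
EMAIL = re.compile(r'\w{3,10}@\w+\.(?:com|cn|org)')

def is_simple_email(address):
    """True if the whole string follows the toy rule above (hypothetical helper)."""
    return EMAIL.fullmatch(address) is not None

candidates = ['zhangsan@qq.com', 'li@qq.com', 'zhangsan@qq.net', 'zhangsan@qq.company']
for candidate in candidates:
    print(candidate, is_simple_email(candidate))
```

Only the first address passes: 'li' is shorter than three characters, '.net' is not an allowed suffix, and 'qq.company' has extra text after '.com' that fullmatch rejects.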

OK, thank you!



