Python Regular Expressions Review, Part 2
Source: Internet  Posted by: 玄牝之门 知乎  Editor: 程序博客网  Date: 2024/05/16 06:05
Case-insensitive matching with re.IGNORECASE (the pattern uses \b as a word boundary)
import re

text = 'This is some text -- with punctuation.'

# Match words that start with T
pattern = r'\bT\w+'
with_case = re.compile(pattern)
# Same pattern, ignoring case
without_case = re.compile(pattern, re.IGNORECASE)

print 'Text:\n %r' % text
print 'Pattern:\n %s' % pattern
print 'Case-sensitive:'
for match in with_case.findall(text):
    print ' %r' % match
print 'Case-insensitive:'
for match in without_case.findall(text):
    print ' %r' % match
Output
Text:
 'This is some text -- with punctuation.'
Pattern:
 \bT\w+
Case-sensitive:
 'This'
Case-insensitive:
 'This'
 'text'
Multiline vs. single-line matching
import re

text = 'This is some text -- with punctuation.\nA second line.'

# Find the word at the start and the run of non-whitespace at the end
pattern = r'(^\w+)|(\w+\S*$)'
single_line = re.compile(pattern)
# With re.MULTILINE, ^ and $ also match at each \n
multiline = re.compile(pattern, re.MULTILINE)

print 'Text:\n %r' % text
print 'Pattern:\n %s' % pattern
print 'Single Line :'
for match in single_line.findall(text):
    print ' %r' % (match,)
print 'Multiline :'
for match in multiline.findall(text):
    print ' %r' % (match,)
Output
Text:
 'This is some text -- with punctuation.\nA second line.'
Pattern:
 (^\w+)|(\w+\S*$)
Single Line :
 ('This', '')
 ('', 'line.')
Multiline :
 ('This', '')
 ('', 'punctuation.')
 ('A', '')
 ('', 'line.')
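Note: with re.MULTILINE, the text after \n is treated as a separate line, so ^ and $ match at the start and end of every line rather than only at the start and end of the whole string.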
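To make the note concrete, here is a minimal sketch that contrasts ^ and $ under re.MULTILINE with \A and \Z, which always anchor to the whole string. It is written for Python 3 (an assumption; the listings in this post use Python 2 print statements), and the variable names are only illustrative.

```python
import re

text = 'This is some text -- with punctuation.\nA second line.'

# With re.MULTILINE, ^ and $ anchor at every line boundary.
line_starts = re.findall(r'^\w+', text, re.MULTILINE)
line_ends = re.findall(r'\S+$', text, re.MULTILINE)

# \A and \Z anchor at the start and end of the whole string,
# whether or not re.MULTILINE is set.
string_start = re.findall(r'\A\w+', text, re.MULTILINE)
string_end = re.findall(r'\S+\Z', text, re.MULTILINE)

print(line_starts)   # ['This', 'A']
print(line_ends)     # ['punctuation.', 'line.']
print(string_start)  # ['This']
print(string_end)    # ['line.']
```

If a pattern must match only at the very beginning or end of the input even when re.MULTILINE is on, \A and \Z are the safer anchors.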
re.DOTALL: the dot also matches newlines
import re

text = 'This is some text -- with punctuation.\nA second line.'
pattern = r'.+'
no_newlines = re.compile(pattern)
# By default '.' does not match a newline; with re.DOTALL it does
dotall = re.compile(pattern, re.DOTALL)

print 'Text:\n %r' % text
print 'Pattern:\n %s' % pattern
print 'No newlines :'
for match in no_newlines.findall(text):
    print ' %r' % match
print 'Dotall :'
for match in dotall.findall(text):
    print ' %r' % match
Output
Text:
 'This is some text -- with punctuation.\nA second line.'
Pattern:
 .+
No newlines :
 'This is some text -- with punctuation.'
 'A second line.'
Dotall :
 'This is some text -- with punctuation.\nA second line.'
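A common practical consequence is that a pattern meant to span two lines silently fails without re.DOTALL. The sketch below illustrates this under the assumption of Python 3 (the post's own listings are Python 2); the pattern `text.+second` is just an illustrative choice.

```python
import re

text = 'This is some text -- with punctuation.\nA second line.'

# Without DOTALL, '.' stops at the newline, so the pattern cannot
# reach "second" on the next line and the search fails.
no_span = re.search(r'text.+second', text)

# With DOTALL, '.' also consumes '\n', so the match crosses the line break.
span = re.search(r'text.+second', text, re.DOTALL)

print(no_span)              # None
print(repr(span.group(0)))  # 'text -- with punctuation.\nA second'
```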
Unicode matching with re.UNICODE
import codecs
import re
import sys

# Wrap standard output so that it is encoded as UTF-8
sys.stdout = codecs.getwriter('UTF-8')(sys.stdout)

text = u'Français złoty Österreich'
pattern = ur'\w+'
ascii_pattern = re.compile(pattern)
unicode_pattern = re.compile(pattern, re.UNICODE)

print 'Text :', text
print 'Pattern :', pattern
print 'ASCII :', u', '.join(ascii_pattern.findall(text))
print 'Unicode :', u', '.join(unicode_pattern.findall(text))
Output
Text : Français złoty Österreich
Pattern : \w+
ASCII : Fran, ais, z, oty, sterreich
Unicode : Français, złoty, Österreich
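Note: without re.UNICODE, \w in this Python 2 code does not match non-ASCII characters, so words are split wherever such a character appears.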
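The behavior above is specific to Python 2. In Python 3 the default is reversed for str patterns: \w is Unicode-aware unless re.ASCII is passed. The following is a small sketch of that difference, assuming Python 3 and a UTF-8 source file.

```python
import re

text = 'Français złoty Österreich'

# In Python 3, str patterns are Unicode-aware by default,
# so \w matches accented letters without any flag.
unicode_words = re.findall(r'\w+', text)

# re.ASCII restricts \w to [a-zA-Z0-9_], which reproduces the
# "ASCII" row of the Python 2 output above.
ascii_words = re.findall(r'\w+', text, re.ASCII)

print(unicode_words)  # ['Français', 'złoty', 'Österreich']
print(ascii_words)    # ['Fran', 'ais', 'z', 'oty', 'sterreich']
```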
Commenting a regular expression with re.VERBOSE, and printing the named groups as a dict
import re

# Match email addresses, optionally preceded by a name and wrapped in angle brackets
address = re.compile(
    r'''
    ((?P<name>
       ([\w.,]+\s+)*[\w.,]+)  # the name may contain dots
       \s*
       <                      # when a name is present, the address sits inside angle brackets
    )?                        # the name before the address is optional
    (?P<email>
      [\w\d.+-]+              # the user name before the @ sign
      @
      ([\w\d.]+\.)+           # domain name prefix
      (com|org|edu)           # restrict which top-level domains are accepted
    )
    >?                        # the closing bracket is optional, like the name
    ''',
    re.UNICODE | re.VERBOSE)

candidates = [
    u'first.last@example.com',
    u'first.last+category@gmail.com',
    u'valid-address@mail.example.com',
    u'not-valid@example.foo',
    u'First Last <first.last@example.com>',
    u'No Brackets first.last@example.com',
    u'First Last',
    u'First Middle Last <first.last@example.com>',
    u'First M. Last <first.last@example.com>',
    u'<first.last@example.com>',
    ]

for candidate in candidates:
    print 'Candidate:', candidate
    match = address.search(candidate)
    if match:
        print match.groupdict()
    else:
        print ' No match'
Output
Candidate: first.last@example.com
{'name': None, 'email': u'first.last@example.com'}
Candidate: first.last+category@gmail.com
{'name': None, 'email': u'first.last+category@gmail.com'}
Candidate: valid-address@mail.example.com
{'name': None, 'email': u'valid-address@mail.example.com'}
Candidate: not-valid@example.foo
 No match
Candidate: First Last <first.last@example.com>
{'name': u'First Last', 'email': u'first.last@example.com'}
Candidate: No Brackets first.last@example.com
{'name': None, 'email': u'first.last@example.com'}
Candidate: First Last
 No match
Candidate: First Middle Last <first.last@example.com>
{'name': u'First Middle Last', 'email': u'first.last@example.com'}
Candidate: First M. Last <first.last@example.com>
{'name': u'First M. Last', 'email': u'first.last@example.com'}
Candidate: <first.last@example.com>
{'name': None, 'email': u'first.last@example.com'}
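Note: the name is optional, so the name entry is None whenever that group does not take part in the match (for example, when the name is not followed by an opening angle bracket).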
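When a None name is inconvenient for downstream code, `groupdict()` accepts a `default` argument that is used for groups that did not participate in the match. The sketch below assumes Python 3 and uses a condensed version of the pattern above; the helper name `parse` is hypothetical and only for illustration.

```python
import re

# Condensed form of the address pattern from the listing above.
address = re.compile(
    r'''
    ((?P<name> ([\w.,]+\s+)*[\w.,]+ ) \s* < )?   # optional display name followed by '<'
    (?P<email> [\w.+-]+ @ ([\w.]+\.)+ (com|org|edu) )
    >?                                           # optional closing '>'
    ''',
    re.VERBOSE)


def parse(candidate):
    """Hypothetical helper: return the named groups, or None if nothing matches."""
    match = address.search(candidate)
    if match is None:
        return None
    # default='' replaces None for groups that did not participate.
    return match.groupdict(default='')


with_name = parse('First Last <first.last@example.com>')
without_name = parse('first.last@example.com')
no_match = parse('First Last')

print(with_name)     # {'name': 'First Last', 'email': 'first.last@example.com'}
print(without_name)  # {'name': '', 'email': 'first.last@example.com'}
print(no_match)      # None
```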
Ignoring case with the inline flag (?i)
import re

# The inline flag (?i) makes the pattern ignore case. The other inline flags are:
# IGNORECASE: i, MULTILINE: m, DOTALL: s, UNICODE: u, VERBOSE: x.
# Several flags can be combined in one group, e.g. (?imu)
text = 'This is some text -- with punctuation.'
pattern = r'(?i)\bT\w+'
regex = re.compile(pattern)

print 'Text :', text
print 'Pattern :', pattern
print 'Matches :', regex.findall(text)
Output
Text : This is some text -- with punctuation.
Pattern : (?i)\bT\w+
Matches : ['This', 'text']
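As a sketch of combining inline flags, the example below turns on IGNORECASE and MULTILINE together with (?im) and checks that it behaves like passing the flags to re.compile. It assumes Python 3, where (since 3.11) inline global flags must appear at the very start of the pattern; the two-line sample text is made up for illustration.

```python
import re

# Illustrative two-line text; the second line starts with a lowercase "t".
text = 'This is some text -- with punctuation.\nthe second line.'

# (?im) enables IGNORECASE and MULTILINE inside the pattern itself.
inline_matches = re.findall(r'(?im)^t\w+', text)

# Equivalent form that passes the flags as an argument instead.
flag_matches = re.compile(r'^t\w+', re.IGNORECASE | re.MULTILINE).findall(text)

print(inline_matches)  # ['This', 'the']
print(flag_matches)    # ['This', 'the']
```

Inline flags are handy when the pattern is the only thing that can be configured, for example when it is read from a settings file and the call to re.compile cannot be changed.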