python学习之正则表达式应用

来源：互联网发布：淘宝店出售编辑：程序博客网时间：2024/05/21 09:40

1.在一段字符中找出tip 或 top

import rest = "top tip taq twp tep"res = r"t[io]p"print re.findall(res,st)

输出：[‘top’, ‘tip’]

2.在一段字符在找出‘t?p’(‘?’表示除了’i’或’o’以外的任意字符)

import rest = "top tip taq twp tep"res = r"t[^io]p"print re.findall(res,st)

输出：[‘twp’, ‘tep’]

3.判断字符串s是否是以hello开头

import res = "hello world,hello boy"r = r"^hello"print re.findall(r,s)

输出为：[‘hello’]

4.判断字符串s是否是以boy开头

import res = "hello world,hello boy"r = r"boy$"print re.findall(r,s)

输出为：[‘boy’]

5.匹配电话010-开头后面跟着八个数字

import res = "010-77189021"r = r"^010-\d{8}$"print re.findall(r,s)

输出为:[‘010-77189021’]

6.a后面至少一个b

import res = "abbbbbbb"s1 = "a"r = r"ab+"print re.findall(r,s)

输出为：
[‘abbbbbbb’]
[]

7.a后面至少零个b

import res = "abbbbbbb"s1 = "a"r = r"ab*"print re.findall(r,s)print re.findall(r,s1)

[‘abbbbbbb’]
[‘a’]

8.a后面有一个或没有b

import res = "abbbbbbb"s1 = "a"r = r"ab?"print re.findall(r,s)print re.findall(r,s1)

输出为：
[‘ab’]
[‘a’]

9.a后面b的个数非贪婪（如果多个之匹配一个）

import res = "abbbbbbb"r = r"ab+?"print re.findall(r,s)

输出：[‘ab’]
其他的一些正则：
这里写图片描述

闲来无聊，附加一个爬虫。
匹配百度主页的所有汉字：

import reimport urllibimport urllib2def get_html(url):    request = urllib2.Request(url)    response = urllib2.urlopen(request)    html = response.read()    return htmldef get_china(url):    html = unicode(get_html(url),'utf8')    r = ur'[\u4e00-\u9fa5]+'  #ur，u表示unicode编码，r表示原始字符没有变化    china = re.findall(r,html)    return chinachina = get_china("http://www.baidu.com")for c in china:    print c

0 0