Regular Expressions

Source: Internet. Editor: 程序博客网. Time: 2024/05/19 14:19

This module provides regular expression matching operations similar to those found in Perl. Both patterns and strings to be searched can be Unicode strings as well as 8-bit strings.

Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
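The difference between the two notations can be checked directly. The following is a minimal sketch using only the standard `re` module; the sample string `text` is made up for illustration.

```python
import re

# A regular string literal needs four backslashes to produce the
# two-character regex \\ ; a raw string literal needs only two.
plain_pattern = '\\\\'
raw_pattern = r'\\'
print(plain_pattern == raw_pattern)  # both hold the same two characters

# Illustrative input: the 7-character string C:\temp
text = 'C:\\temp'
match = re.search(raw_pattern, text)
print(match.start())   # index of the literal backslash in text
print(match.group())   # the matched backslash itself

# r"\n" is backslash + n (two characters); "\n" is a single newline.
print(len(r"\n"), len("\n"))
```

Both pattern spellings compile to the same regular expression; the raw form is simply easier to read and harder to get wrong.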

It is important to note that most regular expression operations are available as module-level functions and RegexObject methods. The functions are shortcuts that don’t require you to compile a regex object first, but miss some fine-tuning parameters.
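As a rough illustration of that trade-off, the sketch below runs the same search through the module-level shortcut and through a compiled pattern object. The sample text and variable names are invented for the example; the `pos` argument shown is one of the fine-tuning parameters that only the compiled object's `search` method accepts.

```python
import re

text = "order 66, order 99"  # illustrative input

# Module-level shortcut: no explicit compile step.
first = re.search(r"\d+", text)
print(first.group())

# Compiled pattern object: reusable, and its search() method also
# accepts a starting position that re.search() does not offer.
number = re.compile(r"\d+")
second = number.search(text, 9)  # begin scanning at index 9
print(second.group())

# The compiled object exposes the other operations as methods too.
all_numbers = number.findall(text)
print(all_numbers)
```

Compiling once is also convenient when the same pattern is applied many times, since the pattern is written in only one place.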

http://cuiqingcai.com/3179.html

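1.1: Basic Python runtime environment: this tutorial is written for Python 3, so you need Python 3 installed on your computer. I will only cover the Windows setup (those of you comfortable with Linux should not need my help).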

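anaconda (a scientific-computing distribution of Python whose authors have bundled a great many packages. It does not matter if you don't know what they are for; you only need to know that once you have it, the errors you hit when installing packages with pip on Windows will go away.)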

If the download gives you trouble, I have uploaded a copy to Baidu Cloud: http://pan.baidu.com/s/1boAYaTL

1.2: Requests: an upgraded take on urllib that packages all of its functionality and simplifies its usage.

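1.3: beautifulsoup: a Python library for extracting data from HTML or XML files. It works with the parser of your choice to provide idiomatic ways of navigating, searching, and modifying the document. (If you are a beginner, don't show off by reaching for regular expressions; failing to match the content you want is a quick way to lose motivation. Stick with beautifulsoup. It is somewhat slower, but you will come to love it.)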

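1.4: LXML: an HTML parsing package used to help beautifulsoup parse web pages (without anaconda, you will find that installing this package with pip fails on Windows; with anaconda it does not).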

The modules above need to be installed separately; the ones below do not.
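One way to confirm that the separately installed modules are actually available is to ask Python whether it can locate them. This is a minimal sketch using the standard `importlib` module; the helper name `is_installed` is hypothetical, and `bs4` is the import name of the beautifulsoup package.

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python can locate a top-level module by this name."""
    return importlib.util.find_spec(module_name) is not None

# Import names of the third-party modules listed above.
third_party = ["requests", "bs4", "lxml"]
status = {name: is_installed(name) for name in third_party}

for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

Anything reported as missing still needs to be installed, for example through anaconda as described above.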

0 0