Python Basics

Source: Internet | Published by: 部落冲突胖子数据 | Editor: 程序博客网 | Date: 2024/06/07 02:37

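I. Installation, Compilation and Running

Installing Python is easy: download an installer from the official site, http://www.python.org/, and run it. Ubuntu usually ships with Python preinstalled; if it does not, run apt-get install python as root. On Windows, just download and run the .msi package. Python programs are executed by an interpreter, and after installation you will find two ways to start it: IDLE (Python GUI) and Python (command line). The former has a graphical interface; the latter is the same as running python from a command prompt. Once the interpreter starts, it shows the >>> prompt; type a statement after the prompt and it is executed immediately, just like in Matlab.

Also, just as Matlab has .m script files, Python has script files with the .py extension. Besides being run directly, a .py file can be byte-compiled ahead of time. This saves the compile step when the program is loaded, but it does not make the code itself run any faster.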

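For example, suppose we want to print helloWorld.

Method 1: type it directly in the interpreter: >>> print 'helloWorld'

Method 2: put the statement in a file, for example hello.py. There are three ways to run this file:

1) In a terminal: python hello.py

2) Compile it to a .pyc file first: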

import py_compile

py_compile.compile("hello.py")

then, in a terminal: python hello.pyc

3) In a terminal:

python -O -m py_compile hello.py

python hello.pyo

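Once compiled to .pyc or .pyo files, a program starts faster because Python can skip compiling the source; the bytecode itself runs at the same speed (-O only removes assert statements and code that depends on __debug__). This is why code that is loaded repeatedly is often kept in one of these compiled forms. Python also does this automatically: when a module is imported, its compiled .pyc is cached and reused.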
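
As a quick sketch of the compile step, here is py_compile used from Python 3 (an assumption: the rest of this article targets Python 2, where the .pyc lands next to the source instead of in a __pycache__ folder). The temporary directory and the byte_compile helper are illustrative only; py_compile.compile returns the path of the file it wrote.

```python
import os
import py_compile
import tempfile


def byte_compile(source_text):
    """Write source_text to a temporary hello.py and byte-compile it.

    Returns the path of the .pyc file that py_compile wrote.
    """
    tmpdir = tempfile.mkdtemp()
    src = os.path.join(tmpdir, "hello.py")
    with open(src, "w") as f:
        f.write(source_text)
    # doraise=True raises py_compile.PyCompileError on a syntax error
    # instead of only printing a message
    return py_compile.compile(src, doraise=True)


pyc_path = byte_compile("print('helloWorld')\n")
print(pyc_path)  # something like .../__pycache__/hello.cpython-3X.pyc
```

Running the .pyc skips parsing, which is where the saving comes from; the program's own work is unchanged.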

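II. Variables, Operators and Expressions

There is not much to say here; anyone with programming experience in another language will have no trouble, and it is quite similar to Matlab. Details below.

One thing to note: in Python 2, 5/2 equals 2 (integer division); only 5.0/2 equals 2.5.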

###################################  
### compute #######  
# raw_input() reads a line from the keyboard as a string,
# so we convert it to int
# Supported operators include:
# and or not in is < <= != == | ^ & << + - / % ~ **
print 'Please input a number:'
number = int(raw_input())
number += 1
print number**2 # ** is exponentiation (Matlab's ^)
print number and 1  
print number or 1  
print not number  
5/2 # is 2  
5.0/2 # is 2.5, should be noted  
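
For comparison, a minimal sketch of the same arithmetic under Python 3 (not the Python 2 interpreter the examples above assume), where / always performs true division and // is the floor-division operator:

```python
# Python 3 semantics
true_div = 5 / 2       # / always gives a float: 2.5
floor_div = 5 // 2     # // floors the quotient: 2
neg_floor = -5 // 2    # floors toward negative infinity: -3, not -2
remainder = -5 % 2     # the sign follows the divisor: 1
q, r = divmod(5, 2)    # quotient and remainder together: (2, 1)
print(true_div, floor_div, neg_floor, remainder, (q, r))
```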

III. Data Types

1. Numbers

The usual int, long, float and complex types are all supported. You do not declare a type; Python infers it from the value you assign:

###################################  
### type of value #######  
# int, long, float, complex
# no need to declare the type of a variable; python infers it
# from the value you assign
num = 1   # stored as int type
num = 1111111111111   # long on 32-bit builds (larger than sys.maxint there), int on 64-bit builds
num = 1.0   # stored as float type
num = 12L # the L suffix forces long type
num = 1 + 12j # j marks the imaginary part: complex type
num = '1' # string type
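
In Python 3, int and long were merged: int has arbitrary precision, so the L suffix is a syntax error there and large literals stay int. A small check, assuming Python 3:

```python
values = [1, 1111111111111, 2 ** 100, 1.0, 1 + 12j, '1']
# type(v).__name__ gives the name of each value's inferred type
type_names = [type(v).__name__ for v in values]
print(type_names)
```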

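2. Strings

Single quotes, double quotes and triple quotes can all be used to define strings; triple quotes allow a string to span several lines and keep its layout. As a sequence type, a string supports indexing and slicing much like Matlab, except that indices start at 0.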

###################################  
### type of string #######  
num = "1" # string type  
num = "Let's go" # string type  
num = "He's \"old\"" # string type  
mail = "Xiaoyi: \n hello \n I am you!"  
mail = """Xiaoyi: 
    hello 
    I am you! 
    """ # special string format  
string = 'xiaoyi' # get value by index  
copy = string[0] + string[1] + string[2:6] # note: [2:6] covers indices 2 to 5, i.e. the half-open range [2, 6)
copy = string[:4] # from the start (index 0) up to index 3: 'xiao'
copy = string[2:] # from index 2 to the end: 'aoyi'
copy = string[::1] # step is 1, from start to end
copy = string[::2] # step is 2: 'xay'
copy = string[-1] # 'i', the last character
copy = string[-2:-5:-1] # 'yoa': a negative step walks backwards
memAddr = id(num) # id(num) returns the identity of num (its memory address in CPython)
type(num) # get the type of num  
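
The slicing rules above can be checked directly. This sketch assumes Python 3 (only print differs from Python 2 here); it also shows that a negative step needs a start to the right of the stop, otherwise the slice is empty:

```python
s = 'xiaoyi'
slices = {
    's[0] + s[1] + s[2:6]': s[0] + s[1] + s[2:6],
    's[:4]': s[:4],
    's[2:]': s[2:],
    's[::2]': s[::2],
    's[-1]': s[-1],
    's[-2:-5:-1]': s[-2:-5:-1],  # backwards from index 4 down to index 2
    's[::-1]': s[::-1],          # the whole string reversed
    's[-4:-2:-1]': s[-4:-2:-1],  # start left of stop with a negative step: empty
}
for expr, result in slices.items():
    print('%-22s -> %r' % (expr, result))
```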

3. Tuples

A tuple is defined with (). It is like an array that can hold data of different types and is accessed by index, but note that its elements cannot be modified.

###################################  
### sequence type #######  
## can access the elements by index or slice  
## include: string, tuple(or array? structure? cell?), list  
# basic operations of sequence types
firstName = 'Zou'
lastName = 'Xiaoyi'
len(string) # the length
name = firstName + lastName # concatenate 2 strings
firstName * 3 # repeat firstName 3 times
'Z' in firstName # membership test, returns True
string = '123'
max(string) # '3'
min(string) # '1'
cmp(firstName, lastName) # returns 1, -1 or 0 (Python 2 only); here 1
  
## tuple(or array? structure? cell?)  
## define this type using ()  
user = ("xiaoyi", 25, "male")  
name = user[0]  
age = user[1]  
gender = user[2]  
t1 = () # empty tuple  
t2 = (2, ) # a one-element tuple needs an extra comma
user[1] = 26 # TypeError!! tuple elements can not be changed
name, age, gender = user # unpack the three elements into three names
a, b, c = (1, 2, 3)  
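
A short Python 3 sketch of tuple unpacking, the one-element comma rule, and what happens when you try to modify a tuple (the way to "change" one is to build a new tuple):

```python
user = ("xiaoyi", 25, "male")
name, age, gender = user      # unpacking: one name per element

one = (2,)                    # the trailing comma makes a 1-tuple
not_a_tuple = (2)             # parentheses alone: this is just the int 2

try:
    user[1] = 26              # tuples are immutable
    changed = True
except TypeError as exc:
    changed = False
    print("TypeError:", exc)

updated = user[:1] + (26,) + user[2:]   # build a new tuple instead
print(name, age, gender, type(one).__name__, type(not_a_tuple).__name__, updated)
```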

4. Lists

A list is defined with []. It works like a tuple, except that its elements can be modified. list is a class, and it provides many methods for operating on lists.

## list type (the elements can be modified)  
## define this type using []  
userList = ["xiaoyi", 25, "male"]  
name = userList[0]  
age = userList[1]  
gender = userList[2]  
userList[3] = 88888 # IndexError! assigning past the end does not grow the list, unlike Matlab
userList.append(8888) # append a new element at the end
"male" in userList # membership search
userList[2] = 'female' # elements can be modified in place (id(userList) does not change)
userList.remove(8888) # remove the first element equal to 8888
userList.remove(userList[2]) # remove by value: removes 'female'
del(userList[1]) # the del statement removes by index
## help(list.append)

################################
######## object and class ######
## object = properties + methods
## python treats everything as an object; here list is a class,
## and when we define a list "userList", we get an object (an instance of list)
## whose methods we use to operate on the elements
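
The list operations above, as a runnable Python 3 sketch that catches the out-of-range assignment instead of crashing and confirms that in-place edits keep the same list object:

```python
user_list = ["xiaoyi", 25, "male"]
original_id = id(user_list)

try:
    user_list[3] = 88888          # assigning past the end does not grow a list
except IndexError:
    pass

user_list.append(8888)            # ['xiaoyi', 25, 'male', 8888]
user_list[2] = 'female'           # modify in place
user_list.remove(8888)            # remove by value (first match)
del user_list[1]                  # remove by index
print(user_list)                  # ['xiaoyi', 'female']
print(id(user_list) == original_id)  # True: still the same list object
```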

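5. Dictionaries

A dictionary is defined with {}. It stores key-value pairs, which makes it work much like a struct. The dict class also provides methods for creating and manipulating dictionaries.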

################################  
######## dictionary type ######  
## define this type using {}  
item = ['name', 'age', 'gender']  
value = ['xiaoyi', '25', 'male']  
zip(item, value) # zip() will produce a new list:
# [('name', 'xiaoyi'), ('age', '25'), ('gender', 'male')]
# but a list of pairs does not let us look a value up by its key;
# a dictionary expresses exactly this relationship
# as key-value pairs:
# dic = {key1: value1, key2: value2, ...}, keys must be hashable (immutable) and values can be any type
dic = {'name': 'xiaoyi', 'age': 25, 'gender': 'male'}
dic = {1: 'zou', 'age':25, 'gender': 'male'}
# and we access it like this: dic[key1], using the key as the index
print dic['age']
print dic[1]
# other ways to create a dictionary
fdict = dict([['x', 1], ['y', 2]]) # factory function: dict() from a list of pairs
ddict = {}.fromkeys(('x', 'y'), -1) # every key gets the same value (-1 here, None if omitted)
# iterate with a for loop
for key in dic:
    print key
    print dic[key]

# a dictionary is unordered (it is not a sequence),
# so we add a key-value pair simply by assigning to a new key:
dic['tel'] = 88888
# update or delete the elements
del dic[1] # delete this key
dic.pop('tel') # return the value of this key and delete the key
dic.get(1) # get the value of a key; returns None if the key is missing
dic.get(1, 'error') # return a user-defined default if the dictionary does not contain the key
dic.keys()
dic.values()
dic.has_key(key) # Python 2 only; "key in dic" also works
dic.clear() # empty the dictionary
del dic # delete the dictionary itself
# dictionaries support many more operations; use help(dict) to check them out
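
Several dictionary methods shown above changed in Python 3: has_key() was removed in favour of the in operator, and zip() returns an iterator rather than a list. A minimal Python 3 sketch of the same operations:

```python
fdict = dict([['x', 1], ['y', 2]])        # from a list of pairs
ddict = dict.fromkeys(('x', 'y'), -1)     # the same value for every key
zdict = dict(zip(['name', 'age'], ['xiaoyi', 25]))  # pair up two lists

dic = {1: 'zou', 'age': 25, 'gender': 'male'}
dic['tel'] = 88888                        # add a key by assignment
tel = dic.pop('tel')                      # returns 88888 and removes the key
missing = dic.get('tel', 'error')         # a default instead of KeyError
has_age = 'age' in dic                    # Python 3 replacement for has_key
for key, value in dic.items():
    print(key, value)
```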

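IV. Flow Control

In this area Python differs sharply from most other languages: Python uses indented blocks to express program logic, where most other languages use braces or similar delimiters. For example:

if age < 21:
    print("You can't buy alcohol.")
    print("But you can buy chewing gum.")
print("This line is outside the if block.")

This is equivalent to the following C code:

if (age < 21)
{
    printf("You can't buy alcohol.\n");
    printf("But you can buy chewing gum.\n");
}
printf("This line is outside the if block.\n");

As you can see, Python marks where a block begins and ends by indentation (the off-side rule) rather than by braces or keywords. Increasing the indentation starts a block (note the colon on the line before it), and decreasing it ends the block. PEP 8, Python's style guide, recommends 4 spaces per indentation level; the interpreter itself only requires that all lines in the same block are indented by the same amount. Tabs or other numbers of spaces will still run, but they do not follow the coding convention.

So that our code fits well with other people's, it is best to follow the convention and indent with four spaces (and use either all spaces or all tabs, never a mix of the two).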

1. if-else

if-else tests a condition and executes the code of whichever branch matches.

################################  
######## procedure control #####  
## if else  
if expression: # any expression; its truth value is tested. do not forget the colon
    statement(s) # indent with four spaces

if expression:
statement(s) # IndentationError!!!! the body must be indented (four spaces)

if 1<2:
    print 'ok, ' # indent with four spaces
    print 'yeah' # every line of the block uses the same indentation

if True: # capitalized: True, not true
    print 'true'
  
def fun():  
    return 1  
  
if fun():  
    print 'ok'  
else:  
    print 'no'  
      
con = int(raw_input('please input a number:'))  
if con < 2:  
    print 'small'  
elif con > 3:  
    print 'big'  
else:  
    print 'middle'  
      
if 1 < 2:  
    if 2 < 3:  
        print 'yeah'  
    else:  
        print 'no'    
    print 'out'  
else:  
    print 'bad'  
  
if 1 < 2 and 2 < 3 or 2 < 4 and not 0: # boolean operators: and, or, not
    print 'yeah'  
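
A compact Python 3 sketch of the if/elif/else example, wrapped in a function so it can be tested; the falsy list shows which values an if treats as false:

```python
def classify(con):
    """The small/middle/big example above as a testable function."""
    if con < 2:
        return 'small'
    elif con > 3:
        return 'big'
    else:
        return 'middle'


# keep only the values that an if statement treats as false
falsy = [v for v in (0, '', [], {}, None, False, 1, 'a', [0]) if not v]

for n in (1, 2, 3, 4):
    print(n, classify(n))
print(len(falsy))  # 6
```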

2. for

for executes a block of code repeatedly; it can also iterate over variables of the sequence types described above.

################################  
######## procedure control #####  
## for  
for iterating_val in sequence:  
    statements(s)  
# sequence type can be string, tuple or list  
  
for i in "abcd":  
    print i  
  
for i in [1, 2, 3, 4]:  
    print i  
  
# range(start, end, step): step defaults to 1 and start defaults to 0;
# note that the range is [start, end), so end itself is excluded
range(5) # [0, 1, 2, 3, 4]  
range(1, 5) # [1, 2, 3, 4]  
range(1, 10, 2) # [1, 3, 5, 7, 9]  
for i in range(1, 100, 1):   
    print i  
  
# iterate over a sequence by index
fruits = ['apple', 'banana', 'mango']  
for fruit in range(len(fruits)):   
    print 'current fruit: ', fruits[fruit]  
  
# iterate over a dictionary (this yields the keys)
dic = {1: 111, 2: 222, 5: 555}  
for x in dic:  
    print x, ': ', dic[x]  
      
dic.items() # return [(1, 111), (2, 222), (5, 555)]  
for key,value in dic.items(): # because we can: a,b=[1,2]  
    print key, ': ', value  
else: # the else clause of a loop runs when the loop finishes without break
    print 'ending'
  
################################  
import time  
# we also can use: break, continue to control process  
for x in range(1, 11):  
    print x  
    time.sleep(1) # sleep 1s  
    if x == 3:  
        pass # do nothing  
    if x == 2:  
        continue  
    if x == 6:  
        break  
    if x == 7: # never reached here, because the loop already broke at x == 6
        exit() # exit the whole program
    print '#'*50  
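
A Python 3 sketch of the same loop tools. It uses enumerate instead of range(len(...)), and collects values into a list instead of sleeping and printing, so the effect of continue, break and the loop's else clause is easy to see:

```python
fruits = ['apple', 'banana', 'mango']
# enumerate gives index and item together, replacing range(len(...))
indexed = [(i, fruit) for i, fruit in enumerate(fruits)]

visited = []
for x in range(1, 11):
    if x == 2:
        continue          # skip the rest of this iteration
    if x == 6:
        break             # leave the loop; the else clause is skipped
    visited.append(x)
else:
    visited.append('no break')

dic = {1: 111, 2: 222, 5: 555}
pairs = sorted(dic.items())   # [(1, 111), (2, 222), (5, 555)]
print(indexed, visited, pairs)
```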

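3. while

while is also used for looping. It first checks the condition that follows it; if the condition is true, it executes the block after the colon, then tests the condition again, repeating until the condition is false. The indented block after the colon is the loop body.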

################################  
######## procedure control #####  
## while  
while expression:  
    statement(s)  
  
while True:  
    print 'hello'  
    x = raw_input('please input something, q for quit:')  
    if x == 'q':  
        break  
else: # runs only when the condition becomes false; skipped here because we leave via break
    print 'ending'
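
Because raw_input() needs a keyboard, this Python 3 sketch replaces it with a hypothetical list of canned inputs; the second call shows the while loop's else clause running when the loop ends without break:

```python
def read_until_quit(inputs):
    """Stand-in for the raw_input() loop: consume canned inputs until 'q'."""
    seen = []
    i = 0
    while i < len(inputs):
        if inputs[i] == 'q':
            break
        seen.append(inputs[i])
        i += 1
    else:
        seen.append('<no q found>')   # condition became false, no break
    return seen


print(read_until_quit(['a', 'b', 'q', 'c']))  # ['a', 'b']
print(read_until_quit(['a', 'b']))            # ['a', 'b', '<no q found>']
```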

4. switch

Python does not actually provide a switch statement, but we can easily build one from a dictionary and functions. For example:

#############################  
## switch ####  
## python has no switch statement,
## but we can implement one using a dictionary and functions
## cal.py ##
#!/usr/bin/env python

from __future__ import division
# with this import, 5/2=2.5 and 6/2=3.0
  
def add(x, y):  
    return x + y  
def sub(x, y):  
    return x - y  
def mul(x, y):  
    return x * y  
def div(x, y):  
    return x / y  
  
operator = {"+": add, "-": sub, "*": mul, "/": div}  
operator["+"](1, 2) # the same as add(1, 2)  
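operator["%"](1, 2) # KeyError: there is no "%" key
operator.get("+")(1, 2) # the same as add(1, 2); get() returns None instead of raising for a missing key

def cal(x, o, y):
    print operator.get(o)(x, y) # an unknown operator gives None here, so the call raises TypeError
cal(2, "+", 3)
# a dictionary lookup is cleaner, and with many cases faster, than a long if-elif chain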
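
A Python 3 sketch of the same dispatch-table idea. As an illustrative substitution, it maps symbols to functions from the standard operator module rather than the hand-written add/sub/mul/div, and it turns an unknown symbol into a clear ValueError instead of calling None:

```python
import operator

# Dispatch table: symbol -> function from the standard operator module
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,   # true division, like "from __future__ import division"
}


def cal(x, o, y):
    """Apply operator symbol o to x and y; unknown symbols raise ValueError."""
    func = OPS.get(o)
    if func is None:
        raise ValueError("unsupported operator: %r" % (o,))
    return func(x, y)


print(cal(2, "+", 3))   # 5
print(cal(5, "/", 2))   # 2.5
```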

V. Functions

1. User-defined functions

In Python, functions are created with the def statement:

################################  
######## function #####   
def functionName(parameters): # parameters are optional
    bodyOfFunction

def add(a, b):
    return a+b # a function without a return statement returns None
      
a = 100  
b = 200  
sum = add(a, b)  
  
##### function.py #####  
#!/usr/bin/python  
#coding:utf8  # support chinese  
def add(a = 1, b = 2): # default parameters  
    return a+b  # can return any type of data  
# the followings are all ok  
add()  
add(2)  
add(b = 1) # keyword argument: a keeps its default of 1
add(3, 4)  
  
###### global and local variables #####
## global variable: defined outside any function; it can be read
##              anywhere, even inside functions, which is worth noting
## local variable: defined inside a function, and can only be used
##              inside that function
## a local variable shadows a global one with the same name
val = 100 # global value  
def fun():  
    print val # here will access the val = 100  
print val # here will access the val = 100, too  
  
def fun():
    loc = 100 # local variable
    print loc
print loc # NameError: loc is local to fun() and can not be accessed here

def fun():
    global g # declare g as a global variable
    g = 100
    print g

print g # NameError: g does not exist yet, because fun() has not been called
fun()
print g # now g = 100 can be accessed
  
############################  
## other types of parameters  
def fun(x):  
    print x  
# the follows are all ok  
fun(10) # int  
fun('hello') # string  
fun(('x', 2, 3))  # tuple  
fun([1, 2, 3])    # list  
fun({1: 1, 2: 2}) # dictionary  
  
## tuple  
def fun(x, y):  
    print "%s : %s" % (x,y) # %s stands for string  
fun('Zou', 'xiaoyi')  
tu = ('Zou', 'xiaoyi')  
fun(*tu)    # * unpacks the tuple into positional arguments
  
## dictionary  
def fun(name = "name", age = 0):
    print "name: %s" % name
    print "age: %s" % age
dic = {'name': "xiaoyi", 'age': 25} # the keys must match fun()'s parameter names
fun(**dic) # ** unpacks the dictionary into keyword arguments
fun(age = 25, name = 'xiaoyi') # the result is the same
## the advantage: values are passed by parameter name, in any order
  
#############################  
## variable-length (extra) parameters ####
## the tuple
def fun(x, *args): # extra positional arguments are collected in args as a tuple
    print x  
    print args  
# the follows are ok  
fun(10)  
fun(10, 12, 24) # x = 10, args = (12, 24)  
  
## the dictionary  
def fun(x, **args): # extra keyword arguments are collected in args as a dictionary
    print x  
    print args  
# the follows are ok  
fun(10)  
fun(x = 10, y = 12, z = 15) # x = 10, args = {'y': 12, 'z': 15}  
  
# mix of tuple and dictionary  
def fun(x, *args, **kwargs):  
    print x  
    print args  
    print kwargs  
fun(1, 2, 3, 4, y = 10, z = 12) # x = 1, args = (2, 3, 4), kwargs = {'y': 10, 'z': 12}  
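
A Python 3 sketch collecting the parameter forms above into functions that return values, so the results can be checked; collect and bump are illustrative names, not from the original:

```python
def add(a=1, b=2):
    return a + b


def show(name="name", age=0):
    return "name: %s, age: %s" % (name, age)


def collect(x, *args, **kwargs):
    """Return what ended up in each parameter instead of printing it."""
    return x, args, kwargs


counter = 0


def bump():
    global counter      # without this, counter += 1 raises UnboundLocalError
    counter += 1


print(add(), add(2), add(b=1), add(3, 4))          # 3 4 2 7
print(show(**{'name': 'xiaoyi', 'age': 25}))       # name: xiaoyi, age: 25
print(collect(1, 2, 3, 4, y=10, z=12))             # (1, (2, 3, 4), {'y': 10, 'z': 12})
bump()
bump()
print(counter)                                     # 2
```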

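2. Lambda functions

A lambda defines an anonymous function in a single line. It is convenient when a small function is needed only once, for example as an argument to reduce, filter or map: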

#############################  
## lambda function ####  
## define a fast single line function  
fun = lambda x,y : x*y # fun is a function object
fun(2, 3)  
# like  
def fun(x, y):  
    return x*y  
  
## recursion
# n! = n*(n-1)*...*1, e.g. 5! = 5*4*3*2*1 = 120
def recursion(n):
    if n <= 1:
        return 1 # base case: 0! = 1! = 1
    return n * recursion(n-1)

def mul(x, y):
    return x * y
numList = range(1, 6) # [1, 2, 3, 4, 5]
reduce(mul, numList) # 5! = 120
reduce(lambda x,y : x*y, numList) # 5! = 120; the lambda saves defining a separate function

### list comprehensions
numList = [1, 2, 6, 7]  
filter(lambda x : x % 2 == 0, numList)  
print [x for x in numList if x % 2 == 0] # the same as above  
map(lambda x : x * 2 + 10, numList)  
print [x * 2 + 10 for x in numList] # the same as above  
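
The same lambda, reduce, filter and map calls in Python 3, where reduce lives in functools and filter/map return lazy iterators (hence the list() calls). factorial here is a renamed version of the corrected recursion function:

```python
from functools import reduce   # reduce is no longer a builtin in Python 3


def factorial(n):
    """Recursive n! with an explicit base case."""
    if n <= 1:
        return 1
    return n * factorial(n - 1)


num_list = [1, 2, 6, 7]
product = reduce(lambda x, y: x * y, range(1, 6))        # 1*2*3*4*5
evens = list(filter(lambda x: x % 2 == 0, num_list))     # filter is lazy in Python 3
scaled = list(map(lambda x: x * 2 + 10, num_list))       # so is map
same_evens = [x for x in num_list if x % 2 == 0]
same_scaled = [x * 2 + 10 for x in num_list]
print(factorial(5), product, evens, scaled)              # 120 120 [2, 6] [12, 14, 22, 24]
```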

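3. Python built-in functions

Python has many built-in functions. They live in the built-in module (__builtin__ in Python 2) and are mostly implemented in C, while the standard library that ships with Python consists of many modules, most of them .py files that you can find in the installation directory. Knowing which functions exist helps a lot in programming efficiently and saves us from reinventing them. Below are only some commonly used ones: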

###################################  
## built-in function of python ####  
## if you are not sure how to use one, call help() on it
abs, max, min, len, divmod, pow, round, callable,  
isinstance, cmp, range, xrange, type, id, int()  
list(), tuple(), hex(), oct(), chr(), ord(), long()  
  
callable # callable(obj) returns True if obj can be called, e.g. a function
  
isinstance # test type  
numList = [1, 2]  
if type(numList) == type([]):  
    print "It is a list"  
if isinstance(numList, list): # the same as above, return true  
    print "It is a list"  
      
for i in range(1, 10001): pass # range builds a 10000-element list first, which costs memory
for i in xrange(1, 10001): pass # xrange yields the numbers one by one; no list is built
  
## some basic functions about string  
s = 'hello world'
s.capitalize() # 'Hello world': only the first letter is upper-cased (s.title() gives 'Hello World')
s.replace("hello", "good") # 'good world'
ip = "192.168.1.123"
ip.split('.') # return ['192', '168', '1', '123']
help(str.split)

import string
s = 'hello world'
string.replace(s, "hello", "good") # 'good world' (the string-module functions are Python 2 only)
  
## some basic functions about sequence  
len, max, min  
# filter(function or None, sequence)
def fun(x):  
    if x > 5:  
        return True  
numList = [1, 2, 6, 7]  
filter(fun, numList) # get [6, 7], if fun return True, retain the element, otherwise delete it  
filter(lambda x : x % 2 == 0, numList)  
# zip()  
name = ["me", "you"]  
age = [25, 26]  
tel = ["123", "234"]  
zip(name, age, tel) # return a list: [('me', 25, '123'), ('you', 26, '234')]  
# map()  
map(None, name, age, tel) # also return a list: [('me', 25, '123'), ('you', 26, '234')]  
test = ["hello1", "hello2", "hello3"]  
zip(name, age, tel, test) # return [('me', 25, '123', 'hello1'), ('you', 26, '234', 'hello2')]  
map(None, name, age, tel, test) # return [('me', 25, '123', 'hello1'), ('you', 26, '234', 'hello2'), (None, None, None, 'hello3')]  
a = [1, 3, 5]  
b = [2, 4, 6]  
def mul(x, y):  
    return x*y  
map(mul, a, b) # return [2, 12, 30]  
# reduce()  
reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) # return ((((1+2)+3)+4)+5)  
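
In Python 3, zip and map return iterators, and map(None, ...) no longer works; itertools.zip_longest gives the padding behaviour instead. A minimal sketch:

```python
from itertools import zip_longest

name = ["me", "you"]
age = [25, 26]
tel = ["123", "234"]
test = ["hello1", "hello2", "hello3"]

zipped = list(zip(name, age, tel, test))           # stops at the shortest input
padded = list(zip_longest(name, age, tel, test))   # pads with None, like Python 2's map(None, ...)
products = list(map(lambda x, y: x * y, [1, 3, 5], [2, 4, 6]))

s = 'hello world'
print(s.capitalize(), s.title(), s.replace("hello", "good"))
print(isinstance([1, 2], list), callable(len), callable(s))
```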

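VI. Packages and Modules

1. Modules

In Python every .py script defines a module, so we can put a function (or other code) that implements some feature in a .py script, and other .py scripts can then use this module. There are three ways to import it, as shown below: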

###################################  
## package and module ####  
## a .py file define a module which can be used in other script  
## as a script, the name of module is the same as the name of the .py file  
## and we use the name to import to a new script  
## e.g., items.py, import items  
## python contains many .py files, which we can import and use  
# vi cal.py  
def add(x, y):  
    return x + y  
def sub(x, y):  
    return x - y  
def mul(x, y):  
    return x * y  
def div(x, y):  
    return x / y  
  
print "Your answer is: ", add(3, 5)  
  
if __name__ == "__main__":
    r = add(1, 3)  
    print r  
      
# vi test.py  
import cal # runs the top-level code of cal.py once, the first time it is imported
# so this executes the following code in cal.py:
# print "Your answer is: ", add(3, 5)
# it will print "Your answer is: 8"
# but when cal.py is imported, its __name__ is "cal", not "__main__",
# so the block under if __name__ == "__main__" is skipped: r = add(1, 3) does not run
result = cal.add(1, 2)  
print result  
# or  
import cal as c  
result = c.add(1, 2)  
# or  
from cal import add  
result = add(1, 2)  
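
To see the __name__ check in action, this Python 3 sketch writes a hypothetical module cal_demo.py (modelled on cal.py) into a temporary directory, imports it, and captures what it prints. Only the module-level print runs; the __main__ block does not:

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

# Source of a hypothetical module cal_demo.py, modelled on cal.py above
CAL_SOURCE = '''
def add(x, y):
    return x + y

print("Your answer is:", add(3, 5))

if __name__ == "__main__":
    print("main block ran:", add(1, 3))
'''


def import_cal_demo():
    """Write cal_demo.py into a temp dir, import it, and capture its output."""
    tmpdir = tempfile.mkdtemp()
    with open(os.path.join(tmpdir, "cal_demo.py"), "w") as f:
        f.write(CAL_SOURCE)
    sys.path.insert(0, tmpdir)
    importlib.invalidate_caches()   # make the new file visible to the import system
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            module = importlib.import_module("cal_demo")
    finally:
        sys.path.remove(tmpdir)
    return module, buf.getvalue()


cal, output = import_cal_demo()
print(repr(output))      # only the module-level print ran
print(cal.add(1, 2))     # 3
```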

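2. Packages

Each .py file in Python implements some functionality. Sometimes we need several .py files to accomplish a larger task, or we want to keep .py files with related functionality in one place for convenience. Modules can be organized into packages by directory. The steps to create a package:

1) Create a folder named after the package

2) Create an empty __init__.py file in that folder

3) Put .py script files, compiled extensions and subpackages in the folder as needed

4) import pack.m1, pack.m2, pack.m3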

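#### package
## python modules can be organized into packages by directory; to create a package:
# 1. create a folder named after the package
# 2. create an empty __init__.py file in that folder
# 3. put .py script files, compiled extensions and subpackages in the folder as needed
# 4. import pack.m1, pack.m2, pack.m3
# in a shell:
mkdir calSet
cd calSet
touch __init__.py
cp ../cal.py .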
  
# vi test.py  
import calSet.cal  
result = calSet.cal.add(1, 2)  
print result  
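
The four package steps can be scripted. This Python 3 sketch builds a hypothetical package calset_demo containing cal.py in a temporary directory and imports calset_demo.cal. (Since Python 3.3, a directory without __init__.py can also be imported as a namespace package, but the regular-package layout above still works.)

```python
import importlib
import os
import sys
import tempfile


def make_package(root, pkg_name):
    """Create root/pkg_name/ with an empty __init__.py and a cal.py module."""
    pkg_dir = os.path.join(root, pkg_name)
    os.mkdir(pkg_dir)                                           # step 1: the folder
    with open(os.path.join(pkg_dir, "__init__.py"), "w"):       # step 2: empty __init__.py
        pass
    with open(os.path.join(pkg_dir, "cal.py"), "w") as f:       # step 3: a module inside
        f.write("def add(x, y):\n    return x + y\n")
    return pkg_dir


root = tempfile.mkdtemp()
make_package(root, "calset_demo")   # hypothetical package name
sys.path.insert(0, root)
importlib.invalidate_caches()
cal_module = importlib.import_module("calset_demo.cal")   # step 4: import pack.module
sys.path.remove(root)
print(cal_module.add(1, 2))   # 3
```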

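VII. Regular Expressions

A regular expression (often abbreviated in code as regex, regexp or RE) is a concept from computer science. A regular expression uses a single string to describe and match a set of strings that follow a certain syntactic rule. Many text editors use regular expressions to search for and replace text that matches a pattern.

Python provides a powerful regular expression engine, the re module, which we can use to operate on strings with regular expressions. We import this module with import re.

Regular expressions have many rules; used flexibly, they are very efficient for matching strings. For more rules, consult other references.

1. Metacharacters

There are many; here is how some commonly used metacharacters work: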

##############################  
## 正则表达式 RE  
## re module in python  
import re  
rule = r'abc' # the pattern to search for; the r prefix makes a raw string, so backslashes reach re unchanged
re.findall(rule, "aaaaabcaaaaaabcaa") # return ['abc', 'abc']  
  
# [] specifies a character set: [abc] matches any one of the characters a, b, c
rule = r"t[io]p"   
re.findall(rule, "tip tep twp top") # return ['tip', 'top']  
  
# ^ at the start of a set means the complement: [^io] matches any character except i and o
rule = r"t[^io]p"   
re.findall(rule, "tip tep twp top") # return ['tep', 'twp']  
  
# ^ outside a set matches the start of the line: the pattern only matches there
rule = r"^hello"  
re.findall(rule, "hello tep twp hello") # return ['hello']  
re.findall(rule, "tep twp hello") # return []  
  
# $ matches the end of the line
rule = r"hello$"  
re.findall(rule, "hello tep twp hello") # return ['hello']  
re.findall(rule, "hello tep twp") # return []  
  
# - specifies a range
rule = r"x[0123456789]x" # the same as  
rule = r"x[0-9]x"  
re.findall(rule, "x1x x4x xxx") # return ['x1x', 'x4x']  
rule = r"x[a-zA-Z]x"  
  
# \ is the escape character
rule = r"\^hello"  
re.findall(rule, "hello twp ^hello") # return ['^hello']  
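# \d matches a digit character. Equivalent to [0-9].
# \D matches a non-digit character. Equivalent to [^0-9].
# \n matches a newline character. Equivalent to \x0a.
# \r matches a carriage return. Equivalent to \x0d.
# \s matches any whitespace character, including space, tab, form feed, etc. Equivalent to [ \f\n\r\t\v].
# \S matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
# \t matches a tab character. Equivalent to \x09.
# \w matches any word character, including underscore. Equivalent to [A-Za-z0-9_].
# \W matches any non-word character. Equivalent to [^A-Za-z0-9_].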
  
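# {} specifies repetition
# e.g. to check whether a string is a Guangzhou phone number: 020- followed by eight digits
# each of the following three rules works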
rule = r"^020-\d\d\d\d\d\d\d\d$"  
rule = r"^020-\d{8}$" # {8} repeats the preceding rule 8 times
rule = r"^020-[0-9]{8}$"  
re.findall(rule, "020-23546813") # return ['020-23546813']  
  
# * repeats the preceding character 0 or more times
rule = r"ab*"  
re.findall(rule, "a") # return ['a']  
re.findall(rule, "ab") # return ['ab']  
  
# + repeats the preceding character 1 or more times
rule = r"ab+"  
re.findall(rule, "a") # return []  
re.findall(rule, "ab") # return ['ab']  
re.findall(rule, "abb") # return ['abb']  
  
# ? makes the preceding character optional (0 or 1 times)
rule = r"^020-?\d{8}$"
re.findall(rule, "02023546813") # return ['02023546813']
re.findall(rule, "020-23546813") # return ['020-23546813']  
re.findall(rule, "020--23546813") # return []  
  
# ? after a repetition makes it non-greedy (match as little as possible)
rule = r"ab+?"  
re.findall(rule, "abbbbbbb") # return ['ab']  
  
# {m,n} specifies a range of repetitions
rule = r"a{1,3}"  
re.findall(rule, "a") # return ['a']  
re.findall(rule, "aa") # return ['aa']  
re.findall(rule, "aaa") # return ['aaa']  
re.findall(rule, "aaaa") # return ['aaa', 'a']  
  
## compile the re string
rule = r"\d{3,4}-?\d{8}"
re.findall(rule, "020-23546813")
# compiling returns a pattern object that can be reused,
# which avoids re-processing the pattern each time it is used
p_tel = re.compile(rule)  
p_tel.findall("020-23546813")  
  
# the flag re.I makes matching case-insensitive
name_re = re.compile(r"xiaoyi", re.I)  
name_re.findall("Xiaoyi")  
name_re.findall("XiaoYi")  
name_re.findall("xiAOyi")  
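
Several of the metacharacter examples above, collected into one Python 3 sketch so the results can be compared side by side:

```python
import re

phone = re.compile(r"^020-?\d{8}$")
results = {
    'charset': re.findall(r"t[io]p", "tip tep twp top"),
    'negated': re.findall(r"t[^io]p", "tip tep twp top"),
    'anchored': re.findall(r"^hello", "tep twp hello"),
    'optional': [bool(phone.match(s)) for s in ("02023546813", "020-23546813", "020--23546813")],
    'greedy': re.findall(r"ab+", "abbbb"),
    'lazy': re.findall(r"ab+?", "abbbb"),
    'bounded': re.findall(r"a{1,3}", "aaaa"),
    'ignorecase': re.compile(r"xiaoyi", re.I).findall("Xiaoyi XiaoYi xiAOyi"),
}
for key, value in results.items():
    print(key, value)
```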

2. Common functions

Compiled pattern objects (and the re module itself) support many more operations, for example:

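# the pattern object contains some methods we can use
# match checks only the beginning of the string; on a match it returns a match object, otherwise None
obj = name_re.match('Xiaoyi, Zou')
# search scans the whole string (any position); on a match it returns a match object
obj = name_re.search('Zou, Xiaoyi')
# we can then use it to test whether a string contains our regular expression
if obj:
    pass
# findall returns a list of all matches
name_re.findall("Xiaoyi")

# finditer returns an iterator over the match objects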
name_re.finditer("Xiaoyi")  
  
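# regex substitution
rs = r"z..x"
re.sub(rs, 'python', 'zoux ni ziox me') # return 'python ni python me'
re.subn(rs, 'python', 'zoux ni ziox me') # return ('python ni python me', 2), which also contains the number of substitutions

# regex split
expr = "123+345-32*78"
re.split(r'[\+\-\*]', expr) # return ['123', '345', '32', '78']

# print the attributes and methods supported by the re module, then look them up with help
dir(re)

##### flags passed when compiling (or to the re functions) add extra features
# multi-line matching: with re.M, ^ and $ match at every line, not just at the ends of the whole string
text = """hello xiaoyi
xiaoyi hello
hello zou
xiaoyi hello"""
re.findall(r'^xiaoyi', text, re.M) # return ['xiaoyi', 'xiaoyi']
re.findall(r'^xiaoyi', text) # return []: without re.M, ^ only matches at the very start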
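
A Python 3 sketch contrasting match and search, and showing finditer, sub, subn and split on the same inputs as above (the finditer input string is made up for illustration):

```python
import re

name_re = re.compile(r"xiaoyi", re.I)
at_start = name_re.match('Xiaoyi, Zou')      # match object: found at position 0
not_at_start = name_re.match('Zou, Xiaoyi')  # None: match only looks at the start
anywhere = name_re.search('Zou, Xiaoyi')     # search scans the whole string
spans = [m.span() for m in name_re.finditer('xiaoyi and Xiaoyi')]

rs = r"z..x"
replaced = re.sub(rs, 'python', 'zoux ni ziox me')
replaced_n = re.subn(rs, 'python', 'zoux ni ziox me')
parts = re.split(r'[+\-*]', "123+345-32*78")   # inside [], only - needs escaping
print(anywhere.group(), anywhere.start(), spans, replaced, replaced_n, parts)
```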

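3. Groups

Grouping serves two purposes. First, () defines a group, and the rules inside the group (such as the alternatives separated by | below) apply only within that group: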

# () grouping: the | alternatives apply only inside the parentheses
email = r"\w{3}@\w+(\.com|\.cn|\.org)"    
re.match(email, "zzz@scut.com")  
re.match(email, "zzz@scut.cn")  
re.match(email, "zzz@scut.org")  

Second, when a pattern contains a group, findall returns only the part of each match that was captured by the group.

# when the pattern has a group, findall returns only what the group matched
text = """
    idk hello name=zou yes ok d
    hello name=xiaoyi yes no dksl
    dfi lkasf dfkdf hello name=zouxy yes d
    """
r1 = r"hello name=.+ yes"
re.findall(r1, text) # return ['hello name=zou yes', 'hello name=xiaoyi yes', 'hello name=zouxy yes']
r2 = r"hello name=(.+) yes"
re.findall(r2, text) # return ['zou', 'xiaoyi', 'zouxy']
# as you can see, the whole regular expression must match, but only the string inside the () group is returned;
# with this property we can write a crawler that extracts the data we want
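
A short Python 3 sketch of both uses of groups: reading a group from a match object with group(), and how findall's return value changes with one group versus two. The inputs are made up; the non-greedy (.+?) keeps a match from running past the first " yes" on a line:

```python
import re

email = re.compile(r"\w{3}@\w+(\.com|\.cn|\.org)")
m = email.match("zzz@scut.cn")
whole, domain = m.group(0), m.group(1)   # group(0) is the whole match

text = "hello name=zou yes ok d\nhello name=xiaoyi yes no dksl"
names = re.findall(r"hello name=(.+?) yes", text)   # one group -> list of strings
pairs = re.findall(r"(\w+)=(\w+)", "a=1 b=2")        # two groups -> list of tuples
print(whole, domain, names, pairs)
```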

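4. A small example: a crawler

This example uses the regular expressions above, together with the way groups make findall return only the captured text, to implement a small crawler. It downloads all images with the .jpg extension from a given web page.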

## a small crawler
## downloads all the images from a Tieba post or a Qzone page
## getJpg.py  
  
#!/usr/bin/python  
import re  
import urllib  
  
# Get the source code of a website  
def getHtml(url):  
    print 'Getting html source code...'  
    page = urllib.urlopen(url)
    html = page.read()  
    return html  
  
# Open the website and check up the address of images,  
# and find the common features to decide the re_rule  
def getImageAddrList(html):  
    print 'Getting all address of images...'  
    rule = r"src=\"(.+\.jpg)\" pic_ext"  
    imReg = re.compile(rule)  
    imList = imReg.findall(html)
    return imList  
  
def getImage(imList):  
    print 'Downloading...'  
    name = 1
    for imgurl in imList:  
        urllib.urlretrieve(imgurl, '%s.jpg' % name)  
        name += 1  
    print 'Got ', len(imList), ' images!'  
  
## main  
htmlAddr = "http://tieba.baidu.com/p/2510089409"  
html = getHtml(htmlAddr)  
imList = getImageAddrList(html)  
getImage(imList)  
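
The crawler above uses Python 2's urllib. In Python 3, urlopen and urlretrieve live in urllib.request. The sketch below assumes the page is UTF-8 and switches the rule to a non-greedy .+? so that two images on one line are not merged into one match; only the offline regex step is exercised at the end, with a made-up HTML snippet:

```python
import re
from urllib.request import urlopen, urlretrieve

# Non-greedy .+? so that two <img> tags on one line give two URLs
IMG_RULE = re.compile(r'src="(.+?\.jpg)" pic_ext')


def get_html(url):
    """Download a page and decode it (assumes UTF-8; adjust for other encodings)."""
    with urlopen(url) as page:
        return page.read().decode("utf-8", errors="replace")


def get_image_addr_list(html):
    return IMG_RULE.findall(html)


def get_images(im_list):
    """Save each image as 1.jpg, 2.jpg, ... in the current directory."""
    for name, img_url in enumerate(im_list, start=1):
        urlretrieve(img_url, "%s.jpg" % name)
    return len(im_list)


# Offline check of the extraction step on a made-up snippet (no network access)
sample = ('<img src="http://example.com/a.jpg" pic_ext="jpeg">'
          '<img src="http://example.com/b.jpg" pic_ext="jpeg">')
print(get_image_addr_list(sample))
```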

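VIII. Deep Copy and Shallow Copy

When copying data in Python, there is an important difference to be aware of:

A shallow copy creates a new outer (parent) object but only copies references to the objects inside it; a deep copy recursively copies the object and everything it contains. The difference in detail: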

##############################  
### memory operation  
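## shallow copy: copies references to the inner objects (only the parent object is copied)
## deep copy: copies the objects themselves, recursively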
  
a = [1, 2, 3]  
b = a # id(a) == id(b): the same object under two names, i.e. a reference
a.append(4) # a = [1, 2, 3, 4], and b also change to = [1, 2, 3, 4]  
  
import copy  
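a = [1, 2, ['a', 'b']] # a nested list
c = copy.copy(a)  # id(c) != id(a)
a.append('d') # a = [1, 2, ['a', 'b'], 'd'] but c is unchanged
# but this is only a shallow copy: just the parent object is copied,
# so id(a[0]) == id(c[0]) and id(a[2]) == id(c[2]); appending to a does not affect c,
# but mutating an inner object that a and c share shows up in both
a[2].append('d') # will change c, too: c[2] is the same list as a[2]
a[1] = 3 # does NOT change c: this rebinds a[1] to a new object, c[1] is still 2

# deep copy
d = copy.deepcopy(a) # copies everything: from now on a and d go their separate ways
# and have nothing to do with each other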
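
The shallow versus deep copy behaviour, as a Python 3 sketch with the expected contents printed. Unlike the snippet above, the deep copy is taken before the changes, so it shows the original data surviving:

```python
import copy

a = [1, 2, ['a', 'b']]
c = copy.copy(a)          # shallow: new outer list, shared inner list
d = copy.deepcopy(a)      # deep: the inner list is copied too

a.append('d')             # only a changes
a[2].append('d')          # the shared inner list changes, so c sees it
a[1] = 3                  # rebinding a slot in a does not affect c

print(a)   # [1, 3, ['a', 'b', 'd'], 'd']
print(c)   # [1, 2, ['a', 'b', 'd']]
print(d)   # [1, 2, ['a', 'b']]
print(c[2] is a[2], d[2] is a[2])   # True False
```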

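IX. Files and Directories

1. Reading and writing files

File operations in Python are not very different from other languages: files are accessed through open() or the file class (Python 2 only). Python also provides many methods for moving data between file contents and lists and similar types, as follows: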

########################  
## file and directory  
# file_handler = open(filename, mode)  
# mode is the same as in other programming languages
## read  
# method 1  
fin = open('./test.txt')  
fin.read()  
fin.close()  
  
# method 2, class file  
fin = file('./test.txt')  
fin.read()  
fin.close()  
  
## write  
fin = open('./test.txt', 'r+') # r, r+, w, w+, a, a+, b, U  
fin.write('hello')  
fin.close()  
  
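### methods of file objects
## help(file)

for i in open('test.txt'): # iterating over a file yields one line at a time
    print i

fin = open('test.txt', 'r+') # the earlier handle was closed, so open the file again
s = fin.readline() # read one line
lines = fin.readlines() # read the remaining lines into a list, one line per element

# move the file pointer: seek(offset, whence)
fin.seek(0, 0) # whence=0: offset from the start of the file
fin.seek(0, 1) # whence=1: offset from the current position
fin.seek(-1, 2) # whence=2: offset from the end (here one byte before the end)

fin.seek(0, 0)
fin.next() # return the current line and move to the next one (Python 2)

# write several lines from a list, appended at the end of the file
fin.seek(0, 2)
fin.writelines(lines)

# commit updates
fin.flush() # written data is normally buffered until close(); flush() writes it to the file immediately
fin.close()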
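
In both Python 2.6+ and Python 3, the with statement is the usual way to make sure a file gets closed. A Python 3 sketch that writes, appends and reads back a temporary test.txt (the path is created with tempfile, an assumption to keep the example self-contained):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")

# "with" closes the file automatically, even if an exception occurs
with open(path, "w") as fout:
    fout.writelines(["line 1\n", "line 2\n"])   # writelines adds no newlines itself

with open(path, "a") as fout:
    fout.write("line 3\n")                      # "a" appends at the end

with open(path) as fin:
    first = fin.readline()      # 'line 1\n'
    rest = fin.readlines()      # ['line 2\n', 'line 3\n']
    fin.seek(0)                 # back to the start
    stripped = [line.rstrip("\n") for line in fin]   # iterate line by line

print(first, rest, stripped)
```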

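2. The os module

The os module provides many operating-system operations, such as working with directories. We need import os to bring in the module before using it.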

#########################  
## OS module  
## directory operation should import this  
import os  
  
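os.mkdir('xiaoyi') # mkdir
os.makedirs('a/b/c', mode = 0o755) # create nested directories (the mode is octal)
os.listdir('.') # ls: returns the names of files and folders at this level in a list (subdirectories are not listed recursively)
os.chdir('xiaoyi') # cd
os.getcwd() # pwd
os.chdir('..')
os.rmdir('xiaoyi') # rmdir: removes an empty directory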
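
A Python 3 sketch of the same calls, run inside a temporary directory so it does not touch the current one. Note that os.rmdir only removes empty directories (shutil.rmtree removes a whole tree):

```python
import os
import tempfile

base = tempfile.mkdtemp()
start_dir = os.getcwd()
try:
    os.chdir(base)                                          # cd
    os.mkdir('xiaoyi')                                      # mkdir
    os.makedirs(os.path.join('a', 'b', 'c'), mode=0o755)    # mkdir -p
    entries = sorted(os.listdir('.'))                       # top level only
    os.rmdir('xiaoyi')                                      # only removes an empty directory
    after = sorted(os.listdir('.'))
finally:
    os.chdir(start_dir)                                     # restore the working directory
print(entries, after)   # ['a', 'xiaoyi'] ['a']
```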

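3. Directory traversal

Directory traversal underlies many common kinds of programs, such as antivirus software, junk-file cleaners and file-search tools, because they all need to scan every file under a directory, including its subdirectories. Here are two ways to traverse a directory:

1) Recursion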

#!/usr/bin/python  
#coding:utf8  
import os  
  
def dirList(path):  
    fileList = os.listdir(path)  
    allFile = []  
    for fileName in fileList:  
        # allFile.append(path + '/' + fileName) # the same as below
        filePath = os.path.join(path, fileName)
        if os.path.isdir(filePath):
            allFile.extend(dirList(filePath)) # collect the files found in the subdirectory
        allFile.append(filePath)
    return allFile  

2) The os.walk function

# os.walk returns a generator; each item is a 3-tuple (directory path, subdirectory names, file names)
allFile = []
for path, dirs, filelist in os.walk('/'):
    for filename in filelist:
        allFile.append(os.path.join(path, filename))
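
Both traversal methods can be checked against each other. In this Python 3 sketch, dir_list is a variant of the recursive dirList above that returns only files (no directory entries), so its result can be compared with os.walk; the small directory tree is hypothetical:

```python
import os
import tempfile


def dir_list(path):
    """Recursive version: every file path under path (directories excluded)."""
    result = []
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isdir(full):
            result.extend(dir_list(full))   # keep what the recursive call found
        else:
            result.append(full)
    return result


def walk_list(path):
    """os.walk version: each step yields (dirpath, dirnames, filenames)."""
    return [os.path.join(dirpath, f)
            for dirpath, _dirnames, filenames in os.walk(path)
            for f in filenames]


# Build a small hypothetical tree: root/x.txt and root/sub/deeper/y.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub", "deeper"))
for rel in ("x.txt", os.path.join("sub", "deeper", "y.txt")):
    with open(os.path.join(root, rel), "w"):
        pass

print(sorted(dir_list(root)) == sorted(walk_list(root)))   # True
```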

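X. Exception Handling

An exception signals an error, and an unhandled exception terminates the program. The exception mechanism gives developers a way to detect errors at run time, recover from them, and continue executing.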

###################################  
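### exception handling
# the exception mechanism gives developers a way to detect errors at run time,
# recover from them, and continue executing

# use try to attempt some code; if an error occurs, an exception is raised,
# caught by an except clause, and handled by the code we write there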
try:  
    fin = open("abc.txt")  
    print hello  
    ### your usually process code here  
except IOError as msg: # "except IOError, msg:" is the older Python 2-only spelling
    print "No such file!"
    ### your code to handle this error
except NameError as msg:
    print msg
    ### your code to handle this error
finally: # this block runs whether or not an exception occurred
    print 'ok'  
  
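# raise an exception; it must be an exception class or instance (a built-in one or a subclass of Exception)
filename = "hello"
if filename == "hello":
    raise TypeError("Nothing!!")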
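
A Python 3 sketch of the same ideas, adding the else clause of try (it runs only when no exception occurred) and a custom exception class; read_first_line, check and NothingError are illustrative names, not from the original:

```python
def read_first_line(filename):
    """Return the first line of a file, or None if it cannot be opened."""
    try:
        fin = open(filename)
    except (IOError, OSError) as exc:     # IOError is an alias of OSError in Python 3
        print("cannot open %s: %s" % (filename, exc))
        return None
    else:                                 # runs only if the try block raised nothing
        try:
            return fin.readline()
        finally:                          # always runs, so the file is always closed
            fin.close()


class NothingError(Exception):
    """A hypothetical custom exception type."""


def check(filename):
    if filename == "hello":
        raise NothingError("Nothing!!")
    return filename
```

Catching a custom subclass works exactly like catching a built-in exception: except NothingError as exc.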
