Python File Operations

Source: Internet  Published by: 东方财富知乎  Editor: 程序博客网  Date: 2024/05/21 10:23

File object reference

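filepath = r"E:\workpalce\Python\data\Andersen's_Fairy_Tales\100.txt"
f = open(filepath, mode='a+')  # originally Python 2's file(); open() is the preferred spelling
# mode the file was opened with
f.mode
# whether the file has been closed
f.closed
# encoding
f.encoding
# whether the file is connected to a terminal
f.isatty()
f.fileno()
f.name
f.readline()
# current position
f.tell()
# move the position: seek(offset, whence=0); whence=0 counts from the start, 1 from the current position, 2 from the end
f.seek(10, 1)
f.tell()
# close the file
f.close()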
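The positioning calls are easier to follow with concrete numbers. The sketch below is written for Python 3 and uses a temporary scratch file (`demo_path` is a made-up name, not the path from the listing). It opens the file in binary mode, because Python 3 text-mode files do not accept non-zero seeks relative to the current position or the end.

```python
import os
import tempfile

# Hypothetical scratch file, so the sketch does not depend on any real path.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

with open(demo_path, "wb") as f:
    f.write(b"first line\nsecond line\n")

with open(demo_path, "rb") as f:
    first = f.readline()        # reads up to and including the first newline
    pos_after_line = f.tell()   # byte offset just past the first line
    f.seek(3, os.SEEK_CUR)      # whence=1: move 3 bytes forward from here
    pos_after_seek = f.tell()
    f.seek(-5, os.SEEK_END)     # whence=2: 5 bytes before the end of the file
    tail = f.read()
    closed_inside = f.closed    # still open inside the with block

closed_outside = f.closed       # the with block closed it
os.remove(demo_path)

print(first, pos_after_line, pos_after_seek, tail, closed_inside, closed_outside)
```

Using a `with` block means `close()` is called even if an exception is raised part-way through.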

The fileinput module

import fileinput

fileinput.input(files=None, inplace=False, backup='', bufsize=0, mode='r', openhook=None)

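files: a list of file paths, e.g. ['1.txt', '2.txt', ...]. If omitted, the files named in sys.argv[1:] are read, falling back to stdin when that list is empty.
inplace: whether standard output is written back into the file being read. Defaults to False, so files are not overwritten.
backup: extension for the backup file; give only the extension, e.g. .bak. An existing backup file with the same name is silently overwritten.
bufsize: buffer size, 0 by default. It can be raised for very large files, but the default is normally fine.
mode: open mode; the default 'r' is read-only.
openhook: a hook that controls how every file is opened, for example to set the encoding.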

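Common functions

fileinput.input() # returns an object that can be iterated over in a for loop
fileinput.filename() # returns the name of the file currently being read
fileinput.lineno() # returns the cumulative number of the line just read, counted across all files
fileinput.filelineno() # returns the line number within the current file
fileinput.isfirstline() # checks whether the line just read is the first line of its file
fileinput.isstdin() # checks whether the line just read came from stdin
fileinput.close() # closes the sequence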
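To see how these counters relate to each other, here is a small Python 3 sketch that reads two temporary files in one pass. The file names `a.txt` and `b.txt` and their contents are invented for the illustration.

```python
import fileinput
import os
import shutil
import tempfile

# Two hypothetical scratch files stand in for real data files.
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [("a.txt", "a1\na2\n"), ("b.txt", "b1\nb2\nb3\n")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as fh:
        fh.write(text)
    paths.append(path)

records = []
for line in fileinput.input(paths):
    records.append((
        os.path.basename(fileinput.filename()),  # file the line came from
        fileinput.lineno(),                      # running count over all files
        fileinput.filelineno(),                  # count restarts in each file
        fileinput.isfirstline(),                 # True only on line 1 of a file
    ))
fileinput.close()
shutil.rmtree(tmpdir)

for record in records:
    print(record)
```

Note that `lineno()` keeps counting when the second file starts, while `filelineno()` goes back to 1.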

for line in fileinput.input(filepath):
    print line
fileinput.close()

1872
FAIRY TALES OF HANS CHRISTIAN ANDERSEN
THE SNOW QUEEN IN SEVEN STORIES
by Hans Christian Andersen
summer,- warm, beautiful summer
...........
THE END
LastIndexNext    Written By Anderson

%ls -l -h  E:\workpalce\Python\regular_python\ipython\data

 Volume in drive E is NewDisk
 Volume Serial Number is 283B-5BA5

 Directory of E:\workpalce\Python\regular_python\ipython

 Directory of E:\workpalce\Python\regular_python\ipython

 Directory of E:\workpalce\Python\regular_python\ipython\data

2016/09/25  10:54    <DIR>          .
2016/09/25  10:54    <DIR>          ..
2016/08/11  15:32             8,612 01.txt
2016/08/11  15:32             4,110 02.txt
2016/08/11  15:32             5,848 03.txt
2016/08/11  15:32             2,791 04.txt
2016/08/11  15:32            74,135 05.txt
2016/08/11  15:32             9,676 06.txt
2016/08/11  15:32            23,267 07.txt
   ...........
              30 File(s)        458,105 bytes
               2 Dir(s)  92,106,444,800 bytes free
File Not Found

import os
files = r"E:\workpalce\Python\regular_python\ipython\data\\"
filepaths = [files + line for line in os.listdir(files)]

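Using fileinput to process several files in one loop (with inplace=True the same loop can also rewrite their contents in place)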

for line in fileinput.input(filepaths):
    if fileinput.isfirstline():
        print fileinput.filename()
fileinput.close()

E:\workpalce\Python\regular_python\ipython\data\\01.txt
E:\workpalce\Python\regular_python\ipython\data\\02.txt
E:\workpalce\Python\regular_python\ipython\data\\03.txt
E:\workpalce\Python\regular_python\ipython\data\\04.txt
E:\workpalce\Python\regular_python\ipython\data\\05.txt
E:\workpalce\Python\regular_python\ipython\data\\06.txt
E:\workpalce\Python\regular_python\ipython\data\\07.txt
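The loop above only prints file names. For completeness, here is a minimal Python 3 sketch of in-place editing with the `inplace` and `backup` parameters. The file `note.txt` and the upper-casing transformation are assumptions chosen for the demonstration.

```python
import fileinput
import os
import shutil
import tempfile

tmpdir = tempfile.mkdtemp()
target = os.path.join(tmpdir, "note.txt")  # hypothetical file name
with open(target, "w") as fh:
    fh.write("alpha\nbeta\n")

# With inplace=True, whatever is printed inside the loop replaces the
# corresponding content of the file; the original is kept as note.txt.bak.
for line in fileinput.input([target], inplace=True, backup=".bak"):
    print(line.rstrip("\n").upper())
fileinput.close()

with open(target) as fh:
    edited = fh.read()
with open(target + ".bak") as fh:
    original = fh.read()
shutil.rmtree(tmpdir)
```

Anything the loop does not print is dropped from the file, so a line that should stay unchanged still has to be printed.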

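Reading files with Python's linecache module, in detail

Reading files with the linecache module
Python ships a handy module, linecache, that can fetch any line from any file. It caches file contents, which speeds up the common case of reading many lines from the same file.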

linecache.getlines(filename)
Returns the whole content of the file named filename as a list with one element per line; line number linenum is stored at index linenum-1.

linecache.getline(filename,lineno)
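Returns line lineno of the file named filename. This function never raises an exception: on error it returns '' (a line that is found includes its trailing newline).
If the file is not found and filename is a relative path, the function also searches the directories in sys.path.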

linecache.clearcache()
Clears the cache. Use it when you no longer need the lines previously read with getline().

linecache.checkcache(filename)
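Checks the cache for validity. Use it when a cached file may have changed on disk and you need the updated version. If filename is omitted, every entry in the cache is checked.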

linecache.updatecache(filename)
Refreshes the cache entry for filename. If the file has been modified, calling this updates the list returned by linecache.getlines(filename).

import linecache
linecache_test = linecache.getline(filepath, lineno=8)
linecache.clearcache()

linecache_test

'You must attend to the commencement of this story, for when we get\n'
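The behaviours described above (1-based line numbers, an empty string instead of an exception, and cache revalidation) can be checked with a short Python 3 sketch. It uses a temporary file with made-up contents rather than the fairy-tale text.

```python
import linecache
import os
import tempfile

# Hypothetical scratch file with three short lines.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(demo_path, "w") as fh:
    fh.write("one\ntwo\nthree\n")

second = linecache.getline(demo_path, 2)    # line numbers start at 1
missing = linecache.getline(demo_path, 99)  # out of range: '' rather than an error
all_lines = linecache.getlines(demo_path)   # list, index = line number - 1

# Rewrite the file on disk, then ask linecache to revalidate its copy.
with open(demo_path, "w") as fh:
    fh.write("uno\ndos\ntres\ncuatro\n")
linecache.checkcache(demo_path)
second_after = linecache.getline(demo_path, 2)

linecache.clearcache()
os.remove(demo_path)
```

`checkcache()` drops an entry when the file's size or modification time no longer matches what was recorded, so the next `getline()` rereads the file.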

itertools: building iterators

from itertools  import *

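count(5, 2) # infinite integer iterator starting at 5 with step 2: 5, 7, 9, 11, 13, 15 ...
cycle('abc') # repeats the elements of a sequence forever: a, b, c, a, b, c ...
repeat(1.2) # repeats 1.2 forever: 1.2, 1.2, 1.2, ...
repeat also accepts a limit on the number of repetitions:
repeat(10, 5) # repeats 10 five times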
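Because `count`, `cycle` and an unbounded `repeat` never stop, they are usually combined with something that does. A minimal sketch, using `islice` to take a fixed number of items (the lengths 6, 7 and 3 are arbitrary choices):

```python
from itertools import count, cycle, islice, repeat

first_counts = list(islice(count(5, 2), 6))      # first six values of count(5, 2)
first_cycle = list(islice(cycle("abc"), 7))      # seven items from the endless cycle
bounded_repeat = list(repeat(10, 5))             # already finite: five items
unbounded_repeat = list(islice(repeat(1.2), 3))  # cut the infinite repeat to three

print(first_counts)
print(first_cycle)
print(bounded_repeat)
print(unbounded_repeat)
```

Calling `list()` directly on `count(5, 2)` or `cycle("abc")` would never return.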

Iterators terminating on the shortest input sequence:

chain(p, q, ...) --> p0, p1, ... plast, q0, q1, ...

compress(data, selectors) --> (d[0] if s[0]), (d[1] if s[1]), ...

dropwhile(pred, seq) --> seq[n], seq[n+1], starting when pred fails

groupby(iterable[, keyfunc]) --> sub-iterators grouped by value of keyfunc(v)

ifilter(pred, seq) --> elements of seq where pred(elem) is True

ifilterfalse(pred, seq) --> elements of seq where pred(elem) is False

islice(seq, [start,] stop [, step]) --> elements from seq[start:stop:step]

imap(fun, p, q, ...) --> fun(p0, q0), fun(p1, q1), ...

starmap(fun, seq) --> fun(*seq[0]), fun(*seq[1]), ...

tee(it, n=2) --> (it1, it2 , ... itn) splits one iterator into n

takewhile(pred, seq) --> seq[0], seq[1], until pred fails

izip(p, q, ...) --> (p[0], q[0]), (p[1], q[1]), ...

izip_longest(p, q, ...) --> (p[0], q[0]), (p[1], q[1]), ...

Combinatoric generators:

product(p, q, ... [repeat=1]) --> cartesian product

permutations(p[, r])

combinations(p, r)

combinations_with_replacement(p, r)

groupby(iterable[, keyfunc]) -> create an iterator which returns
(key, sub-iterator) grouped by each value of key(value).

list(imap(pow,[1,2,3,4],[1,2,3,2]))

[1, 4, 27, 16]

list(ifilter(lambda x: x > 5, [2, 3, 5, 6, 7]))

[6, 7]

list(ifilterfalse(lambda x: x > 5, [2, 3, 5, 6, 7]))

[2, 3, 5]

list(izip(range(3),range(3,6),range(6,9)))

[(0, 3, 6), (1, 4, 7), (2, 5, 8)]
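The four calls above use Python 2 names. In Python 3 the built-in `map`, `filter` and `zip` are already lazy, so `imap`, `ifilter` and `izip` no longer exist in itertools, and `ifilterfalse` and `izip_longest` are spelled `filterfalse` and `zip_longest`. A sketch of the same examples for Python 3 (the `zip_longest` call and its `"-"` fill value are added for illustration):

```python
from itertools import filterfalse, zip_longest

powers = list(map(pow, [1, 2, 3, 4], [1, 2, 3, 2]))          # was imap
big = list(filter(lambda x: x > 5, [2, 3, 5, 6, 7]))         # was ifilter
small = list(filterfalse(lambda x: x > 5, [2, 3, 5, 6, 7]))  # was ifilterfalse
zipped = list(zip(range(3), range(3, 6), range(6, 9)))       # was izip

# zip stops at the shortest input; zip_longest pads the shorter ones instead.
padded = list(zip_longest("ab", "xyz", fillvalue="-"))       # was izip_longest

print(powers, big, small, zipped, padded, sep="\n")
```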

tee(range(10), 2)   # split one iterable into n independent iterators

(<itertools.tee at 0x3804088>, <itertools.tee at 0x3804248>)
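The two objects returned by `tee` can be consumed separately, each yielding the full sequence. A minimal sketch (the names `first` and `second` and the two computations are illustrative):

```python
from itertools import tee

first, second = tee(range(10), 2)

# Each copy can be consumed on its own; exhausting one does not exhaust the other.
evens = [x for x in first if x % 2 == 0]
total = sum(second)

print(evens, total)
```

After calling `tee`, the original iterator should not be advanced directly, because the copies would miss the items consumed that way.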

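chain([1, 2, 3], [4, 5, 7])   # chain two iterables end to end
product('abc', [1, 2])   # Cartesian product of several iterables; equivalent to nested loops
permutations('abc', 2)   # permutations: ordered picks of two elements from 'abc', e.g. ab, ba, bc, ..., emitted in lexicographic order of the input as a new iterator
combinations('abc', 2)   # combinations: unordered picks of two elements from 'abc', e.g. ab, bc, ..., emitted in lexicographic order of the input as a new iterator
combinations_with_replacement('abc', 2)   # like combinations, but the same element may be picked twice, which adds aa, bb, cc
islice(range(10), 3, 7)   # slice that produces an iterator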

<itertools.combinations at 0x39e3ef8>
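The interpreter only shows an iterator object, so the actual elements are not visible. Wrapping each call in `list()` makes them so. The sketch below does that for the calls above; joining tuples into strings is only for readability.

```python
from itertools import (chain, combinations, combinations_with_replacement,
                       islice, permutations, product)

chained = list(chain([1, 2, 3], [4, 5, 7]))
pairs = list(product("abc", [1, 2]))
perms = ["".join(p) for p in permutations("abc", 2)]
combos = ["".join(c) for c in combinations("abc", 2)]
combos_rep = ["".join(c) for c in combinations_with_replacement("abc", 2)]
sliced = list(islice(range(10), 3, 7))

print(chained)
print(pairs)
print(perms)
print(combos)
print(combos_rep)
print(sliced)
```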

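groupby applies the key function to each element of the original iterator and collects elements that produce the same result into a new iterator, labelled with that result. It only merges adjacent elements with equal keys, so the input is normally sorted by the same key first.

Think of the heights of a group of people as the iterator. We can use a key function like this: return "tall" if the height is above 180, "short" if it is below 160, and "middle" otherwise. In the end all the heights are split into three iterators: "tall", "short" and "middle".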

def height_class(h):
    if h > 180:
        return "tall"
    elif h < 160:
        return "short"
    else:
        return "middle"

friends = [191, 158, 159, 165, 170, 177, 181, 182, 190]
friends = sorted(friends, key = height_class)
for m, n in groupby(friends, key = height_class):
    print(m)
    print(list(n))

middle
[165, 170, 177]
short
[158, 159]
tall
[191, 181, 182, 190]
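The `sorted` call in the example matters: `groupby` starts a new group whenever the key changes, so unsorted input can yield the same label more than once. A short sketch with the same data and key function shows the difference.

```python
from itertools import groupby

def height_class(h):
    if h > 180:
        return "tall"
    elif h < 160:
        return "short"
    else:
        return "middle"

friends = [191, 158, 159, 165, 170, 177, 181, 182, 190]

# Without sorting: a group ends as soon as the key changes.
unsorted_groups = [(label, list(group))
                   for label, group in groupby(friends, key=height_class)]

# With sorting by the same key: one group per label.
sorted_groups = [(label, list(group))
                 for label, group in groupby(sorted(friends, key=height_class),
                                             key=height_class)]

print(unsorted_groups)
print(sorted_groups)
```

In the unsorted run, "tall" appears twice because 191 is separated from the other tall heights.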

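Persistence: serialization

marshal (as observed with the Python 2 used here) serializes every element directly, whereas pickle serializes an object that appears more than once only a single time and then refers back to it. pickle therefore produces more compact output when the data is highly repetitive.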

marshal

import marshal
list_ = list(range(10))
getmarshal = marshal.dumps(list_)  # the original passed an undefined name, aa
# print getmarshal
# marshal.loads(getmarshal)
len(getmarshal)

pickle, cPickle

import pickle
rep_list_ = list(repeat(list_, 2))
len(pickle.dumps(rep_list_))
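The effect of pickle's handling of repeated objects can be made visible by comparing a list that references the same object twice with one that holds two equal copies. This is a Python 3 sketch with invented names (`inner`, `shared`, `copies`). It makes no size claim about marshal, since marshal's format varies between Python versions.

```python
import marshal
import pickle

inner = list(range(10))
shared = [inner, inner]        # the same list object referenced twice
copies = [inner, list(inner)]  # two equal but distinct list objects

# pickle writes the shared object once and a short back-reference afterwards.
shared_size = len(pickle.dumps(shared))
copies_size = len(pickle.dumps(copies))

# The sharing survives a round trip: both slots point at one object again.
restored = pickle.loads(pickle.dumps(shared))
still_shared = restored[0] is restored[1]

# marshal round-trips the values too; only equality is checked here.
marshal_roundtrip = marshal.loads(marshal.dumps(shared))

print(shared_size, copies_size, still_shared)
```

`repeat(list_, 2)` in the listing above yields the same object twice, which is the `shared` case here.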

0 0