cPickle.load and cPickle.dump

Source: Internet  Editor: 程序博客网  Time: 2024/05/14 14:06

Loading and storing data:

Some training-set data is stored in .pkl files. The cPickle module makes reading them very convenient.

# Load the cPickle module (Python 2 only; in Python 3 use the pickle module instead)
import cPickle

The two most commonly used functions are:

pickle.dump(obj, file[, protocol])
pickle.load(file)
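The round trip between these two calls can be sketched as follows. This is a minimal sketch rather than the original author's code: the file name `example.pkl`, the sample dictionary, and the temporary directory are assumptions made for illustration, and the `try`/`except` import is only there so that the snippet also runs on Python 3, where `cPickle` no longer exists as a separate module.

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2: the C implementation
except ImportError:
    import pickle  # Python 3: cPickle was merged into pickle

# Hypothetical sample object; any picklable object works.
obj = {"name": "mnist", "shape": (28, 28), "labels": list(range(10))}

# Hypothetical file location inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.pkl")

# dump(obj, file, protocol): serialize obj into an open binary file.
with open(path, "wb") as f:
    pickle.dump(obj, f, 2)

# load(file): read the object back from an open binary file.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == obj)
```

Opening the file in binary mode (`"wb"` / `"rb"`) and using `with` keeps the snippet portable and makes sure the file is closed after each step.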


For example:

# Load the dataset
import gzip
f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()
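The same loading pattern can be tried without downloading the real dataset. The sketch below is an illustration under stated assumptions: `tiny.pkl.gz` and the three small `(inputs, labels)` pairs are hypothetical stand-ins for `mnist.pkl.gz`, written first so that there is something to load.

```python
import gzip
import os
import tempfile

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

# Hypothetical stand-in for mnist.pkl.gz: three small (inputs, labels) pairs.
splits = (([0.0, 0.5], [1]), ([0.25], [0]), ([1.0], [7]))
path = os.path.join(tempfile.mkdtemp(), "tiny.pkl.gz")

# Write a gzip-compressed pickle so that there is a file to read back.
with gzip.open(path, "wb") as f:
    pickle.dump(splits, f, 2)

# Load it back; the with-block closes the file even if unpickling fails.
with gzip.open(path, "rb") as f:
    train_set, valid_set, test_set = pickle.load(f)

print(len(train_set[0]), len(valid_set[0]), len(test_set[0]))
```

When a pickle written by Python 2 (especially one containing NumPy arrays) is read under Python 3, `pickle.load` may additionally need an `encoding` argument such as `encoding='latin1'`; the sketch does not need it because it writes and reads with the same interpreter.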
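# Store data (d is any picklable object; open the files in binary mode)
import pickle
pickle.dump(d, open("noProt", 'wb'))
pickle.dump(d, open("prot0", 'wb'), protocol=0)
pickle.dump(d, open("prot1", 'wb'), protocol=1)
pickle.dump(d, open("prot2", 'wb'), protocol=2)
pickle.dump(d, open("protHighest", 'wb'), protocol=-1)  # separate file name so prot2 is not overwritten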

  • Protocol version 0 is the original ASCII protocol and is backwards compatible with earlier versions of Python.
  • Protocol version 1 is the old binary format which is also compatible with earlier versions of Python.
  • Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes.

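In short: protocol 0 stores data as ASCII text and protocol 1 stores it in a binary format, and both are compatible with earlier versions of Python; protocol 2 is available from Python 2.3 onward; passing -1 selects the highest protocol supported by the running Python (pickle.HIGHEST_PROTOCOL).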
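The protocol numbers can be checked directly with the in-memory variants `dumps` and `loads`. This is a small sketch, assuming a hypothetical sample dictionary; it checks that a negative protocol behaves like `pickle.HIGHEST_PROTOCOL`, that protocol 0 output for this object contains only ASCII characters, and that every protocol restores an equal object.

```python
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

obj = {"a": 1, "b": [1, 2, 3]}  # hypothetical sample object

# A negative protocol number selects the highest protocol available.
latest = pickle.dumps(obj, -1)
highest = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
same_bytes = (latest == highest)

# Protocol 0 output consists only of ASCII characters for this object.
text = pickle.dumps(obj, 0)
is_ascii = all(b < 128 for b in bytearray(text))

# Every available protocol round-trips to an equal object.
round_trips = all(
    pickle.loads(pickle.dumps(obj, p)) == obj
    for p in range(pickle.HIGHEST_PROTOCOL + 1)
)

print(same_bytes, is_ascii, round_trips)
```

The value of `pickle.HIGHEST_PROTOCOL` depends on the interpreter (2 on Python 2, higher on Python 3), so a file written with `-1` is only guaranteed to be readable by the same or a newer Python version.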


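The binary protocols generally produce much more compact output than the ASCII protocol.

For example, someone on Stack Overflow ran the following experiment (http://stackoverflow.com/questions/23582489/python-pickle-protocol-choice):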

# Python 2 code (print statement)
import numpy as np
import pickle

class data(object):
    def __init__(self):
        self.a = np.zeros((100, 37000, 3), dtype=np.float32)

d = data()
print "data size: ", d.a.nbytes/1000000.
print "highest protocol: ", pickle.HIGHEST_PROTOCOL
pickle.dump(d, open("noProt", 'wb'))
pickle.dump(d, open("prot0", 'wb'), protocol=0)
pickle.dump(d, open("prot1", 'wb'), protocol=1)
pickle.dump(d, open("prot2", 'wb'), protocol=2)

out >> data size:  44.4
out >> highest protocol:  2

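Results (sizes of the files written). With no protocol argument, Python 2 defaults to protocol 0, which is why noProt matches prot0. The fourfold difference is consistent with protocol 0 writing each zero byte of the array as the four-character escape \x00: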

  • noProt: 177.6MB
  • prot0: 177.6MB
  • prot1: 44.4MB
  • prot2: 44.4MB
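A similar comparison can be run without NumPy. The sketch below is not a reproduction of the experiment above: the list of 10,000 floats is a hypothetical stand-in for the array, and the exact byte counts depend on the Python version, so none are quoted here.

```python
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

# Hypothetical data: 10,000 floats whose decimal text form is mostly long.
values = [i / 7.0 for i in range(1, 10001)]

# Measure the serialized size under each of the three protocols discussed.
sizes = {}
for protocol in (0, 1, 2):
    sizes[protocol] = len(pickle.dumps(values, protocol))
    print("protocol %d: %d bytes" % (protocol, sizes[protocol]))
```

Protocol 0 writes each float as decimal text, while protocols 1 and 2 write a fixed 8-byte binary value plus an opcode, so the binary output should come out noticeably smaller for data like this.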


https://docs.python.org/2/library/pickle.html

http://stackoverflow.com/questions/23582489/python-pickle-protocol-choice
