pack and unpack in Python's struct module

Source: Internet | Editor: 程序博客网 | Posted: 2024/04/29 13:16

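The pack and unpack functions in Python's struct module are a way of exchanging data with the outside world;

in other words, they perform serialization and deserialization.

pack: serialization, converting Python values into a byte representation that external systems understand;

unpack: deserialization, converting external binary data back into Python values.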
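As a minimal sketch of this round trip, the example below packs three values into bytes and unpacks them again. The record layout (a 16-bit id, a 32-bit counter and a double) is made up purely for illustration, and the code assumes Python 3, where packed data is a bytes object.

```python
import struct

# Hypothetical record layout: 16-bit id, 32-bit counter, 64-bit float.
fmt = '>HId'  # '>' = big-endian, standard sizes, no padding

# pack: Python values -> bytes
packed = struct.pack(fmt, 7, 100000, 2.5)
print(len(packed))  # 2 + 4 + 8 = 14 bytes

# unpack: bytes -> tuple of Python values
record_id, counter, value = struct.unpack(fmt, packed)
print(record_id, counter, value)
```

Both calls use the same format string, which is what keeps the sender and the receiver in agreement about the layout.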

Two things need attention:

1. how format characters map to data types;

2. the byte order used to represent the data.

Mapping between Python format characters and C types (see the Python docs):
Format  C Type              Python              Notes
x       pad byte            no value
c       char                string of length 1
b       signed char         integer
B       unsigned char       integer
h       short               integer
H       unsigned short      integer
i       int                 integer
I       unsigned int        long
l       long                integer
L       unsigned long       long
q       long long           long                (1)
Q       unsigned long long  long                (1)
f       float               float
d       double              float
s       char[]              string
p       char[]              string
P       void *              integer

Byte order notation:

Character  Byte order              Size      Alignment
@          native                  native    native
=          native                  standard  none
<          little-endian           standard  none
>          big-endian              standard  none
!          network (= big-endian)  standard  none
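To make the byte-order prefixes concrete, here is a small sketch that packs the same number with '<', '>' and '!' and then reads big-endian bytes back with the wrong prefix. The value 1 is chosen only because its byte pattern is easy to read.

```python
import struct

value = 1
little = struct.pack('<I', value)   # b'\x01\x00\x00\x00'
big = struct.pack('>I', value)      # b'\x00\x00\x00\x01'
network = struct.pack('!I', value)  # same bytes as '>I'

print(little, big, network)

# Reading big-endian bytes with the wrong prefix yields a different number.
wrong = struct.unpack('<I', big)[0]
print(wrong)  # 16777216, i.e. 0x01000000
```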

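For the past couple of days I have been working on a TCP protocol. All the data is transmitted in binary and has to be parsed,

so I turned to struct and came across this line of code:

Python code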
length = struct.unpack('>I', self.buffer[:4])[0]  

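At the time I did not understand what format=">I" meant. I searched Google, and although some people had written about it, the explanations were too vague for me to follow,

so I gritted my teeth and read the API documentation: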
By default, C numbers are represented in the machine’s native format and byte order,

and properly aligned by skipping pad bytes if necessary (according to the rules used by the C compiler).
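In other words, by default struct lays out C numbers in the machine's native format and byte order,

inserting pad bytes where needed so that each value is properly aligned, just as the C compiler would.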
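The effect of this padding can be observed with struct.calcsize. The sketch below compares the default native mode with standard mode ('='); the exact native size depends on the platform and compiler, so only the standard size is guaranteed.

```python
import struct

# Native mode ('@', the default): a char followed by an int.
# Pad bytes may be inserted so that the int is aligned.
native_size = struct.calcsize('ci')

# Standard mode ('='): fixed sizes and no padding, so 1 + 4 bytes.
standard_size = struct.calcsize('=ci')

print(native_size, standard_size)  # typically 8 and 5 on mainstream platforms
```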

Alternatively, the first character of the format string can be used to indicate the byte order, size and alignment of the packed data.
That is, the native default can be overridden: a first character of @, =, <, > or ! selects the byte order,

sizes and alignment used for the packed data.

Native byte order is big-endian or little-endian, depending on the host system.

For example, Motorola and Sun processors are big-endian; Intel and DEC processors are little-endian.
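A machine's native byte order is either big-endian (most significant byte first) or little-endian (least significant byte first), depending on the host.

For example, Motorola and Sun processors are big-endian, while Intel and DEC processors are little-endian.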
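To check which case applies on your own machine, sys.byteorder reports the host's byte order. The following sketch confirms that packing with '=' (native order, standard size) matches the explicit prefix for that order.

```python
import struct
import sys

print(sys.byteorder)  # 'little' or 'big', depending on the host

native = struct.pack('=I', 1)
if sys.byteorder == 'little':
    explicit = struct.pack('<I', 1)
else:
    explicit = struct.pack('>I', 1)

print(native == explicit)  # True
```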

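That cleared things up.

The format=">I" above means: interpret the bytes in big-endian order as a C unsigned int, which Python returns as an int or long.

That raises the next question: how do you know the value being read is an unsigned int?

From the struct documentation, you can see that struct defines its format rules through two tables.

As I see it, they fall into two categories: one maps format characters to C types,

and the other chooses whether the bytes are interpreted as big-endian or little-endian. Byte order was covered above; looking at the C type table, I stands for unsigned int, which unpacks to a Python integer or long. See the Python API documentation for details.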
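A line like this usually belongs to a length-prefixed protocol. The sketch below assumes such a layout (a 4-byte big-endian length followed by that many payload bytes); the original protocol is not shown, so the helper names frame and extract_message and the layout itself are hypothetical.

```python
import struct

HEADER = struct.Struct('>I')  # assumed header: 4-byte big-endian length

def frame(payload):
    """Prefix a bytes payload with its length."""
    return HEADER.pack(len(payload)) + payload

def extract_message(buffer):
    """Return (message, rest), or (None, buffer) if no full message is buffered."""
    if len(buffer) < HEADER.size:
        return None, buffer
    length = HEADER.unpack(buffer[:HEADER.size])[0]
    end = HEADER.size + length
    if len(buffer) < end:
        return None, buffer
    return buffer[HEADER.size:end], buffer[end:]

# One complete message followed by the first 3 bytes of the next header.
buffer = frame(b'hello') + frame(b'hi')[:3]
message, buffer = extract_message(buffer)
print(message, buffer)
```

Because TCP delivers a byte stream rather than discrete messages, the helper returns the unconsumed bytes so the caller can append newly received data and try again.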
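The format character really does decide how the bytes are read. As a short sketch, the same two bytes are interpreted below as a signed and as an unsigned short, and calcsize shows the standard sizes of a few integer codes.

```python
import struct

raw = b'\xff\xfe'
signed = struct.unpack('>h', raw)[0]
unsigned = struct.unpack('>H', raw)[0]
print(signed)    # -2     (signed short)
print(unsigned)  # 65534  (unsigned short)

# Standard sizes when a byte-order prefix is given.
sizes = {code: struct.calcsize('>' + code) for code in 'bhiq'}
print(sizes)  # {'b': 1, 'h': 2, 'i': 4, 'q': 8}
```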


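Below are some usage examples that you can refer to:
1. Set the format string, as follows:

# Take the first 5 characters, skip 4 characters, then take 3 characters
fmt = '5s 4x 3s'

2. Use struct.unpack to extract the substrings

import struct
fmt = '5s 4x 3s'
# Python 3: the buffer must be a bytes object
print(struct.unpack(fmt, b'Test astring'))  # (b'Test ', b'ing')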

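Here is a simple example.

Take the string 'He is not very happy', remove the "not" in the middle, and print the result.

import struct
the_string = b'He is not very happy'
# 2 chars, skip 1, 2 chars, skip 5 (" not "), 4 chars, skip 1, 5 chars
fmt = '2s 1x 2s 5x 4s 1x 5s'
print(b' '.join(struct.unpack(fmt, the_string)).decode())

Output: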
He is very happy
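One caveat with this slicing trick: unpack requires the buffer to be exactly as long as the format describes. The sketch below uses struct.calcsize to check the expected length and shows that a mismatched buffer raises struct.error; try_unpack is a hypothetical helper written only for this illustration.

```python
import struct

fmt = '5s 4x 3s'
expected = struct.calcsize(fmt)
print(expected)  # 5 + 4 + 3 = 12

def try_unpack(fmt, data):
    """Return the unpacked tuple, or None if the length does not match."""
    try:
        return struct.unpack(fmt, data)
    except struct.error:
        return None

print(try_unpack(fmt, b'Test astring'))  # (b'Test ', b'ing')
print(try_unpack(fmt, b'too short'))     # None
```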

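Here is something about network bytes that I found online and thought was useful:

Python's socket library uses binary strings to encode the data being sent and received.

So when we use
i = socket.recv(4) to receive a 4-byte integer,

the integer is actually stored in binary form in the first 4 bytes of the string i.

Most of the time what we need is a real integer/long,

not an integer represented as a string.

This is where the struct library comes in ("Interpret strings as packed binary data").

For the case above,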

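we can write
t = unpack("I", i)

The first argument is the format string. I indicates that the first data item in the string i is an integer represented as a C unsigned int. Note that a bare "I" uses the machine's native byte order; for data in network byte order, use "!I" or ">I".

Here i contains only one data item,

but the string being interpreted can in fact contain several data items,

as long as the format string specifies a format for each one.

Naturally, what unpack returns is a tuple.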
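Putting these pieces together, the sketch below sends a two-field header over a local socket pair and unpacks it on the other side. The header layout (a 32-bit id plus 16-bit flags) and the recv_exact helper are assumptions made for illustration; socket.socketpair simply stands in for a real network peer.

```python
import socket
import struct

def recv_exact(sock, n):
    """Read exactly n bytes; a single recv() may return fewer than requested."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError('socket closed before all bytes arrived')
        chunks.append(chunk)
        n -= len(chunk)
    return b''.join(chunks)

# Two connected sockets standing in for a client and a server.
left, right = socket.socketpair()

header = struct.Struct('!IH')  # hypothetical layout: 32-bit id, 16-bit flags
left.sendall(header.pack(42, 3))

result = header.unpack(recv_exact(right, header.size))
print(result)  # (42, 3)

left.close()
right.close()
```

The '!' prefix fixes the byte order to network order (big-endian), so both ends agree regardless of their native byte order.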

0 0