Big-Endian and Little-Endian Byte Ordering

Source: Internet  Editor: 程序博客网  Date: 2024/04/30 11:31

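Byte order	Meaning
Big-Endian	The high-order Byte of a Word is stored at the lowest address of the memory region occupied by that Word.
Little-Endian	The low-order Byte of a Word is stored at the lowest address of the memory region occupied by that Word.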

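Note that in the table a Word is 16 bits long and a Byte is 8 bits long. A value wider than one Word follows the same rule, applied to the value as a whole: on mainstream CPUs, the bytes of a 32-bit value are laid out starting at the lowest address either from most significant to least significant (big-endian) or from least significant to most significant (little-endian).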

An example:
If we write 0x1234abcd to memory starting at address 0x0000, the result is
Address    big-endian    little-endian
0x0000     0x12          0xcd
0x0001     0x34          0xab
0x0002     0xab          0x34
0x0003     0xcd          0x12
(Note: 0xab is 10101011 in binary, an 8-bit value.)
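
The table above can be reproduced in code independently of the host CPU. The following is a minimal sketch, not part of the original program: the helper names store_be32 and store_le32 are hypothetical, and each one writes a 32-bit value into a 4-byte buffer using shifts, so the result is the same on any machine.

```c
#include <stdint.h>

/* Hypothetical helper: store v in buf[0..3], most significant byte first
 * (the big-endian column of the table). */
void store_be32(uint8_t *buf, uint32_t v)
{
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

/* Hypothetical helper: store v in buf[0..3], least significant byte first
 * (the little-endian column of the table). */
void store_le32(uint8_t *buf, uint32_t v)
{
    buf[0] = (uint8_t)v;
    buf[1] = (uint8_t)(v >> 8);
    buf[2] = (uint8_t)(v >> 16);
    buf[3] = (uint8_t)(v >> 24);
}
```

Calling store_be32(buf, 0x1234abcd) should leave 0x12, 0x34, 0xab, 0xcd in buf[0] through buf[3], and store_le32 should leave the same bytes in reverse order.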

A more detailed explanation follows:

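CPUs of different architectures often differ in how they store data in memory. For example, Intel's x86 family stores the low-order byte at the starting address, while a number of other architectures, such as IBM's System/370 mainframes, PowerPC in its traditional mode, and Motorola's 68k CPUs, store the high-order byte at the starting address. These two storage schemes are called little-endian and big-endian.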

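Little-endian is the storage scheme of x86 CPUs: the low-order part is stored first. Big-endian stores the high-order part first. For example, to store the 16-bit value 0xF432, a little-endian machine stores the bytes as 32 F4, while a big-endian machine does the opposite and stores them as F4 32, as shown in Figure 13.2.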
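
The reverse direction, turning two stored bytes back into a 16-bit value, can be sketched in the same spirit. This is an illustrative sketch only: load_le16 and load_be16 are hypothetical names that do not appear in p13.1.c, and they interpret a 2-byte buffer under each convention regardless of the host's own byte order.

```c
#include <stdint.h>

/* Hypothetical helper: interpret p[0..1] as a little-endian 16-bit value
 * (p[0] is the low-order byte). */
uint16_t load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: interpret p[0..1] as a big-endian 16-bit value
 * (p[0] is the high-order byte). */
uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

Given the bytes 32 F4, load_le16 should yield 0xF432; reading the same bytes with load_be16 should yield 0x32F4, which is the kind of mismatch that appears when data is moved between machines without conversion.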

The program p13.1.c shows how to determine whether a system stores data in big-endian or little-endian order. It uses the two methods described below.

Figure 13.2 Example of data storage in big-endian and little-endian order

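(1) Using the properties of a union. The members of a union share the same storage, and the space allocated is the size required by the largest member. The program defines a union named endian_un with two members: one of type short (on typical 32-bit systems a short is 2 bytes long) and one char array whose number of elements equals the size of a short in bytes.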

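The program assigns 0x0102 to var. Because the members of a union share storage, the bits char array holds the same bytes of 0x0102. By checking what is stored in the array element at the low address and in the one at the high address, we can tell whether the system is little-endian or big-endian.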

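(2) Using a type cast. The program takes the address of the flag variable, casts it to a pointer to unsigned char, and reads the byte stored at the starting address. If that byte is the low-order byte of the value, the storage scheme is little-endian; otherwise it is big-endian.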

The complete code of the program is as follows:

    // p13.1.c: determine whether the host is big-endian or little-endian
    #include <stdio.h>

    // Method 2: use a pointer cast to inspect the byte at the lowest address
    int is_little_endian(void)
    {
        unsigned short flag = 0x4321;
        // On a little-endian host the lowest address holds the low-order byte 0x21
        if (*(unsigned char *)&flag == 0x21)
            return 1;
        else
            return 0;
    }

    int main(void)
    {
        // Method 1: use a union, whose members share the same storage
        union endian_un {
            short var;
            char bits[sizeof(short)];
        };
        union endian_un flag;
        flag.var = 0x0102;
        // Check which byte sits at the low address and which at the high address
        if (sizeof(short) == 2) {
            if (flag.bits[0] == 1 && flag.bits[1] == 2)
                printf("judged by first method, big-endian\n");
            else if (flag.bits[0] == 2 && flag.bits[1] == 1)
                printf("judged by first method, little-endian\n");
            else
                printf("cannot determine the type\n");
        }
        if (is_little_endian())
            printf("judged by second method, little-endian\n");
        else
            printf("judged by second method, big-endian\n");
        return 0;
    }

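Compile p13.1.c with gcc to produce an executable named p13.1. Running the program gives the output below, which shows that the x86 system stores data in memory in little-endian order.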

[program@localhost charter13]$ gcc -o p13.1 p13.1.c
[program@localhost charter13]$ ./p13.1 
judged by first method, little-endian
judged by second method, little-endian
[program@localhost charter13]$

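Big-endian and little-endian matter not only when porting programs between hardware platforms but also in network programming, where byte order must be taken into account. To avoid compatibility problems, data transmitted over the network uses high-to-low ordering, that is, the most significant byte comes first (big-endian, known as network byte order). A little-endian machine must therefore convert multi-byte values before sending them to the network, whereas on a big-endian machine no conversion is needed.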
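
On a little-endian host, that conversion amounts to reversing the bytes of the value. A minimal sketch of such a byte swap is shown below; swap16 and swap32 are hypothetical names used only for illustration, and real programs would normally call the library conversion functions rather than hand-written swaps.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Hypothetical helper: reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

For instance, swap32(0x1234abcd) should give 0xcdab3412, and applying a swap twice returns the original value.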

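Linux provides four functions for converting byte order: htons, htonl, ntohs, and ntohl, declared in <arpa/inet.h>. In these names, h stands for host and n for network. A trailing s means a short (16-bit) value, and a trailing l means a long, which for these functions is a 32-bit value. The four functions are declared as follows: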
uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);

htonl/htons: convert from host byte order to network byte order. The two differ in operand width: htonl takes a 32-bit value and htons a 16-bit value.

ntohl/ntohs: convert from network byte order to host byte order. The two differ in operand width: ntohl takes a 32-bit value and ntohs a 16-bit value.
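
As an illustration of how these functions are typically used, the sketch below writes a 32-bit value into a byte buffer in network byte order and reads it back. It assumes a POSIX system that provides <arpa/inet.h>; the helper names put_u32_net and get_u32_net are hypothetical and are not part of any library.

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a host-order 32-bit value into buf[0..3]
 * in network byte order (most significant byte first). */
void put_u32_net(unsigned char *buf, uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    memcpy(buf, &net, sizeof net);
}

/* Hypothetical helper: read a network-order 32-bit value from buf[0..3]
 * and return it in host byte order. */
uint32_t get_u32_net(const unsigned char *buf)
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);
}
```

After put_u32_net(buf, 0x1234abcd), the buffer should contain 0x12, 0x34, 0xab, 0xcd on both little-endian and big-endian hosts, and get_u32_net(buf) should return 0x1234abcd again.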