The efficiency of memcpy() depends on the platform.


First, let's look at the memcpy() source code that ships with Microsoft's development tools (E:\Microsoft Visual Studio 9.0\VC\crt\src):

/***
*memcpy.c - contains memcpy routine
*
*       Copyright (c) Microsoft Corporation. All rights reserved.
*
*Purpose:
*       memcpy() copies a source memory buffer to a destination buffer.
*       Overlapping buffers are not treated specially, so propogation may occur.
*
*******************************************************************************/

#include <cruntime.h>
#include <string.h>

#ifdef _MSC_VER
#pragma function(memcpy)
#endif  /* _MSC_VER */

/***
*memcpy - Copy source buffer to destination buffer
*
*Purpose:
*       memcpy() copies a source memory buffer to a destination memory buffer.
*       This routine does NOT recognize overlapping buffers, and thus can lead
*       to propogation.
*
*       For cases where propogation must be avoided, memmove() must be used.
*
*Entry:
*       void *dst = pointer to destination buffer
*       const void *src = pointer to source buffer
*       size_t count = number of bytes to copy
*
*Exit:
*       Returns a pointer to the destination buffer
*
*Exceptions:
*******************************************************************************/

void * __cdecl memcpy (
        void * dst,
        const void * src,
        size_t count
        )
{
        void * ret = dst;

#if defined (_M_IA64)

        {
        __declspec(dllimport)
        void RtlCopyMemory( void *, const void *, size_t count );

        RtlCopyMemory( dst, src, count );
        }

#else  /* defined (_M_IA64) */
        /*
         * copy from lower addresses to higher addresses
         */
        while (count--) {
                *(char *)dst = *(char *)src;
                dst = (char *)dst + 1;
                src = (char *)src + 1;
        }
#endif  /* defined (_M_IA64) */

        return(ret);
}


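On 16/32-bit systems, copying one byte at a time wastes a lot of CPU time. These CPUs generally expect half-word or word alignment and move 16 or 32 bits per memory access, so fetching a single byte, especially at an odd address, makes poor use of each access. For a CPU with 4-byte alignment such as ARM, copying a whole word at a time, as in the version below, is therefore much more efficient, provided both pointers are 4-byte aligned. Compared with the byte-at-a-time loop, the gain can be more than 4x: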

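#include <stdint.h>

/* Assumes dest and src are both 4-byte aligned and do not overlap. */
void my_memcpy(void * dest, const void * src, unsigned int n)
{
    unsigned int i = 0;
    uint32_t * Dest = (uint32_t *)dest;              /* exactly 32 bits, unlike long */
    const uint32_t * Src = (const uint32_t *)src;

    /* copy one 32-bit word per iteration */
    for (i = 0; i < (n >> 2); i++) {
        Dest[i] = Src[i];
    }
    /* copy the remaining n % 4 bytes one at a time */
    for (i = n & ~3u; i < n; i++) {
        ((unsigned char *)dest)[i] = ((const unsigned char *)src)[i];
    }
}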
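How large the gain actually is depends on the CPU and the compiler, so it is worth measuring on the target. The following is a rough sketch of such a measurement, not a rigorous benchmark. The names `copy_bytes`, `copy_words`, and `time_copy` are made up for this illustration, and `copy_words` assumes 4-byte aligned, non-overlapping buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Common signature so both copy routines can be timed the same way. */
typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Byte-at-a-time copy, the same idea as the CRT loop shown earlier. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}

/* Word-at-a-time copy. Assumption: both pointers are 4-byte aligned. */
void copy_words(void *dst, const void *src, size_t n)
{
    uint32_t *d = dst;
    const uint32_t *s = src;
    size_t i;

    assert((uintptr_t)dst % 4 == 0 && (uintptr_t)src % 4 == 0);

    /* whole 32-bit words */
    for (i = 0; i < n / 4; i++) {
        d[i] = s[i];
    }
    /* remaining n % 4 bytes */
    for (i = n & ~(size_t)3; i < n; i++) {
        ((unsigned char *)dst)[i] = ((const unsigned char *)src)[i];
    }
}

/* Returns the CPU time, in seconds, spent calling fn `rounds` times. */
double time_copy(copy_fn fn, void *dst, const void *src, size_t n, int rounds)
{
    clock_t start = clock();
    for (int r = 0; r < rounds; r++) {
        fn(dst, src, n);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_copy(copy_bytes, ...)` and `time_copy(copy_words, ...)` on the same large buffers gives a first impression of the difference. An optimizing compiler may vectorize either loop or replace it with a call to the library memcpy(), so results at high optimization levels can differ considerably from what the source code suggests.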

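Of course, if you cannot guarantee that the pointers passed in are 4-byte aligned, you can add a check to the routine above so that the fast path is still used whenever possible: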

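#include <stdint.h>
#include <string.h>

void my_memcpy(void * dest, const void * src, unsigned int n)
{
    unsigned int i = 0;
    uint32_t * Dest = (uint32_t *)dest;
    const uint32_t * Src = (const uint32_t *)src;

    if (((uintptr_t)src % 4 == 0) && ((uintptr_t)dest % 4 == 0)) {
        /* both pointers are 4-byte aligned: copy word by word */
        for (i = 0; i < (n >> 2); i++) {
            Dest[i] = Src[i];
        }
        /* copy the remaining n % 4 bytes one at a time */
        for (i = n & ~3u; i < n; i++) {
            ((unsigned char *)dest)[i] = ((const unsigned char *)src)[i];
        }
    } else {
        /* at least one pointer is unaligned: fall back to the library routine */
        memcpy(dest, src, n);
    }
}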
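This kind of tuning is only worthwhile in performance-sensitive code. Everywhere else you should still use the standard library function; after all, portability and maintainability matter too.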
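Where such tuning is justified, the alignment check can be taken one step further. When source and destination share the same offset within a word, a few leading bytes can be copied first to reach a word boundary, after which the word loop applies. The sketch below illustrates that idea. `my_memcpy_aligned` is a hypothetical name, and the code assumes non-overlapping buffers and a compiler that tolerates reading byte buffers through `uint32_t` pointers (strict aliasing may need to be disabled, for example with `-fno-strict-aliasing` on GCC).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration; buffers must not overlap. */
void *my_memcpy_aligned(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Word copies are only possible if both pointers have the same
     * offset within a 4-byte word; otherwise use the library routine. */
    if ((((uintptr_t)d ^ (uintptr_t)s) % 4) != 0) {
        return memcpy(dest, src, n);
    }

    /* Head: copy bytes until d (and therefore s) is 4-byte aligned. */
    while (n > 0 && (uintptr_t)d % 4 != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: copy whole 32-bit words.
     * Assumption: this cast is acceptable to the compiler in use. */
    uint32_t *dw = (uint32_t *)d;
    const uint32_t *sw = (const uint32_t *)s;
    size_t words = n / 4;
    for (size_t i = 0; i < words; i++) {
        dw[i] = sw[i];
    }

    /* Tail: copy the remaining n % 4 bytes. */
    d += words * 4;
    s += words * 4;
    n -= words * 4;
    while (n > 0) {
        *d++ = *s++;
        n--;
    }

    return dest;
}
```

Like memcpy(), the function returns the destination pointer, so it can be dropped in where the return value is used. Production library implementations are usually more elaborate than this (wider registers, unrolling, handling of mismatched offsets), which is one more reason to prefer the standard function unless measurements say otherwise.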

===============================================================================================================================