Why do so many experts add a pile of macros when solving problems?

Source: Internet | Published by: 演讲学而知不足 | Editor: 程序博客网 | Time: 2024/04/29 02:37

Here are a few places where inline cannot replace macros:

1. Loop unrolling:

// loop unroll double
#define LOOP_UNROLL_DOUBLE(action, actionx2, width) do { \
    unsigned long __width = (unsigned long)(width); \
    unsigned long __increment = __width >> 2; \
    for (; __increment > 0; __increment--) { \
        actionx2; \
        actionx2; \
    } \
    switch (__width & 3) { \
    case 2: actionx2; break; \
    case 3: actionx2; \
    case 1: action; break; \
    } \
} while (0)

// loop unroll quatro
#define LOOP_UNROLL_QUATRO(action, actionx2, actionx4, width) do { \
    unsigned long __width = (unsigned long)(width); \
    unsigned long __increment = __width >> 2; \
    for (; __increment > 0; __increment--) { \
        actionx4; \
    } \
    switch (__width & 3) { \
    case 2: actionx2; break; \
    case 3: actionx2; \
    case 1: action; break; \
    } \
} while (0)

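Suppose you need to do something in a tight loop. Unrolling the loop can greatly reduce the number of branches the CPU executes and makes better use of the parallelism in the CPU pipeline. For example, if you are writing an FIR filter to process a signal, you can turn your for (...) { .... } loop into an unrolled one like this: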

LOOP_UNROLL_DOUBLE(
    {
        x = *src++;
        // do something with x and h and output to y
        *dst++ = y;
    },
    {
        x1 = *src++;
        x2 = *src++;
        // do something with x1 and h and output to y1
        // do something with x2 and h and output to y2
        *dst++ = y1;
        *dst++ = y2;
    },
    nsamples
);

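Written this way, each step computes two samples instead of one. Writing the one-sample and two-sample cases separately also makes it easier to use SIMD to compute several samples at once. This is how loop unrolling is used to optimize performance: the code to run is passed directly to the macro as a "{...}" block. The macro stays the same, but the code inside "{...}" is different at every place LOOP_UNROLL is used. inline cannot replace this, and you would hardly want to pass a function pointer instead. That covers the performance optimization side.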
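As a minimal, self-contained sketch of the same pattern: the macro below has the same shape as LOOP_UNROLL_DOUBLE, but it is renamed UNROLL_DOUBLE and uses the local names w_ and n_. That renaming is my own choice, made because C reserves identifiers that start with a double underscore for the implementation. The function scale_samples and its gain parameter are hypothetical stand-ins for a real filter body.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as LOOP_UNROLL_DOUBLE; names are illustrative only. */
#define UNROLL_DOUBLE(action, actionx2, width) do { \
        unsigned long w_ = (unsigned long)(width); \
        unsigned long n_ = w_ >> 2; \
        for (; n_ > 0; n_--) { actionx2; actionx2; } \
        switch (w_ & 3) { \
        case 2: actionx2; break; \
        case 3: actionx2; /* fall through */ \
        case 1: action; break; \
        } \
    } while (0)

/* Hypothetical example body: dst[i] = src[i] * gain for n samples. */
static void scale_samples(float *dst, const float *src, size_t n, float gain)
{
    assert(dst != NULL && src != NULL);
    UNROLL_DOUBLE(
        { *dst++ = *src++ * gain; },
        { dst[0] = src[0] * gain; dst[1] = src[1] * gain; dst += 2; src += 2; },
        n);
}
```

Calling scale_samples(out, in, 7, 2.0f) runs the two-sample block twice in the loop (four samples), then handles the remaining three through the case 3 and case 1 branches. One caveat of passing "{...}" blocks as macro arguments: a comma that is not inside parentheses would be read as an argument separator, so the blocks here avoid top-level commas.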

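2. Function assembly:

Imagine you are writing graphics code and need to implement pixel compositing for SRC_ATOP, SRC_OVER, SRC_IN, SRC_OUT, DST_ATOP, DST_OVER, DST_IN, DST_OUT, XOR, PLUS, ALLANON, TINT, DIFF, DARKEN, LIGHTEN, SCREEN, OVERLAY and so on, some twenty compositing methods in all. How many functions would you have to write without macros? Twenty-odd functions that all look alike would drive you mad. Dispatching through function pointers here wastes performance. So how should you write it? You can define a family of composite operations that take two sets of RGBA values and produce a new one, for example: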

/* compositing */
#define IBLEND_COMPOSITE(sr, sg, sb, sa, dr, dg, db, da, FS, FD) do { \
    (dr) = _ipixel_mullut[(FS)][(sr)] + _ipixel_mullut[(FD)][(dr)]; \
    (dg) = _ipixel_mullut[(FS)][(sg)] + _ipixel_mullut[(FD)][(dg)]; \
    (db) = _ipixel_mullut[(FS)][(sb)] + _ipixel_mullut[(FD)][(db)]; \
    (da) = _ipixel_mullut[(FS)][(sa)] + _ipixel_mullut[(FD)][(da)]; \
} while (0)

/* premultiply: src over */
#define IBLEND_OP_SRC_OVER(sr, sg, sb, sa, dr, dg, db, da) do { \
    IUINT32 FD = 255 - (sa); \
    IBLEND_COMPOSITE(sr, sg, sb, sa, dr, dg, db, da, 255, FD); \
} while (0)

/* premultiply: dst atop */
#define IBLEND_OP_DST_ATOP(sr, sg, sb, sa, dr, dg, db, da) do { \
    IUINT32 FS = 255 - (da); \
    IUINT32 FD = (sa); \
    IBLEND_COMPOSITE(sr, sg, sb, sa, dr, dg, db, da, FS, FD); \
} while (0)

/* premultiply: dst in */
#define IBLEND_OP_DST_IN(sr, sg, sb, sa, dr, dg, db, da) do { \
    IUINT32 FD = (sa); \
    IBLEND_COMPOSITE(sr, sg, sb, sa, dr, dg, db, da, 0, FD); \
} while (0)

Then use the ## token-pasting operator to join the various methods and formats and generate different functions, for example:

#define IPIXEL_COMPOSITE_PREMUL(name, opname) \
static void ipixel_comp_##name(IUINT32 *dst, const IUINT32 *src, int w) \
{ \
    IUINT32 sr, sg, sb, sa, dr, dg, db, da; \
    for (; w > 0; dst++, src++, w--) { \
        _ipixel_load_card(src, sr, sg, sb, sa); \
        _ipixel_load_card(dst, dr, dg, db, da); \
        IBLEND_OP_##opname(sr, sg, sb, sa, dr, dg, db, da); \
        dst[0] = IRGBA_TO_A8R8G8B8(dr, dg, db, da); \
    } \
}

Then start generating the various compositing functions:

IPIXEL_COMPOSITE_PREMUL(pre_xor, XOR);
IPIXEL_COMPOSITE_PREMUL(pre_plus, PLUS);
IPIXEL_COMPOSITE_PREMUL(pre_src_atop, SRC_ATOP);
IPIXEL_COMPOSITE_PREMUL(pre_src_in, SRC_IN);
IPIXEL_COMPOSITE_PREMUL(pre_src_out, SRC_OUT);
IPIXEL_COMPOSITE_PREMUL(pre_src_over, SRC_OVER);
IPIXEL_COMPOSITE_PREMUL(pre_dst_atop, DST_ATOP);
IPIXEL_COMPOSITE_PREMUL(pre_dst_in, DST_IN);
IPIXEL_COMPOSITE_PREMUL(pre_dst_out, DST_OUT);
IPIXEL_COMPOSITE_PREMUL(pre_dst_over, DST_OVER);

This effectively defines:

ipixel_comp_pre_xor (...)
ipixel_comp_pre_plus (...)
....
ipixel_comp_pre_dst_over (...)

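and several more functions, all of which you "assembled": you used no function pointers, and you did not have to laboriously write twenty-odd functions by hand. Going further, if you write graphics code you will find that you have to deal with the pixel formats of many devices, from A8R8G8B8 and A8B8G8R8 to A1R5G5B5. There are more than ten mainstream pixel formats to handle.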

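So you can write "read r, g, b, a from each format" and "pack r, g, b, a into any format" as a set of macros. Then, whether you are converting between these pixel formats or doing some other processing, you can combine any "pixel read/write" macro with any "pixel computation" macro to assemble each concrete function you need.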

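So using macros to solve performance problems while simplifying your program design can often do what inline cannot, and can even accomplish many things that templates cannot.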
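The "read/write macro plus assembly macro" idea can be sketched as follows. This is only an illustration under my own assumptions: the names LOAD_*, STORE_*, DEFINE_CONVERT and the generated convert_* functions are hypothetical and are not taken from the library quoted above, and only two 32-bit formats are covered.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "pixel read" macros: unpack one 32-bit pixel into r, g, b, a. */
#define LOAD_A8R8G8B8(c, r, g, b, a) do { \
        (a) = ((c) >> 24) & 0xff; (r) = ((c) >> 16) & 0xff; \
        (g) = ((c) >> 8) & 0xff;  (b) = (c) & 0xff; \
    } while (0)
#define LOAD_A8B8G8R8(c, r, g, b, a) do { \
        (a) = ((c) >> 24) & 0xff; (b) = ((c) >> 16) & 0xff; \
        (g) = ((c) >> 8) & 0xff;  (r) = (c) & 0xff; \
    } while (0)

/* Hypothetical "pixel write" macros: pack r, g, b, a into one 32-bit pixel. */
#define STORE_A8R8G8B8(r, g, b, a) \
    (((uint32_t)(a) << 24) | ((uint32_t)(r) << 16) | \
     ((uint32_t)(g) << 8) | (uint32_t)(b))
#define STORE_A8B8G8R8(r, g, b, a) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(g) << 8) | (uint32_t)(r))

/* Assemble one converter per (source format, destination format) pair. */
#define DEFINE_CONVERT(SFMT, DFMT) \
    static void convert_##SFMT##_to_##DFMT(uint32_t *dst, \
                                           const uint32_t *src, int w) \
    { \
        uint32_t r, g, b, a; \
        assert(w >= 0); \
        for (; w > 0; dst++, src++, w--) { \
            LOAD_##SFMT(src[0], r, g, b, a); \
            dst[0] = STORE_##DFMT(r, g, b, a); \
        } \
    }

DEFINE_CONVERT(A8R8G8B8, A8B8G8R8)
DEFINE_CONVERT(A8B8G8R8, A8R8G8B8)
```

Each DEFINE_CONVERT line expands into an ordinary static function, so the compiler sees straight-line code per format pair with no indirect calls. Supporting another format in this sketch would mean adding one LOAD_ macro, one STORE_ macro, and the DEFINE_CONVERT lines for the pairs you need.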

3. Data structures and algorithms:

For a concrete example, see include/linux/list.h in the Linux kernel:

struct list_head {
    struct list_head *next, *prev;
};

#define INIT_LIST_HEAD(ptr) do { \
    (ptr)->next = (ptr); (ptr)->prev = (ptr); \
} while (0)

/*
 * Insert a new entry between two known consecutive entries.
 *
 * This is only for internal list manipulation where we know
 * the prev/next entries already!
 */
static __inline__ void __list_add(struct list_head * new,
                                  struct list_head * prev,
                                  struct list_head * next)
{
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
}

This defines a list. The kernel uses inline wherever it can, but in some places it cannot. For example, suppose you have a struct (from the netfilter code):

struct nf_hook_ops
{
    struct list_head list;

    /* User fills in from here down. */
    nf_hookfn *hook;
    int pf;
    int hooknum;
    /* Hooks are ordered in ascending priority. */
    int priority;
};

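You then have a linked list holding many nf_hook_ops, and you have obtained a pointer to one of its nodes. That pointer actually points to the struct's list member (&list), and what you need is a pointer to the enclosing struct. You can get it with the following list macro: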

/**
 * list_entry - get the struct for this entry
 * @ptr:    the &struct list_head pointer.
 * @type:   the type of the struct this is embedded in.
 * @member: the name of the list_struct within the struct.
 */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr)-(unsigned long)(&((type *)0)->member)))

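For example, list_entry(ptr, struct nf_hook_ops, list) takes the node pointer and the member's position inside the struct, and yields a pointer to the struct that contains that node. This cannot be done with inline either, because a function cannot take a type name or a member name as an argument.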
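To see the pointer arithmetic at work outside the kernel, here is a small user-space sketch. It assumes a hypothetical struct job whose list node is deliberately not the first member, and it writes list_entry with the standard offsetof() from <stddef.h> instead of the null-pointer cast shown above. The helpers list_init, list_add_tail and sum_job_ids are my own illustrative names.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Same idea as the kernel macro, using the standard offsetof(). */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical payload; the node sits after another field on purpose. */
struct job {
    int id;
    struct list_head node;
};

/* Make head an empty circular list. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Link n in just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Walk the list and recover each enclosing struct job from its node. */
static int sum_job_ids(struct list_head *head)
{
    int sum = 0;
    struct list_head *p;

    assert(head != NULL);
    for (p = head->next; p != head; p = p->next)
        sum += list_entry(p, struct job, node)->id;
    return sum;
}
```

The list code only ever sees struct list_head, so the same list_init and list_add_tail work for any struct that embeds a node. Only list_entry needs to know the enclosing type and member name, and it receives both as macro arguments.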

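The same approach is used elsewhere in the kernel: the red-black tree in rbtree.h and rbtree.c is implemented much like list, with heavy use of macros. The basic data structures that Linux builds from simple macros, such as list and rbtree, are quite convenient to use, in some places much more so than std::list and std::map. They can be faster than the STL, and they avoid what templates do, namely generating separate code for each type and bloating your binary.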

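For example, if you use such data structures when solving problems, your program can be more efficient and more compact than code that uses STL containers. You also do not know how the STL is implemented on the target platform and cannot control it, so code that runs fast on one platform may turn out slow on another. Redefining the data structures yourself in pursuit of maximum performance is therefore understandable.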

4. Other places where inline cannot replace macros

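Optimized code written for specific platform characteristics (for example, whether integers are 32-bit or 64-bit, and whether the byte order is LSB-first or MSB-first).
Emulating generics
Small, frequently repeated code snippets
Definitions for hardware operations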

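And so on. In many cases inline and templates still cannot replace macros, which is the main reason macros appear in such large numbers in the code of many open-source projects.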
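As one illustration of the "emulating generics" item, a macro can stamp out the same function for several element types. The macro DEFINE_ARRAY_MAX and the generated names max_int and max_double are hypothetical, chosen only for this sketch.

```c
#include <assert.h>

/* Generate a typed "maximum of an array" function for element type T. */
#define DEFINE_ARRAY_MAX(T, NAME) \
    static T NAME(const T *v, int n) \
    { \
        T best; \
        int i; \
        assert(v != 0 && n > 0); /* an empty array has no maximum */ \
        best = v[0]; \
        for (i = 1; i < n; i++) \
            if (v[i] > best) best = v[i]; \
        return best; \
    }

DEFINE_ARRAY_MAX(int, max_int)
DEFINE_ARRAY_MAX(double, max_double)
```

Unlike a single function taking void pointers and a comparison callback, each generated function compares elements directly with the > operator for its own type. The cost is the usual one for macros: an error inside the body is reported at the DEFINE_ARRAY_MAX line, not at the offending statement.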

0 0