CS107-Lecture 5-Note

Source: Internet | Published by: 狼道seo | Editor: 程序博客网 | Time: 2024/06/06 18:23

lsearch

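Lectures 4 and 5 walk through the design of lsearch, and following that process taught me something: a programming problem has many possible solutions, and a good one isn't dashed off in a single stroke. It comes from repeated thinking and refinement, optimizing against the requirements (for example, the language being used and the data types being targeted). Back to business, picking up where the last lecture left off: lsearch for a specific data type -> generic lsearch -> generic lsearch with a user-supplied comparison function.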

Program. 1. lsearch (generic data type)

```c
void *lsearch(void *key, void *base, int n, int elemSize, int (*cmpfn)(void *, void *)){
    for (int i = 0; i < n; i++)
    {
        void *elemAddr = (char *)base + i * elemSize;
        if (cmpfn(elemAddr, key) == 0) // memcmp only compares raw bytes, which is too limited, so a caller-supplied cmpfn is used
            return elemAddr;
    }
    return NULL;
}
```

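Line 1: Jerry habitually writes the comparison parameter as (*cmpfn). The parentheses matter. int (*cmpfn)(void *, void *) declares cmpfn as a pointer to a function that takes two void * and returns an int. Writing the asterisk without the parentheses, as in int *cmpfn(void *, void *), binds the * to the return type instead, so cmpfn would refer to a function returning int *, which is a different type that IntCmp would not match. Dropping both the asterisk and the parentheses, as in int cmpfn(void *, void *), does give the same result in a parameter list, because C automatically adjusts a function-type parameter to a pointer to that function type.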
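A minimal sketch of that equivalence, assuming a C99 compiler. CallWithStar, CallWithoutStar and IntCmpDemo are hypothetical names made up for this demo; both Call functions accept exactly the same argument.

```c
#include <assert.h>

/* In a parameter list, a function type is adjusted to "pointer to
   function", so these two parameters have the identical type
   int (*)(void *, void *). */
int CallWithStar(int (*cmpfn)(void *, void *), void *a, void *b)
{
    return cmpfn(a, b);
}

int CallWithoutStar(int cmpfn(void *, void *), void *a, void *b)
{
    return cmpfn(a, b); /* cmpfn is a function pointer here too */
}

/* By contrast, int *cmpfn(void *, void *) as a parameter would mean a
   pointer to a function returning int *, which IntCmpDemo does not match. */

/* Hypothetical comparison function used only for this demo. */
int IntCmpDemo(void *elem1, void *elem2)
{
    int a = *(int *)elem1;
    int b = *(int *)elem2;
    return (a > b) - (a < b); /* negative, zero, or positive */
}
```

Passing IntCmpDemo to either CallWithStar or CallWithoutStar compiles and behaves identically.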

Program. 2. how to call lsearch

```c
int array[] = {4, 2, 3, 7, 11, 6};
int size = 6;   // I'll just hard code it as 6.
int number = 7; // search for the number 7
int *found = lsearch(&number, array, size, sizeof(int), IntCmp);
if (found == NULL) { /* :-( not found */ } else { /* :-) found */ }
```

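Line 4: array needs no & in front of it, because an array name passed as an argument decays to a pointer to its first element, so array already means &array[0]. (number, by contrast, is a plain int, so its address has to be taken explicitly.) Whatever data type the first two arguments point to, lsearch's implementation treats them as void * and, after casting base to char *, uses it in the pointer arithmetic that follows.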
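To make that pointer arithmetic concrete, here is a small sketch. ElemAddr is a hypothetical helper that isolates the address computation from the loop in Program 1.

```c
#include <assert.h>

/* Hypothetical helper isolating the address computation in lsearch's
   loop. Standard C doesn't allow arithmetic on void *, so base is cast
   to char * (whose unit is 1 byte) and advanced by i * elemSize bytes. */
void *ElemAddr(void *base, int i, int elemSize)
{
    return (char *)base + i * elemSize;
}
```

With the array from Program 2, ElemAddr(array, 3, sizeof(int)) yields the same address as &array[3], the element holding 7.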

Before calling lsearch, we first have to implement the comparison function:

Program. 3. implement IntCmp()

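```c
int IntCmp(void *elem1, void *elem2){
    int *ip1 = elem1;   // the parameters must be void * so IntCmp matches cmpfn in lsearch;
    int *ip2 = elem2;   // converting them to int * tells the compiler what to dereference
    return *ip1 - *ip2; // negative, zero, or positive, like memcmp/strcmp
}
```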
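Putting Programs 1 through 3 together gives a file that compiles as-is. This sketch keeps Jerry's names and logic and only adds the headers; note that *ip1 - *ip2 can overflow when the two values are very far apart (near INT_MIN and INT_MAX), which is harmless for small values like these.

```c
#include <assert.h>
#include <stddef.h> /* NULL */

/* Program 1: generic linear search; cmpfn returns 0 on a match. */
void *lsearch(void *key, void *base, int n, int elemSize,
              int (*cmpfn)(void *, void *))
{
    for (int i = 0; i < n; i++) {
        void *elemAddr = (char *)base + i * elemSize;
        if (cmpfn(elemAddr, key) == 0)
            return elemAddr;
    }
    return NULL;
}

/* Program 3: compares two ints reached through void * parameters. */
int IntCmp(void *elem1, void *elem2)
{
    int *ip1 = elem1;
    int *ip2 = elem2;
    return *ip1 - *ip2;
}
```

Searching for 7 in {4, 2, 3, 7, 11, 6} returns &array[3]; searching for a value that isn't in the array returns NULL.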

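Having covered this integer comparison function, Jerry contrasted C's approach to generics (void * plus a comparison function) with templates in other languages: given C's old specification, being this lightweight and fast is already pretty cool; templates in newer languages are more type safe and give the compiler more information at compile time, but they suffer from code bloat.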

Jerry: You have to recognize that this is not exactly the most elegant way, it's just the best that C, with its specification that was more or less defined 35 years ago, can actually do. All the other languages you've ever heard of, they are all so much younger that they've learned from C's mistakes, and they have better solutions for supporting generics. There are some pluses to this. It's very fast. You only use one copy of the code, ever, to do all of your linear searching. The template approach, it's more type safe. You get more information at compile time, but you get code bloat because you've got one instance of that lsearch algorithm for every single data type you ever searched for.
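For contrast, one rough way to imitate the template approach in C is a macro that stamps out a separate, fully typed search function for each element type. DEFINE_TYPED_LSEARCH and the lsearch_int/lsearch_double functions it generates are hypothetical names for this sketch, and it compares with ==, so it only works for types that support it. The point is that every expansion adds another copy of the same loop to the program, which is the code bloat Jerry describes.

```c
#include <assert.h>
#include <stddef.h> /* NULL */

/* Each use of this hypothetical macro generates a new, type-checked
   function: lsearch_int, lsearch_double, and so on. */
#define DEFINE_TYPED_LSEARCH(T)                          \
    T *lsearch_##T(const T *key, T *base, int n)         \
    {                                                    \
        for (int i = 0; i < n; i++)                      \
            if (base[i] == *key)                         \
                return &base[i];                         \
        return NULL;                                     \
    }

DEFINE_TYPED_LSEARCH(int)    /* one copy of the loop for int    */
DEFINE_TYPED_LSEARCH(double) /* another copy of it for double   */
```

Unlike the void * version, passing a double key to lsearch_int would be caught by the compiler, but the binary now carries two copies of the search loop.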

With the comparison function for int done, next comes the comparison function for strings. Jerry gave a heads-up: "This gets a lot more complicated when you start dealing with the problem of lsearching an array of C-strings. So, you're going to have an array of char *'s, and you're gonna have to search for a particular char * to see whether you have a match or not."

Program. 4. implement StrCmp()

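```c
char *notes[] = {"Ab", "F#", "B", "Gb", "D"};
char *favoriteNote = "Eb";
char **found = lsearch(&favoriteNote, notes, 5, sizeof(char *), StrCmp); // StrCmp must be declared before this call
int StrCmp(void *vp1, void *vp2) // both arguments are addresses of char * values
{
    char *s1 = *(char **)vp1;
    char *s2 = *(char **)vp2;
    return strcmp(s1, s2); // strcmp comes from <string.h>
}
```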

Line 1: how the string array notes is stored needs some explanation:

Jerry: They’re not in the heap, they’re actually global variables that happen to be constant. It’s like normal global variables, except they happen to be character arrays that reside up there, and these are replaced at load time with the base address of the A, F and the D.

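If you've used Java, this kind of storage should look familiar: Java string literals likewise live once in a constant pool (the string pool) and are shared by every reference to them. Java also has ways to avoid creating needless String objects. One is to build strings with StringBuilder instead of String, because String objects are immutable and every modification produces a new object. Another is to rely on the compiler in some cases: for a concatenation of constant strings such as String test = "1"+"0"+"1";, the compiler folds the expression into the single literal "101" at compile time instead of creating intermediate String objects at run time.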

Line 3: the type of the variable found, char **, needs some explanation:

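It's the same idea as int *found in Program 2: if the elements being searched have type X, then found, which points at one of them, has type X *. In Program 2, X is int, so found is int *; here each element of notes is a char *, so found is char **.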
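A compilable sketch of Program 4 with the type of found spelled out. FindNote is a hypothetical wrapper added for this example; lsearch is Program 1, and StrCmp returns the result of strcmp on the two strings it reaches.

```c
#include <assert.h>
#include <stddef.h> /* NULL */
#include <string.h> /* strcmp */

void *lsearch(void *key, void *base, int n, int elemSize,
              int (*cmpfn)(void *, void *))
{
    for (int i = 0; i < n; i++) {
        void *elemAddr = (char *)base + i * elemSize;
        if (cmpfn(elemAddr, key) == 0)
            return elemAddr;
    }
    return NULL;
}

/* Both arguments are really char ** (addresses of char * values). */
int StrCmp(void *vp1, void *vp2)
{
    char *s1 = *(char **)vp1;
    char *s2 = *(char **)vp2;
    return strcmp(s1, s2);
}

/* Hypothetical wrapper: the element type is char *, so the address of
   a matching element, and therefore the result, is a char **. */
char **FindNote(char *notes[], int n, char *note)
{
    return lsearch(&note, notes, n, sizeof(char *), StrCmp);
}
```

FindNote(notes, 5, "B") returns &notes[2], and dereferencing that once gives back the string "B"; "Eb" isn't in the array, so searching for it returns NULL.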

Lines 6 and 7: vp1 cannot be dereferenced directly:

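The compiler can't dereference a void *, because it doesn't know what type (or how many bytes) lives at that address. What we do know is that vp1 holds the address of a char * (an element of notes), so we first cast vp1 to char ** and then dereference it once to get the char * itself; the same goes for vp2, which holds &favoriteNote. If lsearch were called with favoriteNote instead of &favoriteNote, the key could be used directly as char *s2 = (char *)vp2; (lsearch calls cmpfn(elemAddr, key), so the key arrives as the second argument). The element argument would still need two hops, though, so the function would handle its two arguments asymmetrically.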
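The asymmetric alternative can be sketched as follows. StrCmpKeyByValue is a hypothetical name; the sketch assumes the caller passes the string itself (favoriteNote) as the key, while lsearch still passes element addresses.

```c
#include <assert.h>
#include <stddef.h> /* NULL */
#include <string.h> /* strcmp */

void *lsearch(void *key, void *base, int n, int elemSize,
              int (*cmpfn)(void *, void *))
{
    for (int i = 0; i < n; i++) {
        void *elemAddr = (char *)base + i * elemSize;
        if (cmpfn(elemAddr, key) == 0)
            return elemAddr;
    }
    return NULL;
}

/* Hypothetical asymmetric comparator: vp1 is still an element address
   (a char **), but vp2 is the key string itself (a char *). */
int StrCmpKeyByValue(void *vp1, void *vp2)
{
    char *s1 = *(char **)vp1; /* element: two hops */
    char *s2 = vp2;           /* key: one hop */
    return strcmp(s1, s2);
}
```

A call then looks like lsearch(favoriteNote, notes, 5, sizeof(char *), StrCmpKeyByValue). It works, but because the comparator treats its two arguments differently, it couldn't be reused by a routine such as qsort, which passes two element addresses; that's the price of dropping the &.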

Figure. 1. The two-hop pointers and dereferences in StrCmp


That wraps up lsearch. Jerry mentioned where the assignments stand: "Now, for Assignment 2, search certainly comes up. As opposed to all of these examples, you know that there's some sorted flavor to the arrays that you're searching there. If you haven't read Assignment 2, again, I'll try to be as generic as possible in my description. But you basically have the opportunity to binary search as opposed to linear search for Assignment 2." By this point in the course, students should have finished Assignment 1 and looked ahead at Assignment 2; having solved problems with lsearch, they now move on to bsearch.

bsearch

Jerry: There’s a built-in function called bsearch. It turns out that there’s a built-in function called lsearch as well. It’s not technically standard, but almost all compilers provide it, at least on UNIX systems. I’m gonna want you to use the generic bsearch algorithm which has more or less the same prototype as lsearch right here.

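This is the prototype of the built-in bsearch, as written in lecture:

```c
void *bsearch(void *key, void *base, int n, int elemSize, int (*cmp)(void *, void *));
```

(That is a simplified form. The C standard library declares it in <stdlib.h> as void *bsearch(const void *key, const void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *)); with const pointers and size_t counts.)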
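A short sketch using the standard bsearch from <stdlib.h>, assuming the array is sorted in ascending order (here, the lecture's array after sorting). IntCmpConst and FindSorted are hypothetical names. Two differences from the lecture's lsearch are worth noticing: the standard comparator takes const void * parameters, and bsearch passes the key as the first argument and the array element as the second.

```c
#include <assert.h>
#include <stdlib.h> /* bsearch, size_t, NULL */

/* Comparator in the standard form: const void * parameters. */
int IntCmpConst(const void *vp1, const void *vp2)
{
    int a = *(const int *)vp1;
    int b = *(const int *)vp2;
    return (a > b) - (a < b); /* same sign as a - b, without overflow */
}

/* Hypothetical wrapper: binary search for key in an ascending array. */
int *FindSorted(int *sorted, size_t n, int key)
{
    return bsearch(&key, sorted, n, sizeof(int), IntCmpConst);
}
```

With sorted = {2, 3, 4, 6, 7, 11}, FindSorted(sorted, 6, 7) returns &sorted[4], and FindSorted(sorted, 6, 5) returns NULL.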

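Here Jerry stressed the nature of int (*cmp)(void *, void *): cmp must be a plain function, not a method. Even if your code lives inside a class in C++ or Java, the comparison routine has to be a global function or a static function that doesn't depend on any object, because a method implicitly receives a this pointer holding the address of the object it's called on, so its real signature doesn't match. Jerry then explained the difference between a function and a method (in Java and C++ people usually say "method" rather than "function"):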

Jerry: "The difference between a function and a method: they look very similar, except that methods actually have the address of the relevant object lying around as this invisible parameter, via this parameter called this."
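To make the hidden parameter visible, here is a C sketch of what a method conceptually amounts to: an ordinary function with the object's address as an extra first argument. Counter, Counter_Compare, PlainIntCmp and cmpForSearch are hypothetical names; real C++ compilers pass this in an ABI-specific way, but the type mismatch is the point.

```c
#include <assert.h>

/* Hypothetical "class" with a bit of state. */
typedef struct {
    int calls; /* how many comparisons this object has performed */
} Counter;

/* Conceptually, a method Counter::Compare(void *, void *) is a function
   like this one, with the object's address as an extra first argument
   (the invisible "this"). Its type, int (*)(Counter *, void *, void *),
   does not match the int (*)(void *, void *) that lsearch expects. */
int Counter_Compare(Counter *self, void *vp1, void *vp2)
{
    self->calls++;
    return *(int *)vp1 - *(int *)vp2;
}

/* A plain function (or a C++ static member function) has no hidden
   parameter, so it matches the expected type exactly. */
int PlainIntCmp(void *vp1, void *vp2)
{
    return *(int *)vp1 - *(int *)vp2;
}

/* This compiles cleanly because the types match; initializing it with
   Counter_Compare would draw an incompatible-pointer-type diagnostic. */
int (*const cmpForSearch)(void *, void *) = PlainIntCmp;
```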

Implementing a stack in C

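To mimic the templates and generics of C++ and Java, we use C's struct to get as close to a "class" definition as possible, and for now we restrict the stack to int data.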

Program. 5. implement a stack data structure

stack.h

```c
typedef struct {
    int *elems;
    int logicalLen; // how many ints are currently stored
    int allocLen;   // how many ints the allocated buffer can hold
} stack;

void StackNew(stack *s);
void StackDispose(stack *s);
void StackPush(stack *s, int value);
void StackPop(stack *s);
```

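First, C has no class keyword; struct is the closest analog.
Second, C has no public or private access control. (Jerry mentioned that their compiler supports const in C; that's expected, since const has been part of standard C since C89.)
Finally, technically all three fields are exposed, i.e. effectively public. Client code is nonetheless expected to manipulate them only through the functions declared in stack.h, which take the stack's address explicitly, since C has no methods.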

Program. 6. how to call stack

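```c
stack s;      // reserves a 12-byte struct (on the 32-bit machines used in lecture); its fields start out as garbage, not zeroed
StackNew(&s); // initialization: room for 4 ints is allocated, 0 of them in use
for (int i = 0; i < 5; i++){
    StackPush(&s, i);
}
StackDispose(&s);
```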

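Because room for 4 ints was allocated up front, the first 4 pushes are fast: the space is already reserved. Pushing a 5th element triggers the doubling strategy: find a new block with room for 4*2 = 8 ints, copy the first 4 ints over, free the old block, and then push the 5th.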
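The doubling strategy can be sketched as the remaining stack functions. This is a minimal version under my own assumptions, not necessarily Jerry's code: it uses realloc, which performs the allocate/copy/free steps described above (or simply extends the block in place when it can), and StackPop keeps the void return type declared in stack.h.

```c
#include <assert.h>
#include <stdlib.h> /* malloc, realloc, free */

typedef struct {
    int *elems;
    int logicalLen; /* how many ints are currently stored */
    int allocLen;   /* how many ints the buffer can hold */
} stack;

void StackNew(stack *s)
{
    s->logicalLen = 0;
    s->allocLen = 4;
    s->elems = malloc(4 * sizeof(int));
    assert(s->elems != NULL);
}

void StackDispose(stack *s)
{
    free(s->elems);
}

void StackPush(stack *s, int value)
{
    if (s->logicalLen == s->allocLen) {
        /* Full: double the capacity. realloc copies the old contents
           into the bigger block (and frees the old one) if needed. */
        s->allocLen *= 2;
        s->elems = realloc(s->elems, s->allocLen * sizeof(int));
        assert(s->elems != NULL);
    }
    s->elems[s->logicalLen] = value;
    s->logicalLen++;
}

void StackPop(stack *s)
{
    assert(s->logicalLen > 0); /* popping an empty stack is a bug */
    s->logicalLen--;
}
```

After the five pushes in Program 6, logicalLen is 5 and allocLen has doubled from 4 to 8.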

Program. 7. implement StackNew()

```c
void StackNew(stack *s) // s is a local variable holding an address; it points to the 12-byte struct
{
    s->logicalLen = 0;
    s->allocLen = 4;
    s->elems = malloc(4 * sizeof(int));
    assert(s->elems != NULL);
}
```

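Line 5: malloc is the predecessor of new in Java and C++. Operator new implicitly takes the data type into account, e.g. new int[4] or new double[20]; malloc knows nothing about types. It takes a raw byte count (hence 4*sizeof(int)), finds a block of that many bytes in the heap, and returns its address as a void *.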

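Line 6: malloc is usually paired with assert. "It's actually not a function, it's actually a macro." If the condition is true, assert does nothing; if it's false, assert terminates the program and prints the name of the source file and the line number of the assert that broke. Without the assert, if the allocation failed (unlikely, but possible), malloc would return NULL, and somewhere later in the program that NULL would be dereferenced, crashing the program far from the real cause. (assert is compiled away when NDEBUG is defined, so it's a debugging aid rather than a substitute for real error handling.)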
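To see why assert being a macro matters, here is a hypothetical reimplementation (MY_ASSERT and MyAssertFail are made-up names; real code should simply use <assert.h>). Because __FILE__ and __LINE__ expand where the macro is used, the failure message can name the file and line of the failing check, which an ordinary function could not learn about its caller.

```c
#include <stdio.h>  /* fprintf, stderr */
#include <stdlib.h> /* abort, malloc, free */

/* Prints the failed expression with its file and line, then aborts,
   much like the real assert does. */
void MyAssertFail(const char *expr, const char *file, int line)
{
    fprintf(stderr, "%s:%d: assertion failed: %s\n", file, line, expr);
    abort();
}

/* Hypothetical stand-in for assert. Being a macro, it expands at the
   point of use, so __FILE__ and __LINE__ describe the caller's code,
   and #expr turns the condition into a string for the message. */
#define MY_ASSERT(expr) \
    ((expr) ? (void)0 : MyAssertFail(#expr, __FILE__, __LINE__))
```

Used right after an allocation, as in MY_ASSERT(p != NULL);, a failed malloc stops the program immediately at that line instead of at some later NULL dereference.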

Tip: when you hit a seg fault or bus error, check the asserts guarding the line where the fault happened.

0 0