Compiler Principles: A Lexical Analyzer for C


A compiler principles lab assignment: perform lexical analysis of the C language.


First, the overall structure:

Base class: Base, which wraps a few basic character-classification functions:

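    int charkind(char c);        // classify a character
    int spaces(char c);          // can the current space be removed?
    int characters(char c);      // is it a letter?
    int keyword(char str[]);     // is it a keyword?
    int signwords(char str[]);   // is it an identifier?
    int numbers(char c);         // is it a digit?
    int integers(char str[]);    // is it an integer?
    int floats(char str[]);      // is it a floating-point number?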


Derived class: LexAn, which inherits from Base and wraps the functions that process lines and words:

    void scanwords();            // process each line
    void clearnotes();           // remove comments and redundant spaces
    void getwords(int state);    // split the line into words
    void wordkind(char str[]);   // determine the kind of a word and print it

The call relationships between the functions are shown in the following diagram:

[Figure: function call graph]

That covers the overall structure. Now for the implementation details:


(I) Removing comments and redundant spaces


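(1) C has two comment forms, // and /* */, so when the character just read is /, it is enough to branch on the next character:

If it is /, everything from // to the end of the line is a comment: save the comment text and cut the current line off at that point.

If it is *, scan forward to the closing */, save the comment, remove it from the current line, and then repeat, because one line may contain several /* */ comments.

Limitation: comments that span more than one line are not handled.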

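(2) The handling of redundant spaces is rather crude. It only covers cases such as if (    a    >=   b  ), that is, spaces between a special symbol and a letter (or digit). As long as a special symbol sits on at least one side of a space, deleting that space cannot cause an error.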


    void LexAn::clearnotes()
    {
        char *line = bufferin[buffernum]; // the line currently being processed
        int i, j, k;
        int noteCount = 0;
        int flag = 0;                     // 1 while inside a string literal
        char note[100];

        /* comments */
        for (i = 0; line[i] != '\0'; i++)
        {
            if (line[i] == '"')
            {
                flag = 1 - flag;
                continue;
            }
            if (line[i] == '/' && flag == 0)
            {
                if (line[i + 1] == '/')
                {
                    // line comment: save it, then cut the line at the first '/'
                    for (j = i; line[j] != '\0'; j++)
                        note[noteCount++] = line[j];
                    note[noteCount] = '\0';
                    noteCount = 0;
                    fprintf(fout, "  [ %s ]  ----  [ comment ]\n", note);
                    line[i] = '\0';
                    break;
                }
                if (line[i + 1] == '*')
                {
                    // block comment: copy it up to and including the closing */
                    note[noteCount++] = '/';
                    note[noteCount++] = '*';
                    for (j = i + 2; line[j] != '\0'; j++)
                    {
                        note[noteCount++] = line[j];
                        if (line[j] == '*' && line[j + 1] == '/')
                        {
                            note[noteCount++] = '/'; // fix: store the '/', not the character after it
                            j += 2;
                            note[noteCount] = '\0';
                            noteCount = 0;
                            fprintf(fout, "  [ %s ]  ----  [ comment ]\n", note);
                            break;
                        }
                    }
                    // close the gap; k is used so that i still marks where scanning resumes
                    for (k = i; line[j] != '\0'; j++, k++)
                        line[k] = line[j];
                    line[k] = '\0';
                    i--; // fix: re-examine position i, which may start another comment
                }
            }
        }

        // spaces
        for (i = 0, flag = 0; line[i] != '\0'; i++)
        {
            if (line[i] == '"')
            {
                flag = 1 - flag;
                continue;
            }
            if (line[i] == ' ' && flag == 0)
            {
                // j = first non-space character after the run of spaces
                for (j = i + 1; line[j] != '\0' && line[j] == ' '; j++)
                {
                }
                if (line[j] == '\0')
                {
                    line[i] = '\0'; // trailing spaces: cut them off
                    break;
                }
                // drop the run if a special symbol sits on either side of it
                if (spaces(line[j]) == 1 || (i > 0 && spaces(line[i - 1]) == 1))
                {
                    for (k = i; line[j] != '\0'; j++, k++)
                        line[k] = line[j];
                    line[k] = '\0';
                    i--;
                }
            }
        }

        // tabs
        for (i = 0; line[i] != '\0'; i++)
        {
            if (line[i] == '\t')
            {
                for (j = i; line[j] != '\0'; j++)
                    line[j] = line[j + 1];
                i--; // re-examine the character that moved into position i
            }
        }
    }
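The limitation noted above (no support for comments that span lines) could be lifted by carrying one flag from line to line. The following is only a minimal sketch of that idea, not part of the original LexAn class: the function name stripComments and the inBlock parameter are made up for illustration, and it uses std::string rather than the bufferin array. It replaces each block comment with a single space, which is also how the C standard treats comments, so that neighbouring words do not get glued together.

```cpp
#include <string>

// Hypothetical helper (not in the original project).
// Removes // and /* */ comments from one line. `inBlock` remembers
// whether the previous line ended inside a block comment.
std::string stripComments(const std::string& line, bool& inBlock)
{
    std::string out;
    bool inString = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        char next = (i + 1 < line.size()) ? line[i + 1] : '\0';
        if (inBlock) {
            // Inside a block comment: drop text until the closing */
            if (c == '*' && next == '/') { inBlock = false; ++i; }
            continue;
        }
        if (inString) {
            out += c;
            if (c == '\\' && next != '\0') { out += next; ++i; } // keep escaped character
            else if (c == '"') inString = false;
            continue;
        }
        if (c == '"') { inString = true; out += c; }
        else if (c == '/' && next == '/') break;                 // rest of the line is a comment
        else if (c == '/' && next == '*') { inBlock = true; ++i; out += ' '; }
        else out += c;
    }
    return out;
}
```

A caller would keep one bool for the whole file and pass it with every line. Character literals such as '"' are not treated specially in this sketch.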

(II) The most important part: the state-machine transitions


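A diagram is awkward to draw here, so I will describe the machine as clearly as I can in words. Please read this alongside the source code: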

Characters fall into the following classes: letter → 1, digit → 2, $ and _ → 3, \ (escape) → 4, = → 5, anything else → 0.
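The source of charkind is not shown in this post, so the following is only a guess at what it might look like, reconstructed from the class table above and from how getwords uses the return values. Treat the body as an assumption.

```cpp
#include <cctype>

// Hypothetical reconstruction of Base::charkind, written here as a free
// function; the author's real implementation may differ.
int charkind(char c)
{
    unsigned char uc = static_cast<unsigned char>(c);
    if (std::isalpha(uc)) return 1;      // letter
    if (std::isdigit(uc)) return 2;      // digit
    if (c == '$' || c == '_') return 3;  // may appear in identifiers
    if (c == '\\') return 4;             // escape character
    if (c == '=') return 5;              // '=' has its own class
    return 0;                            // everything else
}
```

The cast to unsigned char is there because passing a negative char value to the <cctype> functions is undefined behaviour.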

The initial value of state is 0:

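(1) If the first character is a letter, the word can only be an identifier or a keyword. It ends at the first character that is not a digit, a letter, $ or _, and the word is then extracted.

(2) If the first character is a digit, the word can only be a number, so it may continue with octal or hexadecimal digits, a '.', decimal digits, or $. Any other character ends the word, which is then extracted.

(3) If the first character is $ or _, the word can only be an identifier, so it may continue with letters, digits, $ or _. Any other character ends the word, which is then extracted.

(4) If the first character is a special character (" . ( ) = and so on), each case is handled separately. The flow is the same as above: the word ends when an impossible combination is met. See the code for the details.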


    // State machine
    void LexAn::getwords(int state)
    {
        char word[100];
        int charCount = 0;
        int finish = 0;
        int num;
        int i;

        for (i = 0; bufferscan[i] != '\0'; i++)
        {
            switch (state / 10)
            {
            case 0: // start of a new word
                switch (charkind(bufferscan[i]))
                {
                case 1: word[charCount++] = bufferscan[i]; state = 10; break;
                case 2: word[charCount++] = bufferscan[i]; state = 20; break;
                case 3: word[charCount++] = bufferscan[i]; state = 30; break;
                case 0: case 5:
                    word[charCount++] = bufferscan[i];
                    switch (bufferscan[i])
                    {
                    case '"':  state = 41; break;
                    case '\'': state = 42; break;
                    case '(': case ')': case '{': case '}': case '[': case ']':
                    case ';': case ',': case '.':
                        state = 50; word[charCount] = '\0'; finish = 1; break;
                    case '=':  state = 43; break;
                    default:   state = 40; break;
                    }
                    break;
                default: word[charCount++] = bufferscan[i]; break;
                }
                break;

            case 1: // word started with a letter: identifier or keyword
                switch (charkind(bufferscan[i]))
                {
                // fix: stay in state 10 for digits, $ and _ as rule (1) says;
                // the original jumped to 20/30, so "a1.b" became one word
                case 1: case 2: case 3:
                    word[charCount++] = bufferscan[i]; state = 10; break;
                case 0: case 5:
                    word[charCount] = '\0';
                    num = 0;
                    while (word[num] != '\0') num++;
                    // length handling !!
                    if (num > 7) word[7] = '\0';
                    i--; finish = 1; state = 50; break;
                default: word[charCount++] = bufferscan[i]; break;
                }
                break;

            case 2: // word started with a digit: number
                switch (charkind(bufferscan[i]))
                {
                case 1: case 2:
                    word[charCount++] = bufferscan[i]; state = 20; break;
                case 3: word[charCount++] = bufferscan[i]; state = 30; break;
                case 0: case 5: // fix: '=' (class 5) must also end the number
                    if (bufferscan[i] == '.')
                    {
                        word[charCount++] = bufferscan[i]; state = 20; break;
                    }
                    word[charCount] = '\0'; i--; finish = 1; state = 50; break;
                default: word[charCount++] = bufferscan[i]; break;
                }
                break;

            case 3: // word started with $ or _: identifier
                switch (charkind(bufferscan[i]))
                {
                case 1: case 2: case 3:
                    word[charCount++] = bufferscan[i]; state = 30; break;
                case 0: case 5: // fix: '=' (class 5) must also end the identifier
                    word[charCount] = '\0'; i--; finish = 1; state = 50; break;
                default: word[charCount++] = bufferscan[i]; break;
                }
                break;

            case 4: // special characters
                switch (state)
                {
                case 40: // run of operator characters
                    switch (charkind(bufferscan[i]))
                    {
                    case 1: case 2: case 3:
                        word[charCount] = '\0'; i--; finish = 1; state = 50; break;
                    case 0: word[charCount++] = bufferscan[i]; state = 40; break;
                    default: word[charCount++] = bufferscan[i]; break;
                    }
                    break;
                case 41: // string literal
                    word[charCount++] = bufferscan[i];
                    if (bufferscan[i] == '"')
                    {
                        if (charkind(bufferscan[i - 1]) == 4)
                        {
                            // escaped quote: still inside the string
                        }
                        else
                        {
                            word[charCount] = '\0'; finish = 1; state = 50;
                        }
                    }
                    break;
                case 42: // character literal
                    word[charCount++] = bufferscan[i];
                    if (bufferscan[i] == '\'')
                    {
                        word[charCount] = '\0'; finish = 1; state = 50;
                    }
                    break;
                case 43: // '=' or "=="
                    if (bufferscan[i] == '=')
                    {
                        word[charCount++] = bufferscan[i]; state = 43;
                    }
                    else
                    {
                        word[charCount] = '\0'; finish = 1; i--; state = 50;
                    }
                    break;
                default: word[charCount++] = bufferscan[i]; break;
                }
                break;

            case 5: // a word is complete: print it and start over at this character
                finish = 0; state = 0; charCount = 0; i--;
                wordkind(word);
                break;

            default:
                break;
            }

            // end of line: flush whatever has been collected
            if (bufferscan[i + 1] == '\0')
            {
                word[charCount] = '\0';
                wordkind(word);
            }
        }
    }

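Also note: as the lab requires, identifiers longer than 7 characters are simply truncated. For normal handling, delete the `if (num > 7) word[7] = '\0';` statement (highlighted in red in the original post). Because the truncation happens before wordkind is called, it applies to every word that starts with a letter, keywords included.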
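For readers who find the nested switch hard to follow, here is a much smaller sketch of the same idea: look at the first character of a word, then keep reading while the following characters are still possible for that kind of word. It is not the author's getwords. The function name splitWords and the maxIdent parameter are invented for this example, it works on std::string, and it returns the words instead of printing them. Passing maxIdent = 0 turns the truncation off.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Illustrative only: a simplified splitter, not the original getwords().
// maxIdent is the longest kept identifier; 0 disables truncation.
std::vector<std::string> splitWords(const std::string& line, std::size_t maxIdent = 7)
{
    auto isIdentChar = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_' || ch == '$';
    };
    auto isSpace = [](char ch) {
        return std::isspace(static_cast<unsigned char>(ch)) != 0;
    };
    const std::string delims = "(){}[];,.";
    std::vector<std::string> words;
    std::size_t i = 0;
    while (i < line.size()) {
        char ch = line[i];
        if (isSpace(ch)) { ++i; continue; }
        std::size_t start = i;
        if (std::isalpha(static_cast<unsigned char>(ch)) || ch == '_' || ch == '$') {
            // rules (1) and (3): identifier or keyword
            while (i < line.size() && isIdentChar(line[i])) ++i;
            std::string w = line.substr(start, i - start);
            if (maxIdent != 0 && w.size() > maxIdent) w.resize(maxIdent);
            words.push_back(w);
        } else if (std::isdigit(static_cast<unsigned char>(ch))) {
            // rule (2): number; letters are allowed for forms such as 0x1F
            while (i < line.size() && (isIdentChar(line[i]) || line[i] == '.')) ++i;
            words.push_back(line.substr(start, i - start));
        } else if (ch == '"' || ch == '\'') {
            // string or character literal, up to the matching quote
            ++i;
            while (i < line.size() && line[i] != ch) {
                if (line[i] == '\\' && i + 1 < line.size()) ++i; // skip escaped character
                ++i;
            }
            if (i < line.size()) ++i;                            // closing quote
            words.push_back(line.substr(start, i - start));
        } else if (delims.find(ch) != std::string::npos) {
            words.push_back(std::string(1, ch));                 // single-character delimiter
            ++i;
        } else {
            // run of operator characters such as >= or ==
            while (i < line.size() && !isIdentChar(line[i]) && !isSpace(line[i]) &&
                   line[i] != '"' && line[i] != '\'' &&
                   delims.find(line[i]) == std::string::npos) ++i;
            words.push_back(line.substr(start, i - start));
        }
    }
    return words;
}
```

Calling splitWords(line, 0) keeps identifiers at full length, which corresponds to deleting the truncation statement from the original code.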


(III) Screenshot of the results:

[Figure: screenshot of the program output]

The full source code of this project is on my personal GitHub. Feel free to star and fork it for study.



