Compiler Principles: A Walkthrough of a lex and yacc Example

Source: Internet | Published by: 顾比均线源码 | Editor: 程序博客网 | Time: 2024/06/04 17:41

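I have been reflecting on education lately, so I dug out my old textbooks and reread them carefully. I found a lot that I had not understood, or had never truly grasped.

I have just finished the chapters on lexical analysis and syntax analysis. The more I read, the simpler it seems, and I no longer know why I found it so hard before. In short, I did not get enough hands-on practice back then.

Below is an example I ran into while working with lex and yacc.

Chapter 1 of the original book lex & yacc (2nd edition) contains an example with the following source code.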

ch1-05.l

%{
/*
* We now build a lexical analyzer to be used by a higher-level parser.
*/

#include "ch1-05y.h" /* token codes from the parser */

#define LOOKUP 0 /* default - not a defined word type. */

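int state;

/* declared here because the rules below call them before they are defined */
int add_word(int type, char *word);
int lookup_word(char *word);

%}

%%

\n      { state = LOOKUP; }

\.\n    { state = LOOKUP;
          return 0; /* end of sentence */
        }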

^verb { state = VERB; }
^adj { state = ADJECTIVE; }
^adv { state = ADVERB; }
^noun { state = NOUN; }
^prep { state = PREPOSITION; }
^pron { state = PRONOUN; }
^conj { state = CONJUNCTION; }

[a-zA-Z]+ {
      if(state != LOOKUP) {
        /* on a declaration line: define the word */
        add_word(state, yytext);
      } else {
        switch(lookup_word(yytext)) {
        case VERB:
          return(VERB);
        case ADJECTIVE:
          return(ADJECTIVE);
        case ADVERB:
          return(ADVERB);
        case NOUN:
          return(NOUN);
        case PREPOSITION:
          return(PREPOSITION);
        case PRONOUN:
          return(PRONOUN);
        case CONJUNCTION:
          return(CONJUNCTION);
        default:
          printf("%s: don't recognize\n", yytext);
          /* don't return, just ignore it */
        }
      }
    }

. ;

%%
/* define a linked list of words and types */
struct word {
    char *word_name;
    int word_type;
    struct word *next;
};

struct word *word_list; /* first element in word list */

#include <stdlib.h> /* malloc */
#include <string.h> /* strlen, strcpy, strcmp */

int
add_word(int type, char *word)
{
    struct word *wp;

    if(lookup_word(word) != LOOKUP) {
        printf("!!! warning: word %s already defined \n", word);
        return 0;
    }

    /* word not there, allocate a new entry and link it on the list */

    wp = (struct word *) malloc(sizeof(struct word));

    wp->next = word_list;

    /* have to copy the word itself as well */

    wp->word_name = (char *) malloc(strlen(word)+1);
    strcpy(wp->word_name, word);
    wp->word_type = type;
    word_list = wp;
    return 1; /* it worked */
}

int
lookup_word(char *word)
{
    struct word *wp = word_list;

    /* search down the list looking for the word */
    for(; wp; wp = wp->next) {
        if(strcmp(wp->word_name, word) == 0)
            return wp->word_type;
    }

    return LOOKUP; /* not found */
}

ch1-05.y

%{
/*
* A lexer for the basic grammar to use for recognizing english sentences.
*/
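#include <stdio.h>

/* the generated parser calls these; newer compilers reject implicit declarations */
int yylex(void);
int yyerror(char *s);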
%}

%token NOUN PRONOUN VERB ADVERB ADJECTIVE PREPOSITION CONJUNCTION

%%
sentence: subject VERB object { printf("Sentence is valid.\n"); }
;

subject: NOUN
| PRONOUN
;

object: NOUN
;
%%

extern FILE *yyin;

int
main(void)
{
    while(!feof(yyin)) {
        yyparse();
    }
    return 0;
}

int
yyerror(char *s)
{
    fprintf(stderr, "%s\n", s);
    return 0;
}

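ch1-05.y is the yacc program. The yyparse() routine starts the syntax analysis. As compiler theory tells us, yyparse calls yylex, the function generated by lex.
Each time yylex runs and the input matches a lexical rule, it executes the corresponding action and returns a token to yyparse.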
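The hand-off between the two routines can be sketched in plain C without flex or bison. This is only an illustration under my own assumptions: toy_yylex, toy_count_tokens, toy_input and the TOY_ token codes are hypothetical stand-ins, not part of the book's code or of any generated file. The point is the calling pattern: the parser side keeps asking the lexer for the next token, and a return value of 0 means "no more input for this parse".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; real ones come from the bison-generated header. */
enum { TOY_END = 0, TOY_WORD = 258 };

/* Hypothetical cursor into the text being scanned. */
const char *toy_input;

/* Stand-in for yylex(): one token per call, 0 at the end of a sentence. */
int toy_yylex(void)
{
    assert(toy_input != NULL);
    for (;;) {
        if (*toy_input == '\0')
            return TOY_END;                 /* real end of input */
        if (toy_input[0] == '.' && toy_input[1] == '\n') {
            toy_input += 2;
            return TOY_END;                 /* mirrors the \.\n rule */
        }
        if (isalpha((unsigned char)*toy_input)) {
            while (isalpha((unsigned char)*toy_input))
                toy_input++;
            return TOY_WORD;                /* mirrors [a-zA-Z]+ */
        }
        toy_input++;                        /* any other character is skipped */
    }
}

/* Stand-in for yyparse(): pull tokens until the lexer returns 0. */
int toy_count_tokens(const char *text)
{
    int count = 0;
    toy_input = text;
    while (toy_yylex() != TOY_END)
        count++;
    return count;
}
```

Calling toy_count_tokens("he is student.\nmore words") stops at the period and never looks at the second line, which is the same reason the real yyparse returns as soon as the `\.\n` rule fires.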

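Build and run (using flex and bison under the Cygwin emulation environment):

flex ch1-05.l
mv lex.yy.c ch1-05l.c
bison -d ch1-05.y
mv ch1-05.tab.c ch1-05y.c
mv ch1-05.tab.h ch1-05y.h
gcc -g -DYYDEBUG -c -o ch1-05l.o ch1-05l.c
gcc -g -DYYDEBUG -c -o ch1-05y.o ch1-05y.c
gcc -g -o ch1-05.pgm ch1-05l.o ch1-05y.o -lfl
Here -lfl links against the flex library.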

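(1) First run of ./ch1-05.pgm: it reports a segmentation fault. Debugging with gdb shows that the crash happens at while(!feof(yyin)).
After reading the generated ch1-05l.c and ch1-05y.c and doing some guessing, I concluded that yyin has no value yet at that point and is only assigned once yyparse has run.
So I changed main in ch1-05y.c to: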

int
main(void)
{
    do {
        yyparse();
    } while(!feof(yyin));
    return 0;
}
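The crash fits how flex-generated scanners usually behave: yyin starts out as NULL and is set to stdin only inside the first yylex() call, so testing feof(yyin) before any parsing passes a null stream to feof. Besides the do/while rewrite, another option is to give the stream a value before the loop. The sketch below shows that idea in isolation. toy_yyin, toy_ensure_input and toy_at_eof are hypothetical names used only for illustration, and the NULL starting value is my assumption about the flex version in use.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for flex's global yyin; assumed NULL until the scanner first runs. */
FILE *toy_yyin = NULL;

/* Give the input stream a value up front, as flex does on the first yylex(). */
FILE *toy_ensure_input(void)
{
    if (toy_yyin == NULL)
        toy_yyin = stdin;
    return toy_yyin;
}

/* Ask about end-of-file only once the stream is known to be non-NULL. */
int toy_at_eof(void)
{
    FILE *in = toy_ensure_input();
    assert(in != NULL);
    return feof(in) != 0;
}
```

In the real ch1-05.y the equivalent would be a single line, if(!yyin) yyin = stdin;, placed before the original while loop.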

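(2) Second run of ./ch1-05.pgm, with the following input:
verb is are am
noun i he pig
he is student
It reports: Sentence is valid.
Then I typed again:
he is student
It reports:
syntax error

Why?
I thought about it, then went back over LALR parsing and traced how ch1-05.l processes the input.
After yyparse is called, it calls yylex several times and has already reduced
"he is student" to sentence. Because the period is missing, yylex has not returned 0, so that yyparse call has not finished. When the second "he is student" is typed,
the stack holds sentence and the parser expects end of input, but the next tokens are "he is student". Nothing more can be reduced, so a syntax error is reported.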
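The same reasoning can be checked with a small hand-written recognizer for this grammar. It is not how the bison-generated LALR parser works internally; it is a simplified sketch, and toy_accepts and the T_ token codes are hypothetical names of my own. It accepts a token sequence only if the end marker 0 follows immediately after the object, which is the condition the second "he is student" violates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; real ones come from the bison-generated header. */
enum { T_END = 0, T_NOUN = 258, T_PRONOUN, T_VERB };

/*
 * Check a 0-terminated token array against
 *   sentence: subject VERB object ;
 *   subject:  NOUN | PRONOUN ;
 *   object:   NOUN ;
 * Returns 1 only if end of input (0) comes right after the object.
 */
int toy_accepts(const int *tok)
{
    assert(tok != NULL);
    if (tok[0] != T_NOUN && tok[0] != T_PRONOUN)
        return 0;               /* no subject */
    if (tok[1] != T_VERB)
        return 0;               /* no verb */
    if (tok[2] != T_NOUN)
        return 0;               /* no object */
    return tok[3] == T_END;     /* any other token here is the syntax error case */
}
```

With the tokens for one sentence followed by 0 the function returns 1. With the tokens for two sentences back to back and no 0 in between, it returns 0, which corresponds to the syntax error seen in run (2).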

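(3) Third run of ./ch1-05.pgm, with the following input:
verb is are am
noun i he pig
he is student
It reports: Sentence is valid.
Then I typed:
.
After this line is entered, yylex returns 0 and the current yyparse call ends.

he is student.

It reports:
Sentence is valid.

That is a basic example of lexical and syntax analysis working together. I hope it gives you some ideas.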