Compiler Principles Course Lab

Source: the Internet. Editor: 程序博客网. Time: 2024/05/01 12:41

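This is my lab report, pasted as-is. I spent little time debugging it, so there are probably still plenty of bugs and some cases are not handled. The basic idea of the state machine is as described, though, and the other parts can be changed or removed as needed.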

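1. Lab Overview

1.1 Lab requirements

Choose one high-level programming language (here, C), apply a suitable lexical-analysis technique, and design and implement a lexical analyzer for it.

Suggestion: as the implementation language, use the one taught in the "Computer Programming" course.

Hint: choose one of the following two technical routes:

regular expression → NFA → DFA → minDFA → program design

or regular grammar → NFA → DFA → minDFA → program design.

Requirement: the analyzer writes its output to a disk file and provides error handling.

1.2 Lab objectives

1) Deepen the understanding and application of compiler principles and of the principles and techniques for constructing a lexical analyzer, and further improve the students' programming skills;

2) Develop and improve the students' overall ability to analyze and solve problems;

3) Organize the material and write a well-formed lab report.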

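2. System Analysis

2.1 System requirements

According to C syntax, the words to be analyzed fall into the following categories:

(1) Keywords

Such as if, else, while, and int.

(2) Identifiers

An identifier must start with a letter and may be followed by digits or letters. Identifiers denote names of all kinds, such as variable names, constant names, and procedure names.

(3) Constants

Constants of various types: integer (1, 30), floating point (2.16), string ("AHD"), and character ('A').

(4) Operators and delimiters

Such as +, *, <=, and the comma.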
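The categories above can also be written down as regular expressions. The following is a minimal sketch, not the analyzer from this report: it assumes C++11 std::regex, a hypothetical helper named classifyLexeme, a deliberately tiny keyword list, and it ignores escape sequences inside string and character constants.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: names the category of one already-isolated lexeme.
std::string classifyLexeme(const std::string &lexeme) {
    // Illustrative patterns only; real C tokens are more permissive.
    static const std::regex keyword("if|else|while|int");
    static const std::regex identifier("[A-Za-z][A-Za-z0-9]*");
    static const std::regex integer("[0-9]+");
    static const std::regex floating("[0-9]+\\.[0-9]+");
    static const std::regex stringLit("\"[^\"]*\"");
    static const std::regex charLit("'[^']'");

    // Keywords are tested first because they also match the identifier pattern.
    if (std::regex_match(lexeme, keyword)) return "KEYWORD";
    if (std::regex_match(lexeme, identifier)) return "IDENT";
    if (std::regex_match(lexeme, integer)) return "INT";
    if (std::regex_match(lexeme, floating)) return "FLOAT";
    if (std::regex_match(lexeme, stringLit)) return "CHARS";
    if (std::regex_match(lexeme, charLit)) return "CHAR";
    return "OTHER";  // operators, delimiters, or malformed input such as 0a
}
```

Testing keywords before identifiers mirrors the report's design, where keywords are first recognized as identifiers and separated afterwards.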

2.2 System function

Read in a C source program (already preprocessed) and output a set of triples, one for each word.

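2.3 Implementation steps

The lexical analyzer is constructed in the following order:

(1) Design a regular expression for each class of word and draw the finite-state automaton.

(2) Convert the regular expression of each class into a corresponding NFA M, and merge these into a single NFA M'.

(3) Convert NFA M' into the corresponding DFA M''.

(4) Minimize DFA M'' into DFA M'''.

(5) Implement the lexical analyzer in C based on DFA M'''.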
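As a concrete instance of the last step, the sketch below hard-codes a small DFA for the regular expression [0-9]+(\.[0-9]+)? as a transition table and runs it over a string. It is an illustration under assumptions rather than the machine used in this report: the state names, the kTransition table, classify, and runNumberDfa are all made up for the example, and the DFA was built by hand rather than derived mechanically from an NFA.

```cpp
#include <string>

enum State { START, IN_INT, AFTER_DOT, IN_FRAC, DEAD, STATE_COUNT };
enum CharClass { DIGIT, DOT, OTHER, CLASS_COUNT };

// kTransition[state][class] gives the next state.
static const State kTransition[STATE_COUNT][CLASS_COUNT] = {
    /* START     */ {IN_INT,  DEAD,      DEAD},
    /* IN_INT    */ {IN_INT,  AFTER_DOT, DEAD},
    /* AFTER_DOT */ {IN_FRAC, DEAD,      DEAD},
    /* IN_FRAC   */ {IN_FRAC, DEAD,      DEAD},
    /* DEAD      */ {DEAD,    DEAD,      DEAD},
};

// Maps an input character to its column in the table.
CharClass classify(char c) {
    if (c >= '0' && c <= '9') return DIGIT;
    if (c == '.') return DOT;
    return OTHER;
}

// Runs the DFA over the whole string and names the accepting state reached.
std::string runNumberDfa(const std::string &text) {
    State state = START;
    for (char c : text) {
        state = kTransition[state][classify(c)];
    }
    if (state == IN_INT) return "INT";
    if (state == IN_FRAC) return "FLOAT";
    return "ERROR";  // ended in a non-accepting state
}
```

A table like this keeps the scanning loop trivial: supporting another token class means adding rows and columns rather than control flow.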

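3. System Design

3.1 Finite-state automaton design

About the state machine: because words have a fairly complex structure, each edge in this design is labelled with a function rather than with a single character. If the current input satisfies that function, the machine moves from the current state to the state that the edge leads to.

The final states show which word classes the automaton can separate:

State     Meaning
INT       integer
FLOAT     floating-point number
CHAR      character constant
CHARS     string constant
IDENT     identifier (including keywords)
SYMBOL    symbol

Keywords are separated from identifiers in an auxiliary routine.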
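The idea of labelling edges with functions can be sketched as follows. This is a simplified illustration, not the STATE/LIST code from the source listing: Node, Edge, link, buildMachine, and scan are hypothetical names, the predicates have no side effects, and only the integer and identifier branches of the machine are modelled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Node;

struct Edge {
    bool (*accepts)(char);  // predicate that labels the edge
    const Node *target;     // state entered when the predicate holds
    bool consumes;          // whether the tested character belongs to the word
};

struct Node {
    const char *name;
    std::vector<Edge> edges;  // empty for a final state
};

bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
bool isLetter(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
bool isDelimiter(char c) { return !isDigit(c) && !isLetter(c); }  // includes '\0'

void link(Node &from, bool (*accepts)(char), const Node &to, bool consumes) {
    from.edges.push_back(Edge{accepts, &to, consumes});
}

// Builds the two-branch machine once and returns its start state.
const Node *buildMachine() {
    static Node start{"start", {}}, digits{"s1", {}}, letters{"s3", {}};
    static Node intFinal{"INT", {}}, identFinal{"IDENT", {}};
    static bool built = false;
    if (!built) {
        link(start, isDigit, digits, true);
        link(start, isLetter, letters, true);
        link(digits, isDigit, digits, true);
        link(digits, isDelimiter, intFinal, false);
        link(letters, isLetter, letters, true);
        link(letters, isDigit, letters, true);
        link(letters, isDelimiter, identFinal, false);
        built = true;
    }
    return &start;
}

// Follows the first matching edge at each step. Returns the final state's
// name, or "error" if no edge accepts; length receives the characters consumed.
std::string scan(const Node *node, const char *text, std::size_t &length) {
    length = 0;
    while (!node->edges.empty()) {
        const Node *next = nullptr;
        for (const Edge &edge : node->edges) {
            if (edge.accepts(text[length])) {
                next = edge.target;
                if (edge.consumes) ++length;
                break;
            }
        }
        if (next == nullptr) return "error";
        node = next;
    }
    return node->name;
}
```

Keeping the word length in the scanner instead of in the predicates is a design choice of this sketch; the source listing instead updates the global curn inside each edge function.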

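3.2 Type codes of the word symbols

The codes are the ones assigned in init() and derived from the symbol array in the source listing.

Code    Word symbol and description
0       INT (integer)
1       FLOAT (floating-point number)
2       CHAR (character)
2       CHARS (string)
3       IDENT (identifier)
6       else
7       int
8       char
9       float
10      double
11      long
12      short
13      return
14      while
15      break
33      ;
34      (
35      )
36      {
37      }
38      [
39      ]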
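The symbol codes follow a simple rule in the source listing: the code is num_before_symbol (20) plus the symbol's index in the symbol array, and the first entry that is a prefix of the input wins. The sketch below restates that rule in isolation; symbolCode, kSymbols, and kSymbolBase are hypothetical names, and the table order is copied from the listing.

```cpp
#include <cstddef>
#include <cstring>

// Same order as the symbol array in the source listing.
static const char *const kSymbols[] = {
    ">=", "<=", "<", ">", "==", "=", "!=", "++", "+=", "+", "/", "-", "\\",
    ";", "(", ")", "{", "}", "[", "]", ":", "->", "?", ",", ".", "*"};
static const int kSymbolBase = 20;  // value of num_before_symbol

// Returns the code of the first table entry that is a prefix of text, or -1
// if none matches. matchedLength receives the length of the matched symbol.
int symbolCode(const char *text, std::size_t &matchedLength) {
    const std::size_t count = sizeof(kSymbols) / sizeof(kSymbols[0]);
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t n = std::strlen(kSymbols[i]);
        if (std::strncmp(text, kSymbols[i], n) == 0) {
            matchedLength = n;
            return kSymbolBase + static_cast<int>(i);
        }
    }
    matchedLength = 0;
    return -1;
}
```

Because "-" precedes "->" in the table, "->" is split into "-" and ">", which is what Line 20 of output.txt shows. Moving longer symbols ahead of their prefixes would give longest-match behaviour, at the cost of changing the codes.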

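3.3 Basic data structures and code design

The code is written in C++. Two basic classes are defined for the state machine, STATE and LIST, where STATE is a friend of LIST. A STATE represents one state of the machine, including special states such as error and start. Each LIST instance is attached to a STATE instance and represents one edge: the value of the edge is a function pointer, and the edge points to the state that is entered when the input satisfies that function.

The program also uses the map template from the STL, declared as map<string,string> type, to store the keywords and their type codes.

The program outputs a set of triples, where a triple is defined as <word, meaning, type code>. For example, the triple for an identifier abc is <abc, IDENT, 3>. If a word is invalid, the program prints error: name instead. The end of the output reports how many words were recognized and how many errors were found.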
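The keyword lookup and the triple format can be sketched together. This is an illustration under assumptions, not the print() function from the source listing: TypeTable, makeTypeTable, and formatTriple are hypothetical names, and the table holds only a few of the codes that init() assigns in the listing.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> TypeTable;

// A subset of the entries that init() stores in the type map.
TypeTable makeTypeTable() {
    TypeTable table;
    table["INT"] = "0";
    table["FLOAT"] = "1";
    table["IDENT"] = "3";
    table["int"] = "7";
    table["while"] = "14";
    return table;
}

// Hypothetical helper: builds "<word, meaning, code>" for a non-symbol word.
// stateName must be a key of the table (for example "IDENT").
std::string formatTriple(const std::string &word, const std::string &stateName,
                         const TypeTable &table) {
    // An identifier that is found in the table is a keyword and describes
    // itself; otherwise the name of the final state is used.
    const bool isKeyword = stateName == "IDENT" && table.count(word) > 0;
    const std::string &meaning = isKeyword ? word : stateName;
    return "<" + word + ", " + meaning + ", " + table.at(meaning) + ">";
}
```

For example, formatTriple("while", "IDENT", makeTypeTable()) yields <while, while, 14>, the form that appears in output.txt.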

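4. System Implementation

4.1 Running the system

- Give the file to be analyzed and the output file name directly on the command line. If no arguments are given, the defaults are "input.txt" for input and "output.txt" for output.

- The results are written to the output file.

4.2 Results

- input.txt

Line 7 (int 0a = 2;) is the erroneous line.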

int main()
{
    freopen("input.txt","r",stdin);
    char input[255],*s = input;
    int t = 1;
    float p = 12.4;
    int 0a = 2;

    init();

    while(gets(s))
    {
        curn = 0;
        printf("Line %d:\n",t++);
        while((*s)!= 0)
        {
            while((*s) == ' ') s++;
            curn = 0;
            curtype = 0;
            print(s,start->start(s));
            s += curn;
        }
    }

    return 0;
}

- output.txt

********Line 1*********:

<int, int, 7>

<main, IDENT, 3>

<(, (, 34>

<), ), 35>

********Line 2*********:

<{, {, 36>

********Line 3*********:

<freopen, IDENT, 3>

<(, (, 34>

<"input.txt",CHARS, 2>

<,, ,, 43>

<"r", CHARS, 2>

<,, ,, 43>

<stdin, IDENT, 3>

<), ), 35>

<;, ;, 33>

********Line 4*********:

<char, char, 8>

<input, IDENT, 3>

<[, [, 38>

<255, INT, 0>

<], ], 39>

<,, ,, 43>

<*, *, 45>

<s, IDENT, 3>

<=, =, 25>

<input, IDENT, 3>

<;, ;, 33>

********Line 5*********:

<int, int, 7>

<t, IDENT, 3>

<=, =, 25>

<1, INT, 0>

<;, ;, 33>

********Line 6*********:

<float, float, 9>

<p, IDENT, 3>

<=, =, 25>

<12.4, FLOAT, 1>

<;, ;, 33>

********Line 7*********:

<int, int, 7>

error: 0a

<=, =, 25>

<2, INT, 0>

<;, ;, 33>

********Line 8*********:

********Line 9*********:

<init, IDENT, 3>

<(, (, 34>

<), ), 35>

<;, ;, 33>

********Line 10*********:

********Line 11*********:

<while, while, 14>

<(, (, 34>

<gets, IDENT, 3>

<(, (, 34>

<s, IDENT, 3>

<), ), 35>

********Line 12*********:

<{, {, 36>

********Line 13*********:

<curn, IDENT, 3>

<=, =, 25>

<0, INT, 0>

<;, ;, 33>

********Line 14*********:

<printf, IDENT, 3>

<(, (, 34>

<"Line %d:\n",CHARS, 2>

<,, ,, 43>

<t, IDENT, 3>

<++, ++, 27>

<), ), 35>

<;, ;, 33>

********Line 15*********:

<while, while, 14>

<(, (, 34>

<*, *, 45>

<s, IDENT, 3>

<), ), 35>

<!=, !=, 26>

<0, INT, 0>

<), ), 35>

********Line 16*********:

<{, {, 36>

********Line 17*********:

<while, while, 14>

<(, (, 34>

<*, *, 45>

<s, IDENT, 3>

<), ), 35>

<==, ==, 24>

<' ', CHAR, 2>

<), ), 35>

<s, IDENT, 3>

<++, ++, 27>

<;, ;, 33>

********Line 18*********:

<curn, IDENT, 3>

<=, =, 25>

<0, INT, 0>

<;, ;, 33>

********Line 19*********:

<curtype, IDENT, 3>

<=, =, 25>

<0, INT, 0>

<;, ;, 33>

********Line 20*********:

<print, IDENT, 3>

<(, (, 34>

<s, IDENT, 3>

<,, ,, 43>

<start, IDENT, 3>

<-, -, 31>

<>, >, 23>

<start, IDENT, 3>

<(, (, 34>

<s, IDENT, 3>

<), ), 35>

<;, ;, 33>

********Line 21*********:

<s, IDENT, 3>

<+=, +=, 28>

<curn, IDENT, 3>

<;, ;, 33>

********Line 22*********:

<}, }, 37>

********Line 23*********:

<}, }, 37>

********Line 24*********:

********Line 25*********:

<return, return, 13>

<0, INT, 0>

<;, ;, 33>

********Line 26*********:

<}, }, 37>

*******************************

1 error!

116 Word Have Been Found Out!

Source code:

#include <iostream>
#include <map>
#include <string>
#include <stdio.h>
#include <string.h>

#define num_before_symbol 20   // type code of symbol[i] is num_before_symbol + i

using namespace std;

bool isNum(char *a);
bool isWord(char *a);
bool isSymbol(char *a);

map<string,string> type;       // keyword or state name -> type code

// Matched in order, so a shorter symbol listed first wins ("-" before "->").
char symbol[][10] = {">=","<=","<",">","==","=","!=","++","+=","+","/","-","\\",";","(",")","{","}",
                     "[","]",":","->","?",",",".","*","\0"};

int curn = 0;       // length of the current word
int curtype = -1;   // index into symbol[], or -1 if the word is not a symbol
int nerror = 0;     // number of errors found

class STATE;
class LIST;

class STATE
{
    LIST *list;            // outgoing edges; NULL for a final state
    static STATE *error;
 public:
    static int count;      // number of words recognized
    int type;
    char *name;
    void enlist(bool (*fun)(char *),STATE *out);
    const STATE *next(char *in)const;
    const STATE *start(char *)const;
    STATE(const char *name);
    ~STATE();
};

class LIST
{
    LIST *next;
    bool (*fun)(char *);   // the function that labels this edge
    STATE *output;         // the state this edge leads to
    LIST(bool (*fun)(char *),STATE *out);
    ~LIST();
    friend class STATE;
};

STATE *STATE::error = 0;
int STATE::count = 0;

LIST::LIST(bool (*fun)(char *),STATE *out)
{
    this->next = NULL;
    this->fun = fun;
    this->output = out;
}

LIST::~LIST()   // deleting the head recursively deletes the rest of the list
{
    if(this->next!=NULL)
        delete this->next;
}

// Follow the first edge whose function accepts the input; error if none does.
const STATE *STATE::next(char *in)const
{
    LIST *p = list;
    while(p!=NULL)
    {
        if(p->fun(in))
            return p->output;
        else
            p = p->next;
    }
    return error;
}

// Run the machine from this state; returns the final state reached, or error.
const STATE *STATE::start(char *s)const
{
    const STATE *p;
    if(list == NULL)        // final state: one more word recognized
    {
        if(this != error)
            count++;
        return this;
    }
    p = this->next(s);
    if(p == error)
        return error;
    return p->start(s+1);
}

STATE::STATE(const char *name)
{
    this->list = NULL;
    this->name = NULL;
    if(name == 0)           // the nameless state is the error state
    {
        error = this;
        this->type = 1;
        return;
    }
    if(strcmp(name,"SYMBOL"))
        this->type = 0;
    else
        this->type = 1;
    this->name = new char[strlen(name)+1];   // +1 for the terminating '\0'
    strcpy(this->name,name);
}

STATE::~STATE()
{
    if(list)
    {
        delete list;
        list = 0;
    }
    if(name)
    {
        delete[] name;      // allocated with new[]
        name = 0;
    }
}

// Append an edge labelled with fun that leads to state out.
void STATE::enlist(bool (*fun)(char *),STATE *out)
{
    LIST *p = new LIST(fun,out);
    LIST *cur = this->list;
    if(cur == NULL)
        this->list = p;
    else
    {
        while(cur->next!=NULL)
            cur = cur->next;
        cur->next = p;
    }
}

// True if s is a prefix of a.
bool mystrcmp(char *a,char *s)
{
    int i = 0;
    while(s[i]!='\0')
    {
        if(a[i]!=s[i])
            return false;
        i++;
    }
    return true;
}

// Edge functions. Each one that consumes a character also increments curn.
bool isNum(char *a)
{
    if(a[0]<='9' && a[0]>='0')
    {
        curn++;
        return true;
    }
    return false;
}

bool isDot(char *a)
{
    if(a[0] == '.')
    {
        curn++;
        return true;
    }
    return false;
}

bool isWord(char *a)
{
    if((a[0]<='Z' && a[0]>='A') || (a[0]>='a' && a[0]<='z'))
    {
        curn++;
        return true;
    }
    return false;
}

// Ends a number or identifier without consuming the delimiter.
bool isNotNumOrWord(char *a)
{
    if((!(a[0]<='9' && a[0]>='0')) && !isWord(a))
    {
        return true;
    }
    curn--;     // undo the increment made by isWord
    return false;
}

bool isSymbol(char *a)
{
    int i = 0;
    while(strcmp(symbol[i],"\0"))
    {
        if(mystrcmp(a,symbol[i]))
        {
            curtype = i;
            curn = strlen(symbol[i]);
            return true;
        }
        i++;
    }
    return false;
}

bool isDQuotation(char *a)
{
    if(a[0] == '"')
    {
        curn++;
        return true;
    }
    return false;
}

bool isNotDQuotation(char *a)
{
    if(a[0] != '"' && a[0] != '\0')   // stop at end of line: unterminated string
    {
        curn++;
        return true;
    }
    return false;
}

bool isNotSQuotation(char *a)
{
    if(a[0] != '\'' && a[0] != '\0')  // stop at end of line: unterminated character
    {
        curn++;
        return true;
    }
    return false;
}

// Reads a[-1], so the buffer must have one valid character before the line.
bool isSQuotation(char *a)
{
    if(*(a-1)!='\\' && (*a)== '\'')
    {
        curn++;
        return true;
    }
    return false;
}

STATE *start = new STATE("start");
STATE *s1 = new STATE("s1");
STATE *s2 = new STATE("s2");
STATE *s3 = new STATE("s3");
STATE *s4 = new STATE("s4");
STATE *s5 = new STATE("s5");
STATE *s6 = new STATE("s6");
STATE *INT = new STATE("INT");
STATE *FLOAT = new STATE("FLOAT");
STATE *IDENT = new STATE("IDENT");
STATE *SYMBOL = new STATE("SYMBOL");
STATE *CHAR = new STATE("CHAR");
STATE *CHARS = new STATE("CHARS");
STATE error(0);

void init()
{
    start->enlist(isNum,s1);
    start->enlist(isWord,s3);
    start->enlist(isSymbol,SYMBOL);
    start->enlist(isDQuotation,s4);
    start->enlist(isSQuotation,s5);
    s1->enlist(isNum,s1);
    s1->enlist(isDot,s2);
    s1->enlist(isNotNumOrWord,INT);
    s1->enlist(isWord,&error);   // TODO: skip the rest of the bad word and flag the error type
    s2->enlist(isNum,s2);
    s2->enlist(isNotNumOrWord,FLOAT);
    s2->enlist(isWord,&error);
    s3->enlist(isWord,s3);
    s3->enlist(isNum,s3);
    s3->enlist(isNotNumOrWord,IDENT);
    s4->enlist(isNotDQuotation,s4);  // if is """ will get wrong answer
    s4->enlist(isDQuotation,CHARS);
    s5->enlist(isNotSQuotation,s6);
    s5->enlist(isSQuotation,&error);
    s6->enlist(isSQuotation,CHAR);
    type["INT"] = "0";
    type["FLOAT"] = "1";
    type["CHAR"] = "2";
    type["CHARS"] = "2";
    type["IDENT"] = "3";
    type["if"] = "5";
    type["else"] = "6";
    type["int"] = "7";
    type["char"] = "8";
    type["float"] = "9";
    type["double"] = "10";
    type["long"] = "11";
    type["short"] = "12";
    type["return"] = "13";
    type["while"] = "14";
    type["break"] = "15";
    // TODO: what code should quotation marks get?
}

// Print the triple for the word of length curn at s, or an error line.
void print(char *s,const STATE *p)
{
    int i = 0;
    char temp[255];
    for(i = 0;i<curn;i++)
        temp[i] = s[i];
    temp[i] = '\0';
    if(p->name == 0)        // only the error state has no name
    {
        printf("error: %s\n",temp);
        nerror++;
        return;
    }
    printf("<%s",temp);
    if(curtype < 0)         // not a symbol: keyword, identifier, or constant
    {
        if(type.find(temp) == type.end())
            cout<<", "<<p->name<<", "<<type[p->name]<<">"<<endl;
        else
            cout<<", "<<temp<<", "<<type[temp]<<">"<<endl;
    }
    else
        printf(", %s, %d>\n",symbol[curtype],num_before_symbol+curtype);
}

int main(int argc,char *argv[])
{
    const char *inname = (argc<2) ? "input.txt" : argv[1];
    const char *outname = (argc<3) ? "output.txt" : argv[2];
    if(freopen(inname,"r",stdin) == NULL)
    {
        fprintf(stderr,"cannot open %s\n",inname);
        return 1;
    }
    if(freopen(outname,"w",stdout) == NULL)
    {
        fprintf(stderr,"cannot open %s\n",outname);
        return 1;
    }
    char buffer[256];
    buffer[0] = '\0';              // sentinel for isSQuotation, which reads a[-1]
    char *input = buffer+1,*s;
    int t = 1;
    init();
    while(fgets(input,255,stdin))  // gets() is unsafe and was removed from the standard
    {
        input[strcspn(input,"\r\n")] = '\0';   // strip the line ending
        s = input;
        printf("********Line %d*********:\n",t++);
        while(true)
        {
            while((*s) == ' ' || (*s) == '\t') s++;   // skip blanks
            if((*s) == '\0') break;
            curn = 0;
            curtype = -1;
            const STATE *p = start->start(s);
            if(curn == 0) curn = 1;   // unknown character: report it and move on
            print(s,p);
            s += curn;
        }
        printf("\n");
    }
    printf("*******************************\n%d error!\n%d Word Have Been Found Out!\n",nerror,STATE::count);
    return 0;
}