University Compiler Lab: Lexical Analyzer (Java Implementation)

Source: Internet. Editor: 程序博客网. Time: 2024/06/06 02:42

SIMPLE Language Definition

I. Character Set

1. <character set> → <letter>│<digit>│<single delimiter>

2. <letter> → A│B│…│Z│a│b│…│z

3. <digit> → 0│1│2│…│9

4. <single delimiter> → +│-│*│/│=│<│>│(│)│[│]│:│.│;│,│'

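II. Token Set

5. <token set> → <reserved word>│<double delimiter>│<identifier>│<constant>│<single delimiter>

6. <reserved word> → and│array│begin│bool│call│case│char│constant│dim│do│else│end│false│for│if│input│integer│not│of│or│output│procedure│program│read│real│repeat│set│stop│then│to│true│until│var│while│write

7. <double delimiter> → <>│<=│>=│:=│/*│*/│..

8. <identifier> → <letter>│<identifier> <digit>│<identifier> <letter>

9. <constant> → <integer>│<boolean constant>│<character constant>

10. <integer> → <digit>│<integer> <digit>

11. <boolean constant> → true│false

12. <character constant> → ' <any string of characters that contains no '> '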
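
Rules 8 to 12 describe regular languages, so each one can be written as a regular expression. The following is a minimal sketch under that reading; the class name `TokenPatterns` and the method `classify` are hypothetical and not part of the lab code, and the reserved-word lookup of rule 6 is left out except for `true` and `false`.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original lab code.
class TokenPatterns {
    // Rule 8: a letter followed by any number of letters or digits
    static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z][A-Za-z0-9]*");
    // Rule 10: one or more digits
    static final Pattern INTEGER = Pattern.compile("[0-9]+");
    // Rule 12: a quoted run of characters that contains no single quote
    static final Pattern CHAR_CONSTANT = Pattern.compile("'[^']*'");

    // Names the lexical class of a complete word
    static String classify(String word) {
        // Rule 11 is checked first because true/false also look like identifiers
        if (word.equals("true") || word.equals("false")) return "boolean constant";
        if (IDENTIFIER.matcher(word).matches()) return "identifier";
        if (INTEGER.matcher(word).matches()) return "integer";
        if (CHAR_CONSTANT.matcher(word).matches()) return "character constant";
        return "unknown";
    }

    public static void main(String[] args) {
        for (String w : new String[] { "example2", "37", "'ABC'", "true", "2abc" }) {
            System.out.println(w + " -> " + classify(w));
        }
    }
}
```

A word such as `2abc` matches none of the patterns, which is the kind of input a lexer has to report rather than accept.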

III. Data Types

13. <type> → integer│bool│char

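IV. Expressions

14. <expression> → <arithmetic expression>│<boolean expression>│<character expression>

15. <arithmetic expression> → <arithmetic expression> + <term>│<arithmetic expression> - <term>│<term>

16. <term> → <term> * <factor>│<term> / <factor>│<factor>

17. <factor> → <arithmetic primary>│- <factor>

18. <arithmetic primary> → <integer>│<identifier>│( <arithmetic expression> )

19. <boolean expression> → <boolean expression> or <boolean term>│<boolean term>

20. <boolean term> → <boolean term> and <boolean factor>│<boolean factor>

21. <boolean factor> → <boolean primary>│not <boolean factor>

22. <boolean primary> → <boolean constant>│<identifier>│( <boolean expression> )│<identifier> <relational operator> <identifier>│<arithmetic expression> <relational operator> <arithmetic expression>

23. <relational operator> → <│<>│<=│>=│>│=

24. <character expression> → <character constant>│<identifier>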
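
Rules 15 and 16 are left-recursive, so a recursive-descent parser cannot apply them as written. A minimal sketch, assuming integer operands only (identifiers are left out) and a hypothetical class name `ArithParser`, rewrites each left-recursive rule as a loop and evaluates the expression while parsing it:

```java
// Hypothetical sketch: evaluates integer-only arithmetic expressions.
class ArithParser {
    private final String src;
    private int pos = 0;

    ArithParser(String text) {
        this.src = text.replace(" ", ""); // blanks carry no meaning here
    }

    static int eval(String text) {
        ArithParser p = new ArithParser(text);
        int value = p.expression();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return value;
    }

    // Rule 15 as a loop: term { (+|-) term }
    private int expression() {
        int value = term();
        while (pos < src.length() && (peek() == '+' || peek() == '-')) {
            char op = src.charAt(pos++);
            int rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // Rule 16 as a loop: factor { (*|/) factor }
    private int term() {
        int value = factor();
        while (pos < src.length() && (peek() == '*' || peek() == '/')) {
            char op = src.charAt(pos++);
            int rhs = factor();
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    // Rule 17: unary minus or an arithmetic primary
    private int factor() {
        if (pos < src.length() && peek() == '-') {
            pos++;
            return -factor();
        }
        return primary();
    }

    // Rule 18: an integer or a parenthesized expression
    private int primary() {
        if (pos < src.length() && peek() == '(') {
            pos++;
            int value = expression();
            if (pos >= src.length() || peek() != ')') {
                throw new IllegalArgumentException("')' expected at " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(peek())) pos++;
        if (start == pos) {
            throw new IllegalArgumentException("integer expected at " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }

    private char peek() {
        return src.charAt(pos);
    }
}
```

Because the loops consume operators from left to right, `10-3-2` groups as `(10-3)-2`, which is the associativity the left-recursive rules prescribe.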

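V. Statements

25. <statement> → <assignment statement>│<if statement>│<while statement>│<repeat statement>│<compound statement>

26. <assignment statement> → <identifier> := <arithmetic expression>

27. <if statement> → if <boolean expression> then <statement>│if <boolean expression> then <statement> else <statement>

28. <while statement> → while <boolean expression> do <statement>

29. <repeat statement> → repeat <statement> until <boolean expression>

30. <compound statement> → begin <statement list> end

31. <statement list> → <statement> ; <statement list>│<statement>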

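VI. Program

32. <program> → program <identifier> ; <variable declaration> <compound statement> .

33. <variable declaration> → var <variable definition>│ε

34. <variable definition> → <identifier list> : <type> ; <variable definition>│<identifier list> : <type> ;

35. <identifier list> → <identifier> , <identifier list>│<identifier>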

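VII. SIMPLE Language Token Codes

Word        Code    Word                Code    Word    Code
and          1      output              21      *       41
array        2      procedure           22              42
begin        3      program             23      +       43
bool         4      read                24      ,       44
call         5      real                25      -       45
case         6      repeat              26      .       46
char         7      set                 27      ..      47
constant     8      stop                28      /       48
dim          9      then                29              49
do          10      to                  30      :       50
else        11      true                31      :=      51
end         12      until               32      ;       52
false       13      var                 33      <       53
for         14      while               34      <=      54
if          15      write               35      <>      55
input       16      identifier          36      =       56
integer     17      integer constant    37      >       57
not         18      character constant  38      >=      58
of          19      (                   39      [       59
or          20      )                   40      ]       60

The code column did not survive in this copy of the table; the codes above are the ones used in sample output 1. Codes 42 and 49 never appear in the sample outputs; they presumably belong to the comment delimiters */ and /*.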
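
Sample output 1 later in this post numbers the reserved words 1 to 35 in alphabetical order and uses 36 for identifiers. Under that assumption, the kind code of a word can be looked up in a map built once from the sorted list. This is a minimal sketch; the class name `KindCodes` and the method `kindCode` are hypothetical, and the word is assumed to consist of letters and digits already.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table; codes follow sample output 1.
class KindCodes {
    static final int IDENTIFIER = 36;

    // Reserved words in alphabetical order, so code = index + 1
    static final String[] KEYWORDS = { "and", "array", "begin", "bool", "call",
            "case", "char", "constant", "dim", "do", "else", "end", "false",
            "for", "if", "input", "integer", "not", "of", "or", "output",
            "procedure", "program", "read", "real", "repeat", "set", "stop",
            "then", "to", "true", "until", "var", "while", "write" };

    static final Map<String, Integer> CODES = new HashMap<>();
    static {
        for (int i = 0; i < KEYWORDS.length; i++) {
            CODES.put(KEYWORDS[i], i + 1);
        }
    }

    // Returns the reserved-word code, or 36 when the word is not reserved
    static int kindCode(String word) {
        Integer code = CODES.get(word);
        return code != null ? code : IDENTIFIER;
    }

    public static void main(String[] args) {
        System.out.println(kindCode("program")); // reserved word
        System.out.println(kindCode("example2")); // identifier
    }
}
```

A map keeps the lookup at constant time per word, whereas a linear scan over the array costs up to 35 comparisons; for a lab-sized input either choice works.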

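VIII. Lab 1: Design a Lexical Analyzer for the SAMPLE Language

Checking requirements:

a) On startup, the program first prints the author's name, class, and student ID (in Chinese, English, or pinyin);

b) It prompts for the name of the test program and, once the name is entered, automatically runs lexical analysis and prints the result;

c) The output is a sequence of (kind code, word) pairs (see sample outputs 1 and 2 for the format);

d) It must detect the following lexical errors and report the kind and position of each:

Illegal character: any symbol outside the SAMPLE character set;

Character constant missing its closing single quote (a character constant must be delimited by single quotes on both sides and cannot span lines);

Comment missing its closing delimiter */ (a comment must be delimited by /* on the left and */ on the right and cannot span lines);

After an error is found, analysis must continue; reporting only the first error is not acceptable.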
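
The error requirements in d) can be sketched as a line-by-line check that records each error with its position and then keeps scanning. This is a minimal sketch, not the lab's reference solution; the class name `LexErrors`, its methods, and the message wording are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical error checker for the three lexical errors listed above.
class LexErrors {
    static List<String> check(String[] lines) {
        List<String> errors = new ArrayList<>();
        for (int row = 0; row < lines.length; row++) {
            String line = lines[row];
            int i = 0;
            while (i < line.length()) {
                char ch = line.charAt(i);
                if (ch == '\'') {
                    // A character constant must close on the same line
                    int end = line.indexOf('\'', i + 1);
                    if (end < 0) {
                        errors.add(msg(row, i, "character constant lacks closing quote"));
                        break; // rest of the line belongs to the broken constant
                    }
                    i = end + 1;
                } else if (line.startsWith("/*", i)) {
                    // A comment must close on the same line
                    int end = line.indexOf("*/", i + 2);
                    if (end < 0) {
                        errors.add(msg(row, i, "comment lacks closing */"));
                        break; // rest of the line belongs to the broken comment
                    }
                    i = end + 2;
                } else if (isLegal(ch)) {
                    i++;
                } else {
                    errors.add(msg(row, i, "illegal character " + ch));
                    i++; // skip the offending character and continue
                }
            }
        }
        return errors;
    }

    // Letters, digits, single delimiters, and whitespace are legal
    static boolean isLegal(char ch) {
        return Character.isWhitespace(ch)
                || (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')
                || (ch >= '0' && ch <= '9')
                || "+-*/=<>()[]:.;,'".indexOf(ch) >= 0;
    }

    // Rows and columns are reported 1-based
    static String msg(int row, int col, String text) {
        return "line " + (row + 1) + ", column " + (col + 1) + ": " + text;
    }
}
```

Recovery here is deliberately simple: an illegal character is skipped, and an unterminated constant or comment abandons only the current line, so later lines are still checked.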

IX. Lab 1 Test Programs and Sample Outputs

Test program 1: program name TEST1

and array begin bool call

case char constant dim do

else end false for if

input integer not of or

output procedure program read real

repeat set stop then to

true until var while write

abc 123 'EFG' ( ) * + , - . .. /

: := ; < <= <> = > >= [ ]

Sample output 1 (to be displayed on screen). Note: each pair is (kind code, word).

( 1 , and) (2 , array ) ( 3 , begin ) ( 4 ,bool) ( 5 , call )

( 6 , case) ( 7 , char) ( 8 , constant) ( 9 , dim) (10, do )

(11 , else) (12, end) (13 ,false) (14 ,for) (15 ,if)

(16 ,input) (17,integer) (18 ,not) (19 ,of) (20 ,or)

(21 , output) (22 ,procedure) (23 ,program) (24 ,read) (25,real)

(26 ,repeat) (27 ,set) (28 ,stop) (29 ,then) (30,to)

(31 ,true) (32,until) (33 ,var) (34,while) (35 ,write)

(36 ,abc) (37,123) (38 ,EFG) (39 , ( ) (40 , ) )

(41 , * ) (43, + ) (44, , ) (45, - ) (46, . )

(47 , .. ) (48, / ) (50, : ) (51, := ) (52, ; )

(53 , < ) (54 , <= ) (55 , <> ) (56, = ) (57, > )

(58 , >= ) (59 , [ ) (60, ])

Test program 2: program name TEST2

program example2;

var A,B,C:integer;

X,Y:bool;

begin /* this is an example */

A:=B*C+37;

X:= 'ABC';

end.

Sample output 2 (to be displayed on screen):

(23 , program) (36 , example2 ) (52 , ; ) (33, var ) (36 , A )

(44 , , ) (36, B ) (44 , , ) (36 , C) (50 , : )

(17 , integer ) (52 , ; ) (36 , X ) (44, , ) (36, Y )

(50 , : ) ( 4 , bool ) (52 , ; ) ( 3 , begin ) (36 , A )

(51 , :=) (36, B ) (41, * ) (36, C ) (43, + )

(37 , 37 ) (52 , ; ) (36 , X ) (51 , := ) (38 , ABC )

(52, ; ) (12, end ) (46 , . )

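X. Lab 2: Design a Syntax and Semantic Analyzer for the SAMPLE Language That Outputs Quadruples as Intermediate Code

Checking requirements:

a) On startup, the program first prints the author's name, class, and student ID (in Chinese, English, or pinyin).

b) It prompts for the name of the test program and, once the name is entered, automatically starts compiling.

c) It outputs the quadruple intermediate code (see sample outputs 3 and 4 for the format).

d) It detects syntax errors in the program and prints error messages.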
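
The jump quadruples in the sample outputs, such as `(j>=, W , 1 , 8)`, name a target that is not yet known when the jump is generated. One common way to handle this is backpatching: emit the jump with a placeholder and fill in the target later. The sketch below assumes that technique; the handout does not say which method is expected, and the class `Backpatcher` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of jump backpatching; not taken from the lab handout.
class Backpatcher {
    // Each quadruple is {op, arg1, arg2, result}; "?" marks an unknown target
    private final List<String[]> quads = new ArrayList<>();

    // Appends a quadruple and returns its number
    int emit(String op, String arg1, String arg2, String result) {
        quads.add(new String[] { op, arg1, arg2, result });
        return quads.size() - 1;
    }

    // Number that the next emitted quadruple will receive
    int next() {
        return quads.size();
    }

    // Fills in the jump target of an earlier quadruple
    void patch(int index, int target) {
        quads.get(index)[3] = String.valueOf(target);
    }

    List<String> render() {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < quads.size(); i++) {
            out.add("(" + i + ") (" + String.join(" , ", quads.get(i)) + ")");
        }
        return out;
    }

    // Translates: if W>=1 then A:=5 else A:=4
    static List<String> ifElseDemo() {
        Backpatcher b = new Backpatcher();
        int trueJump = b.emit("j>=", "W", "1", "?");
        int falseJump = b.emit("j", "-", "-", "?");
        b.patch(trueJump, b.next());   // then-branch starts here
        b.emit(":=", "5", "-", "A");
        int skipElse = b.emit("j", "-", "-", "?");
        b.patch(falseJump, b.next());  // else-branch starts here
        b.emit(":=", "4", "-", "A");
        b.patch(skipElse, b.next());   // first quadruple after the if statement
        b.emit("sys", "-", "-", "-");
        return b.render();
    }

    public static void main(String[] args) {
        ifElseDemo().forEach(System.out::println);
    }
}
```

The demo produces the same shape as the sample outputs: a conditional jump into the then-branch, an unconditional jump to the else-branch, and a jump over the else-branch at the end of the then-branch.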

XI. Test Programs and Sample Outputs

Test program 3: program name TEST4

program example4;
var A,B,C,D:integer;
begin
A:=1; B:=5; C:=3; D:=4;
while A<C and B>D do
if A=1 then C:=C+1 else
while A<=D do A:=A*2
end.

Test program 4: program name TEST5

program example5;
var A,B,C,D,W:integer;
begin
A:=5; B:=4; C:=3; D:=2; W:=1;
if W>=1 then A:=B*C+B/D
else repeat A:=A+1 until A<0
end.

Sample output 3 (to be displayed on screen):

( 0) (program , example4 , - , -)
( 1) (:= , 1 , - , A)
( 2) (:= , 5 , - , B)
( 3) (:= , 3 , - , C)
( 4) (:= , 4 , - , D)
( 5) (j< , A , C , 7)
( 6) (j , - , - , 20)
( 7) (j> , B , D , 9)
( 8) (j , - , - , 20)
( 9) (j= , A , 1 , 11)
(10) (j , - , - , 14)
(11) (+ , C , 1 , T1)
(12) (:= , T1 , - , C)
(13) (j , - , - , 5)
(14) (j<= , A , D , 16)
(15) (j , - , - , 5)
(16) (* , A , 2 , T2)
(17) (:= , T2 , - , A)
(18) (j , - , - , 14)
(19) (j , - , - , 5)
(20) (sys , - , - , -)

Sample output 4 (to be displayed on screen):

( 0) (program , example5 , - , -)
( 1) (:= , 5 , - , A)
( 2) (:= , 4 , - , B)
( 3) (:= , 3 , - , C)
( 4) (:= , 2 , - , D)
( 5) (:= , 1 , - , W)
( 6) (j>= , W , 1 , 8)
( 7) (j , - , - , 13)
( 8) (* , B , C , T1)
( 9) (/ , B , D , T2)
(10) (+ , T1 , T2 , T3)
(11) (:= , T3 , - , A)
(12) (j , - , - , 17)
(13) (- , A , 1 , T4)
(14) (:= , T4 , - , A)
(15) (j< , A , 0 , 17)
(16) (j , - , - , 13)
(17) (sys , - , - , -)
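
Quadruples 8 to 11 of sample output 4 translate the assignment `A:=B*C+B/D` by giving each intermediate result a fresh temporary. A minimal sketch of that step follows; the class `QuadEmitter` and its methods are hypothetical, and the calls in `demo` are written by hand in the order a parser would make them rather than driven by a real parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical emitter for arithmetic quadruples with temporaries T1, T2, ...
class QuadEmitter {
    private final List<String> quads = new ArrayList<>();
    private final int base;      // number given to the first quadruple
    private int tempCount = 0;   // how many temporaries exist so far

    QuadEmitter(int base) {
        this.base = base;
    }

    // Emits (op, left, right, Tn) and returns the name of the new temporary
    String binary(String op, String left, String right) {
        String temp = "T" + (++tempCount);
        emit(op, left, right, temp);
        return temp;
    }

    // Emits (:=, value, -, target)
    void assign(String value, String target) {
        emit(":=", value, "-", target);
    }

    private void emit(String op, String arg1, String arg2, String result) {
        int number = base + quads.size();
        quads.add("(" + number + ") (" + op + " , " + arg1 + " , " + arg2 + " , " + result + ")");
    }

    List<String> quads() {
        return quads;
    }

    // Translates A:=B*C+B/D, numbering from 8 as in sample output 4
    static List<String> demo() {
        QuadEmitter e = new QuadEmitter(8);
        String t1 = e.binary("*", "B", "C");
        String t2 = e.binary("/", "B", "D");
        String t3 = e.binary("+", t1, t2);
        e.assign(t3, "A");
        return e.quads();
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The order of the calls mirrors operator precedence: both products are computed before the sum, and the assignment comes last.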

Lab 1 code:

package firstExam;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

public class Test3 {
    // Kind codes of identifiers, integers, and character constants
    private static final int IDENTIFIER = 36;
    private static final int INTEGER = 37;
    private static final int CHAR_CONSTANT = 38;

    // Reserved words in alphabetical order; the kind code of each is its index + 1
    private static String[] keyWord = { "and", "array", "begin", "bool",
            "call", "case", "char", "constant", "dim", "do", "else", "end",
            "false", "for", "if", "input", "integer", "not", "of", "or",
            "output", "procedure", "program", "read", "real", "repeat", "set",
            "stop", "then", "to", "true", "until", "var", "while", "write" };

    // Single-character delimiters of the character set
    private static char[] sigleDelimiter = { '+', '-', '*', '/', '=', '<', '>',
            '(', ')', '[', ']', ':', '.', ';', ',', '\'' };

    // Number of pairs printed so far; used to break the line after every five
    private static int count = 0;

    // Returns true if the word is a reserved word
    public static boolean isKeyWord(String str) {
        return getKeywordKindCode(str) != 0;
    }

    // Returns true if the character is a digit
    public static boolean isDigit(char ch) {
        return ch >= '0' && ch <= '9';
    }

    // Returns true if the character is a letter
    public static boolean isLetter(char ch) {
        return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
    }

    // Returns true if the character is a single delimiter
    public static boolean isSingleDlimeter(char ch) {
        for (int i = 0; i < sigleDelimiter.length; i++) {
            if (ch == sigleDelimiter[i]) {
                return true;
            }
        }
        return false;
    }

    // Returns the kind code of a reserved word, or 0 if the word is not reserved
    public static int getKeywordKindCode(String str) {
        for (int i = 0; i < keyWord.length; i++) {
            if (str.equals(keyWord[i])) {
                return i + 1;
            }
        }
        return 0;
    }

    // Returns the kind code of a single delimiter, or 0 if it has none
    public static int getSingleKindCode(char ch) {
        switch (ch) {
        case '(': return 39;
        case ')': return 40;
        case '*': return 41;
        case '+': return 43;
        case ',': return 44;
        case '-': return 45;
        case '.': return 46;
        case '/': return 48;
        case ':': return 50;
        case ';': return 52;
        case '<': return 53;
        case '=': return 56;
        case '>': return 57;
        case '[': return 59;
        case ']': return 60;
        default: return 0;
        }
    }

    // Returns the kind code of a double delimiter, or 0 if str is not one
    public static int getDoubleKindCode(String str) {
        if (str.equals("..")) {
            return 47;
        } else if (str.equals(":=")) {
            return 51;
        } else if (str.equals("<=")) {
            return 54;
        } else if (str.equals("<>")) {
            return 55;
        } else if (str.equals(">=")) {
            return 58;
        }
        return 0;
    }

    /**
     * Reads a text file into a string.
     *
     * @param path path of the text file
     * @return the text that was read
     * @throws IOException if the file is missing or cannot be read
     */
    public static String FileInputStreamMethod(String path) throws IOException {
        File file = new File(path);
        if (!file.exists() || file.isDirectory()) {
            throw new FileNotFoundException(path);
        }
        StringBuffer sb = new StringBuffer();
        try (FileInputStream fis = new FileInputStream(file)) {
            byte[] buffer = new byte[1024];
            int n;
            // Append only the bytes actually read in each round
            while ((n = fis.read(buffer)) != -1) {
                sb.append(new String(buffer, 0, n));
            }
        }
        return sb.toString();
    }

    // Prints one (kind code, word) pair, five pairs per line
    private static void printToken(int kindCode, String word) {
        System.out.print("(" + kindCode + "," + word + ") ");
        count++;
        if (count % 5 == 0) {
            System.out.println();
        }
    }

    // Prints a lexical error with its position on a line of its own
    private static void printError(String message, int row, int col) {
        System.out.println();
        System.out.println("Error: " + message + " (line " + row + ", column " + col + ")");
    }

    /**
     * Core function of the lexical analysis.
     */
    public static void tokenAnysis(String filePath) {
        try {
            String fileTxt = FileInputStreamMethod(filePath).trim();
            System.out.println("Source program:");
            System.out.println(fileTxt);
            System.out.println("Starting lexical analysis");
        } catch (IOException e) {
            System.out.println("Cannot read " + filePath + ": " + e.getMessage());
            return;
        }
        count = 0;
        try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
            String lineStr;
            int row = 0; // line number
            // Analyze the file line by line
            while ((lineStr = br.readLine()) != null) {
                row++;
                int n = lineStr.length();
                int i = 0;
                while (i < n) {
                    char ch = lineStr.charAt(i);
                    int col = i + 1; // column where the current word starts
                    if (Character.isWhitespace(ch)) {
                        i++;
                    } else if (isLetter(ch)) {
                        // Reserved word or identifier: a letter, then letters or digits
                        int start = i;
                        while (i < n && (isLetter(lineStr.charAt(i)) || isDigit(lineStr.charAt(i)))) {
                            i++;
                        }
                        String word = lineStr.substring(start, i);
                        if (isKeyWord(word)) {
                            printToken(getKeywordKindCode(word), word);
                        } else {
                            printToken(IDENTIFIER, word);
                        }
                    } else if (isDigit(ch)) {
                        // Integer: a run of digits; a letter right after it is an error
                        int start = i;
                        while (i < n && isDigit(lineStr.charAt(i))) {
                            i++;
                        }
                        if (i < n && isLetter(lineStr.charAt(i))) {
                            while (i < n && (isLetter(lineStr.charAt(i)) || isDigit(lineStr.charAt(i)))) {
                                i++;
                            }
                            printError("illegal word " + lineStr.substring(start, i), row, col);
                        } else {
                            printToken(INTEGER, lineStr.substring(start, i));
                        }
                    } else if (ch == '\'') {
                        // Character constant: must close on the same line
                        int end = lineStr.indexOf('\'', i + 1);
                        if (end < 0) {
                            printError("character constant lacks closing quote", row, col);
                            i = n;
                        } else {
                            printToken(CHAR_CONSTANT, lineStr.substring(i + 1, end));
                            i = end + 1;
                        }
                    } else if (lineStr.startsWith("/*", i)) {
                        // Comment: must close on the same line; produces no pair
                        int end = lineStr.indexOf("*/", i + 2);
                        if (end < 0) {
                            printError("comment lacks closing */", row, col);
                            i = n;
                        } else {
                            i = end + 2;
                        }
                    } else if (isSingleDlimeter(ch)) {
                        // Try a double delimiter first, then fall back to the single one
                        String two = (i + 1 < n) ? lineStr.substring(i, i + 2) : "";
                        int doubleCode = getDoubleKindCode(two);
                        if (doubleCode != 0) {
                            printToken(doubleCode, two);
                            i += 2;
                        } else {
                            printToken(getSingleKindCode(ch), String.valueOf(ch));
                            i++;
                        }
                    } else {
                        // Anything else is outside the character set
                        printError("illegal character " + ch, row, col);
                        i++;
                    }
                }
            }
            System.out.println();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        Scanner input = new Scanner(System.in);
        System.out.println("&&& Welcome to Xiao Wu's compiler world &&&:");
        System.out.println("& Name: " + "巫文杰" + "\n"
                + "& Class: " + "Computer Science and Technology, 2010, Class 1" + "\n"
                + "& Student ID: " + "201038889071");
        System.out.println("Enter the test program name:");
        // The name typed by the user is taken as the path of the source file
        String testName = input.nextLine().trim();
        tokenAnysis(testName);
    }
}