Flex & Bison Study Notes (1: Syntax)

Source: Internet  Editor: 程序博客网  Time: 2024/05/21 19:42

 

 

Metacharacter   Matches
.               any character except newline
\n              newline
*               zero or more copies of the preceding expression
+               one or more copies of the preceding expression
?               zero or one copy of the preceding expression
^               beginning of line
$               end of line
a|b             a or b
(ab)+           one or more copies of ab (grouping)
"a+b"           literal "a+b" (C escapes still work)
[]              character class
Table 1: Pattern Matching Primitives

 

Expression      Matches
abc             abc
abc*            ab abc abcc abccc ...
abc+            abc abcc abccc ...
a(bc)+          abc abcbc abcbcbc ...
a(bc)?          a abc
[abc]           one of: a, b, c
[a-z]           any lowercase letter, a-z
[a\-z]          one of: a, -, z
[-az]           one of: -, a, z
[A-Za-z0-9]+    one or more alphanumeric characters
[ \t\n]+        whitespace
[^ab]           anything except: a, b
[a^b]           one of: a, ^, b
[a|b]           one of: a, |, b
a|b             one of: a, b
Table 2: Pattern Matching Examples
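
Several rows of Table 2 can be checked outside lex. The sketch below is an assumption of these notes rather than part of lex itself: it uses the POSIX `regcomp`/`regexec` functions with extended syntax, which agree with lex for the simple patterns in the table. They differ in places: a backslash inside a POSIX bracket expression is an ordinary character, so rows such as [a\-z] and [ \t\n]+ are not covered. The helper name `matches_whole` is hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if the whole of text matches pattern, 0 if it does not,
 * and -1 if the pattern is too long or fails to compile.
 * The pattern is wrapped as ^(pattern)$ so that partial matches do not count. */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For example, `matches_whole("abc*", "ab")` accepts, because `*` allows zero copies of c, while `matches_whole("abc+", "ab")` rejects, because `+` requires at least one.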

 

Name                 Function
int yylex(void)      call to invoke lexer, returns token
char *yytext         pointer to matched string
yyleng               length of matched string
yylval               value associated with token, i.e. the semantic value handed to the parser (for example, a symbol-table index)
int yywrap(void)     wrapup, return 1 if done, 0 if not done
FILE *yyout          output file
FILE *yyin           input file
INITIAL              initial start condition
BEGIN condition      switch start condition
ECHO                 write matched string
Table 3: Lex Predefined Variables

 

Yacc

Grammars for yacc are described using a variant of Backus Naur Form (BNF).

In a derivation, each step expands a term, replacing the lhs of a production with the corresponding rhs, and the number written to the right of each step indicates which rule was applied. To parse an expression, we need to do the reverse. Instead of starting with a single nonterminal (the start symbol) and generating an expression from the grammar, we reduce an expression to a single nonterminal. This is known as bottom-up or shift-reduce parsing, and it uses a stack for storing terms.
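
The shift-reduce idea can be sketched by hand in C. The toy grammar here is an assumption made for illustration and is not taken from these notes: rule 1 is E -> E '+' E, rule 2 is E -> id, and an id is any single lowercase letter. The function name `shift_reduce` is hypothetical. A yacc-generated parser decides when to reduce from its tables; this sketch simply reduces whenever a right-hand side sits on top of the stack.

```c
/* Hypothetical toy grammar (an assumption for illustration):
 *   rule 1: E -> E '+' E
 *   rule 2: E -> id          (id is one lowercase letter)
 * Returns 1 if the input reduces to a single E, 0 otherwise. */
int shift_reduce(const char *input)
{
    char stack[64];
    int top = 0;                         /* number of symbols on the stack */
    const char *p;

    for (p = input; ; p++) {
        /* Reduce for as long as a right-hand side is on top of the stack. */
        for (;;) {
            if (top >= 1 && stack[top-1] >= 'a' && stack[top-1] <= 'z') {
                stack[top-1] = 'E';      /* rule 2: replace id with E */
            } else if (top >= 3 && stack[top-1] == 'E' &&
                       stack[top-2] == '+' && stack[top-3] == 'E') {
                top -= 2;                /* rule 1: pop E + E, leave one E */
            } else {
                break;
            }
        }
        if (*p == '\0')
            break;                       /* end of input */
        if (!((*p >= 'a' && *p <= 'z') || *p == '+'))
            return 0;                    /* not a token of the toy grammar */
        if (top == 64)
            return 0;                    /* stack full */
        stack[top++] = *p;               /* shift the next token */
    }
    return top == 1 && stack[0] == 'E';
}
```

For the input "x+y+z" the stack goes x, E, E+, E+y, E+E, E, E+, E+z, E+E, E, so the whole expression is reduced to the single nonterminal E.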

 

More Lex
Strings
Quoted strings frequently appear in programming languages. Here is one way to match a string in lex:
%{
#include <stdio.h>
#include <string.h>
char *yylval;
%}
%%
\"[^"\n]*["\n] {
    /* strdup() is declared in <string.h>: char *strdup(const char *s);
     * It returns a pointer to a copy of s. The memory for the copy is
     * obtained with malloc() and can be released with free(). */
    yylval = strdup(yytext+1);        /* copy, skipping the opening quote */
    if (yylval[yyleng-2] != '"')
        warning("improperly terminated string");  /* warning() is user-supplied */
    else
        yylval[yyleng-2] = 0;         /* strip the closing quote */
    printf("found '%s'\n", yylval);
}
The above example ensures that strings don't cross line boundaries, and removes enclosing quotes. If we wish to add escape sequences, such as \n or \", start states simplify matters:
%{
#include <stdio.h>
#include <stdlib.h>
char buf[100];   /* decoded string; note there is no bounds check on writes */
char *s;         /* next free position in buf */
%}
%x STRING
%%
\" { BEGIN STRING; s = buf; }
<STRING>\\n { *s++ = '\n'; }
<STRING>\\t { *s++ = '\t'; }
<STRING>\\\" { *s++ = '\"'; }
<STRING>\" {
    *s = 0;
    BEGIN 0;
    printf("found '%s'\n", buf);
}
<STRING>\n { printf("invalid string"); exit(1); }
<STRING>. { *s++ = *yytext; }
The exclusive start state STRING is defined in the definition section. When the scanner detects a quote, the BEGIN macro shifts lex into the STRING state. Lex stays in the STRING state, recognizing only patterns that begin with <STRING>, until another BEGIN is executed. Thus, we have a mini-environment for scanning strings. When the trailing quote is recognized, we switch back to state 0, the initial state.
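
To see what the start state buys us, here is a rough hand-written C equivalent of the STRING rules. It is a sketch under assumptions, not what lex generates: the names `scan_string`, `INITIAL_STATE` and `STRING_STATE` are hypothetical, text outside a string is skipped rather than echoed, and an output-size check is added that the lex version lacks.

```c
#include <stddef.h>

enum state { INITIAL_STATE, STRING_STATE };

/* Finds the first quoted string in input and copies it to out, decoding
 * the escapes \n, \t and \". Returns the decoded length, or -1 if no
 * string is found, the string is unterminated or contains a newline,
 * or out is too small. */
int scan_string(const char *input, char *out, size_t outsize)
{
    enum state st = INITIAL_STATE;
    size_t n = 0;
    const char *p;

    for (p = input; *p != '\0'; p++) {
        char c;

        if (st == INITIAL_STATE) {
            if (*p == '"')
                st = STRING_STATE;       /* like BEGIN STRING */
            continue;                    /* other text is ignored here */
        }
        if (*p == '"') {                 /* closing quote: like BEGIN 0 */
            if (n >= outsize)
                return -1;
            out[n] = '\0';
            return (int)n;
        }
        if (*p == '\n')
            return -1;                   /* strings may not span lines */
        if (*p == '\\' && p[1] == 'n')       { c = '\n'; p++; }
        else if (*p == '\\' && p[1] == 't')  { c = '\t'; p++; }
        else if (*p == '\\' && p[1] == '"')  { c = '"';  p++; }
        else                                 c = *p;   /* like <STRING>. */
        if (n + 1 >= outsize)
            return -1;                   /* keep room for the terminator */
        out[n++] = c;
    }
    return -1;                           /* ran out of input */
}
```

The explicit `st` variable and the chain of tests are exactly the bookkeeping that `%x STRING` and the `<STRING>` prefixes let lex handle for us.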

Reserved Words
If your program has a large collection of reserved words, it is more efficient to let lex simply match a string, and determine in your own code whether it is a variable or reserved word. For example, instead of coding
"if" return IF;
"then" return THEN;
"else" return ELSE;
{letter}({letter}|{digit})* {
    yylval.id = symLookup(yytext);
    return IDENTIFIER;
}
where symLookup returns an index into the symbol table, it is better to detect reserved words and identifiers simultaneously, as follows:
{letter}({letter}|{digit})* {
    int i;
    if ((i = resWord(yytext)) != 0)
        return (i);
    yylval.id = symLookup(yytext);
    return (IDENTIFIER);
}
This technique significantly reduces the number of states required, and results in smaller scanner tables.
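
The notes do not show resWord itself. One possible shape is a simple table lookup, sketched below. Everything in it is an assumption: the token codes are made up (in a real project they come from the header that yacc generates), and a large reserved-word list would more likely use a sorted table with a binary search or a hash table.

```c
#include <string.h>

/* Hypothetical token codes; normally these come from yacc's generated header. */
enum { IF = 258, THEN, ELSE, IDENTIFIER };

/* Table of reserved words and the token each one maps to. */
static const struct {
    const char *name;
    int token;
} reserved[] = {
    { "if",   IF   },
    { "then", THEN },
    { "else", ELSE },
};

/* Returns the token code if s is a reserved word, 0 otherwise. */
int resWord(const char *s)
{
    size_t i;

    for (i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (strcmp(s, reserved[i].name) == 0)
            return reserved[i].token;
    return 0;
}
```

Returning 0 for "not reserved" works because yacc token codes are never 0 (0 is reserved for end of input), which is what the `!= 0` test in the rule above relies on.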

Inherited Attributes
The examples so far have used synthesized attributes. At any point in a syntax tree we can determine the attributes of a node based on the attributes of its children. Consider the rule
expr: expr '+' expr { $$ = $1 + $3; }
Since we are parsing bottom-up, the values of both operands are available, and we can determine the value associated with the left-hand side. An inherited attribute of a node depends on the value of a parent or sibling node. The following grammar defines a C variable declaration:
decl: type varlist
type: INT | FLOAT
varlist:
VAR { setType($1, $0); }
| varlist ',' VAR { setType($3, $0); }
Here is a sample parse:
. INT VAR
INT . VAR
type . VAR
type VAR .
type varlist .
decl .
When we reduce VAR to varlist, we should annotate the symbol table with the type of the variable. However, the type is buried in the stack. This problem is resolved by indexing back into the stack. Recall that $1 designates the first term on the right-hand side. We can index backwards, using $0, $-1, and so on. In this case, $0 will do just fine. If you need to specify a token type, the syntax is $<tokentype>0, angle brackets included. In this particular example, care must be taken to ensure that type always precedes varlist.
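
The indexing can be made concrete by simulating the value stack in C. This is only a sketch of the idea, not yacc's real implementation: the stack layout, the type codes, the integer symbol table and all names other than setType are hypothetical. It replays the parse of a declaration such as "INT a, b", where a and b are variables 0 and 1.

```c
enum { T_INT = 1, T_FLOAT = 2 };    /* hypothetical type codes */

static int vstack[16];              /* simulated value stack */
static int vtop;                    /* number of values on the stack */
static int var_type[8];             /* hypothetical symbol table: type of variable i */

static void push(int v) { vstack[vtop++] = v; }

/* Value of $k while reducing a rule whose right-hand side has n symbols.
 * $1..$n are the top n slots; $0 is the slot just below them. */
static int dollar(int n, int k) { return vstack[vtop - n + k - 1]; }

static void setType(int var, int type) { var_type[var] = type; }

/* Reduce by a varlist rule with n right-hand-side symbols (1 or 3).
 * In both rules the VAR is the last symbol, $n, and the type is $0. */
static void reduce_varlist(int n)
{
    int result = dollar(n, 1);      /* yacc's default action: $$ = $1 */

    setType(dollar(n, n), dollar(n, 0));
    vtop -= n;                      /* pop the right-hand side */
    push(result);                   /* push the value of varlist */
}

/* Replays the parse of "INT a, b" and returns the type recorded for b. */
int replay_int_a_b(void)
{
    vtop = 0;
    push(T_INT);                    /* shift INT, reduce to type ($$ = $1) */
    push(0);                        /* shift VAR a */
    reduce_varlist(1);              /* varlist: VAR */
    push(0);                        /* shift ',' (its value is unused) */
    push(1);                        /* shift VAR b */
    reduce_varlist(3);              /* varlist: varlist ',' VAR */
    return var_type[1];
}
```

In both reductions the slot just below the right-hand side holds the value of type, which is why $0 works for either rule, and also why it would break if some other symbol were ever allowed to appear between type and varlist.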




