Lexical Analysis vs. Syntax Analysis

Source: Internet | Published by: 淘宝主播安然 | Edited by: 程序博客网 | Time: 2024/04/29 02:55

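Lexical analysis is the first phase of compilation. Its task is to read the source program from left to right, one character at a time; that is, to scan the character stream that makes up the source program and recognize words according to the word-formation rules. A word (also called a word symbol or simply a symbol) may be an identifier, a constant, an operator, a delimiter, and so on. The program that performs this task is the lexical analyzer, which can be generated automatically with tools such as Lex.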

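Lexical analysis scans the string that makes up the source program character by character, from left to right, and recognizes words (tokens) one after another according to the lexical rules, turning the source program from a character string into an equivalent sequence of tokens. The program that does this is called a lexical analyzer, or scanner. For each word symbol in the source program, the scanner generally produces a pair: the token kind and the token's own value. Token kinds are usually encoded as integers. If a kind contains only one word symbol, the kind code alone fully represents that symbol. If a kind contains many word symbols, each of them must carry its own value in addition to the kind code.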
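The (kind, value) pair described above can be sketched as follows. This is only an illustration: the `Kind` codes, the `SINGLETONS` table, and the `classify` helper are hypothetical names chosen for this example, and the input is assumed to be already split into lexemes by whitespace.

```python
from enum import IntEnum


class Kind(IntEnum):
    # Kinds with exactly one member: the code alone identifies the token.
    IF = 1
    ASSIGN = 2
    PLUS = 3
    SEMI = 4
    # Kinds with many members: the token must also carry its own value.
    IDENT = 10
    NUMBER = 11


# Hypothetical table of single-member kinds for this sketch.
SINGLETONS = {"if": Kind.IF, "=": Kind.ASSIGN, "+": Kind.PLUS, ";": Kind.SEMI}


def classify(lexeme):
    """Map an already-separated lexeme to a (kind, value) pair."""
    if lexeme in SINGLETONS:
        return (SINGLETONS[lexeme], None)      # kind code is enough
    if lexeme.isascii() and lexeme.isdigit():
        return (Kind.NUMBER, int(lexeme))      # value distinguishes 42 from 7
    if lexeme.isidentifier():
        return (Kind.IDENT, lexeme)            # value distinguishes x from y
    raise ValueError(f"unrecognized lexeme: {lexeme!r}")


pairs = [classify(word) for word in "x = x + 42 ;".split()]
```

Here `=` and `;` need no value because their kind has a single member, while `x` and `42` carry their own value alongside the kind code.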

In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. Programs performing lexical analysis are called lexical analyzers or lexers. A lexer is often organized as separate scanner and tokenizer functions, though the boundaries may not be clearly defined.

The first stage, the scanner, is usually based on a finite state machine. It has encoded within it information on the possible sequences of characters that can be contained within any of the tokens it handles (individual instances of these character sequences are known as lexemes). For instance, an integer token may contain any sequence of numerical digit characters. In many cases, the first non-whitespace character can be used to deduce the kind of token that follows and subsequent input characters are then processed one at a time until reaching a character that is not in the set of characters acceptable for that token (this is known as the maximal munch rule). In some languages the lexeme creation rules are more complicated and may involve backtracking over previously read characters.
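A minimal sketch of such a scanner, assuming a toy token set (integers, identifiers, and a few operators) that is not taken from any particular language. The function name `scan` and the kind labels are illustrative only. The first non-whitespace character selects the token kind, and characters are then consumed one at a time until one falls outside the set acceptable for that token.

```python
def scan(source):
    """Yield (kind, lexeme) pairs, reading one character at a time."""
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            i += 1
            continue
        start = i
        if ch.isdigit():
            # Integer: keep consuming while the next character is a digit.
            kind = "INT"
            while i < n and source[i].isdigit():
                i += 1
        elif ch.isalpha() or ch == "_":
            # Identifier: letters, digits, and underscores.
            kind = "IDENT"
            while i < n and (source[i].isalnum() or source[i] == "_"):
                i += 1
        elif ch in "<>=!":
            # Maximal munch: prefer "<=" over "<" followed by "=".
            kind = "OP"
            i += 1
            if i < n and source[i] == "=":
                i += 1
        elif ch in "+-*/();":
            kind = "OP"
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
        yield kind, source[start:i]


tokens = list(scan("count1 <= 42"))
```

For the input `count1 <= 42`, the scanner emits one identifier, one two-character operator, and one integer. No backtracking is needed for this token set, because each kind can be decided from its first character.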

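In computer science and linguistics, syntax analysis (parsing) is the process of analyzing an input sequence of words (tokens) against a given formal grammar and determining its grammatical structure. A parser usually appears as a component of a compiler or interpreter. Its job is to work out the structure of the input and convert it into a data structure that is easier to access in later processing (generally a tree), and to detect any syntax errors. A parser usually relies on a lexer to separate the input character stream into individual "words" and takes the resulting token stream as its input. In practice, a parser can be written by hand, or its high-level-language source code can be generated by a parser generator (such as yacc) from a formal grammar written in Backus-Naur form.

The task of the lexer phase, by contrast, is to read the source program from left to right, scan and analyze its character stream, and recognize which words are reserved words of the programming language; all other words are marked as user-defined identifiers. In addition, the lexical analysis phase can check whether user-defined identifiers conform to the word-formation rules, and it records the line number at which each token appears.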
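To contrast the two phases, here is a minimal sketch of a hand-written parser that takes a token stream and produces a tree. It assumes a small arithmetic grammar chosen purely for illustration (`expr := term (('+' | '-') term)*`, `term := factor (('*' | '/') factor)*`, `factor := NUMBER | IDENT | '(' expr ')'`), and it assumes the tokens arrive as a plain list of strings. The name `parse` and the nested-tuple tree shape are hypothetical.

```python
def parse(tokens):
    """Parse a list of token strings into a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())   # left-associative
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and (tok.isdigit() or tok.isidentifier()):
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tree = parse(["a", "+", "2", "*", "b"])
```

The lexer only decides what each word is; the parser decides how the words fit together. In this sketch, operator precedence shows up in the tree shape (`*` binds tighter than `+`), and malformed input such as a dangling operator is reported as a syntax error.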