Grammar Format Specification（1）

Source: Internet. Published by: 网络著作权侵权案例. Editor: 程序博客网. Date: 2024/04/28 01:43

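My English is not great, so after working through this once I wrote the translation down to avoid having to translate it again next time. Sharing it here.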

1. Introduction

1.1 Related Documentation

2. Definitions(定义)

2.1 Grammar Names and Package Names(语法名和包名)

Each grammar defined by Java Speech Grammar Format has a unique name that is declared in the grammar header. Legal structures for grammar names are:

JSGF 定义的每个语法都有一个唯一的名字被声明在语法的头部。合理的语法名结构是：

packageName.simpleGrammarName
grammarName

The first form (package name + simple grammar name) is a full grammar name. The second form is a simple grammar name (grammar name only). Examples of full grammar names and simple grammar names include:

第一个格式（包名+简单的语法名）是一个完整语法名。第二个是一个简单语法名（是有语法名）。一个完整语法名和一个简单语法名的例子：

com.sun.speech.apps.numbers
edu.unsw.med.people
examples

The package name and grammar name have the same format as packages and classes in the Java programming language. A full grammar name is a dot-separated list of Java identifiers(1) (see GJS96, §3.8 and §6.5).

包名和语法名与JAVA语言里的包名和类名有相同的结构。一个完整的语法名是由一列以点（.）分隔的JAVA标识符组成的(见GJS96, §3.8 and §6.5).。

The grammar naming convention also follows the naming convention for classes in the Java Programming Language (see GJS96). The convention minimizes the chance of naming conflicts. The package name should be:

语法名的命名规范也遵循JAVA的命名规范（见GJS96）。约定最小的命名冲突。包名应该是：

reversedDomainName.localPackaging

For example, for com.sun.speech.apps.numbers, the com.sun part is Sun's reversed Internet domain name, speech.apps is the local package name for Sun-wide division of the name space, and numbers is the simple grammar name.

例如com.sun.speech.apps.numbers的com.sun 部分是SUN公司的域名颠倒过来。speech.apps 是Sun-wide 部门命名空间的本地包名，number是一个简单的语法名。
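The naming rule above can be sketched in Java. This is a minimal illustration, not part of the specification: the class name `GrammarName` and its methods are hypothetical, and the check is simplified (it does not reject Java keywords, and it inspects `char` values rather than full code points).

```java
import java.util.Arrays;

// Hypothetical helper: splits a JSGF grammar name into package name and simple grammar name.
final class GrammarName {
    final String packageName; // empty string when the name has no package part
    final String simpleName;

    private GrammarName(String packageName, String simpleName) {
        this.packageName = packageName;
        this.simpleName = simpleName;
    }

    // True when s is non-empty and consists only of Java identifier characters.
    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    static GrammarName parse(String name) {
        // The -1 limit keeps trailing empty parts, so a name such as "a." is rejected.
        String[] parts = name.split("\\.", -1);
        for (String part : parts) {
            if (!isJavaIdentifier(part)) {
                throw new IllegalArgumentException("Not a Java identifier: '" + part + "'");
            }
        }
        int last = parts.length - 1;
        String pkg = String.join(".", Arrays.copyOfRange(parts, 0, last));
        return new GrammarName(pkg, parts[last]);
    }

    public static void main(String[] args) {
        GrammarName full = parse("com.sun.speech.apps.numbers");
        System.out.println(full.packageName + " / " + full.simpleName);
        GrammarName simple = parse("examples");
        System.out.println("'" + simple.packageName + "' / " + simple.simpleName);
    }
}
```

For a simple grammar name such as `examples`, the package part is the empty string and the whole name is the simple grammar name.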

2.2 Rulenames

A grammar is composed of a set of rules that together define what may be spoken. Rules are combinations of speakable text and references to other rules. Each rule has a unique rulename. A reference to a rule is represented by the rule's name in surrounding <> characters (less-than and greater-than).

一个语法是一个被组合在一起的共同定义将要说的内容的一个集合。规则包括可叙述性文本和其它规则的引用。每一个规则有一个唯一的规则名。一个引用的规则名用一对<> 括起来。

A legal rulename is similar to a Java identifier but allows additional symbols. A legal rulename is an unlimited-length sequence of Unicode characters matching the following(2):

一个合法的规则名和JAVA的标示符一样，但可以有一些特殊字符.一个合法的规则名由一串没有长度限制的Unicode 字符组成。

Characters matching java.lang.Character.isJavaIdentifierPart including the Unicode letters and numbers plus other symbols.
The following additional punctuation symbols:
+ - : ; , = | / \ ( ) [ ] @ # % ! ^ & ~

Grammar developers should be aware of two specific constraints. First, rulenames are compared with exact Unicode string matches, so case is significant. For example, <name>, <Name> and <NAME> are different. Second, whitespace(3) is not permitted in rulenames.

开发者应该知道两个特殊的约束条件。第一，规则名会被当成Unicode字符串精确匹配，大小写敏感。例如，<name>、<Name> 和 <NAME> 是不一样的。第二，规则名内不允许有空格。

The rulenames <NULL> and <VOID> are reserved. These special rules are discussed later in this section.

规则名 <NULL> 和 <VOID> 被保留。这些特殊规则将在本节后面讨论。

The Unicode character set includes most writing scripts from the world's living languages, so rulenames can be written in Chinese, Japanese, Korean, Thai, Arabic, European languages, and many more. The following are examples of rulenames.

Unicode 字符集包括了大部分世界上现有的可编写脚本的字符。所以规则名可以用中文，日文，韩文，泰文，阿拉伯文，欧洲语言，和更多的语言。下面是一些规则名的例子。

<hello>
<Zürich>
<user_test>
<$100>
<1+2=3>
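A rough Java sketch of the rulename rules follows. The class `Rulenames` and its method names are hypothetical. It covers simple rulenames only (no grammar qualifier), treats an empty name as illegal (an assumption), and its punctuation set assumes that the two slashes in the list above stand for a forward slash and a backslash.

```java
// Hypothetical helper for checking simple rulenames (the text between < and >).
final class Rulenames {
    // Punctuation allowed in addition to Java identifier characters.
    // Assumption: the list contains both '/' and '\'.
    private static final String EXTRA_PUNCTUATION = "+-:;,=|/\\()[]@#%!^&~";

    static boolean isLegal(String name) {
        if (name.isEmpty()) {
            return false; // assumption: an empty rulename is treated as illegal
        }
        int i = 0;
        while (i < name.length()) {
            int cp = name.codePointAt(i);
            boolean allowed = Character.isJavaIdentifierPart(cp)
                    || EXTRA_PUNCTUATION.indexOf(cp) >= 0;
            if (!allowed) {
                return false; // whitespace ends up here as well
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    // Rulenames are compared as exact strings, so case is significant.
    static boolean sameRule(String a, String b) {
        return a.equals(b);
    }

    // The two reserved names cannot be defined by a grammar.
    static boolean isReserved(String name) {
        return name.equals("NULL") || name.equals("VOID");
    }

    public static void main(String[] args) {
        System.out.println(isLegal("1+2=3"));
        System.out.println(isLegal("two words"));
        System.out.println(sameRule("name", "NAME"));
    }
}
```

Because `Character.isJavaIdentifierPart` accepts digits and currency symbols in any position, names such as `$100` and `1+2=3` pass the check even though they would not be valid Java identifiers.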

2.2.1 Qualified and Fully-Qualified Names(合法的和完全合法的名字)

Although rulenames are unique within a grammar, separate grammars may reuse the same simple rulename. A later section introduces the import statement, which allows one grammar to reference rules from another grammar. When two grammars use the same rulename, a reference to that rulename may be ambiguous. Qualified names and fully-qualified names are used to reference between grammars without ambiguity.

显然规则名在一个语法内是唯一的，在分隔开的语法内是可以同名的。后面有一段重要的说明，哪一种是允许一个语法引用另一个语法的。什么时候两个语法可以使用相同的规则名，一个引用了有歧义的规则名。

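A fully-qualified rulename includes the full grammar name and the simple rulename. For example, a hypothetical rule digits in the grammar com.sun.speech.apps.numbers has the fully-qualified name:

<com.sun.speech.apps.numbers.digits>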

一个完全合格的规则名包括全语法名和简单规则名，例如：

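A qualified rulename includes only the simple grammar name and the rulename and is a useful shorthand representation. For example, a hypothetical rule digits in the grammar com.sun.speech.apps.numbers can be referenced as:

<numbers.digits>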

一个合格的规则名只包括简单的语法名和规则名，这是简短易记的。例如：

The following conditions apply to the use of rulenames:

下面的约束适用于规则名：

Qualified and fully-qualified rulenames may not be used on the left side of the definition of a rule.

合格和完全合格的规则名不能用在规则声名的左边。

Import statements must use fully-qualified rulenames.

导入语句必须用完全规则名。

Local rules can be referenced by qualified and fully-qualified names using the form <grammarName.ruleName>, where grammarName is the simple or full name of the local grammar.

本地规则可以符合或完全符合这种格式的名字引用。

2.2.2 Resolving Rulenames（解析规则名）

It is an error to use an ambiguous reference to a rulename. The following defines behavior for resolving references:

使用有歧义的规则名是一个错误。下面的定义会引起一个解析引用的运作：

Local rules have precedence. If a local rule and one or more imported rules have the same name, <ruleName>, then a simple rule reference to <ruleName> is a reference to the local rule.

本地规则名有优先权，如果本地的规则和一个或多个导入的规则有相同的名字, 这时候一个简单的引用规则是引用的是本地规则。

If two or more imported rules have the same name, <ruleName>, but there is no local rule of the same name, then a simple rule reference to <ruleName> is ambiguous and is an error. To resolve this ambiguity these imported rules must be referred to by their qualified or fully-qualified names.

如果两个或多个导入规则有相同的名字, 但没有与之同名的本地规则，这时候一个的简单引用是有歧义的，是错误的。解析这些有歧义的导入的规则必须用它们合格或完全合格的规则名。

If two or more imported rules have the same name and come from grammars with the same simple grammar name (but necessarily different package names), then a simple rule reference or qualified rule reference is ambiguous and is an error. These imported rules must be referred to by their fully-qualified names.

如果两个或多个导入的规则有相同的名字和来自有相同的语法名的（但必须有不同的包名），这时候一个简单的规则引用或一个有歧义的引用是错误的。这些导入的规则必须被完全的规则名引用。

A reference by a fully-qualified rulename is never ambiguous.

一个完全的规则名引用是不会有歧义的。

When a rulename reference cannot be resolved (not defined locally and not a public rule of an imported grammar), the handling of the reference is defined by the recognizer's software interface（4）.

当一个规则名引用不能被解析（没有在本地声明并且不是导入语法的公用规则），引用的处理被定义在识别器（recognizer）的接口里。
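The resolution rules above can be sketched as a small Java class. Everything here is illustrative: `RuleResolver` and the grammar names used with it are hypothetical, and the sketch assumes that a qualified reference matching both a local and an imported rule is reported as ambiguous, a case the text above does not spell out. An unresolved reference is returned as an empty `Optional`, leaving the handling to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical resolver for simple, qualified and fully-qualified rule references.
final class RuleResolver {
    private final String localGrammar;       // full name of the referencing grammar
    private final Set<String> localRules;    // simple names of rules defined locally
    private final Set<String> importedRules; // fully-qualified names of imported public rules

    RuleResolver(String localGrammar, Set<String> localRules, Set<String> importedRules) {
        this.localGrammar = localGrammar;
        this.localRules = localRules;
        this.importedRules = importedRules;
    }

    // Returns the fully-qualified target, or empty if unresolved; throws if ambiguous.
    Optional<String> resolve(String ref) {
        int dot = ref.lastIndexOf('.');
        if (dot < 0) {
            if (localRules.contains(ref)) {
                return Optional.of(localGrammar + "." + ref); // local rules have precedence
            }
            return unique(ref, candidates(null, ref, importedRules));
        }
        List<String> all = new ArrayList<>(importedRules);
        for (String rule : localRules) {
            all.add(localGrammar + "." + rule);
        }
        if (all.contains(ref)) {
            return Optional.of(ref); // a fully-qualified reference is never ambiguous
        }
        return unique(ref, candidates(ref.substring(0, dot), ref.substring(dot + 1), all));
    }

    // Selects names whose rule part matches and, if given, whose simple grammar name matches.
    private static List<String> candidates(String simpleGrammar, String rule, Iterable<String> names) {
        List<String> out = new ArrayList<>();
        for (String fq : names) {
            int dot = fq.lastIndexOf('.');
            String grammar = fq.substring(0, dot);
            String simple = grammar.substring(grammar.lastIndexOf('.') + 1);
            boolean grammarOk = simpleGrammar == null || simple.equals(simpleGrammar);
            if (grammarOk && fq.substring(dot + 1).equals(rule)) {
                out.add(fq);
            }
        }
        return out;
    }

    private static Optional<String> unique(String ref, List<String> found) {
        if (found.size() > 1) {
            throw new IllegalStateException("Ambiguous reference <" + ref + ">: " + found);
        }
        return found.isEmpty() ? Optional.empty() : Optional.of(found.get(0));
    }
}
```

With a local grammar `com.acme.travel` that defines `city` and imports `com.acme.places.city`, `com.acme.places.country` and `org.example.places.country` (all made-up names), `resolve("city")` picks the local rule, while `resolve("country")` and `resolve("places.country")` both throw because only the fully-qualified names tell the two imports apart.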

2.2.3 Special Rulenames（特殊的规则名）

The Java Speech Grammar Format defines two special rules, <NULL> and <VOID>. These rules are universally defined - they are available in any grammar without an explicit import statement - and they cannot be redefined. Both names are fully-qualified so no qualifying grammar name is required.

JSGF 定义了两个特殊的规则 <NULL> 和 <VOID>。这些规则是全局定义的：没有明确的导入语句，它们在任何语法里也都可用，并且它们不能被重新定义。这两个名字都是完全合格的，所以不需要加语法名限定。

<NULL> defines a rule that is automatically matched: that is, matched without the user speaking any word.

<NULL> 定义了一个自动匹配的规则：也就是说，用户不说任何词也能匹配。

<VOID> defines a rule that can never be spoken. Inserting <VOID> into a sequence automatically makes that sequence unspeakable(5).

<VOID> 定义了一个永远无法被说出的规则。把 <VOID> 插入到一个序列里会自动使这个序列无法被说出。

The <NULL> and <VOID> rules are typically used in specialized circumstances. They can be used to block and enable parts of grammars, to control recursion, and to perform other advanced tasks. The uses of <NULL> and <VOID> are described later in this document.

<NULL> 和 <VOID> 通常用在特殊情况下。它们可以用来屏蔽和启用语法的某些部分，控制递归，以及执行其它高级任务。<NULL> 和 <VOID> 的用法将在本文后面描述。
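The effect of the two special rules on a sequence can be illustrated with a short Java sketch. The class `SpecialRules` and the flat list-of-strings model of a sequence are simplifications invented for this example: the special rules are represented by the strings `<NULL>` and `<VOID>`, and every other item is treated as a token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical model: a sequence is a list of tokens mixed with the two special rules.
final class SpecialRules {
    static final String NULL = "<NULL>";
    static final String VOID = "<VOID>";

    // Returns the words a user has to speak to match the sequence,
    // or an empty Optional when the sequence is unspeakable.
    static Optional<List<String>> spokenForm(List<String> sequence) {
        List<String> words = new ArrayList<>();
        for (String item : sequence) {
            if (item.equals(VOID)) {
                return Optional.empty(); // one <VOID> makes the whole sequence unspeakable
            }
            if (!item.equals(NULL)) {
                words.add(item); // <NULL> matches without any spoken word, so it adds nothing
            }
        }
        return Optional.of(words);
    }

    public static void main(String[] args) {
        System.out.println(spokenForm(List.of("open", NULL, "file")));
        System.out.println(spokenForm(List.of("open", VOID, "file")));
    }
}
```

In this model a sequence made only of `<NULL>` is still speakable: it matches with an empty list of words, which mirrors "matched without the user speaking any word".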

2.3 Tokens（标识符）

A token, sometimes called a terminal symbol, is the part of a grammar that defines what may be spoken by a user. Most often, a token is equivalent to a word. Tokens may appear in isolation, for example,

一个标识符，有时叫作终止符，是语法规则的一部分，用来定义用户所说的话。通常，一个标识符是一个词，标识符可以单独出现。例如：

hello
konnichiwa

or as sequences of tokens separated by whitespace characters, for example,

或者是用空格分隔开的标识符序列。例如：

this is a test
open the directory

In Java Speech Grammar Format, a token is a character sequence bounded by whitespace, by quotes or delimited by the other symbols that are significant in the grammar:

在JSGF里，一个标识符是一个被空格符，冒号，或分号，其它在语法内有意义的字符分隔开的字符序列：

; = | * + <> () [] {} /* */ //
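As a rough illustration of this definition, the Java sketch below pulls the tokens out of a rule expansion. `TokenScanner` is a hypothetical name, and the scanner is deliberately simplified: it assumes comments and weights have already been removed, skips rule references in `<>` and tags in `{}` instead of interpreting them, and does not handle escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified scanner that extracts tokens from a rule expansion.
final class TokenScanner {
    // Single-character symbols that end a token (comments and weights are not handled).
    private static final String DELIMITERS = ";=|*+()[]";

    static List<String> tokens(String expansion) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < expansion.length()) {
            char c = expansion.charAt(i);
            if (c == '"') {
                // A quoted span is one token, even if it contains whitespace.
                flush(current, out);
                int end = expansion.indexOf('"', i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated quote");
                }
                out.add(expansion.substring(i + 1, end));
                i = end + 1;
            } else if (c == '<' || c == '{') {
                // Rule references and tags are not tokens, so skip to the closing symbol.
                flush(current, out);
                int end = expansion.indexOf(c == '<' ? '>' : '}', i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated " + c);
                }
                i = end + 1;
            } else if (Character.isWhitespace(c) || DELIMITERS.indexOf(c) >= 0) {
                flush(current, out);
                i++;
            } else {
                current.append(c);
                i++;
            }
        }
        flush(current, out);
        return out;
    }

    // Moves the pending characters, if any, into the output list as one token.
    private static void flush(StringBuilder current, List<String> out) {
        if (current.length() > 0) {
            out.add(current.toString());
            current.setLength(0);
        }
    }
}
```

For example, `TokenScanner.tokens("this is a test")` yields four tokens, while a quoted span such as `"New York"` comes back as a single token.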

A token is a reference to an entry in a recognizer's vocabulary, often referred to as the lexicon. The recognizer's vocabulary defines the pronunciation of the token. With the pronunciation, the recognizer is able to listen for that token.

一个标示符是一个识别器词汇表（能常被叫作词典）里一个元素的引用，识别器词汇表里定义了标示符的发音，有了这个发音，识别器就可以听出这个标示符了。

The Java Speech Grammar Format allows multi-lingual grammars, that is, grammars that include tokens from more than one language. However, most recognizers operate mono-lingually so a typical grammar will contain only one language. It is the responsibility of the application that loads a grammar into a recognizer to ensure that it has appropriate language support. As an example, the following is a simple multi-lingual rule.

JSGF 允许支持多发音语法，语法内包含了多种语言的标示符，然而多数识别器只支持一种语言。加载适当的语法给识别器是由应用程序来完成的。下面是一个多发音规则的例子：

<no> = no | nein | nao | non | nem;

Most recognizers have a comprehensive vocabulary for each language they support. However, it is never possible to include 100% of a language. For example, names, technical terms and foreign words are often missing from the vocabulary. For tokens missing from the vocabulary, there are three possibilities:

大多数识别器对它们所支持的语言都有一个综合的词汇表。但它不可能100%的包含了一种语言所有的词汇。例如，事物的名字，技术术语，外来词汇，和一些不常用的词汇。对于缺失的词汇可能有三种解决方法：

An application or user can add the token and pronunciation to the recognizer's vocabulary to ensure consistent recognition.

应用或用户可以添加标示符或发音到识别器的词汇表里，以确保内容的完整性。

Good recognizers are able to guess the pronunciation of many words not in the vocabulary.

一个好的识别器可以通过一些词汇猜测出一些没有在词汇表里的发音。

If neither of the previous points apply, the behavior is determined by the software interface of the recognizer. In most cases, an undefined token will be unspeakable (equivalent to <VOID>), or it will cause an error or exception. For the Java Speech API, undefined tokens are unspeakable.

如果上面两点都不可行，行为由识别器的软件接口来决定。在大多数情况下，一个没有定义的标示符是无法被说出的（相当于 <VOID>），或者会引起一个错误或异常。对于 Java Speech API，没有定义的标示符是无法被说出的。

Tokens do not need to be normal written words of a language, assuming that the token is properly defined in the recognizer's vocabulary. For example, to handle the two pronunciations of "read" (past tense sounds like "red", present tense sounds like "reed") an application could define two separate tokens "read_past" and "read_present" with appropriate pronunciations.

标示符不需要一种语言的正式的书面语，假设这个标示符已经正确的定义在了词汇表里了。例如，要支持read的两个发音（过去式像red,现在式像reed）程序应该分别定义两个标示符read_past和read_present和正确的发音。
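The relationship between tokens and the recognizer's vocabulary can be pictured as a lookup table. The Java sketch below is only a toy model: `Lexicon` is a hypothetical class, not part of the Java Speech API, and the pronunciations are stored as informal spellings ("red", "reed") rather than in any real phonetic notation. Undefined tokens are treated as unspeakable, as described above for the Java Speech API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical toy lexicon that maps tokens to informal pronunciation strings.
final class Lexicon {
    private final Map<String, String> pronunciations = new HashMap<>();

    // Adds or replaces the pronunciation for a token.
    void add(String token, String pronunciation) {
        pronunciations.put(token, pronunciation);
    }

    Optional<String> pronunciationOf(String token) {
        return Optional.ofNullable(pronunciations.get(token));
    }

    // A token without an entry is treated as unspeakable.
    boolean isSpeakable(String token) {
        return pronunciations.containsKey(token);
    }

    public static void main(String[] args) {
        Lexicon lexicon = new Lexicon();
        lexicon.add("read_past", "red");
        lexicon.add("read_present", "reed");
        System.out.println(lexicon.pronunciationOf("read_past"));
        System.out.println(lexicon.isSpeakable("read_future"));
    }
}
```

Giving the two readings of "read" separate token names lets a grammar choose the intended pronunciation explicitly instead of relying on the recognizer to guess.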