Perl Regular Expression Syntax Perl的正则表达式语法

来源:互联网 发布:overlay网络 编辑:程序博客网 时间:2024/05/17 04:10


Perl Regular Expression Syntax Perl的正则表达式语法 
Synopsis 摘要 
The Perl regular expression syntax is based on that used by the programming language Perl . Perl regular expressions are the default behavior in Boost.Regex or you can pass the flag perl to the basic_regex constructor, for example:
Perl正则表达式语法基于编程语言Perl的使用。Perl正则表达式是Boost.Regex的默认行为, 或者你可以将标记 perl 传入 basic_regex 的构造,例如:

// e1 is a case sensitive Perl regular expression: 
// since Perl is the default option there's no need to explicitly specify the syntax used here:
boost::regex e1(my_expression);
// e2 a case insensitive Perl regular expression:
boost::regex e2(my_expression, boost::regex::perl|boost::regex::icase);

Perl Regular Expression Syntax Perl的正则表达式语法 
In Perl regular expressions, all characters match themselves except for the following special characters:
在Perl正则表达式中,除了下面的特殊字符外,所有的字符都匹配自己:

.[{()\*+?|^$
Wildcard 通配符 
The single character '.' when used outside of a character set will match any single character except:
单字符'.'在字符集之外使用时可以匹配任意单字符,除了:

The NULL character when the flag match_not_dot_null is passed to the matching algorithms.
NULL字符,当 标志match_not_dot_null 被传入匹配算法中时。 
The newline character when the flag match_not_dot_newline is passed to the matching algorithms.
换行字符,当 标记match_not_dot_newline 被传入匹配算法中时。 
Anchors 锚点 
A '^' character shall match the start of a line.
'^'字符会匹配行的起始。

A '$' character shall match the end of a line.
'$'字符会匹配行的终止。

Marked sub-expressions 被标记的子表达式 
A section beginning ( and ending ) acts as a marked sub-expression. Whatever matched the sub-expression is split out in a separate field by the matching algorithms. Marked sub-expressions can also repeated, or referred to by a back-reference.
开始的 ( 和终止的 ) 之间的部分是被标记的子表达式。匹配算法会将子表达式匹配的部分分离成独立的部分。 标记子表达式也可以被重复,或后向引用。

Non-marking grouping 非标记组 
A marked sub-expression is useful to lexically group part of a regular expression, but has the side-effect of spitting out an extra field in the result. As an alternative you can lexically group part of a regular expression, without generating a marked sub-expression by using (?: and ) , for example (?:ab)+ will repeat ab without splitting out any separate sub-expressions.
标记子表达式对于正则表达式中的成组文字部分是非常有用的,但将结果分组是有副作用的。 作为选择,可以通过 (?: 和 ) 产生文字分组,但不产生标记子表达式,例如 (?:ab)+ 会重复 ab 但并不分隔出单独的子表达式。

Repeats 重复 
Any atom (a single character, a marked sub-expression, or a character class) can be repeated with the *, +, ?, and {} operators.
任意原子(单个字符,一个标记子表达式或一个字符组)可以通过 *, +, ? 和 {} 操作符重复。

The * operator will match the preceding atom zero or more times, for example the expression a*b will match any of the following:
* 操作符会匹配前面的原子零次或多次,例如表达式 a*b 可以匹配下面的:

b
ab
aaaaaaaab

The + operator will match the preceding atom one or more times, for example the expression a+b will match any of the following:
+ 操作符会匹配前面的原子一次或多次, 例如表达式 a+b 可以匹配下面的:

ab
aaaaaaaab

But will not match:
但不会匹配:

b

The ? operator will match the preceding atom zero or one times, for example the expression ca?b will match any of the following:
? 操作符会匹配前面的原子零次或一次, 例如表达式 ca?b 会匹配下面的:

cb
cab

But will not match:
但不会匹配:

caab

An atom can also be repeated with a bounded repeat:
原子同样可以被重复有界次数:

a{n} Matches 'a' repeated exactly n times.
a{n} 匹配'a'重复n次。

a{n,} Matches 'a' repeated n or more times.
a{n,} 匹配'a'重复n次或更多次。

a{n, m} Matches 'a' repeated between n and m times inclusive.
a{n, m} 匹配'a'重复n次到m次之间。

For example:
例如:

^a{2,3}$
Will match either of:
会匹配如下:

aa
aaa

But neither of:
但不会匹配:

a
aaaa

It is an error to use a repeat operator, if the preceding construct can not be repeated, for example:
如果前面的结构不能被重复,那么使用重复操作符是一个错误,例如:

a(*)

Will raise an error, as there is nothing for the * operator to be applied to.
但报告一个错误,因为 * 操作符没有可以应用的对象。

Non greedy repeats 非贪婪重复 
The normal repeat operators are "greedy", that is to say they will consume as much input as possible. There are non-greedy versions available that will consume as little input as possible while still producing a match.
通常的重复是"贪婪的",也说明说它们会消耗尽可能多的输入。同样存在非贪婪版本,生成匹配时会消耗尽可能少的输入。

*? Matches the previous atom zero or more times, while consuming as little input as possible.
*? 匹配前面的原子零次或多次,但尽可能少地消耗输入。

+? Matches the previous atom one or more times, while consuming as little input as possible.
+? 匹配前面的原子一次或多次,但尽可能少地消耗输入。

?? Matches the previous atom zero or one times, while consuming as little input as possible.
?? 匹配前面的原子零次或一次,但尽可能少地消耗输入。

{n,}? Matches the previous atom n or more times, while consuming as little input as possible.
{n,}? 匹配前面的原子n次或更多次,但尽可能少地消耗输入。

{n,m}? Matches the previous atom between n and m times, while consuming as little input as possible.
{n,m}? 匹配前面的原子n次到m次,但尽可能少地消耗输入。

Pocessive repeats 
By default when a repeated patten does not match then the engine will backtrack until a match is found. However, this behaviour can sometime be undesireable so there are also "pocessive" repeats: these match as much as possible and do not then allow backtracking if the rest of the expression fails to match.
缺省情况下,当一个重复模式不能被匹配时,引擎将回溯直至找到一个匹配。但是,有时候这种行为不是用户所期望的, 因此还有一种"pocessive"重复:它尽可能多地进行匹配且当剩下的表达式不能匹配时不允许进行回溯。

*+ Matches the previous atom zero or more times, while giving nothing back.
*+ 匹配前一个原子零次或多次,但不退回。

++ Matches the previous atom one or more times, while giving nothing back.
++ 匹配前一个原子一次或多次,但不退回。

?+ Matches the previous atom zero or one times, while giving nothing back.
?+ 匹配前一个原子零次或一次,但不退回。

{n,}+ Matches the previous atom n or more times, while giving nothing back.
{n,}+ 匹配前一个原子n次或多次,但不退回。

{n,m}+ Matches the previous atom between n and m times, while giving nothing back.
{n,m}+ 匹配前一个原子n至m次,但不退回。

Back references 后向引用 
An escape character followed by a digit n, where n is in the range 1-9, matches the same string that was matched by sub-expression n. For example the expression:
一个转义字符跟着一个数字 n/,/n 的范围是1-9,匹配被子表达式 n 匹配的相同字符串。例如表达式:

^(a*).*\1$
Will match the string:
匹配字符串:

aaabbaaa

But not the string:
但不匹配:

aaabba

You can also use the \g escape for the same function, for example:
你也可以用 \g 转义符达到相同功效,例如:

Escape 转义符 
 Meaning 意义 
 
\g1 
 Match whatever matched sub-expression 1
匹配子表达式1的内容 
 
\g{1} 
 Match whatever matched sub-expression 1: this form allows for safer parsing of the expression in cases like \g{1}2 or for indexes higher than 9 as in \g{1234}
匹配子表达式1的内容:该方式可以在以下情形时更安全地对表达式进行分析:如 \g{1}2 或大于9的索引如 \g{1234} 
 
\g-1 
 Match whatever matched the last opened sub-expression
匹配最后一个被打开的子表达式的内容 
 
\g{-2} 
 Match whatever matched the last but one opened sub-expression
匹配倒数第二个被打开的子表达式的内容 
 
\g{one} 
 Match whatever matched the sub-expression named "one"
匹配名为"one"的子表达式的内容 
 

Finally the \k escape can be used to refer to named subexpressions, for example \k<two> will match whatever matched the subexpression named "two".
最后,\k 转义符可用于引用命名子表达式,例如 \k<two> 将匹配名为 "two" 的子表达式的内容。

Alternation 选择 
The | operator will match either of its arguments, so for example: abc|def will match either "abc" or "def". 
| 操作符匹配它的参数之一, 例如:abc|def 会匹配 "abc" 或 "def"。

Parenthesis can be used to group alternations, for example: ab(d|ef) will match either of "abd" or "abef".
括号可以用来对选择进行分组,例如:ab(d|ef) 会匹配 "abd" 或 "abef"。

Empty alternatives are not allowed (these are almost always a mistake), but if you really want an empty alternative use (?:) as a placeholder, for example:
空的选择是不允许的(这通常都是错误),但如果你真的需要一个空的选择,可以使用 (?:) 作为占位符,例如:

|abc is not a valid expression, but
|abc 不是有效的表达式,但

(?:)|abc is and is equivalent, also the expression:
(?:)|abc 是有效的表达式,并且与以下表达式:

(?:abc)?? has exactly the same effect.
(?:abc)?? 有完全相同的作用

Character sets 字符集 
A character set is a bracket-expression starting with [ and ending with ], it defines a set of characters, and matches any single character that is a member of that set.
字符集是一个以 [ 开始,以 ] 结束的方括号表达式,它定义了一个字符的集合,匹配集合中的任意单个字符。

A bracket expression may contain any combination of the following:
方括号表达式可以包含下面的组合:

Single characters 单个字符 
For example [abc], will match any of the characters 'a', 'b', or 'c'.
例如 [abc] 可以匹配 'a'、'b' 或 'c'。

Character ranges 字符范围 
For example [a-c] will match any single character in the range 'a' to 'c'. By default, for Perl regular expressions, a character x is within the range y to z, if the code point of the character lies within the codepoints of the endpoints of the range. Alternatively, if you set the collate flag when constructing the regular expression, then ranges are locale sensitive.
例如 [a-c] 可以匹配'a'到'c'范围内的任意单个字符。对于缺省的Perl正则表达式, 如果字符x的编码点在范围两端字符y和z的编码点之间,就称字符x在范围y到z的中间。 另外,如果在构造正则表达式时设置了 collate 标记, 那范围就是locale相关的。

Negation 否定 
If the bracket-expression begins with the ^ character, then it matches the complement of the characters it contains, for example [^a-c] matches any character that is not in the range a-c.
如果方括号表达式以 ^ 字符开始,那么它匹配包含字符的补集,例如 [^a-c] 匹配不在 a-c 范围内的任意字符。

Character classes 字符类 
An expression of the form [[:name:]] matches the named character class "name", for example [[:lower:]] matches any lower case character. See character class names.
形如 [[:name:]] 的表达式匹配命名字符类"name",例如 [[:lower:]] 任意小写字符。 参见 字符类名称。

Collating Elements 对照元素 
An expression of the form [[.col.]] matches the collating element col. A collating element is any single character, or any sequence of characters that collates as a single unit. Collating elements may also be used as the end point of a range, for example: [[.ae.]-c] matches the character sequence "ae", plus any single character in the range "ae"-c, assuming that "ae" is treated as a single collating element in the current locale.
形如 [[.col.]] 的表达式匹配对照元素 /col/。对照元素是任意的单个字符,或对应于某个单个单元的字符序列。 对照还可以用作范围的端点,例如 [[.ae.]-c] 匹配字符序列"ae",和任意的单个字符在范围"ae"到c之间, 其中"ae"被当前locale处理为单个对照元素。

As an extension, a collating element may also be specified via it's symbolic name, for example:
作为扩展,对照元素可以通过其 符号名 指定,例如:

[[.NUL.]]

matches a \0 character.
匹配一个 \0 字符。

Equivalence classes 等价类 
An expression of the form [[=col=]], matches any character or collating element whose primary sort key is the same as that for collating element col, as with collating elements the name col may be a symbolic name. A primary sort key is one that ignores case, accentation, or locale-specific tailorings; so for example [[=a=]] matches any of the characters: a, à, á, ?, ?, ?, ?, A, à, á, a, ?, ? and ?. Unfortunately implementation of this is reliant on the platform's collation and localisation support; this feature can not be relied upon to work portably across all platforms, or even all locales on one platform.
形如 [[=col=]] 的表达式匹配主排序关键字等同于对照元素 col 的任意字符或对照元素,其中名字为 col 的对照元素可以是一个 符号名。主排序关键字忽略大小写、重音或特定区域(locale)的裁剪(tailorings); 所以如 [[=a=]] 匹配下面的字符:a, à, á, ?, ?, ?, ?, A, à, á, a, ?, ? 和 ?。 不幸的是这个实现依赖于平台的对照(collation)和地区(localization)支持; 这个特性并不能很好地可移植工作于所有的平台,甚至一个平台上的所有区域(locale)。

Escaped Characters 转义字符 
All the escape sequences that match a single character, or a single character class are permitted within a character class definition. For example [\[\]] would match either of [ or ] while [\W\d] would match any character that is either a "digit", or is not a "word" character.
任意匹配单个字符或单个字符类的转义序列都可以定义在字符类中。 例如[\[\]] 可以匹配 [ 或 ] ,而 [\W\d] 可以匹配任何不是"数字"/或不是/"单词"的字符。

Combinations 组合 
All of the above can be combined in one character set declaration, for example: [[:digit:]a-c[.NUL.]].
所有上面的都可以在一个字符类声明中被组合,例如 [[:digit:]a-c[.NUL.]]。

Escapes 转义符 
Any special character preceded by an escape shall match itself.
任意特殊字符前面加转义符都匹配自己。

The following escape sequences are all synonyms for single characters:
下面的转义序列都和单个字符同义:

Escape 转义符 
 Character 字符 
 
\a 
 \a 
 
\e 
 0x1B 
 
\f 
 \f 
 
\n 
 \n 
 
\r 
 \r 
 
\t 
 \t 
 
\v 
 \v 
 
\b 
 \b (but only inside a character class declaration).
\b (但仅在字符类声明中使用) 
 
\cX 
 An ASCII escape sequence - the character whose code point is X % 32
一个ASCII转义序列 - 字符码点为 X % 32 
 
\xdd 
 A hexadecimal escape sequence - matches the single character whose code point is 0xdd.
一个十六进制转义序列 - 匹配码点为0xdd的单个字符。 
 
\x{dddd} 
 A hexadecimal escape sequence - matches the single character whose code point is 0xdddd.
一个十六进制转义序列 - 匹配码点为0xdddd的单个字符。 
 
\0ddd 
 An octal escape sequence - matches the single character whose code point is 0ddd.
八进制转义序列 - 匹配码点为0ddd的单个字符。 
 
\N{name} 
 Matches the single character which has the symbolic name name. For example \N{newline} matches the single character \n.
匹配 符号名 为 name 的单个字符。例如 \N{newline} 匹配单个字符 \n。 
 

"Single character" character classes: "单字符"字符类 
Any escaped character x, if x is the name of a character class shall match any character that is a member of that class, and any escaped character X, if x is the name of a character class, shall match any character not in that class.
任意被转义的字符 x/,如果 /x 是一个字符类的名称,会匹配这个字符类成员的任意字符。 任意被转义的字符 X, 如果 x 是一个字符类的名称,则会匹配不在这个类中的任意字符。

The following are supported by default:
下面是默认支持的:

Escape sequence 转义序列 
 Equivalent to 等价于 
 
\d 
 [[:digit:]] 
 
\l 
 [[:lower:]] 
 
\s 
 [[:space:]] 
 
\u 
 [[:upper:]] 
 
\w 
 [[:word:]] 
 
\h 
 Horizontal whitespace
水平空白 
 
\v 
 Vertical whitespace
垂直空白 
 
\D 
 [^[:digit:]] 
 
\L 
 [^[:lower:]] 
 
\S 
 [^[:space:]] 
 
\U 
 [^[:upper:]] 
 
\W 
 [^[:word:]] 
 
\H 
 Not Horizontal whitespace
非水平空白 
 
\V 
 Not Vertical whitespace
非垂直空白 
 

Character Properties 字符属性 
The character property names in the following table are all equivalent to the names used in character classes.
下面表格中的字符属性名称都等价于 在字符类中使用的名字。

Form 形式 
 Description 说明 
 Equivalent character set form 等价的字符集形式 
 
\pX 
 Matches any character that has the property X.
匹配任意具有属性X的字符。 
 [[:X:]] 
 
\p{Name} 
 Matches any character that has the property Name.
匹配任意具有属性Name的字符。 
 [[:Name:]] 
 
\PX 
 Matches any character that does not have the property X.
匹配任意不具有属性X的字符。 
 [^[:X:]] 
 
\P{Name} 
 Matches any character that does not have the property Name.
匹配任意不具有属性Name的字符。 
 [^[:Name:]] 
 

For example \pd matches any "digit" character, as does \p{digit}.
例如 \pd 匹配任意的"数字"字符,和 \p{digit} 作用是一样的。

Word Boundaries 单词边界 
The following escape sequences match the boundaries of words:
下面的转义序列匹配单词的边界:

\< Matches the start of a word.
\< 匹配单词的起点。

\> Matches the end of a word.
\> 匹配单词的终点。

\b Matches a word boundary (the start or end of a word).
\b 匹配单词的边界(起点或终点)。

\B Matches only when not at a word boundary.
\B 只有不在单词边界时才匹配。

Buffer boundaries 缓冲区边界 
The following match only at buffer boundaries: a "buffer" in this context is the whole of the input text that is being matched against (note that ^ and $ may match embedded newlines within the text).
下面的转义序列匹配缓冲区边界:这里的"缓冲区"指用于匹配的全部输入文本(注意,^和$可以用于匹配文本中的行)。

\` Matches at the start of a buffer only.
\` 匹配缓冲区的起点。

\' Matches at the end of a buffer only.
\' 匹配缓冲区的终点

\A Matches at the start of a buffer only (the same as \`).
\A 匹配缓冲区的起点(同 \` 一样)。

\z Matches at the end of a buffer only (the same as \').
\z 匹配缓冲区的终点(同 \' 一样)。

\Z Matches an optional sequence of newlines at the end of a buffer: equivalent to the regular expression \n*\z
\Z 匹配缓冲区结尾可能存在的空行:等价于正则表达式 \n*\z

Continuation Escape 持续转义 
The sequence \G matches only at the end of the last match found, or at the start of the text being matched if no previous match was found. This escape useful if you're iterating over the matches contained within a text, and you want each subsequence match to start where the last one ended.
序列 \G 只在上次匹配结尾或匹配文本的起点(如果前面没有匹配)。 当你要迭代文本中所有的匹配,并且每个子序列都从上一次结束时开始匹配的话,这个转义是很有用的。

Quoting escape 引用转义 
The escape sequence \Q begins a "quoted sequence": all the subsequent characters are treated as literals, until either the end of the regular expression or \E is found. For example the expression: \Q\*+\Ea+ would match either of:
转义序列 \Q 开始一个"被引用序列":所有后面的字符都被当作字面对待,除非正则表达式结束或碰到 \E。 例如,表达式:\Q\*+\Ea+ 可以匹配如下:

\*+a
\*+aaa

Unicode escapes Unicode转义 
\C Matches a single code point: in Boost regex this has exactly the same effect as a "." operator.
\C 匹配一个单一的码表:在 Boost.Regex 中这和"."操作符的作用是完全相同的。

\X Matches a combining character sequence: that is any non-combining character followed by a sequence of zero or more combining characters.
\X 匹配一个组合字符序列:任意非组合字符跟上一个零或多个组合字符的序列。

Matching Line Endings 匹配行末符 
The escape sequence \R matches any line ending character sequence, specifically it is identical to the expression (?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}]).
转义序列 \R 匹配任何行末符序列,特别地,它等同于表达式 (?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}]).

Keeping back some text 回退一些文本 
\K Resets the start location of $0 to the current text position: in other words everything to the left of \K is "kept back" and does not form part of the regular expression match. $` is updated accordingly.
\K 将$0的开始位置重置为当前文本位置:换言之,\K 左边的所有东西被"退回"且不作为该正则表达式的匹配部分。 $` 也被相应更新。

For example foo\Kbar matched against the text "foobar" would return the match "bar" for $0 and "foo" for $`. This can be used to simulate variable width lookbehind assertions.
例如,foo\Kbar 用于匹配文本 "foobar" 时,将对 $0 返回匹配 "bar",对 $` 返回 "foo"。 这可以被用于模拟可变宽度的后向环视断言。

Any other escape 其它转义 
Any other escape sequence matches the character that is escaped, for example \@ matches a literal '@'.
其它转义序列匹配被转义的字符,例如 \@匹配 '@'。

Perl Extended Patterns Perl扩展模式 
Perl-specific extensions to the regular expression syntax all start with (?.
Perl扩展的正则表达式语法都以 (? 开始。

Named Subexpressions 命名子表达式 
You can create a named subexpression using:
你可以如下创建一个命名子表达式:

(?<NAME>expression)

Which can be then be refered to by the name NAME. Alternatively you can delimit the name using 'NAME' as in:
然后,该表达式即可通过名字 NAME 来引用。或者你也可以用 'NAME' 来分隔该名字,如:

(?'NAME'expression)

These named subexpressions can be refered to in a backreference using either \g{NAME} or \k<NAME> and can also be refered to by name in a Perl format string for search and replace operations, or in the match_results member functions.
这些命名子表达式可以用 \g{NAME} 或 \k<NAME> 来后向引用,也可以在一个 Perl 格式化串或在 match_results 成员函数中通过名字来引用以进行搜索和替换的操作。

Comments 注释 
(?# ... ) is treated as a comment, it's contents are ignored.
(?# ... ) 作为注释,里面的内容被忽略。

Modifiers 修饰 
(?imsx-imsx ... ) alters which of the perl modifiers are in effect within the pattern, changes take effect from the point that the block is first seen and extend to any enclosing ). Letters before a '-' turn that perl modifier on, letters afterward, turn it off.
(?imsx-imsx ... ) 改变在模式中perl修饰符是否起作用,从块第一次被遇到的点开始起作用,直到遇到结束 =)=。 '-' 之前的字母打开perl修饰符,后面的字母关闭。

(?imsx-imsx:pattern) applies the specified modifiers to pattern only.
(?imsx-imsx:pattern) 只将特定的修饰符应用于指定的模式。

Non-marking groups 非标记分组 
(?:pattern) lexically groups pattern, without generating an additional sub-expression.
(?:pattern) 进行字面分组,但不产生额外的子表达式。

Branch reset 分支重置 
(?|pattern) resets the subexpression count at the start of each "|" alternative within pattern.
(?|pattern) 在 pattern 内的每个 "|" 选择符开始重置子表达式计数。

The sub-expression count following this construct is that of whichever branch had the largest number of sub-expressions. This construct is useful when you want to capture one of a number of alternative matches in a single sub-expression index. 
该结构之后的子表达式计数由具有最大子表达式数量的分支决定。当你想用单个子表达式索引捕获多个可选匹配之一的时候,可使用该结构。

In the following example the index of each sub-expression is shown below the expression:
在以下例子中,每个子表达式的索引在表达式之下给出:

# before  ---------------branch-reset----------- after        
/ ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
# 1            2         2  3        2     3     4

Lookahead 前向匹配 
(?=pattern) consumes zero characters, only if pattern matches.
(?=pattern) 当模式匹配时成功,但不消耗字符。

(?!pattern) consumes zero characters, only if pattern does not match.
(?!pattern) 当模式不匹配时成功,但不消耗字符。

Lookahead is typically used to create the logical AND of two regular expressions, for example if a password must contain a lower case letter, an upper case letter, a punctuation symbol, and be at least 6 characters long, then the expression:
前向匹配的典型用法是创建两个正则表达式的逻辑与,例如一个密码必须包含一个小写字符, 一个大写字符,一个标点符号,长度至少6个字符,那么表达式是:

(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}

could be used to validate the password.
能被用来验证密码。

Lookbehind 后向匹配 
(?<=pattern) consumes zero characters, only if pattern could be matched against the characters preceding the current position (pattern must be of fixed length).
(?<=pattern) 只有当模式能够被当前位置之前的字符匹配时才成功(模式必须是固定长度),但并不消耗字符。

(?<!pattern) consumes zero characters, only if pattern could not be matched against the characters preceding the current position (pattern must be of fixed length).
(?<!pattern) 只有当模式不能够被当前位置之前的字符匹配时才成功(模式必须是固定长度),但并不消耗字符。

Independent sub-expressions 独立子表达式 
(?>pattern) pattern is matched independently of the surrounding patterns, the expression will never backtrack into pattern. Independent sub-expressions are typically used to improve performance; only the best possible match for pattern will be considered, if this doesn't allow the expression as a whole to match then no match is found at all.
(?>pattern) pattern 独立于外围模式地被匹配,表达式决不会被回溯到 pattern 中。 独立子表达式通常用于改善性能;只有最好可能的匹配才会被考虑,如果表达式不能作为整体被匹配,那就没有匹配。

Recursive Expressions 递归表达式 
(?N) (?-N) (?+N) (?R) (?0)

(?R) and (?0) recurse to the start of the entire pattern.
(?R) and (?0) 递归至整个模板的开始。

(?N) executes sub-expression N recursively, for example (?2) will recurse to sub-expression 2.
(?N) 重复执行子表达式/N/,例如 (?2) 将递归至子表达式2。

(?-N) and (?+N) are relative recursions, so for example (?-1) recurses to the last sub-expression to be declared, and (?+1) recurses to the next sub-expression to be declared.
(?-N) 和 (?+N) 是相对递归,因此如 (?-1) 将递归至被声明的最后一个子表达式,而 (?+1) 则递归至被声明的下一个子表达式。

Conditional Expressions 条件表达式 
(?(condition)yes-pattern|no-pattern) attempts to match yes-pattern if the condition is true, otherwise attempts to match no-pattern.
(?(condition)yes-pattern|no-pattern) 当 condition 成立时试图匹配 /yes-pattern/,否则试图匹配 /no-pattern/。

(?(condition)yes-pattern) attempts to match yes-pattern if the condition is true, otherwise fails.
(?(condition)yes-pattern) 当 condition 成立时试图匹配 /yes-pattern/,否则直接失败。

condition may be either: a forward lookahead assert, the index of a marked sub-expression (the condition becomes true if the sub-expression has been matched), or an index of a recursion (the condition become true if we are executing directly inside the specified recursion).
condition 可以是一个前向匹配断言, 或者是一个标记子表达式序号(当子表达式匹配时条件为真), 或是一个递归的索引(当我们在指定递归中直接执行时条件为真)。

Here is a summary of the possible predicates:
以下是可能的谓词:

(?(?=assert)yes-pattern|no-pattern) Executes yes-pattern if the forward look-ahead assert matches, otherwise executes no-pattern.
(?(?=assert)yes-pattern|no-pattern) 如果前向匹配成功则执行 /yes-pattern/,否则执行 /no-pattern/。 
(?(?!assert)yes-pattern|no-pattern) Executes yes-pattern if the forward look-ahead assert does not match, otherwise executes no-pattern.
(?(?!assert)yes-pattern|no-pattern) 如果前向匹配不成功则执行 /yes-pattern/,否则执行 /no-pattern/。 
(?(R)yes-pattern|no-pattern) Executes yes-pattern if we are executing inside a recursion, otherwise executes no-pattern.
(?(R)yes-pattern|no-pattern) 如果是在某个递归的内部执行,则执行 /yes-pattern/,否则执行 /no-pattern/。 
(?(RN)yes-pattern|no-pattern) Executes yes-pattern if we are executing inside a recursion to sub-expression N, otherwise executes no-pattern.
(?(RN)yes-pattern|no-pattern) 如果是在某个到子表达式/N/的递归内部执行,则执行 /yes-pattern/,否则执行 /no-pattern/。 
(?(DEFINE)never-exectuted-pattern) Defines a block of code that is never executed and matches no characters: this is usually used to define one or more named sub-expressions which are refered to from elsewhere in the pattern.
(?(DEFINE)never-exectuted-pattern) 定义一个从不执行也不匹配任何字符的代码块: 通常用于定义一个或多个从模板其它地方引用的命名子表达式。 
Operator precedence 操作符优先级 
The order of precedence for of operators is as follows:
操作符优先级如下:

Collation-related bracket symbols [==] [::] [..]
对照相关的方括号字符 [==] [::] [..] 
Escaped characters \
转义字符 \ 
Character set (bracket expression) []
字符集(方括号表达式) [] 
Grouping ()
分组 () 
Single-character-ERE duplication * + ? {m,n}
单字符-ERE重复 * + ? {m,n} 
Concatenation 
连接 
Anchoring ^$
锚点 ^$ 
Alternation |
选择 | 
What gets matched 什么被匹配 
If you view the regular expression as a directed (possibly cyclic) graph, then the best match found is the first match found by a depth-first-search performed on that graph, while matching the input text.
如果你将正则表达式当作有向(可能循环)图,那么最优匹配就是在匹配输入文本过程中按照深度优先搜索找到的第一个匹配。

Alternatively:
或者说:

The best match found is the leftmost match, with individual elements matched as follows;
或者说最优匹配就是当单个元素如下匹配时找到的 最左匹配。

Construct 结构 
 What gets matched 被匹配的 
 
AtomA AtomB 
 Locates the best match for AtomA that has a following match for AtomB.
最佳匹配定位于 AtomA 后跟 AtomB 的匹配。 
 
Expression1 | Expression2 
 If Expresion1 can be matched then returns that match, otherwise attempts to match Expression2.
如果 Expresion1 能够匹配就返回这个匹配,否则试图匹配 /Expression2/。 
 
S{N} 
 Matches S repeated exactly N times.
匹配/S/精确重复/N/次。 
 
S{N,M} 
 Matches S repeated between N and M times, and as many times as possible.
匹配/S/重复/N/到/M/次之间,并尽可能多地重复。 
 
S{N,M}? 
 Matches S repeated between N and M times, and as few times as possible.
匹配/S/重复/N/到/M/次之间,并尽可能少地重复。 
 
S?, S*, S+ 
 The same as S{0,1}, S{0,UINT_MAX}, S{1,UINT_MAX} respectively.
分别等同于 S{0,`}, S{0,UINT_MAX}, S{1,UINT_MAX} 
 
S??, S*?, S+? 
 The same as S{0,1}?, S{0,UINT_MAX}?, S{1,UINT_MAX}? respectively.
分别等同于 S{0,1}?, S{0,UINT_MAX}?, S{1,UINT_MAX}? 
 
(?>S) 
 Matches the best match for S, and only that.
匹配且只匹配/S/的最佳匹配。 
 
(?=S), (?<=S) 
 Matches only the best match for S (this is only visible if there are capturing parenthesis within S).
只匹配/S/的最佳匹配(只有当/S/中有捕获用括号时才可见)。 
 
(?!S), (?<!S) 
 Considers only whether a match for S exists or not.
只有当/S/匹配或不匹配时才继续考虑。 
 
(?(condition)yes-pattern | no-pattern) 
 If condition is true, then only yes-pattern is considered, otherwise only no-pattern is considered.
如果条件为真,只有 yes-pattern 被考虑,否则只有 no-pattern 被考虑。 
 

Variations 变体 
The options normal, ECMAScript, JavaScript and JScript are all synonyms for perl.
选项 =normal=,=ECMAScript=,JavaScript 和 JScript 都是 perl 的同义词。

Options 选项 
There are a variety of flags that may be combined with the perl option when constructing the regular expression, in particular note that the newline_alt option alters the syntax, while the collate, nosubs and icase options modify how the case and locale sensitivity are to be applied.
构造正则表达式时有 一系列标签 可以和 perl 选项组合使用, 尤其注意的是 newline_alt 选项更改语法,而 =collate=、nosubs 和 icase 选项影响大小写和locale敏感内容如何被应用。

Pattern Modifiers 模式修饰符 
The perl smix modifiers can either be applied using a (?smix-smix) prefix to the regular expression, or with one of the regex-compile time flags no_mod_m, mod_x, mod_s, and no_mod_s.
perl的 smix 修饰符可以通过往正则表达式中添加 (?smix-smix) 前缀,或使用 regex编译时标签 nno_mod_m, mod_x, mod_s 和 no_mod_s 之一。

References 参考手册 
Perl 5.8.

 Copyright ? 1998 -2007 John Maddock
Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) 
 

--------------------------------------------------------------------------------

 http://www.cppprog.com/boost_doc/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

原创粉丝点击