Regex - Regular Expression Basic Syntax Reference

Source: Internet; Published by: 数据库详细设计文档; Edited by: 程序博客网; Time: 2024/05/14 21:27

Characters

Character: Any character except [\^$.|?*+()
Description: All characters except the listed special characters match a single instance of themselves. { and } are literal characters, unless they're part of a valid regular expression token (e.g. the {n} quantifier).
Example: a matches a

Character: \ (backslash) followed by any of [\^$.|?*+(){}
Description: A backslash escapes special characters to suppress their special meaning.
Example: \+ matches +

Character: \Q...\E
Description: Matches the characters between \Q and \E literally, suppressing the meaning of special characters.
Example: \Q+-*/\E matches +-*/

Character: \xFF where FF are 2 hexadecimal digits
Description: Matches the character with the specified ASCII/ANSI value, which depends on the code page used. Can be used in character classes.
Example: \xA9 matches © when using the Latin-1 code page.

Character: \n, \r and \t
Description: Match an LF character, CR character and a tab character respectively. Can be used in character classes.
Example: \r\n matches a DOS/Windows CRLF line break.

Character: \a, \e, \f and \v
Description: Match a bell character (\x07), escape character (\x1B), form feed (\x0C) and vertical tab (\x0B) respectively. Can be used in character classes.

Character: \cA through \cZ
Description: Match an ASCII character Control+A through Control+Z, equivalent to \x01 through \x1A. Can be used in character classes.
Example: \cM\cJ matches a DOS/Windows CRLF line break.

Character Classes or Character Sets [abc]

Character: [ (opening square bracket)
Description: Starts a character class. A character class matches a single character out of all the possibilities offered by the character class. Inside a character class, different rules apply. The rules in this section are only valid inside character classes. The rules outside this section are not valid in character classes, except for a few character escapes that are indicated with "can be used inside character classes".

Character: Any character except ^-]\
Description: All characters except the listed special characters add that character to the possible matches for the character class.
Example: [abc] matches a, b or c

Character: \ (backslash) followed by any of ^-]\
Description: A backslash escapes special characters to suppress their special meaning.
Example: [\^\]] matches ^ or ]

Character: - (hyphen) except immediately after the opening [
Description: Specifies a range of characters. (Specifies a hyphen if placed immediately after the opening [)
Example: [a-zA-Z0-9] matches any letter or digit

Character: ^ (caret) immediately after the opening [
Description: Negates the character class, causing it to match a single character not listed in the character class. (Specifies a caret if placed anywhere except after the opening [)
Example: [^a-d] matches x (any character except a, b, c or d)

Character: \d, \w and \s
Description: Shorthand character classes matching digits, word characters (letters, digits, and underscores), and whitespace (spaces, tabs, and line breaks). Can be used inside and outside character classes.
Example: [\d\s] matches a character that is a digit or whitespace

Character: \D, \W and \S
Description: Negated versions of the above. Should be used only outside character classes. (Can be used inside, but that is confusing.)
Example: \D matches a character that is not a digit

Character: [\b]
Description: Inside a character class, \b is a backspace character.
Example: [\b\t] matches a backspace or tab character

Dot

Character: . (dot)
Description: Matches any single character except line break characters \r and \n. Most regex flavors have an option to make the dot match line break characters too.
Example: . matches x or (almost) any other character

Anchors

Character: ^ (caret)
Description: Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the caret match after line breaks (i.e. at the start of a line in a file) as well.
Example: ^. matches a in abc\ndef. Also matches d in "multi-line" mode.

Character: $ (dollar)
Description: Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the dollar match before line breaks (i.e. at the end of a line in a file) as well. Also matches before the very last line break if the string ends with a line break.
Example: .$ matches f in abc\ndef. Also matches c in "multi-line" mode.

Character: \A
Description: Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character. Never matches after line breaks.
Example: \A. matches a in abc

Character: \Z
Description: Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Never matches before line breaks, except for the very last line break if the string ends with a line break.
Example: .\Z matches f in abc\ndef

Character: \z
Description: Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Never matches before line breaks.
Example: .\z matches f in abc\ndef

Word Boundaries

Character: \b
Description: Matches at the position between a word character (anything matched by \w) and a non-word character (anything matched by [^\w] or \W) as well as at the start and/or end of the string if the first and/or last characters in the string are word characters.
Example: .\b matches c in abc

Character: \B
Description: Matches at the position between two word characters (i.e. the position between \w\w) as well as at the position between two non-word characters (i.e. \W\W).
Example: \B.\B matches b in abc

Alternation

Character: | (pipe)
Description: Causes the regex engine to match either the part on the left side, or the part on the right side. Can be strung together into a series of options.
Example: abc|def|xyz matches abc, def or xyz

Character: | (pipe)
Description: The pipe has the lowest precedence of all operators. Use grouping to alternate only part of the regular expression.
Example: abc(def|xyz) matches abcdef or abcxyz

Quantifiers

Character: ? (question mark)
Description: Makes the preceding item optional. Greedy, so the optional item is included in the match if possible.
Example: abc? matches ab or abc

Character: ??
Description: Makes the preceding item optional. Lazy, so the optional item is excluded from the match if possible. This construct is often left out of documentation because of its limited use.
Example: abc?? matches ab or abc

Character: * (star)
Description: Repeats the previous item zero or more times. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is not matched at all.
Example: ".*" matches "def" "ghi" in abc "def" "ghi" jkl

Character: *? (lazy star)
Description: Repeats the previous item zero or more times. Lazy, so the engine first attempts to skip the previous item, before trying permutations with ever increasing matches of the preceding item.
Example: ".*?" matches "def" in abc "def" "ghi" jkl

Character: + (plus)
Description: Repeats the previous item once or more. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only once.
Example: ".+" matches "def" "ghi" in abc "def" "ghi" jkl

Character: +? (lazy plus)
Description: Repeats the previous item once or more. Lazy, so the engine first matches the previous item only once, before trying permutations with ever increasing matches of the preceding item.
Example: ".+?" matches "def" in abc "def" "ghi" jkl

Character: {n} where n is an integer >= 1
Description: Repeats the previous item exactly n times.
Example: a{3} matches aaa

Character: {n,m} where n >= 0 and m >= n
Description: Repeats the previous item between n and m times. Greedy, so repeating m times is tried before reducing the repetition to n times.
Example: a{2,4} matches aaaa, aaa or aa

Character: {n,m}? where n >= 0 and m >= n
Description: Repeats the previous item between n and m times. Lazy, so repeating n times is tried before increasing the repetition to m times.
Example: a{2,4}? matches aa, aaa or aaaa

Character: {n,} where n >= 0
Description: Repeats the previous item at least n times. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only n times.
Example: a{2,} matches aaaaa in aaaaa

Character: {n,}? where n >= 0
Description: Repeats the previous item n or more times. Lazy, so the engine first matches the previous item n times, before trying permutations with ever increasing matches of the preceding item.
Example: a{2,}? matches aa in aaaaa
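The greedy and lazy examples from the quantifier table can be checked with a short script. This is a minimal sketch that assumes Python's built-in re module as the regex flavor; the reference itself is flavor-neutral, and the variable names are only illustrative.

```python
import re

text = 'abc "def" "ghi" jkl'

# Greedy star: takes as much as possible, then backs off to the last quote.
greedy_star = re.search(r'".*"', text).group()
# Lazy star: expands only until the first closing quote is reached.
lazy_star = re.search(r'".*?"', text).group()

# Open-ended counts follow the same greedy/lazy rule.
greedy_count = re.search(r'a{2,}', 'aaaaa').group()
lazy_count = re.search(r'a{2,}?', 'aaaaa').group()

print(greedy_star)   # "def" "ghi"
print(lazy_star)     # "def"
print(greedy_count)  # aaaaa
print(lazy_count)    # aa
```

Other flavors should give the same results for these patterns, although not every construct in the tables (for example \Q...\E or \z) is available in every engine.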

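Ordinary characters

Character: \
Description: Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".

Character: .
Description: Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'.

Character: x|y
Description: Matches x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".

Character: [xyz]
Description: Character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".

Character: [^xyz]
Description: Negated character set. Matches any character that is not enclosed. For example, '[^abc]' matches the 'p' in "plain" (and likewise 'l', 'i' and 'n').

Character: [a-z]
Description: Character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter from 'a' to 'z'.

Character: [^a-z]
Description: Negated character range. Matches any character that is not in the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' to 'z'.

Character: \cx
Description: Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in A-Z or a-z. Otherwise, c is treated as a literal 'c' character.

Character: \d
Description: Matches a digit character. Equivalent to [0-9].

Character: \D
Description: Matches a non-digit character. Equivalent to [^0-9].

Character: \w
Description: Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.

Character: \W
Description: Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.

Character: \xn
Description: Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". ASCII codes can be used in regular expressions.

Character: \num
Description: Matches num, where num is a positive integer. A reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters.

Character: \n
Description: Identifies an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.

Character: \nm
Description: Identifies an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.

Character: \nml
Description: If n is an octal digit (0-3) and m and l are both octal digits (0-7), matches the octal escape value nml.

Character: \un
Description: Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).

Special characters

Character: $
Description: Matches the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$.

Character: ( )
Description: Marks the start and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).

Character: *
Description: Matches the preceding subexpression zero or more times. To match the * character, use \*.

Character: +
Description: Matches the preceding subexpression one or more times. To match the + character, use \+.

Character: .
Description: Matches any single character except the newline character \n. To match ., use \.

Character: [
Description: Marks the start of a bracket expression. To match [, use \[.

Character: ?
Description: Matches the preceding subexpression zero or one time, or marks a quantifier as non-greedy. To match the ? character, use \?.

Character: \
Description: Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n'. '\n' matches a newline character. The sequence '\\' matches "\", and '\(' matches "(".

Character: ^
Description: Matches the start of the input string, unless it is used inside a bracket expression, where it negates the character set. To match the ^ character itself, use \^.

Character: {
Description: Marks the start of a quantifier expression. To match {, use \{.

Character: |
Description: Indicates a choice between two items. To match |, use \|.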
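A few of the entries above (alternation with and without grouping, the '(.)\1' backreference, two-digit hex escapes, and the dot's handling of newlines) can be tried out directly. The sketch below assumes Python's re module rather than the JavaScript RegExp object the descriptions mention; the variable names are illustrative, and '[\s\S]' is shown as one common way to match any character including a newline.

```python
import re

# Alternation has the lowest precedence: 'z|food' means "z" OR "food".
ungrouped_z = re.fullmatch(r'z|food', 'z') is not None
ungrouped_zood = re.fullmatch(r'z|food', 'zood') is not None
# Grouping restricts the alternation to the first letter.
grouped_zood = re.fullmatch(r'(z|f)ood', 'zood') is not None
grouped_food = re.fullmatch(r'(z|f)ood', 'food') is not None

# Backreference: (.)\1 finds the first doubled character.
doubled = re.search(r'(.)\1', 'hello').group()

# Hex escapes take exactly two digits: \x041 is \x04 followed by "1".
hex_a = re.fullmatch(r'\x41', 'A') is not None
hex_then_digit = re.fullmatch(r'\x041', '\x04' + '1') is not None

# The dot skips newlines by default; a class such as [\s\S] does not.
dot_matches_newline = re.fullmatch(r'.', '\n') is not None
class_matches_newline = re.fullmatch(r'[\s\S]', '\n') is not None

print(ungrouped_z, ungrouped_zood)   # True False
print(grouped_zood, grouped_food)    # True True
print(doubled)                       # ll
print(hex_a, hex_then_digit)         # True True
print(dot_matches_newline, class_matches_newline)  # False True
```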

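Non-printing characters

Character: \cx
Meaning: Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in A-Z or a-z. Otherwise, c is treated as a literal 'c' character.

Character: \f
Meaning: Matches a form feed character. Equivalent to \x0c and \cL.

Character: \n
Meaning: Matches a newline character. Equivalent to \x0a and \cJ.

Character: \r
Meaning: Matches a carriage return character. Equivalent to \x0d and \cM.

Character: \s
Meaning: Matches any whitespace character, including space, tab, form feed and so on. Equivalent to [ \f\n\r\t\v].

Character: \S
Meaning: Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].

Character: \t
Meaning: Matches a tab character. Equivalent to \x09 and \cI.

Character: \v
Meaning: Matches a vertical tab character. Equivalent to \x0b and \cK.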
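The equivalences listed for the non-printing characters can be verified mechanically. The following sketch assumes Python's re module, which does not support the \cx notation, so only the \xNN forms are checked. Python's \s also matches further Unicode whitespace on str patterns, so the check only confirms that the six listed characters are covered.

```python
import re

# The six characters listed for \s: space, form feed, newline,
# carriage return, tab and vertical tab.
listed_whitespace = ' \f\n\r\t\v'

all_match_s = all(re.fullmatch(r'\s', ch) for ch in listed_whitespace)
none_match_S = not any(re.fullmatch(r'\S', ch) for ch in listed_whitespace)

# Hex escape in the pattern on the left, the character it should match on the right.
hex_equivalents = {
    r'\x0c': '\f',
    r'\x0a': '\n',
    r'\x0d': '\r',
    r'\x09': '\t',
    r'\x0b': '\v',
}
hex_ok = all(
    re.fullmatch(pattern, char) is not None
    for pattern, char in hex_equivalents.items()
)

print(all_match_s, none_match_S, hex_ok)  # True True True
```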

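Quantifiers

Character: *
Description: Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" as well as "zoo". * is equivalent to {0,}.

Character: +
Description: Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.

Character: ?
Description: Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or "does". ? is equivalent to {0,1}.

Character: {n}
Description: n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".

Character: {n,}
Description: n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.

Character: {n,m}
Description: m and n are non-negative integers, where m >= n. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers.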
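The quantifier examples above translate into a handful of searches. This is a sketch assuming Python's re module; first_match is a hypothetical helper introduced only for this illustration, and the test strings are built with repetition so the number of o's is explicit.

```python
import re

def first_match(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    found = re.search(pattern, text)
    return found.group() if found else None

five_o = 'f' + 'o' * 5 + 'd'   # "foooood"
six_o = 'f' + 'o' * 6 + 'd'    # "fooooood"

zo_star = first_match(r'zo*', 'z')          # zero o's is allowed
zo_plus = first_match(r'zo+', 'z')          # at least one o is required
does = first_match(r'do(es)?', 'does')      # greedy: the optional part is taken
bob = first_match(r'o{2}', 'Bob')           # only one o, so no match
food = first_match(r'o{2}', 'food')
at_least_two = first_match(r'o{2,}', five_o)
up_to_three = first_match(r'o{1,3}', six_o)

# Shorthand quantifiers and their brace forms find the same matches.
samples = ['z', 'zo', 'zoo', 'bozo']
star_same = all(re.findall(r'zo*', s) == re.findall(r'zo{0,}', s) for s in samples)
plus_same = all(re.findall(r'zo+', s) == re.findall(r'zo{1,}', s) for s in samples)
optional_same = all(re.findall(r'zo?', s) == re.findall(r'zo{0,1}', s) for s in samples)

print(zo_star, zo_plus, does, bob, food)   # z None does None oo
print(at_least_two, up_to_three)           # ooooo ooo
print(star_same, plus_same, optional_same) # True True True
```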

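Anchors

Character: ^
Description: Matches the start of the input string. If the RegExp object's Multiline property is set, ^ also matches the position after '\n' or '\r'.

Character: $
Description: Matches the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position before '\n' or '\r'.

Character: \b
Description: Matches a word boundary, that is, the position between a word character and a non-word character such as a space.

Character: \B
Description: Matches a non-word boundary.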
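The effect of multi-line mode on ^ and $, and the difference between \b and \B, can be seen in a short experiment. This sketch assumes Python's re module, where the re.MULTILINE flag plays the role of the Multiline property described above; the sample strings are made up for illustration.

```python
import re

two_lines = 'abc\ndef'

# Without multi-line mode, ^ and $ only anchor at the ends of the whole string.
first_chars = re.findall(r'^.', two_lines)
last_chars = re.findall(r'.$', two_lines)

# With multi-line mode, they also anchor after and before each line break.
first_chars_ml = re.findall(r'^.', two_lines, re.MULTILINE)
last_chars_ml = re.findall(r'.$', two_lines, re.MULTILINE)

# \b requires a word/non-word transition; \B requires that there is none.
whole_words = re.findall(r'\bcat\b', 'cat concat cat.')
inside_word = re.findall(r'\Bcat', 'cat concat')

print(first_chars, last_chars)        # ['a'] ['f']
print(first_chars_ml, last_chars_ml)  # ['a', 'd'] ['c', 'f']
print(whole_words)                    # ['cat', 'cat']
print(inside_word)                    # ['cat']
```

Here the two whole-word matches are the standalone "cat" at the start and the one before the period, while the \B match is the "cat" inside "concat".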

From: http://www.regular-expressions.info/reference.html and http://js8.in/473.html