Regular Expressions 2

Source: Internet. Editor: 程序博客网. Time: 2024/05/29 11:42


TABLE OF CONTENTS

  1. Creating a Regular Expression
  2. Writing a Regular Expression Pattern
    1. Using Simple Patterns
    2. Using Special Characters
    3. Using Parentheses
  3. Working with Regular Expressions
    1. Using Parenthesized Substring Matches
      1. Example 1
    2. Advanced Searching With Flags
  4. Examples
    1. Changing the Order in an Input String
    2. Using Special Characters to Verify Input

Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects. These patterns are used with the exec and test methods of RegExp, and with the match, replace, search, and split methods of String. This chapter describes JavaScript regular expressions.

Creating a Regular Expression

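You can create a regular expression in one of two ways:

  • Using a regular expression literal, as follows:
    var re = /ab+c/;

    Regular expression literals are compiled when the script is loaded. When the regular expression will remain constant, using a literal can improve performance.

  • Calling the constructor function of the RegExp object, as follows:
    var re = new RegExp("ab+c");

    Using the constructor function compiles the regular expression at run time. Use the constructor when you know the pattern will change, or when you do not know the pattern in advance and obtain it from another source, such as user input.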
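The two creation styles above can be compared side by side. This is a minimal sketch; the variable names, including `userInput`, are hypothetical and stand in for whatever your script provides.

```javascript
// Two ways to build the same pattern: a literal and the RegExp constructor.
const literalRe = /ab+c/;
const constructedRe = new RegExp("ab+c");

// Both objects carry the same pattern text and behave the same way.
const sameSource = literalRe.source === constructedRe.source;
const literalHit = literalRe.test("xxabbbcxx");
const constructedHit = constructedRe.test("xxabbbcxx");

// The constructor is handy when the pattern is only known at run time.
// `userInput` is a hypothetical value standing in for external data.
const userInput = "ab+c";
const dynamicRe = new RegExp(userInput);
const dynamicHit = dynamicRe.test("abc");

console.log(sameSource, literalHit, constructedHit, dynamicHit);
```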

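Writing a Regular Expression Pattern

A regular expression pattern is composed of simple characters, such as /abc/, or a combination of simple and special characters, such as /ab*c/ or /Chapter (\d+)\.\d*/. The last example includes parentheses, which are used as a memory device. The match made with this part of the pattern is remembered for later use, as described in Using Parenthesized Substring Matches.

Using Simple Patterns

Simple patterns are constructed of characters for which you want to find a direct match. For example, the pattern /abc/ matches character combinations in strings only when the characters 'abc' occur together and in exactly that order. Such a match succeeds in the strings "Hi, do you know your abc's?" and "The latest airplane designs evolved from slabcraft." In both cases the match is with the substring 'abc'. There is no match in the string "Grab crab" because it does not contain the substring 'abc'.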
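The three strings discussed above can be checked directly with test. The names `simpleRe`, `samples`, and `simpleResults` are illustrative only.

```javascript
// A simple pattern matches only the exact character sequence "abc".
const simpleRe = /abc/;

const samples = [
  "Hi, do you know your abc's?",
  "The latest airplane designs evolved from slabcraft.",
  "Grab crab",
];

// test() returns true when the pattern occurs anywhere in the string.
const simpleResults = samples.map((text) => simpleRe.test(text));

console.log(simpleResults);
```

The first two strings contain 'abc' as a substring; in "Grab crab" the 'ab' and the 'c' are separated by a space, so test reports no match.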

Using Special Characters

When the search for a match requires something more than a direct match, such as finding one or more b's, or finding white space, the pattern includes special characters. For example, the pattern /ab*c/ matches any character combination in which a single 'a' is followed by zero or more 'b's (* means 0 or more occurrences of the preceding item) and then immediately followed by 'c'. In the string "cbbabbbbcdebc", the pattern matches the substring 'abbbbc'.
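A short sketch of the /ab*c/ example, using String.prototype.match to see which substring is found and where. The variable names are illustrative.

```javascript
// 'a', then zero or more 'b', then 'c'.
const abStarC = /ab*c/;

// Without the g flag, match() returns the first match plus its index.
const starMatch = "cbbabbbbcdebc".match(abStarC);
const matchedText = starMatch[0];
const matchedAt = starMatch.index;

// Zero 'b's is also acceptable, so "ac" matches as a whole.
const zeroBs = "ac".match(abStarC)[0];

console.log(matchedText, matchedAt, zeroBs);
```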

The following table provides a complete list and description of the special characters that can be used in regular expressions.

Table 4.1 Special characters in regular expressions. Each entry lists the character, followed by its meaning.

\

Either of the following:

  • For characters that are usually treated literally, indicates that the next character is special and not to be interpreted literally. For example, /b/ matches the character 'b'. By placing a backslash in front of b, that is by using /\b/, the character becomes special and means "match a word boundary".
  • For characters that are usually treated specially, indicates that the next character is not special and should be interpreted literally. For example, * is a special character that means 0 or more occurrences of the preceding item should be matched; /a*/ means match 0 or more a's. To match * literally, precede it with a backslash; for example, /a\*/ matches 'a*'.

Also do not forget to escape \ itself when using the new RegExp("pattern") notation, since \ is also an escape character in strings.
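The escaping rules above are easy to get wrong with the constructor form, so here is a small sketch. All variable names are illustrative.

```javascript
// Escaping a special character makes it literal: /a\*/ matches "a*".
const literalStar = "a*b".match(/a\*/)[0];

// Unescaped, * is a quantifier: /a*/ matches the run of a's instead.
const quantifiedStar = "aaa*".match(/a*/)[0];

// In a string passed to new RegExp, the backslash must be doubled,
// because the string literal consumes one level of escaping first.
const digitsLiteral = /\d+/;
const digitsCtor = new RegExp("\\d+");
const sameDigitsSource = digitsLiteral.source === digitsCtor.source;

// With a single backslash the string is just "d+", so the pattern
// matches the letter d rather than digits.
const unescapedCtor = new RegExp("\d+");
const unescapedSource = unescapedCtor.source;

console.log(literalStar, quantifiedStar, sameDigitsSource, unescapedSource);
```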
^

Matches the beginning of input. If the multiline flag is set, also matches immediately after a line break character.

For example, /^A/ does not match the 'A' in "an A", but does match the 'A' in "An E".

This character has a different meaning when it appears as the first character in a character set pattern.

For example, /[^a-z\s]/ matches the '3' in "my 3 sisters".

$

Matches the end of input. If the multiline flag is set, also matches immediately before a line break character.

For example, /t$/ does not match the 't' in "eater", but does match it in "eat".
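The ^ and $ anchors, with and without the multiline flag (m), can be sketched as follows. The sample strings containing line breaks are assumptions added for illustration.

```javascript
// ^ anchors to the start of input, $ to the end.
const startNoMatch = /^A/.test("an A");
const startMatch = /^A/.test("An E");
const endNoMatch = /t$/.test("eater");
const endMatch = /t$/.test("eat");

// With the m flag, ^ and $ also match next to line breaks.
const twoLines = "first\nAnd second";
const plainStart = /^A/.test(twoLines);
const multilineStart = /^A/m.test(twoLines);

const eatLines = "eat\nmore";
const plainEnd = /t$/.test(eatLines);
const multilineEnd = /t$/m.test(eatLines);

// Inside brackets, a leading ^ negates the set instead of anchoring.
const negatedSet = "my 3 sisters".match(/[^a-z\s]/)[0];

console.log(startNoMatch, startMatch, endNoMatch, endMatch);
console.log(plainStart, multilineStart, plainEnd, multilineEnd, negatedSet);
```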

*

Matches the preceding character 0 or more times. Equivalent to {0,}.

For example, /bo*/ matches 'boooo' in "A ghost booooed" and 'b' in "A bird warbled", but nothing in "A goat grunted".

+

Matches the preceding character 1 or more times. Equivalent to {1,}.

For example, /a+/ matches the 'a' in "candy" and all the a's in "caaaaaaandy".
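The * and + examples can be verified with match. In this sketch the long run of a's is built with repeat so the expected length is explicit; that construction is an illustrative choice, not part of the original text.

```javascript
// * allows zero or more of the preceding character.
const ghost = "A ghost booooed".match(/bo*/)[0];
const bird = "A bird warbled".match(/bo*/)[0];
const goat = "A goat grunted".match(/bo*/); // no 'b' at all

// + requires at least one of the preceding character.
const candy = "candy".match(/a+/)[0];
const manyAs = "a".repeat(7);
const longCandy = ("c" + manyAs + "ndy").match(/a+/)[0];

console.log(ghost, bird, goat, candy, longCandy);
```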

?

Matches the preceding character 0 or 1 time. Equivalent to {0,1}.

For example, /e?le?/ matches the 'el' in "angel", the 'le' in "angle", and the 'l' in "oslo".

If used immediately after any of the quantifiers *, +, ?, or {}, makes the quantifier non-greedy (matching the minimum number of times), as opposed to the default, which is greedy (matching the maximum number of times). For example, applying /\d+/ to "123abc" matches "123", whereas /\d+?/ matches only "1".

Also used in lookahead assertions, described under x(?=y) and x(?!y) in this table.
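Both roles of ?, optional item and non-greedy modifier, can be seen in a few lines. Variable names are illustrative.

```javascript
// ? as a quantifier: the preceding character is optional.
const angel = "angel".match(/e?le?/)[0];
const angle = "angle".match(/e?le?/)[0];
const oslo = "oslo".match(/e?le?/)[0];

// ? after another quantifier: switch from greedy to non-greedy.
const greedyDigits = "123abc".match(/\d+/)[0];
const lazyDigits = "123abc".match(/\d+?/)[0];

console.log(angel, angle, oslo, greedyDigits, lazyDigits);
```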

.

(The decimal point) matches any single character except the newline character.

For example, /.n/ matches 'an' and 'on' in "nay, an apple is on the tree", but not 'nay'.

(x)

Matches 'x' and remembers the match. These are called capturing parentheses.

For example, /(foo)/ matches and remembers 'foo' in "foo bar." The matched substring can be recalled from the resulting array's elements [1], ..., [n].

(?:x)

Matches 'x' but does not remember the match. These are called non-capturing parentheses. The matched substring cannot be recalled from the resulting array's elements [1], ..., [n].
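The difference between capturing and non-capturing parentheses shows up in the array returned by exec. The two patterns below are illustrative variations on the /(foo)/ example.

```javascript
// Capturing groups: each (…) adds an element to the result array.
const captured = /(foo) (bar)/.exec("foo bar baz");
const wholeMatch = captured[0];
const firstGroup = captured[1];
const secondGroup = captured[2];

// Non-capturing group: (?:…) groups without remembering, so the
// only numbered element is the one from (bar).
const nonCaptured = /(?:foo) (bar)/.exec("foo bar baz");
const onlyGroup = nonCaptured[1];
const resultLength = nonCaptured.length;

console.log(wholeMatch, firstGroup, secondGroup, onlyGroup, resultLength);
```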

x(?=y)

Matches 'x' only if 'x' is followed by 'y'. This is called a lookahead.

For example, /Jack(?=Sprat)/ matches 'Jack' only if it is followed by 'Sprat'. /Jack(?=Sprat|Frost)/ matches 'Jack' only if it is followed by 'Sprat' or 'Frost'. However, neither 'Sprat' nor 'Frost' is part of the match results.

x(?!y)

Matches 'x' only if 'x' is not followed by 'y'. This is called a negated lookahead.

For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point. The regular expression /\d+(?!\.)/.exec("3.141") matches '141' but not '3.141'.
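Both lookahead forms can be exercised with the examples from the table. The input strings "JackSprat" and "JackFrost" are assumptions chosen to fit the patterns.

```javascript
// Positive lookahead: 'Jack' must be followed by 'Sprat',
// but 'Sprat' is not part of the match itself.
const jackSprat = /Jack(?=Sprat)/.exec("JackSprat")[0];
const jackFrostFails = /Jack(?=Sprat)/.test("JackFrost");
const jackEither = /Jack(?=Sprat|Frost)/.exec("JackFrost")[0];

// Negated lookahead: digits that are not followed by a decimal point.
const afterPoint = /\d+(?!\.)/.exec("3.141")[0];

console.log(jackSprat, jackFrostFails, jackEither, afterPoint);
```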

x|y

Matches either 'x' or 'y'.

For example, /green|red/ matches 'green' in "green apple" and 'red' in "red apple."

{n}

Where n is a positive integer. Matches exactly n occurrences of the preceding character.

For example, /a{2}/ doesn't match the 'a' in "candy", but it matches all of the a's in "caandy" and the first two a's in "caaandy".

{n,m}

Where n and m are non-negative integers and n is not greater than m. Matches at least n and at most m occurrences of the preceding character. If m is omitted, as in {n,}, there is no upper limit.

For example, /a{1,3}/ matches nothing in "cndy", the 'a' in "candy", both a's in "caandy", and the first three a's in "caaaaaaandy". Notice that when matching "caaaaaaandy", the match is "aaa", even though the original string has more a's in it.
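A brief sketch of the counted quantifiers. The long input is built with repeat so the number of a's is explicit; the {2,} line illustrates the open-ended form and is an addition to the examples above.

```javascript
// {n}: exactly n occurrences.
const exactNone = "candy".match(/a{2}/);
const exactTwo = "caandy".match(/a{2}/)[0];
const exactFirstTwo = "caaandy".match(/a{2}/)[0];

// {n,m}: between n and m occurrences, as many as possible.
const longInput = "c" + "a".repeat(7) + "ndy";
const rangeNone = "cndy".match(/a{1,3}/);
const rangeOne = "candy".match(/a{1,3}/)[0];
const rangeCapped = longInput.match(/a{1,3}/)[0];

// {n,}: at least n occurrences, with no upper limit.
const openEnded = longInput.match(/a{2,}/)[0];

console.log(exactTwo, exactFirstTwo, rangeOne, rangeCapped, openEnded);
```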

[xyz]

A character set. Matches any one of the enclosed characters. You can specify a range of characters by using a hyphen. Special characters (such as the dot (.) and the asterisk (*)) do not have any special meaning inside a character set. They need not be escaped. Escape sequences also work.

For example, [abcd] is the same as [a-d]. They match the 'b' in "brisket" and the 'c' in "city". /[a-z.]+/ and /[\w.]+/ both match everything in "test.i.ng".

[^xyz]

A negated or complemented character set. That is, it matches anything that is not enclosed in the brackets. You can specify a range of characters by using a hyphen. Everything that works in the normal character set also works here.

For example, [^abc] is the same as [^a-c]. They initially match 'r' in "brisket" and 'h' in "chop."
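The character set and negated character set examples can be checked as follows. Variable names are illustrative.

```javascript
// [abcd] and [a-d] describe the same set.
const briskets = "brisket".match(/[a-d]/)[0];
const city = "city".match(/[abcd]/)[0];

// Inside a set, the dot is literal; \w still works as an escape.
const dotted1 = "test.i.ng".match(/[a-z.]+/)[0];
const dotted2 = "test.i.ng".match(/[\w.]+/)[0];

// A leading ^ negates the set: match anything not listed.
const notAbc1 = "brisket".match(/[^a-c]/)[0];
const notAbc2 = "chop".match(/[^abc]/)[0];

console.log(briskets, city, dotted1, dotted2, notAbc1, notAbc2);
```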

[\b]

Matches a backspace (U+0008). (Not to be confused with \b.)

\b

Matches a word boundary. A word boundary is a position where a word character is not followed or not preceded by another word character. Note that a matched word boundary is not included in the match. In other words, the length of a matched word boundary is zero. (Not to be confused with [\b].)

Examples:
/\bm/ matches the 'm' in "moon" ;
/oo\b/ does not match the 'oo' in "moon", because 'oo' is followed by 'n' which is a word character;
/oon\b/ matches the 'oon' in "moon", because 'oon' is the end of the string, thus not followed by a word character;
/\w\b\w/ will never match anything, because a word character can never be followed by both a non-word and a word character.

\B

Matches a non-word boundary. This matches a position where the previous and next character are of the same type: Either both must be words, or both must be non-words. The beginning and end of a string are considered non-words.

For example, /\B../ matches 'oo' in "noonday", and /y\B./ matches 'ye' in "possibly yesterday."
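The \b and \B examples can be run as written. The sentence passed to the /\w\b\w/ check is an arbitrary sample; any input gives the same result.

```javascript
// \b: zero-width position between a word and a non-word character.
const startM = "moon".match(/\bm/)[0];
const ooBoundary = /oo\b/.test("moon");       // 'oo' is followed by 'n'
const oonBoundary = "moon".match(/oon\b/)[0]; // end of string is a boundary
const impossible = /\w\b\w/.test("any words at all");

// \B: a position that is not a word boundary.
const nonBoundary = "noonday".match(/\B../)[0];
const yNonBoundary = "possibly yesterday".match(/y\B./)[0];

console.log(startM, ooBoundary, oonBoundary, impossible);
console.log(nonBoundary, yNonBoundary);
```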

\cX

Where X is a character ranging from A to Z. Matches a control character in a string.

For example, /\cM/ matches control-M (U+000D) in a string.

\d

Matches a digit character. Equivalent to [0-9].

For example, /\d/ or /[0-9]/ matches '2' in "B2 is the suite number."

\D

Matches any non-digit character. Equivalent to [^0-9].

For example, /\D/ or /[^0-9]/ matches 'B' in "B2 is the suite number."

\f

Matches a form feed (U+000C).

\n

Matches a line feed (U+000A).

\r

Matches a carriage return (U+000D).

\s

Matches a single white space character, including space, tab, form feed, and line feed. Equivalent to [ \f\n\r\t\v\u00A0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000].

For example, /\s\w*/ matches ' bar' in "foo bar."

\S

Matches a single character other than white space. Equivalent to [^ \f\n\r\t\v\u00A0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000].

For example, /\S\w*/ matches 'foo' in "foo bar."

\t

Matches a tab (U+0009).

\v

Matches a vertical tab (U+000B).

\w

Matches any alphanumeric character including the underscore. Equivalent to [A-Za-z0-9_].

For example, /\w/ matches 'a' in "apple," '5' in "$5.28," and '3' in "3D."

\W

Matches any non-word character. Equivalent to [^A-Za-z0-9_].

For example, /\W/ or /[^A-Za-z0-9_]/ matches '%' in "50%."
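The shorthand classes \d, \D, \s, \S, \w, and \W can be tried against the sample strings from the table. Variable names are illustrative.

```javascript
const suite = "B2 is the suite number.";

// \d is a digit, \D is anything else.
const firstDigit = suite.match(/\d/)[0];
const firstNonDigit = suite.match(/\D/)[0];

// \s is white space, \S is anything else.
const spaceThenWord = "foo bar".match(/\s\w*/)[0];
const nonSpaceThenWord = "foo bar".match(/\S\w*/)[0];

// \w is a letter, digit, or underscore; \W is anything else.
const firstWordChar = "$5.28".match(/\w/)[0];
const firstNonWordChar = "50%".match(/\W/)[0];

console.log(firstDigit, firstNonDigit, firstWordChar, firstNonWordChar);
console.log(JSON.stringify(spaceThenWord), nonSpaceThenWord);
```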

\n

Where n is a positive integer. A back reference to the last substring matching the n parenthetical in the regular expression (counting left parentheses).

For example, /apple(,)\sorange\1/ matches 'apple, orange,' in "apple, orange, cherry, peach."
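A back reference repeats whatever text the numbered group actually matched. The sketch below runs the table's example and adds a second, hypothetical pattern that finds a doubled word.

```javascript
// \1 must match the same text that group 1 captured (here a comma).
const fruit = /apple(,)\sorange\1/.exec("apple, orange, cherry, peach.");
const fruitMatch = fruit[0];
const fruitGroup = fruit[1];

// Illustrative use: find a word immediately repeated after a space.
const doubled = /\b(\w+) \1\b/.exec("this is is a test");
const doubledMatch = doubled[0];
const doubledWord = doubled[1];

console.log(fruitMatch, fruitGroup, doubledMatch, doubledWord);
```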

\0

Matches a NULL (U+0000) character. Do not follow this with another digit, because \0<digits> is an octal escape sequence.

\xhh

Matches the character with the code hh (two hexadecimal digits).

\uhhhh

Matches the character with the code hhhh (four hexadecimal digits).