Regular Expression Patterns

Source: Internet. Published by: wish for mac. Editor: 程序博客网. Time: 2024/05/22 10:37

The patterns used in RegExp can be very simple or very complicated, depending on what you're trying to accomplish. Matching a simple string like "Hello World!" is no harder than actually writing the string, but if you want to match an e-mail address or an HTML tag, you might end up with a very complicated pattern that uses most of the syntax presented in the table below.
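As a minimal sketch of the simple case described above, a literal pattern can be tested against a string like this. The variable names are illustrative only.

```javascript
// A literal pattern: every character stands for itself.
const greeting = /Hello World!/;

// test() returns true if the pattern occurs anywhere in the input.
const found = greeting.test("She said: Hello World!");

// Patterns are case-sensitive unless the i flag is used.
const missing = greeting.test("hello world!");

console.log(found, missing);
```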

Escaping
Pattern: \
Escapes special characters to their literal meaning and literal characters to a special meaning.

E.g.: /\(s\)/ matches '(s)' while /(\s)/ matches any whitespace and captures the match.

Quantifiers
Pattern: {n}, {n,}, {n,m}, *, +, ?
Quantifiers match the preceding subpattern a certain number of times. The subpattern can be a single character, an escape sequence, a pattern enclosed by parentheses or a character set.

{n} matches exactly n times.
{n,} matches n or more times.
{n,m} matches n to m times.
* is short for {0,}. Matches zero or more times.
+ is short for {1,}. Matches one or more times.
? is short for {0,1}. Matches zero or one time.
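The quantifiers listed above can be tried out with a small sketch like the following. The helper firstMatch is a hypothetical convenience wrapper around the standard exec() method, not part of the RegExp API.

```javascript
// Hypothetical helper: returns the text of the first match, or null.
const firstMatch = (re, text) => {
  const m = re.exec(text);
  return m ? m[0] : null;
};

const exact = firstMatch(/o{2}/, "tooth");       // exactly two o's
const atLeast = firstMatch(/o{2,}/, "toooth");   // two or more o's
const range = firstMatch(/o{1,3}/, "nose");      // one to three o's
const star = firstMatch(/bo*/, "b");             // zero o's are allowed
const plus = firstMatch(/bo+/, "b");             // at least one o is required
const optional = firstMatch(/colou?r/, "color"); // the u may be absent

console.log(exact, atLeast, range, star, plus, optional);
```

Quantifiers are greedy by default, which is why /o{2,}/ takes all three o's in "toooth" rather than stopping at two.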

E.g.: /o{1,3}/ matches 'oo' in "tooth" and 'o' in "nose".

Pattern delimiters
Pattern: (pattern), (?:pattern)
Matches the entire contained pattern.

(pattern) captures the match.
(?:pattern) doesn't capture the match.

E.g.: /(d).\1/ matches 'dad' in "abcdadef" and captures 'd', while /(?:.d){2}/ matches 'cdad' but doesn't capture anything.
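The difference between the two group types shows up in the array returned by exec(). The sketch below reuses the patterns from the example above. The variable names are illustrative.

```javascript
// Index 0 of the result is the whole match; later indexes hold captured groups.
const capturing = /(d).\1/.exec("abcdadef");

// With a non-capturing group the result holds only the whole match.
const nonCapturing = /(?:.d){2}/.exec("abcdadef");

console.log(capturing[0], capturing[1], capturing.length);
console.log(nonCapturing[0], nonCapturing.length);
```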

Note: (?:pattern) is a JavaScript 1.5 feature.

Lookaheads
Pattern: (?=pattern), (?!pattern)
A lookahead matches only if the preceding subexpression is followed by the pattern, but the pattern is not part of the match. The subexpression is the part of the regular expression which will be matched.

(?=pattern) matches only if there is a following pattern in input.
(?!pattern) matches only if there is not a following pattern in input.

E.g.: /Win(?=98)/ matches 'Win' only if 'Win' is followed by '98'.
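A short sketch of both lookahead forms, based on the /Win(?=98)/ example above. The input strings are made up for illustration.

```javascript
// Positive lookahead: "Win" matches, but "98" is not part of the match.
const positive = /Win(?=98)/.exec("Win98");

// No match, because "Win" is not followed by "98".
const positiveMiss = /Win(?=98)/.exec("WinXP");

// Negative lookahead: "Win" matches only when "98" does NOT follow.
const negative = /Win(?!98)/.exec("WinXP");
const negativeMiss = /Win(?!98)/.exec("Win98");

console.log(positive[0], positiveMiss, negative[0], negativeMiss);
```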

Note: Lookahead is a JavaScript 1.5 feature.

Alternation
Pattern: |
Alternation matches content on either side of the alternation character.

E.g.: /(a|b)a/ matches 'aa' in "dseaas" and 'ba' in "acbab".

Character sets
Pattern: [characters], [^characters]
Matches any of the contained characters. A range of characters may be defined by using a hyphen.

[characters] matches any of the contained characters.
[^characters] negates the character set and matches all but the contained characters.

E.g.: /[abcd]/ matches any of the characters 'a', 'b', 'c', 'd' and may be abbreviated to /[a-d]/. Ranges must be in ascending order, otherwise they will throw an error. (E.g.: /[d-a]/ will throw an error.)
/[^0-9]/ matches all characters but digits.
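The following sketch illustrates a range, a negated set, and the error raised by a descending range. The RegExp constructor is used for the invalid pattern so the error can be caught at run time. The input strings are illustrative.

```javascript
// First character in the range a-d.
const inSet = /[a-d]/.exec("xyzc")[0];

// First character that is not a digit.
const notDigit = /[^0-9]/.exec("123a4")[0];

// A descending range is rejected when the pattern is compiled.
let rangeError = null;
try {
  new RegExp("[d-a]");
} catch (e) {
  rangeError = e;
}

console.log(inSet, notDigit, rangeError instanceof SyntaxError);
```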

Note: Most special characters are automatically escaped to their literal meaning in character sets.

Special characters
Pattern: ^, $, ., ? and all the highlighted characters in the table above.
Special characters are characters that match something other than what they appear as.

^ matches beginning of input (or beginning of a line with the m flag).
$ matches end of input (or end of a line with the m flag).
. matches any character except a newline.
? directly following a quantifier makes the quantifier non-greedy (makes it match the minimum instead of the maximum of the interval defined).

E.g.: /(.)*?/ matches nothing or '' in all strings.
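To make the greedy versus non-greedy distinction and the effect of the m flag concrete, here is a small sketch. The sample strings are assumptions chosen for illustration.

```javascript
const html = "<b>one</b><b>two</b>";

// Greedy: .* takes as much as possible, so the match runs to the last </b>.
const greedy = /<b>.*<\/b>/.exec(html)[0];

// Non-greedy: .*? takes as little as possible, so the match stops at the first </b>.
const lazy = /<b>.*?<\/b>/.exec(html)[0];

// A non-greedy * is satisfied by zero repetitions, giving an empty match.
const empty = /(.)*?/.exec("abc")[0];

// Without the m flag, ^ and $ only match at the start and end of the whole input.
const lines = "first\nsecond";
const withoutM = /^second$/.test(lines);
const withM = /^second$/m.test(lines);

console.log(greedy, lazy, JSON.stringify(empty), withoutM, withM);
```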

Note: Non-greedy matches are not supported in older browsers such as Netscape Navigator 4 or Microsoft Internet Explorer 5.0.

Literal characters
Pattern: All characters except those with special meaning.
Mapped directly to the corresponding character.

E.g.: /a/ matches 'a' in "Any ancestor".

Backreferences
Pattern: \n
Backreferences are references to the same thing as a previously captured match. n is a positive nonzero integer telling the browser which captured match to refer to.

/(\S)\1(\1)+/g matches all occurrences of three or more equal non-whitespace characters following each other.
/<(\S+).*>(.*)<\/\1>/ matches an opening tag together with its matching closing tag.

E.g.: /<(\S+).*>(.*)<\/\1>/ matches '<div id="me">text</div>' in "text<div id=\"me\">text</div>text".

Character escapes
Pattern: \f, \r, \n, \t, \v, \0, [\b], \s, \S, \w, \W, \d, \D, \b, \B, \cX, \xhh, \uhhhh
\f matches form-feed.
\r matches carriage return.
\n matches linefeed.
\t matches horizontal tab.
\v matches vertical tab.
\0 matches NUL character.
[\b] matches backspace.
\s matches whitespace (short for [ \f\n\r\t\v\u00A0\u2028\u2029]).
\S matches anything but whitespace (short for [^ \f\n\r\t\v\u00A0\u2028\u2029]).
\w matches any alphanumeric character (word character) including underscore (short for [a-zA-Z0-9_]).
\W matches any non-word character (short for [^a-zA-Z0-9_]).
\d matches any digit (short for [0-9]).
\D matches any non-digit (short for [^0-9]).
\b matches a word boundary (the position between a word character and a non-word character, or between a word character and the start or end of input).
\B matches a non-word boundary (any position where \b does not match). It is not equivalent to [^\b], which matches any character except backspace.
\cX matches a control character. E.g.: \cM matches control-M.
\xhh matches the character whose code is the two hexadecimal digits hh.
\uhhhh matches the Unicode character whose code is the four hexadecimal digits hhhh.
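A few of the character escapes above in use, as a minimal sketch. The sample strings are made up for illustration.

```javascript
// \d and \w with the g flag collect every run of digits or word characters.
const digits = "Room 42, floor 7".match(/\d+/g);
const words = "snake_case and 2 more".match(/\w+/g);

// \s covers spaces, tabs and line breaks, so it works well as a splitter.
const parts = "a \t b\nc".split(/\s+/);

// \b requires a word boundary on each side; \B requires the opposite.
const wholeWord = /\bcat\b/.test("a cat sat");
const insideWord = /\bcat\b/.test("concatenate");
const notAtBoundary = /\Bcat\B/.test("concatenate");

// \xhh and \uhhhh name a character by its hexadecimal code (0x41 is "A").
const hexA = /\x41/.test("A");
const unicodeA = /\u0041/.test("A");

console.log(digits, words, parts);
console.log(wholeWord, insideWord, notAtBoundary, hexA, unicodeA);
```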