Regular Expression_1: Pattern


Regular Expression

Is it difficult? I thought so before I used it. :)

First, note that there is no single definition of regular expression syntax; there are many variants.

So if you learn RE from one reference and then use it in a different environment, it may not work as you expect.

Fortunately, the concepts are the same.

This introduction is based on JavaScript regular expressions.
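
As a small illustration of how much the variants differ, here is a sketch in JavaScript. It assumes the POSIX bracket class [[:digit:]] (accepted by tools such as grep) as the "other environment"; the variable names are made up for the example. JavaScript does not know that class and reads it as an ordinary character set followed by a literal "]", so the pattern silently fails instead of raising an error.

```javascript
// POSIX-style class: fine in grep, but not understood by JavaScript.
var posixStyle = new RegExp("^[[:digit:]]+$");
// JavaScript spelling of the same idea.
var jsStyle = new RegExp("^\\d+$");

var posixResult = posixStyle.test("123"); // false: no error, it just does not match
var jsResult = jsStyle.test("123");       // true

console.log("POSIX style:", posixResult);
console.log("JavaScript style:", jsResult);
```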

[Concepts]
* Regular Expression

From http://en.wikipedia.org/wiki/Regular_expression

In computing, a regular expression (abbreviated as regexp or regex, with plural forms regexps, regexes, or regexen) is a string that describes or matches a set of strings, according to certain syntax rules. Regular expressions are used by many text editors and utilities to search and manipulate bodies of text based on certain patterns. Many programming languages support regular expressions for string manipulation. For example, Perl and Tcl have a powerful regular expression engine built directly into their syntax. The set of utilities (including the editor ed and the filter grep) provided by Unix distributions were the first to popularize the concept of regular expressions.

Many modern computing systems provide wildcard characters in matching filenames from a file system. This is a core capability of many command-line shells and is known as globbing. Wildcards differ from regular expressions in that they can only express very restrictive forms of alternation.

[Sample 1:]

Code:

// Initial values of the two text boxes: the pattern and the text to test.
this.form1.txtRE.value = "^(\\S*)@(\\w+(?:\\.\\w+)+)$";
this.form1.txtText.value = "youraccount@company.com";

function btnMatch_onclick() {
    var sRegE = this.form1.txtRE.value;
    var sText = this.form1.txtText.value;

    try
    {
        // A string argument to match() is compiled into a RegExp first.
        var match = sText.match(sRegE);
        if (null == match)
        {
            alert("not found.");
        }
        else
        {
            var sMsg = "match text:\n";
            // match[0] is the whole matched text; lastIndex is not a
            // standard property of the match array, so it is not used here.
            sMsg += match[0];

            sMsg += "\nmatch captures:\n";
            for (var i = 0; i < match.length; i++)
            {
                sMsg += "[" + i + "] " + match[i] + "\n";
            }

            alert(sMsg);
        }
    }
    catch ( e )
    {
        // An invalid pattern makes match() throw.
        alert("error!");
    }
}

You will get this result:

---------------------------
Windows Internet Explorer
---------------------------
match text:
youraccount@company.com
match captures:
[0] youraccount@company.com
[1] youraccount
[2] company.com

---------------------------
OK
---------------------------
Explanation:

1.1 The code does three things: it validates the input, matches it, and parses it into an account name and a site name.

1.2 Now for the parsing. If the match succeeds:

match[0] is the whole matched text.

match[1] is the text matched by the first pair of parentheses.

match[2] is the text matched by the second pair of parentheses.

match[3], match[4], ... follow if there are third and fourth capturing pairs of parentheses.
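
The parsing step can also be tried outside the browser form. The following is a minimal sketch, not the page's actual code: parseAddress is a hypothetical helper name, and console.log stands in for alert.

```javascript
// Same pattern as the sample, written as a regex literal (single backslashes).
var addressPattern = /^(\S*)@(\w+(?:\.\w+)+)$/;

// Hypothetical helper: returns the captures, or null if the text does not match.
function parseAddress(text) {
    var match = text.match(addressPattern);
    if (match === null) {
        return null;
    }
    return {
        whole: match[0],   // whole matched text
        account: match[1], // first pair of parentheses
        site: match[2]     // second pair of parentheses
    };
}

var parsed = parseAddress("youraccount@company.com");
console.log(parsed.whole, parsed.account, parsed.site);
console.log(parseAddress("no-at-sign")); // null, the text contains no "@"
```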

OK, let us look at the RE again:

"^(\\S*)@(\\w+(?:\\.\\w+)+)$"

There are three pairs of parentheses; in RE terms each one is called a pattern or subexpression.

So why are there only two captures?

The reason is that the third pattern starts with '?:', which means match only, do not capture.
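
To see the effect of '?:' directly, the sketch below runs the sample pattern twice, once as written and once with the inner group turned into an ordinary capturing group. The variable names are only for illustration.

```javascript
var text = "youraccount@company.com";

// Inner group written with (?:...), as in the sample.
var nonCapturing = text.match(/^(\S*)@(\w+(?:\.\w+)+)$/);
// Same pattern with a plain capturing group instead.
var capturing = text.match(/^(\S*)@(\w+(\.\w+)+)$/);

console.log(nonCapturing.length); // 3: whole match plus two captures
console.log(capturing.length);    // 4: the inner group now adds match[3]
console.log(capturing[3]);        // ".com", the last repetition of the inner group
```
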
There are four kinds of pattern syntax:

(pattern): match and capture.

(?:pattern): match but do not capture.

(?=pattern): positive lookahead; it does not capture, and it succeeds only if pattern matches at the current position.

Lookahead does not consume characters. For example:

windows(?=95|98|NT|2000) matches "windows" in "windows2000". The match ends at the "s" of "windows", so the "2000" is not consumed and the next search starts there.

(?!pattern): negative lookahead; it does not capture, and it succeeds only if pattern does not match at the current position.
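
The two lookahead forms can be checked with a short sketch. The variable names and the "windowsXP" test string are assumptions added for the example; the patterns are the ones discussed above.

```javascript
var positive = /windows(?=95|98|NT|2000)/;
var negative = /windows(?!95|98|NT|2000)/;

var hit = "windows2000".match(positive);
console.log(hit[0]);     // "windows": the "2000" is checked but not consumed
console.log(hit.length); // 1: lookahead groups do not capture

console.log(negative.test("windows2000")); // false, a listed version follows
console.log(negative.test("windowsXP"));   // true, no listed version follows

// With the g flag, lastIndex shows where the next search would start.
var globalPositive = /windows(?=95|98|NT|2000)/g;
globalPositive.exec("windows2000");
console.log(globalPositive.lastIndex); // 7, just past the "s"
```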