Using Regular Expressions in VC


Saturday, September 11, 2010  Shao Shengsong (邵盛松)

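A regular expression is a pattern for fuzzy matching of character strings. It can be used for validating input and for searching and replacing text.

This article describes how to use two ATL template classes, CAtlRegExp and CAtlREMatchContext.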

 

Before using the CAtlRegExp class, add the header with #include <atlrx.h>.

RegExp is short for Regular Expression.

 

The example below matches an e-mail address string to show how the two classes are used.

It is adapted from http://msdn.microsoft.com/en-us/library/k3zs4axe(VS.80).aspx

     // { } marks the match group to retrieve; ( ) only groups.
     // The local part accepts letters, digits, and underscores.
     CString strRegex = L"({[a-zA-Z0-9_]+@[a-zA-Z0-9]+[.][a-zA-Z0-9]+[.]?[a-zA-Z0-9]+})";
     CString strInput = L"admin@domain.com";

     // Compile the pattern. In a Unicode build, CString converts
     // implicitly to const wchar_t*, so no cast is needed.
     CAtlRegExp<CAtlRECharTraitsW> reRule;
     REParseError status = reRule.Parse(strRegex);

     if (REPARSE_ERROR_OK != status)
     {
         return 0;   // the pattern itself could not be parsed
     }

     // Match() finds the first match in the input and records it in mcRule.
     CAtlREMatchContext<CAtlRECharTraitsW> mcRule;
     if (!reRule.Match(strInput, &mcRule))
     {
         AfxMessageBox(L"The e-mail address you entered is not valid!");
     }
     else
     {
         for (UINT nGroupIndex = 0; nGroupIndex < mcRule.m_uNumGroups; ++nGroupIndex)
         {
              const CAtlREMatchContext<CAtlRECharTraitsW>::RECHAR* szStart = 0;
              const CAtlREMatchContext<CAtlRECharTraitsW>::RECHAR* szEnd = 0;
              mcRule.GetMatch(nGroupIndex, &szStart, &szEnd);

              // The group is the range [szStart, szEnd) inside strInput.
              ptrdiff_t nLength = szEnd - szStart;
              CString strEmailAddress(szStart, static_cast<int>(nLength));

              if (strEmailAddress.Compare(strInput) != 0)
              {
                   // Only part of the input matched: suggest the matched part.
                   CString strPrompt;
                   strPrompt.Format(L"The e-mail address you entered is not valid. Did you mean %s?",
                                    (LPCTSTR)strEmailAddress);
                   AfxMessageBox(strPrompt);
              }
              else
              {
                   AfxMessageBox(L"The e-mail address you entered is correct!");
              }
         }
     }
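
For readers who want to try the same check without ATL, the following is a rough portable analogue built on std::wregex from the C++ standard library. It is a sketch under assumptions rather than the ATL API: std::wregex uses ECMAScript syntax, so the capture group is written with ( ) instead of CAtlRegExp's { }; the helper names ExtractEmail and IsWholeInputEmail are hypothetical; and the local part is assumed to allow letters, digits, and underscores.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first substring of `input` that looks like
// an e-mail address, or an empty string when nothing matches.
std::wstring ExtractEmail(const std::wstring& input)
{
    // ECMAScript syntax: ( ) captures, unlike CAtlRegExp where { } captures.
    static const std::wregex rule(
        L"([a-zA-Z0-9_]+@[a-zA-Z0-9]+[.][a-zA-Z0-9]+[.]?[a-zA-Z0-9]+)");

    std::wsmatch match;
    if (!std::regex_search(input, match, rule))
    {
        return std::wstring();
    }
    return match[1].str();  // text captured by the first group
}

// Mirrors the article's check: the input is accepted only when the
// matched text is identical to the whole input.
bool IsWholeInputEmail(const std::wstring& input)
{
    const std::wstring found = ExtractEmail(input);
    return !found.empty() && found == input;
}
```

As in the ATL version, an input with trailing junk still produces a match, and comparing the match against the input is what separates "correct" from "did you mean".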

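Both template classes are parameterized by another class that describes the traits of the character set, which can be ASCII, WCHAR, or multibyte.

You can usually ignore this parameter, because the default template argument picks the concrete traits class according to the project's character set setting.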

 

atlrx.h offers three traits classes to choose from:

CAtlRECharTraitsA for ASCII

CAtlRECharTraitsW for Unicode

CAtlRECharTraitsMB for multibyte

 

 

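VC2005 uses the Unicode character set by default.

The regular expression source in atlrx.h selects the default traits class as follows: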

#ifndef _UNICODE

typedef CAtlRECharTraitsA  CAtlRECharTraits;

#else         // _UNICODE

typedef CAtlRECharTraitsW  CAtlRECharTraits;

#endif // !_UNICODE

 

So a CAtlRegExp object can be declared as

CAtlRegExp<> reRule;

REParseError status = reRule.Parse(strRegex);

or, equivalently in a Unicode build, as

CAtlRegExp<CAtlRECharTraitsW> reRule;

REParseError status = reRule.Parse(strRegex);
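
How the empty angle brackets end up selecting a traits class can be sketched in portable C++ with mock types. The names NarrowTraits, WideTraits, DefaultTraits, and RegExpSketch below are hypothetical stand-ins and not part of atlrx.h; only the typedef-by-_UNICODE pattern comes from the source quoted above.

```cpp
#include <cstddef>
#include <string>

// Mock traits classes standing in for CAtlRECharTraitsA / CAtlRECharTraitsW.
struct NarrowTraits { typedef char    RECHAR; };
struct WideTraits   { typedef wchar_t RECHAR; };

// Same selection pattern as in atlrx.h: the default depends on _UNICODE.
#ifndef _UNICODE
typedef NarrowTraits DefaultTraits;
#else
typedef WideTraits   DefaultTraits;
#endif

// Mock of a traits-parameterized class with a defaulted template parameter.
template <class CharTraits = DefaultTraits>
class RegExpSketch
{
public:
    typedef typename CharTraits::RECHAR RECHAR;

    // Only stores the pattern text; a real engine would compile it here.
    void Parse(const RECHAR* szRE) { m_pattern = szRE; }

    std::size_t PatternLength() const { return m_pattern.size(); }

private:
    std::basic_string<RECHAR> m_pattern;
};
```

With this shape, RegExpSketch<> and RegExpSketch<DefaultTraits> are the same type, which is why the two declarations in the article are interchangeable in a Unicode build.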

 

Calling CAtlRegExp's Parse() method with the regular expression string as its argument builds the regular expression object we need.

 

 

 

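Next, call CAtlRegExp's Match() function.

Match() parameters:

The first parameter is the string to match against.

The second parameter receives the result of the match.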

 

CAtlREMatchContext's member variable m_uNumGroups holds the number of match groups.

CAtlREMatchContext's GetMatch() function returns the pStart and pEnd pointers of the matched text.
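
As the example code uses them, the two pointers delimit the range [pStart, pEnd) inside the original input rather than a separate copy, so the matched text is copied out using the pointer difference as its length. A minimal portable sketch of that step follows, with std::wstring standing in for CString and a hypothetical helper name CopyMatch:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies the characters in [szStart, szEnd) into a string.
std::wstring CopyMatch(const wchar_t* szStart, const wchar_t* szEnd)
{
    if (szStart == 0 || szEnd == 0 || szEnd < szStart)
    {
        return std::wstring();  // treat a missing or inverted range as empty
    }
    const std::ptrdiff_t nLength = szEnd - szStart;
    return std::wstring(szStart, static_cast<std::size_t>(nLength));
}
```

Because the pointers refer into the input buffer, the copy should be made while the input string is still alive.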

The regular expression syntax below is excerpted from MSDN.

The original is at http://msdn.microsoft.com/en-us/library/k3zs4axe(VS.80).aspx

Regular Expression Syntax

This table lists the metacharacters understood by CAtlRegExp.

Metacharacter   Meaning
.               Matches any single character.
[ ]             Indicates a character class. Matches any character inside the brackets (for example, [abc] matches "a", "b", and "c").
^               If this metacharacter occurs at the start of a character class, it negates the character class. A negated character class matches any character except those inside the brackets (for example, [^abc] matches all characters except "a", "b", and "c").
                If ^ is at the beginning of the regular expression, it matches the beginning of the input (for example, ^[abc] will only match input that begins with "a", "b", or "c").
-               In a character class, indicates a range of characters (for example, [0-9] matches any of the digits "0" through "9").
?               Indicates that the preceding expression is optional: it matches once or not at all (for example, [0-9][0-9]? matches "2" and "12").
+               Indicates that the preceding expression matches one or more times (for example, [0-9]+ matches "1", "13", "456", and so on).
*               Indicates that the preceding expression matches zero or more times.
??, +?, *?      Non-greedy versions of ?, +, and *. These match as little as possible, unlike the greedy versions that match as much as possible (for example, given the input "<abc><def>", <.*?> matches "<abc>" while <.*> matches "<abc><def>").
( )             Grouping operator. Example: (\d+,)*\d+ matches a list of numbers separated by commas (for example, "1" or "1,23,456").
{ }             Indicates a match group. The actual text in the input that matches the expression inside the braces can be retrieved through the CAtlREMatchContext object.
\               Escape character: interpret the next character literally (for example, [0-9]+ matches one or more digits, but [0-9]\+ matches a digit followed by a plus character). Also used for abbreviations (such as \a for any alphanumeric character; see the following table).
                If \ is followed by a number n, it matches the nth match group (starting from 0). Example: <{.*?}>.*?</\0> matches "<head>Contents</head>".
                Note that, in C++ string literals, two backslashes must be used: "\\+", "\\a", "<{.*?}>.*?</\\0>".
$               At the end of a regular expression, this character matches the end of the input (for example, [0-9]$ matches a digit at the end of the input).
|               Alternation operator: separates two expressions, exactly one of which matches (for example, T|the matches "The" or "the").
!               Negation operator: the expression following ! does not match the input (for example, a!b matches "a" not followed by "b").
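
The rule about doubling the escape character in C++ string literals is easy to check by looking at what the compiler actually stores. In the sketch below the variable names are illustrative only, and the raw string literal form assumes a C++11 or later compiler, so it is not available in VC2005 itself.

```cpp
#include <string>

// The regex engine receives the string after C++ escape processing.
// L"\\d+" is stored as the three characters  \  d  +
const std::wstring kDigits = L"\\d+";

// Back-reference example: the engine sees <{.*?}>.*?</\0>
const std::wstring kTagPair = L"<{.*?}>.*?</\\0>";

// C++11 raw string literal: no escape processing, so one backslash suffices.
const std::wstring kDigitsRaw = LR"(\d+)";
```

Writing a single backslash in an ordinary literal would instead be consumed by the compiler as a C++ escape sequence before the regex engine ever sees it.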

Abbreviations

CAtlRegExp can handle abbreviations, such as \d instead of [0-9]. The abbreviations are provided by the character traits class passed in the CharTraits parameter. The predefined character traits classes provide the following abbreviations.

Abbreviation    Matches
\a              Any alphanumeric character: ([a-zA-Z0-9])
\b              White space (blank): ([ \\t])
\c              Any alphabetic character: ([a-zA-Z])
\d              Any decimal digit: ([0-9])
\h              Any hexadecimal digit: ([0-9a-fA-F])
\n              Newline: (\r|(\r?\n))
\q              A quoted string: (\"[^\"]*\")|(\'[^\']*\')
\w              A simple word: ([a-zA-Z]+)
\z              An integer: ([0-9]+)

 

For a translation of the syntax, see http://www.vckbase.com/document/viewdoc/?id=1138

Excerpt:

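Metacharacter   Meaning
.               Matches a single character.
[ ]             Specifies a character class; matches any character inside the brackets. Example: [abc] matches "a", "b", or "c".
^               At the start of a character class, ^ negates the class; the negated class matches any character except those inside the brackets. Example: [^abc] matches any character other than "a", "b", and "c". At the start of the regular expression, ^ matches the beginning of the input. Example: ^[abc] matches input that begins with "a", "b", or "c".
-               Inside a character class, specifies a range of characters. Example: [0-9] matches the digits "0" through "9".
?               The expression before ? is optional: it matches once or not at all. Example: [0-9][0-9]? matches "2" and "12".
+               The expression before + matches one or more times. Example: [0-9]+ matches "1", "13", "666", and so on.
*               The expression before * matches zero or more times.
??, +?, *?      Non-greedy versions of ?, +, and *, which match as few characters as possible, whereas ?, +, and * are greedy and match as many as possible. Example: for the input "<abc><def>", <.*?> matches "<abc>", while <.*> matches "<abc><def>".
( )             Grouping operator. Example: (\d+,)*\d+ matches a comma-separated list of numbers, such as "1" or "1,23,456".
\               Escape character: escapes the character that follows. Example: [0-9]+ matches one or more digits, while [0-9]\+ matches a digit followed by a plus sign. The backslash also introduces abbreviations; \a stands for any letter or digit. If \ is followed by a number n, it matches the nth match group (counting from 0). Example: <{.*?}>.*?</\0> matches "<head>Contents</head>". Note that in a C++ string the backslash must be written as a double backslash: "\\+", "\\a", "<{.*?}>.*?</\\0>".
$               At the end of the regular expression, matches the end of the input. Example: [0-9]$ matches a digit at the end of the input.
|               Alternation: separates two expressions, one of which must match. Example: T|the matches "The" or "the".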
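
The greedy versus non-greedy example can be tried out with std::regex, whose default ECMAScript syntax happens to spell these quantifiers the same way. This is an illustration with a different engine, not CAtlRegExp itself, and the helper name FirstMatch is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first match of `pattern` in `input`,
// or an empty string when there is no match.
std::string FirstMatch(const std::string& input, const std::string& pattern)
{
    const std::regex rule(pattern);  // ECMAScript syntax by default
    std::smatch match;
    if (!std::regex_search(input, match, rule))
    {
        return std::string();
    }
    return match.str();
}
```

Calling FirstMatch("<abc><def>", "<.*?>") stops at the first closing bracket and yields "<abc>", while the greedy pattern "<.*>" runs to the last closing bracket and yields "<abc><def>".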

 

Abbreviations

Abbreviation    Matches
\a              A letter or digit: ([a-zA-Z0-9])
\b              White space (blank): ([ \\t])
\c              A letter: ([a-zA-Z])
\d              A decimal digit: ([0-9])
\h              A hexadecimal digit: ([0-9a-fA-F])
\n              Newline: (\r|(\r?\n))
\q              A quoted string: (\"[^\"]*\")|(\'[^\']*\')
\w              A word: ([a-zA-Z]+)
\z              An integer: ([0-9]+)

 

 

The program above was debugged successfully in VC++ 2005 with the Unicode character set.

End of article.