Regex in C++

Source: Internet  Editor: 程序博客网  Date: 2024/05/16 12:05

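When using regular expressions in C/C++ on Windows, the C header regex.h (a POSIX header) and Boost.Regex (a third-party library) are not available out of the box. The C++11 standard header <regex> is, and its default ECMAScript syntax recognizes the following special characters: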

characters    description               matches
.             not newline               any character except line terminators (LF, CR, LS, PS).
\t            tab (HT)                  a horizontal tab character (same as \u0009).
\n            newline (LF)              a newline (line feed) character (same as \u000A).
\v            vertical tab (VT)         a vertical tab character (same as \u000B).
\f            form feed (FF)            a form feed character (same as \u000C).
\r            carriage return (CR)      a carriage return character (same as \u000D).
\cletter      control code              a control code character whose code unit value is the same as the remainder of dividing the code unit value of letter by 32. For example: \ca is the same as \u0001, \cb the same as \u0002, and so on.
\xhh          ASCII character           a character whose code unit value has a hex value equivalent to the two hex digits hh. For example: \x4c is the same as L, or \x23 the same as #.
\uhhhh        unicode character         a character whose code unit value has a hex value equivalent to the four hex digits hhhh.
\0            null                      a null character (same as \u0000).
\int          backreference             the result of the submatch whose opening parenthesis is the int-th (int shall begin with a digit other than 0).
\d            digit                     a decimal digit character
\D            not digit                 any character that is not a decimal digit character
\s            whitespace                a whitespace character
\S            not whitespace            any character that is not a whitespace character
\w            word                      an alphanumeric or underscore character
\W            not word                  any character that is not an alphanumeric or underscore character
\character    character                 the character as it is, without interpreting its special meaning within a regex. Any character can be escaped except those which form any of the special character sequences above. Needed for: ^ $ \ . * + ? ( ) [ ] { } |
[class]       character class           the target character is part of the class
[^class]      negated character class   the target character is not part of the class
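A few of the entries above can be tried out with a minimal sketch like the following. The helper names is_all_digits and has_repeated_word are made up for illustration and are not part of the library. The patterns are written as C++11 raw string literals, R"(...)", so that each backslash reaches the regex engine unchanged.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` consists only of decimal digits (\d).
bool is_all_digits(const std::string& s) {
    static const std::regex digits(R"(\d+)");
    return std::regex_match(s, digits);   // the whole string must match
}

// Hypothetical helper: true if `s` contains a run of word characters (\w+)
// that is repeated right after whitespace (\s+). \1 is a backreference to
// whatever the first parenthesised group matched.
bool has_repeated_word(const std::string& s) {
    static const std::regex repeated(R"((\w+)\s+\1)");
    return std::regex_search(s, repeated);  // a match anywhere is enough
}
```

For instance, is_all_digits("2024") holds while is_all_digits("20a4") does not, and has_repeated_word("the the cat") holds because \1 has to repeat the text captured by (\w+).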

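Note that '\' is also the escape character of C++ string literals, so the compiler consumes it before the regex engine ever sees the pattern.

This is why a backslash that escapes another character in the regex must itself be escaped first: the literal "\\." in source code becomes the two characters \. in the pattern, which matches a literal dot.

For example: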


regex rx("^https?://[a-z]+\\.baidu\\.com/?.*");
This regular expression matches the following string:

string st("https://tieba.baidu.com/dsadw");


Calling
regex_match(st.begin(), st.end(), rx);
returns true if the entire string matches the regular expression, and false otherwise.
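As a small sketch of the escaping rule, the same pattern can be written with doubled backslashes or as a C++11 raw string literal, in which '\' has no special meaning to the compiler. The function names matches_escaped and matches_raw are chosen here only for illustration.

```cpp
#include <regex>
#include <string>

// Pattern in an ordinary string literal: each regex backslash is doubled.
bool matches_escaped(const std::string& s) {
    static const std::regex rx("^https?://[a-z]+\\.baidu\\.com/?.*");
    return std::regex_match(s, rx);
}

// The same pattern as a raw string literal: backslashes are written once.
bool matches_raw(const std::string& s) {
    static const std::regex rx(R"(^https?://[a-z]+\.baidu\.com/?.*)");
    return std::regex_match(s, rx);
}
```

Both functions should agree on every input. Since regex_match requires the whole string to match, a URL embedded in a longer sentence is rejected by both.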


regex_search(st.c_str(),res,rx);

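searches st for the first fragment that matches rx and stores that match, together with its sub-matches (capture groups), in res. It returns true if a match was found and false otherwise. It does not collect every match; to visit all of them, call it repeatedly or use a regex iterator.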

The library provides the cmatch type for this purpose:

typedef match_results<const char*> cmatch;
cmatch res;
This is how res is declared.
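The sketch below shows one way to read a cmatch after regex_search, and how std::sregex_iterator can be used when every match is wanted. The helpers first_host and all_urls and the unanchored patterns are assumptions made for this example, not code from the original post.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the host captured by the first match in
// `text`, or an empty string when regex_search finds nothing.
std::string first_host(const std::string& text) {
    static const std::regex rx("https?://([a-z]+\\.baidu\\.com)");
    std::cmatch res;  // match_results<const char*>
    if (std::regex_search(text.c_str(), res, rx)) {
        // res[0] is the whole match, res[1] the first capture group.
        return res[1].str();
    }
    return "";
}

// Hypothetical helper: collects every match by stepping through the text
// with sregex_iterator (a default-constructed iterator marks the end).
std::vector<std::string> all_urls(const std::string& text) {
    static const std::regex rx("https?://[a-z]+\\.baidu\\.com");
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end;
         it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

A cmatch stores pointers into the searched character array, so the string passed to regex_search has to stay alive for as long as res is read.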


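The library also provides regex_replace:

regex_replace(st, rx, std::string(""));
This replaces every fragment of st that matches rx with string("") (an empty string) and returns the result as a new string; st itself is not modified.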
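A minimal sketch of regex_replace, assuming a hypothetical helper strip_urls and an unanchored pattern chosen for the example, so that only the URL is removed rather than the whole string:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns a copy of `text` in which every matching URL
// (up to the next space) has been replaced by the empty string.
std::string strip_urls(const std::string& text) {
    static const std::regex rx("https?://[a-z]+\\.baidu\\.com[^ ]*");
    // regex_replace leaves its input untouched and returns a new string.
    return std::regex_replace(text, rx, std::string(""));
}
```

The return value has to be kept, for example std::string cleaned = strip_urls(st);, because discarding it leaves st exactly as it was.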


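Summary: the API takes some getting used to. The key point to remember is that a backslash must first be escaped in the string literal before it can escape another character in the regex.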


