Understanding Marked Sub-Expressions and Captures
Source: Internet  Editor: 程序博客网  Date: 2024/05/21 10:43
Original: http://www.boost.org/doc/libs/1_64_0/libs/regex/doc/html/boost_regex/captures.html
Captures are the sequences of text matched by a given sub-expression. A single sub-expression can match more than once, so it can yield several results.
Marked sub-expressions
In a Perl regular expression, any part enclosed in parentheses is a marked sub-expression. In the expression

    (\w+)\W+(\w+)

there are two sub-expressions, known as $1 and $2. The whole match is known as $&, everything before the match as $`, and everything after the match as $'. Matching the expression above against "@abc def--" therefore gives the following results:

    Sub-expression    Text found
    $`                "@"
    $&                "abc def"
    $1                "abc"
    $2                "def"
    $'                "--"

In Boost.Regex, calling one of the matching algorithms (regex_search, regex_match, or regex_iterator) stores all of these results in a match_results object. Given

    boost::match_results<IteratorType> m;

the Perl and Boost.Regex forms correspond as follows:

    Perl    Boost.Regex
    $`      m.prefix()
    $&      m[0]
    $n      m[n]
    $'      m.suffix()

In Boost.Regex, each sub-expression match is represented by a sub_match object. This is essentially a pair of iterators marking the start and end of the matched text, but it also provides overloaded operators that make it behave much like a std::basic_string: it can be compared with strings, added to strings, and written to an output stream.
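The Perl-to-Boost.Regex mapping above can be sketched in code. This is a minimal sketch that assumes std::regex from the C++ standard library as a stand-in, since its match_results and sub_match expose the same prefix(), suffix(), operator[] and string comparison used here. MatchParts, split_match and first_word_is are hypothetical names introduced only for illustration.

```cpp
#include <regex>
#include <string>

// Plain strings for each piece of a match, named after the Perl variables.
struct MatchParts {
    std::string prefix;  // $`
    std::string whole;   // $&
    std::string first;   // $1
    std::string second;  // $2
    std::string suffix;  // $'
};

// Hypothetical helper: searches `text` for (\w+)\W+(\w+) and copies each
// piece out of the match_results object. Returns empty strings on no match.
MatchParts split_match(const std::string& text) {
    static const std::regex e(R"((\w+)\W+(\w+))");
    std::smatch m;
    MatchParts parts;
    if (std::regex_search(text, m, e)) {
        parts.prefix = m.prefix().str();
        parts.whole  = m[0].str();
        parts.first  = m[1].str();
        parts.second = m[2].str();
        parts.suffix = m.suffix().str();
    }
    return parts;
}

// Hypothetical helper: a sub_match compares directly against a string,
// so no copy of the matched text is needed for the comparison.
bool first_word_is(const std::string& text, const std::string& word) {
    static const std::regex e(R"((\w+)\W+(\w+))");
    std::smatch m;
    return std::regex_search(text, m, e) && m[1] == word;
}
```

Calling split_match("@abc def--") fills the five fields with the same values as the table for that text. With Boost.Regex the equivalent calls would use boost::regex, boost::smatch and boost::regex_search.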
Unmatched Sub-Expressions
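When a regular expression matches, not every marked sub-expression necessarily takes part in the match. For example, (abc)|(def) can match either $1 or $2, but never both at once. In Boost.Regex, the data member sub_match::matched tells you whether a given sub-expression participated in the match.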
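A small sketch of checking the matched flag for the (abc)|(def) example. It assumes std::regex as a stand-in (std::sub_match has the same matched member), and matched_alternative is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns 1 if (abc) took part in the match,
// 2 if (def) did, and 0 if the whole text does not match the expression.
int matched_alternative(const std::string& text) {
    static const std::regex e("(abc)|(def)");
    std::smatch m;
    if (!std::regex_match(text, m, e)) return 0;
    if (m[1].matched) return 1;  // first alternative participated
    if (m[2].matched) return 2;  // second alternative participated
    return 0;                    // not reachable for this pattern
}
```

Testing matched is more reliable than testing for an empty string, because a sub-expression can legitimately match empty text.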
Repeated Captures
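When a marked sub-expression is repeated, it is captured several times, but normally only the last capture is kept. For example, if (?:(\w+)\W+)+ is matched against

    one fine day

then $1 contains only "day", and all earlier captures are forgotten. (Strictly, \W+ requires at least one non-word character after each word, so this result holds for text such as "one fine day."; a search against the bare text stops after "fine ".)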
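The "last capture wins" behaviour can be observed with a short sketch. It assumes std::regex as a stand-in for Boost.Regex, and last_capture is a hypothetical helper name. Note the trailing period in the first input: \W+ needs a non-word character after every word, including the last one.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: searches `text` for (?:(\w+)\W+)+ and returns $1,
// which holds only the final repetition of the capture group.
// Returns an empty string when there is no match.
std::string last_capture(const std::string& text) {
    static const std::regex e(R"((?:(\w+)\W+)+)");
    std::smatch m;
    if (!std::regex_search(text, m, e)) return "";
    return m[1].str();
}
```

last_capture("one fine day.") yields "day", while last_capture("one fine day") yields "fine", because the third repetition cannot complete without a trailing non-word character.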
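However, Boost.Regex currently has an experimental feature that makes every capture available, through the member functions match_results::captures and sub_match::captures. The following program demonstrates it: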

    #include <boost/regex.hpp>
    #include <iostream>
    #include <string>

    // Prints the final value of each sub-expression, then every capture
    // recorded for it. Requires BOOST_REGEX_MATCH_EXTRA to be defined.
    void print_captures(const std::string& regx, const std::string& text)
    {
        boost::regex e(regx);
        boost::smatch what;
        std::cout << "Expression: \"" << regx << "\"\n";
        std::cout << "Text: \"" << text << "\"\n";
        if(boost::regex_match(text, what, e, boost::match_extra))
        {
            unsigned i, j;
            std::cout << "** Match found **\n Sub-Expressions:\n";
            for(i = 0; i < what.size(); ++i)
                std::cout << " $" << i << " = \"" << what[i] << "\"\n";
            std::cout << " Captures:\n";
            for(i = 0; i < what.size(); ++i)
            {
                std::cout << " $" << i << " = {";
                // what.captures(i) holds every capture of sub-expression i
                for(j = 0; j < what.captures(i).size(); ++j)
                {
                    if(j)
                        std::cout << ", ";
                    else
                        std::cout << " ";
                    std::cout << "\"" << what.captures(i)[j] << "\"";
                }
                std::cout << " }\n";
            }
        }
        else
        {
            std::cout << "** No Match found **\n";
        }
    }

    int main(int , char* [])
    {
        print_captures("(([[:lower:]]+)|([[:upper:]]+))+", "aBBcccDDDDDeeeeeeee");
        print_captures("(.*)bar|(.*)bah", "abcbar");
        print_captures("(.*)bar|(.*)bah", "abcbah");
        print_captures("^(?:(\\w+)|(?>\\W+))*$",
            "now is the time for all good men to come to the aid of the party");
        return 0;
    }

This produces the following output:

    Expression: "(([[:lower:]]+)|([[:upper:]]+))+"
    Text: "aBBcccDDDDDeeeeeeee"
    ** Match found **
     Sub-Expressions:
     $0 = "aBBcccDDDDDeeeeeeee"
     $1 = "eeeeeeee"
     $2 = "eeeeeeee"
     $3 = "DDDDD"
     Captures:
     $0 = { "aBBcccDDDDDeeeeeeee" }
     $1 = { "a", "BB", "ccc", "DDDDD", "eeeeeeee" }
     $2 = { "a", "ccc", "eeeeeeee" }
     $3 = { "BB", "DDDDD" }
    Expression: "(.*)bar|(.*)bah"
    Text: "abcbar"
    ** Match found **
     Sub-Expressions:
     $0 = "abcbar"
     $1 = "abc"
     $2 = ""
     Captures:
     $0 = { "abcbar" }
     $1 = { "abc" }
     $2 = { }
    Expression: "(.*)bar|(.*)bah"
    Text: "abcbah"
    ** Match found **
     Sub-Expressions:
     $0 = "abcbah"
     $1 = ""
     $2 = "abc"
     Captures:
     $0 = { "abcbah" }
     $1 = { }
     $2 = { "abc" }
    Expression: "^(?:(\w+)|(?>\W+))*$"
    Text: "now is the time for all good men to come to the aid of the party"
    ** Match found **
     Sub-Expressions:
     $0 = "now is the time for all good men to come to the aid of the party"
     $1 = "party"
     Captures:
     $0 = { "now is the time for all good men to come to the aid of the party" }
     $1 = { "now", "is", "the", "time", "for", "all", "good", "men", "to", "come", "to", "the", "aid", "of", "the", "party" }

Note that this feature comes at a performance cost: once it is enabled, you pay that cost even if you never use the extra captures. To enable it:
- Define BOOST_REGEX_MATCH_EXTRA, preferably in boost/regex/user.hpp, and then rebuild the project.
- Pass the match_extra flag when calling the matching algorithm (regex_search, regex_match, or regex_iterator).
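Where the cost of match_extra is not acceptable and the repeated piece can be matched on its own, one possible alternative is to iterate over individual matches instead. This is a different technique from captures(), not a replacement for it. The sketch below assumes std::regex and std::sregex_iterator as stand-ins for their Boost.Regex counterparts, and all_words is a hypothetical helper name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every run of word characters in `text`
// by iterating over successive matches of \w+, one match per word.
std::vector<std::string> all_words(const std::string& text) {
    static const std::regex word(R"(\w+)");
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), word), end;
         it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

For "one fine day" this collects the three words in order. Unlike captures(), it does not tell you which repetitions belonged to one enclosing match, so it only fits cases where the repeated piece is meaningful by itself.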