Regular Expressions in C++ with Boost.Regex(3)
Source: Internet | Published: MySQL won't install | Editor: 程序博客网 | Time: 2024/04/28 11:36
Parsing
Not only does regex_match confirm or deny whether a string satisfies some expression, it also lets you parse your string into pieces. It does this by storing the results in a match_results object, which is a sequence (in the sense of a standard library sequence) over which you can iterate to examine the results.
Like matching, Boost.Regex lets you search a string for a regular expression in two ways. In the simplest case, you may just want to know if a given string contains a match for your regular expression. Example 3 is a trivial implementation of the grep program that reads in each line from a file and prints it out if it contains a string that satisfies the regular expression pattern.
Example 2 is a modified version of Example 1. This new version includes a cmatch object, which is simply a typedef for match_results<const char*> (Boost.Regex, like standard library strings, supports both narrow- and wide-character strings).
#include <iostream>
#include <string>
#include <boost/regex.hpp>

using namespace std;

int main( ) {
   std::string s, sre;
   boost::regex re;
   boost::cmatch matches;

   while(true)
   {
      cout << "Expression: ";
      cin >> sre;
      // Stop on "quit" or at end of input
      if (!cin || sre == "quit")
      {
         break;
      }
      cout << "String: ";
      cin >> s;

      try
      {
         // Assignment and construction initialize the FSM used
         // for regexp parsing; an invalid expression throws regex_error
         re = sre;
      }
      catch (boost::regex_error& e)
      {
         cout << sre << " is not a valid regular expression: \""
              << e.what() << "\"" << endl;
         continue;
      }

      // if (boost::regex_match(s.begin(), s.end(), re))
      if (boost::regex_match(s.c_str(), matches, re))
      {
         // matches[0] contains the original string. matches[n]
         // contains a sub_match object for each matching
         // subexpression; print them all, starting with matches[0]
         for (boost::cmatch::size_type i = 0; i < matches.size(); i++)
         {
            // sub_match::first and sub_match::second are iterators that
            // refer to the first and one past the last chars of the
            // matching subexpression
            string match(matches[i].first, matches[i].second);
            cout << "matches[" << i << "] = " << match << endl;
         }
      }
      else
      {
         cout << "The regexp \"" << re << "\" does not match \"" << s << "\"" << endl;
      }
   }
}
Example 2. Parsing a string using subexpressions
In Example 2, matches is a sequence of sub_match objects. The sub_match class has the members first and second, which are iterators that refer to the first and one-past-the-last characters of the matched text in the original string. matches[0] refers to the entire original string, and the sub_match objects at indexes matches[1...n] each refer to the substring that matches the nth subexpression in the original expression.
A subexpression is a part of the original regular expression that is contained within parentheses. For example, this regular expression has three subexpressions:
(\d{1,2})\/(\d{1,2})\/(\d{2}|\d{4})
This particular expression will match a date of the form MM/DD/YY or MM/DD/YYYY (of course, it doesn't validate the semantics of the values, so the month can be greater than 12). How do you grab each of the parts? Figure 1 should give you an idea; it shows what a match_results object will look like if you use the expression above and give it the string 11/5/2005.
Figure 1. The results of a regex_match
After parsing this date, there are four elements in matches. The element at index zero refers to the entire string, and each of the remaining elements refers to the part of the original string that satisfies the corresponding subexpression (though, as the next example shows, this is not always so simple). The entire string successfully matches the regular expression, so each of the subexpressions is available via indexes 1-3, respectively, in the match_results sequence.
Depending on the type of subexpressions you are using, the contents of match_results may surprise you. Consider the URL example above. This regular expression has four subexpressions:
(ftp|http|https):\/\/(\w+\.)*(\w*)\/([\w\d]+\/{0,1})+
Using repeating subexpressions (for example, (\w+\.)*) means that the subexpression can match any number of times. match_results still holds only one sub_match per subexpression, though, so what you get back for a repeated subexpression depends on the string you try to match. Here's what you will see with a sample run of Example 2 using the URL regular expression I just gave:
Expression: (ftp|http|https):\/\/(\w+\.)*(\w*)\/([\w\d]+\/{0,1})+
String: http://www.foo.com/bar
matches[0] = http://www.foo.com/bar
matches[1] = http
matches[2] = foo.
matches[3] = com
matches[4] = bar
You probably noticed right away that the "www." is missing from the results. This is because a repeating subexpression stores only the text of its last repetition. If you want to, for example, grab the full host name out of this URL, you have to add another subexpression: a new pair of parentheses around (\w+\.)*(\w*), as shown below.
(ftp|http|https):\/\/((\w+\.)*(\w*))\/([\w\d]+\/{0,1})+
This will put the entire host name into one of the subexpressions. The order of the corresponding sub_match objects in the match_results sequence is as though the tree of nested subexpressions were traversed depth-first, left to right. Here's the output with this modified regular expression:
Expression: (ftp|http|https):\/\/((\w+\.)*(\w*))\/([\w\d]+\/{0,1})+
String: http://www.foo.com/bar
matches[0] = http://www.foo.com/bar
matches[1] = http
matches[2] = www.foo.com
matches[3] = foo.
matches[4] = com
matches[5] = bar
The values are the same as before, except that this time matches[2] holds the match for the host name subexpression, and the elements after it each shift one index higher.
By using these techniques, and perhaps after some practice experimenting with regular expression syntax, you can use Boost.Regex to validate and parse a wide variety of strings. But these examples only provide a glimpse into the expressive power of regular expressions. If you aren't already familiar with regular expressions, experiment some more; you may be surprised how often they do just what you need.