Regular Expressions in C++ with Boost.Regex(4)
Source: Internet | Editor: 程序博客网 | Posted: 2024/04/27 22:29
Searching
Matching and parsing a single string in its entirety does not address the important and ubiquitous use case of searching a string that contains a substring you want, but possibly a lot of other characters you don't.
Like matching, Boost.Regex lets you search a string for a regular expression in two ways. In the simplest case, you may just want to know if a given string contains a match for your regular expression. Example 3 is a trivial implementation of the grep program. It reads each line from a file and prints the line if it contains a substring that satisfies the regular expression pattern.
#include <string>
#include <iostream>
#include <fstream>
#include <boost/regex.hpp>
using namespace std;
int main(int argc, char** argv) {
    // Safety checks omitted...
    boost::regex re(argv[1]);   // the pattern comes from the command line
    string file(argv[2]);
    string line;
    ifstream in(file.c_str());
    // getline fails at end of file, so no stale line is searched
    while (getline(in, line))
    {
        // regex_search succeeds if any substring of line matches re
        if (boost::regex_search(line, re))
        {
            cout << line << endl;
        }
    }
}
Example 3. Trivial grep
You can see that you use regex_search in the same way as regex_match.
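You can try the same contrast without Boost. The sketch below is an illustration, not part of Example 3. It swaps in C++11's std::regex, the standardized descendant of Boost.Regex, and reads lines from an in-memory string instead of a file. The helper name grep_text is hypothetical. The sketch assumes std::regex_search and std::regex_match behave here like their boost:: counterparts, which take the same arguments.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns each line of `text` that contains a match
// for `re`, as Example 3 does. With whole_line = true it calls regex_match
// instead, so a line is kept only if the pattern matches all of it.
std::vector<std::string> grep_text(const std::string& text,
                                   const std::regex& re,
                                   bool whole_line = false) {
    std::vector<std::string> hits;
    std::istringstream in(text);
    std::string line;
    // getline fails at end of input, so no line is processed twice
    while (std::getline(in, line)) {
        bool hit = whole_line ? std::regex_match(line, re)
                              : std::regex_search(line, re);
        if (hit) {
            hits.push_back(line);
        }
    }
    return hits;
}
```

Take the pattern error and the text "error: disk full\nall good\nERROR 42\nerror\n". The search form keeps the first and last lines. The whole-line form keeps only the last, because regex_match requires the pattern to cover the entire line.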
This comes in handy sometimes, but has limited appeal. More often, you will want to enumerate all substrings that match a given pattern. For example, maybe you are writing a web crawler and want to iterate over all anchor tags in a page. Craft a regular expression to grab anchor tags:
<a\s+href="([\-:\w\d\.\/]+)">
You don't want the whole line returned, though, as in the grep example above; you want the target URL. To get it, use the first marked subexpression, which match_results stores at index 1 (index 0 holds the entire match). Example 4, a slightly modified version of Example 3, does just that.
#include <string>
#include <iostream>
#include <fstream>
#include <boost/regex.hpp>
using namespace std;
int main(int argc, char** argv) {
    // Safety checks omitted...
    // Each backslash is doubled for the compiler, and \" embeds a quote
    boost::regex re("<a\\s+href=\"([\\-:\\w\\d\\.\\/]+)\">");
    string file(argv[1]);
    boost::smatch matches;   // match results over string::const_iterator
    string sbuf;
    string::const_iterator begin, end;
    ifstream in(file.c_str());
    while (getline(in, sbuf))
    {
        begin = sbuf.begin();
        end = sbuf.end();   // same iterator type as begin, as regex_search requires
        while (boost::regex_search(begin, end, matches, re))
        {
            // matches[1] is the first marked subexpression: the URL
            string url(matches[1].first, matches[1].second);
            cout << "URL: " << url << endl;
            // Update the beginning of the range to the character
            // following the captured URL
            begin = matches[1].second;
        }
    }
}
Example 4. Enumerating anchor tags
The hard-coded regular expression in Example 4 contains lots of backslashes. This is necessary because I am escaping certain characters twice: once for the compiler, and once for the regular expression engine.
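A C++11 raw string literal removes the compiler's layer of escaping, so the pattern can be written exactly as the regex engine sees it. This sketch is an addition to the original and uses std::regex in place of boost::regex. The names kEscaped, kRaw, and first_url are hypothetical. The literal needs a custom delimiter (re here) because the pattern itself contains the two characters )" that would end a plain R"(...)" literal early.

```cpp
#include <regex>
#include <string>

// Example 4's pattern, with every backslash doubled for the compiler
const std::string kEscaped = "<a\\s+href=\"([\\-:\\w\\d\\.\\/]+)\">";

// The same pattern as a raw string literal; the custom delimiter "re"
// is needed because the pattern contains a closing paren followed by a quote
const std::string kRaw = R"re(<a\s+href="([\-:\w\d\.\/]+)">)re";

// Hypothetical helper: returns the URL of the first anchor tag in `html`,
// or an empty string if there is none.
std::string first_url(const std::string& html) {
    static const std::regex re(kRaw);
    std::smatch m;
    if (std::regex_search(html, m, re)) {
        return m[1].str();   // m[1] holds the first marked subexpression
    }
    return "";   // no anchor tag found
}
```

The two string constants compare equal, and first_url applied to <p><a href="http://example.com/x.html">x</a></p> returns http://example.com/x.html.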
Example 4 uses a different overload of regex_search than Example 3. This version takes two bidirectional iterators that mark the beginning and end of the range of characters to be searched, plus a match_results object that receives the match. To access every matching substring, all I have to do is update begin to point to the character following the captured URL, which is in matches[1].second. The next search then starts from that position.
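The loop from Example 4 can be repackaged as a testable function. This is a sketch under stated assumptions: it uses std::regex instead of boost::regex, works on an in-memory string rather than a file, and the name extract_urls is hypothetical. It declares begin and end as the same const_iterator type, because regex_search deduces a single iterator type from both arguments.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: the loop from Example 4, with std::regex standing in
// for boost::regex. Returns every anchor URL found in `text`, in order.
std::vector<std::string> extract_urls(const std::string& text) {
    static const std::regex re("<a\\s+href=\"([\\-:\\w\\d\\.\\/]+)\">");
    std::vector<std::string> urls;
    std::smatch matches;
    // regex_search deduces one iterator type, so begin and end must match
    std::string::const_iterator begin = text.begin();
    const std::string::const_iterator end = text.end();
    while (std::regex_search(begin, end, matches, re)) {
        urls.push_back(matches[1].str());   // the captured URL
        // Resume just past the whole match, not just the captured URL
        begin = matches[0].second;
    }
    return urls;
}
```

One deliberate difference from Example 4: this version resumes from matches[0].second, the end of the whole match. Both choices make progress with this pattern, because every match consumes at least <a before the subexpression starts. Resuming after the whole match simply avoids rescanning the closing ">.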
This is not the only way to iterate over all occurrences of a pattern. If you prefer (or require) iterator semantics, use a regex_token_iterator, which is an iterator interface to the results from a regular expression search. In Example 4, you could just as easily have iterated over the results of the URL search:
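// s is the string being searched; re is the regex from Example 4
boost::sregex_token_iterator p(s.begin(), s.end(), re, 0);
boost::sregex_token_iterator end;   // end-of-sequence iterator
for (; p != end; ++p)
{
    // *p is a sub_match spanning one whole match
    string m(p->first, p->second);
    cout << m << endl;
}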
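This small sketch, not from the original article, collects a token iterator's output into a vector so the modes can be compared side by side. Passing 0 yields each whole match, and passing -1 yields the text between matches. It uses C++11's std::sregex_token_iterator, whose constructor takes the same (begin, end, regex, submatch) arguments as boost::sregex_token_iterator. The name collect_tokens is hypothetical, and the header-tag pattern below is an illustrative assumption.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: runs a token iterator over `s` and collects the tokens.
// submatch 0 yields each whole match; -1 yields the text between matches.
std::vector<std::string> collect_tokens(const std::string& s,
                                        const std::regex& re, int submatch) {
    std::vector<std::string> out;
    std::sregex_token_iterator p(s.begin(), s.end(), re, submatch);
    std::sregex_token_iterator end;   // default-constructed = end of sequence
    for (; p != end; ++p) {
        out.push_back(p->str());
    }
    return out;
}
```

Take the pattern <h[1-6]>[^<]*</h[1-6]> and the input intro<h1>Title</h1>body one<h3>Sub</h3>body two. Passing 0 should yield the two header tags. Passing -1 should yield intro, body one, and body two, the sections between the headers.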
That's not all, though. The token iterator here passes a zero as the last argument to its constructor. This tells it to iterate over the substrings that satisfy the regular expression. Change it to -1 and you get the opposite: iteration over the substrings that do not satisfy the expression. In other words, it tokenizes the string, where each delimiter is something that satisfies the regular expression. This is a cool feature, because it lets you tokenize a string of characters based on complex delimiters. To use the example of parsing a web page, you could break the document into sections by its headers, using header tags such as <h1>...</h1>, <h3>...</h3>, etc.
Stuff to Check Out
There is, of course, more to Boost.Regex than I've presented here, but this should give you a good idea of what you can do with regular expressions in C++. The documentation on the Boost.Regex page is comprehensive, and there are plenty of examples you can copy and experiment with. In addition to searching string s as I did above, you can:
Above all, you should experiment with regular expression syntax. There are different ways to do the same thing, and it's fun to see how concise you can make an expression that does what you want. Once you're a pro at regular expressions, you will be surprised at how often you can use them to validate, search, or parse a string.
Conclusion
Boost.Regex is the library in the Boost project that implements a regular expression engine in C++. You can use it to match, search, or search and replace with regular expressions against a target string, instead of writing ugly and cumbersome string-parsing code. Boost.Regex has been accepted as part of the next C++ standard library, and you will see it appearing in implementations of TR1 (in the tr1 namespace) from standard library vendors very soon. Check out Boost.Regex to get a feel for how useful it is, and while you're at it, take a look at many of the other libraries in Boost; there's a lot of good stuff there.
Ryan Stephens is a software engineer, writer, and student living in Tempe, Arizona. He enjoys programming in virtually any language, especially C++.