Boost Study Notes: tokenizer

Source: Internet. Editor: 程序博客网. Time: 2024/04/29 21:40

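In real applications, splitting strings into tokens matters a lot. For a long time I could not find a simple library for it and usually fell back on MFC's CString to do the splitting. lex can handle this work too, but that feels like using a sledgehammer to crack a nut.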

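I have recently been learning Boost and found that its tokenizer covers this need and is quite simple to use. The examples in the Boost documentation are short and not hard to extend.
The tokenizer class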
template <class TokenizerFunc = char_delimiters_separator<char>,
class Iterator = std::string::const_iterator,
class Type = std::string>
class tokenizer
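Here TokenizerFunc is the function object that performs the splitting, Iterator is the iterator type used to traverse the input sequence, and Type is the string type of the resulting tokens.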
Example 1: splitting a string with the default TokenizerFunc
// simple_example_1.cpp
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>
int main() {
    using namespace std;
    using namespace boost;
    string s = "This is, a test";
    // tokenizer<> uses the default TokenizerFunc, which splits on
    // whitespace and punctuation.
    tokenizer<> tok(s);
    for (tokenizer<>::iterator beg = tok.begin(); beg != tok.end(); ++beg) {
        cout << *beg << "\n";
    }
}
Example 1 instantiates tokenizer with its default template arguments, so the default TokenizerFunc is used. The constructor it calls is:
template<class Container>
    tokenizer(const Container& c,const TokenizerFunc& f = TokenizerFunc())
Example 2: splitting with char_separator
// char_sep_example_1.cpp
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>
int main() {
    std::string str = ";;Hello|world||-foo--bar;yow;baz|";
    typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
    // Every character in "-;|" is a delimiter; empty tokens are dropped.
    boost::char_separator<char> sep("-;|");
    tokenizer tokens(str, sep);
    for (tokenizer::iterator tok_iter = tokens.begin();
         tok_iter != tokens.end(); ++tok_iter)
        std::cout << "<" << *tok_iter << "> ";
    std::cout << "\n";
    return EXIT_SUCCESS;
}
The output is:
    <Hello> <world> <foo> <bar> <yow> <baz>
The example above uses char_separator, one implementation of TokenizerFunc, to handle a more complex case.

If the tokenizer template is instantiated for wide strings (std::wstring), Chinese text can be processed as well:
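#include <iostream>
#include <locale>
#include <string>
#include <boost/tokenizer.hpp>
int main() {
    // L"..." literals keep the text wide regardless of the UNICODE setting.
    std::wstring str = L";;Hello我是一个兵，|world||-foo--bar;yow;baz|";
    typedef boost::tokenizer<boost::char_separator<wchar_t>,
                             std::wstring::const_iterator, std::wstring>
        tokenizer;
    boost::char_separator<wchar_t> sep(L"-;|,是");
    tokenizer tokens(str, sep);
    // "chs" is the Windows (MSVC) name for a Simplified Chinese locale.
    std::wcout.imbue(std::locale("chs"));
    for (tokenizer::iterator tok_iter = tokens.begin();
         tok_iter != tokens.end(); ++tok_iter)
        std::wcout << "<" << *tok_iter << "> ";
    std::wcout << "\n";
}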
The output is:
<Hello我> <一个兵，> <world> <foo> <bar> <yow> <baz>

More examples can be found in the Boost documentation.