Introduction to Using Boost Tokenizer

Source: Internet | Published by: 软件家园 | Editor: 程序博客网 | Time: 2024/05/16 14:40

-------------------------

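1. Introduction

Boost Tokenizer breaks a character sequence into a series of tokens. You can supply a TokenizerFunction to control how the sequence is split. If you do not specify one, the default splits on whitespace and punctuation and discards both.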

2. A few simple examples

Here is a simple example:

// simple_example_1.cpp
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main()
{
    using namespace std;
    using namespace boost;
    string s = "This is,  a test";
    tokenizer<> tok(s);
    for (tokenizer<>::iterator beg = tok.begin(); beg != tok.end(); ++beg) {
        cout << *beg << "\n";
    }
}

The output is:
This
is
a
test

Note that the punctuation has been filtered out.

The next example splits the string by character offsets:

// simple_example_3.cpp
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main()
{
    using namespace std;
    using namespace boost;
    string s = "12252001";
    int offsets[] = {2, 2, 4};   // three offsets are specified here
    offset_separator f(offsets, offsets + 3);
    tokenizer<offset_separator> tok(s, f);
    for (tokenizer<offset_separator>::iterator beg = tok.begin(); beg != tok.end(); ++beg) {
        cout << *beg << "\n";
    }
}

The output is:
12
25
2001

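3. What is a TokenizerFunction

A TokenizerFunction is a function object that finds the next token matching its rules in a character sequence. Three TokenizerFunction models are currently provided: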
* escaped_list_separator: mainly used to parse CSV-formatted strings

  explicit escaped_list_separator(Char e = '\\', Char c = ',', Char q = '\"')
  escaped_list_separator(string_type e, string_type c, string_type q)

* offset_separator: mainly used to split a sequence into tokens of given lengths

  template<typename Iter>
  offset_separator(Iter begin, Iter end, bool bwrapoffsets = true, bool breturnpartiallast = true)

* char_separator: mainly used to split a sequence on specific delimiter characters

  explicit char_separator(const Char* dropped_delims,
                          const Char* kept_delims = "",
                          empty_token_policy empty_tokens = drop_empty_tokens)

4. A simple example: parsing /etc/passwd

/**
 * @auth lemo.lu
 * @date 2011.11.03
 *
 * Example of Boost tokenizer template usage. This example uses a
 * delimiter separator.
 */
// stl headers
#include <cstddef>                   // size_t
#include <iostream>                  // iostream
#include <string>                    // string
#include <fstream>                   // ifstream
// boost
#include <boost/tokenizer.hpp>       // boost Tokenizer

int main()
{
    std::ifstream passwdFile("/etc/passwd");
    // holds one passwd line; it must outlive the tokenizer, which only
    // stores iterators into the string it is given
    std::string passwdLine;

    typedef boost::tokenizer<boost::char_separator<char> > passwdTokenizer;
    // set a TokenizerFunction: drop the delimiter ":", keep no delimiters,
    // and keep empty tokens
    boost::char_separator<char> tokenSep(":", "", boost::keep_empty_tokens);
    // passwd field names
    static const char* passwd_st[] = {
        "Account", "password", "UID", "GID", "GECOS", "Dir", "Shell"
    };
    const std::size_t passwd_n = sizeof(passwd_st) / sizeof(passwd_st[0]);

    // iterate over the passwd file; the loop ends when getline fails
    while (std::getline(passwdFile, passwdLine))
    {
        passwdTokenizer tok(passwdLine, tokenSep);
        std::size_t passwd_c = 0;
        // stop at the last known field name to avoid reading past the array
        for (passwdTokenizer::iterator curTok = tok.begin();
             curTok != tok.end() && passwd_c < passwd_n; ++curTok)
            std::cout << passwd_st[passwd_c++] << ":" << *curTok << std::endl;
        std::cout << "---------------------" << std::endl;
    }
    passwdFile.close();
}

Part of the output:

Account:root
password:x
UID:0
GID:0
GECOS:root
Dir:/root
Shell:/bin/bash
---------------------
Account:daemon
password:x
UID:1
GID:1
GECOS:daemon
Dir:/usr/sbin
Shell:/bin/sh
---------------------

5. References

http://www.boost.org/doc/libs/1_47_0/libs/tokenizer/