Simple common sense: about streams

Source: Internet. Editor: 程序博客网. Time: 2024/04/17 01:36

Reading one line from a file

Easy, this will do:

ifstream ifs("input.txt");
char buf[1000];
ifs.getline(buf, sizeof buf);

string input(buf);

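Nothing wrong with that, of course, but it involves needless fuss and an extra copy. Worse, if a line does not fit in the 1000-character buffer, getline sets failbit and you are left writing a loop with fiddlier buffer management. Isn't the following simpler?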

string input;
input.reserve(1000);
ifstream ifs("input.txt");
getline(ifs, input);

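It is not only simpler but also safer: the free function std::getline grows the string as needed, so running out of buffer space is no longer your problem. If you do not want reallocation to happen too often, just reserve a bit more space up front.

That is what I mean by "simple common sense": a lot of this is already there, I just never got around to using it.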

---------------------------------------------------------------------------

Reading a whole file into a string in one go

I hope your answer is not this:

string input;
while( !ifs.eof() )
{
    string line;
    getline(ifs, line);
    input.append(line).append(1, '\n');
}

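Sure, it works, more or less: because eof() is tested before the read rather than after it, the loop typically appends one newline too many. But isn't the following more in the spirit of C++?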

string input(
    istreambuf_iterator<char>(ifs.rdbuf()),
    istreambuf_iterator<char>()
);

As before, reserving space in advance may help performance:

string input;
input.reserve(10000);
input.assign(
    istreambuf_iterator<char>(ifs.rdbuf()),
    istreambuf_iterator<char>()
);

Simple, isn't it? Yet these are facts we often overlook.
One more note: doing it the following way is a problem:

    string input;
    input.assign(
        istream_iterator<char>(ifs),
        istream_iterator<char>()
    );

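The reason is that istream_iterator uses formatted extraction (operator>>), which skips whitespace by default, so you end up with a string that contains only the non-whitespace characters. Finally, if all you want is to send the contents of a file to another stream, it is hard to do better than this: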

fstream fs("temp.txt");
cout << fs.rdbuf();

So if you need to copy a file by hand (without using the operating system's API), this is about as good as it gets:

   ifstream ifs("in.txt");
   ofstream ofs("out.txt");
   ofs << ifs.rdbuf();

-------------------------------------------------------------------------

Options for opening a file

ios::in      Open file for reading
ios::out     Open file for writing
ios::ate     Initial position: end of file
ios::app     Every output is appended at the end of file
ios::trunc   If the file already existed it is erased
ios::binary  Binary mode
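The flags are combined with the bitwise OR operator. The following sketch contrasts ios::trunc with ios::app; the function name write_then_append and the idea of passing the path in as a parameter are assumptions made for this example only.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: write `first` with trunc (erasing any old contents),
// then add `second` with app, and return what the file contains afterwards.
std::string write_then_append(const std::string& path,
                              const std::string& first,
                              const std::string& second)
{
    {
        std::ofstream out(path.c_str(), std::ios::out | std::ios::trunc);
        out << first;
    }   // the file is closed when `out` goes out of scope
    {
        std::ofstream out(path.c_str(), std::ios::out | std::ios::app);
        out << second;
    }
    std::ifstream in(path.c_str(), std::ios::in);
    std::ostringstream content;
    content << in.rdbuf();
    return content.str();
}
```

Calling it twice with the same path should give the same result both times, because the trunc open discards whatever the previous call left behind.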

-------------------------------------------------------------------------

And the ios format flags

flag                    effect if set
ios_base::boolalpha     input/output bool objects as alphabetic names (true, false).
ios_base::dec           input/output integer in decimal base format.
ios_base::fixed         output floating point values in fixed-point notation.
ios_base::hex           input/output integer in hexadecimal base format.
ios_base::internal      the output is filled at an internal point enlarging the output up to the field width.
ios_base::left          the output is filled at the end enlarging the output up to the field width.
ios_base::oct           input/output integer in octal base format.
ios_base::right         the output is filled at the beginning enlarging the output up to the field width.
ios_base::scientific    output floating-point values in scientific notation.
ios_base::showbase      output integer values preceded by the numeric base.
ios_base::showpoint     output floating-point values including always the decimal point.
ios_base::showpos       output non-negative numeric preceded by a plus sign (+).
ios_base::skipws        skip leading whitespaces on certain input operations.
ios_base::unitbuf       flush output after each inserting operation.
ios_base::uppercase     output uppercase letters replacing certain lowercase letters.

Three other constants are also defined that can be used as masks:

constant                 value
ios_base::adjustfield    left | right | internal
ios_base::basefield      dec | oct | hex
ios_base::floatfield     scientific | fixed
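The masks matter because dec, oct and hex (and likewise scientific and fixed) are separate bits within one field. A rough sketch of how they are typically used follows; the helper names to_hex and to_fixed are invented for this illustration.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: format an integer in hexadecimal with its base prefix.
std::string to_hex(int value)
{
    std::ostringstream os;
    // The two-argument setf clears every bit of basefield first and then sets
    // hex, so dec and hex are never set at the same time.
    os.setf(std::ios_base::hex, std::ios_base::basefield);
    os.setf(std::ios_base::showbase);
    os << value;
    return os.str();
}

// Hypothetical helper: format a double in fixed-point notation.
std::string to_fixed(double value, int digits)
{
    std::ostringstream os;
    os.setf(std::ios_base::fixed, std::ios_base::floatfield);
    os.precision(digits);   // with fixed set: digits after the decimal point
    os << value;
    return os.str();
}
```

For instance, to_hex(255) yields "0xff" and to_fixed(3.14159, 2) yields "3.14".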

--------------------------------------------------------------------------

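Parsing a string with the delimiter I want, and reading data from a stream

This used to be a fairly troublesome topic, and all the more annoying because it comes up so often. In fact getline handles it quite well: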

    getline(cin, s, ';');
    while ( s != "quit" )
    {
        cout << s << endl;
        getline(cin, s, ';');
    }

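Simple, right? Note, though, that getline now treats only ; as the delimiter, so you have to type ;quit; to end the input; otherwise the spaces and newlines around the word are read into s along with it. This can of course be handled in code.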

Likewise, simple string parsing hardly calls for a Tokenizer or anything of that sort:

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main()
{
    string s("hello,world, this is a sentence; and a word, end.");
    stringstream ss(s);

    for ( ; ; )
    {
        string token;
        getline(ss, token, ',');
        if ( ss.fail() ) break;

        cout << token << endl;
    }
}

Output (the third and fourth tokens keep the space that follows the comma):

hello
world
 this is a sentence; and a word
 end.

Pretty, isn't it? The drawback of this approach is that only a single character can serve as the delimiter.
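When several delimiter characters are needed, std::string's own search functions are one possible alternative to getline. The following is a sketch under that assumption; split_any is a made-up name, and dropping empty tokens is a design choice of this example rather than anything the original text prescribes.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `s` on any character found in `delims`,
// skipping empty tokens.
std::vector<std::string> split_any(const std::string& s, const std::string& delims)
{
    std::vector<std::string> tokens;
    std::string::size_type start = s.find_first_not_of(delims);
    while (start != std::string::npos) {
        std::string::size_type end = s.find_first_of(delims, start);
        // When end is npos, substr simply takes the rest of the string.
        tokens.push_back(s.substr(start, end - start));
        start = s.find_first_not_of(delims, end);
    }
    return tokens;
}
```

For example, split_any("hello,world; end", ",; ") produces the three tokens hello, world and end.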

--------------------------------------------------------------------------

Sending what used to go to the screen into a file, without changing cout to fs everywhere

#include <fstream>
#include <iostream>

using namespace std;

int main()
{
    ofstream ofs("temp.txt");

    cout << "output to screen." << endl;
    ofs << "output to file." << endl;

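    // Swap cout's buffer for the file's buffer and remember the old one.
    streambuf *buf = cout.rdbuf(ofs.rdbuf());

    cout << "output to cout, but to file." << endl;

    // Put the original buffer back before ofs is destroyed, so that cout
    // is never left pointing at a buffer that no longer exists.
    cout.rdbuf(buf);
}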

What appears on the screen:

output to screen.

What ends up in the file:

output to file.
output to cout, but to file.

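In other words, replacing an ostream's rdbuf is all it takes to redirect it. The trick does not carry over directly to fstream and stringstream, though: they own their buffers, and their own rdbuf() hides the overload that installs a new one.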
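Because the stream must get its original buffer back before the replacement buffer goes away, it can be convenient to tie the swap to a scope. The class below is a sketch of that idea; RedirectGuard and capture_cout_demo are hypothetical names, and an ostringstream stands in for the file so the example needs no disk access.

```cpp
#include <iostream>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical RAII guard: point `stream` at another buffer and restore the
// original buffer when the guard goes out of scope.
class RedirectGuard {
public:
    RedirectGuard(std::ostream& stream, std::streambuf* target)
        : stream_(stream), saved_(stream.rdbuf(target)) {}
    ~RedirectGuard() { stream_.rdbuf(saved_); }

private:
    RedirectGuard(const RedirectGuard&);             // not copyable
    RedirectGuard& operator=(const RedirectGuard&);

    std::ostream& stream_;
    std::streambuf* saved_;
};

// Capture whatever is written to std::cout while the guard is alive.
std::string capture_cout_demo()
{
    std::ostringstream captured;
    {
        RedirectGuard guard(std::cout, captured.rdbuf());
        std::cout << "goes to the string";
    }   // from here on std::cout writes to its original buffer again
    return captured.str();
}
```

The guard is declared after the target stream, so it is destroyed first and the restore always happens while the target buffer is still alive.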

--------------------------------------------------------------------------

About istream_iterator and ostream_iterator

The classic ostream_iterator example is output via copy:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

using namespace std;

int main()
{
    vector<int> vect;
    for ( int i = 1; i <= 9; ++i )
        vect.push_back(i);

    copy(vect.begin(), vect.end(),
        ostream_iterator<int>(cout, " ")
    );
    cout << endl;

    ostream_iterator<double> os_iter(cout, " ~ ");
    *os_iter = 1.0;
    os_iter++;
    *os_iter = 2.0;
    *os_iter = 3.0;
}

Output:

1 2 3 4 5 6 7 8 9
1 ~ 2 ~ 3 ~

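Clearly, the point of ostream_iterator is to let a stream be driven through an iterator interface, so that algorithms can be applied to streams; this is the essence of the STL. Combined with the file-reading technique from earlier, it gives the handiest way to display a file: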

    copy(istreambuf_iterator<char>(ifs.rdbuf()),
         istreambuf_iterator<char>(),
         ostreambuf_iterator<char>(cout)
    );

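Likewise, with the following statement the output comes out with all whitespace removed:

    copy(istream_iterator<char>(ifs),
         istream_iterator<char>(),
         ostream_iterator<char>(cout)
    );

That is most likely not what you want. What if you insist on using istream_iterator rather than istreambuf_iterator? There is still a way:

    copy(istream_iterator<char>(ifs >> noskipws),
         istream_iterator<char>(),
         ostream_iterator<char>(cout)
    );

This is not the recommended approach, though; it is considerably less efficient than the first one.
If a file temp.txt contains the following, this way of reading data from a file into a vector should impress you.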

12345 234 567
89 10

Program:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

using namespace std;

int main()
{
    ifstream ifs("temp.txt");

    vector<int> vect;
    vect.assign(istream_iterator<int>(ifs),
        istream_iterator<int>()
    );

    copy(vect.begin(), vect.end(), ostream_iterator<int>(cout, " "));
}

Output:

12345 234 567 89 10

Cool, isn't it? The drudgery of detecting end of file and advancing the file position is all handled by istream_iterator.

-----------------------------------------------------------------------

Other algorithms combined with iterators

Counting the lines in a file:

    int line_count =
        count(istreambuf_iterator<char>(ifs.rdbuf()),
              istreambuf_iterator<char>(),
              '\n');

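Strictly speaking, this counts the newline characters in the file. In the same way you can count any other character in the file, or the occurrences of some token: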

    int token_count =
        count(istream_iterator<string>(ifs),
              istream_iterator<string>(),
              "#include");

Note that this counts "#include" only where it stands alone as a whitespace-delimited token; where it runs into other characters, it is not counted.

------------------------------------------------------------------------
Manipulator

What is a manipulator? Put simply, it is a function that takes a stream as its argument and returns a stream. Take noskipws, used above; its definition looks like this:

inline ios_base&
noskipws(ios_base& __base)
{
    __base.unsetf(ios_base::skipws);
    return __base;
}

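Here it uses the more general ios_base. Knowing this, you will probably no longer find writing your own manipulator intimidating. The following rather pointless manipulator discards all input in a stream up to the first semicolon (including the semicolon itself):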

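template <class charT, class traits>
inline std::basic_istream<charT, traits>&
ignoreToSemicolon (std::basic_istream<charT, traits>& s)
{
    // Skip everything up to and including the first ';'.
    s.ignore(std::numeric_limits<std::streamsize>::max(), s.widen(';'));
    return s;
}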

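Note that it does not skip any later semicolons, because ignore runs only once. More generally, a manipulator can also take arguments. The following is the general version of ignoreToSemicolon: it takes one parameter, and the stream discards all input up to and including the first occurrence of that character. It takes slightly more work to write: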

struct IgnoreTo {
    char ignoreTo;
    IgnoreTo(char c) : ignoreTo(c)
    {}
};

std::istream& operator >> (std::istream& s, const IgnoreTo& manip)
{
    s.ignore(std::numeric_limits<std::streamsize>::max(), s.widen(manip.ignoreTo));
    return s;
}

The usage, however, is much the same:

    copy(istream_iterator<char>(ifs >> noskipws >> IgnoreTo(';')),
         istream_iterator<char>(),
         ostream_iterator<char>(cout)
    );

The effect is the same as that of ignoreToSemicolon.
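To see a parameterized manipulator in action without a file, it can be applied to a string stream. This is a self-contained sketch: SkipPast is a hypothetical stand-in with the same shape as IgnoreTo, and text_after is a helper invented for the demonstration.

```cpp
#include <istream>
#include <iterator>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical manipulator object: remembers the character to skip past.
struct SkipPast {
    char stop;
    explicit SkipPast(char c) : stop(c) {}
};

// Applying the manipulator discards input up to and including `stop`.
std::istream& operator>>(std::istream& s, const SkipPast& manip)
{
    s.ignore(std::numeric_limits<std::streamsize>::max(), manip.stop);
    return s;
}

// Hypothetical helper: return whatever follows the first `c` in `text`.
std::string text_after(const std::string& text, char c)
{
    std::istringstream in(text);
    in >> SkipPast(c);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

For example, text_after("skip me; keep; this", ';') returns " keep; this": only the first semicolon is consumed, and the later one stays in the result.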

----------------------------------------------------------------------------