A beginner's guide to writing a custom stream buffer (std::streambuf)

Source: Internet, Published by: mac apache, Editor: 程序博客网, Time: 2024/05/29 03:21

Original: http://www.mr-edd.co.uk/blog/beginners_guide_streambuf

Streams are one of the major abstractions provided by the C++ standard library. Every newcomer to C++ is taught to write "Hello, world!" to the console using std::cout, which is itself an std::ostream object, just as std::cin is an std::istream object.

There's a lot more to streams than cout and cin, however. In this post I'll look at how we can extend C++ streams by creating our own custom stream buffers. Note that "beginner" in the title of this post refers to someone who has never implemented a custom stream buffer before, not necessarily a beginner to C++ in general; you will need at least a basic knowledge of how C++ works in order to follow this post. The code for all the examples is available at the end.

The C++ standard library provides the primary interface for manipulating the contents of files on disk through the std::ofstream, std::ifstream and std::fstream classes. We also have stringstreams, which allow you to treat strings as streams and therefore compose a string from the textual representations of various types.

std::ostringstream oss;
oss << "Hello, world!\n";
oss << 123 << '\n';
std::string s = oss.str();

Similarly, we're able to read data from a string by employing an std::istringstream and using the natural extraction (>>) operator.
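The following minimal sketch shows the extraction side: it pulls a word and an integer back out of a string with std::istringstream. The helper name parse_word_and_number is hypothetical. It reports whether both extractions succeeded.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: extracts a word and a number from text such as "answer 42".
// Returns true only if both extractions succeeded.
bool parse_word_and_number(const std::string &text, std::string &word, int &number)
{
    std::istringstream iss(text);
    iss >> word >> number;          // operator>> skips leading whitespace
    return static_cast<bool>(iss);  // false if either extraction failed
}
```

For example, "answer 42" yields the word "answer" and the number 42. With "answer forty-two" the integer extraction fails and the function returns false.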

Boost's lexical_cast facility uses this to good effect. It allows conversions between types whose text representations are compatible, and it gives you a quick way to get a string representation of any object of an 'output streamable' type.

#include <boost/lexical_cast.hpp>
#include <cassert>
#include <string>

using boost::lexical_cast;
using std::string;
int x = 5;
string s = lexical_cast<string>(x);
assert(s == "5");
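The idea behind lexical_cast can be sketched with the standard library alone. Write the source value to a std::stringstream, read it back as the target type, and check that nothing is left over. This is not Boost's implementation. stream_cast is a hypothetical name, and this version assumes the target type is default-constructible.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stream_cast: the same idea as lexical_cast, not Boost's code.
// Writes `in` to a stringstream and reads it back as a Target.
template <typename Target, typename Source>
Target stream_cast(const Source &in)
{
    std::stringstream ss;
    Target out;  // assumption: Target is default-constructible
    // Fail if writing fails, reading fails, or non-whitespace input is left over.
    if (!(ss << in) || !(ss >> out) || !(ss >> std::ws).eof())
        throw std::invalid_argument("stream_cast: conversion failed");
    return out;
}
```

So stream_cast<std::string>(5) gives "5". Converting "5 apples" to int throws, because " apples" is left unread.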

At the heart of this flexibility is the stream buffer, which deals with the buffering and transportation of characters to or from their target or source, be it a file, a string, the system console or something else entirely. We could stream text over a network connection or from the flash memory of a particular device all through the same interface. The stream buffer is defined in a way that is orthogonal to the stream that is using it and so we are often able to swap and change the buffer a given stream is using at any given moment to redirect output elsewhere, if we so desire. I guess C++ streams are an example of the strategy design pattern in this respect.

For instance, we can redirect the standard logging stream (std::clog) into a string stream, rather than its usual target, by making it use the string stream's buffer:

#include <iostream>
#include <iomanip>
#include <string>
#include <sstream>

int main()
{
    std::ostringstream oss;
    // Make clog use the buffer from oss
    std::streambuf *former_buff =
        std::clog.rdbuf(oss.rdbuf());
    std::clog << "This will appear in oss!" << std::flush;
    std::cout << oss.str() << '\n';
    // Give clog back its previous buffer
    std::clog.rdbuf(former_buff);
    return 0;
}
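The example above restores clog's buffer by hand. If an exception were thrown between the two rdbuf() calls, clog would be left pointing at a destroyed buffer. A small RAII guard avoids that. rdbuf_guard and capture_clog are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical RAII helper: installs a replacement buffer on construction and
// restores the stream's original buffer on destruction, even during unwinding.
class rdbuf_guard
{
public:
    rdbuf_guard(std::ios &stream, std::streambuf *replacement)
        : stream_(stream), former_(stream.rdbuf(replacement)) {}
    ~rdbuf_guard() { stream_.rdbuf(former_); }

    rdbuf_guard(const rdbuf_guard &) = delete;
    rdbuf_guard &operator=(const rdbuf_guard &) = delete;

private:
    std::ios &stream_;
    std::streambuf *former_;
};

std::string capture_clog()
{
    std::ostringstream oss;
    {
        rdbuf_guard guard(std::clog, oss.rdbuf());
        std::clog << "This will appear in oss!" << std::flush;
    }  // clog gets its original buffer back here
    return oss.str();
}
```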

However, creating your own stream buffer can be a little tricky, or at least a little intimidating when you first set out to do so. So the idea of this post is to provide some example implementations for a number of useful stream buffers as a platform for discussion.

Let's first look at some of the underlying concepts behind a stream buffer. All stream buffers are derived from the std::streambuf base class, whose virtual functions we must override in order to implement the customised behaviour of our particular stream buffer type. An std::streambuf is an abstraction of an array of chars that has its data sourced from or sent to a sequential access device. Under certain conditions the array will be re-filled (for an input buffer) or flushed and emptied (for an output buffer).

When inserting data into an ostream (using <<, for example), data is written into the buffer's array. When this array overflows, the data in the array is flushed to the destination (or sink) and the state associated with the array is reset, ready for more characters.

When extracting data from an istream (using >>, for example), data is read from the buffer's array. When there is no more data left to read, that is, when the array underflows, the contents of the array are re-filled with data from the source and the state associated with the array is reset.

To keep track of the different areas in the stream buffer arrays, six pointers are maintained internally, three for input and three for output.

For an output stream buffer, there are:

the put base pointer, as returned from std::streambuf::pbase(), which points to the first element of the buffer's internal array,
the put pointer, as returned from std::streambuf::pptr(), which points to the next character in the buffer that may be written to,
and the end put pointer as returned from std::streambuf::epptr(), which points to one-past-the-last-element of the buffer's internal array.

Typically, pbase() and epptr() won't change at all; only pptr() changes as the buffer is used.
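The behaviour of the three put pointers can be watched with a toy buffer. This sketch assumes a fixed four-character put area. tiny_put_buffer is a hypothetical class that appends its contents to a string whenever the area fills. You can see pptr() advance on each write and overflow() run only once pptr() reaches epptr().

```cpp
#include <cassert>
#include <cstddef>
#include <streambuf>
#include <string>

// Hypothetical buffer for illustration only: a fixed 4-char put area that is
// "flushed" into the string `out` each time it fills up.
class tiny_put_buffer : public std::streambuf
{
public:
    tiny_put_buffer() { setp(area_, area_ + sizeof area_); }

    std::ptrdiff_t used() const { return pptr() - pbase(); }       // chars waiting
    std::ptrdiff_t capacity() const { return epptr() - pbase(); }  // fixed here

    int overflows = 0;  // how many times overflow() was called
    std::string out;    // everything flushed so far

protected:
    int_type overflow(int_type ch) override
    {
        ++overflows;
        out.append(pbase(), pptr());        // flush the full put area
        setp(area_, area_ + sizeof area_);  // pptr() back to pbase()
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            sputc(traits_type::to_char_type(ch));  // there is room now
        return traits_type::not_eof(ch);
    }

private:
    char area_[4];
};
```

Writing "abcdef" one character at a time triggers exactly one overflow(), on the fifth character. At that point "abcd" is flushed and "ef" is left waiting in the put area.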

For an input stream buffer, we have 3 different pointers to contend with, though they have a roughly analogous purpose. We have:

the end back pointer, as returned from std::streambuf::eback(), which points to the last character (lowest in address) in the buffer's internal array into which a character may be put back,
the get pointer, as returned from std::streambuf::gptr(), which points to the character in the buffer that will be extracted next by the istream,
and the end get pointer, as returned from std::streambuf::egptr(), which points to one-past-the-last-element of the buffer's internal array.

Again, it is typically the case that eback() and egptr() won't change during the lifetime of the streambuf.
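A similar toy shows the get pointers at work. In this hypothetical tiny_get_buffer the whole input is the get area. eback() and egptr() therefore stay fixed, while gptr() moves forward as characters are read and back again on sungetc().

```cpp
#include <cassert>
#include <cstddef>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical buffer for illustration only: the entire input is copied into
// one get area, so no underflow()-driven refills ever happen.
class tiny_get_buffer : public std::streambuf
{
public:
    explicit tiny_get_buffer(const std::string &text) : area_(text.begin(), text.end())
    {
        char *base = area_.data();
        setg(base, base, base + area_.size());  // eback, gptr, egptr
    }

    std::ptrdiff_t consumed() const { return gptr() - eback(); }   // also the put-back room
    std::ptrdiff_t remaining() const { return egptr() - gptr(); }

private:
    std::vector<char> area_;
};
```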

Input stream buffers, written for use with istreams, tend to be a little more complex than output buffers, written for ostreams. This is because we should endeavour to let the user put characters back into the stream, to a reasonable degree, which is done through std::istream's putback() and unget() member functions. This means we need to reserve a section at the start of the array as put-back space. Typically one character of put-back space is expected, though in general there is no reason we can't provide more.
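From the user's side, put-back is what lets a parser peek at a character and then return it to the stream. The sketch below uses std::istringstream, whose buffer already provides put-back space. read_int_or_minus_one is a hypothetical helper.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads an integer if the next non-space character is a
// digit, otherwise returns -1. The peeked character is always put back.
int read_int_or_minus_one(std::istream &in)
{
    char c;
    if (!(in >> c))
        return -1;
    in.putback(c);  // relies on the buffer having put-back space
    if (c < '0' || c > '9')
        return -1;
    int value = 0;
    in >> value;
    return value;
}
```

After a failed attempt on "x1", the stream still starts with "x1", because the 'x' was put back.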

Now you may have noticed that we are deriving from std::streambuf in order to create both an output buffer and an input buffer; there is no std::istreambuf or std::ostreambuf. This is because it is possible to provide a stream buffer that manipulates the same internal array as a buffer for both reading from and writing to an external entity. This is what std::fstream does, for example. However, implementing a dual-purpose streambuf is a fair bit trickier, so I won't be considering it in this post.

It is also possible to create buffers for wide character streams. std::streambuf is actually a typedef for std::basic_streambuf<char>. Similarly there exists std::wstreambuf, a typedef for std::basic_streambuf<wchar_t>, which is the equivalent for wide character streams.
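Both types are instantiations of std::basic_streambuf, so code written against the template works for narrow and wide buffers alike. The minimal sketch below checks this. count_remaining is a hypothetical helper that drains a buffer and counts its characters.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <streambuf>
#include <string>
#include <type_traits>

// streambuf and wstreambuf are two instantiations of one template...
static_assert(std::is_same<std::streambuf, std::basic_streambuf<char>>::value, "");
static_assert(std::is_same<std::wstreambuf, std::basic_streambuf<wchar_t>>::value, "");

// ...so one function template can serve both (hypothetical helper).
template <typename CharT, typename Traits>
std::size_t count_remaining(std::basic_streambuf<CharT, Traits> &buf)
{
    std::size_t n = 0;
    while (!Traits::eq_int_type(buf.sbumpc(), Traits::eof()))
        ++n;
    return n;
}
```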

Example 1: `FILE` buffers to integrate with C code

For our first example, let's look at the case where you might have some legacy C code to contend with. Let's say you're handed a FILE* but you want to use a C++ stream interface to read or write data, rather than the traditional FILE* interface provided by the C standard library. We'll start out with the case where we have a FILE* that is open for reading and we would like to wrap it in an std::istream in order to extract the data.

Here's the interface.

#include <streambuf>
#include <vector>
#include <cstdlib>
#include <cstdio>

class FILE_buffer : public std::streambuf
{
    public:
        explicit FILE_buffer(FILE *fptr, std::size_t buff_sz = 256, std::size_t put_back = 8);

    private:
        // overrides base class underflow()
        int_type underflow();

        // copy ctor and assignment not implemented;
        // copying not allowed
        FILE_buffer(const FILE_buffer &);
        FILE_buffer &operator= (const FILE_buffer &);

    private:
        FILE *fptr_;
        const std::size_t put_back_;
        std::vector<char> buffer_;
};

In the simplest implementation, we only have to override a single virtual function from the base class and add our own constructor, which is nice.

The constructor takes the FILE* we'll be reading from, along with two arguments that together determine the size of the internal array. To keep the implementation simple, we'll require that the following invariants hold (they are set up by the constructor):

The put-back area that we reserve will be the larger of 1 and the size given as the 3rd constructor argument
The remaining buffer area will be at least as big as the put-back area, i.e. the larger of the put-back area's size and the size given by the 2nd constructor argument

Now, we'll use an std::vector<char> as our buffer area. The first put_back_ characters of this buffer will be used as our put-back area.

So let's have a look at the constructor's implementation, first of all:

#include "file_buffer.hpp"
#include <algorithm>
#include <cstring>

using std::size_t;

FILE_buffer::FILE_buffer(FILE *fptr, size_t buff_sz, size_t put_back) :
    fptr_(fptr),
    put_back_(std::max(put_back, size_t(1))),
    buffer_(std::max(buff_sz, put_back_) + put_back_)
{
    char *end = &buffer_.front() + buffer_.size();
    setg(end, end, end);
}

In the initialiser list, we're setting up the invariants that I spoke of. Now in the body of the constructor, we call std::streambuf::setg() with the end address of the buffer as all three arguments.

Calling setg() is how we tell the streambuf about any updates to the positions of eback(), gptr() and egptr(). To start with we'll have them all point to the same location, which will signal to us that we need to re-fill the buffer in our implementation of underflow(), which we'll look at now.

underflow() is contractually bound to give us the current character from the data source. Typically, this means it should return the next available character in the buffer (the one at gptr()).

If we've reached the end of the buffer, though, underflow() should re-fill it with data from the source FILE* and then return the first character of the newly replenished array. Whenever the buffer is re-filled, we also need to call setg() again, to tell the streambuf that we've updated the three delimiting pointers.

When the data source really is depleted, an implementation of underflow() needs to return traits_type::eof(). traits_type is a typedef that we inherited from the std::streambuf base class. Note that underflow() returns an int_type, which is an integral type large enough to accommodate the value of eof(), as well as the value of any given char.

std::streambuf::int_type FILE_buffer::underflow()
{
    if (gptr() < egptr()) // buffer not exhausted
        return traits_type::to_int_type(*gptr());

    char *base = &buffer_.front();
    char *start = base;

    if (eback() == base) // true when this isn't the first fill
    {
        // Make arrangements for putback characters: keep the last put_back_
        // characters, or fewer if the previous fill was shorter than that
        size_t keep = std::min(put_back_, size_t(egptr() - base));
        std::memmove(base, egptr() - keep, keep);
        start += keep;
    }

    // start is now the start of the buffer, proper.
    // Read from fptr_ into the provided buffer
    size_t n = std::fread(start, 1, buffer_.size() - (start - base), fptr_);
    if (n == 0)
        return traits_type::eof();

    // Set buffer pointers
    setg(base, start, start + n);

    return traits_type::to_int_type(*gptr());
}
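To try the input buffer end to end, here is FILE_buffer condensed into a single self-contained listing, driven through a std::istream over a std::tmpfile(). It assumes C++11 (= delete, override, vector::data()). As a small safeguard, it keeps fewer than put_back_ characters when the previous fill was shorter than that. read_words_back is hypothetical demo code. Its tiny buffer size forces several refills.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstring>
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

class FILE_buffer : public std::streambuf
{
public:
    explicit FILE_buffer(std::FILE *fptr, std::size_t buff_sz = 256, std::size_t put_back = 8)
        : fptr_(fptr),
          put_back_(std::max(put_back, std::size_t(1))),
          buffer_(std::max(buff_sz, put_back_) + put_back_)
    {
        char *end = buffer_.data() + buffer_.size();
        setg(end, end, end);  // empty get area: the first read calls underflow()
    }
    FILE_buffer(const FILE_buffer &) = delete;
    FILE_buffer &operator=(const FILE_buffer &) = delete;

private:
    int_type underflow() override
    {
        if (gptr() < egptr())  // buffer not exhausted
            return traits_type::to_int_type(*gptr());
        char *base = buffer_.data();
        char *start = base;
        if (eback() == base)   // not the first fill: preserve put-back chars
        {
            std::size_t keep = std::min(put_back_, std::size_t(egptr() - base));
            std::memmove(base, egptr() - keep, keep);
            start += keep;
        }
        std::size_t n = std::fread(start, 1, buffer_.size() - (start - base), fptr_);
        if (n == 0)
            return traits_type::eof();
        setg(base, start, start + n);
        return traits_type::to_int_type(*gptr());
    }

    std::FILE *fptr_;
    const std::size_t put_back_;
    std::vector<char> buffer_;
};

// Hypothetical demo: write text to a temporary file, rewind, and read the
// words back through an istream using a deliberately tiny FILE_buffer.
std::vector<std::string> read_words_back(const char *text)
{
    std::vector<std::string> words;
    std::FILE *f = std::tmpfile();
    if (!f)
        return words;
    std::fputs(text, f);
    std::rewind(f);
    {
        FILE_buffer buf(f, 4, 2);
        std::istream in(&buf);
        std::string w;
        while (in >> w)
            words.push_back(w);
    }
    std::fclose(f);
    return words;
}
```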

The first line of the function looks to see if the buffer is exhausted. If not, it simply returns the current character, as given by *gptr().

In the case where the buffer is exhausted, we must (re-)fill it before returning the first new character. Recall that in the constructor we set all three buffer pointers to the address one past the last element of the buffer. After any fill, eback() points to the start of the array instead, so if underflow() finds that eback() is the array's base address, we know that the buffer has been filled at least once before now.

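If we are re-filling the buffer, we memmove the last characters of the previous fill (at most put_back_ of them) into the put-back area at the start of the array. When the previous fill filled the whole buffer, our second invariant guarantees that the source range and the put-back area don't overlap, so memcpy() would also work. After a short read they may overlap, and memmove() handles overlapping ranges safely.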

Once we've dealt with the possible filling of the put-back area, we can fread() data from the source FILE* into our buffer proper. If fread() fails to read any data, we'll treat this as if the end-of-file condition has been met. That's a simplification, since fread() also returns 0 on a read error, but reporting either case as the end of the data is a safe response.

But if fread() succeeded in sourcing us some new data, we tell the streambuf as much by updating its knowledge of the three buffer pointers. Once that's done, we return the current character from the newly replenished buffer.

That's about all we have to do for a basic implementation, which I hope you'll agree wasn't too hard. However, there is some extra functionality that we might like to add. In particular, we'd like to be able to seek within the stream. I'll perhaps save that for another post, but if you'd like to look at how to do that yourself, then look up the std::streambuf::seekoff() and std::streambuf::seekpos() virtual member functions.

We could also implement an output stream buffer for use with FILE*s opened for writing. But once you've seen the 3rd example, which implements an output streambuf, you should be able to do this yourself, with any luck!
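As a starting point for that exercise, here is one possible sketch of an output buffer for a FILE* opened for writing. It uses the same spare-slot trick as the capitalisation buffer in the 3rd example. FILE_out_buffer and round_trip are hypothetical names, and error handling is minimal.

```cpp
#include <cassert>
#include <cstdio>
#include <ostream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical output counterpart to FILE_buffer: characters collect in
// buffer_ and are fwrite()n to the FILE* when the buffer fills or on flush.
class FILE_out_buffer : public std::streambuf
{
public:
    explicit FILE_out_buffer(std::FILE *fptr, std::size_t buff_sz = 256)
        : fptr_(fptr), buffer_(buff_sz + 1)
    {
        char *base = buffer_.data();
        setp(base, base + buffer_.size() - 1);  // one spare slot for overflow()
    }
    ~FILE_out_buffer() override { sync(); }
    FILE_out_buffer(const FILE_out_buffer &) = delete;
    FILE_out_buffer &operator=(const FILE_out_buffer &) = delete;

private:
    int_type overflow(int_type ch) override
    {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
        {
            *pptr() = traits_type::to_char_type(ch);  // lands in the spare slot
            pbump(1);
        }
        return write_out() ? traits_type::not_eof(ch) : traits_type::eof();
    }

    int sync() override { return write_out() && std::fflush(fptr_) == 0 ? 0 : -1; }

    // Writes [pbase(), pptr()) to the file and empties the put area.
    bool write_out()
    {
        std::size_t n = static_cast<std::size_t>(pptr() - pbase());
        bool ok = std::fwrite(pbase(), 1, n, fptr_) == n;
        setp(pbase(), epptr());
        return ok;
    }

    std::FILE *fptr_;
    std::vector<char> buffer_;
};

// Hypothetical demo: stream text into a temporary file, read it back with C I/O.
std::string round_trip(const std::string &text)
{
    std::string result;
    std::FILE *f = std::tmpfile();
    if (!f)
        return result;
    {
        FILE_out_buffer buf(f, 4);  // tiny buffer: several overflow() calls
        std::ostream out(&buf);
        out << text << std::flush;
    }
    std::rewind(f);
    for (int c; (c = std::fgetc(f)) != EOF; )
        result += static_cast<char>(c);
    std::fclose(f);
    return result;
}
```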

Example 2: reading from an array of bytes in memory

In this example, we'll look at the situation where you already have a read-only array of bytes in memory and you'd like to wrap it in an std::istream to pluck out data in a formatted manner. This example is a little different from the previous one in that we don't really need a buffer. There's simply no advantage to having one because the data is the buffer, here. So all our stream buffer will do is pass through characters one at a time from the source.

So ideally, we'd like our class to have this trivial implementation:

class char_array_buffer : public std::streambuf
{
    public:
        char_array_buffer(const char *begin, const char *end)
        {
            setg(begin, begin, end);
        }

        int_type underflow()
        {
            return  gptr() == egptr() ?
                    traits_type::eof() :
                    traits_type::to_int_type(*gptr());
        }
};

But alas, this just won't fly, because setg() takes pointers to non-const chars as its arguments. This is for good reason; if the buffer wasn't writeable, we wouldn't be able to provide a put-back facility in the general case. So we'll have to work around this, which is a pain, but it's not hard to do. This also gives us a chance to look at some of the other functions you might want to override.

So here's our char_array_buffer.hpp header:

#include <streambuf>

class char_array_buffer : public std::streambuf
{
    public:
        char_array_buffer(const char *begin, const char *end);
        explicit char_array_buffer(const char *str);

    private:
        int_type underflow();
        int_type uflow();
        int_type pbackfail(int_type ch);
        std::streamsize showmanyc();

        // copy ctor and assignment not implemented;
        // copying not allowed
        char_array_buffer(const char_array_buffer &);
        char_array_buffer &operator= (const char_array_buffer &);

    private:
        const char * const begin_;
        const char * const end_;
        const char * current_;
};

You'll note that we've got a few more private functions this time. These all override virtual functions inherited from the std::streambuf base class.

The first constructor takes two pointers that specify a contiguous sequence of chars, using the STL-style convention whereby the interval is closed at the start and open at the end: [begin, end). The second constructor will take the base address of a char array and deduce its end address using std::strlen().

I'll describe what the new functions uflow(), pbackfail() and showmanyc() do when we get around to defining them. Essentially they exist because we're not going to be calling setg() (as we have no writeable buffer) and so we need to override some additional behaviours present in the base class to stop it thinking that we're continually at the end of the buffer.

OK, so let's whizz through the constructor definitions. They simply set up the three private pointers to point into the given array:

#include "char_array_buffer.hpp"
#include <functional>
#include <cassert>
#include <cstring>

char_array_buffer::char_array_buffer(const char *begin, const char *end) :
    begin_(begin),
    end_(end),
    current_(begin_)
{
    assert(std::less_equal<const char *>()(begin_, end_));
}

char_array_buffer::char_array_buffer(const char *str) :
    begin_(str),
    end_(begin_ + std::strlen(str)),
    current_(begin_)
{
}

As before, underflow() has to return the current character in the source, or traits_type::eof() if the source is depleted. Nice and easy:

char_array_buffer::int_type char_array_buffer::underflow()
{
    if (current_ == end_)
        return traits_type::eof();

    return traits_type::to_int_type(*current_);
}

Now we come on to uflow(), whose responsibility it is to return the current character and then increment the buffer pointer. The default implementation in std::streambuf::uflow() is to call underflow() and return the result after incrementing the get pointer (as returned by gptr()). However, we aren't using the get pointer (we haven't called setg()) and so the default implementation is inappropriate. So we override uflow() like so:

char_array_buffer::int_type char_array_buffer::uflow()
{
    if (current_ == end_)
        return traits_type::eof();

    return traits_type::to_int_type(*current_++);
}

Note that the default implementation of uflow() did exactly the right thing for our FILE_buffer. You'll find that the need to override uflow() typically arises in stream buffers that don't use a writeable array for intermediate storage.

The next function we come to is pbackfail(). When you call std::istream::unget() or std::istream::putback(some_character), it is down to the stream buffer to put the previous character (or some_character) back into the stream, if possible. A buffer can't do this on its own if its end back pointer is equal to its get pointer (i.e. if eback() == gptr()), or, for putback(), if the character just before gptr() differs from some_character.

However, in our char_array_buffer we always have eback() == gptr(), since both are null pointers by default and we never call setg(). In this case, pbackfail(ch) will be called as a last resort. ch holds the character to put back into the stream, or traits_type::eof() if the character already at the previous position should be left unchanged.

If pbackfail() is able to put back the given character, it should return something other than traits_type::eof(). The default implementation of pbackfail() is to always return traits_type::eof(), so we need to override it:

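char_array_buffer::int_type char_array_buffer::pbackfail(int_type ch)
{
    // Compare through traits_type so that a char with its high bit set
    // matches the int_type value it was converted to
    if (current_ == begin_ ||
        (!traits_type::eq_int_type(ch, traits_type::eof()) &&
         !traits_type::eq_int_type(ch, traits_type::to_int_type(current_[-1]))))
        return traits_type::eof();

    return traits_type::to_int_type(*--current_);
}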

Now, we really can't put back a character if current_ == begin_ or if the character to put back in the stream isn't the same as the one at current_[-1] (because the characters in the array are immutable). So we check these conditions first. If we get through to the other side, we can decrement current_ and return something that isn't traits_type::eof() to indicate success.

Note that we could have considered overriding pbackfail() in the FILE_buffer class, too, by attempting to fseek() backwards and refill the buffer.

The final override to consider is showmanyc(). This is called by std::streambuf::in_avail() (which is public) when gptr() == egptr() in order to return the number of characters that can definitely be extracted from the stream before it blocks. Since we're always in the situation where gptr() == egptr(), it would only be polite to override showmanyc() to return something sensible, rather than the default value of 0:

std::streamsize char_array_buffer::showmanyc()
{
    assert(std::less_equal<const char *>()(current_, end_));
    return end_ - current_;
}
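Here is a self-contained sketch of char_array_buffer with a short usage example. It assumes C++11. Its pbackfail() compares characters with traits_type::eq_int_type(), so a char with its high bit set still matches the int_type value it was converted to.

```cpp
#include <cassert>
#include <cstring>
#include <functional>
#include <istream>
#include <streambuf>

class char_array_buffer : public std::streambuf
{
public:
    char_array_buffer(const char *begin, const char *end)
        : begin_(begin), end_(end), current_(begin_)
    {
        assert(std::less_equal<const char *>()(begin_, end_));
    }
    explicit char_array_buffer(const char *str)
        : begin_(str), end_(begin_ + std::strlen(str)), current_(begin_) {}
    char_array_buffer(const char_array_buffer &) = delete;
    char_array_buffer &operator=(const char_array_buffer &) = delete;

private:
    int_type underflow() override  // peek at the current character
    {
        return current_ == end_ ? traits_type::eof() : traits_type::to_int_type(*current_);
    }
    int_type uflow() override      // read the current character and advance
    {
        return current_ == end_ ? traits_type::eof() : traits_type::to_int_type(*current_++);
    }
    int_type pbackfail(int_type ch) override  // step back if the data allows it
    {
        if (current_ == begin_ ||
            (!traits_type::eq_int_type(ch, traits_type::eof()) &&
             !traits_type::eq_int_type(ch, traits_type::to_int_type(current_[-1]))))
            return traits_type::eof();
        return traits_type::to_int_type(*--current_);
    }
    std::streamsize showmanyc() override { return end_ - current_; }

    const char *const begin_;
    const char *const end_;
    const char *current_;
};
```

In use, formatted extraction works as usual. unget() succeeds, and putback() of a character that doesn't match the data sets badbit, because the array is read-only.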

So this stream buffer was a little more complicated than the last, but not overly so. The extra complexity comes from the fact that we aren't using a buffer internally. That forces us to override the std::streambuf functions whose default implementations assume a get area set up with setg().

Example 3: a capitalisation buffer

So far we've only looked at input stream buffers for use with std::istreams. Now let's have a look at an output stream buffer. If you've got this far, you'll find the next example pretty easy.

We'll implement a buffer that transforms the first letter of every sentence into its upper case equivalent. We'll stick with the default locale for this example. It should be trivial to plumb in support for custom locales if need be. Here's the caps_buffer.hpp header:

#include <streambuf>
#include <iosfwd>
#include <cstdlib>
#include <vector>

class caps_buffer : public std::streambuf
{
    public:
        explicit caps_buffer(std::ostream &sink, std::size_t buff_sz = 256);

    protected:
        bool do_caps_and_flush();

    private:
        int_type overflow(int_type ch);
        int sync();

        // copy ctor and assignment not implemented;
        // copying not allowed
        caps_buffer(const caps_buffer &);
        caps_buffer &operator= (const caps_buffer &);

    private:
        bool cap_next_;
        std::ostream &sink_;
        std::vector<char> buffer_;
};

All we have to do is override overflow() and sync(), which we inherit from the std::streambuf base class. overflow() is called whenever the put pointer is equal to the end put pointer, i.e. when pptr() == epptr(). It is overflow()'s responsibility to write the contents of any internal buffer to the target, followed by the character it is given as an argument (unless that is traits_type::eof()). It should return something other than traits_type::eof() on success.

It is sync()'s job to write the current buffered data to the target, even when the buffer isn't full. This could happen when the std::flush manipulator is used on the stream, for example. sync() should return 0 on success and -1 on failure.

We'll add a helper function, do_caps_and_flush(), that performs the capitalisation work on the buffer contents and then writes the modified contents to sink_, which is work that is common to both overridden functions. We'll use the cap_next_ member to signal that the next letter we come across should be capitalised. It will be set to true on construction and whenever we come across a '.' character in the buffer, and back to false once we've transformed a letter into upper case.

Let's have a look at the constructor's implementation:

#include "caps_buffer.hpp"
#include <cctype>
#include <ostream>
#include <functional>
#include <cassert>

caps_buffer::caps_buffer(std::ostream &sink, std::size_t buff_sz) :
    cap_next_(true),
    sink_(sink),
    buffer_(buff_sz + 1)
{
    sink_.clear();
    char *base = &buffer_.front();
    setp(base, base + buffer_.size() - 1); // -1 to make overflow() easier
}

Here we see that the smallest possible size of buffer_ is 1. We use setp() in the implementation of an output buffer in a similar way to setg() for input buffers. However, we only need to specify two pointers this time: the put base pointer (pbase()) and the end put pointer (epptr()). This is because we don't have to worry about a put-back area like we did for input buffers.

But you'll note that the second argument to setp() isn't the usual address of the element at position one-past-the-end. Instead it is one element less. This makes the implementation of overflow() easier, since inside there we'll need to deal with the character given as an argument before flushing the buffer to the sink; we'll always have space to put this char on the end of the buffer if we set epptr() as shown.

So let's now take a look at the implementation of overflow():

caps_buffer::int_type caps_buffer::overflow(int_type ch)
{
    if (sink_ && ch != traits_type::eof())
    {
        assert(std::less_equal<char *>()(pptr(), epptr()));
        // ch is an int_type; convert it back to a char before storing it
        *pptr() = traits_type::to_char_type(ch);
        pbump(1);
        if (do_caps_and_flush())
            return ch;
    }

    return traits_type::eof();
}

Here we write ch to the buffer (assuming it's not traits_type::eof() and the sink is in a fit state) and then increment pptr() by calling pbump(1). It's always safe to write ch to *pptr() in this way because we reserved an extra char at the end of our buffer in the constructor.

Once ch is in the buffer and pptr() has been incremented to delimit the open end of the range, we call do_caps_and_flush() to perform our dirty work, which will return true on success.

The implementation of sync() is trivial:

int caps_buffer::sync()
{
    return do_caps_and_flush() ? 0 : -1;
}

So let's have a look at do_caps_and_flush(). It's exactly as you might expect:

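bool caps_buffer::do_caps_and_flush()
{
    for (char *p = pbase(), *e = pptr(); p != e; ++p)
    {
        // <cctype> functions require a value representable as unsigned char
        unsigned char u = static_cast<unsigned char>(*p);
        if (*p == '.')
            cap_next_ = true;
        else if (std::isalpha(u))
        {
            if (cap_next_)
                *p = static_cast<char>(std::toupper(u));
            cap_next_ = false;
        }
    }

    std::ptrdiff_t n = pptr() - pbase();
    pbump(static_cast<int>(-n));

    // std::ostream's conversion to bool is explicit since C++11
    return static_cast<bool>(sink_.write(pbase(), n));
}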
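For reference, here is caps_buffer condensed into one self-contained listing, with a usage example that pushes text through a deliberately small buffer. It assumes C++11. It stores characters with traits_type::to_char_type(), passes unsigned char values to <cctype>, and converts the sink to bool explicitly, which keeps it well-defined for any char value.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <vector>

class caps_buffer : public std::streambuf
{
public:
    explicit caps_buffer(std::ostream &sink, std::size_t buff_sz = 256)
        : cap_next_(true), sink_(sink), buffer_(buff_sz + 1)
    {
        sink_.clear();
        char *base = buffer_.data();
        setp(base, base + buffer_.size() - 1);  // one spare slot for overflow()
    }
    caps_buffer(const caps_buffer &) = delete;
    caps_buffer &operator=(const caps_buffer &) = delete;

private:
    int_type overflow(int_type ch) override
    {
        if (sink_ && !traits_type::eq_int_type(ch, traits_type::eof()))
        {
            *pptr() = traits_type::to_char_type(ch);  // uses the spare slot
            pbump(1);
            if (do_caps_and_flush())
                return ch;
        }
        return traits_type::eof();
    }

    int sync() override { return do_caps_and_flush() ? 0 : -1; }

    bool do_caps_and_flush()
    {
        for (char *p = pbase(), *e = pptr(); p != e; ++p)
        {
            unsigned char u = static_cast<unsigned char>(*p);
            if (*p == '.')
                cap_next_ = true;
            else if (std::isalpha(u))
            {
                if (cap_next_)
                    *p = static_cast<char>(std::toupper(u));
                cap_next_ = false;
            }
        }
        std::ptrdiff_t n = pptr() - pbase();
        pbump(static_cast<int>(-n));
        return static_cast<bool>(sink_.write(pbase(), n));
    }

    bool cap_next_;  // true when the next letter starts a sentence
    std::ostream &sink_;
    std::vector<char> buffer_;
};
```

Because cap_next_ is a member, the "capitalise next" state survives across flushes. A sentence split between two overflow() calls is still handled correctly.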

Note that we didn't really have to use an internal buffer for this example; we could have simply processed characters one at a time and immediately sent them to the sink in overflow() (in which case the default implementation of sync() would have been sufficient, too). However, I thought it would be more useful to see how to create a true buffered implementation.
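The unbuffered alternative could look like the following sketch. It never calls setp(), so pptr() and epptr() are both null and every character goes through overflow(). The default sync(), which returns 0, is then enough. caps_unbuffered is a hypothetical name.

```cpp
#include <cassert>
#include <cctype>
#include <ostream>
#include <sstream>
#include <streambuf>

// Hypothetical unbuffered variant: each character is transformed and sent
// to the sink as soon as overflow() receives it.
class caps_unbuffered : public std::streambuf
{
public:
    explicit caps_unbuffered(std::ostream &sink) : cap_next_(true), sink_(sink) {}

private:
    int_type overflow(int_type ch) override
    {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);  // nothing is buffered
        char c = traits_type::to_char_type(ch);
        unsigned char u = static_cast<unsigned char>(c);
        if (c == '.')
            cap_next_ = true;
        else if (std::isalpha(u))
        {
            if (cap_next_)
                c = static_cast<char>(std::toupper(u));
            cap_next_ = false;
        }
        return sink_.put(c) ? ch : traits_type::eof();
    }

    bool cap_next_;
    std::ostream &sink_;
};
```

The trade-off is one virtual call and one sink write per character, which the buffered caps_buffer avoids.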

Introducing the Boost IOStreams library

If you were new to stream buffers before you read this post, I hope you feel a little more comfortable with them now. All the implementations were pretty basic, but a lot more is possible. However, I've found that once I start attempting more extravagant buffers, things can get fiddly pretty quickly. This is when I'll reach for the Boost IOStreams library, which provides a framework for implementing more involved buffers and streams.

It also allows you to treat the sources, sinks, filters and other concepts independently of one another. In our final example, we hard coded the sink as another std::ostream. What if we wanted the data to go somewhere that doesn't have a stream interface? The Boost IOStreams library allows more flexibility in this area by isolating concepts that I've had to mash together in my example code.

Downloads

example_code.zip

0 0