C++ XML Serialisation, with Compression

  • Download documentation - 127 Kb
  • Download source - 1.25 MB


Executive Summary

The following C++ XML serialisation classes and templates can be used with or without MFC/STL. The XML can optionally be compressed to save storage space. Because XML is highly repetitive, it compresses very well; a compression ratio of 98% is quite common.

Introduction

Being able to save and load your data in XML is very useful for many different reasons. Here are a few benefits:

  • Human readable.
  • Platform independent.
  • Seamless schema versioning.

The target development and test platform used to create the XML serialisation classes is Microsoft Visual C++ 6 onwards. The XML serialisation classes and templates can be used with or without MFC/STL. The compression classes use zlib to compress the XML data before saving, and to uncompress before loading. This is useful as XML data can be quite large and XML compresses well. This also provides a small amount of protection/privacy from prying eyes, as the raw XML will not be readily viewable. The compression is optional. You only have to concern yourself with the macros, but documentation is provided for all the other classes and templated functions.

Here is a very brief example for clarity:

#include "HsXmlArchive.h"

class CFred
{
private:
    int m_cost;
    float m_markup;

public:
    CFred() : m_cost(123), m_markup(456.789f) { }
    DECLARE_XML_SERIAL;
};

bool CFred::SerializeXml(HS::CXmlArchive &archive,
                         MSXML::IXMLDOMNodePtr pCurNode)
{
    IMPLEMENT_XML_SERIAL_BEGIN(CFred);
        XML_ELEMENT(m_cost);
        XML_ELEMENT(m_markup);
    IMPLEMENT_XML_SERIAL_END;
}

int main()
{
    CFred fred;

    // Open the archive for saving, uncompressed, to output.xml
    HS::CXmlArchive archive(HS::CXmlArchive::save,
                            false, "output.xml");
    MSXML::IXMLDOMNodePtr pCurNode =
        archive.Start("Your message goes here");
    fred.SerializeXml(archive, pCurNode);
    archive.End();
    return 0;
}

The generated output placed in output.xml is:

<?xml version="1.0" ?>
<!-- Your message goes here -->
<root>
    <CFred>
        <m_cost>123</m_cost>
        <m_markup>456.789</m_markup>
    </CFred>
</root>
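The element names in the output above match the member variable names. One way a macro can achieve this is the preprocessor's stringising operator (#), which the STL macros shown later in this article use for the container name. The following is a minimal sketch of that idea only: DEMO_ELEMENT, CFredDemo and ToXml are hypothetical names, the output goes to a std::ostringstream rather than an MSXML DOM, and none of it is the library's actual implementation.

```cpp
#include <sstream>
#include <string>

// Hypothetical macro, NOT the library's XML_ELEMENT. The #var operator
// turns the variable's name into a string literal, which is then used
// as the element name.
#define DEMO_ELEMENT(out, var) \
    (out) << "<" << #var << ">" << (var) << "</" << #var << ">"

// Hypothetical class used only for this illustration.
struct CFredDemo
{
    int   m_cost;
    float m_markup;

    CFredDemo() : m_cost(123), m_markup(1.5f) { }

    // Writes each member as an element named after the member variable.
    std::string ToXml() const
    {
        std::ostringstream out;
        out << "<CFredDemo>";
        DEMO_ELEMENT(out, m_cost);
        DEMO_ELEMENT(out, m_markup);
        out << "</CFredDemo>";
        return out.str();
    }
};
```

Calling CFredDemo().ToXml() returns the string <CFredDemo><m_cost>123</m_cost><m_markup>1.5</m_markup></CFredDemo>. Renaming a member variable changes the element name with no other code change, which is the convenience the macros are designed to give.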

How to Implement

The design goal of the XML serialisation classes and macros was to facilitate the minimum amount of coding for the developer. MFC and STL containers are also supported.

Compiler settings

Paths

If you are going to use compression, then a path to the zlib.h header file is required. You can accomplish this in one of two ways:

  • VC 6:
    1. Project settings --> C++ tab --> Pre-processor category. Specify the path in the Additional Include Directories edit box, or,
    2. Tools --> Options --> Directories tab. Add in the path for the include directories.
  • VC 7:
    1. Project properties --> Configuration properties --> C/C++. Specify the path in the Additional Include Directories edit box, or,
    2. Tools --> Options --> Projects --> VC++ Directories. Select "Include Files" from the ‘Show directories for’ combo box. Add in the path for the include directories.

A path to the XML serialisation files is also required. This can be accomplished using the same method as shown above.

Definitions

If you are going to use compression, then some definitions are required. You can add them either in the project settings or in stdafx.h:

  • VC 6: Project settings --> C++ tab --> General category --> Preprocessor definitions. Append the following to the definitions already there:
    ,HS_USE_COMPRESSION,_WINDOWS,ZLIB_DLL
  • VC 7: Project properties --> Configuration properties --> C/C++ --> Preprocessor. Append the following to the definitions already there:
    ;HS_USE_COMPRESSION;_WINDOWS;ZLIB_DLL
  • Alternatively, add the following to the stdafx.h file:
    #define HS_USE_COMPRESSION
    // Required for zlib
    #define _WINDOWS
    #define ZLIB_DLL

Header file

In the header file of your class, you need to add the following #include statement: #include "HSXmlArchive.h".

Within the class, you need to add the following macro in a public section: DECLARE_XML_SERIAL;.

Here is an example of a simple class. The only lines needed for XML serialisation are the #include and the DECLARE_XML_SERIAL macro:

#include "HSXmlArchive.h"

class CSimpleClass
{
private:
    // Member variables here
    float m_num;

public:
    CSimpleClass();
    ...     // Other functions here
    DECLARE_XML_SERIAL;    // The SerializeXml prototype macro
};

That’s all that is required in the header file. You may need to provide a path to HSXmlArchive.h.

Implementation file

  • In the implementation file, you need to add the following function, as it was declared by the DECLARE_XML_SERIAL macro in the header file:
    bool SerializeXml(HS::CXmlArchive &archive,
                      MSXML::IXMLDOMNodePtr pCurNode)
  • Use the following macro, passing in the name of your class:
    IMPLEMENT_XML_SERIAL_BEGIN(your class name goes here);
  • Use the relevant element macros, passing in the variable to be serialised:
    XML_ELEMENT(variable);
    XML_ELEMENT_NAMED(XmlVariableName, variable);

    Remember to indent these, as the IMPLEMENT_XML_SERIAL_BEGIN macro opens a curly bracket {. This is not strictly necessary, but static checkers such as PC-Lint would complain about the lack of indentation.

  • If your class has inherited from one or more base classes that also declare the DECLARE_XML_SERIAL macro, then simply add the following line for each base class:
    SERIALIZE_XML_BASE_CLASS(base class name goes here);
  • Here is a full example of our simple class:
    bool CSimpleClass::SerializeXml(HS::CXmlArchive &archive,
                                    MSXML::IXMLDOMNodePtr pCurNode)
    {
        IMPLEMENT_XML_SERIAL_BEGIN(CSimpleClass);
            XML_ELEMENT(m_num);
        IMPLEMENT_XML_SERIAL_END;
    }

An Example

For this example, three classes are going to be serialised: CFred, CFred2, and CMyStringVector. These can be found in the "MFC demo" directory.

Figure 1

Classes

CMyStringVector

This class inherits from CObject and utilises CString, and therefore uses MFC. It contains two public member variables, both of which utilise STL containers:

  • m_FredList attribute is an STL container of type: vector<CFred>
  • m_list attribute is an STL container of type: vector<CString>

CFred

This is a simple class, and contains four private member variables that can be seen in Figure 1.

CFred2

This class utilises multiple inheritance by inheriting from both CFred and CMyStringVector. This class contains three protected member variables that can be seen in Figure 1.

How classes save / load their state via XML

CMyStringVector

The two STL containers are serialised as shown below:

IMPLEMENT_XML_SERIAL_BEGIN(CMyStringVector);
    SERIALIZE_XML_STL_VARIANT(m_list, true);
    SERIALIZE_XML_STL_CLASS(m_FredList, true);
IMPLEMENT_XML_SERIAL_END;

  • The m_list contains a list of CHsVariant compatible variables, therefore the SERIALIZE_XML_STL_VARIANT macro is used. Note the ‘true’ parameter: this tells the XML serialisation classes that this STL container has the reserve function, and that reserve should be called during a load operation to pre-allocate memory. A vector implements reserve. More on this topic later.
  • The m_FredList contains a list of classes, which must all utilise the DECLARE_XML_SERIAL macro; therefore, the SERIALIZE_XML_STL_CLASS macro is used.

CFred

Having four variables, it is implemented simply as follows:

IMPLEMENT_XML_SERIAL_BEGIN(CFred);
    XML_ELEMENT(m_a);
    XML_ELEMENT(m_b);
    XML_ELEMENT(m_c);
    XML_ELEMENT(m_d);
IMPLEMENT_XML_SERIAL_END;

CFred2

This class contains three variables (two integers and one MFC CString). XML serialisation is just the same, utilising the XML_ELEMENT macro. The class inherits from two classes, both of which declare DECLARE_XML_SERIAL. To serialise a base class, the SERIALIZE_XML_BASE_CLASS macro is used.

IMPLEMENT_XML_SERIAL_BEGIN(CFred2);
    XML_ELEMENT(c);
    XML_ELEMENT(d);
    XML_ELEMENT(txt);
    SERIALIZE_XML_BASE_CLASS(CFred);
    SERIALIZE_XML_BASE_CLASS(CMyStringVector);
IMPLEMENT_XML_SERIAL_END;

Dynamic types

This section deals with having a list that contains base class pointers pointing to various derived types. For example:

If you have a container that has a list of CMyGraphicalObject base class pointers, those pointers could point to any type derived from CMyGraphicalObject, such as CMyCircle, CMyTriangle, or CMySquare. In order to save/load these dynamic types, the XML serialisation needs a little help from you. We can’t use the mechanism used by MFC and the CRuntimeClass class, as this would not be a generic solution suitable for non-MFC applications.

The help required is as follows:

  • The base class inherits from the HS::CHsObject abstract class.
  • An enumeration named eHsObjectType needs to be created.
  • Each of the dynamically creatable classes needs to implement the HsObjectType() virtual function. This will return the eHsObjectType enumerated value relevant to that class.
  • A function named CreateHsObject needs to be created. This function takes the eHsObjectType enumerated type as a parameter. It should return a newly created object of that type.

MFC

Inside MyList.h, there is a function named SerializeXmlDynamicHsObject that performs the saving/loading of dynamic types. The saving code looks like this:

int nCount(m_nCount);
XML_ELEMENT(nCount);
POSITION pos = GetHeadPosition();
while(pos)
{
    TYPE *p = GetNext(pos);
    eHsObjectType type(p->HsObjectType());
    XML_ELEMENT_ENUM(type, eHsObjectType);
    p->SerializeXml(archive, pCurNode);
}

The loading code looks like this:

RemoveAll();
int nCount(0);
XML_ELEMENT(nCount);
while(nCount--)
{
    eHsObjectType type(NO_OBJECT);
    XML_ELEMENT_ENUM(type, eHsObjectType);
    TYPE *p = CreateHsObject(type);
    ASSERT(p);
    if(p)
    {
        AddTail(p);
        p->SerializeXml(archive, pCurNode);
    }
}
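The save and load loops above follow one pattern: write the element count, then a type tag in front of each object's data; on load, read the count, then for each entry read the tag, create the matching object and let it read its own data. The sketch below reproduces that pattern without MFC or MSXML, using a std::vector and plain text streams. The names eHsObjectType, NO_OBJECT, HsObjectType() and CreateHsObject come from the article; the enumerators MY_CIRCLE and MY_SQUARE, the m_size member, the Save/Load functions and the helper functions are assumptions made for illustration, and the base class here does not derive from the real HS::CHsObject.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Enumerator names other than NO_OBJECT are illustrative.
enum eHsObjectType { NO_OBJECT = 0, MY_CIRCLE = 1, MY_SQUARE = 2 };

// Stand-in base class; the real one would inherit from HS::CHsObject.
class CMyGraphicalObject
{
public:
    int m_size;
    CMyGraphicalObject() : m_size(0) { }
    virtual ~CMyGraphicalObject() { }
    virtual eHsObjectType HsObjectType() const = 0;
    void Save(std::ostream &out) const { out << m_size << ' '; }
    void Load(std::istream &in) { in >> m_size; }
};

class CMyCircle : public CMyGraphicalObject
{
public:
    virtual eHsObjectType HsObjectType() const { return MY_CIRCLE; }
};

class CMySquare : public CMyGraphicalObject
{
public:
    virtual eHsObjectType HsObjectType() const { return MY_SQUARE; }
};

// Factory: maps a type tag back to a new object (NULL if unknown).
CMyGraphicalObject *CreateHsObject(eHsObjectType type)
{
    switch (type)
    {
    case MY_CIRCLE: return new CMyCircle;
    case MY_SQUARE: return new CMySquare;
    default:        return NULL;
    }
}

// Save: count first, then "tag data" for every object.
void SaveAll(const std::vector<CMyGraphicalObject *> &objects,
             std::ostream &out)
{
    out << objects.size() << ' ';
    for (std::size_t i = 0; i < objects.size(); ++i)
    {
        out << static_cast<int>(objects[i]->HsObjectType()) << ' ';
        objects[i]->Save(out);
    }
}

// Load: read the count, then create each object from its tag.
// Unlike the MFC loop above, this sketch stops at an unknown tag.
void LoadAll(std::vector<CMyGraphicalObject *> &objects, std::istream &in)
{
    std::size_t nCount = 0;
    in >> nCount;
    while (in && nCount--)
    {
        int tag = NO_OBJECT;
        in >> tag;
        if (tag != MY_CIRCLE && tag != MY_SQUARE)
            break;
        CMyGraphicalObject *p =
            CreateHsObject(static_cast<eHsObjectType>(tag));
        if (!p)
            break;
        p->Load(in);
        objects.push_back(p);
    }
}

// Convenience wrapper: the saved form as a string.
std::string SaveToString(const std::vector<CMyGraphicalObject *> &objects)
{
    std::ostringstream out;
    SaveAll(objects, out);
    return out.str();
}

// The containers hold owning raw pointers, as in the article's lists.
void DeleteAll(std::vector<CMyGraphicalObject *> &objects)
{
    for (std::size_t i = 0; i < objects.size(); ++i)
        delete objects[i];
    objects.clear();
}
```

Saving a list, loading the result into a second list and saving that again should produce identical text, which is the same check the demo programs perform on their XML output.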

STL

For STL containers, there are two macros inside HsXmlArchive.h, named:

  • SERIALIZE_XML_STL_CLASS_DynamicHsObject
  • SERIALIZE_XML_STL_MAP_CLASS_DynamicHsObject

These macros expand to the following:

HS::STLSerializeClassTypeDynamic<(bCallReserve)>(pCurNode,
    (list_name), archive, (#list_name))

HS::STLSerializeMapClassTypeDynamic(pCurNode,
    (list_name), archive, (#list_name))

How it Works

As you can see from the examples above, you only have to utilise the macros provided. This saves a lot of typing and aggravation, and makes your code look neater and simpler.

All the XML serialisation classes are in the namespace HS (short for Hicrest Systems).

You only have to concern yourself with the macros, but documentation is provided for all the other classes and templated functions. You may possibly need to extend CHsVariant if you have a type that cannot be converted to a variant.

Macros

Please refer to the Word documentation for the macros; there are too many to list here.

CHsVariant

This class is a replacement for the _variant_t class. It adds extra functionality not present in _variant_t. It should be noted that the destructor for _variant_t is not virtual, and could cause problems during destruction. Therefore, replacing _variant_t with this class seemed logical.

I have added extra functionality to convert int, CString and std::string into the relevant variant types. You may also want to add extra functionality to this class to help convert your types.

It will convert the following types for you:

short, long, float, double, CY, _bstr_t, wchar_t, char*, IDispatch*, bool, IUnknown, DECIMAL, BYTE, int, CString, std::string

When loading an XML file, CHsVariant is called upon to set the variable. The function that does this is GetValue():

template <typename T>
void GetValue(T *pVar) const throw(_com_error)
// Sets variable passed in pVar
{
    *pVar = *this;    // T cannot be const as it is being set here
}

You will see GetValue() being used within CXmlArchive() during a load. If you experience a compiler error at the *pVar = *this assignment, then you passed in a const variable. This is OK for saving, but not for loading. As the same function is used for saving and loading, a non-const variable should be passed.
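The behaviour described above can be reproduced with a very small stand-in class. In this sketch, CMiniVariant is a hypothetical class that stores a double and offers two conversion operators; it is not CHsVariant and supports far fewer types. Its GetValue() has the same shape as the one shown above, so the assignment picks the conversion operator that matches T, and a const target fails to compile.

```cpp
// Hypothetical stand-in for CHsVariant, for illustration only.
class CMiniVariant
{
    double m_value;

public:
    explicit CMiniVariant(double v) : m_value(v) { }

    // Conversion operators: one per supported target type.
    operator int() const    { return static_cast<int>(m_value); }
    operator double() const { return m_value; }

    // Same shape as the GetValue() shown above: the assignment selects
    // the conversion operator that matches T exactly.
    template <typename T>
    void GetValue(T *pVar) const
    {
        *pVar = *this;    // requires a non-const T
    }
};

// Helper functions showing typical calls.
inline int ReadAsInt(const CMiniVariant &v)
{
    int result = 0;
    v.GetValue(&result);      // T is deduced as int
    return result;
}

inline double ReadAsDouble(const CMiniVariant &v)
{
    double result = 0.0;
    v.GetValue(&result);      // T is deduced as double
    return result;
}

// The following would NOT compile, because T is deduced as const int
// and the assignment inside GetValue() then targets a const object:
//
//     const int fixed = 0;
//     CMiniVariant(1.0).GetValue(&fixed);
```

With CMiniVariant v(42.5), ReadAsInt(v) yields 42 and ReadAsDouble(v) yields 42.5.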

CXmlArchive

This class is the heart of the XML serialisation mechanism; all other classes and macros are peripheral to this class.

Member variables

public:
    static enum eState { save, load };

private:
    // Main document pointer
    MSXML::IXMLDOMDocumentPtr m_pDom;
    // True if all is ok
    bool m_bIsOk;
    // True if to compress the XML data during save/load.
    bool m_bCompress;
    // Saving or loading
    const eState m_eState;
    // Filename to save/load
    const _variant_t m_sFileName;
    // Error string if there is an error
    std::string m_sError;
  • eState – This is an enumeration for the saving or loading state. It is public so that you can pass in the state to the construction of CXmlArchive to specify whether you are loading or saving. This is the only variable you have to concern yourself with, as it is the only public one.
  • m_pDom – This is the main XML document pointer, and points to an instance of the IXMLDOMDocument interface.
  • m_bIsOk – ‘true’ if m_pDom points to a successfully created instance of DOMDocument, ‘false’ otherwise.
  • m_bCompress – ‘true’ if the XML data is to be compressed before saving or uncompressed during loading. ‘false’ if the raw XML text is to be used.
  • m_eState – The saving/loading state as passed in by you to the CXmlArchive constructor. See the declaration of eState above for the valid values.
  • m_sFileName – The filename to save or load as passed in by you to the CXmlArchive constructor.
  • m_sError – If an error occurs, then this string holds the error information.

Please refer to the Word documentation for the full documentation for this class.

Global stuff

STLSerializeVariantType

template <bool bCallReserve, class T>
bool STLSerializeVariantType(MSXML::IXMLDOMNodePtr pCurNode,
                             T &lst,
                             CXmlArchive &archive,
                             const char *name)

Save or load an STL collection that contains CHsVariant compatible types. The parameters are as follows:

  • bCallReserve – ‘true’ if reserve should be called on the STL container during a load to pre-allocate memory.
  • pCurNode – The current DOM node.
  • lst – This is the STL container.
  • archive – The CXmlArchive class.
  • name – The name of the STL container.

Please also see the CReserve template described below as this is used within this function.

STLSerializeClassType

template<class T>
bool STLSerializeClassType(MSXML::IXMLDOMNodePtr pCurNode,
                           T &lst,
                           CXmlArchive &archive,
                           const char *name)

Save or load an STL collection that contains classes that declare DECLARE_XML_SERIAL. The parameters are as follows:

  • bCallReserve – ‘true’ if reserve should be called on the STL container during a load to pre-allocate memory.
  • pCurNode – The current DOM node.
  • lst – This is the STL container.
  • archive – The CXmlArchive class.
  • name – The name of the STL container.

Please also see the CReserve template described below as this is used within this function.

CReserve

This template was created to facilitate calling reserve on a vector to pre-allocate memory. Testing an ordinary ‘true’ or ‘false’ flag at run time inside STLSerializeVariantType or STLSerializeClassType would cause a compiler error for any STL container that does not provide reserve. For example, the following would not compile when given a container without reserve, such as std::list:

if(bCallReserve)
    lst.reserve(nCount);

This is because the call to lst.reserve() has to be compiled whether bCallReserve is true or not. The way round this problem is a technique that chooses between two overloaded functions at compile time, using a distinct type for each value of the flag. The code for CReserve is as follows:

namespace Loki
{
    // From TypeManip.h in the Loki library
    template <int v>
    struct Int2Type
    {
        enum { value = v };
    };
}

template <bool b, class T>
class CReserve
{
private:
    static void reserve(T &lst, const int &n, Loki::Int2Type<true>)
    { lst.reserve(n); }

    static void reserve(T &lst, const int &n, Loki::Int2Type<false>)
    { (void)lst; (void)n; }

public:
    static void reserve(T &lst, const int &n)
    { reserve(lst, n, Loki::Int2Type<b>()); }
};

The call to STLSerializeVariantType has a template parameter bCallReserve, which is a bool. This is ‘true’ if reserve should be called on the STL container, ‘false’ to not even compile a call to reserve. CReserve is used as follows:

CReserve<bCallReserve, T>::reserve(lst, nCount);

An instance of CReserve is not necessary as all the functions are static, hence we can call straight in to the public reserve function. This public function calls one of the two private reserve functions depending on the template parameter b. The two private functions are overloaded on Loki::Int2Type<true> and Loki::Int2Type<false>, which are distinct types. Because the member functions of a class template are only compiled when they are used, the compiler only compiles the one that is actually called.

For example:

  • If we have a std::vector, we want to pass ‘true’ to call reserve on the container before populating it.

std::vector<int> intList;
…    // Populate intList
SERIALIZE_XML_STL_VARIANT(intList, true);

This expands to:

std::vector<int> intList;
…    // Populate intList
HS::STLSerializeVariantType<true>(pCurNode, intList, archive, "intList");

CReserve then compiles as:

template <bool b, class T>
class CReserve
{
private:
    static void reserve(T &lst, const int &n, Loki::Int2Type<true>)
    { lst.reserve(n); }

public:
    static void reserve(T &lst, const int &n)
    { reserve(lst, n, Loki::Int2Type<b>()); }
};

So the call to:

CReserve<true, T>::reserve(lst, nCount);

Will call:

intList.reserve(nCount);

  • If we have a std::list, we want to pass ‘false’ so as not to call reserve on the container, as this function does not exist.

std::list<int> intList;
…    // Populate intList
SERIALIZE_XML_STL_VARIANT(intList, false);

This expands to:

std::list<int> intList;
…    // Populate intList
HS::STLSerializeVariantType<false>(pCurNode, intList, archive, "intList");

CReserve then compiles as:

template <bool b, class T>
class CReserve
{
private:
    static void reserve(T &lst, const int &n, Loki::Int2Type<false>)
    { (void)lst; (void)n; }

public:
    static void reserve(T &lst, const int &n)
    { reserve(lst, n, Loki::Int2Type<b>()); }
};

So the call to:

CReserve<false, T>::reserve(lst, nCount);

Will call:

(void)lst; (void)n;

This does nothing. It just pretends to use the lst and n parameters so as not to get a compiler warning when compiling at warning level 4.
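The technique can be tried outside the library with the following self-contained sketch. It defines its own Int2Type so that Loki is not required, takes the count by value, and adds a hypothetical helper, LoadInts, that stands in for the load path of STLSerializeVariantType; everything else follows the CReserve code shown above.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Local copy of the Int2Type idea, so the sketch does not need Loki.
template <int v>
struct Int2Type
{
    enum { value = v };
};

template <bool b, class T>
class CReserve
{
private:
    // Chosen when b is true: the container must provide reserve().
    static void reserve(T &lst, int n, Int2Type<true>)
    { lst.reserve(static_cast<std::size_t>(n)); }

    // Chosen when b is false: never mentions reserve() at all.
    static void reserve(T &, int, Int2Type<false>)
    { }

public:
    // Int2Type<b> is a different type for true and false, so overload
    // resolution picks one of the private functions at compile time.
    static void reserve(T &lst, int n)
    { reserve(lst, n, Int2Type<b>()); }
};

// Hypothetical helper standing in for the load path: optionally
// pre-allocates, then appends nCount values.
template <bool bCallReserve, class T>
void LoadInts(T &lst, int nCount)
{
    CReserve<bCallReserve, T>::reserve(lst, nCount);
    for (int i = 0; i < nCount; ++i)
        lst.push_back(i);
}
```

LoadInts<true>(v, 100) on a std::vector<int> reserves space before filling, while LoadInts<false>(l, 3) on a std::list<int> compiles even though std::list has no reserve function. Writing LoadInts<true> for a std::list would be rejected by the compiler, which is the intended safety net.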

Int2Type

Just a quick note about the Int2Type template. It converts each integral constant into a unique type. Invocation: Int2Type<V>, where V is a compile-time integral constant. It defines 'value', an enum that evaluates to V. This class was designed by Andrei Alexandrescu, who wrote the book "Modern C++ Design: Generic Programming and Design Patterns Applied", published by Addison-Wesley. The Loki library is free, but undocumented, so you really need the book, which is well worth the price anyway. The book fully describes the Loki library and the design patterns behind it. Fundamentally, it demonstrates ‘generic patterns’ or ‘pattern templates’ as a powerful new way of creating extensible designs in C++: a way to combine templates and patterns that we may never have dreamt possible. If your work involves C++ design and coding, you should read this book. Highly recommended. Loki can be freely downloaded; just Google for it.

CXmlCompression

Compresses and decompresses the XML data provided by CXmlArchive. Again, please refer to the Word documentation for the full documentation for this class.

CHsZlibFile

This class is a simple wrapper class for the gzFile zlib file handler functions. It makes sure that the file is closed upon destruction. Again, please refer to the Word documentation for full documentation for this class.
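The idea of closing the file in the destructor can be sketched as follows. To keep the example free of any zlib dependency, it wraps std::FILE* from the C standard library instead of gzFile, so it is an analogue of CHsZlibFile rather than a copy of it; the class name CDemoFile and its member functions are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical analogue of CHsZlibFile built on std::FILE*, not gzFile.
class CDemoFile
{
    std::FILE *m_file;

    // Non-copyable: two owners would close the same handle twice.
    CDemoFile(const CDemoFile &);
    CDemoFile &operator=(const CDemoFile &);

public:
    CDemoFile(const char *name, const char *mode)
        : m_file(std::fopen(name, mode)) { }

    // The file is closed here, however the owning scope is left.
    ~CDemoFile()
    {
        if (m_file)
            std::fclose(m_file);
    }

    bool IsOpen() const { return m_file != NULL; }

    // Returns true only if every byte was written.
    bool Write(const std::string &text)
    {
        return m_file != NULL &&
               std::fwrite(text.data(), 1, text.size(), m_file)
                   == text.size();
    }

    // Reads from the current position to the end of the file.
    std::string ReadAll()
    {
        std::string result;
        char buf[256];
        std::size_t n = 0;
        while (m_file &&
               (n = std::fread(buf, 1, sizeof buf, m_file)) > 0)
            result.append(buf, n);
        return result;
    }
};
```

A caller creates a CDemoFile in a local scope, writes or reads, and simply lets the object go out of scope; no explicit close call is needed, and an early return or an exception cannot leak the handle.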

Files

Here is a list of the XML serialisation files, in alphabetical order:

Each entry gives the filename, a brief description, and the usage (All, Compression, or None):

  • HsAssert.h – Provides ASSERT capability. Will either use MFC's ASSERT if defined, or otherwise uses assert. Also declares TRUE, FALSE, and NULL, if not already defined. Usage: All.
  • HsBuffer.h and .cpp – Provides a buffer that will automatically be de-allocated during destruction. Usage: Compression.
  • HsCreationPolicy.h – Not used. Provides the following creation templates: CreateUsingNew, CreateUsingMalloc, CreateStatic. Usage: None.
  • HsVariant.h and .cpp – Extends _variant_t for extra types. Usage: All.
  • HsXmlArchive.h and .cpp – Provides an XML serialisation mechanism for classes and STL containers. Also supports MFC. Usage: All.
  • HsXmlCompression.h and .cpp – Compresses and decompresses the XML data provided by CXmlArchive. Usage: Compression.
  • HsZlibFile.h and .cpp – This class is a simple wrapper class for the gzFile zlib file handler functions. It makes sure that the file is closed upon destruction. Usage: Compression.
  • TypeManip.h – Part of the Loki library written by Andrei Alexandrescu. Provides the class template Int2Type. Usage: All.

Testing

There are several test programs supplied with the XML serialisation classes. These are:

  • MFC demo – Saves and loads MFC objects and data. It also saves classes that contain STL data.
  • Non-MFC demo – Saves and loads a simple class.
  • STL demo – Saves and loads a std::vector and a std::list which contain classes and std::strings.
  • Compression simple – Similar to the non-MFC demo, except the XML is compressed using zlib.
  • Compression complex – Similar to the MFC demo, except the XML is compressed using zlib.

The demo classes all perform the same way in order to validate and test the XML serialisation mechanism. They each:

  • Initialise the class data to be saved.
  • Save the XML output to "objects.xml". This XML output is also retrieved and stored in the xml_saved string for validation purposes.
  • Destroy the class data.
  • Read in the XML file named "objects.xml" and process it.
  • Save the XML output again, this time without supplying an output filename. This XML output is retrieved and stored in the xml_loaded string for validation purposes.
  • Compare the xml_saved and xml_loaded strings. They should be identical.

Please note that the output filename for the compression examples is "objects.gz".

Miscellaneous

Currently, the creation policies are not used. They can be found in HsCreationPolicy.h.

Please send any comments or improvements to me: Simon Hughes.

Revision Log

  • 21 Aug 2001, SJH, ver. 1.0 – First release.
  • 6 Sep 2001, SJH, ver. 1.1 – Added ZLIB compression.
  • 27 May 2004, SJH, ver. 1.2 – Coloured the C++ code, and added instructions relating to VC 7 project settings.

Sorry it took so long to publish.

 