The STL+ C++ library
Andy Rushton
Persistence is the ability to dump a data structure to disk and then restore it again later either in the same run of the program, a later run of a program, or even in a different program. It is an easy way to save a program's state or to communicate information in a structural form between programs.
In fact, persistence is not limited to disk dumps, since the same idea can be used to transfer information from one program to another down a pipe or even an Internet connection. In effect you can use persistence to communicate a data structure of any complexity between two programs even if they are running on different computers under different operating systems.
At a more basic level, persistence does away with the need to design file formats. Instead, just design a data structure to carry the information required between the programs and then make that data structure persistent. The file format is designed for you by the persistence subsystem.
The persistent format is a binary format, so it is extremely efficient both in data size and in the CPU time required. For example, text formats tend to be dominated by the processing required to convert integer values between their machine form (2's-complement binary) and the text form (sign-magnitude decimal). This problem doesn't occur with the persistence format, which dumps and restores in the native binary form.
The purpose of the data persistence subsystem is to provide a toolkit which makes it easy if not trivial to make a data structure persistent. However, it is not totally automatic - C++ is too flexible a language to be able to take any data structure and just dump it. This is why the approach has been to provide a toolkit out of which persistence routines can be written.
The toolkit provides a set of functions for dumping and restoring a wide
range of types. All the basic C types are made persistent, as are C++ types
like string and complex. However, the real power of the
persistence functions is that template functions are provided for making all
of the STL and STLplus container classes persistent.
The idea is that a container is made persistent by dumping its contents
using a dump routine for the contained data type. For example, a
vector of strings is dumped by dumping vector-specific
information and then repeatedly calling the
dump routine for string. The restore function
restores the vector and then repeatedly calls the restore function for string
to restore the vector's contents.
The same concept is applied to all the container classes. Therefore, to
make a container persistent, all you have to do is supply dump and
restore functions for the contained type. If the contained type is a
basic C or C++ type, then these functions are already provided and the data
structure is already persistent.
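The delegation from container to element can be sketched in standard C++ alone. This is only an illustration of the idea, not the STLplus implementation: the dump_context and restore_context below are stand-in structs wrapping a std stream, and the on-disk layout (a 32-bit count followed by the elements) is an assumption made for the sketch.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace sketch {

// Stand-ins for the STLplus contexts (assumption): thin wrappers round a std stream.
struct dump_context    { std::ostream& out; };
struct restore_context { std::istream& in; };

// Element-level routines for a 32-bit count and for std::string.
void dump(dump_context& c, std::uint32_t n)
{
  c.out.write(reinterpret_cast<const char*>(&n), sizeof n);
}
void restore(restore_context& c, std::uint32_t& n)
{
  c.in.read(reinterpret_cast<char*>(&n), sizeof n);
}
void dump(dump_context& c, const std::string& s)
{
  dump(c, static_cast<std::uint32_t>(s.size()));
  c.out.write(s.data(), static_cast<std::streamsize>(s.size()));
}
void restore(restore_context& c, std::string& s)
{
  std::uint32_t n = 0;
  restore(c, n);
  s.resize(n);
  if (n > 0) c.in.read(&s[0], static_cast<std::streamsize>(n));
}

// Container-level routines: the count first, then the element routine once per element.
template<typename T>
void dump_vector(dump_context& c, const std::vector<T>& v)
{
  dump(c, static_cast<std::uint32_t>(v.size()));
  for (const T& item : v) dump(c, item);
}
template<typename T>
void restore_vector(restore_context& c, std::vector<T>& v)
{
  std::uint32_t n = 0;
  restore(c, n);
  v.assign(n, T());
  for (T& item : v) restore(c, item);
}

// Dump a vector of strings into memory and restore it into a fresh vector.
std::vector<std::string> round_trip(const std::vector<std::string>& original)
{
  std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
  dump_context dumper{buffer};
  dump_vector(dumper, original);
  restore_context restorer{buffer};
  std::vector<std::string> copy;
  restore_vector(restorer, copy);
  return copy;
}

} // namespace sketch
```

The template never mentions string: it only requires that a dump and a restore exist for the element type, which is the property the text relies on.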
The dump operation is controlled by an object of type dump_context which is
defined in persistent.hpp. This is
initialised with an output device (any derivative of otext) and is then passed down through the hierarchy of
dump routines. At the end of the dump, the output device can be checked to see
if an output error occurred. Here's a typical example of how to dump a data
structure:
oftext output(filename);
dump_context dumper(output);
dump(dumper, data);
In this example, you can see how an output file is created (oftext
is an output file device - see fileio). Then the
dump_context object is initialised with this output device. Then the
dump function for the data structure is called. The output device
should be closed, but this will be done by its destructor.
Similarly, a restore operation is controlled by a restore_context
object. This is initialised with an input device. Here's an example of how to
restore a data structure:
iftext input(filename);
restore_context restorer(input);
restore(restorer, data);
The TextIO device must be in binary mode for persistence to work correctly. The context object automatically places the TextIO device into binary mode when it is passed to the context's constructor, so you don't have to worry about that issue. Be careful however to ensure that binary mode is used if you transmit dumped data over networks - some programs (such as FTP) may try to convert data that looks like line endings to the 'correct' form for the operating system - corrupting the persistent data irretrievably.
To start with, I'll demonstrate how to dump and restore a simple data type containing only simple C types. The following class will be used for the demonstration:
class point
{
private:
int m_x;
int m_y;
int m_z;
public:
...
};
The required parameter profile of the dump/restore functions is:
void dump(dump_context&, const type&);
void restore(restore_context&, type&);
These functions should be declared as
stand-alone functions and not methods. In this case this will be done by
making them friends of the class, meaning they are not methods but can access
the data members even though the members are declared as private.
So, here is the point class with the persistence functions'
declarations added:
class point
{
private:
int m_x;
int m_y;
int m_z;
public:
...
friend void dump(dump_context& context, const point& pt);
friend void restore(restore_context& context, point& pt);
};
The dump and restore functions are written using the
existing dump and restore functions for int, the
type used for the three dimensions of a point:
void dump(dump_context& context, const point& pt)
{
dump(context,pt.m_x);
dump(context,pt.m_y);
dump(context,pt.m_z);
}
void restore(restore_context& context, point& pt)
{
restore(context,pt.m_x);
restore(context,pt.m_y);
restore(context,pt.m_z);
}
Note that
neither the dump nor the restore function does any file I/O
itself; it is all delegated to the pre-written functions provided in persistent.hpp for type
int.
Enumeration types are essentially small integers. However, the compiler considers each enumeration to be a distinct type, so they cannot simply be treated as integer types - you get a compilation error. The solution that I supply is a pair of template functions that adapt themselves to the type of the enum being made persistent. The functions are:
template<typename T>
void dump_enum(dump_context& str, const T& data) throw(persistent_dump_failed);
template<typename T>
void restore_enum(restore_context& str, T& data) throw(persistent_restore_failed);
Consider the following example. The enum defines a traffic light sequence:
enum traffic_lights {red, red_amber, green, amber};
This can be used with dump_enum and restore_enum directly, but it is better style to write
dump and restore functions that call the template
functions, thus hiding the use of the template:
void dump(dump_context& context, const traffic_lights& lights)
{
dump_enum(context, lights);
}
void restore(restore_context& context, traffic_lights& lights)
{
restore_enum(context, lights);
}
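One plausible way for such a template pair to work is to convert the enum through a fixed-width integer. The sketch below assumes exactly that; it uses stand-in context structs over std streams and is not taken from the STLplus source, so the 32-bit representation is a guess made for illustration.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

namespace sketch {

// Stand-ins for the STLplus contexts (assumption).
struct dump_context    { std::ostream& out; };
struct restore_context { std::istream& in; };

// One template serves every enum by converting through a 32-bit integer.
template<typename T>
void dump_enum(dump_context& c, const T& data)
{
  const std::int32_t raw = static_cast<std::int32_t>(data);
  c.out.write(reinterpret_cast<const char*>(&raw), sizeof raw);
}
template<typename T>
void restore_enum(restore_context& c, T& data)
{
  std::int32_t raw = 0;
  c.in.read(reinterpret_cast<char*>(&raw), sizeof raw);
  data = static_cast<T>(raw);
}

enum traffic_lights {red, red_amber, green, amber};

// Type-specific wrappers that hide the template from callers.
void dump(dump_context& c, const traffic_lights& lights) { dump_enum(c, lights); }
void restore(restore_context& c, traffic_lights& lights) { restore_enum(c, lights); }

// Dump one value into memory and read it back.
traffic_lights round_trip(traffic_lights original)
{
  std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
  dump_context dumper{buffer};
  dump(dumper, original);
  restore_context restorer{buffer};
  traffic_lights copy = red;
  restore(restorer, copy);
  return copy;
}

} // namespace sketch
```

The explicit static_cast on restore is the step the compiler will not do implicitly, which is why a plain integer restore function cannot be reused for an enum.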
A real data structure of course has many layers. The persistence functions
are designed to be used in a layered way. The dump/restore
functions written above can be used stand-alone to dump a single point, but
they can also be used to dump a point stored as part of a different data
structure. In this way, dump and restore routines can be
built up a layer at a time.
The example will represent an edge as two points:
class edge
{
private:
point m_begin;
point m_end;
public:
...
friend void dump(dump_context& context, const edge& pt);
friend void restore(restore_context& context, edge& pt);
};
Once again the dump/restore functions can be written in terms of the
dump/restore functions for the data members:
void dump(dump_context& context, const edge& e)
{
dump(context,e.m_begin);
dump(context,e.m_end);
}
void restore(restore_context& context, edge& e)
{
restore(context,e.m_begin);
restore(context,e.m_end);
}
In this case, to dump an edge means dumping two points
which uses the dump function for the point class written in
the last section. This layering can be continued ad infinitum.
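The layering can be tried out without the library by substituting stand-in contexts built on std streams. In this sketch the context structs, the constructors, operator== and the dumped/restored helpers are all additions made for illustration; only the shape of the dump and restore functions follows the text.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

namespace sketch {

// Stand-ins for the STLplus contexts (assumption).
struct dump_context    { std::ostream& out; };
struct restore_context { std::istream& in; };

// Bottom layer: an int written in its native binary form.
void dump(dump_context& c, const int& v)
{
  c.out.write(reinterpret_cast<const char*>(&v), sizeof v);
}
void restore(restore_context& c, int& v)
{
  c.in.read(reinterpret_cast<char*>(&v), sizeof v);
}

class point
{
  int m_x, m_y, m_z;
public:
  point() : m_x(0), m_y(0), m_z(0) {}
  point(int x, int y, int z) : m_x(x), m_y(y), m_z(z) {}
  bool operator==(const point& o) const { return m_x == o.m_x && m_y == o.m_y && m_z == o.m_z; }
  friend void dump(dump_context& c, const point& pt);
  friend void restore(restore_context& c, point& pt);
};

// Middle layer: a point is three ints.
void dump(dump_context& c, const point& pt)
{
  dump(c, pt.m_x); dump(c, pt.m_y); dump(c, pt.m_z);
}
void restore(restore_context& c, point& pt)
{
  restore(c, pt.m_x); restore(c, pt.m_y); restore(c, pt.m_z);
}

class edge
{
  point m_begin, m_end;
public:
  edge() {}
  edge(const point& b, const point& e) : m_begin(b), m_end(e) {}
  bool operator==(const edge& o) const { return m_begin == o.m_begin && m_end == o.m_end; }
  friend void dump(dump_context& c, const edge& e);
  friend void restore(restore_context& c, edge& e);
};

// Top layer: an edge is two points and knows nothing about ints.
void dump(dump_context& c, const edge& e)
{
  dump(c, e.m_begin); dump(c, e.m_end);
}
void restore(restore_context& c, edge& e)
{
  restore(c, e.m_begin); restore(c, e.m_end);
}

// Helpers for experimenting: dump to a byte string, restore from one.
std::string dumped(const edge& e)
{
  std::ostringstream buffer(std::ios::out | std::ios::binary);
  dump_context c{buffer};
  dump(c, e);
  return buffer.str();
}
edge restored(const std::string& bytes)
{
  std::istringstream buffer(bytes, std::ios::in | std::ios::binary);
  restore_context c{buffer};
  edge e;
  restore(c, e);
  return e;
}

} // namespace sketch
```

In this stand-in format an edge occupies exactly six ints, since neither the point layer nor the edge layer adds any bytes of its own.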
The template classes provided by the STL and the template classes provided
by STLplus have been made persistent using template
dump/restore functions. Because of problems with overloading
of template functions in Visual C++, the functions are actually called
dump_class and restore_class where
class is the name of the template class. For example, the persistence
functions for the STL map are called dump_map and
restore_map.
The persistence functions for templates are themselves templates, so are
automatically adapted to the type that the container holds. For example,
dump_vector which is the dump routine for the STL vector,
will adapt to the type being held in the vector. If the
vector contains int, then the dump_vector function
will dump ints by calling the dump function defined for
int. If the vector contains edges (defined in the
last section) then the dump_vector function will dump edges.
The template function requires that there is a function called dump
for the element type of the vector. If there isn't one already, you
need to write one.
To demonstrate, a vector of edges will be used. In this
case we need a dump function for a single edge. This has
already been written in the last section. Therefore, the dump
function for a vector of edges is very simple to write, as
is the restore function:
void dump(dump_context& context, const vector<edge>& e)
{
dump_vector(context, e);
}
void restore(restore_context& context, vector<edge>& e)
{
restore_vector(context, e);
}
I have not been able to implement a general solution to the problem of persistent iterators for STL templates. However, I have added persistence to the iterators for the STLplus template classes ntree and digraph.
The ntree class has three types of iterator - a simple iterator and the traversal iterators prefix_iterator and postfix_iterator. All of these have been made persistent by the addition of template dump and restore functions:
// simple iterators
template<typename T, typename TRef, typename TPtr>
void dump_ntree_iterator(dump_context&, const ntree_iterator<T,TRef,TPtr>&) throw(persistent_dump_failed);
template<typename T, typename TRef, typename TPtr>
void restore_ntree_iterator(restore_context&, ntree_iterator<T,TRef,TPtr>&) throw(persistent_restore_failed);
// prefix iterators
template<typename T, typename TRef, typename TPtr>
void dump_ntree_prefix_iterator(dump_context&, const ntree_prefix_iterator<T,TRef,TPtr>&) throw(persistent_dump_failed);
template<typename T, typename TRef, typename TPtr>
void restore_ntree_prefix_iterator(restore_context&, ntree_prefix_iterator<T,TRef,TPtr>&) throw(persistent_restore_failed);
// postfix iterators
template<typename T, typename TRef, typename TPtr>
void dump_ntree_postfix_iterator(dump_context&, const ntree_postfix_iterator<T,TRef,TPtr>&) throw(persistent_dump_failed);
template<typename T, typename TRef, typename TPtr>
void restore_ntree_postfix_iterator(restore_context&, ntree_postfix_iterator<T,TRef,TPtr>&) throw(persistent_restore_failed);
As with other template classes, the convention is to write dump/restore functions for a specific template instantiation in terms of these template functions. For example, given a tree of strings, the following functions would be used to make the iterator persistent:
void dump(dump_context& context, const ntree<string>::iterator& i)
{
dump_ntree_iterator(context, i);
}
void restore(restore_context& context, ntree<string>::iterator& i)
{
restore_ntree_iterator(context, i);
}
There is a restriction: the ntree must be dumped before any iterators are dumped - if not, an exception will be thrown.
Similarly, digraph node and arc iterators are made persistent by the following template functions:
// node iterators
template<typename NT, typename AT, typename NRef, typename NPtr>
void dump_digraph_iterator(dump_context& str, const digraph_iterator<NT,AT,NRef,NPtr>& data) throw(persistent_dump_failed);
template<typename NT, typename AT, typename NRef, typename NPtr>
void restore_digraph_iterator(restore_context& str, digraph_iterator<NT,AT,NRef,NPtr>& data) throw(persistent_restore_failed);
// arc iterators
template<typename NT, typename AT, typename NRef, typename NPtr>
void dump_digraph_arc_iterator(dump_context& str, const digraph_arc_iterator<NT,AT,NRef,NPtr>& data) throw(persistent_dump_failed);
template<typename NT, typename AT, typename NRef, typename NPtr>
void restore_digraph_arc_iterator(restore_context& str, digraph_arc_iterator<NT,AT,NRef,NPtr>& data) throw(persistent_restore_failed);
The same restriction applies: the digraph must be dumped before any iterators are dumped - if not, an exception will be thrown.
Pointers are a special problem for persistent data types because there may be more than one pointer to the same object in a data structure. If this was dumped in a naive way, there would be two identical copies of the object in the dump, rather than one object and two pointers to it. It would also be impossible to dump a structure with back pointers because the dump mechanism would get stuck in an infinite recursion. The key to dumping such a structure is to determine the primary structure and dump that. Then, in a second pass, dump secondary links such as back pointers and cross links.
The dump function for a pointer will dump the contents of the
pointer on the first visit to the object along with a unique magic key that
identifies the object pointed to. The second time the object is visited (for
example due to a back pointer), only the magic key is dumped. On restoration,
on restoring the object itself, the object is added to a map along
with its magic key. When the magic key is found again in the input stream, it
is converted by the map into a pointer to the restored object.
The importance of dumping the primary structure first should be clear from this - dumping the primary links first causes the data structures to be dumped in this pass since each object will be visited for the first time. When back pointers or cross pointers are dumped, all the objects they are pointing to have already been dumped so only magic keys get dumped.
There is a template function pair
dump_pointer/restore_pointer which implements this
algorithm. It assumes that a pointer points to a single object (for example,
an int* points to an int). Pointers to arrays of objects
cannot be supported in this way and will need to be hand-implemented (you need
to know the size of the array as well to be able to dump it). You should be
using vectors anyway!
The one exception is char* which is treated as a null
terminated array and not a pointer to a single char. The
char* persistence functions are not templates and so have the
simple names dump/restore. Multiple pointers to the
same char array will be dumped once and the same magic key method
used as for pointers to other types.
Thus, dump_pointer will dump a magic key to the file and then, if
this is the first visit, it will call dump on the object being
pointed to. You need to provide that dump function if it doesn't
already exist. Similarly, the restore function restores the magic
key, checks to see if it is a new key and if it is it restores the contents of
the pointer. If it is an already-restored key, then it is simply mapped onto
its target object.
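The magic key algorithm described above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the library's code: the contexts are stand-ins over std streams, keys are 32-bit integers allocated in visiting order, key 0 is reserved for a null pointer, and restored objects are created with new (the caller owns them).

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>

namespace sketch {

// Stand-in contexts (assumption), each carrying the key map the algorithm needs.
struct dump_context
{
  std::ostream& out;
  std::map<const void*, std::uint32_t> keys;   // object address -> magic key
  explicit dump_context(std::ostream& o) : out(o) {}
};
struct restore_context
{
  std::istream& in;
  std::map<std::uint32_t, void*> objects;      // magic key -> restored object
  explicit restore_context(std::istream& i) : in(i) {}
};

void dump(dump_context& c, const std::uint32_t& v)
{
  c.out.write(reinterpret_cast<const char*>(&v), sizeof v);
}
void restore(restore_context& c, std::uint32_t& v)
{
  c.in.read(reinterpret_cast<char*>(&v), sizeof v);
}
void dump(dump_context& c, const int& v)
{
  c.out.write(reinterpret_cast<const char*>(&v), sizeof v);
}
void restore(restore_context& c, int& v)
{
  c.in.read(reinterpret_cast<char*>(&v), sizeof v);
}

template<typename T>
void dump_pointer(dump_context& c, const T* ptr)
{
  if (!ptr) { dump(c, std::uint32_t(0)); return; }          // key 0 stands for null
  std::map<const void*, std::uint32_t>::iterator found = c.keys.find(ptr);
  if (found != c.keys.end()) { dump(c, found->second); return; }  // repeat visit: key only
  const std::uint32_t key = static_cast<std::uint32_t>(c.keys.size()) + 1;
  c.keys[ptr] = key;
  dump(c, key);      // first visit: the key...
  dump(c, *ptr);     // ...followed by the contents
}

template<typename T>
void restore_pointer(restore_context& c, T*& ptr)
{
  std::uint32_t key = 0;
  restore(c, key);
  if (key == 0) { ptr = 0; return; }
  std::map<std::uint32_t, void*>::iterator found = c.objects.find(key);
  if (found != c.objects.end()) { ptr = static_cast<T*>(found->second); return; }
  ptr = new T();
  c.objects[key] = ptr;   // record the object before restoring its contents
  restore(c, *ptr);
}

} // namespace sketch
```

Dumping the same int* twice through one context writes the int only once, and restoring both pointers yields two pointers to a single restored object.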
The STLplus smart pointer classes are a special case of template container classes in that they contain pointers to objects, whereas most template containers contain objects themselves. Therefore, persistence of smart pointers is implemented by calling the persistence functions for pointers.
There are two interpretations of pointers though: a simple pointer to an
object of a known type and a polymorphic pointer which has the type of a
pointer to a superclass but which can in fact point to any subclass of the
pointer type. These two interpretations are handled by two variants of the
smart pointer classes: the smart_ptr variant is intended for use
with simple pointers and so uses the persistence functions for simple pointers
(see Persistence of Pointers), whereas the
smart_ptr_clone variant, which is designed for pointing to
polymorphic types, uses the persistence for polymorphic pointers (see Persistence of Polymorphic Classes using Interfaces).
In C++ you can define a superclass and then derive subclasses from it (some people prefer the terminology base class for superclass and derived class for subclass). This set of classes based on a common superclass is referred to as a set of Polymorphic classes.
Polymorphic classes are manipulated through a pointer to the superclass. The pointer can then point to any object of any subclass of the common superclass. Subclass-specific operations are provided through the use of virtual functions.
If this is still making no sense, you need to read a book on C++ since the purpose of this document is to explain the STLplus, not to teach C++ basics. Otherwise, the rest of this section is on how to make Polymorphic classes persistent.
Polymorphic classes represent a problem for persistence. So far all the persistence functions have used knowledge of the exact type of the object at compile time to select the correct overloaded dump or restore function. However, with polymorphism, only the superclass is known from the type of the pointer. The actual subclass being pointed to is unknown at compile time and must be determined at run time. This means that run-time type information must be used to determine the type. This is usually achieved by defining virtual methods.
This is the solution used to make polymorphic types persistent - although there is an alternative implementation that uses callback functions instead which is described in the next section.
The set of virtual functions used to make a class persistent is defined by an
interface called persistent. To
make a polymorphic class persistent, the first stage is to derive the base
class of your family of polymorphic classes from this interface.
class base : public persistent
The persistent interface defines two abstract methods that you
must provide for all subclasses to be made persistent:
class persistent : public clonable
{
public:
virtual void dump(dump_context&) const throw(persistent_dump_failed) = 0;
virtual void restore(restore_context&) throw(persistent_restore_failed) = 0;
};
However, you can see that this in turn inherits the clonable
interface which allows copying of polymorphic types:
class clonable
{
public:
virtual clonable* clone(void) const = 0;
};
This method is also required by the
smart_ptr_clone container which is also
used to store polymorphic classes, so once you've made a class persistent,
you've automatically made it suitable for use in this smart pointer.
In order to demonstrate the way polymorphic classes are made persistent, consider the following noddy example:
class base
{
int m_value;
public:
base(int value = 0) : m_value(value) {}
virtual ~base(void) {}
virtual int value (void) const {return m_value;}
virtual void set(int value = 0) {m_value = value;}
};
class derived : public base
{
string m_image;
public:
derived(int value = 0) : base(value), m_image(to_string(value)) {}
derived(string value) : base(to_int(value)), m_image(value) {}
virtual ~derived(void) {}
virtual void set(int value = 0) {m_image = to_string(value); base::set(value);}
};
In order to make these two classes persistent, the base class must inherit
from the persistent interface and then both classes must have the
three abstract methods clone, dump and
restore added.
Here are these classes with the additions:
class base : public persistent
{
int m_value;
public:
base(int value = 0) : m_value(value) {}
virtual ~base(void) {}
virtual int value (void) const {return m_value;}
virtual void set(int value = 0) {m_value = value;}
clonable* clone(void) const
{
return new base(*this);
}
void dump(dump_context& context) const throw(persistent_dump_failed)
{
::dump(context,m_value);
}
void restore(restore_context& context) throw(persistent_restore_failed)
{
::restore(context,m_value);
}
};
class derived : public base
{
string m_image;
public:
derived(int value = 0) : base(value), m_image(to_string(value)) {}
derived(string value) : base(to_int(value)), m_image(value) {}
virtual ~derived(void) {}
virtual void set(int value = 0) {m_image = to_string(value); base::set(value);}
clonable* clone(void) const
{
return new derived(*this);
}
void dump(dump_context& context) const throw(persistent_dump_failed)
{
base::dump(context);
::dump(context,m_image);
}
void restore(restore_context& context) throw(persistent_restore_failed)
{
base::restore(context);
::restore(context,m_image);
}
};
Note the use of a common trick here. The subclass derived dumps its
superclass by simply calling the superclass's dump method (in this case,
base::dump). This is in keeping with the general C++ convention that subclasses
should not use knowledge of the internals of the superclass. This convention
is easy to follow: call the dump/restore method of the immediate superclass
of the subclass first, then dump/restore the subclass-specific data.
The solution for persistence of Polymorphic classes requires that every derivative class be registered with the dump_context or restore_context before the dump or restore operation commences. Furthermore, where there are many polymorphic types being handled, the order of registration must be the same for the restore operation as it was for the dump operation.
Consider first the dump operation. The dump_context class provides
the following method for registration:
unsigned short dump_context::register_interface(const std::type_info& info);
This is called once for each polymorphic type to be dumped. So, for the example above it is called twice:
dump_context context(output);
context.register_interface(typeid(base));
context.register_interface(typeid(derived));
The typeid operator is built into C++ and returns a std::type_info
object for a type or expression, from which the type name can be obtained as a const char*. This is mapped
internally onto a magic key which is an integer value unique to that subclass.
The return value of
the register_interface method is the magic key for that type and
is used in the dump to differentiate between the different classes. There's no
real reason for capturing this key except maybe for debugging the data
stream. Keys are allocated in the order of registration of class types. This
is why class types must be registered in the same order for both the dump and
restore operations.
For the restore operation it is necessary to register a sample object of the
class. This is because the restore operation creates objects of the class by
cloning the sample. The sample is stored in a
smart_ptr_clone:
typedef smart_ptr_clone<persistent> persistent_ptr;
The restore_context
class provides the following registration function:
unsigned short restore_context::register_interface(const persistent_ptr&);
The objects are registered in the same order as the types were registered
into the dump context, because it is this ordering that provides the mapping
from the unique key used in the dump to the correct sample object used in
the restore. During the dump, the class base was registered first,
then class derived. The sample objects are therefore registered in the
same order for the restore:
restore_context context(input);
context.register_interface(base());
context.register_interface(derived());
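To show why the registration order matters, here is a self-contained sketch of key allocation on the dump side and clone-from-sample on the restore side. Everything in it is a stand-in written for illustration: the contexts wrap std streams, clone is folded into the persistent interface, the sample is passed by reference instead of in a smart pointer, and the helper names key_of, create, dump_raw and restore_raw are hypothetical.

```cpp
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <typeindex>
#include <typeinfo>
#include <vector>

namespace sketch {

class dump_context;
class restore_context;

// Stand-in for the persistent interface, with clone folded in (assumption).
class persistent
{
public:
  virtual ~persistent() {}
  virtual persistent* clone() const = 0;
  virtual void dump(dump_context&) const = 0;
  virtual void restore(restore_context&) = 0;
};

class dump_context
{
public:
  std::ostream& out;
  explicit dump_context(std::ostream& o) : out(o) {}
  // Keys are handed out in registration order, starting at 1.
  unsigned short register_interface(const std::type_info& info)
  {
    const unsigned short key = static_cast<unsigned short>(m_keys.size() + 1);
    m_keys[std::type_index(info)] = key;
    return key;
  }
  unsigned short key_of(const persistent& object) const
  {
    return m_keys.at(std::type_index(typeid(object)));   // dynamic type of the object
  }
private:
  std::map<std::type_index, unsigned short> m_keys;
};

class restore_context
{
public:
  std::istream& in;
  explicit restore_context(std::istream& i) : in(i) {}
  // The n-th registered sample answers to key n.
  unsigned short register_interface(const persistent& sample)
  {
    m_samples.push_back(std::unique_ptr<persistent>(sample.clone()));
    return static_cast<unsigned short>(m_samples.size());
  }
  persistent* create(unsigned short key) const { return m_samples.at(key - 1)->clone(); }
private:
  std::vector<std::unique_ptr<persistent> > m_samples;
};

template<typename T> void dump_raw(dump_context& c, const T& v)
{
  c.out.write(reinterpret_cast<const char*>(&v), sizeof v);
}
template<typename T> void restore_raw(restore_context& c, T& v)
{
  c.in.read(reinterpret_cast<char*>(&v), sizeof v);
}

void dump_interface(dump_context& c, const persistent* ptr)
{
  dump_raw(c, c.key_of(*ptr));   // which subclass this is
  ptr->dump(c);                  // its contents, via the virtual method
}
persistent* restore_interface(restore_context& c)
{
  unsigned short key = 0;
  restore_raw(c, key);
  persistent* object = c.create(key);   // clone the sample registered under this key
  object->restore(c);
  return object;
}

class base : public persistent
{
  int m_value;
public:
  explicit base(int value = 0) : m_value(value) {}
  int value() const { return m_value; }
  persistent* clone() const { return new base(*this); }
  void dump(dump_context& c) const { dump_raw(c, m_value); }
  void restore(restore_context& c) { restore_raw(c, m_value); }
};

class derived : public base
{
  int m_extra;
public:
  explicit derived(int value = 0, int extra = 0) : base(value), m_extra(extra) {}
  int extra() const { return m_extra; }
  persistent* clone() const { return new derived(*this); }
  void dump(dump_context& c) const { base::dump(c); dump_raw(c, m_extra); }
  void restore(restore_context& c) { base::restore(c); restore_raw(c, m_extra); }
};

} // namespace sketch
```

If the two samples were registered in the opposite order on the restore side, key 2 would clone a base and the derived data would be read into the wrong type, which is the failure the ordering rule guards against.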
An alternative way of registering these interfaces is to wrap their registration up in an installer function. This installer can then be used to install all classes in a single step.
In fact, two installer functions are required - one for dumping and one for restoring. It is up to you to check that these installer functions register their classes in the same order. The type profiles for these installer functions are:
void (*dump_context::installer)(dump_context&);
void (*restore_context::installer)(restore_context&);
In other words, the installer type for a dump_context is a pointer to a function that takes a dump_context& and returns void. Similarly the installer type for a restore_context is a pointer to a function that takes a restore_context& and returns void. For the above example they might look like this:
void make_base_persistent(dump_context& context)
{
context.register_interface(typeid(base));
context.register_interface(typeid(derived));
}
void make_base_persistent(restore_context& context)
{
context.register_interface(base());
context.register_interface(derived());
}
The functions can be called whatever you like, but I prefer to give them the same name and use overload resolution to pick the right one according to the type profile. In use, after creating a dump or restore context, call the method register_all with the above installer as the argument. For example, using the earlier example again, rewritten to use an installer:
dump_context context(output);
context.register_all(make_base_persistent);
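The reason one name can serve for both installers is that C++ resolves an overloaded function name against the function pointer type it is being converted to. A minimal sketch of that mechanism, using stand-in contexts whose register_name and names methods are hypothetical and simply record the order of registration:

```cpp
#include <string>
#include <vector>

namespace sketch {

class dump_context
{
public:
  typedef void (*installer)(dump_context&);
  // Hypothetical registration call: records a name so the order can be inspected.
  void register_name(const std::string& name) { m_names.push_back(name); }
  void register_all(installer install) { install(*this); }
  const std::vector<std::string>& names() const { return m_names; }
private:
  std::vector<std::string> m_names;
};

class restore_context
{
public:
  typedef void (*installer)(restore_context&);
  void register_name(const std::string& name) { m_names.push_back(name); }
  void register_all(installer install) { install(*this); }
  const std::vector<std::string>& names() const { return m_names; }
private:
  std::vector<std::string> m_names;
};

// Two installers sharing one name; keeping them side by side makes it
// easy to check by eye that the order of registration matches.
void make_base_persistent(dump_context& context)
{
  context.register_name("base");
  context.register_name("derived");
}
void make_base_persistent(restore_context& context)
{
  context.register_name("base");
  context.register_name("derived");
}

} // namespace sketch
```

Calling register_all(make_base_persistent) on either context picks the overload whose parameter matches that context's installer type, so no cast or explicit qualification is needed.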
Now that the classes are registered, the actual dump and restore of a superclass pointer is handled by the following functions:
template<typename T>
void dump_interface(dump_context& str, const T*& data);
template<typename T>
void restore_interface(restore_context& str, T*& data);
For example, given the above example using classes base and
derived, specific dump and restore functions can be
written that simply call the above template functions:
void dump(dump_context& context, const base*& ptr)
{
dump_interface(context,ptr);
}
void restore(restore_context& context, base*& ptr)
{
restore_interface(context,ptr);
}
Note: since polymorphic types are handled in C++ via pointers, the same behaviour is implemented for multiple pointers to the same object as was implemented for simple pointers. When two pointers to the same object are dumped, they will be restored as pointers to the same object.
Alternatively, a smart_ptr_clone
can be used. This class is specifically designed to point to a polymorphic
type which uses the clonable interface. Furthermore, the
persistence functions for smart_ptr_clone call the persistence
functions for polymorphic types using the clonable interface. For
example, say you have the following type declarations:
typedef smart_ptr_clone<base> base_ptr;
typedef vector<base_ptr> base_vector;
These types can be made persistent in the usual way, by creating layers of functions
called dump and restore building up from the
low-level contained type to the composite type by calling the template
functions for vector and smart_ptr_clone.
We already have persistence of base* handled by the interfaces registered
above. Supporting smart_ptr_clone<base>, which contains a
base*, is simply a case of writing a function that calls the
template dump/restore for the smart pointer class:
void dump(dump_context& context, const base_ptr& ptr)
{
dump_smart_ptr_clone(context,ptr);
}
void restore(restore_context& context, base_ptr& ptr)
{
restore_smart_ptr_clone(context,ptr);
}
The final stage is to make a vector of these persistent:
void dump(dump_context& context, const base_vector& vec)
{
dump_vector(context,vec);
}
void restore(restore_context& context, base_vector& vec)
{
restore_vector(context,vec);
}
The previous section described how polymorphic types could be made
persistent in an object-oriented way through inheritance and virtual methods.
However, it is not always possible to use this approach. For example, you
might want to make a class persistent that you cannot change. Therefore an
alternative solution is needed that uses a non-intrusive approach to
persistence. In order to achieve this non-intrusive approach, I have provided
the option to use dump and restore callbacks to
perform the persistence functionality and not
virtuals. The callbacks are associated with the subclass, which can be
determined at run time. The callbacks are stored in the
dump_context object during the dump and in the
restore_context object during a restore.
However, this is still not a complete solution. During restore, it is
necessary to create an object of the right subclass before its restore
callback can be called. There is no concept of a virtual constructor in C++,
nor is there a means of creating an object of any type from, say, the name of
the type. The solution uses create callbacks rather than sample objects. A
create callback is a function that, when called, creates an object and
returns a pointer to it. In order to make the method as general as possible,
the create callback returns this pointer as a void*.
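The idea of a create callback can be illustrated with a small registry that maps a key, allocated in registration order, to a function returning void*. The registry class, the shape classes and the callback names below are all hypothetical and exist only for this sketch; the real registration lives in restore_context.

```cpp
#include <vector>

namespace sketch {

// A create callback takes no arguments and returns the new object as void*.
typedef void* (*create_callback)();

// Example polymorphic family, invented for the sketch.
struct shape
{
  virtual ~shape() {}
  virtual int sides() const = 0;
};
struct triangle : shape { int sides() const { return 3; } };
struct square   : shape { int sides() const { return 4; } };

// Each callback converts to the superclass pointer before erasing the type,
// so that casting the void* back to shape* is well defined.
void* create_triangle() { return static_cast<shape*>(new triangle); }
void* create_square()   { return static_cast<shape*>(new square); }

// Keys are allocated in registration order, starting at 1.
class create_registry
{
public:
  unsigned short register_type(create_callback create)
  {
    m_creators.push_back(create);
    return static_cast<unsigned short>(m_creators.size());
  }
  void* create(unsigned short key) const { return m_creators.at(key - 1)(); }
private:
  std::vector<create_callback> m_creators;
};

} // namespace sketch
```

Given only the key read back from a dump, the registry can produce an object of the right subclass at run time, which is the job a virtual constructor would do if C++ had one. The caller casts the void* back to the superclass pointer and takes ownership of the object.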
Thus, the non-intrusive solution to persistence of polymorphic types requires no changes to existing classes - no extra virtual functions for example. However, the cost of this solution is that it does require three callback functions to be written for each subclass to be made persistent.
In order to demonstrate the way polymorphic classes are made persistent, consider the following noddy example:
class base
{
int m_value;
public:
base(int value = 0) : m_value(value) {}
virtual ~base(void) {}
virtual int value (void) const {return m_value;}
virtual void set(int value = 0) {m_value = value;}
};
class derived : public base
{
string m_image;
public:
derived(int value = 0) : base(value), m_image(to_string(value)) {}
derived(string value) : base(to_int(value)), m_image(value) {}
virtual ~derived(void) {}
virtual void set(int value = 0) {m_image = to_string(value); base::set(value);}
};
In order to make these two classes persistent, each one must have three
callbacks added. These callbacks can be completely separate from the classes
if it is not possible to change the class definitions, but
typically it is easier to add the functions as friends of the class so that
they have direct access to the data fields. The three functions are the
create, dump and restore callbacks. The convention is to call them
create_class, dump_class and
restore_class, where class is the name of the
class that they act on.
The parameter profiles of the three callbacks are:
void dump_class(dump_context& context, const void* data)
void* create_class(void)
void restore_class(restore_context& context, void*& data)
For this example, these functions are added to the classes as friends:
class base
{
...
friend void dump_base(dump_context& context, const void* data)
{
dump(context,((const base*)data)->m_value);
}
friend void* create_base(void)
{
return new base;
}
friend void restore_base(restore_context& context, void*& data)
{
restore(context,((base*)data)->m_value);
}
};
class derived : public base
{
...
friend void dump_derived(dump_context& context, const void* data)
{
dump_base(context,data);
const derived* derived_data = (const derived*)data;
dump(context,derived_data->m_image);
}
friend void* create_derived(void)
{
return new derived;
}
friend void restore_derived(restore_context& context, void*& data)
{
restore_base(context,data);
derived* derived_data = (derived*)data;
restore(context,derived_data->m_image);
}
};
Note the use of a common trick here. The subclass derived dumps its
superclass by simply calling the superclass's callback (in this case,
dump_base). This is in keeping with the general C++ convention that subclasses
should not use knowledge of the internals of the superclass. This convention
is easy to follow: call the dump/restore callback of the immediate superclass
of the subclass first, then dump/restore the subclass-specific data.
The solution for persistence of Polymorphic classes requires that every polymorphic class be registered with the dump_context or restore_context before the dump or restore operation commences. Furthermore, where there are many polymorphic types being handled, the order of registration must be the same for the restore operation as it was for the dump operation.
Consider first the dump operation. The dump_context class provides
the following method for registration:
unsigned short dump_context::register_type(const std::type_info& info, dump_callback);
This is called once for each polymorphic type to be dumped. So, for the example above it is called twice:
dump_context context(output);
context.register_type(typeid(base),dump_base);
context.register_type(typeid(derived),dump_derived);
The typeid operator is built into C++ and returns a std::type_info
object describing a type or expression, from which the type name can be
obtained as a char*. This is mapped
internally onto a magic key which is an integer value unique to that subclass.
The return value of
the register_type method is the magic key for that type and
is used in the dump to differentiate between the different classes. There's no
real reason for capturing this key except maybe for debugging the data
stream. Keys are allocated in the order of registration of class types. This
is why class types must be registered in the same order for both the dump and
restore operations.
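To make the key allocation concrete, here's a small standalone sketch of a registry that hands out keys in registration order. It is not the STLplus implementation: the class toy_type_registry, its key_of method, the choice of 0 as the first key and the shape/circle classes are all assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <typeinfo>

// Hypothetical miniature of the dump-side registration described above
class toy_type_registry
{
public:
  // returns the key for the type; keys are allocated in registration order
  // (starting at 0 here, which is an assumption of this sketch)
  unsigned short register_type(const std::type_info& info)
  {
    const std::string name = info.name();
    std::map<std::string, unsigned short>::const_iterator found = m_keys.find(name);
    if (found != m_keys.end()) return found->second;
    const unsigned short key = static_cast<unsigned short>(m_keys.size());
    m_keys[name] = key;
    return key;
  }

  // key previously allocated to a type, or -1 if it was never registered
  int key_of(const std::type_info& info) const
  {
    std::map<std::string, unsigned short>::const_iterator found = m_keys.find(info.name());
    return found == m_keys.end() ? -1 : found->second;
  }

private:
  // relies on type names being distinct, which holds on common compilers
  std::map<std::string, unsigned short> m_keys;
};

// two example polymorphic classes
struct shape { virtual ~shape() {} };
struct circle : public shape {};
```

Applying typeid to a reference to a polymorphic object yields the dynamic type, so looking up typeid(s) for a shape& that refers to a circle finds the key registered for circle. This is what lets a dump through a superclass pointer select the right subclass key.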
For the restore operation it is necessary to register both a create callback
and a restore callback with the restore context. The restore_context
class provides the following registration function:
unsigned short restore_context::register_type(create_callback,restore_callback);
The callbacks are registered in the same order as the types were registered
into the dump context, because it is this ordering that provides the mapping
from the unique key used in the dump to the correct create callback used in
the restore. During the dump, the class base was registered first,
then class derived. The callbacks are therefore registered in the
same order for the restore:
restore_context context(input);
context.register_type(create_base,restore_base);
context.register_type(create_derived,restore_derived);
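The following standalone sketch shows why the ordering matters on the restore side. It assumes, purely for illustration, that the key is simply an index into a table of create callbacks. The names toy_restore_registry, toy_create_callback and the animal/dog classes are hypothetical and not part of STLplus.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct animal
{
  virtual ~animal() {}
  virtual const char* name() const { return "animal"; }
};

struct dog : public animal
{
  virtual const char* name() const { return "dog"; }
};

// create callbacks return void*, converted from animal* so it can be cast back safely
void* create_animal() { return static_cast<animal*>(new animal); }
void* create_dog() { return static_cast<animal*>(new dog); }

typedef void* (*toy_create_callback)();

// Hypothetical miniature of the restore-side registration: the key is the table index
class toy_restore_registry
{
public:
  unsigned short register_type(toy_create_callback create)
  {
    m_creators.push_back(create);
    return static_cast<unsigned short>(m_creators.size() - 1);
  }

  // create an object for the key read from the dump; null if the key is unknown
  animal* create(unsigned short key) const
  {
    if (key >= m_creators.size()) return 0;
    return static_cast<animal*>(m_creators[key]());
  }

private:
  std::vector<toy_create_callback> m_creators;
};
```

If the dump registered animal first and dog second, key 1 means dog. A restore that registers the callbacks the other way round would quietly create an animal for key 1, which is the kind of error the same-order rule is there to prevent.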
An alternative way of registering these callbacks is to wrap their registration up in an installer function. This installer can then be used to install all callbacks in a single step.
In fact, two installer functions are required - one for dumping and one for restoring. It is up to you to check that these installer functions install their callbacks in the same order. The type profiles for these installer functions are:
void (*dump_context::installer)(dump_context&);
void (*restore_context::installer)(restore_context&);
In other words, the installer type for a dump_context is a pointer to a function that takes a dump_context& and returns void. Similarly the installer type for a restore_context is a pointer to a function that takes a restore_context& and returns void. For the above example they might look like this:
void make_base_persistent(dump_context& context)
{
context.register_type(typeid(base),dump_base);
context.register_type(typeid(derived),dump_derived);
}
void make_base_persistent(restore_context& context)
{
context.register_type(create_base,restore_base);
context.register_type(create_derived,restore_derived);
}
The functions can be called whatever you like, but I prefer to give them the same name and use overload resolution to pick the right one according to the type profile. In use, after creating a dump or restore context, call the method register_all with the above installer as the argument. For example, using the earlier example again, rewritten to use an installer:
dump_context context(output);
context.register_all(make_base_persistent);
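The overload-resolution trick can be tried out in isolation. In this sketch the two context types are hypothetical stand-ins that only record what was registered, and make_toy_persistent is an invented name. Treating a null installer as "nothing to register" is an assumption of the sketch, modelled on the shortcut functions described later.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for dump_context and restore_context
struct toy_dump_context
{
  typedef void (*installer)(toy_dump_context&);
  std::vector<std::string> registered;
  // a null installer is treated as "nothing to register" (assumption of this sketch)
  void register_all(installer install) { if (install) install(*this); }
};

struct toy_restore_context
{
  typedef void (*installer)(toy_restore_context&);
  std::vector<std::string> registered;
  void register_all(installer install) { if (install) install(*this); }
};

// two installers with the same name, registering in the same order
void make_toy_persistent(toy_dump_context& context)
{
  context.registered.push_back("dump base");
  context.registered.push_back("dump derived");
}

void make_toy_persistent(toy_restore_context& context)
{
  context.registered.push_back("restore base");
  context.registered.push_back("restore derived");
}
```

Passing make_toy_persistent to register_all on either context compiles without a cast, because the parameter type of register_all tells the compiler which overload's address to take.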
Now that the callbacks are registered, the actual dump and restore of a superclass pointer is handled by the following functions:
template<typename T> void dump_polymorph(dump_context& str, const T*& data);
template<typename T> void restore_polymorph(restore_context& str, T*& data);
For example, given the above example using classes base and
derived, specific dump and restore functions can be
written that simply call the above template functions:
void dump(dump_context& context, const base*& ptr)
{
dump_polymorph(context,ptr);
}
void restore(restore_context& context, base*& ptr)
{
restore_polymorph(context,ptr);
}
Note: since polymorphic types are handled in C++ via pointers, the same behaviour is implemented for multiple pointers to the same object as was implemented for simple pointers. When two pointers to the same object are dumped, they will be restored as pointers to the same object.
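One way such pointer sharing can be preserved is sketched below. This is not the STLplus implementation or file format. It assumes that each distinct address is given an identifier the first time it is dumped and that later occurrences dump only the identifier. The names toy_record, toy_dump_pointers and toy_restore_pointers are hypothetical, and the sketch handles only non-null pointers to int.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// one dumped pointer
struct toy_record
{
  unsigned id;   // identifies the object pointed to
  bool first;    // true the first time this object is dumped
  int value;     // object contents, only meaningful when first is true
};

// dump side: remember which addresses have been seen already
std::vector<toy_record> toy_dump_pointers(const std::vector<const int*>& pointers)
{
  std::map<const int*, unsigned> seen;
  std::vector<toy_record> records;
  for (std::size_t i = 0; i < pointers.size(); i++)
  {
    toy_record record;
    std::map<const int*, unsigned>::const_iterator found = seen.find(pointers[i]);
    if (found == seen.end())
    {
      // new object: allocate the next identifier and dump the contents
      record.id = static_cast<unsigned>(seen.size()) + 1;
      record.first = true;
      record.value = *pointers[i];
      seen[pointers[i]] = record.id;
    }
    else
    {
      // already dumped: only the identifier is needed
      record.id = found->second;
      record.first = false;
      record.value = 0;
    }
    records.push_back(record);
  }
  return records;
}

// restore side: create each object once and reuse it for repeated identifiers
std::vector<int*> toy_restore_pointers(const std::vector<toy_record>& records)
{
  std::map<unsigned, int*> created;
  std::vector<int*> pointers;
  for (std::size_t i = 0; i < records.size(); i++)
  {
    if (records[i].first)
      created[records[i].id] = new int(records[i].value);
    pointers.push_back(created[records[i].id]);
  }
  return pointers;
}
```

Dumping the pointers {&a, &b, &a} and restoring them gives three pointers of which the first and third are equal, so the sharing in the original structure survives the round trip. The caller owns the restored objects and must delete each distinct one once.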
There is a set of template functions defined in the persistent.hpp header that encapsulate a common use of persistence. The functions assume that you have built up your family of dump and restore functions so that an entire data structure can be dumped by simply calling a function called dump at the top level. Similarly the data structure can be restored by simply calling restore at the top level. The shortcut functions also support the use of an installer function as described in the section on Polymorphic types. This reduces the process of dumping to common targets to a one-line function call.
Probably the most useful shortcut functions are the pair
dump_to_file/restore_from_file:
template<typename T>
void dump_to_file(const T& source, const std::string& filename, dump_context::installer installer)
  throw(persistent_dump_failed,persistent_illegal_type);
template<typename T>
void restore_from_file(const std::string& filename, T& result, restore_context::installer installer)
  throw(persistent_restore_failed,persistent_illegal_type);
To dump a data structure to a file, simply call dump_to_file
with the first argument being the source data structure to be dumped, the
second argument being the name of the file to dump to and the final argument
being an installer function for registering any polymorphic types. The last
argument can be null if there are no polymorphic types to register.
Similarly, to restore the same data structure, simply call
restore_from_file with the name of the file as the first argument
(conceptually, the first argument is the source and the second the
destination) and the data structure to be restored as the second. Again the
third argument is an installer function for restoring polymorphic types and
may be null.
Here's an example that dumps and restores a vector of string to and from a
file. First, I need to write a dump/restore pair of
functions that make a vector of string persistent:
void dump(dump_context& context, const vector<string>& data)
{
dump_vector(context, data);
}
void restore(restore_context& context, vector<string>& data)
{
restore_vector(context, data);
}
Now here's a trivial application that takes the command-line arguments represented by argv and puts them into a vector of strings, then dumps them to a file:
int main (int argc, char* argv[])
{
  if (argc == 1)
    ferr << "usage: " << argv[0] << " <strings>" << endl;
  else
  {
    // copy the command-line arguments (skipping the program name) into a vector
    vector<string> source;
    for (int i = 1; i < argc; i++)
      source.push_back(string(argv[i]));
    // no polymorphic types, so the installer argument is null
    dump_to_file(source, "strings.dat", 0);
  }
  return 0;
}
Here's a complementary application that restores the file and prints the results to standard output:
int main (int argc, char* argv[])
{
  if (argc != 1)
    ferr << "usage: " << argv[0] << endl;
  else
  {
    vector<string> copy;
    // no polymorphic types, so the installer argument is null
    restore_from_file("strings.dat", copy, 0);
    fout << "restored text: " << vector_to_string(copy, ",") << endl;
  }
  return 0;
}
Sometimes you want to create an in-memory dump of a data structure rather than dumping to a file. For example, this would be a starting point for a routine for transferring a data structure across the internet using data persistence as the mechanism. This is done by dumping to and restoring from a string:
template<typename T>
void dump_to_string(const T& source, std::string& result, dump_context::installer installer)
  throw(persistent_dump_failed,persistent_illegal_type);
template<typename T>
void restore_from_string(const std::string& source, T& result, restore_context::installer installer)
  throw(persistent_restore_failed,persistent_illegal_type);
This is very similar to the previous section's file-based persistence, except that the target of the dump is the string itself. Note that the std::string class is capable of storing binary data since it does not rely on null termination to work properly. A C char* could not be used in this way (but it's obsolete anyway, so no worries mate).
To dump a data structure to a string, simply call dump_to_string
with the first argument being the source data structure to be dumped, the
second argument being the string to dump to and the final argument
being an installer function for registering any polymorphic types. The last
argument can be null if there are no polymorphic types to register.
Similarly, to restore the same data structure, simply call
restore_from_string with the string containing the dumped data as the first argument
and the data structure to be restored as the second. Again the
third argument is an installer function for restoring polymorphic types and
may be null.
To illustrate this, I'll use the same example as above for file-based
persistence. This example dumps and restores a vector of string to and from a
string. Since I've already written the dump/restore pair of
functions for the previous example, there's no need to do it again.
Now here's a trivial application that takes the command-line arguments represented by argv and puts them into a vector of strings, then dumps them to a string, restores them from that string and finally compares them to confirm that the two data structures are identical:
int main (int argc, char* argv[])
{
  if (argc == 1)
    ferr << "usage: " << argv[0] << " <strings>" << endl;
  else
  {
    vector<string> source;
    for (int i = 1; i < argc; i++)
      source.push_back(string(argv[i]));
    // dump to an in-memory string, then restore from it
    string binary;
    dump_to_string(source, binary, 0);
    vector<string> copy;
    restore_from_string(binary, copy, 0);
    if (source != copy)
      ferr << "ERROR - restored data is different" << endl;
    else
      ferr << "success - restored data is the same" << endl;
  }
  return 0;
}
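The point about std::string holding binary data can be checked with a standalone sketch that does not use STLplus at all. The functions toy_dump_to_string and toy_restore_from_string and the layout they use (a 4-byte count followed by length-prefixed strings) are assumptions made for this illustration, not the STLplus file format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// append a size as 4 bytes, most significant byte first (sizes assumed to fit in 32 bits)
void toy_put_size(std::string& out, unsigned long n)
{
  for (int shift = 24; shift >= 0; shift -= 8)
    out += static_cast<char>((n >> shift) & 0xffUL);
}

// read a 4-byte size at pos, advancing pos; returns false if the data is truncated
bool toy_get_size(const std::string& in, std::size_t& pos, unsigned long& n)
{
  if (in.size() - pos < 4) return false;
  n = 0;
  for (int i = 0; i < 4; i++)
    n = (n << 8) | static_cast<unsigned char>(in[pos++]);
  return true;
}

// dump: element count, then each string as length + raw bytes
std::string toy_dump_to_string(const std::vector<std::string>& source)
{
  std::string out;
  toy_put_size(out, static_cast<unsigned long>(source.size()));
  for (std::size_t i = 0; i < source.size(); i++)
  {
    toy_put_size(out, static_cast<unsigned long>(source[i].size()));
    out += source[i];
  }
  return out;
}

// restore: returns false if the data is truncated or has trailing bytes
bool toy_restore_from_string(const std::string& in, std::vector<std::string>& result)
{
  result.clear();
  std::size_t pos = 0;
  unsigned long count = 0;
  if (!toy_get_size(in, pos, count)) return false;
  for (unsigned long i = 0; i < count; i++)
  {
    unsigned long length = 0;
    if (!toy_get_size(in, pos, length)) return false;
    if (in.size() - pos < length) return false;
    result.push_back(in.substr(pos, length));
    pos += length;
  }
  return pos == in.size();
}
```

The dumped string is full of zero bytes (every small length contributes three of them) and the element strings may contain zero bytes themselves, yet the round trip is exact because std::string tracks its own length.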
The above two short-cuts are in fact specialisations of the most general short-cut functions that dump to and restore from any TextIO device. This more general form is useful if you want to use other I/O devices than the most common ones of files and in-memory strings.
The functions are:
template<typename T>
void dump_to_device(const T& source, otext& result, dump_context::installer installer)
  throw(persistent_dump_failed,persistent_illegal_type);
template<typename T>
void restore_from_device(itext& source, T& result, restore_context::installer installer)
  throw(persistent_restore_failed,persistent_illegal_type);
To dump a data structure to an output device, simply call dump_to_device
with the first argument being the source data structure to be dumped, the
second argument being the device to dump to and the final argument
being an installer function for registering any polymorphic types. The last
argument can be null if there are no polymorphic types to register.
Similarly, to restore the same data structure, simply call
restore_from_device with the device containing the dumped data as the first argument
and the data structure to be restored as the second. Again the
third argument is an installer function for restoring polymorphic types and
may be null.
To illustrate this, here are the bodies of the dump_to_file and
restore_from_file functions described earlier to show how they
have in fact been implemented as calls to these two general-purpose
functions:
template<typename T>
void dump_to_file(const T& source, const std::string& filename, dump_context::installer installer)
throw(persistent_dump_failed,persistent_illegal_type)
{
oftext output(filename);
dump_to_device(source, output, installer);
}
template<typename T>
void restore_from_file(const std::string& filename, T& result, restore_context::installer installer)
throw(persistent_restore_failed,persistent_illegal_type)
{
iftext input(filename);
restore_from_device(input, result, installer);
}
So, dump_to_file is implemented by creating a file output device
(class oftext) and then calling dump_to_device. You
can implement your own dump_to_xxx and
restore_from_xxx functions by simply implementing TextIO devices
for input from and output to xxx.
The persistence functions for basic C types and STL containers are defined
in persistent.hpp, as are the infrastructure classes
dump_context/restore_context. The STLplus container
classes have the persistence functions built-in.
The following table details which types have persistence, where the
persistence functions are defined (i.e. which header to include) and the names
of the functions. In this table, uppercase characters such as T are
used to represent template argument types, so that vector<T>
means any vector, with type T as its template parameter.
| Type | Library | Include | Function Names |
|---|---|---|---|
| char | C | persistent.hpp | dump/restore |
| signed char | C | persistent.hpp | dump/restore |
| unsigned char | C | persistent.hpp | dump/restore |
| short | C | persistent.hpp | dump/restore |
| unsigned short | C | persistent.hpp | dump/restore |
| int | C | persistent.hpp | dump/restore |
| unsigned | C | persistent.hpp | dump/restore |
| long | C | persistent.hpp | dump/restore |
| unsigned long | C | persistent.hpp | dump/restore |
| inf | STLplus | inf.hpp | dump/restore |
| enum{} | C | persistent.hpp | dump_enum/restore_enum |
| float | C | persistent.hpp | dump/restore |
| double | C | persistent.hpp | dump/restore |
| T* (simple) | C/C++ | persistent.hpp | dump_pointer/restore_pointer |
| T* (polymorphic) | C++ | persistent.hpp | dump_polymorph/restore_polymorph |
| smart_ptr<T> | STLplus | smart_ptr.hpp | dump_smart_ptr/restore_smart_ptr |
| smart_ptr_clone<T> | STLplus | smart_ptr.hpp | dump_smart_ptr_clone/restore_smart_ptr_clone |
| char* | C | persistent.hpp | dump/restore |
| string | STL | persistent.hpp | dump/restore |
| basic_string<T> | STL | persistent.hpp | dump_basic_string/restore_basic_string |
| bitset<N> | STL | persistent.hpp | dump_bitset/restore_bitset |
| complex<T> | STL | persistent.hpp | dump_complex/restore_complex |
| deque<T> | STL | persistent.hpp | dump_deque/restore_deque |
| list<T> | STL | persistent.hpp | dump_list/restore_list |
| vector<T> | STL | persistent.hpp | dump_vector/restore_vector |
| pair<T1,T2> | STL | persistent.hpp | dump_pair/restore_pair |
| triple<T1,T2,T3> | STLplus | triple.hpp | dump_triple/restore_triple |
| foursome<T1,T2,T3,T4> | STLplus | foursome.hpp | dump_foursome/restore_foursome |
| hash<K,T,H,E> | STLplus | hash.hpp | dump_hash/restore_hash |
| map<K,T> | STL | persistent.hpp | dump_map/restore_map |
| multimap<K,T> | STL | persistent.hpp | dump_multimap/restore_multimap |
| set<T> | STL | persistent.hpp | dump_set/restore_set |
| multiset<T> | STL | persistent.hpp | dump_multiset/restore_multiset |
| digraph<N,A> | STLplus | digraph.hpp | dump_digraph/restore_digraph |
| matrix<T> | STLplus | matrix.hpp | dump_matrix/restore_matrix |
| ntree<T> | STLplus | ntree.hpp | dump_ntree/restore_ntree |
Note that I have not done the container adaptors
queue, priority_queue and stack because
their interfaces are too restricted to allow dump and restore routines to be
written without burgling the data structure. This means that I will never do
them because it is impossible!
When designing a data structure to be made persistent, you need to bear
this in mind and use containers such as vector and
list rather than queue or stack.
I also haven't implemented any STL iterators. The design of iterators makes it nearly impossible to do this without burgling the data structure, which wouldn't be portable.
The persistence subsystem uses exceptions to indicate errors, in line with the STLplus exceptions policy.
An exception is thrown since there is no conceivable recovery method that would allow the dump or restore to complete successfully.
The convention is that a dump function throws the
persistent_dump_failed exception and the restore
function throws the persistent_restore_failed exception if an
error is detected in the file format, but that it should keep going where
possible.
In addition, if you try to dump or restore a polymorphic type that hasn't
had its callbacks registered in advance, the exception
persistent_illegal_type will be thrown. The same exception is
used for both dump and restore.
The first two exceptions (persistent_dump_failed and
persistent_restore_failed) are subclasses of std::runtime_error.
The exception persistent_illegal_type is a subclass of
std::logic_error to reflect the fact that this can only happen due to a
programming error. All are subclasses of std::exception so can be caught by
catching this superclass.
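The practical consequence of this hierarchy is that a caller can separate bad data from programming errors by the order of its catch clauses. The sketch below uses two hypothetical stand-in classes, toy_restore_failed and toy_illegal_type, which mirror only the base classes described above. The real exception classes are supplied by STLplus and their constructors may differ.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-ins with the same base classes as the real exceptions (hypothetical names)
class toy_restore_failed : public std::runtime_error
{
public:
  explicit toy_restore_failed(const std::string& what) : std::runtime_error(what) {}
};

class toy_illegal_type : public std::logic_error
{
public:
  explicit toy_illegal_type(const std::string& what) : std::logic_error(what) {}
};

// pretend restore: fault selects which error, if any, is raised
void toy_restore(int fault)
{
  if (fault == 1) throw toy_restore_failed("truncated data");
  if (fault == 2) throw toy_illegal_type("type not registered");
}

// classify the outcome by catching the standard base classes
std::string toy_classify(int fault)
{
  try
  {
    toy_restore(fault);
    return "ok";
  }
  catch (const std::runtime_error& error)
  {
    // a problem with the data, detectable only at run time
    return std::string("bad data: ") + error.what();
  }
  catch (const std::logic_error& error)
  {
    // a programming error such as a missing registration
    return std::string("bug: ") + error.what();
  }
  catch (const std::exception& error)
  {
    // anything else derived from std::exception
    return std::string("other: ") + error.what();
  }
}
```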
This section discusses issues that you don't need to know about but which might give useful insights into how persistence works.
A problem that can occur when communicating between machines is the problem of byte-order. Different machine architectures store data using two different byte orders. This is referred to as Big- and Little-Endian Byte Ordering.
In both conventions, the address of an integer type points to the left end of the word and bytes are addressed left to right. The difference is which byte sits at that left end: in big-endian order byte 0 is the most significant byte (msB), whereas in little-endian order byte 0 is the least significant byte (lsB). For example, Intel-based machines store data in little-endian byte order, so byte 0 is the lsB. Sun Sparc architectures are big-endian, so byte 0 is the msB.
The persistence functions solve the problem of inter-platform communication by always writing integers msB first so that the format is platform-independent.
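Writing msB first is straightforward to do portably with shifts, because shifts work on the value rather than on the in-memory layout. The following is a minimal sketch for a 32-bit unsigned value. The function names are hypothetical and the fixed 4-byte width is an assumption of the sketch, not a statement about the STLplus format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// dump the low 32 bits of value as 4 bytes, most significant byte first
std::string toy_dump_u32(unsigned long value)
{
  std::string bytes;
  for (int shift = 24; shift >= 0; shift -= 8)
    bytes += static_cast<char>((value >> shift) & 0xffUL);
  return bytes;
}

// rebuild the value from up to 4 bytes, most significant byte first
unsigned long toy_restore_u32(const std::string& bytes)
{
  unsigned long value = 0;
  for (std::size_t i = 0; i < 4 && i < bytes.size(); i++)
    value = (value << 8) | static_cast<unsigned char>(bytes[i]);
  return value;
}

// report the byte order of the machine running the code, for comparison only
bool toy_host_is_little_endian()
{
  const unsigned short probe = 1;
  return *reinterpret_cast<const unsigned char*>(&probe) == 1;
}
```

On any host, toy_dump_u32(0x12345678) yields the bytes 0x12, 0x34, 0x56, 0x78 in that order, whatever toy_host_is_little_endian reports, so a dump written on one architecture restores to the same value on another.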
The concept of file format versions was added for STLplus 1.0. The file format version of a dump is written to the dump file. When the file is restored, the version is the first thing read from the file. The idea is that, if the persistent dump format changes, then the format number changes. This will mean that it is possible to either support old file formats by branching on the format number read from the file, or at least detect them and raise an error if the old format is no longer supported. Also, if an old program tries to read a new format, it will fail but in a way that makes it easy to diagnose the problem.
You do not need to know about these format numbers unless you are personally responsible for writing dump/restore routines and then only if you ever need to change the file format for a particular data type. For example, the introduction of format numbers coincided with a change in the way all integer types are dumped. The old integer format is not supported.
The format version applies to the persistence file format, not the particular layout of your own data structures. If you want that level of fine-grain control, then give your own data structures format numbers as well.
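Giving your own data structure a format number can be as simple as dumping the number first and branching on it during the restore. The sketch below illustrates the idea with standard streams and a text layout purely to keep it short and independent of STLplus. The toy_person type, its two versions and the function names are all invented for the illustration. With the real library you would dump the version number with the ordinary dump function for an integer type.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// version 1 stored only the name; version 2 added the age
struct toy_person
{
  std::string name;
  int age;
};

const int toy_person_version = 2;

// always dump in the current format, version number first
void toy_dump_person(std::ostream& out, const toy_person& person)
{
  out << toy_person_version << '\n' << person.name << '\n' << person.age << '\n';
}

// restore any supported format by branching on the version read from the data
void toy_restore_person(std::istream& in, toy_person& person)
{
  int version = 0;
  in >> version;
  in.ignore();   // skip the newline after the version number
  switch (version)
  {
  case 1:
    // old format: no age field, so supply a default
    std::getline(in, person.name);
    person.age = 0;
    break;
  case 2:
    std::getline(in, person.name);
    in >> person.age;
    break;
  default:
    // unknown or newer format: fail in a way that is easy to diagnose
    throw std::runtime_error("unsupported toy_person format version");
  }
  if (!in) throw std::runtime_error("truncated toy_person data");
}
```

An old dump such as "1\nBob\n" still restores, with the age defaulted to 0, while a dump claiming version 9 is rejected with a clear error rather than being misread.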
This example shows how to make a multimap persistent. This is a
one-layer data structure because the multimap only contains the basic
types int and string (conceptually a
string is an atomic type, even if its implementation just happens
to be quite complicated - don't confuse implementation with concept).
The example is based on a test program which is used to test the persistence functions. It creates a data structure, dumps it to a file, restores the file into another data structure and then confirms that the two structures are identical.
The following definition is used to define two data structures that map an int onto a string:
multimap<int,string> data, restored;
The object called data will be used to store the data to be saved
in a file, whilst the object called restored will be used to restore
the data. It is then possible to compare the two to verify that they are the
same.
First, I fill the map with a random amount of random data, just to demonstrate the data persistence:
#define MAX_SIZE 2877
#define MAX_NUM 15254
...
// seed the random number generator with a different value each run (this is a common trick)
srand(time(0));
// select the random map size to generate
const unsigned number = (unsigned)rand() % MAX_NUM;
for (unsigned i = 0; i < number; i++)
{
// select a random key to add to the map
int key = rand();
// select random characters to add to the data string
const unsigned size = (unsigned)rand() % MAX_SIZE;
string value;
for (unsigned j = 0; j < size; j++)
{
char ch = (char)rand();
value += ch;
}
// finally, add the key/data pair to the multimap
data.insert(make_pair(key,value));
}
So, the multimap contains random integer keys mapped onto random length
strings of random data.
No functions need to be written to implement persistence of this data
structure! The pre-defined persistence functions can do the whole job (see the
table in the last section). The dump_multimap function dumps the map
by calling dump on the key and data types. The key type is
int, which already has a dump function defined. The data
type is string, which also has a dump function defined.
The first stage in saving this data structure to file is to create a
dump_context which needs to be attached to a TextIO output device. In
this case I'll choose to save the dump to file:
oftext out ("test_map.tmp");
dump_context dumper(out);
Now, the data structure can be dumped to this file:
dump_multimap(dumper,data);
out.close();
In this example, the output file is explicitly closed because I'm about to
read it straight back in again. To read the file, a restore_context
needs to be created:
iftext in ("test_map.tmp");
restore_context restorer(in);
Now the data structure can be restored, in this case to a different object:
restore_multimap(restorer,restored);
The rest of the program just compares the two data structures to confirm they are identical. I don't need to go into that here.
In practice, it is clearer if you do in fact write a trivial pair of functions
called dump and restore to hide the use of the
template functions. This also means that you can always remember the name of
the persistence functions for any type you have designed - because they are
always called dump and restore. The functions
are:
void dump(dump_context& context, const multimap<int,string>& data)
{
dump_multimap(context, data);
}
void restore(restore_context& context, multimap<int,string>& data)
{
restore_multimap(context, data);
}
This example will show how to make a data structure with more than one level of structure persistent.
The example uses a vector of a user-defined class and makes it
persistent. It shows how to add persistence functions to a user-defined class
so that it can be used with the pre-defined vector persistence
functions.
The example requires the following set of includes. The reason for each include will be explained as the example unfolds:
#include <string>
#include <vector>
#include "stlplus.hpp"
using namespace std;
The user-defined data structure is a class for storing email addresses. The class without persistence functions is:
class address
{
private:
string m_name;
string m_email;
int m_age;
public:
address(void) : m_age(0) { }
address(const string& name, const string& email, int age) : m_name(name), m_email(email), m_age(age) {}
const string& name(void) const {return m_name;}
const string& email(void) const {return m_email;}
int age(void) const {return m_age;}
};
To add persistence, it is only necessary to add a dump and
restore function which use the pre-defined dump and
restore for string and int. These are found in the
header persistent.hpp. The functions are added to the class as friend functions so that
they can access the private data fields directly:
class address
{
...
friend void dump(dump_context& str, const address& data)
{
dump(str, data.m_name);
dump(str, data.m_email);
dump(str, data.m_age);
}
friend void restore(restore_context& str, address& data)
{
restore(str, data.m_name);
restore(str, data.m_email);
restore(str, data.m_age);
}
};
The next stage is to define an address book, which is simply an unsorted
vector of addresses:
typedef vector<address> address_book;
This type is already persistent - there is a pre-defined pair of template
functions dump_vector and restore_vector defined in
persistent.hpp. However, it is more consistent to provide
overloaded non-template dump and restore functions for the
address_book type:
void dump(dump_context& str, const address_book& data)
{
dump_vector(str, data);
}
void restore(restore_context& str, address_book& data)
{
restore_vector(str, data);
}
The following test program shows how an address book can be created and
dumped, then restored to another address_book object:
int main(int argc, char* argv[])
{
// create and populate an address book
address_book addresses;
addresses.push_back(address("Andy Rushton", "ajr1@ecs.soton.ac.uk", 40));
addresses.push_back(address("Andrew Brown", "adb@ecs.soton.ac.uk", 85));
addresses.push_back(address("Mark Zwolinski", "mz@ecs.soton.ac.uk", 21));
// dump the address book
oftext out ("test.tmp");
dump_context dumper(out);
dump(dumper,addresses);
out.close();
// restore the address book to a different object
address_book restored;
iftext in ("test.tmp");
restore_context restorer(in);
restore(restorer,restored);
return 0;
}
In this case I'm using persistence to a file, so I've used the FileIO
devices oftext and iftext defined in
fileio.hpp.
It would be useful to be able to print out the contents of the address book
before and after the dump/restore. To do this I'll use the
family of print functions defined in the various utilities
headers. These follow the same conventions as the persistence functions -
there is a print function for each basic type and then template
print_class functions for each template class. Like the
dump_class and restore_class functions, these
cannot be overloaded (VC++ cannot handle overloaded templates), so for example
the print function for vector is called print_vector. It is
declared in string_utilities.hpp. The print functions for
basic types are also declared in string_utilities.hpp.
The following functions are added to the address class to make it printable:
class address
{
...
friend otext& print(otext& str, const address& entry)
{
return str << entry.m_name << " <" << entry.m_email << "> aged " << entry.m_age;
}
friend otext& print(otext& str, const address& entry, unsigned indent)
{
print_indent(str, indent);
print(str, entry);
return str << endl;
}
};
The convention with print functions is to supply two functions:
one which prints inline - i.e. without line breaks - and a second with an
extra indent parameter which prints the object indented on a line of
its own. The second is typically written so that it calls the first, as in
this case.
The address_book type is now printable by using the
print_vector functions which simply call the print function
for each element. However, as before, it is more consistent to provide a
non-template function called just print:
otext& print(otext& str, const address_book& addresses, unsigned indent)
{
return print_vector(str, addresses, indent);
}
It is now possible to print the address book before and after the
dump/restore:
int main(int argc, char* argv[])
{
...
ferr << "addresses:" << endl;
print(ferr, addresses, 1);
ferr << "restored addresses:" << endl;
print(ferr, restored, 1);
return 0;
}
Since this is a test program, it would be better if the program tested the
equality of the before and after address books. The STL defines vector
equality (operator==) in terms of the equality of the elements, so it
is only necessary to give the address class an equality operator and
the problem is solved:
class address
{
...
friend bool operator == (const address& left, const address& right)
{
return (left.m_name == right.m_name) && (left.m_email == right.m_email) && (left.m_age == right.m_age);
}
};
The test program can now have a test for success or failure added at the end:
int main(int argc, char* argv[])
{
...
// verify that the address books are the same
if (addresses != restored)
{
ferr << "restored addresses are different - Boo" << endl;
return 3;
}
ferr << "restored addresses are the same - Hooray" << endl;
return 0;
}
The output of this program when run is:
addresses:
Andy Rushton <ajr1@ecs.soton.ac.uk> aged 40
Andrew Brown <adb@ecs.soton.ac.uk> aged 85
Mark Zwolinski <mz@ecs.soton.ac.uk> aged 21
restored addresses:
Andy Rushton <ajr1@ecs.soton.ac.uk> aged 40
Andrew Brown <adb@ecs.soton.ac.uk> aged 85
Mark Zwolinski <mz@ecs.soton.ac.uk> aged 21
restored addresses are the same - Hooray