A Hot Backup Example

A hot backup utility can make a hot backup of an RDM database while the database is in use. For such a tool to work correctly with RDM, it has to meet the following requirements:

  • Pack files have to be copied in ascending order by pack file name
  • Any time RDM appends to a pack file of the source database, the extra content has to be appended to the corresponding pack file of the target database
  • Any time RDM creates a new pack file for the source database, the hot backup has to complete the backup of the previous pack file before it creates the new pack file on the target database
  • The hot backup utility has to keep up with RDM: it must open each pack file before RDM deletes it

This approach is guaranteed to work because RDM always creates pack files in ascending order and never creates or appends to a pack file before it has finished writing the previous pack files.

If the hot backup utility is slow compared to RDM, the last requirement listed above is likely not to be met, because RDM may delete a pack file before the utility has opened it.
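
The requirements above can be illustrated with a short sketch. This is not the code of any RDM tool; it is a minimal Python illustration under two assumptions that the text does not confirm: that every entry in the source directory is a pack file, and that sorting the names as strings gives the ascending order. The function name sync_pack_files and its delete parameter are made up for the example.

```python
import os
import shutil


def sync_pack_files(source_dir, target_dir, delete=True):
    """Copy new pack files and newly appended content from source to target.

    Assumption: every entry in source_dir is a pack file, and sorting the
    names as strings yields the ascending order in which RDM created them.
    """
    os.makedirs(target_dir, exist_ok=True)
    names = sorted(os.listdir(source_dir))
    # Files are handled strictly one after another, so the previous pack
    # file is complete on the target before the next one is created there.
    for name in names:
        src = os.path.join(source_dir, name)
        dst = os.path.join(target_dir, name)
        # Bytes already present on the target do not need to be copied again.
        done = os.path.getsize(dst) if os.path.exists(dst) else 0
        try:
            with open(src, "rb") as fin:
                fin.seek(done)
                with open(dst, "ab") as fout:
                    shutil.copyfileobj(fin, fout)
        except FileNotFoundError:
            # The file vanished between listing and opening: the backup
            # did not keep up and the last requirement was violated.
            raise RuntimeError(f"{name} was deleted before it could be copied")
    if delete:
        # Drop files that no longer exist on the source.
        for name in sorted(set(os.listdir(target_dir)) - set(names)):
            os.remove(os.path.join(target_dir, name))
    return names
```

Calling such a function repeatedly would amount to polling. A utility driven by file system events does the same work but only when the kernel reports a change.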

Example rdm-hot-backup for Linux

An example utility is provided in the RDM installation directory under ‘GettingStarted/rdm-hot-backup’. It is implemented with the inotify API, which monitors file system events under Linux. A similar tool could be written for other operating systems that support monitoring of file system events.
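
For readers unfamiliar with inotify, the following is a minimal sketch of how a process can receive events for a directory. It is not the code of the shipped example. It is written in Python and calls inotify_init(2) and inotify_add_watch(2) through ctypes; the function names parse_events and watch are made up for this illustration, and loading libc.so.6 assumes glibc on Linux.

```python
import ctypes
import os
import struct

# Event mask bits from <sys/inotify.h>.
IN_MODIFY = 0x00000002
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200
IN_Q_OVERFLOW = 0x00004000

# struct inotify_event header: int wd; uint32_t mask, cookie, len;
_HEADER = struct.Struct("iIII")


def parse_events(buf):
    """Split bytes read from an inotify descriptor into (wd, mask, name)."""
    events = []
    offset = 0
    while offset + _HEADER.size <= len(buf):
        wd, mask, _cookie, name_len = _HEADER.unpack_from(buf, offset)
        offset += _HEADER.size
        # The name is padded with NUL bytes up to name_len.
        name = buf[offset:offset + name_len].split(b"\0", 1)[0].decode()
        offset += name_len
        events.append((wd, mask, name))
    return events


def watch(directory):
    """Yield (mask, name) for files created, modified or deleted in directory."""
    libc = ctypes.CDLL("libc.so.6", use_errno=True)  # assumption: glibc
    fd = libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    try:
        mask = IN_CREATE | IN_MODIFY | IN_DELETE
        if libc.inotify_add_watch(fd, os.fsencode(directory), mask) < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
        while True:
            # os.read blocks until the kernel has at least one event.
            for _wd, event_mask, name in parse_events(os.read(fd, 65536)):
                if event_mask & IN_Q_OVERFLOW:
                    raise RuntimeError("inotify queue overflowed; events were lost")
                yield event_mask, name
    finally:
        os.close(fd)
```

A loop such as `for mask, name in watch("DB.rdm"): ...` then sleeps inside the read call until a file in the directory changes.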

The example utility assumes that no events are lost due to insufficient kernel memory. This assumption is reasonable only as long as a limited number of databases are being backed up this way. One should make sure that sufficient kernel memory has been set up for the actual use case. Please consult the documentation of inotify for details.
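
The current inotify limits can be inspected before relying on this assumption. The sketch below reads the three limit files that inotify(7) documents under /proc/sys/fs/inotify; the function name read_inotify_limits and its base parameter are made up for this illustration, and which values are sufficient depends on the actual use case.

```python
import os

# Limit files documented in inotify(7).
INOTIFY_LIMITS = ("max_queued_events", "max_user_instances", "max_user_watches")


def read_inotify_limits(base="/proc/sys/fs/inotify"):
    """Return the current inotify limits, or None where one cannot be read."""
    limits = {}
    for name in INOTIFY_LIMITS:
        try:
            with open(os.path.join(base, name)) as f:
                limits[name] = int(f.read().strip())
        except (OSError, ValueError):
            limits[name] = None
    return limits


if __name__ == "__main__":
    for name, value in read_inotify_limits().items():
        print(f"fs.inotify.{name} = {value}")
```

The same values are available through sysctl under the names fs.inotify.max_queued_events, fs.inotify.max_user_instances and fs.inotify.max_user_watches.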

Other software that monitors file system events also has to be taken into account, since it draws on the same kernel resources. Pay special attention to IDEs, which use file system events to monitor the files under their control. A large software project can easily exhaust the default reservation of kernel memory.

The example utility has another limitation: it cannot be used to back up a database located on a remote network file system (NFS). The source database has to be located on a local file system. If the use case calls for a hot backup of a database on NFS, the utility has to run on the NFS server itself. The target database does not have this restriction. A hot backup from one machine to another therefore has to run on the machine that serves the file system of the source database.
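
A script that launches the utility could check this restriction up front. The sketch below looks up the file system type of a path in /proc/mounts and reports whether it is an NFS mount. It is an illustration only: the function names are made up, the check is not part of the example utility, and escaped characters in mount points (such as \040 for a space) are not handled.

```python
import os


def filesystem_type(path, mounts_text=None):
    """Return the file system type of the mount containing path, or None."""
    if mounts_text is None:
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    path = os.path.realpath(path)
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Keep the longest matching mount point; later entries win ties.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_point):
            best_point, best_type = mount_point, fs_type
    return best_type


def is_nfs(path, mounts_text=None):
    """True if path lives on a mount whose type starts with 'nfs'."""
    fs_type = filesystem_type(path, mounts_text)
    return fs_type is not None and fs_type.startswith("nfs")
```

For example, `is_nfs("DB.rdm")` returning True would indicate that the source database should be backed up from the file server instead.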

Here is the help text for the rdm-hot-backup example tool:

Usage:

 rdm-hot-backup [OPTION]... source-dir target-dir

rdm-hot-backup is a utility that uses inotify(7) to monitor
changes to a source RDM database and then applies them to a target
database

Options:

 -h, --help       Display this usage information
     --version    Display the version information
     --no-delete  Do not delete files on the target
 -q, --quiet      Quiet mode. No information will be displayed
 source-db        The source directory for the source database without
                    the '.rdm' extension
 target-db        The target directory for the target database without
                    the '.rdm' extension

Suppose an application uses RDM with a database named ‘DB’. While the application is running, whether or not it is currently producing data, a continuous hot backup to ‘DB-COPY’ can be started as follows:

$ rdm-hot-backup DB.rdm DB-COPY.rdm&

The command above first copies the database and then continuously extends the copy with any new data applied to ‘DB’. New content is copied as soon as it has been written to the pack file and the kernel has notified the utility process. At any time after the initial copy has been made, the command can be terminated and the resulting copy can be used in place of the original, provided that the utility has kept up with RDM as discussed at the beginning of this section.

The hot backup example utility has some advantages but also some disadvantages compared to the rdm-replicate command line tool discussed next.

The main advantage of the hot backup example utility is its very simple implementation, whose correctness is easier to verify than that of the rdm-replicate command line tool. It uses the inotify API instead of code running as part of the RDM engine. The utility only observes the changes RDM makes to the database files, so it is very unlikely to trigger a potential bug in RDM. The same cannot be said for the rdm-replicate command line tool. RDM contains code to accommodate rdm-replicate, and this code has to deal with the execution of other threads and with the logic for normal RDM operations as well as the replication. This makes it more complicated than the hot backup example utility.

A secondary advantage is that the hot backup example utility is very efficient. When there are no changes, the utility is blocked in a file system call. This saves CPU cycles, and the CPU may also be able to hibernate to save even more power, which is an important feature in many use cases. Furthermore, when there is content to be copied, the utility does not need to deal with remote procedure calls, other threads, complicated logic, or decoding and encoding of data. It only has to create and delete files and copy content between files as needed. This simple implementation makes it slightly more efficient than the rdm-replicate command line tool.

The disadvantage of the hot backup example utility is that the target database cannot be used while the hot backup is running. As soon as the hot backup process has terminated, however, the target database can be used. For the target to be usable by RDM, the utility must have completed the initial copy and must not have fallen so far behind that pack files failed to be copied. If the utility is terminated in the middle of a transaction, RDM will discard that transaction in its entirety when the target database is later opened. If additional data that RDM produces for the target database immediately goes into a new pack file, it is likely that the utility was terminated in the middle of copying a transaction.

If the target database is not used for updates after the hot backup example utility has been terminated, the hot backup example utility can be restarted. If there is still an overlap between pack files on the source and the target, the hot backup can continue normally. Any incomplete transaction on the target will be resumed.
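
The overlap condition can be checked before restarting. The sketch below assumes, without confirmation from the text, that pack files are identified by the file names in the two database directories and that an overlap means at least one name is present in both. The function names are made up for this illustration.

```python
import os


def pack_file_overlap(source_dir, target_dir):
    """Return the sorted names that exist in both directories."""
    source = set(os.listdir(source_dir))
    target = set(os.listdir(target_dir)) if os.path.isdir(target_dir) else set()
    return sorted(source & target)


def can_resume(source_dir, target_dir):
    """True if source and target still share at least one file name."""
    return bool(pack_file_overlap(source_dir, target_dir))
```

If `can_resume("DB.rdm", "DB-COPY.rdm")` is False under these assumptions, the source has moved past everything the target holds and a fresh initial copy would be needed.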

We will discuss the rdm-replicate command line tool next.