Database Storage Design

The RaimaDB system can store the database image in two kinds of storage: on a persistent file system (HDD, SSD, SD card, microSD card, or another persistent storage device with a supported file system) or in memory, either in RAM (volatile memory) or in a defined memory buffer.

Persistent Storage

Traditional databases use data structures that minimize the number of pages read. This approach is crucial with traditional HDDs, where the time it takes to retrieve the data is proportional to the number of pages read. It also holds for SSDs to some extent, because the I/O channels are optimized for HDDs, but data can be retrieved at least an order of magnitude faster from an SSD than from an HDD.

Databases normally implement durability (one of the ACID properties: atomicity, consistency, isolation, and durability) using a transaction log, which may or may not include the actual user data. The pages written for the transaction log and the pages written for the updated database image are likely to end up on the same sector, since they are written around the same time. This is especially true for small transactions. The pages for the transaction log will normally become stale before the recently written pages of the database image. As a result, those sectors may soon hold stale pages that take up space, or they may soon reach the threshold for garbage collection. Performance may decrease in both cases.

"Copy on write" is a technique used in a wide variety of applications in computing. RaimaDB uses this technique in a couple of areas. The main idea behind copy-on-write is to copy some content and then modify the copy instead of modifying the original when writing to memory, a file, or a block device. That’s where the term copy-on-write came from. Some applications are copy-on-write in functional programming in combination with "reference counting." In Unix, copy-on-write is used in sharing the virtual memory in the implementation of the fork system call with some assistance from hardware. Also, the btrfs file system in Linux uses copy-on-write to allow efficient transaction handling and snapshot isolation. This technique is also used by hyper-visors and virtual machine implementations for efficient memory management.

RaimaDB implements variable-size records, with updates performed using copy-on-write. On the file system level, the database image consists of a number of files, referred to as "pack files," that hold user data, indexes, and the metadata needed for recovery and vacuuming. These pack files are collectively referred to as the “pack.” Any update of the database is always performed as a copy-on-write to the last pack file in the pack. The maximum size of a pack file is less than 2 GiB. The pack file size can be made smaller with configuration options. See the pack_file_size page in the Configuration Options section.
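The append-only behavior can be pictured with a small Python model. This is a sketch under stated assumptions, not the RaimaDB on-disk format: the Pack class, its in-memory index, and the byte-array "files" are hypothetical, and only the name pack_file_size is borrowed from the configuration option mentioned above.

```python
class Pack:
    """Toy append-only pack: every update appends a new record version."""

    def __init__(self, pack_file_size):
        self.pack_file_size = pack_file_size  # upper bound per pack file, bytes
        self.files = [bytearray()]            # stand-ins for pack files on disk
        self.index = {}                       # key -> (file no, offset, length)

    def put(self, key, value):
        if len(value) > self.pack_file_size:
            raise ValueError("record larger than a pack file")
        # Start a new pack file when the record does not fit in the last one.
        if len(self.files[-1]) + len(value) > self.pack_file_size:
            self.files.append(bytearray())
        last = self.files[-1]
        offset = len(last)
        last.extend(value)  # append only; older versions are left untouched
        self.index[key] = (len(self.files) - 1, offset, len(value))

    def get(self, key):
        file_no, offset, length = self.index[key]
        return bytes(self.files[file_no][offset:offset + length])


pack = Pack(pack_file_size=8)
pack.put("a", b"1234")
pack.put("b", b"5678")
pack.put("a", b"abcdef")  # a longer version of "a" goes to a new pack file
```

Because the new version of a record is written elsewhere, it may have a different length than the old one, which is how an append-only layout accommodates variable-size records. The old version stays in the first file as stale data.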

Because RaimaDB only appends to the end of the pack, the pack would otherwise grow indefinitely. To address this issue, RaimaDB implements a garbage collection process referred to as "vacuuming." Refer to the Database Vacuuming section for more detail.
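The actual vacuuming algorithm is described in the Database Vacuuming section and is not reproduced here. As a generic illustration of garbage collection in an append-only store, the following Python sketch (the function name and data layout are hypothetical) moves the records in the oldest file that are still live to the newest file and then drops the oldest file.

```python
def vacuum_oldest(files, index):
    """Reclaim the oldest file of a toy append-only pack.

    files: dict mapping file number -> list of (key, value) records
    index: dict mapping key -> (file number, slot) of the live version
    """
    oldest, newest = min(files), max(files)
    if oldest == newest:
        return  # only one file; nothing to reclaim in this toy model
    target = files[newest]
    for slot, (key, value) in enumerate(files[oldest]):
        # A record is live only if the index still points at this exact slot.
        if index.get(key) == (oldest, slot):
            target.append((key, value))
            index[key] = (newest, len(target) - 1)
    del files[oldest]  # everything left in the oldest file is stale


files = {
    0: [("a", b"v1"), ("b", b"v1")],  # "a" was later rewritten in file 1
    1: [("a", b"v2")],
}
index = {"a": (1, 0), "b": (0, 1)}
vacuum_oldest(files, index)
```

After the call, the stale first version of "a" is gone with file 0, while the still-live record "b" has been appended to file 1.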

The database files are managed by the Transaction File Server (TFS) and are located in a single (user-defined) directory on the persistent storage device, referred to as the docroot or Document Root. Refer to the Document Root (docroot) section for more detail.

In-memory Storage

RaimaDB contains a data storage engine optimized specifically for working with memory-resident data sets. This In-memory Database Engine (IMDB) allows for significant performance gains and a reduction in processing requirements compared to the persistent storage engine. The RaimaDB IMDB runs alongside the RaimaDB persistent storage engine, and databases can be opened with either one.

There is no need for proprietary keywords in the database schema to utilize the IMDB instead of the persistent storage engine. The developer sets an option prior to opening the database to configure and use the IMDB. This allows the same database schema to be used with either the persistent or the in-memory database images.

An in-memory database can use one of four modes: inmemory_volatile, inmemory_keep, inmemory_load, and inmemory_persist.

In-memory Volatile

When the first client opens a database with the storage configuration option set to inmemory_volatile, the newly created database is empty and is considered to be “throw away.” When the last client closes the database, the entire database is discarded. The RaimaDB Transaction File Server (TFS) maintains an open count to determine when a database is first opened and when it is no longer in use by any clients.

In-memory Keep

When the first client opens a database with the storage configuration option set to inmemory_keep, the newly created database is considered to be “throw away,” just like inmemory_volatile. The difference is that the database is not discarded when the last client closes it. As long as the TFS managing the in-memory database image is running, the inmemory_keep database can be reopened with all of the previous database changes intact. When the TFS is terminated, the entire database is discarded.

The inmemory_volatile and inmemory_keep storage configuration options are the only storage options available for the diskless version of the RaimaDB TFS library.
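The difference between the two modes comes down to what happens when the open count drops to zero. The following Python model is a sketch only: the ToyTFS class and its methods are hypothetical and do not correspond to the RaimaDB API, and the rule that the first open determines the mode of an image is an assumption made for the example.

```python
class ToyTFS:
    """Toy model of a TFS tracking open counts for in-memory databases."""

    def __init__(self):
        self.images = {}      # database name -> in-memory image (a dict here)
        self.open_count = {}  # database name -> number of clients with it open
        self.mode = {}        # database name -> storage mode of the image

    def open(self, name, storage):
        if storage not in ("inmemory_volatile", "inmemory_keep"):
            raise ValueError("mode not covered by this sketch")
        if name not in self.images:
            self.images[name] = {}  # no image yet: start with an empty database
            self.mode[name] = storage
        self.open_count[name] = self.open_count.get(name, 0) + 1
        return self.images[name]

    def close(self, name):
        self.open_count[name] -= 1
        last_close = self.open_count[name] == 0
        if last_close and self.mode[name] == "inmemory_volatile":
            del self.images[name]  # volatile: discard on last close
            del self.mode[name]

    def shutdown(self):
        # Terminating the TFS discards every in-memory image, kept or not.
        self.images.clear()
        self.open_count.clear()
        self.mode.clear()


tfs = ToyTFS()

db = tfs.open("scratch", "inmemory_volatile")
db["k"] = 1
tfs.close("scratch")
volatile_reopened = dict(tfs.open("scratch", "inmemory_volatile"))

db = tfs.open("cache", "inmemory_keep")
db["k"] = 1
tfs.close("cache")
kept_reopened = dict(tfs.open("cache", "inmemory_keep"))

tfs.shutdown()
images_after_shutdown = dict(tfs.images)
```

In this model the volatile database comes back empty after being reopened, the kept database still holds its data, and neither survives the shutdown.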

In-memory Load

When the first client opens a database with the storage configuration option set to inmemory_load, the RaimaDB TFS will look for a database image in the docroot associated with the TFS. If the database is found, the in-memory database will be loaded with the current contents of the on-disk files. When the last client closes the database, the entire database is discarded from memory. However, the on-disk image will contain all changes up to the last "persist" operation. A client with the database open can save changes programmatically at any time using the rdm_dbPersistInMemory() API.

In-memory Persist

When the first client opens a database with the storage configuration option set to inmemory_persist, the RaimaDB TFS will look for a database image in the docroot associated with the TFS. If the database is found, the in-memory database will be loaded with the current contents of the on-disk files. When the last client closes the database, all changes committed since the last "persist" operation will be written to the on-disk database image before the entire database is discarded from memory. A client with the database open can save changes programmatically at any time using the rdm_dbPersistInMemory() API.
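The inmemory_load and inmemory_persist modes differ only in what happens on the last close. The following Python model is a hedged sketch: ToyInMemoryDb is a hypothetical class, a dict stands in for the on-disk image in the docroot, the persist() method stands in for rdm_dbPersistInMemory(), and transactions are ignored, so every change counts as committed.

```python
import copy


class ToyInMemoryDb:
    """Toy model of a database opened with inmemory_load or inmemory_persist."""

    def __init__(self, on_disk, storage):
        if storage not in ("inmemory_load", "inmemory_persist"):
            raise ValueError("mode not covered by this sketch")
        self.on_disk = on_disk  # stands in for the image in the docroot
        self.storage = storage
        self.memory = None
        self.open_count = 0

    def open(self):
        if self.open_count == 0:
            # First open: load the in-memory image from the on-disk contents.
            self.memory = copy.deepcopy(self.on_disk)
        self.open_count += 1
        return self.memory

    def persist(self):
        # Stand-in for rdm_dbPersistInMemory(): save the current contents.
        saved = copy.deepcopy(self.memory)
        self.on_disk.clear()
        self.on_disk.update(saved)

    def close(self):
        self.open_count -= 1
        if self.open_count == 0:
            if self.storage == "inmemory_persist":
                self.persist()  # last close writes outstanding changes
            self.memory = None  # the in-memory image is discarded either way


disk_load = {"x": 1}
db = ToyInMemoryDb(disk_load, "inmemory_load")
mem = db.open()
mem["x"] = 2
db.persist()   # saved explicitly
mem["x"] = 3   # never persisted, so lost on the last close
db.close()

disk_persist = {"x": 1}
db = ToyInMemoryDb(disk_persist, "inmemory_persist")
mem = db.open()
mem["x"] = 2
db.close()     # written to disk on the last close
```

With inmemory_load, only the explicitly persisted value reaches the disk image; with inmemory_persist, the change survives without an explicit persist call.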