rdm-vacuum

Database vacuum tool

Synopsis

rdm-vacuum [OPTION]… db_namespec

Description

rdm-vacuum is a command-line utility that vacuums data in the specified RDM database to reduce the amount of disk space used. Specify one of the options '--best', '--first', '--auto', '--schema', or '--table' to vacuum. Each invocation of this tool vacuums one pack file. Specifying '--new-pack' or '--flush-idindex' without any of the preceding options honors those options without doing any explicit vacuuming.

Short options can be combined into one string starting with a single '-'. Mandatory arguments to long options are mandatory for short options too. Long option arguments can also be specified in a separate argument.
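As a sketch of these rules, the invocations below should be equivalent within each group. The database name bookshop and the table name author are placeholders, and run is a hypothetical helper (not part of RDM) that prints each command and executes it only when RUN=1 is set, so the example can be tried without touching a database.

```shell
#!/bin/sh
# Hypothetical helper (not part of RDM): print the command, run it only if RUN=1.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Long options, one per argument.
run rdm-vacuum --first --new-pack bookshop
# The same two options given as separate short options.
run rdm-vacuum -f -n bookshop
# Short options combined into one string after a single '-'.
run rdm-vacuum -fn bookshop

# A long option argument attached with '=' ...
run rdm-vacuum --table=author bookshop
# ... or passed as a separate argument.
run rdm-vacuum --table author bookshop
```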

Options

-h, --help Display this usage information
--version Display the version information
-q, --quiet Quiet mode. No information will be displayed.
--key=key Specify the encryption key for the database ([algorithm:]passcode). The valid algorithms are xor, aes128, aes192 and aes256. The aes algorithms are only available for packages that have strong encryption support. If an algorithm is not specified, the default is aes128 for strong encryption packages and xor otherwise.
--docroot=path The location of the docroot directory. (See Document Root (DOCROOT))
-b, --best Vacuum the pack file with the least data in use (incompatible with tables, schema, first, and auto)
-f, --first Vacuum the first pack file (incompatible with tables, schema, best, and auto)
-a, --auto Vacuum the first or the best pack file (incompatible with tables, schema, first, and best)
--flush-idindex Force a flush of the id index prior to vacuuming
-n, --new-pack Force creation of a new pack file prior to vacuuming
-s, --schema Vacuum the schema system drawers (incompatible with first, best, and auto)
-t, --table=table_name Vacuum only the specified table. Use this option more than once to vacuum multiple tables. (incompatible with first, best, and auto)
--db-size=size Maximum size of all pack files (in bytes)
--pack-file-size=size Maximum size of a pack file (in bytes)
-r, --restore Restore original database settings after vacuuming
--vacuum-percentage=percentage In-use percentage below which a pack file is vacuumed
--vacuum-read-size=size Vacuum read buffer size (in bytes)
--vacuum-write-size=size Vacuum write buffer size (in bytes)
db_namespec An RDM Database URI
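Because --table can be repeated, a small wrapper can turn a list of table names into the matching options. The following is a minimal sketch: vacuum_tables and run are hypothetical helpers (not part of RDM), and author and book are placeholder table names. The command is only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical helper (not part of RDM): print the command, run it only if RUN=1.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Usage: vacuum_tables DB TABLE [TABLE...]
vacuum_tables() {
    db=$1
    shift
    # Replace each table name in the argument list with a --table option.
    for table in "$@"; do
        set -- "$@" "--table=$table"
        shift
    done
    run rdm-vacuum "$@" "$db"
}

vacuum_tables bookshop author book
# prints: rdm-vacuum --table=author --table=book bookshop
```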

Comments

rdm-vacuum attempts to vacuum an existing database. Vacuuming consists of moving data from sparse pack files into contiguous blocks in later pack files. The net result is less disk usage at the cost of some additional writes.

Vacuuming makes write operations much more efficient than techniques that reuse space in the pack files. RDM 14.1 and newer insert, update, and delete row data and the underlying indexes by appending to the last pack file. Only when a transaction spans more than one pack file will a previous pack file be appended to after the next pack file has been created. Pack files may be truncated only during recovery. A pack file is deleted only when it is no longer of any use.

RDM will automatically vacuum pack files when the usage in a pack file is below a given threshold. This threshold can be set programmatically using rdm_dbSetOptions() or with this tool's command-line option --vacuum-percentage=percentage.

Conditions for vacuuming a pack file

Other options are also needed to control vacuuming. For instance, certain conditions must be met for vacuuming to take place. The following paragraphs give some background on how vacuuming can be controlled.

If there is data in a pack file that has not yet been committed (or rolled back) or there are snapshots that have a hold on the data in that pack file, these operations must be committed, rolled back, or released before vacuuming can take place on this pack file. This can only be controlled by the application performing these operations. For this reason, you may want to avoid large transactions or long-lived snapshots.

For vacuuming to take place at regular intervals, the data must be broken up into reasonable pack files. The default and maximum size for a pack file is a little smaller than 2 GiB. This can be set to a smaller value using --pack-file-size=size. Alternatively, you can use this tool to request a new pack file with --new-pack.
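Since --pack-file-size takes a value in bytes, it can help to compute it from a more readable unit. The sketch below assumes a 256 MiB limit (an arbitrary example value) and assumes that --pack-file-size may be combined with --new-pack in a single invocation. run is a hypothetical helper that only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical helper (not part of RDM): print the command, run it only if RUN=1.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Convert the desired limit from MiB to bytes.
pack_mib=256
pack_bytes=$((pack_mib * 1024 * 1024))   # 268435456

# Assumption: both options are accepted together without a vacuum mode.
run rdm-vacuum --pack-file-size="$pack_bytes" --new-pack bookshop
```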

The pack files serve as a database transaction log for recovery purposes but are also the object store for the database. This object store is completely self-contained. For efficient lookup into the object store, an ID-index is used. The block size of the ID-index is rather large, and it has been designed so that it does not need to be flushed to disk very often. However, if the ID-index has not been flushed for some data residing in an associated pack file, that pack file cannot be deleted once it has been vacuumed. The ID-index can be explicitly flushed using this tool with --flush-idindex.

Also, a pack file that is not the oldest pack file cannot be deleted. This means that if the disk usage goes beyond the threshold (--db-size=size), the first pack file will be vacuumed instead of the best pack file. Vacuuming the first pack file can be forced using this tool with --first.
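The rule in the paragraph above can be modeled in a few lines. This is only an illustration of the described behavior, not RDM's implementation: choose_mode is a hypothetical function, and the byte counts passed to it are made-up inputs that a caller would have to obtain itself.

```shell
#!/bin/sh
# Usage: choose_mode USED_BYTES DB_SIZE_BYTES
# Prints the vacuum option that matches the rule described above.
choose_mode() {
    used_bytes=$1
    db_size=$2
    if [ "$used_bytes" -gt "$db_size" ]; then
        # Over the size threshold: only the oldest pack file can be deleted,
        # so vacuum the first one to actually free disk space.
        echo "--first"
    else
        # Under the threshold: vacuum the pack file with the least data in use.
        echo "--best"
    fi
}

choose_mode 900 1000    # prints: --best
choose_mode 1100 1000   # prints: --first
```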

When a pack file has been vacuumed but not yet deleted, the disk usage may stay the same. However, the pack file is likely to be dropped from the file system cache. On Unix systems that support it, we call posix_fadvise with option POSIX_FADV_DONTNEED to make this happen sooner.

As we have seen, there are a lot of factors that affect how vacuuming operates. Everything this tool can do can also be done programmatically using rdm_dbSetOptions(), rdm_dbFlushIdIndex(), rdm_dbCreateNewPackFile(), and rdm_dbVacuum().

Usage Example

You can run rdm-vacuum to vacuum the first pack file for an RDM database as follows:

$ rdm-vacuum --first --new-pack --flush-idindex bookshop

The above command will create one new pack file, flush the ID-index (thereby allowing pack files to be removed), and vacuum the content of the first pack file into the new pack file. When vacuuming is done the first pack file will be deleted.
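Because each invocation vacuums one pack file, vacuuming several pack files means running the tool several times. The following is a minimal sketch of such a loop, assuming that rdm-vacuum returns a non-zero exit status on failure (not confirmed here). vacuum_passes and run are hypothetical helpers, and commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical helper (not part of RDM): print the command, run it only if RUN=1.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Usage: vacuum_passes COUNT DB
vacuum_passes() {
    passes=$1
    db=$2
    i=1
    while [ "$i" -le "$passes" ]; do
        # Assumption: a non-zero exit status means the pass failed, so stop.
        run rdm-vacuum --auto "$db" || return 1
        i=$((i + 1))
    done
}

# Three passes, one pack file per pass.
vacuum_passes 3 bookshop
```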