Index (Key) Overview

A database index is a data structure that improves the speed of data retrieval operations on a database table. It is used to optimize query performance by reducing the amount of data that needs to be scanned or searched when executing a query.

In a database, the data is typically stored in tables consisting of rows and columns. When a query is executed, it specifies certain conditions or criteria to select specific rows from the table. Without an index, the database management system (DBMS) would have to scan the entire table to find the matching rows, which can be time-consuming and resource-intensive, especially for large tables.

An index creates a separate data structure that contains a subset of the table's columns and their corresponding values. The index is organized in a way that allows for efficient lookup and retrieval of data based on the indexed columns. It acts as a pointer to the actual location of the data in the table.

The index is created on one or more columns of a table, and it can be either clustered or non-clustered. In a clustered index, the physical order of the table's data rows matches the order of the index. In a non-clustered index, the index structure is separate from the actual data, and it contains a reference to the corresponding data rows.

When a query is executed, the DBMS can utilize the index to quickly locate the relevant rows based on the indexed columns, reducing the amount of data that needs to be scanned. This significantly improves the query performance, especially for queries that involve filtering, sorting, or joining tables based on the indexed columns.

However, it's important to note that while indexes improve query performance, they also have some trade-offs. Indexes require additional storage space, as they store a duplicate subset of the data. They also introduce overhead during data modification operations (such as insert, update, or delete), as the indexes need to be updated to reflect the changes in the table.

Therefore, it's essential to carefully choose the columns to index and strike a balance between query performance and the overhead introduced by indexes. Proper indexing is an important aspect of database design and optimization, and it requires considering the specific queries and usage patterns of the database.

RaimaDB supports several indexing methods.

  • ROWID Index
  • B-Tree Index
  • Hash Index
  • R-Tree Index

Remember, while indexes increase performance of data retrieval, there is a performance cost to maintain the indexes when the row is created or updated.

You do not directly use an index from SQL, but indexes are used by the RaimaDB SQL optimizer in the selection of an access plan for retrieving data from the database. More indexes provide the optimizer with more alternatives and can greatly improve SELECT execution performance. Unfortunately, the cost associated with a large number of indexes is a large amount of required storage and a lower performance, incurred by insert, update and delete operations.

Therefore, your selection of table columns to include in an index requires careful consideration. In general, create an index on the columns through which the table's rows typically will be accessed or sorted. Do not create an index for every possible sort or query that may be of interest to a user. Create indexes on the columns you expect will be used most often in order to speed access to the rows or to order the result rows.

Index Definitions

PRIMARY KEY

For RaimaDB, the PRIMARY KEY uniquely identifies the row in the table. The characteristics of this index type are:

  • The column data cannot be NULL;
  • The column data cannot be updated after the row is inserted into the table;
  • No duplicates are allowed;
  • R-Trees cannot be designated as a PRIMARY KEY.

UNIQUE KEY

The characteristics of this index type are:

  • No duplicates are allowed (NULL columns will not be evaluated to be duplicate columns);
  • Columns included in key specification can be updated.

KEY

The characteristics of this index type are:

  • Duplicates allowed.

Compound Key

The compound key references a grouping of two or more columns into a single index. This type of index is typically used for creating a sorted group on one column with a sorted subtype of a second column. A compound key is typically used only for reporting (retrieval) but could be unique or primary key.

Index Methods

ROWID Index

The ROWID column is an index that uniquely identifies the rows in the table. The ROWID is defined as a 64 bit unsigned integer (UINT64). Upon creation of the row, the ROWID  can be set or it will be automatically generated with a unique value.

If the ROWID column is used as the retrieval access method, the rows will be ordered in numerical order for this column.

B-Tree Index

The default indexing method is a B-tree which organizes the indexed column values so that they are stored in sorted order. This allows fast access to all the rows that match a specified range of values. It also provides the ability to retrieve the table rows in the column order defined by the key specification grammar which avoids the need to do a separate sort when a data is retrieved using this access method.

Hash Index

This index method is called hashing in which the location of a row a determined from performing a "hash" of the indexed column value. However, the values are not sorted and, hence, hash indexes are only used to find rows that match a specific value. For RaimaDB, this index method is intended for situations where the data in the index column or columns is very large and there is no need for ordered data retrieval. The method increases performance since the amount of data stored in the index nodes can be reduced significantly. Some characteristics of an Hash index are:

  • The index is always a UNIQUE indexes;
  • The index cannot be used for sorted range access (no next/previous navigation).

R-Tree Index

RaimaDB supports the optimized storing of multi-dimensional data through an implementation of the R-tree index. Some characteristics of an R-tree column are:

  • The column must be an ARRAY of integer or real values. For more in formation on ARRAY types, refer to the Arrays section.
  • The R-tree index must be on a single column (no compound R-tree index definitions).
  • The R-tree specifies a low and high value in each dimension, therefore the column ARRAY dimension must be a multiple of two.
  • RaimaDB will support an R-tree with a maximum of four dimensions so the column array dimension must be either 2 (one dimension), 4 (two dimensions), 6 (three dimensions), or 8 (four dimensions).
  • The R-tree column cannot be a PRIMARY KEY for the table.

For more information, refer to the R-tree Index Overview section.