R-tree Overview

RDM supports the optimized storing of multi-dimensional data through an implementation of the R-tree index.

R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984 and has found significant use in both theoretical and applied contexts. A common real-world usage for an R-tree might be to store spatial objects such as restaurant locations or the polygons that typical maps are made of: streets, buildings, outlines of lakes, coastlines, etc. and then find answers quickly to queries such as "Find all museums within 2 km of my current location", "retrieve all road segments within 2 km of my location" (to display them in a navigation system) or "find the nearest gas station" (although not taking roads into account

The key idea of the data structure is to group nearby objects and represent them with their minimum bounding rectangle in the next higher level of the tree; the "R" in R-tree is for rectangle. Since all objects lie within this bounding rectangle, a query that does not intersect the bounding rectangle also cannot intersect any of the contained objects. At the leaf level, each rectangle describes a single object; at higher levels the aggregation of an increasing number of objects. This can also be seen as an increasingly coarse approximation of the data set.

As with most trees, the searching algorithms (e.g., intersection, containment, nearest neighbor search) are rather simple. The key idea is to use the bounding boxes to decide whether or not to search inside a subtree. In this way, most of the nodes in the tree are never read during a search. Like B-trees, this makes R-trees suitable for large data sets and databases, where nodes can be paged to memory when needed, and the whole tree cannot be kept in main memory. (Wikipedia, 2016)

Defining an R-tree in a RDM Schema

An r-tree index can be placed on a column that meets a few specific requirements

  • The column must be an array of numeric or real values

    • Numeric type - (u)int8, (u)int16, (u)int32, (u)int64

    • Real type – float, double

  • The R-tree index must be on a single column (no compound R-tree index definitions)

  • A R-tree specifies a low and high value in each dimension, therefore the column array dimension must be a multiple of two

  • RDM will support an R-tree with a maximum of four dimensions so the column array dimension must be either 2 (one dimension), 4 (two dimensions), 6 (three dimensions), or 8 (four dimensions).

Valid R-tree index definitions in the schema:

i8 INT8 ARRAY[4] KEY USING R_TREE NOT NULL /* 2 dimensional signed 8-bit */
u16 UINT16 ARRAY[2] PRIMARY KEY USING R_TREE /* 1 dimensional unsigned 16-bit */
i32 INT32 ARRAY[6] UNIQUE KEY USING R_TREE NOT NULL /* 3 dimensional signed 32-bit */
u64 UINT64 ARRAY[8] PRIMARY KEY USING R_TREE NOT NULL /* 4 dimensional unsigned 64-bit */
dbl DOUBLE ARRAY[4] PRIMARY KEY USING R_TREE NOT NULL /* 2 dimensional double */
flt FLOAT ARRAY[2] KEY USING R_TREE NOT NULL /* 2 dimensional float */