update statistics

Update statistics for a database

Syntax

upd_stats_stmt:
               UPDATE {STATISTICS | STATS} ON database_name

Description

The UPDATE STATISTICS (or UPDATE STATS) statement will cause RDM SQL to generate a file containing data distribution statistics for each column in every table declared in database database_name which must be open in the connection from which this statement is being executed. The data is derived from an evenly distributed sampling of the rows of each table and is stored in a database named sysstats which stores the data distribution statistics for all of the databases stored in the same TFS directory.

The amount of data to be sampled is determined from these options specified in a previously executed SET STATISTICS statement:

  • The HISTOSIZE clause specifies the number of distinct values for each column whose distribution counts are to be kept. The default is 25. The minimum allowed value is 10 and the maximum allowed value is 100.
  • The SAMPLING clause specifies how much of the database is to be sampled and includes a minimum number of rows, maximum number of rows, and percent of table to be scanned (integer value between 0 and 100) of each table to be sampled. So, for example, a "sampling (10, 100, 10000)" means that 10% of the number of rows in each table will be sampled but at least 100 (if there are 100 rows) and no more than 10,000 rows must be sampled. The larger these values are, the more accurate will be the distribution statistics but UPDATE STATISTICS will take longer to run. If the SAMPLING clause is omitted the system default is (10, 1000, 10000). The minimum number of rows specified must be at least 100.
  • The AUTORUN clause specifies how often RDM SQL should automatically run an UPDATE STATISTICS. The pct_num (an integer value greater than or equal to 5) specifies a percentage change in the total number of rows stored in the database since the last UPDATE STATISTICS execution. When a percentage change in size greater than pct_num occurs, an UPDATE STATISTICS will be automatically executed. The minrows_num is an integer value that specifies the minimum number of rows that must be contained in the database before an UPDATE STATISTICS will automatically be executed. The minimum allowed minrows_num value is 100 rows.

    If the AUTORUN clause is not specified then the statistics will only be updated the next time an UPDATE STATISTICS statement is executed. If an AUTORUN was specified in a prior UPDATE STATISTICS, it can be disabled by omitting the AUTORUN clause.

    Automatic UPDATE STATISTICS executions occur on the first database open after the specified AUTORUN conditions have occurred. It will block the application (database open call) until it completes. This means that when this occurs, the database open time delay may be quite noticeable.

  • The THREADS clause can be used to explicitly specify the number of threads to use during UPDATE STATISTICS execution. If not specified, the system will invoke one thread for each CPU/core available on the computer. Each thread collects the stats on a single table. Note that multi-threaded operation is only possible when the database has been opened in shared mode.

Data distribution statistics are collected for every column declared in the database except for LONG VARCHAR/VARBINARY and ROWID PRIMARY KEY columns.

The statistics for a given database are used by the RDM SQL query optimizer to estimate the potential costs associated with each possible access method that is considered by the optimizer for a given query. By setting a larger HISTOSIZE and ensuring that a representative number of rows are included in the SAMPLING, the optimizer will have good statistics to guide its access method choices. However, a larger HISTOSIZE will require more memory for storing the optimizer stats. In general, the amount of memory needed for a database's statistics is based on the histosize and the total number of columns in the database.

For information about accessing statistics, refer to the Access to a Database's Statistics section of the documentation.