Discovering a Query's Access Plan

RDM SQL provides the ability to discover the execution plan chosen by the optimizer for a particular query. Function SQLShowPlan can be used to return a result set from one statement handle that describes the access (or execution) plan for a SELECT, UPDATE, or DELETE statement that has been compiled on another statement handle. You can also display this same result set from the rdm-sql utility using the ".X" command. The function SQLShowPlan takes two arguments. The first is the statement handle on which the result set will be returned. The second is the statement handle of the compiled SELECT, UPDATE, or DELETE statement whose access plan is to be returned. The result set returned on the first statement handle is described in the following table.

Table 12. Function SQLShowPlan Result Set Definition
SQLShowPlan Type Description
step_no smallint The step number of the plan beginning with 1. There is one step for each table listed in the FROM clause but not necessarily in that order.
table_cat char(32) The name of the database catalog.
table_schem char(32) The name of the database. Usually this is the same as the catalog but it is possible that there can be different databases that have the same catalog.
table_name char(32) The name of the table being accessed. If a correlation name was specified for the table then this will contain the correlation name.
method_name char(16) The name of the method used to access the rows of the table in this step. See table below.
key_column char(32) The name of the key column through which the rows will be accessed.
step_cost bigint The total estimated cost in logical reads incurred by this plan step.
rows_in bigint The number of candidate rows that are input to this plan step.
rows_out bigint The number of result rows that are output from this step. The difference between the rows_in and rows_out are the number of rows filtered by the WHERE clause conditionals that reference columns from this table. This is where the restriction factor is applied.
plan_cost bigint The total cost of the plan in logical reads. The value is the same for each step.

The possible access methods returned in the method/method_name column are described in the next table.

Table 13. Function SQLShowPlan Access Methods
Method Name Description
ROW COUNT Returns the current row count for a table (select count(*) from table).
VIRTUAL TABLE Access is to a virtual table.
ROWID SCAN Performs a possibly delimited scan of a table in rowid order.
ROWID FIND Finds a row by a rowid primary key value.
ROWID LIST Finds the rows by rowid primary key that match each of a list (in) of values.
FK THRU PK RID Finds the foreign key rows that match a rowid primary key value.
FK TO PK Finds the primary key table row referenced by a particular foreign key row.
PK TO FK Finds each referencing foreign key table row from its primary key row.
KEY FIND Finds a row (or rows) by key with a specified value.
KEY SCAN Performs a possibly delimited scan of a table through a key.
KEY LIST Finds the rows by key that match each of a list (in) of values.
KEY LIKE Finds the rows by key that match the like pattern.
FK THRU PK KEY Finds the foreign key rows that match a primary key value.

An example using the rdm-sql command-line utility is shown below.

 Raima Database Manager SQL command processor Utility
 Raima Database Manager 14.0.0 Build 1000 [8-1-2016] http://www.raima.com/
 Copyright (c) 2016 Raima Inc., All rights reserved.
 
 Enter ? for list of interface commands.
 rdm-sql: open database nsfawards;
 rdm-sql: set stats histosize=100 sampling=(20,100,25000) threads=4;
 rdm-sql: update stats on nsfawards;
 rdm-sql: select name, award_date, amount, title from sponsor, award 
 rdm-sql> where sponsor_nm = name and state in ("OR","WA") 
 rdm-sql> and award_date between "2001-01-01" and "2001-12-31" 
 rdm-sql> and amount >= 1000000.00;
 name                         award_date   amount title
 The Institute for System B   2001-07-30  1000000 KDI: A Framework for Particle..
 U of Washington              2001-06-08  1009466 CISE Experimental Partnerships: An..
 	. . .
 Oregon Hlth and Science U    2001-10-01  3925000 ITR/IM+AP: Quality-Scalable Informa..
 U of Oregon Eugene           2001-08-08  5000000 Cryogenic Helium Turbulence Research
 U of Washington              2001-05-23  5895234 Ship Operations
 *** 20 row(s) returned
rdm-sql: .x
*** advanced command mode is on
rdm-sql: .S
 STEP_NO TABLE_NAME METHOD_NAME KEY_COLUMN STEP_COST ROWS_IN ROWS_OUT PLAN_COST
       1 award      KEY SCAN    amount          4406    4309      411      4612
       2 sponsor    FK TO PK    sponsor_nm       205     411       13      4612

(Note that the TABLE_CAT and TABLE_SCHEM columns are not shown so that the important data can easily fit into a single row table.) To see how the optimizer arrived at this access plan choice we need to consider the choices available to it. Besides the obligatory join predicate, the WHERE clause specifies three other conditionals. Two on the award table and one on the sponsor table. Each references a column on which a key exists and, thus, each key is a candidate access method. The restriction factors for each conditional can be computed based on the histogram totals gathered from the UPDATE STATS statement which was earlier executed on the nsfawards database. The following two queries compute the restriction factor for the state in ("OR","WA") conditional.

rdm-sql: select norows, nosampled from systats.systabstats where tabname="sponsor";
               NOROWS   NOSAMPLED
                 7726        1545
rdm-sql: select SUM(eq_total)/1545.0 
rdm-sql> from systabstats natural join syscolstats natural join syscolval
rdm-sql> where tabname="sponsor" and colname="state" 
rdm-sql> and val in ("OR","WA");
   SUM(EQ_TOTAL)/1545.0
      0.0336569579288026

Thus, this means that the optimizer estimates that .03366 * 7726 = 260 rows from the sponsor table satisfy this conditional. Note that all of the sampled rows that match values in the histogram contribute to the eq_total values. Hence, the restriction factor for equality comparisons is the sum of the eq_total for the matching values divided by the number of sampled rows.

Similarly, the following queries compute both the restriction factor and the cardinality estimate (i.e., an estimate of the number of rows from the award table that satisfy the conditional) for the award_date between "2001-01-01" and "2001-12-31" condition.

rdmsql: select nosampled from sys$tabstats where tabname="award";
   NOSAMPLED
       25000
rdm-sql: select (sum(eq_total)+sum(ne_total))/25000.0 "Rest. Factor" 
rdm-sql> from syscolstats natural join syscolval
rdm-sql> where colname = "award_date"
rdm-sql> and val between "2001-01-01" and "2001-12-31";
            Rest. Factor
                 0.09544

The cardinality estimate for this restriction factor is 0.09544*132365 = 12,633 rows. .

Finally, the following query allows us to compute the restriction factor and the cardinality estimate for the amount > 1000000.00 condition.

rdm-sql: select eq_total, ne_total, val from syscolstats natural join syscolval
rdm-sql> where colname = "amount"; 
 eq_total ne_total val
         			...
           10           10 1000000
            1           89 1017028
            1            7 1141200
            1          224 1153211
            1          104 1614111
            4          233 2000000
            1           19 5367516
            1          108 6100000

(As the val column is a character column, a direct comparison using the amount > 1000000.00 condition does not work as a numeric comparison is needed). From this query (only the relevant val column values are displayed), you can see that there are 7 entries in the histogram for the column which satisfy the subject conditional with a total eq_total + ne_total value of 814 rows. Hence, the restriction factor for the amount conditional is 814/25000 = .03256 which on the 132365 total rows in the amount table gives a cardinality estimate of 0.03256 * 132365 = 4310 rows.

The following table summarizes the results found so far.

Table 14. Example Conditional Optimization Factors
Conditional Restriction Factor Cardinality Estimate
state in ("OR","WA") .03366 260
award_date between "2001-01-01" and "2001-12-31" .09544 12633
amount > 1000000.00 .03256 4310

Usually, the overall cost estimates track with the cardinality estimates so that the methods that access the fewest rows are the cheapest. From the above table it would seem that use of the key on state to access only 260 rows would be cheaper than the other two keys. However, it is important to note that there are two tables involved in this query. By starting with the key on state in order to retrieve the estimated 260 sponsors located in Oregon and Washington, an average of 132,365 (total rows in award table) / 7726 (total rows in sponsor table) = 17 rows from the award table for each sponsor need to be read against which the other two conditionals need to be checked for a total of 17*260=40,120 rows. By starting with the key on the amount column to access the estimated 4310 rows and as each award row relates to a single sponsor row through the sponsor_nm foreign key, only a single additional read to access that sponsor row is added to the overall access plan cost (total of 8620 rows). This is clearly less costly than the plan that begins with the state key. Thus, if the estimates are accurate, the optimizer has indeed chosen the optimal execution plan for this query.

The accuracy of the cardinality estimates given in Table 14 can easily be checked using the following queries.

rdm-sql: select count(*) from sponsor where state in ("OR","WA");
             COUNT(*)
                  288
rdm-sql: select count(*) from award 
rdm-sql> where award_date between "2001-01-01" and "2001-12-31"; 
             COUNT(*)
                11861
rdm-sql: select count(*) from award where amount >= 1000000.00; 
             COUNT(*)
                 4861
Table 15. Example Conditional Optimization Factors
Conditional Cardinality Estimate Actual Count
state in ("OR","WA") 260 288
award_date between "2001-01-01" and "2001-12-31" 12633 11861
amount > 1000000.00 4310 4861

As you can see, the optimizer's estimates based on the column data distribution statistics collected by UPDATE STATS are reasonably accurate.