Discovering a Query's Access Plan
RDM SQL provides the ability to discover the execution plan chosen by the optimizer for a particular query. Function SQLShowPlan can be used to return a result set from one statement handle that describes the access (or execution) plan for a SELECT, UPDATE, or DELETE statement that has been compiled on another statement handle. You can also display this same result set from the rdm-sql utility using the ".X" command. The function SQLShowPlan takes two arguments. The first is the statement handle on which the result set will be returned. The second is the statement handle of the compiled SELECT, UPDATE, or DELETE statement whose access plan is to be returned. The result set returned on the first statement handle is described in the following table.
| SQLShowPlan | Type | Description |
|---|---|---|
| step_no | smallint | The step number of the plan beginning with 1. There is one step for each table listed in the FROM clause but not necessarily in that order. |
| table_cat | char(32) | The name of the database catalog. |
| table_schem | char(32) | The name of the database. Usually this is the same as the catalog but it is possible that there can be different databases that have the same catalog. |
| table_name | char(32) | The name of the table being accessed. If a correlation name was specified for the table then this will contain the correlation name. |
| method_name | char(16) | The name of the method used to access the rows of the table in this step. See table below. |
| key_column | char(32) | The name of the key column through which the rows will be accessed. |
| step_cost | bigint | The total estimated cost in logical reads incurred by this plan step. |
| rows_in | bigint | The number of candidate rows that are input to this plan step. |
| rows_out | bigint | The number of result rows that are output from this step. The difference between the rows_in and rows_out are the number of rows filtered by the WHERE clause conditionals that reference columns from this table. This is where the restriction factor is applied. |
| plan_cost | bigint | The total cost of the plan in logical reads. The value is the same for each step. |
The possible access methods returned in the method/method_name column are described in the next table.
| Method Name | Description |
|---|---|
| ROW COUNT | Returns the current row count for a table (select count(*) from table). |
| VIRTUAL TABLE | Access is to a virtual table. |
| ROWID SCAN | Performs a possibly delimited scan of a table in rowid order. |
| ROWID FIND | Finds a row by a rowid primary key value. |
| ROWID LIST | Finds the rows by rowid primary key that match each of a list (in) of values. |
| FK THRU PK RID | Finds the foreign key rows that match a rowid primary key value. |
| FK TO PK | Finds the primary key table row referenced by a particular foreign key row. |
| PK TO FK | Finds each referencing foreign key table row from its primary key row. |
| KEY FIND | Finds a row (or rows) by key with a specified value. |
| KEY SCAN | Performs a possibly delimited scan of a table through a key. |
| KEY LIST | Finds the rows by key that match each of a list (in) of values. |
| KEY LIKE | Finds the rows by key that match the like pattern. |
| FK THRU PK KEY | Finds the foreign key rows that match a primary key value. |
An example using the rdm-sql command-line utility is shown below.
Raima Database Manager SQL command processor Utility
Raima Database Manager 14.0.0 Build 1000 [8-1-2016] http://www.raima.com/
Copyright (c) 2016 Raima Inc., All rights reserved.
Enter ? for list of interface commands.
rdm-sql: open database nsfawards;
rdm-sql: set stats histosize=100 sampling=(20,100,25000) threads=4;
rdm-sql: update stats on nsfawards;
rdm-sql: select name, award_date, amount, title from sponsor, award
rdm-sql> where sponsor_nm = name and state in ("OR","WA")
rdm-sql> and award_date between "2001-01-01" and "2001-12-31"
rdm-sql> and amount >= 1000000.00;
name award_date amount title
The Institute for System B 2001-07-30 1000000 KDI: A Framework for Particle..
U of Washington 2001-06-08 1009466 CISE Experimental Partnerships: An..
. . .
Oregon Hlth and Science U 2001-10-01 3925000 ITR/IM+AP: Quality-Scalable Informa..
U of Oregon Eugene 2001-08-08 5000000 Cryogenic Helium Turbulence Research
U of Washington 2001-05-23 5895234 Ship Operations
*** 20 row(s) returned
rdm-sql: .x
*** advanced command mode is on
rdm-sql: .S
STEP_NO TABLE_NAME METHOD_NAME KEY_COLUMN STEP_COST ROWS_IN ROWS_OUT PLAN_COST
1 award KEY SCAN amount 4406 4309 411 4612
2 sponsor FK TO PK sponsor_nm 205 411 13 4612
(Note that the TABLE_CAT and TABLE_SCHEM columns are not shown so that the important data can easily fit into a single row table.) To see how the optimizer arrived at this access plan choice we need to consider the choices available to it. Besides the obligatory join predicate, the WHERE clause specifies three other conditionals. Two on the award table and one on the sponsor table. Each references a column on which a key exists and, thus, each key is a candidate access method. The restriction factors for each conditional can be computed based on the histogram totals gathered from the UPDATE STATS statement which was earlier executed on the nsfawards database. The following two queries compute the restriction factor for the state in ("OR","WA") conditional.
rdm-sql: select norows, nosampled from systats.systabstats where tabname="sponsor";
NOROWS NOSAMPLED
7726 1545
rdm-sql: select SUM(eq_total)/1545.0
rdm-sql> from systabstats natural join syscolstats natural join syscolval
rdm-sql> where tabname="sponsor" and colname="state"
rdm-sql> and val in ("OR","WA");
SUM(EQ_TOTAL)/1545.0
0.0336569579288026
Thus, this means that the optimizer estimates that .03366 * 7726 = 260 rows from the sponsor table satisfy this conditional. Note that all of the sampled rows that match values in the histogram contribute to the eq_total values. Hence, the restriction factor for equality comparisons is the sum of the eq_total for the matching values divided by the number of sampled rows.
Similarly, the following queries compute both the restriction factor and the cardinality estimate (i.e., an estimate of the number of rows from the award table that satisfy the conditional) for the award_date between "2001-01-01" and "2001-12-31" condition.
rdmsql: select nosampled from sys$tabstats where tabname="award";
NOSAMPLED
25000
rdm-sql: select (sum(eq_total)+sum(ne_total))/25000.0 "Rest. Factor"
rdm-sql> from syscolstats natural join syscolval
rdm-sql> where colname = "award_date"
rdm-sql> and val between "2001-01-01" and "2001-12-31";
Rest. Factor
0.09544
The cardinality estimate for this restriction factor is 0.09544*132365 = 12,633 rows. .
Finally, the following query allows us to compute the restriction factor and the cardinality estimate for the amount > 1000000.00 condition.
rdm-sql: select eq_total, ne_total, val from syscolstats natural join syscolval
rdm-sql> where colname = "amount";
eq_total ne_total val
...
10 10 1000000
1 89 1017028
1 7 1141200
1 224 1153211
1 104 1614111
4 233 2000000
1 19 5367516
1 108 6100000
(As the val column is a character column, a direct comparison using the amount > 1000000.00 condition does not work as a numeric comparison is needed). From this query (only the relevant val column values are displayed), you can see that there are 7 entries in the histogram for the column which satisfy the subject conditional with a total eq_total + ne_total value of 814 rows. Hence, the restriction factor for the amount conditional is 814/25000 = .03256 which on the 132365 total rows in the amount table gives a cardinality estimate of 0.03256 * 132365 = 4310 rows.
The following table summarizes the results found so far.
| Conditional | Restriction Factor | Cardinality Estimate |
|---|---|---|
| state in ("OR","WA") | .03366 | 260 |
| award_date between "2001-01-01" and "2001-12-31" | .09544 | 12633 |
| amount > 1000000.00 | .03256 | 4310 |
Usually, the overall cost estimates track with the cardinality estimates so that the methods that access the fewest rows are the cheapest. From the above table it would seem that use of the key on state to access only 260 rows would be cheaper than the other two keys. However, it is important to note that there are two tables involved in this query. By starting with the key on state in order to retrieve the estimated 260 sponsors located in Oregon and Washington, an average of 132,365 (total rows in award table) / 7726 (total rows in sponsor table) = 17 rows from the award table for each sponsor need to be read against which the other two conditionals need to be checked for a total of 17*260=40,120 rows. By starting with the key on the amount column to access the estimated 4310 rows and as each award row relates to a single sponsor row through the sponsor_nm foreign key, only a single additional read to access that sponsor row is added to the overall access plan cost (total of 8620 rows). This is clearly less costly than the plan that begins with the state key. Thus, if the estimates are accurate, the optimizer has indeed chosen the optimal execution plan for this query.
The accuracy of the cardinality estimates given in Table 14 can easily be checked using the following queries.
rdm-sql: select count(*) from sponsor where state in ("OR","WA");
COUNT(*)
288
rdm-sql: select count(*) from award
rdm-sql> where award_date between "2001-01-01" and "2001-12-31";
COUNT(*)
11861
rdm-sql: select count(*) from award where amount >= 1000000.00;
COUNT(*)
4861
| Conditional | Cardinality Estimate | Actual Count |
|---|---|---|
| state in ("OR","WA") | 260 | 288 |
| award_date between "2001-01-01" and "2001-12-31" | 12633 | 11861 |
| amount > 1000000.00 | 4310 | 4861 |
As you can see, the optimizer's estimates based on the column data distribution statistics collected by UPDATE STATS are reasonably accurate.