select

Retrieve a set of rows of data from the database

Syntax

select_stmt:
          SELECT [FIRST | ALL | DISTINCT] select_item [, select_item]...
               FROM table_ref [, table_ref]...
               [WHERE conditional_expr]
               [grouping | sorting | grouping sorting]
               [LIMIT (num {ROWS | MINS | SECS | MSECS})]
               [FOR UPDATE [OF column_name [, column_name]...]]

conditional_expr:
          relational_expr [{ AND | OR} relational_expr]...

sub_query:
          ( sub_select )

sub_select:
          SELECT {* | named_expr} FROM table_ref [, table_ref]...
               [WHERE conditional_expr] [grouping]

Description

The SELECT statement retrieves a subset of data (the result set) from a table or tables. The result set contains rows that satisfy a conditional expression (where clause). If there is no condition for the WHERE clause, the SELECT statement retrieves all rows from the table or tables. If the SELECT statement includes a GROUP BY clause, only rows that satisfy the WHERE clause are reflected in grouping calculations.

A SELECT FIRST only returns the first row of the result set. A SELECT DISTINCT will eliminate duplicate rows from the result set. Note that this necessarily requires that the rows first be sorted and can be quite an expensive (i.e., time consuming) operation and should be avoided unless absolutely necessary. The default behavior is SELECT ALL which returns all of the rows of the result set.

The select_item expressions can optionally be given an alias or alternate column heading. The [table_name.]* select_item replaces the * with each column contained in table_name or, if not specified, all tables listed in the FROM clause.

If a select_item specifies a sub_query then that subquery must return just 1 result row value.

A select_item with a specified alias_name instantiates the alias_name as a local variable which contains the current row’s value of that column expression. The alias_name can then be used in subsequent expressions in the SELECT statement for computational use. This is of particular use when the select_item is a subquery.

A "SELECT COUNT(*) FROM table_ref" returns the cardinality (the number of rows) of the specified table. RDM automatically maintains this count for every table allowing the count to be returned from a single call that does not require a database access. SQL fetches the count value without having to apply a lock to the table. Hence, the value that is fetched is "dirty"—meaning that it will account for any uncommitted INSERT or DELETE statements that have been executed by other connections. In order to fetch a "clean" value, the application should execute an explicit LOCK TABLE statement prior to the "SELECT COUNT(*) FROM table_ref".

The NATURAL JOIN specification indicates that the join is to be performed based on the common columns (names and types) from the two tables. The join is based on the columns from the table (or tables) specified on the left side of "NATURAL ... JOIN" with those columns from the table (or tables) on the right side that have the same name. Note that the SQL standard requires that a "SELECT *..." on a NATURAL JOIN eliminate each redundant columns (one of the common columns) from the result set. However, RDM SQL does not currently do so. Hence, it is recommended that you do not rely on the RDM SQL behavior in this instance as a future version will likely conform to the SQL standard.

A qualified join is like a NATURAL JOIN except that it requires that the columns on which the join is to be formed be explicitly specified. Two specification methods are provided. The USING clause requires you to name the common column names between the joined tables which are to be used to form the join allowing you to choose only the matching columns on which you want the join formed. The ON clause requires you to specify the join predicates as conditional expressions exactly as they would be specified in the WHERE clause. The ON clause is necessary whenever the join is to be performed between columns that do not have the same name.

A cross join is simply a cross product of the two tables where each row of the left table is joined with each row of the right table so that the cardinality of the result (i.e., the number of result rows) is equal to the product of the cardinality of the two tables. An ON clause cannot be specified with a cross join. However, there is nothing that restricts including join conditions in the WHERE clause. In practice, there are very few times when a cross join is needed and since it can be a very expensive operation that can potentially produce huge result sets, its use should be avoided.

Parentheses are sometimes needed to be used to group joins when more than two tables are involved in the FROM clause. They are required when one table needs to be joined with two or more tables.

The GROUP BY clause defines a set of aggregate rows upon which computations are to be made. An aggregate consists of those rows that have identical values in the columns that are named in the group by specification. Each of the other selected columns should either have a unique value within each aggregate or be a computation that uses of one or more aggregate functions (SUM, AVG, MIN, MAX, COUNT, or an aggregate UDF). Only one row is reported for each aggregate resulting from the SELECT.

The HAVING clause is similar to the WHERE clause in that it is used to conditionally select which resultant rows will be reported. However, the HAVING conditional expression is not evaluated until after the GROUP BY processing has been performed. The conditional expression will include comparisons that typically involve the aggregate functions in the SELECT column list.

The ORDER BY clause allows you to specify the column or columns on which to sort the result set. The num is the ordinal position of the SELECT expression on which to sort where num = 1 refers to the first expression. The column_name is either the specified alias_name or the name of a column from the specified table_name or the first one of the tables referenced in the FROM clause which contains a column with that name. The default sort order is ASC (ascending) but DESC (descending) can be specified to reverse the order. If more than one order by column is specified each subsequent column specifies a sort order within each value from the outer sort column(s). If SELECT DISTINCT is specified, duplicate rows in the result set will be eliminated.

The LIMIT clause can be specified to limit either the number of rows that are returned or the amount of time the SELECT statement is allowed to run. This feature is particularly useful when retrieving data from a virtual table which may represent a never-ending source of data (such as from a weather sensor network).

The FOR UPDATE clause indicates that the SELECT statement is updateable by a positioned UPDATE on a separate statement handle in the same connection that references the cursor name associated with this SELECT.. An updateable SELECT is one for which the SELECT result expressions are only simple column names, only one table is listed in the FROM clause, and no ORDER BY clause is specified. If an OF column name list clause is specified then only those SELECT result columns can be updated. If the OF column name list clause is not specified then any of the SELECT result columns can be updated. Any columns declared in the table can be referenced in the associated UPDATE (i.e., used in the SET assignment of one of the updateable columns). The cursor name associated with the SELECT statement can be set by a call to function SQLSetCursorName or the system-generated cursor name can be retrieved through a call to SQLGetCursorName. The cursor name needs to be specified in the WHERE CURRENT OF clause of the related positioned UPDATE statement.

Example

select name, sum(amount) from sponsor join award on sponsor_nm = name
    group by name order by 2 desc;
	...
select sum(if(gender="M",1,0)) men, sum(if(gender="F",1,0)) women 
 	from award natural join investigator natural join person;
	...	
select loc_long, loc_lat, convert(rdg_time,date), hour(rdg_time), 
 		avg(temperature), avg(pressure), avg(humidity), avg(light) from weather_data
 		group by 1,2,4 
 		limit(4 hours);
	...
select bookid, publ_year, last_name, title from book where publ_year < 1800;
	...
select aucid, count(*) from auction natural join bid where start_date = curdate()
	group by 1;
	...