A Language for Describing a Language
SQL stands for "Structured Query Language". You have probably seen many different methods used in programming manuals to show how to use a specific programming language. The two most common methods use syntax flow diagrams and what is known as Backus-Naur Form (BNF) which is a formal language for describing a programming language. In this document we use a simplified BNF method that seeks to represent the language in a way that closely matches the way you will code your own SQL statements for your application.
For example, the following SELECT statement:
select sale_name, company, city, state from salesperson natural join customer;
can be described by this syntax rule:
select_stmt:
SELECT identifier[, identifier]… FROM identifier [NATURAL JOIN identifier]
where "select_stmt
" is the name of the rule (sometimes called a non-terminal); the uppercase identifiers SELECT, FROM, NATURAL, and JOIN are key words (sometimes called terminal symbols); identifier is like a function argument that stands in place of a user-specified value (technically, it too is the name of a rule that is matched by any user-specified value that begins with a letter followed by any sequence consisting of letters, digits, and the underscore ("_") character). Rule names are identifiers and their definitions are specified by giving the rule name beginning in column 1 and terminating the rule with a colon (":") as shown above.
There are also special meta-symbols that are part of the syntax descriptor language. Two are shown in the above select_stmt
syntax rule. The brackets ("[" and "]") enclose optional elements. The ellipsis ("…") specifies that the preceding item can be repeated zero or more times. Other meta-symbols include a vertical bar (i.e., an "or" symbol) that is used to separate alternative elements and braces ("{" and "}") which enclose a set of alternatives from which one must always be matched. All other special characters (e.g., the "," and ";" in the select_stmt
rule) are considered to be part of the language definition. Meta-symbols that are themselves part of the language will be enclosed in single quotes (e.g., '[') in the syntax rule.
Rule names can be used in other rules. For example, the syntax for a stored procedure that can contain multiple SELECT statements could be described by the following rule:
create_proc:
CREATE PROCEDURE identifier AS
select_stmt[; select_stmt]…
END PROC
In order to make the syntax more readable, any non-bold, italicized name is considered to be matched as an identifier. Thus, the select_stmt rule can also be written as follows…
select_stmt:
SELECT column_name[, column_name]… FROM table_name [NATURALJOIN table_name]
where column_name represents identifiers that correspond to table column names and table_name represents identifiers that correspond to table names.
Some italicized terms are used to match specific text patterns. E.g., number matches any text pattern that can be used to represent a number (either integer or decimal) and integer matches any pattern that represents an integer number.
These rules are summarized in the table below.
Syntax Element | Description |
---|---|
KEYWORD | Uppercase words that identify the special words used in the language that specify actions and usage. Sometimes called reserved words. Examples, SELECT, INSERT, CREATE, USING. |
identifier | Italicized word corresponding to an identifier: sequences of letters, digits, and "_" that begin with a letter. |
number | Any text that corresponds to an integer or decimal number. |
integer | Any text that corresponds to an integer. |
[option1 | option2] | A selection in which either nothing or option1 or option2 is specified. |
{option1 | option2} | Either option1 or option2 must be specified. |
element… | Repeat element zero or more times. |
identifier | Normal-faced identifiers correspond to the names of syntax rules. Syntax rules are defined by the name starting in column 1 and ending with a ":". |