A ZetaSQL statement comprises a series of tokens. Tokens include identifiers, quoted identifiers, literals, keywords, operators, and special characters. You can separate tokens with whitespace (for example, space, backspace, tab, newline) or comments.
Identifiers are names that are associated with columns, tables, and other database objects. They can be unquoted or quoted.
- Identifiers can be used in path expressions that return a
STRUCT
orPROTO
. - Some identifiers are case-sensitive and some are not. For details, see Case Sensitivity.
- Unquoted identifiers must begin with a letter or an underscore character. Subsequent characters can be letters, numbers, or underscores.
- Quoted identifiers must be enclosed by backtick (`) characters.
- Quoted identifiers can contain any character, such as spaces or symbols.
- Quoted identifiers cannot be empty.
- Quoted identifiers have the same escape sequences as string literals.
- A reserved keyword must be a quoted identifier if it is a standalone keyword or the first component of a path expression. It may be unquoted as the second or later component of a path expression.
- Table name identifiers have additional syntax to support dashes (-)
when referenced in
FROM
andTABLE
clauses.
Examples
These are valid identifiers:
Customers5
`5Customers`
dataField
_dataField1
ADGROUP
`tableName~`
`GROUP`
These path expressions contain valid identifiers:
foo.`GROUP`
foo.GROUP
foo().dataField
(foo).dataField
list[OFFSET(3)].dataField
list[ORDINAL(3)].dataField
@parameter.dataField
These are invalid identifiers:
5Customers
_dataField!
GROUP
5Customers
begins with a number, not a letter or underscore. _dataField!
contains the special character "!" which is not a letter, number, or underscore.
GROUP
is a reserved keyword, and therefore cannot be used as an identifier
without being enclosed by backtick characters.
You do not need to enclose table names that include dashes with backticks. These are equivalent:
SELECT * FROM data-customers-287.mydatabase.mytable
SELECT * FROM `data-customers-287`.mydatabase.mytable
A literal represents a constant value of a built-in data type. Some, but not all, data types can be expressed as literals.
Both string and bytes literals must be quoted, either with single ('
) or
double ("
) quotation marks, or triple-quoted with groups of three single
('''
) or three double ("""
) quotation marks.
Quoted literals:
Literal | Examples | Description |
---|---|---|
Quoted string |
|
Quoted strings enclosed by single (' ) quotes can contain unescaped double (" ) quotes, as well as the inverse. Backslashes ( \ ) introduce escape sequences. See the Escape Sequences table below.Quoted strings cannot contain newlines, even when preceded by a backslash ( \ ). |
Triple-quoted string |
|
Embedded newlines and quotes are allowed without escaping - see fourth example. Backslashes ( \ ) introduce escape sequences. See Escape Sequences table below.A trailing unescaped backslash ( \ ) at the end of a line is not allowed.End the string with three unescaped quotes in a row that match the starting quotes. |
Raw string |
|
Quoted or triple-quoted literals that have the raw string literal prefix (r or R ) are interpreted as raw/regex strings.Backslash characters ( \ ) do not act as escape characters. If a backslash followed by another character occurs inside the string literal, both characters are preserved.A raw string cannot end with an odd number of backslashes. Raw strings are useful for constructing regular expressions. |
Prefix characters (r
, R
, b
, B)
are optional for quoted or triple-quoted strings, and indicate that the string is a raw/regex string or a byte sequence, respectively. For
example, b'abc'
and b'''abc'''
are both interpreted as type bytes. Prefix characters are case insensitive.
Quoted literals with prefixes:
Literal | Example | Description |
---|---|---|
Bytes |
|
Quoted or triple-quoted literals that have the bytes literal prefix (b or B ) are interpreted as bytes. |
Raw bytes |
|
The r and b prefixes can be combined in any order. For example, rb'abc*' is equivalent to br'abc*' . |
The table below lists all valid escape sequences for representing non-alphanumeric characters in string and byte literals. Any sequence not in this table produces an error.
Escape Sequence | Description |
---|---|
\a |
Bell |
\b |
Backspace |
\f |
Formfeed |
\n |
Newline |
\r |
Carriage Return |
\t |
Tab |
\v |
Vertical Tab |
\\ |
Backslash (\ ) |
\? |
Question Mark (? ) |
\" |
Double Quote (" ) |
\' |
Single Quote (' ) |
\` |
Backtick (` ) |
\ooo |
Octal escape, with exactly 3 digits (in the range 0–7). Decodes to a single Unicode character (in string literals) or byte (in bytes literals). |
\xhh or \Xhh |
Hex escape, with exactly 2 hex digits (0–9 or A–F or a–f). Decodes to a single Unicode character (in string literals) or byte (in bytes literals). Examples:
|
\uhhhh |
Unicode escape, with lowercase 'u' and exactly 4 hex digits. Valid only in string literals or identifiers. Note that the range D800-DFFF is not allowed, as these are surrogate unicode values. |
\Uhhhhhhhh |
Unicode escape, with uppercase 'U' and exactly 8 hex digits. Valid only in string literals or identifiers. The range D800-DFFF is not allowed, as these values are surrogate unicode values. Also, values greater than 10FFFF are not allowed. |
Integer literals are either a sequence of decimal digits (0–9) or a hexadecimal
value that is prefixed with "0x
" or "0X
". Integers can be prefixed by "+
"
or "-
" to represent positive and negative values, respectively.
Examples:
123
0xABC
-123
An integer literal is interpreted as an INT64
.
Coercion (implicit casting) of integer literals to other integer types can occur
if casting does not result in truncation. For example, if the integer 55 of type
INT32
is compared to the integer literal 77, the
literal value 77 is coerced into type INT32
because
77
can be represented by the INT32
type.
You can construct NUMERIC literals using the
NUMERIC
keyword followed by a floating point value in quotes.
Examples:
SELECT NUMERIC '0';
SELECT NUMERIC '123456';
SELECT NUMERIC '-3.14';
SELECT NUMERIC '-0.54321';
SELECT NUMERIC '1.23456e05';
SELECT NUMERIC '-9.876e-3';
You can construct BIGNUMERIC
literals using the BIGNUMERIC
keyword followed
by a floating point value in quotes.
Examples:
SELECT BIGNUMERIC '0';
SELECT BIGNUMERIC '123456';
SELECT BIGNUMERIC '-3.14';
SELECT BIGNUMERIC '-0.54321';
SELECT BIGNUMERIC '1.23456e05';
SELECT BIGNUMERIC '-9.876e-3';
Syntax options:
[+-]DIGITS.[DIGITS][e[+-]DIGITS]
[DIGITS].DIGITS[e[+-]DIGITS]
DIGITSe[+-]DIGITS
DIGITS
represents one or more decimal numbers (0 through 9) and e
represents the exponent marker (e or E).
Examples:
123.456e-67
.1E4
58.
4e2
Numeric literals that contain either a decimal point or an exponent marker are presumed to be type double.
Implicit coercion of floating point literals to float type is possible if the value is within the valid float range.
There is no literal representation of NaN or infinity, but the following case-insensitive strings can be explicitly cast to float:
- "NaN"
- "inf" or "+inf"
- "-inf"
Array literals are comma-separated lists of elements
enclosed in square brackets. The ARRAY
keyword is optional, and an explicit
element type T is also optional.
You can write an empty array of a specific type using ARRAY<type>[]
. You can
also write an untyped empty array using []
, in which case ZetaSQL
attempts to infer the array type from the surrounding context. If
ZetaSQL cannot infer a type, the default type ARRAY<INT64>
is used.
Examples:
[1, 2, 3]
['x', 'y', 'xy']
ARRAY[1, 2, 3]
ARRAY<string>['x', 'y', 'xy']
ARRAY<int64>[]
[]
Syntax:
(elem[, elem...])
where elem
is an element in the struct. elem
must be a literal data type, not an expression or column name.
The output type is an anonymous struct type (structs are not named types) with anonymous fields with types matching the types of the input expressions.
Example | Output Type |
---|---|
(1, 2, 3) |
STRUCT<int64,int64,int64> |
(1, 'abc') |
STRUCT<int64,string> |
Syntax:
DATE 'YYYY-M[M]-D[D]'
Date literals contain the DATE
keyword followed by a string literal that conforms to the canonical date format, enclosed in single quotation marks. Date literals support a range between the
years 1 and 9999, inclusive. Dates outside of this range are invalid.
For example, the following date literal represents September 27, 2014:
DATE '2014-09-27'
String literals in canonical date format also implicitly coerce to DATE type when used where a DATE-type expression is expected. For example, in the query
SELECT * FROM foo WHERE date_col = "2014-09-27"
the string literal "2014-09-27"
will be coerced to a date literal.
Syntax:
TIME '[H]H:[M]M:[S]S[.DDDDDD]]'
Time literals contain the TIME
keyword and a string literal that conforms to
the canonical time format, enclosed in single quotation marks.
For example, the following time represents 12:30 p.m.:
TIME '12:30:00.45'
Syntax:
DATETIME 'YYYY-[M]M-[D]D [[H]H:[M]M:[S]S[.DDDDDD]]'
Datetime literals contain the DATETIME
keyword and a string literal that
conforms to the canonical datetime format, enclosed in single quotation marks.
For example, the following datetime represents 12:30 p.m. on September 27, 2014:
DATETIME '2014-09-27 12:30:00.45'
Datetime literals support a range between the years 1 and 9999, inclusive. Datetimes outside of this range are invalid.
String literals with the canonical datetime format implicitly coerce to a datetime literal when used where a datetime expression is expected.
For example:
SELECT * FROM foo
WHERE datetime_col = "2014-09-27 12:30:00.45"
In the query above, the string literal "2014-09-27 12:30:00.45"
is coerced to
a datetime literal.
A datetime literal can also include the optional character T
or t
. This
is a flag for time and is used as a separator between the date and time. If
you use this character, a space can't be included before or after it.
These are valid:
DATETIME '2014-09-27T12:30:00.45'
DATETIME '2014-09-27t12:30:00.45'
Syntax:
TIMESTAMP 'YYYY-[M]M-[D]D [[H]H:[M]M:[S]S[.DDDDDD] [timezone]]'
Timestamp literals contain the TIMESTAMP
keyword and a string literal that
conforms to the canonical timestamp format, enclosed in single quotation marks.
Timestamp literals support a range between the years 1 and 9999, inclusive. Timestamps outside of this range are invalid.
A timestamp literal can include a numerical suffix to indicate the time zone:
TIMESTAMP '2014-09-27 12:30:00.45-08'
If this suffix is absent, the default time zone, which is implementation defined, is used.
For example, the following timestamp represents 12:30 p.m. on September 27, 2014 in the default time zone, which is implementation defined:
TIMESTAMP '2014-09-27 12:30:00.45'
For more information about time zones, see Time zone.
String literals with the canonical timestamp format, including those with
time zone names, implicitly coerce to a timestamp literal when used where a
timestamp expression is expected. For example, in the following query, the
string literal "2014-09-27 12:30:00.45 America/Los_Angeles"
is coerced
to a timestamp literal.
SELECT * FROM foo
WHERE timestamp_col = "2014-09-27 12:30:00.45 America/Los_Angeles"
A timestamp literal can include these optional characters:
T
ort
: A flag for time. Use as a separator between the date and time.Z
orz
: A flag for the default timezone. This cannot be used with[timezone]
.
If you use one of these characters, a space can't be included before or after it. These are valid:
TIMESTAMP '2017-01-18T12:34:56.123456Z'
TIMESTAMP '2017-01-18t12:34:56.123456'
TIMESTAMP '2017-01-18 12:34:56.123456z'
TIMESTAMP '2017-01-18 12:34:56.123456Z'
Since timestamp literals must be mapped to a specific point in time, a time zone is necessary to correctly interpret a literal. If a time zone is not specified as part of the literal itself, then ZetaSQL uses the default time zone value, which the ZetaSQL implementation sets.
ZetaSQL represents time zones using strings in the following canonical format, which represents the offset from Coordinated Universal Time (UTC).
Format:
(+|-)H[H][:M[M]]
Examples:
'-08:00'
'-8:15'
'+3:00'
'+07:30'
'-7'
Time zones can also be expressed using string time zone names from the
tz database{: class=external target=_blank }. For a less
comprehensive but simpler reference, see the
List of tz database time zones{: class=external target=_blank }
on Wikipedia. Canonical time zone names have the format
<continent/[region/]city>
, such as America/Los_Angeles
.
Note: Not all time zone names are interchangeable even if they do happen to
report the same time during a given part of the year. For example, America/Los_Angeles
reports the same time as UTC-7:00
during Daylight Savings Time, but reports the same time as UTC-8:00
outside of Daylight Savings Time.
Example:
TIMESTAMP '2014-09-27 12:30:00 America/Los_Angeles'
TIMESTAMP '2014-09-27 12:30:00 America/Argentina/Buenos_Aires'
There is no syntax for enum literals, but integer or string literals will coerce to enum type when necessary, or can be explicitly CAST to a specific enum type name. See "Literal Coercion" in Expressions, Functions, and Operators.
ZetaSQL follows these rules for case sensitivity:
Category | Case Sensitive? | Notes |
---|---|---|
Keywords | No | |
Function names | No | |
Table names | See Notes | Table names are usually case insensitive, but may be case sensitive when querying a database that uses case-sensitive table names. |
Column names | No | |
All type names except for protocol buffer type names | No | |
Protocol buffer type names | Yes | |
String values | Yes | Includes enum value strings |
String comparisons | Yes | |
Aliases within a query | No | |
Regular expression matching | See Notes | Regular expression matching is case sensitive by default, unless the expression itself specifies that it should be case insensitive. |
LIKE matching |
Yes |
Keywords are a group of tokens that have special meaning in the ZetaSQL language, and have the following characteristics:
- Keywords cannot be used as identifiers unless enclosed by backtick (`) characters.
- Keywords are case insensitive.
ZetaSQL has the following reserved keywords.
ALL AND ANY ARRAY AS ASC ASSERT_ROWS_MODIFIED AT BETWEEN BY CASE CAST COLLATE CONTAINS CREATE CROSS CUBE CURRENT DEFAULT DEFINE DESC DISTINCT ELSE END |
ENUM ESCAPE EXCEPT EXCLUDE EXISTS EXTRACT FALSE FETCH FOLLOWING FOR FROM FULL GROUP GROUPING GROUPS HASH HAVING IF IGNORE IN INNER INTERSECT INTERVAL INTO |
IS JOIN LATERAL LEFT LIKE LIMIT LOOKUP MERGE NATURAL NEW NO NOT NULL NULLS OF ON OR ORDER OUTER OVER PARTITION PRECEDING PROTO RANGE |
RECURSIVE RESPECT RIGHT ROLLUP ROWS SELECT SET SOME STRUCT TABLESAMPLE THEN TO TREAT TRUE UNBOUNDED UNION UNNEST USING WHEN WHERE WINDOW WITH WITHIN |
You can optionally use a terminating semicolon (;
) when you submit a query
string statement through an Application Programming Interface (API).
In a request containing multiple statements, you must separate statements with semicolons, but the semicolon is generally optional after the final statement. Some interactive tools require statements to have a terminating semicolon.
You can optionally use a trailing comma (,
) at the end of a column list in a
SELECT
statement. You might have a trailing comma as the result of
programmatically creating a column list.
Example
SELECT name, release_date, FROM Books
You can use query parameters to substitute arbitrary expressions. However, query parameters cannot be used to substitute identifiers, column names, table names, or other parts of the query itself. Query parameters are defined outside of the query statement.
Client APIs allow the binding of parameter names to values; the query engine substitutes a bound value for a parameter at execution time.
Query parameters cannot be used in the SQL body of these statements:
CREATE FUNCTION
, CREATE TABLE FUNCTION
, CREATE VIEW
, CREATE MATERIALIZED VIEW
, and CREATE PROCEDURE
.
Syntax:
@parameter_name
A named query parameter is denoted using an identifier
preceded by the @
character. Named query
parameters cannot be used alongside positional query
parameters.
Example:
This example returns all rows where LastName
is equal to the value of the
named query parameter myparam
.
SELECT * FROM Roster WHERE LastName = @myparam
Positional query parameters are denoted using the ?
character.
Positional parameters are evaluated by the order in which they are passed in.
Positional query parameters cannot be used
alongside named query parameters.
Example:
This query returns all rows where LastName
and FirstName
are equal to the
values passed into this query. The order in which these values are passed in
matters. If the last name is passed in first, followed by the first name, the
expected results will not be returned.
SELECT * FROM Roster WHERE FirstName = ? and LastName = ?
@{ hint [, ...] }
hint:
[engine_name.]hint_name = value
The purpose of a hint is to modify the execution strategy for a query without changing the result of the query. Hints generally do not affect query semantics, but may have performance implications.
Hint syntax requires the @
character followed by curly braces.
You can create one hint or a group of hints. The optional engine_name.
prefix allows for multiple engines to define hints with the same hint_name
.
This is important if you need to suggest different engine-specific
execution strategies or different engines support different hints.
You can assign identifiers and literals to hints.
- Identifiers are useful for hints that are meant to act like enums.
You can use an identifier to avoid using a quoted string.
In the resolved AST, identifier hints are represented as string literals,
so
@{hint="abc"}
is the same as@{hint=abc}
. Identifier hints can also be used for hints that take a table name or column name as a single identifier. - NULL literals are allowed and are inferred as integers.
Hints are meant to apply only to the node they are attached to, and not to a larger scope.
Examples
In this example, a literal is assigned to a hint. This hint is only used
with two database engines called database_engine_a
and database_engine_b
.
The value for the hint is different for each database engine.
@{ database_engine_a.file_count=23, database_engine_b.file_count=10 }
Comments are sequences of characters that the parser ignores. ZetaSQL supports the following types of comments.
Use a single-line comment if you want the comment to appear on a line by itself.
Examples
# this is a single-line comment
SELECT book FROM library;
-- this is a single-line comment
SELECT book FROM library;
/* this is a single-line comment */
SELECT book FROM library;
SELECT book FROM library
/* this is a single-line comment */
WHERE book = "Ulysses";
Use an inline comment if you want the comment to appear on the same line as
a statement. A comment that is prepended with #
or --
must appear to the
right of a statement.
Examples
SELECT book FROM library; # this is an inline comment
SELECT book FROM library; -- this is an inline comment
SELECT book FROM library; /* this is an inline comment */
SELECT book FROM library /* this is an inline comment */ WHERE book = "Ulysses";
Use a multiline comment if you need the comment to span multiple lines. Nested multiline comments are not supported.
Examples
SELECT book FROM library
/*
This is a multiline comment
on multiple lines
*/
WHERE book = "Ulysses";
SELECT book FROM library
/* this is a multiline comment
on two lines */
WHERE book = "Ulysses";