ZetaSQL supports simple data types such as integers, as well as more complex types such as ARRAY, PROTO, and STRUCT. This page provides an overview of each data type, including allowed values. For information on data type literals and constructors, see Lexical Structure and Syntax.
When storing and querying data, it is helpful to keep the following data type properties in mind:
Property | Description | Applies To |
---|---|---|
Nullable | NULL is a valid value. |
All data types. |
Orderable | Can be used in an ORDER BY clause. |
All data types except for:
|
Groupable | Can generally appear in an expression followingGROUP BY ,
DISTINCT , and PARTITION BY .However, PARTITION BY expressions cannot includefloating point types. |
All data types except for:
|
Comparable | Values of the same type can be compared to each other. | All data types, with the following exceptions:
JOIN Types for an explanation of join conditions. |
Name | Description |
---|---|
ARRAY |
Ordered list of zero or more elements of any non-ARRAY type. |
An ARRAY is an ordered list of zero or more elements of non-ARRAY values.
ARRAYs of ARRAYs are not allowed. Queries that would produce an ARRAY of
ARRAYs will return an error. Instead a STRUCT must be inserted between the
ARRAYs using the SELECT AS STRUCT
construct.
An empty ARRAY and a NULL
ARRAY are two distinct values. ARRAYs can contain
NULL
elements.
ARRAY<T>
ARRAY types are declared using the angle brackets (<
and >
). The type
of the elements of an ARRAY can be arbitrarily complex with the exception that
an ARRAY cannot directly contain another ARRAY.
Examples
Type Declaration | Meaning |
---|---|
ARRAY<INT64>
|
Simple ARRAY of 64-bit integers. |
ARRAY<STRUCT<INT64, INT64>>
|
An ARRAY of STRUCTs, each of which contains two 64-bit integers. |
ARRAY<ARRAY<INT64>>
(not supported) |
This is an invalid type declaration which is included here just in case you came looking for how to create a multi-level ARRAY. ARRAYs cannot contain ARRAYs directly. Instead see the next example. |
ARRAY<STRUCT<ARRAY<INT64>>>
|
An ARRAY of ARRAYS of 64-bit integers. Notice that there is a STRUCT between the two ARRAYs because ARRAYs cannot hold other ARRAYs directly. |
Name | Description |
---|---|
BOOL |
Boolean values are represented by the keywords TRUE and
FALSE (case insensitive). |
Boolean values are sorted in this order, from least to greatest:
NULL
FALSE
TRUE
Name | Description |
---|---|
BYTES |
Variable-length binary data. |
STRING and BYTES are separate types that cannot be used interchangeably. Most functions on STRING are also defined on BYTES. The BYTES version operates on raw bytes rather than Unicode characters. Casts between STRING and BYTES enforce that the bytes are encoded using UTF-8.
Name | Range |
---|---|
DATE |
0001-01-01 to 9999-12-31. |
The DATE type represents a logical calendar date, independent of time zone. A DATE value does not represent a specific 24-hour time period. Rather, a given DATE value represents a different 24-hour period when interpreted in different time zones, and may represent a shorter or longer day during Daylight Savings Time transitions. To represent an absolute point in time, use a timestamp.
'YYYY-[M]M-[D]D'
YYYY
: Four-digit year[M]M
: One or two digit month[D]D
: One or two digit day
Name | Range |
---|---|
DATETIME |
|
A DATETIME object represents a date and time, as they might be displayed on a calendar or clock, independent of time zone. It includes the year, month, day, hour, minute, second, and subsecond. The range of subsecond precision is determined by the SQL engine. To represent an absolute point in time, use a timestamp.
YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDD|.DDDDDDDDD]]
YYYY
: Four-digit year[M]M
: One or two digit month[D]D
: One or two digit day( |T)
: A space or a `T` separator[H]H
: One or two digit hour (valid values from 00 to 23)[M]M
: One or two digit minutes (valid values from 00 to 59)[S]S
: One or two digit seconds (valid values from 00 to 59)
<li><code>[.DDDDDDDDD|.DDDDDD]</code>: Up to six or nine fractional digits (microsecond or nanosecond precision)</li>
Name | Description |
---|---|
ENUM |
Named type that maps STRING constants to INT32 constants. |
An ENUM is a named type that enumerates a list of possible values, each of which has:
- An integer value. Integers are used for comparison and ordering ENUM values. There is no requirement that these integers start at zero or that they be contiguous.
- A string value. Strings are case sensitive.
- Optional alias values. One or more additional string values that act as aliases.
Enum values are referenced using their integer value or their string value. You reference an ENUM type, such as when using CAST, by using its fully qualified name.
You cannot create new ENUM types using ZetaSQL.
Numeric types include the following types:
INT32
UINT32
INT64
UINT64
NUMERIC
with aliasDECIMAL
BIGNUMERIC
with aliasBIGDECIMAL
FLOAT
DOUBLE
Integers are numeric values that do not have fractional components.
Name | Range |
---|---|
INT32 |
-2,147,483,648 to 2,147,483,647 |
UINT32 |
0 to 4,294,967,295 |
INT64 |
-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
UINT64 |
0 to 18,446,744,073,709,551,615 |
Decimal type values are numeric values with fixed precision and scale. Precision is the number of digits that the number contains. Scale is how many of these digits appear after the decimal point.
This type can represent decimal fractions exactly, and is suitable for financial calculations.
Name | Precision, Scale, and Range |
---|---|
NUMERIC
DECIMAL |
Precision: 38 Scale: 9 Min: -9.9999999999999999999999999999999999999E+28 Max: 9.9999999999999999999999999999999999999E+28 |
BIGNUMERIC
BIGDECIMAL |
Precision: 76.76 (the 77th digit is partial) Scale: 38 Min: -5.7896044618658097711785492504343953926634992332820282019728792003956564819968E+38 Max: 5.7896044618658097711785492504343953926634992332820282019728792003956564819967E+38 |
DECIMAL
is an alias for NUMERIC
.
BIGDECIMAL
is an alias for BIGNUMERIC
.
Floating point values are approximate numeric values with fractional components.
Name | Description |
---|---|
FLOAT |
Single precision (approximate) numeric values. |
DOUBLE |
Double precision (approximate) numeric values. |
When working with floating point numbers, there are special non-numeric values
that need to be considered: NaN
and +/-inf
Arithmetic operators provide standard IEEE-754 behavior for all finite input values that produce finite output and for all operations for which at least one input is non-finite.
Function calls and operators return an overflow error if the input is finite
but the output would be non-finite. If the input contains non-finite values, the
output can be non-finite. In general functions do not introduce NaN
s or
+/-inf
. However, specific functions like IEEE_DIVIDE
can return non-finite
values on finite input. All such cases are noted explicitly in
Mathematical functions.
Left Term | Operator | Right Term | Returns |
---|---|---|---|
Any value | + |
NaN |
NaN |
1.0 | + |
+inf |
+inf |
1.0 | + |
-inf |
-inf |
-inf |
+ |
+inf |
NaN |
Maximum DOUBLE value |
+ |
Maximum DOUBLE value |
Overflow error |
Minimum DOUBLE value |
/ |
2.0 | 0.0 |
1.0 | / |
0.0 |
"Divide by zero" error |
Comparison operators provide standard IEEE-754 behavior for floating point input.
Left Term | Operator | Right Term | Returns |
---|---|---|---|
NaN |
= |
Any value | FALSE |
NaN |
< |
Any value | FALSE |
Any value | < |
NaN |
FALSE |
-0.0 | = |
0.0 | TRUE |
-0.0 | < |
0.0 | FALSE |
Floating point values are sorted in this order, from least to greatest:
NULL
NaN
— AllNaN
values are considered equal when sorting.-inf
- Negative numbers
- 0 or -0 — All zero values are considered equal when sorting.
- Positive numbers
+inf
Special floating point values are grouped this way, including both grouping
done by a GROUP BY
clause and grouping done by the DISTINCT
keyword:
NULL
NaN
— AllNaN
values are considered equal when grouping.-inf
- 0 or -0 — All zero values are considered equal when grouping.
+inf
Name | Description |
---|---|
PROTO |
An instance of protocol buffer. |
Protocol buffers provide structured data types with a defined serialization format and cross-language support libraries. Protocol buffer message types can contain optional, required or repeated fields, including nested messages. See the Protocol Buffers Developer Guide for more detail.
Protocol buffer message types behave similarly to STRUCT types, and support
similar operations like reading field values by name. Protocol buffer types are
always named types, and can be referred to by their fully-qualified protocol
buffer name (i.e. package.ProtoName
). Protocol buffers support some additional
behavior beyond STRUCTs, like default field values, and checking for the
presence of optional fields.
Protocol buffer ENUM types are also available and can be referenced using the fully-qualified ENUM type name.
See Using Protocol Buffers for more information.
No direct comparison of PROTO values is supported. There are a couple possible workarounds:
- The most accurate way to compare PROTOs is to do a pair-wise comparison
between the fields of the PROTOs. This can also be used to
GROUP BY
orORDER BY
PROTO fields. - For simple equality comparisons, you can cast a PROTO to BYTES and compare the results.
- To get a simple approximation for inequality comparisons, you can cast PROTO to STRING. Note that this will do lexicographical ordering for numeric fields.
Name | Description |
---|---|
STRING |
Variable-length character (Unicode) data. |
Input STRING values must be UTF-8 encoded and output STRING values will be UTF-8 encoded. Alternate encodings like CESU-8 and Modified UTF-8 are not treated as valid UTF-8.
All functions and operators that act on STRING values operate on Unicode
characters rather than bytes. For example, functions like SUBSTR
and LENGTH
applied to STRING input count the number of characters, not bytes.
Each Unicode character has a numeric value called a code point assigned to it. Lower code points are assigned to lower characters. When characters are compared, the code points determine which characters are less than or greater than other characters.
Most functions on STRING are also defined on BYTES. The BYTES version operates on raw bytes rather than Unicode characters. STRING and BYTES are separate types that cannot be used interchangeably. There is no implicit casting in either direction. Explicit casting between STRING and BYTES does UTF-8 encoding and decoding. Casting BYTES to STRING returns an error if the bytes are not valid UTF-8.
Name | Description |
---|---|
STRUCT |
Container of ordered fields each with a type (required) and field name (optional). |
STRUCT<T>
STRUCT types are declared using the angle brackets (<
and >
). The type of
the elements of a STRUCT can be arbitrarily complex.
Examples
Type Declaration | Meaning |
---|---|
STRUCT<INT64>
|
Simple STRUCT with a single unnamed 64-bit integer field. |
STRUCT<x STRUCT<y INT64, z INT64>>
|
A STRUCT with a nested STRUCT named x inside it. The STRUCT
x has two fields, y and z , both of which
are 64-bit integers. |
STRUCT<inner_array ARRAY<INT64>>
|
A STRUCT containing an ARRAY named inner_array that holds
64-bit integer elements. |
(expr1, expr2 [, ... ])
The output type is an anonymous STRUCT type with anonymous fields with types matching the types of the input expressions. There must be at least two expressions specified. Otherwise this syntax is indistinguishable from an expression wrapped with parentheses.
Examples
Syntax | Output Type | Notes |
---|---|---|
(x, x+y) |
STRUCT<?,?> |
If column names are used (unquoted strings), the STRUCT field data type is
derived from the column data type. x and y are
columns, so the data types of the STRUCT fields are derived from the column
types and the output type of the addition operator. |
This syntax can also be used with STRUCT comparison for comparison expressions
using multi-part keys, e.g. in a WHERE
clause:
WHERE (Key1,Key2) IN ( (12,34), (56,78) )
STRUCT( expr1 [AS field_name] [, ... ])
Duplicate field names are allowed. Fields without names are considered anonymous
fields and cannot be referenced by name. STRUCT values can be NULL
, or can
have NULL
field values.
Examples
Syntax | Output Type |
---|---|
STRUCT(1,2,3) |
STRUCT<int64,int64,int64> |
STRUCT() |
STRUCT<> |
STRUCT('abc') |
STRUCT<string> |
STRUCT(1, t.str_col) |
STRUCT<int64, str_col string> |
STRUCT(1 AS a, 'abc' AS b) |
STRUCT<a int64, b string> |
STRUCT(str_col AS abc) |
STRUCT<abc string> |
STRUCT<[field_name] field_type, ...>( expr1 [, ... ])
Typed syntax allows constructing STRUCTs with an explicit STRUCT data type. The
output type is exactly the field_type
provided. The input expression is
coerced to field_type
if the two types are not the same, and an error is
produced if the types are not compatible. AS alias
is not allowed on the input
expressions. The number of expressions must match the number of fields in the
type, and the expression types must be coercible or literal-coercible to the
field types.
Examples
Syntax | Output Type |
---|---|
STRUCT<int64>(5) |
STRUCT<int64> |
STRUCT<date>("2011-05-05") |
STRUCT<date> |
STRUCT<x int64, y string>(1, t.str_col) |
STRUCT<x int64, y string> |
STRUCT<int64>(int_col) |
STRUCT<int64> |
STRUCT<x int64>(5 AS x) |
Error - Typed syntax does not allow AS |
STRUCTs can be directly compared using equality operators:
- Equal (
=
) - Not Equal (
!=
or<>
) - [
NOT
]IN
Notice, though, that these direct equality comparisons compare the fields of the STRUCT pairwise in ordinal order ignoring any field names. If instead you want to compare identically named fields of a STRUCT, you can compare the individual fields directly.
<td>
00:00:00 to 23:59:59.999999999<br/>
<hr/>
00:00:00 to 23:59:59.999999<br/>
</td>
Name | Range |
---|---|
TIME |
A TIME object represents a time, as might be displayed on a watch, independent of a specific date and timezone. The range of subsecond precision is determined by the SQL engine. To represent an absolute point in time, use a timestamp.
[H]H:[M]M:[S]S[.DDDDDD|.DDDDDDDDD]
[H]H
: One or two digit hour (valid values from 00 to 23)[M]M
: One or two digit minutes (valid values from 00 to 59)[S]S
: One or two digit seconds (valid values from 00 to 59)
<li><code>[.DDDDDDDDD|.DDDDDD]</code>: Up to six or nine fractional digits (microsecond or nanosecond precision)</li>
<td>
0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999999 UTC<br/>
<hr/>
0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999 UTC<br/>
</td>
Name | Range |
---|---|
TIMESTAMP |
A TIMESTAMP object represents an absolute point in time, independent of any time zone or convention such as Daylight Savings Time with microsecond or nanosecond precision. The range of subsecond precision is determined by the SQL engine.
- To represent a date as it might appear on a calendar, use a DATE object.
- To represent a time, as it might appear on a clock, use a TIME object.
- To represent a date and time, as they might appear on a calendar and clock, use a DATETIME object.
YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDD|.DDDDDDDDD]][time zone]
YYYY
: Four-digit year[M]M
: One or two digit month[D]D
: One or two digit day( |T)
: A space or a `T` separator[H]H
: One or two digit hour (valid values from 00 to 23)[M]M
: One or two digit minutes (valid values from 00 to 59)[S]S
: One or two digit seconds (valid values from 00 to 59)
<li><code>[.DDDDDDDDD|.DDDDDD]</code>: Up to six or nine fractional digits (microsecond or nanosecond precision)</li>
<li><code>[time zone]</code>: String representing the time zone.
When a time zone is not explicitly specified, the
default time zone, which is implementation defined, is used.
See the <a href="#time_zones">time zones</a> section for details.
Time zones are used when parsing timestamps or formatting timestamps for display. The timestamp value itself does not store a specific time zone, nor does it change when you apply a time zone offset.
Time zones are represented by strings in one of these two canonical formats:
- Offset from Coordinated Universal Time (UTC), or the letter
Z
for UTC - Time zone name from the tz database{: class=external target=_blank }
(+|-)H[H][:M[M]]
Z
Examples
-08:00
-8:15
+3:00
+07:30
-7
Z
When using this format, no space is allowed between the time zone and the rest of the timestamp.
2014-09-27 12:30:00.45-8:00
2014-09-27T12:30:00.45Z
continent/[region/]city
Time zone names are from the tz database{: class=external target=_blank }. For a less comprehensive but simpler reference, see the List of tz database time zones{: class=external target=_blank } on Wikipedia.
Examples
America/Los_Angeles
America/Argentina/Buenos_Aires
When using a time zone name, a space is required between the name and the rest of the timestamp:
2014-09-27 12:30:00.45 America/Los_Angeles
Note that not all time zone names are interchangeable even if they do happen to
report the same time during a given part of the year. For example,
America/Los_Angeles
reports the same time as UTC-7:00
during Daylight
Savings Time, but reports the same time as UTC-8:00
outside of Daylight
Savings Time.
If a time zone is not specified, the default time zone value is used.
A timestamp is simply an offset from 1970-01-01 00:00:00 UTC, assuming there are exactly 60 seconds per minute. Leap seconds are not represented as part of a stored timestamp.
If the input contains values that use ":60" in the seconds field to represent a leap second, that leap second is not preserved when converting to a timestamp value. Instead that value is interpreted as a timestamp with ":00" in the seconds field of the following minute.
Leap seconds do not affect timestamp computations. All timestamp computations are done using Unix-style timestamps, which do not reflect leap seconds. Leap seconds are only observable through functions that measure real-world time. In these functions, it is possible for a timestamp second to be skipped or repeated when there is a leap second.