description |
---|
Learn how to query Pinot using SQL |
Pinot provides a SQL interface for querying, which uses the Calcite SQL parser to parse queries and the MYSQL_ANSI dialect. For details on the syntax, see the the Calcite documentation. To find supported SQL operators, see Class SqlLibraryOperators.
In Pinot 1.0, the multi-stage query engine supports inner join, left-outer, semi-join, and nested queries out of the box. It's optimized for in-memory process and latency. For more information, see how to enable and use the multi-stage query engine.
Pinot also supports using simple Data Definition Language (DDL) to insert data into a table from file directly. For details, see programmatically access the multi-stage query engine. More DDL supports will be added in the future. But for now, the most common way for data definition is using the Controller Admin API.
{% hint style="info" %} Note: For queries that require a large amount of data shuffling, require spill-to-disk, or are hitting any other limitations of the multi-stage query engine (v2), we still recommend using Presto. {% endhint %}
In Pinot SQL:
- Double quotes(") are used to force string identifiers, e.g. column names
- Single quotes(') are used to enclose string literals. If the string literal also contains a single quote, escape this with a single quote e.g
'''Pinot'''
to match the string literal'Pinot'
Misusing those might cause unexpected query results, like the following examples:
WHERE a='b'
means the predicate on the columna
equals to a string literal value'b'
WHERE a="b"
means the predicate on the columna
equals to the value of the columnb
If your column names use reserved keywords (e.g. timestamp
or date
) or special characters, you will need to use double quotes when referring to them in queries.
Note: Define decimal literals within quotes to preserve precision.
//default to limit 10
SELECT *
FROM myTable
SELECT *
FROM myTable
LIMIT 100
SELECT "date", "timestamp"
FROM myTable
SELECT COUNT(*), MAX(foo), SUM(bar)
FROM myTable
SELECT MIN(foo), MAX(foo), SUM(foo), AVG(foo), bar, baz
FROM myTable
GROUP BY bar, baz
LIMIT 50
SELECT MIN(foo), MAX(foo), SUM(foo), AVG(foo), bar, baz
FROM myTable
GROUP BY bar, baz
ORDER BY bar, MAX(foo) DESC
LIMIT 50
SELECT COUNT(*)
FROM myTable
WHERE foo = 'foo'
AND bar BETWEEN 1 AND 20
OR (baz < 42 AND quux IN ('hello', 'goodbye') AND quuux NOT IN (42, 69))
For performant filtering of IDs in a list, see Filtering with IdSet.
SELECT COUNT(*)
FROM myTable
WHERE foo IS NOT NULL
AND foo = 'foo'
AND bar BETWEEN 1 AND 20
OR (baz < 42 AND quux IN ('hello', 'goodbye') AND quuux NOT IN (42, 69))
SELECT *
FROM myTable
WHERE quux < 5
LIMIT 50
SELECT foo, bar
FROM myTable
WHERE baz > 20
ORDER BY bar DESC
LIMIT 100
Note that results might not be consistent if the ORDER BY
column has the same value in multiple rows.
SELECT foo, bar
FROM myTable
WHERE baz > 20
ORDER BY bar DESC
LIMIT 50, 100
The example below counts rows where the column airlineName
starts with U
:
SELECT COUNT(*)
FROM myTable
WHERE REGEXP_LIKE(airlineName, '^U.*')
GROUP BY airlineName LIMIT 10
Pinot supports the CASE-WHEN-ELSE
statement, as shown in the following two examples:
SELECT
CASE
WHEN price > 30 THEN 3
WHEN price > 20 THEN 2
WHEN price > 10 THEN 1
ELSE 0
END AS price_category
FROM myTable
SELECT
SUM(
CASE
WHEN price > 30 THEN 30
WHEN price > 20 THEN 20
WHEN price > 10 THEN 10
ELSE 0
END) AS total_cost
FROM myTable
Pinot doesn't currently support injecting functions. Functions have to be implemented within Pinot, as shown below:
SELECT COUNT(*)
FROM myTable
GROUP BY DATETIMECONVERT(timeColumnName, '1:MILLISECONDS:EPOCH', '1:HOURS:EPOCH', '1:HOURS')
For more examples, see Transform Function in Aggregation Grouping.
Pinot supports queries on BYTES column using hex strings. The query response also uses hex strings to represent bytes values.
The query below fetches all the rows for a given UID:
SELECT *
FROM myTable
WHERE UID = 'c8b3bce0b378fc5ce8067fc271a34892'