- SQL for Beginners
- DB Fiddle
- Dataset
- SQL Order of Operations
- Part I: More SQL Syntax
- Set Operations (UNION, INTERSECT, EXCEPT)
- IF
- CASE
- COALESCE
- Practice 1
- ROLLUP
- Practice 2
- REPLACE
- SIMILAR TO Regular Expressions
- FILTER
- EXTRACT
- TO_CHAR
- Working with time zones
- WINDOW FUNCTIONS
- PERCENTILE
- Practice 3
- SUBQUERIES
- CTE (Common Table Expression)
- Scalar Subquery
- Practice 4
- LATERAL JOIN
- Nested SELECT
- CROSS JOIN
- SELF JOIN
- CREATING COMBINATIONS OF TWO VARIABLES
- USING and NATURAL
- Checking for values in ARRAYS
- UNNEST
- CREATE TABLES
- INSERTING DATA
- DELETE
- UPDATE
- ON CONFLICT DO UPDATE (NOTHING)
- COPY TABLE SCHEMA TO A NEW TABLE
- COPY DATA INTO ANOTHER TABLE
- FOREIGN TABLES
- FUNCTIONS
- EXPLAIN
- Miscellaneous Commands
- PostgreSQL
- Show Postgres Version
- List all public tables in database
- List table schema
- Get Table Size
- Get Database Size
- Get TOAST information
- Create materialized view
- List materialized views
- Create Role and User for Read Only
- Create Role and User for Read Write Privileges
- Revoking All Privileges from User
- Grant privileges for certain columns
- Privileges
- Rollback Transaction
- Revoking Roles from Users
- Change Table Owner
- Setting up Trigger Functions
- Create Index
- List Indexes in Database
- List schemas in Database
- Toast (The Oversized-Attribute Storage Technique)
- Changing Autovacuum Settings (Table Storage Parameters)
- Reset Autovacuum Settings (Table Storage Parameters)
- Types of SQL Commands
- PSQL
My Intro to SQL workshop can be found here as a Jupyter Notebook slide deck. This workshop builds off of that material.
- DB Fiddle is an online environment testing SQL code
- DB Fiddle is like w3schools but has more functionality
- You can create tables and query the tables in the browser
- In the left-hand window, you create the schema (columns and data types) and the data for a table
- In the right-hand window, you query the tables
- Link to db-fiddle: https://www.db-fiddle.com/
- Switch the current database to
MySQL v8.0
(which is a common database in the wild) in the top-left corner - Switch the current database to
Postgres v10.0
(for Mayniacs) in the top-left corner
- We’ll be using Michigan COVID-19 public data. I filtered it to include records since the re-opening.
- You can find it here at this gist https://gist.github.com/caocscar/b9a1418e5fd9c2cd69bb6f9d67fbc05a
- Click on the
Raw
button in the top-right corner. This will bring up a new page. - Copy the entire contents (21472 Records + Header Row)
- In the DB Fiddle window, click the
Text to DDL
button in lower-left - Give it a table name of
Covid
- Paste the contents into the
Formatted Text
window - Click
Append to Schema
SQL queries are not executed sequentially when you write it. It is executed in the following order
- FROM / JOIN
- WHERE
- GROUP BY
- aggregate functions
- HAVING
- window functions
- SELECT
- DISTINCT
- set operations (UNION, INTERSECT, EXCEPT)
- ORDER BY
- OFFSET
- LIMIT
These three set operations cover the union (UNION
), intersection (INTERSECT
) and difference (EXCEPT
) between two sets (i.e. queries). Recall, from part I, that the queries must have the same number of columns and data types. Duplicates are removed unless ALL
is used too.
MySQL
Similar to the Microsoft Excel functionality for IFIF(expression, true expression, false expression)
SELECT County,
IF (LENGTH(County) < 8, 'short', 'long') AS StringLength
FROM Covid
Note: PostgreSQL does not support IF
statement in SELECT
Used to convert a continuous variable into a categorical or ordinal variable. It is equivalent to if else if statements or switch case statements in programming.
SELECT County, Day, Cases,
CASE
WHEN Cases = 0 THEN 'Good'
WHEN Cases <= 5 THEN 'Not Bad'
WHEN Cases <= 20 THEN 'Opposite of good'
ELSE 'Stay at home'
END AS Outlook
FROM Covid
WHERE County = 'Washtenaw' AND CP = 'Confirmed'
Used to return the first non-null value in a list. We can use it to return a default value instead of null
in the query. Here we return zero instead of null
.
SELECT COALESCE(SUM(Deaths), 0)
FROM Covid
WHERE County = 'Keweenaw'
Create a variable called deathIndex that takes on the following values:
- -1 if deathIndex is 0
- 0 if deathIndex is equal to 1
- log(N) if deathIndex is greater than 1 (where N = Deaths)
MySQL
Suppose you want to add a total row to the bottom of a table. Based on the intro class, one way to do this is using UNION
.
SELECT CP, SUM(Cases) AS CaseTotal
FROM Covid
WHERE County = 'Washtenaw'
GROUP BY CP
UNION
SELECT NULL, SUM(Cases) AS CaseTotal
FROM Covid
WHERE County = 'Washtenaw'
A better way to code this and easier to understand is using the WITH ROLLUP
syntax
SELECT CP, SUM(Cases) AS CaseTotal
FROM Covid
WHERE County = 'Washtenaw'
GROUP BY CP WITH ROLLUP
If you want to substitute meaningful labels instead of null, use the GROUPING
function with the IF
function
returns 1 when null occurs in a super-aggregate row, otherwise 0
SELECT IF(GROUPING(CP), 'Total', CP),
SUM(Cases) AS CaseTotal
FROM Covid
WHERE County = 'Washtenaw'
GROUP BY CP WITH ROLLUP
PostgreSQL
Suppose you want to add a total row to the bottom of a table. Based on the intro class, one way to do this is using UNION
.
SELECT CP, SUM(Cases) AS CaseTotal
FROM Covid
WHERE County = 'Washtenaw'
GROUP BY CP
UNION
SELECT 'Total', SUM(Cases) AS CaseTotal
FROM Covid
WHERE County = 'Washtenaw'
ORDER BY CP
A better way to code this and easier to understand is using the WITH ROLLUP
syntax
SELECT CP, SUM(Cases) AS CaseTotal
FROM Covid
WHERE County = 'Washtenaw'
GROUP BY ROLLUP(CP)
ORDER BY CP
If you want to substitute meaningful labels instead of null, use the GROUPING
function with the IF
function
returns 1 when null occurs in a super-aggregate row, otherwise 0
SELECT
CASE
WHEN GROUPING(CP) = 0 THEN CP
ELSE 'Total'
END AS CP,
SUM(Cases) AS CaseTotal
FROM Covid
WHERE County = 'Washtenaw'
GROUP BY ROLLUP(CP)
ORDER BY CP
Create a table that shows the total confirmed + probable deaths for each county. Include a row that has the total for Michigan. Only include counties that have a death.
Expand the table above by CP status. What is different in the table from the result above?
To replace values within a column:
REPLACE(column, substring_to_replace, replacement_substring)
MySQL
SELECT REPLACE(County, "Washtenaw", "WTW")
FROM Covid
PostgreSQL
UPDATE
Covid
SET
County = REPLACE(County, 'Washtenaw', 'WTW');
SELECT * FROM Covid;
Use SIMILAR TO
syntax for SQL regular expressions.
SELECT *
FROM drivers
WHERE driver SIMILAR TO '%([1-9])'
Reference: https://www.postgresql.org/docs/10/functions-matching.html
If we wanted to bin and count student grades, we could use a CTE with CASE
.
WITH cte AS (
SELECT *,
CASE
WHEN Marks BETWEEN 40 AND 60 THEN 'C'
WHEN Marks BETWEEN 60 AND 80 THEN 'B'
WHEN Marks BETWEEN 80 AND 100 THEN 'A'
ELSE 'F'
END AS grades
FROM students
)
SELECT grades, count(*) as ct
FROM cte
GROUP BY grades
Alternatively, we can use the FILTER
clause with COUNT
. I also included a scalar subquery as another alternative.
SELECT COUNT(*) AS total
,COUNT(*) FILTER (WHERE Marks BETWEEN 40 AND 59) AS C
,COUNT(1) FILTER (WHERE Marks BETWEEN 60 AND 79) AS B
,(SELECT COUNT(*) FROM students WHERE Marks BETWEEN 80 AND 100) AS A
FROM students;
This function is useful from converting data from long to wide format. Assume you have long dataset with four columns: date, station, event_type, riders
. The example shows how to do it with both FILTER
and CASE
. Fill in nulls using COALESCE
.
SELECT date
,station
,SUM(riders) FILTER (WHERE event_type = 'pickup') AS pickup
,SUM(riders) FILTER (WHERE event_type = 'dropoff') AS dropoff
,SUM(CASE WHEN event_type = 'pickup' THEN riders END) AS pickup
,SUM(CASE WHEN event_type = 'dropoff' THEN riders END) AS dropoff
FROM cte
GROUP BY date, station
ORDER BY date DESC, station
Retrieves a datetime field (e.g. year, day) from a timestamp
, time
, or interval
datatype. The example shows the difference in seconds between two dates.
SELECT EXTRACT(EPOCH FROM TIMESTAMP '2021-04-17 15:45:14' - TIMESTAMP '2021-04-16 15:45:14')
SELECT EXTRACT(EPOCH FROM '2021-04-26 08:55:15'::timestamp - '2021-04-16 15:45:14'::timestamp)
Reference: https://www.postgresql.org/docs/10/functions-datetime.html
Format timestamp
as a string
SELECT TO_CHAR('2021-04-26 18:34:56' AT TIME ZONE 'America/Chicago', 'YYYY-MM-DD HH24:MI:SS')
Cast timestamp with time zone
to local time timestamp without time zone
SELECT utcz_time AT TIME ZONE 'America/Detroit' AS local_time
To convert timestamp without time zone
to timestamp with time zone
(back to UTC)
SELECT local_time AT TIME ZONE 'America/Detroit'
Cast timestamp with time zone
to timestamp without time zone
without changing the time value
SELECT utcz_time::timestamp
A window function performs an aggregate-like operation on a set of query rows. However, whereas an aggregate operation groups query rows into a single result row, a window function produces a result for each row.
SELECT *,
SUM(Cases) OVER() AS Total
FROM Covid
WHERE County = 'Genesee'
Note: MySQL requires single quotes around the alias like so SUM(Cases) OVER() AS 'Total'
You can use aggregate functions as window functions. They require the OVER
clause to specify whether it is a window function or not. Some examples we saw in the Intro Class are:
SUM()
COUNT()
MIN()
MAX()
AVG()
You can also use non-aggregate functions that are used as window functions. They always require the OVER
clause.
CUME_DIST()
DENSE_RANK()
FIRST_VALUE()
LAG()
LAST_VALUE()
LEAD()
NTH_VALUE()
NTILE()
PERCENT_RANK()
RANK()
ROW_NUMBER()
If you want to rank rows, use the RANK() OVER
function
We use ORDER BY
to determine the rank order
SELECT County,
Cases,
RANK() OVER (ORDER BY Cases) AS Rank
FROM Covid
WHERE Day IN ('2020-06-01','2020-09-25') AND CP = 'Confirmed'
Note: MySQL requires single quotes around the alias like so RANK() OVER (ORDER BY Cases) AS 'Rank'
Notice ties have the same rank. If you don't want the ranking to skip any numbers, use DENSE_RANK
instead
SELECT County,
Cases,
DENSE_RANK() OVER (ORDER BY Cases DESC) AS Rank
FROM Covid
WHERE Day IN ('2020-06-01','2020-09-25') AND CP = 'Confirmed'
Alternatively use the WINDOW
syntax
Use case: when you use the same window more than once
SELECT County,
Cases,
RANK() OVER w AS Rank,
DENSE_RANK() OVER w AS Rank2
FROM Covid
WHERE Day IN ('2020-06-01','2020-09-25') AND CP = 'Confirmed'
WINDOW w AS (ORDER BY Cases DESC)
Works like GROUP BY
but with the OVER
clause
If we wanted to have the ranking be within a single day, we can use PARTITION BY
SELECT Day,
County,
Cases,
RANK() OVER (PARTITION BY Day ORDER BY Cases) AS Rank
FROM Covid
WHERE Day IN ('2020-06-01','2020-09-25') AND CP = 'Confirmed'
Suppose we wanted to get the most recent number of daily cases per county. We could filter by date and show the cases for the most recent day but what if some counties did not report cases the day. We would want the most recent data they had. We could use the LAST_VALUE
window function in conjunction with RANGE BETWEEN
to do so. The latter function defines the frame clause in the partition. Here it goes from the first row to the last row because of the UNBOUNDED
syntax.
With cte AS (
SELECT LAST_VALUE(Cases) OVER (PARTITION BY County ORDER BY Day ASC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS recent_cases
FROM Covid
)
Here the frame_clause is RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
. It usually takes the form of
{ RANGE | ROWS } BETWEEN frame_start AND frame_end
OR
{ RANGE | ROWS } frame_start
The default frame clause is RANGE UNBOUNDED PRECEDING
, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
If you wanted to create a window that included the past 5 entries for a 5-day moving average, for example, your frame clause would be ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
. Likewise, if you wanted to create a window for a median filter of size 7, your syntax would be
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
Reference: https://www.postgresql.org/docs/10/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS
continuous percentile: returns a value corresponding to the specified fraction in the ordering, interpolating between adjacent input items if needed
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY deaths)
FROM covid
SELECT percentile_cont(ARRAY[0.25, 0.5, 0.75, 0.95]) WITHIN GROUP (ORDER BY deaths)
FROM covid
Reference: https://www.postgresql.org/docs/10/functions-aggregate.html
Rank county confirmed cases for each day in the last week (Sep 24-30th). Only rank counties starting with S. (i.e for each day, rank the “S” counties). Spell out the word “St” where applicable.
Create a variable that has the number of confirmed cases from a week ago for Wayne county. Show the most recent dates on top (Hint: Use the LAG()
function).
In the intro workshop, we saw how subqueries worked. For example,
SELECT *, RANK() OVER (ORDER BY DeathTotal DESC) as Rank
FROM (
SELECT Day, SUM(Deaths) AS DeathTotal
FROM Covid
WHERE county = 'Wayne'
GROUP BY Day
) AS Alex
The WITH
clause allows you separate your subqueries to make your code more modular and easier to read. It is also known as a Common Table Expression (CTE) or subquery factoring. It is a temporary result set that you can reference within another SELECT, INSERT, UPDATE, or DELETE statement
WITH cte AS (
SELECT Day, SUM(Deaths) AS DeathTotal
FROM Covid
WHERE county = 'Wayne'
GROUP BY Day
)
SELECT *, RANK() OVER (ORDER BY DeathTotal DESC) as Rank
FROM cte
Use a comma and omit the subsequent WITH
keyword for multiple CTEs.
WITH cte1 AS (
...
),
cte2 AS (
...
)
SELECT *
FROM cte1
JOIN cte2
ON cte1.pk = cte2.pk
A scalar subquery is an ordinary SELECT
query in parentheses that returns exactly one row with one column.
SELECT name
,(SELECT max(pop) FROM cities WHERE cities.state = states.name)
FROM states;
Return a table of Michigan cases (confirmed and probable) showing the max daily numbers for each calendar week (Hint: Use the WEEK()
function and/or the EXTRACT()
function).
This dataset
Week,Day,Cases
1,Sun,10
1,Mon,20
2,Sun,30
2,Mon,40
Should be transformed into this
Week,Max Daily Cases
1,20
2,40
Lateral
joins allow you to reference table columns in the preceding left table of the join. They serve as an alternative to using CTEs. Looks like window functions do not work with lateral joins since they require more than one row to operate properly.
SELECT orders.orderid
,statuslow
,statushigh
FROM orders
JOIN LATERAL (
SELECT orderlinestatus.name AS statuslow
,orderlines.orderid
FROM orderlines
JOIN orderlinestatus USING (orderlinestatusid)
WHERE orderlines.orderid = orders.orderid
GROUP BY orderlines.orderid, orderlinestatus.name, orderlinestatus.weight
ORDER BY orderlinestatus.weight LIMIT 1
) AS subquery_statuslow ON (subquery_statuslow.orderid=orders.orderid),
JOIN LATERAL (
SELECT orderlinestatus.name AS statushigh
,orderlines.orderid
FROM orderlines
JOIN orderlinestatus USING (orderlinestatusid)
WHERE orderlines.orderid = orders.orderid
GROUP BY orderlines.orderid, orderlinestatus.name, orderlinestatus.weight
ORDER BY orderlinestatus.weight DESC LIMIT 1
) AS subquery_statushigh ON (subquery_statushigh.orderid=orders.orderid)
Reference: https://www.postgresql.org/docs/10/queries-table-expressions.html https://www.davici.nl/blog/postgresql-lateral-join
SELECT
salesperson.name,
-- find maximum sale size for this salesperson
(SELECT MAX(amount) AS amount
FROM all_sales
WHERE all_sales.salesperson_id = salesperson.id)
AS amount,
-- find customer for this maximum size
(SELECT customer_name
FROM all_sales
WHERE all_sales.salesperson_id = salesperson.id
AND all_sales.amount =
-- find maximum size, again
(SELECT MAX(amount) AS amount
FROM all_sales
WHERE all_sales.salesperson_id = salesperson.id))
AS customer_name
FROM
salesperson;
Reference: https://dev.mysql.com/doc/refman/8.0/en/lateral-derived-tables.html
Performs a cartesian product of the two tables (combination of the two table rows). The result will be rows(table1) * rows(table2)
SELECT source, nombre
FROM suits
CROSS JOIN locale
OR equivalently you can the comma notation
SELECT source, nombre
FROM suits, locale
Join a table with itself
SELECT a.name, b.name, a.state
FROM company a, company b
WHERE a.state=b.state AND a.name != b.name
Reference: https://www.w3resource.com/sql/joins/join-a-table-to-itself.php
Creating a combination of dates and hours example using the generate_series
function to automatically generate the range needed with some filtering
SELECT date_trunc('day', date_range)::date
,hour
FROM generate_series('2021-07-26'::timestamp, '2021-10-01'::timestamp, '1 day'::interval) AS date_range
CROSS JOIN (SELECT generate_series(7,19) AS hour) AS service_hours
WHERE EXTRACT(DOW FROM date_range) BETWEEN 1 AND 5
Here is a standard join statement
SELECT customers.*, orders.*
FROM customers
JOIN orders ON customers.custid = orders.custid
USING
with parentheses ()
can be used when the column names are identical in each table (explicit)
SELECT customers.*, orders.*
FROM customers
JOIN orders USING (custid)
NATURAL
can also be used when the column names are identical in each table (implicit)
SELECT customers.*, orders.*
FROM customers
NATURAL JOIN orders
Reference: https://www.postgresql.org/docs/10/queries-table-expressions.html
Checking for single value (exact match) against an ARRAY
type. Types must match.
SELECT *
FROM health
WHERE 'VELODYNE' = ANY(error_uid)
Checking for multiple values against an ARRAY
type. Types must match thus the casting syntax.
SELECT *
FROM health
WHERE ARRAY[7,14]::SMALLINT[] && error_uid
Reference: Array Functions and Operators
https://www.postgresql.org/docs/10/functions-array.html
Equivalent to pandas explode
method expanding an ARRAY
into a set of rows.
SELECT log_name
,unnest(edge_id) AS edgeid
,unnest(lat) AS lat
,unnest(lon) AS lon
,unnest(autonomy_count) AS auto
,unnest(healthy_count) AS healthy
FROM vehicle_autonomy_edges AS vae
OR if you want to add the array index number to the result
SELECT vae.log_name
,a.edgeid
,a.lat
,a.lon
,a.auto
,a.healthy
,a.nr
FROM vehicle_autonomy_edges AS vae
CROSS JOIN LATERAL UNNEST(vae.edge_id, vae.lat, vae.lon, vae.autonomy_count, vae.healthy_count) WITH ORDINALITY AS a(edgeid, lat, lon, auto, healthy, nr)
Reference: https://www.postgresql.org/docs/10/functions-array.html https://stackoverflow.com/questions/8760419/postgresql-unnest-with-element-number
Here's a simple example of how to create a table.
CREATE TABLE reanna (
id integer,
name text
)
You can, of course, be much fancier and add constraints and defaults to the mix.
Inserting with column names (if not targeting all columns)
INSERT INTO reanna (id, name)
VALUES (1, 'bella')
,(2, 'ciao')
OR if targeting all columns for insertion
INSERT INTO reanna
VALUES (3, 'au revoir')
,(4, 'my lady')
To delete existing rows from a table and return deleted rows
DELETE FROM covid
WHERE date < '2021-03-13'
RETURNING *
To update an existing table and return updated rows
UPDATE vehicle_logs
SET start_time::timestamp AT TIME ZONE 'America/Chicago'
WHERE RIGHT(log_name, 8) = TO_CHAR(start_time, 'HH24-MI-SS')
RETURNING *
The idea is that when you insert a new row into the table, PostgreSQL will update the row if it already exists, otherwise, it will insert the new row. The UPDATE
or INSERT
operation is a.k.a. UPSERT. This is equivalent to INSERT IGNORE INTO
in other SQL languages.
The examples specifies that if the date
is suppose to be constrained or unique in the table, then literally do nothing.
INSERT INTO Covid (date, Cases)
VALUES('2021-04-01', '6036')
ON CONFLICT (date)
DO NOTHING;
Alternatively, we can update the value by using the EXCLUDED
table. The current value is accessible by the table name (in this case Covid
).
INSERT INTO Covid (date, Cases, Deaths)
VALUES('2021-04-01', '6036', '2')
ON CONFLICT (date)
DO UPDATE
SET Cases = EXCLUDED.Cases || ' (' || Covid.Cases || ')',
Deaths = EXCLUDED.Deaths;
(date)
is considered the conflict target and performs all conflicts with usable constraints and unique indexes.
Reference: https://www.postgresql.org/docs/10/sql-insert.html
SELECT *
INTO brand_new_table
FROM current_table
LIMIT 0
INSERT INTO brand_new_table (col1, col3, col4)
SELECT col1
,col3
,col4
FROM current_table
ON CONFLICT (col1, col3) DO UPDATE
SET col4 = EXCLUDED.col4;
PostgreSQL allows you to access data that resides outside PostgreSQL using regular SQL queries. Such data is referred to as foreign data. This example takes data from the same table name in two different databases in the same RDS instance and creates an equivalent foreign table in the same local database instance. They can be treated as views.
-- create extension
CREATE EXTENSION postgres_fdw;
-- create server connection
CREATE SERVER server_ann_arbor
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host '<host_name>', port '5432', dbname 'mercury_ann_arbor');
-- create user mapping; options has credentials to connect to foreign database
CREATE USER MAPPING FOR local_user
SERVER server_ann_arbor
OPTIONS (user 'alex', password 'cao');
GRANT USAGE ON FOREIGN SERVER server_ann_arbor TO readaccess;
-- create foreign table schema
CREATE FOREIGN TABLE vehicle_log_statuses_ann_arbor (
log_name TEXT
,creation_time TIMESTAMP WITH TIME ZONE
,filtered_time TIMESTAMP WITH TIME ZONE
,extracted_time TIMESTAMP WITH TIME ZONE
,processed_time TIMESTAMP WITH TIME ZONE
,archived_time TIMESTAMP WITH TIME ZONE
)
SERVER server_ann_arbor
OPTIONS (table_name 'vehicle_log_statuses');
-- repeat steps above
CREATE SERVER server_arlington
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'mercury.us-east-2.rds.amazonaws.com', port '5432', dbname 'mercury_arlington');
CREATE USER MAPPING FOR may
SERVER server_arlington
OPTIONS (user 'alex', password 'cao');
GRANT USAGE ON FOREIGN SERVER server_arlington TO readaccess;
CREATE FOREIGN TABLE vehicle_log_statuses_arlington (
log_name TEXT
,creation_time TIMESTAMP WITH TIME ZONE
,filtered_time TIMESTAMP WITH TIME ZONE
,extracted_time TIMESTAMP WITH TIME ZONE
,processed_time TIMESTAMP WITH TIME ZONE
,archived_time TIMESTAMP WITH TIME ZONE
)
SERVER server_arlington
OPTIONS (table_name 'vehicle_log_statuses');
SELECT *
FROM vehicle_log_statuses_arlington
UNION
SELECT *
FROM vehicle_log_statuses_ann_arbor
Alternatively, you can simply import the tables without the need to manually create it using IMPORT FOREIGN SCHEMA
.
IMPORT FOREIGN SCHEMA public LIMIT TO (vehicle_log_statuses)
FROM SERVER server_arlington INTO public;
Reference: https://www.postgresql.org/docs/current/sql-createforeigntable.html
https://www.postgresql.org/docs/current/sql-createserver.html
https://www.postgresql.org/docs/current/sql-createusermapping.html
You can write functions like in other languages. Here's a basic function that executes three commands in succession. Function takes no arguments.
CREATE OR REPLACE FUNCTION via_requests_from_to_raw_to_final() RETURNS text AS $$
BEGIN
-- 1st cmd
DELETE FROM alex_table;
-- 2nd cmd
INSERT INTO alex_table
SELECT RIDER_ID
,REQUEST_ID
FROM upstream_table;
--3rd cmd
RETURN 'Finished transformation';
END
$$ LANGUAGE plpgsql;
Reference: https://www.postgresql.org/docs/10/sql-createfunction.html
Profiling queries for bottlenecks and optimization
EXPLAIN SELECT COUNT(*)
FROM Covid
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) SELECT COUNT(*)
FROM Covid
Reference: https://www.postgresql.org/docs/10/sql-explain.html
SELECT VERSION()
SELECT *
FROM information_schema.tables
WHERE table_schema = 'public'
ORDER BY table_name;
Other table_schema
types are information_schema
and pg_catalog
.
Reference: https://www.postgresql.org/docs/10/infoschema-tables.html
Alternatively,
SELECT *
FROM pg_catalog.pg_tables
WHERE schemaname != 'pg_catalog' AND schemaname != 'information_schema';
Reference: https://www.postgresql.org/docs/10/view-pg-tables.html
SELECT *
FROM information_schema.columns
WHERE table_name = 'vehicle_logs';
Reference: https://www.postgresql.org/docs/10/infoschema-columns.html
Query the total size of each table (including indexes and toast tables) in the current database (use pg_relation_size
to exclude indexes (pg_indexes_size
) and toast tables)
SELECT relname AS relation
,pg_size_pretty(pg_total_relation_size(C.oid)) AS total_size
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size (C.oid) DESC
Reference: https://www.postgresql.org/docs/10/functions-admin.html https://stackoverflow.com/questions/41991380/whats-the-difference-between-pg-table-size-pg-relation-size-pg-total-relatio
Query the size of each database in the current database server
SELECT pg_database.datname
,pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database;
Reference: https://www.postgresqltutorial.com/postgresql-database-indexes-table-size/
Query to find out all tables with TOAST tables
SELECT oid
,oid::regclass
,reltoastrelid
,reltoastrelid::regclass
,pg_size_pretty(pg_relation_size(reltoastrelid)) AS toast_size
FROM pg_class
WHERE relkind = 'r'
AND reltoastrelid <> 0
ORDER BY toast_size DESC;
Reference:
- https://www.postgresql.org/docs/10.0/catalog-pg-class.html
- https://dba.stackexchange.com/questions/264691/understanding-where-pg-toast-is-coming-from
- https://stackoverflow.com/questions/41991380/whats-the-difference-between-pg-table-size-pg-relation-size-pg-total-relatio
CREATE MATERIALIZED VIEW matview AS (
SELECT *
FROM table
)
SELECT *
FROM pg_matviews
Here is some syntax to create a new role, granting privileges and a new user
-- Create a new role with privileges
CREATE ROLE readaccess LOGIN;
-- Grant on existing objects
GRANT CONNECT ON DATABASE the_db TO readaccess;
GRANT USAGE ON SCHEMA public TO readaccess;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readaccess;
GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO readaccess;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO readaccess;
-- Grant for new objects
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO readaccess;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON SEQUENCES TO readaccess;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT EXECUTE ON FUNCTIONS TO readaccess;
-- Create a user with password and grant existing role to user
CREATE USER alex WITH LOGIN PASSWORD 'cao';
GRANT readaccess TO alex;
References:
https://www.postgresql.org/docs/10/sql-alterdefaultprivileges.html
https://wiki.postgresql.org/images/d/d1/Managing_rights_in_postgresql.pdf
Here is some syntax to create a new role, granting privileges and a new user
-- Start transaction block
BEGIN;
-- Create a new role with privileges
CREATE ROLE readwriteaccess;
-- Grant on existing objects
GRANT CREATE, CONNECT ON DATABASE mercury_hiroshima TO readwriteaccess;
GRANT CREATE ON SCHEMA public TO readwriteaccess;
GRANT USAGE ON SCHEMA public TO readwriteaccess;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO readwriteaccess;
GRANT SELECT, UPDATE ON ALL SEQUENCES IN SCHEMA public TO readwriteaccess;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO readwriteaccess;
-- Grant for new objects
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO readwriteaccess;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT, UPDATE ON SEQUENCES TO readwriteaccess;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT EXECUTE ON FUNCTIONS TO readwriteaccess;
-- End transaction block
COMMIT;
REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM <role>;
CREATE ROLE certain_column_access;
GRANT SELECT|INSERT|UPDATE|DELETE (colA, colB, colC) ON <table> TO certain_column_access;
ACL Privilege abbereviations.
Privilege | Abbreviation | Applicable Object Types |
---|---|---|
SELECT | r (“read”) | LARGE OBJECT, SEQUENCE, TABLE (and table-like objects), table column |
INSERT | a (“append”) | TABLE, table column |
UPDATE | w (“write”) | LARGE OBJECT, SEQUENCE, TABLE, table column |
DELETE | d | TABLE |
TRUNCATE | D | TABLE |
REFERENCES | x | TABLE, table column |
TRIGGER | t | TABLE |
The command GRANT ALL PRIVILEGES ON TABLE <tablename> TO readwriteaccess
grants the above 7 privileges.
https://www.postgresql.org/docs/12/ddl-priv.html
To rollback an unsuccessful transaction OR when you see this message ERROR: current transaction is aborted, commands ignored until end of transaction block
.
ROLLBACK TRANSACTION;
REVOKE readwriteaccess FROM alex;
The following sql code generates more sql code to alter table owners
SELECT 'ALTER TABLE ' || tablename || ' OWNER TO new_owner;'
FROM pg_tables
WHERE tableowner = 'old_owner';
Then you just execute the newly generated code.
Assuming you already have a trigger function defined, you can set it up with this command (command differs slightly for v10: PROCEDURE
instead of FUNCTION
).
You can trigger on INSERT, UPDATE, DELETE
etc...
CREATE TRIGGER <trigger_name>
AFTER INSERT ON <table_name>
FOR EACH ROW EXECUTE PROCEDURE <trigger_function()>;
To disable an existing trigger functon on a table
ALTER TABLE <table_name>
DISABLE TRIGGER <trigger_name>
To drop an existing trigger function
DROP TRIGGER <trigger_name> ON <table_name>;
CREATE INDEX name_idx ON table (column);
SELECT *
FROM pg_indexes
WHERE schemaname = 'public'
Reference: https://www.postgresql.org/docs/10/view-pg-indexes.html
SELECT schema_name
FROM information_schema.schemata;
PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows. The technique is affectionately known as TOAST. The TOAST infrastructure is also used to improve handling of large data values in-memory.
Reference: https://www.postgresql.org/docs/10/storage-toast.html
ALTER TABLE vehicle_dash_image
SET (
autovacuum_vacuum_threshold=120
,autovacuum_vacuum_cost_delay=20
)
ALTER TABLE vehicle_dash_image
RESET (
autovacuum_vacuum_threshold
,autovacuum_vacuum_cost_delay
)
References:
- https://www.postgresql.org/docs/10/runtime-config-autovacuum.html
- https://www.postgresql.org/docs/10/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-VACUUM-COST
- DDL - Data Definition Language
- DQL - Data Query Language
- DML - Data Manipulation Language
- DCL - Data Control Language
- TCL - Transaction Control Language
Consists of the SQL commands that can be used to define the database schema. It simply deals with descriptions of the database schema and is used to create and modify the structure of database objects in the database.
- CREATE
- ALTER
- DROP
- RENAME
- TRUNCATE
- COMMENT
DQL statements are used for performing queries on the data within schema objects. The purpose of DQL Command is to get some schema relation based on the query passed to it.
- SELECT
The SQL commands that deals with the manipulation of data present in the database belong to DML or Data Manipulation Language and this includes most of the SQL statements.
- INSERT
- UPDATE
- DELETE
- MERGE
- CALL
- EXPLAIN PLAN
- LOCK TABLE
DCL mainly deals with the rights, permissions and other controls of the database system.
- GRANT
- REVOKE
TCL commands deals with the transaction within the database.
- COMMIT
- ROLLBACK
- SAVEPOINT
- SET TRANSACTION
Reference: https://www.geeksforgeeks.org/sql-ddl-dql-dml-dcl-tcl-commands/
psql is a terminal-based front-end to PostgreSQL. You can use a local instance of the psql command line utility to connect to a PostgreSQL DB instance. You need either PostgreSQL or the psql client installed on your client computer.
psql --host=<hostname> --username=<user> --password --dbname=<database-name> --port=5432
# Alternatively
psql -h <hostname> -U <user> -W -d <database-name> --port=5432
Password prompt will appear after command is entered.
Command | Description |
---|---|
\l |
Show databases |
\c <database> |
Switch to different database closing previous connection |
\dt |
Show tables in database |
\dt+ |
Show tables in database with more info |
\d+ <tablename> |
Describe a table with details |
\du |
Show users and their roles |
\ddp |
Show default access privileges |
\dpp |
Show access privileges for tables, sequences, materialized views:qqq:q |
\g |
Execute previous command |
\s |
Command history |
\i <filename> |
Execute psql commands from filename |
\? |
Help |
\h <COMMAND> |
Specific Help |
\timing |
Turn on query execution time |
\q |
Quit psql |
REFERENCE: https://www.postgresqltutorial.com/psql-commands/
You can write multi-line queries directly from the prompt. Terminate lines with Enter. Terminate queries with a ;
.