prevent expensive queries to gtfs-api (PostgREST) #36

Merged, 4 commits, Nov 21, 2023
Changes from all commits
5 changes: 4 additions & 1 deletion .env
@@ -40,11 +40,14 @@ DAGSTER_POSTGRES_DB=postgres_db
IPL_LAMASSU_BASE_URL=http://lamassu:8080

# gtfs-db variables
-IPL_GTFS_DB_IMAGE=postgis/postgis:15-3.3-alpine
+IPL_GTFS_DB_IMAGE=ghcr.io/mobidata-bw/postgis-with-pg-plan-filter
IPL_GTFS_DB_POSTGRES_USER=postgres
IPL_GTFS_DB_POSTGRES_PASSWORD=password
IPL_GTFS_DB_POSTGRES_DB=gtfs_importer
IPL_GTFS_DB_POSTGRES_DB_PREFIX=gtfs
+# `postgrest` role used by the gtfs-api service (PostgREST)
+IPL_GTFS_DB_POSTGREST_USER=postgrest
+IPL_GTFS_DB_POSTGREST_PASSWORD=password

# pgbouncer variables
PGBOUNCER_IMAGE=bitnami/pgbouncer:1
2 changes: 2 additions & 0 deletions docker-compose.yml
@@ -481,6 +481,8 @@ services:
# path to the file containing the latest import's DB name
GTFS_IMPORTER_DSN_FILE: /var/gtfs/pgbouncer-dsn.txt
GTFS_TMP_DIR: /var/gtfs
+POSTGREST_USER: ${IPL_GTFS_DB_POSTGREST_USER}
+POSTGREST_PASSWORD: ${IPL_GTFS_DB_POSTGREST_PASSWORD}

gtfs-swagger-ui:
networks: [ipl]
44 changes: 44 additions & 0 deletions etc/gtfs/sql.d/40-postgrest-cost-limit.sql
@@ -0,0 +1,44 @@
CREATE EXTENSION IF NOT EXISTS plan_filter;

-- Make sure plan_filter is auto-loaded whenever connecting to the current DB.
-- Note: `current_database()` cannot be used in `ALTER DATABASE`, so we use `EXECUTE`.
-- https://dba.stackexchange.com/a/78381

CREATE FUNCTION _alter_db (db TEXT, param TEXT, val TEXT)
RETURNS void
AS $$
BEGIN
EXECUTE format('ALTER DATABASE %I SET %I = %L', db, param, val);
END
$$
LANGUAGE plpgsql;
-- effectively ALTER DATABASE current_database() SET session_preload_libraries = 'plan_filter';
SELECT _alter_db(current_database(), 'session_preload_libraries', 'plan_filter');
DROP FUNCTION _alter_db;

-- Each configuration parameter exists within a scope.
-- Configuration parameters like plan_filter.statement_cost_limit that exist in a *more secure* scope than `user` (e.g. `superuser` or `sighup`) *do not* get adapted when changing to a different role (`web_anon`) by impersonating it via `SET ROLE`!
-- > This is because according to the pg docs executing SET ROLE does not cause new configuration values to be set, this only happens at login time. I have an example of this here.
-- https://github.com/PostgREST/postgrest/issues/249#issuecomment-334629208
-- > The possible values of context are:
-- > - internal (called PGC_INTERNAL in source code)
-- > - postmaster (PGC_POSTMASTER)
-- > - sighup (PGC_SIGHUP)
-- > - backend (PGC_BACKEND)
-- > - superuser (PGC_SUSET)
-- > - user (PGC_USERSET)
-- > The above list is in order of when a parameter can be set; if a parameter can be changed in a certain context, then it can be changed at any of the earlier contexts as well.
-- https://www.enterprisedb.com/blog/understanding-postgres-parameter-context
-- > However if we want to set different statement_timeout for anon and web_user. […] These will not be picked up and their statement_timeout won't be enforced.
-- https://github.com/PostgREST/postgrest/issues/2561#issue-1436728326

-- However, we can still define a cost limit for operations on the *`postgrest` user* [1] (PostgREST uses it to connect to the DB, before switching to `web_anon`); PostgREST then re-applies it manually when it switches the role (`SET ROLE web_anon`) in each transaction [2].
-- Because that means that even PostgREST proper can't run very expensive queries (e.g. the schema introspection query at startup) anymore, we need to find a balance between not hindering PostgREST and preventing excessive queries by users. 1000000 is roughly equivalent to 0.2s-5s of execution time.
-- [1] https://postgrest.org/en/stable/references/auth.html#overview-of-role-system
-- [2] https://postgrest.org/en/stable/references/auth.html#impersonated-role-settings

-- This currently doesn't work as expected, though: PostgREST doesn't seem to re-apply `plan_filter.statement_cost_limit` when switching to `web_anon`. [3]
-- [3] https://github.com/PostgREST/postgrest/issues/3045

-- Limit postgrest's & web_anon's queries by the query planner's estimated cost.
ALTER ROLE postgrest SET plan_filter.statement_cost_limit = 1000000.0;
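The `format('%I'/'%L')` placeholders in the `_alter_db` helper above are what make the dynamic `ALTER DATABASE` safe. A minimal plain-Node sketch (helper names are hypothetical, not part of this PR) of the quoting those placeholders perform:

```javascript
// Sketch of PostgreSQL's format() placeholders as used in the SQL above:
// %I quotes an identifier (double quotes, internal ones doubled),
// %L quotes a literal (single quotes, internal ones doubled).
const quoteIdent = (s) => '"' + s.replaceAll('"', '""') + '"'
const quoteLiteral = (s) => "'" + s.replaceAll("'", "''") + "'"

const alterDbSql = (db, param, val) =>
  `ALTER DATABASE ${quoteIdent(db)} SET ${quoteIdent(param)} = ${quoteLiteral(val)}`

console.log(alterDbSql('gtfs_importer', 'session_preload_libraries', 'plan_filter'))
// → ALTER DATABASE "gtfs_importer" SET "session_preload_libraries" = 'plan_filter'
```

Because identifiers and literals are quoted separately, a database name containing quotes or spaces cannot break out of the statement, which is why the SQL uses `EXECUTE format(…)` instead of plain string concatenation.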
3 changes: 2 additions & 1 deletion gtfs-importer/Dockerfile
@@ -47,7 +47,8 @@ RUN \
&& chmod +x /usr/local/bin/gtfstidy

# todo: gtfs-via-postgres is Prosperity-dual-licensed, obtain a purely Apache-licensed version
-RUN npm install -g gtfs-via-postgres@4
+# todo: Docker layer caching won't pull the latest 4.x release
+RUN npm install -g gtfs-via-postgres@'^4.8.2'

ADD package.json ./
RUN npm install --omit dev && npm cache clean --force
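The pinned range matters because `@4` resolves once and is then frozen by Docker's layer cache, as the todo comment notes. As a rough illustration (hand-rolled, hypothetical helpers; npm itself uses the `semver` package, and this ignores pre-release tags and `0.x` majors), `'^4.8.2'` accepts any 4.x release at or above 4.8.2:

```javascript
// Hypothetical mini-check of npm's caret range semantics for majors >= 1:
// ^4.8.2 means >= 4.8.2 and < 5.0.0.
const parse = (v) => v.split('.').map(Number)
const cmp = (a, b) => {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1
  }
  return 0
}
const satisfiesCaret = (version, base) => {
  const v = parse(version)
  const b = parse(base)
  // at or above the base version, and same major
  return cmp(v, b) >= 0 && v[0] === b[0]
}

console.log(satisfiesCaret('4.9.0', '4.8.2')) // → true
console.log(satisfiesCaret('5.0.0', '4.8.2')) // → false
```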
1 change: 1 addition & 0 deletions gtfs-importer/import.sh
@@ -85,6 +85,7 @@ print_bold 'Done!'

cat <<EOF
Run PostgREST with the following environment variables:
+PGUSER=postgrest
PGRST_DB_SCHEMAS=api
PGRST_DB_ANON_ROLE=web_anon
EOF
11 changes: 8 additions & 3 deletions gtfs-importer/importer.js
@@ -6,6 +6,7 @@ import {fileURLToPath} from 'node:url'
import _pg from 'pg'
const {Client} = _pg
import pgFormat from 'pg-format'
+import {ok} from 'node:assert'
import {writeFile} from 'node:fs/promises'

const PATH_TO_IMPORT_SCRIPT = fileURLToPath(new URL('import.sh', import.meta.url).href)
@@ -127,10 +128,14 @@ try {
// https://www.postgresql.org/docs/15/libpq-connect.html#id-1.7.3.8.3.5
const {
PGHOST,
-  PGUSER,
-  PGPASSWORD,
+  POSTGREST_USER,
+  POSTGREST_PASSWORD,
} = process.env
-const dsn = `gtfs=host=${PGHOST} dbname=${dbName} user=${PGUSER} password=${PGPASSWORD}`
+ok(PGHOST, 'missing/empty $PGHOST')
+ok(POSTGREST_USER, 'missing/empty $POSTGREST_USER')
+ok(POSTGREST_PASSWORD, 'missing/empty $POSTGREST_PASSWORD')

+const dsn = `gtfs=host=${PGHOST} dbname=${dbName} user=${POSTGREST_USER} password=${POSTGREST_PASSWORD}`
console.info(`writing "${dsn}" into env file ${PATH_TO_DSN_FILE}`)
await writeFile(PATH_TO_DSN_FILE, dsn)
}
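The DSN file hands pgbouncer a `gtfs=` alias pointing at the latest import's database. A minimal sketch (the helper name is hypothetical, the template mirrors the importer's) of the libpq-style key=value line being written; note that values containing spaces or quotes would need libpq quoting, which this plain template does not perform:

```javascript
// Hypothetical helper mirroring the DSN template in importer.js:
// a pgbouncer database alias `gtfs=` followed by libpq key=value pairs.
const buildDsn = ({host, dbName, user, password}) =>
  `gtfs=host=${host} dbname=${dbName} user=${user} password=${password}`

console.log(buildDsn({
  host: 'gtfs-db',
  dbName: 'gtfs_2023_11_21', // example; actual names use the configured prefix
  user: 'postgrest',
  password: 'password',
}))
// → gtfs=host=gtfs-db dbname=gtfs_2023_11_21 user=postgrest password=password
```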