PostgREST is a fast way to construct a RESTful API. Its default behavior is great for scaffolding in development. When it's time to go to production it works great too, as long as you take precautions. PostgREST is a small sharp tool that focuses on performing the API-to-database mapping. We rely on a reverse proxy like Nginx for additional safeguards.
The first step is to create an Nginx configuration file that proxies requests to an underlying PostgREST server.
http {
  # ...
  # upstream configuration
  upstream postgrest {
    server localhost:3000;
  }
  # ...
  server {
    # ...
    # expose to the outside world
    location /api/ {
      default_type application/json;
      proxy_hide_header Content-Location;
      add_header Content-Location /api/$upstream_http_content_location;
      proxy_set_header Connection "";
      proxy_http_version 1.1;
      proxy_pass http://postgrest/;
    }
    # ...
  }
}
Note

For Ubuntu, if you already installed Nginx through apt, you can add this to the configuration file in /etc/nginx/sites-enabled/default.
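After reloading Nginx (for example with sudo nginx -s reload), a quick way to check that requests reach PostgREST through the proxy, assuming the defaults above, is to request the root route, which should return the OpenAPI description PostgREST serves there:

curl "http://localhost/api/"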
Each table in the admin-selected schema gets exposed as a top level route. Client requests are executed by certain database roles depending on their authentication. The HTTP verbs available on each route correspond to the actions permitted to the active role: for instance, if the role can delete rows of a table, then the DELETE verb is allowed for clients. Here's an API request to delete old rows from a hypothetical logs table:
.. tabs::

  .. code-tab:: http

    DELETE /logs?time=lt.1991-08-06 HTTP/1.1

  .. code-tab:: bash Curl

    curl "http://localhost:3000/logs?time=lt.1991-08-06" -X DELETE
However, it's very easy to delete the entire table by omitting the query parameter!
.. tabs::

  .. code-tab:: http

    DELETE /logs HTTP/1.1

  .. code-tab:: bash Curl

    curl "http://localhost:3000/logs" -X DELETE
This can happen accidentally, for example by switching a request from a GET to a DELETE. To protect against accidental operations use the pg-safeupdate PostgreSQL extension. It raises an error if UPDATE or DELETE is executed without specifying conditions. To install it you can use the PGXN network:
sudo -E pgxn install safeupdate
# then add this to postgresql.conf:
# shared_preload_libraries='safeupdate';
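Once the library is loaded, an unqualified DELETE or UPDATE fails. For example (the exact error text may vary by version):

DELETE FROM logs;
-- ERROR:  DELETE requires a WHERE clause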
Note that safeupdate does not protect against malicious actions, since an attacker can simply add a URL parameter that matches every row and so does not restrict the result set. To prevent this you must turn to database permissions, forbidding the wrong people from deleting rows, and using row-level security if finer access control is required.
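A sketch of the permissions approach, assuming a hypothetical API role named webuser and an owner column on logs; in recent PostgREST versions the request.jwt.claims setting is how JWT claims are exposed to SQL:

-- forbid the API role from deleting at all ("webuser" is hypothetical)
REVOKE DELETE ON logs FROM webuser;

-- or, for finer control, let users delete only their own rows
ALTER TABLE logs ENABLE ROW LEVEL SECURITY;
CREATE POLICY delete_own ON logs FOR DELETE USING (
  owner = current_setting('request.jwt.claims', true)::json->>'sub'
);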
As a convenience for client-side pagination controls, PostgREST supports counting and reporting the total table size in its response. As described in :ref:`limits`, responses ordinarily include a range but leave the total unspecified:
HTTP/1.1 200 OK
Range-Unit: items
Content-Range: 0-14/*
However, including the request header Prefer: count=exact calculates and includes the full count:
HTTP/1.1 206 Partial Content
Range-Unit: items
Content-Range: 0-14/3573458
This is fine for small tables, but count performance degrades in big tables due to the MVCC architecture of PostgreSQL: producing an exact count requires scanning every visible row. For very large tables the count can take a very long time to compute, which opens the door to a denial-of-service attack. The solution is to strip this header from all requests at the proxy, removing any Prefer header that contains the word "count".
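A sketch of how this could look inside the Nginx location block that proxies to PostgREST; note that clearing the whole header also drops any other preferences the client sent alongside the count:

# clear the Prefer header whenever it asks for a count
set $prefer $http_prefer;
if ($http_prefer ~* "count") {
  set $prefer "";
}
proxy_set_header Prefer $prefer;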
PostgREST aims to do one thing well: add an HTTP interface to a PostgreSQL database. To keep the code small and focused we do not implement HTTPS. Use a reverse proxy such as Nginx to add this. Note that some platforms as a service, like Heroku, also add SSL automatically in their load balancer.
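A minimal sketch of TLS termination in front of PostgREST; the server name and certificate paths are placeholders you'd replace with your own:

server {
  listen 443 ssl;
  server_name api.example.com;

  # placeholder certificate paths
  ssl_certificate     /etc/ssl/certs/api.example.com.crt;
  ssl_certificate_key /etc/ssl/private/api.example.com.key;

  location /api/ {
    proxy_set_header Connection "";
    proxy_http_version 1.1;
    # the upstream block defined earlier
    proxy_pass http://postgrest/;
  }
}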
Nginx supports "leaky bucket" rate limiting (see official docs). Using standard Nginx configuration, routes can be grouped into request zones for rate limiting. For instance we can define a zone for login attempts:
limit_req_zone $binary_remote_addr zone=login:10m rate=1r/s;
This creates a shared memory zone called "login" to store a log of IP addresses that access the rate-limited URLs. The space reserved, 10 MB (10m), gives us enough room to store a history of about 160k requests. We have chosen to allow only one request per second (1r/s).
Next we apply the zone to certain routes, like a hypothetical stored procedure called login.
location /rpc/login/ {
  # apply rate limiting
  limit_req zone=login burst=5;
}
The burst argument tells Nginx to start dropping requests if more than five queue up from a specific IP.
Nginx rate limiting is general and indiscriminate. To rate limit each authenticated request individually you will need to add logic in a :ref:`Custom Validation <custom_validation>` function.
PostgREST manages its :ref:`own pool of connections <db-pool>` and uses prepared statements by default in order to increase performance. However, this setting is incompatible with external connection poolers such as PgBouncer working in transaction-pooling mode. In this case, you need to set the :ref:`db-prepared-statements` config option to false. On the other hand, session pooling is fully compatible with PostgREST, while statement pooling is not compatible at all.
Note
If prepared statements are enabled, PostgREST will quit after detecting that transaction or statement pooling is being used.
You should also set the :ref:`db-channel-enabled` config option to false, since the LISTEN command is not compatible with transaction pooling, although leaving it enabled by default should not produce any errors.
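Putting the two options together, a sketch of the relevant lines in the PostgREST configuration file for running behind PgBouncer in transaction-pooling mode:

# required for transaction pooling
db-prepared-statements = false
# LISTEN does not work through transaction pooling
db-channel-enabled = false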
When debugging a problem it's important to verify the PostgREST version. At any time you can make a request to the running server and determine exactly which version is deployed: look for the Server HTTP response header, which contains the version number.
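For example, you can read the header with curl; the version shown here is only illustrative:

curl -sI "http://localhost:3000/" | grep -i '^server:'
# Server: postgrest/12.0.2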
PostgREST logs basic request information to stdout, including the authenticated user if available, the requesting IP address and user agent, the URL requested, and the HTTP response status.
127.0.0.1 - user [26/Jul/2021:01:56:38 -0500] "GET /clients HTTP/1.1" 200 - "" "curl/7.64.0"
127.0.0.1 - anonymous [26/Jul/2021:01:56:48 -0500] "GET /unexistent HTTP/1.1" 404 - "" "curl/7.64.0"
For diagnostic information about the server itself, PostgREST logs to stderr.
12/Jun/2021:17:47:39 -0500: Attempting to connect to the database...
12/Jun/2021:17:47:39 -0500: Listening on port 3000
12/Jun/2021:17:47:39 -0500: Connection successful
12/Jun/2021:17:47:39 -0500: Config re-loaded
12/Jun/2021:17:47:40 -0500: Schema cache loaded
Note
When running it in an SSH session you must detach it from stdout or it will be terminated when the session closes. The easiest technique is redirecting the output to a log file or to the syslog:
ssh [email protected] \
'postgrest foo.conf </dev/null >/var/log/postgrest.log 2>&1 &'
# another option is to pipe the output into "logger -t postgrest"
PostgREST logging provides limited information for debugging server errors. It's helpful to get full information about both client requests and the corresponding SQL commands executed against the underlying database.
A great way to inspect incoming HTTP requests including headers and query parameters is to sniff the network traffic on the port where PostgREST is running. For instance on a development server bound to port 3000 on localhost, run this:
# sudo access is necessary for watching the network
sudo ngrep -d lo0 port 3000
The options to ngrep vary depending on the address and host on which you've bound the server. The binding is described in the :ref:`configuration` section. The ngrep output isn't particularly pretty, but it's legible.
When PostgREST loses the connection to the database, it retries the connection using capped exponential backoff, with 32 seconds being the maximum backoff time.
This retry behavior is triggered immediately after the connection is lost if :ref:`db-channel-enabled` is set to true (the default); otherwise it is activated once a request is made.
To notify the client when the next reconnection attempt will occur, PostgREST responds with 503 Service Unavailable and a Retry-After: x header, where x is the number of seconds until the next retry.
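For example, a response during the retry window might look like this (the delay value varies with the backoff):

HTTP/1.1 503 Service Unavailable
Retry-After: 4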
Once you've verified that requests are as you expect, you can get more information about the server's operations by watching the database logs. By default PostgreSQL does not keep these logs, so you'll need to make the configuration changes below. Find postgresql.conf inside your PostgreSQL data directory (to locate the directory, issue the command show data_directory;). Either find the settings scattered throughout the file and change them to the following values, or append this block to the end of the configuration file.
# send logs where the collector can access them
log_destination = 'stderr'

# collect stderr output to log files
logging_collector = on

# save logs in pg_log/ under the pg data directory
log_directory = 'pg_log'

# (optional) new log file per day
log_filename = 'postgresql-%Y-%m-%d.log'

# log every kind of SQL statement
log_statement = 'all'
Restart the database and watch the log file in real-time to understand how HTTP requests are being translated into SQL commands.
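For example, assuming the settings above and a systemd-managed PostgreSQL (the service name may differ on your system):

sudo systemctl restart postgresql
sudo tail -f "$(sudo -u postgres psql -tAc 'show data_directory')/pg_log/postgresql-$(date +%F).log"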
Note
On Docker you can enable the logs by using a custom init.sh:
#!/bin/sh
echo "log_statement = 'all'" >> /var/lib/postgresql/data/postgresql.conf
After that you can start the container and check the logs with docker logs.
# the official postgres image requires a superuser password
docker run -v "$(pwd)/init.sh":"/docker-entrypoint-initdb.d/init.sh" \
  -e POSTGRES_PASSWORD=<your_password> -d postgres
docker logs -f <container-id>
Changing the schema while the server is running can lead to errors due to a stale schema cache. To learn how to refresh the cache see :ref:`schema_reloading`.
For Linux distributions that use systemd (Ubuntu, Debian, Arch Linux) you can create a daemon in the following way.
First, create the PostgREST configuration in /etc/postgrest/config:
db-uri = "postgres://<your_user>:<your_password>@localhost:5432/<your_db>"
db-schemas = "<your_exposed_schema>"
db-anon-role = "<your_anon_role>"
jwt-secret = "<your_secret>"
Then create the systemd service file in /etc/systemd/system/postgrest.service:
[Unit]
Description=REST API for any PostgreSQL database
After=postgresql.service
[Service]
ExecStart=/bin/postgrest /etc/postgrest/config
ExecReload=/bin/kill -SIGUSR1 $MAINPID
[Install]
WantedBy=multi-user.target
After that, you can enable the service at boot time and start it with:
systemctl enable postgrest
systemctl start postgrest
## For reloading the service
## systemctl restart postgrest
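Because ExecReload sends SIGUSR1, which PostgREST uses as the signal to reload its schema cache, you can also refresh the cache without a restart:

systemctl reload postgrest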
As discussed in :ref:`singular_plural`, there are no special URL forms for singular resources in PostgREST, only operators for filtering. Thus there are no URLs like /people/1. Instead it would be specified as
.. tabs::

  .. code-tab:: http

    GET /people?id=eq.1 HTTP/1.1
    Accept: application/vnd.pgrst.object+json

  .. code-tab:: bash Curl

    curl "http://localhost:3000/people?id=eq.1" \
      -H "Accept: application/vnd.pgrst.object+json"
This allows compound primary keys and makes the intent for singular response independent of a URL convention.
Nginx rewrite rules allow you to simulate the familiar URL convention. The following example adds a rewrite rule for all table endpoints, but you'll want to restrict it to those tables that have a simple numeric primary key named "id".
# support /endpoint/:id url style
location ~ ^/([a-z_]+)/([0-9]+) {
  # make the response singular
  proxy_set_header Accept 'application/vnd.pgrst.object+json';

  # assuming an upstream named "postgrest"
  proxy_pass http://postgrest/$1?id=eq.$2;
}
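With this rule in place, the two requests below should behave the same, assuming the proxy listens on localhost and the upstream runs on port 3000:

curl "http://localhost/people/1"
curl "http://localhost:3000/people?id=eq.1" \
  -H "Accept: application/vnd.pgrst.object+json"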