docs: reformat documentation (#353)

Changes applied:

- Fixing some typos
- Apply 80 column limit on text lines
- Remove some tabs
- Remove some duplicated empty lines
- Capitalise SQL keywords

hauleth authored Jun 19, 2024
1 parent 85520e1 commit a76efaa
Showing 15 changed files with 214 additions and 99 deletions.
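The commit message lists an 80 column limit on text lines but does not say how the limit was applied or checked. Below is a minimal sketch of one way to flag over-long lines. It assumes Python 3.9+ and treats lines inside fenced code blocks as exempt. That exemption is an assumption, since the message only mentions text lines. The script and its file name are hypothetical and are not part of this commit or the Supavisor repository.

```python
"""Flag Markdown text lines longer than a column limit (illustrative sketch)."""
import sys
from pathlib import Path

LIMIT = 80  # the column limit named in the commit message
FENCE = "`" * 3  # a fenced-code-block delimiter


def long_lines(text: str, limit: int = LIMIT) -> list[tuple[int, int]]:
    """Return (line_number, length) for each text line longer than `limit`.

    Lines inside fenced code blocks are skipped (an assumption: the commit
    message only mentions text lines). len() counts code points, which only
    approximates display columns.
    """
    hits = []
    in_fence = False
    for number, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # an opening or closing fence
            continue
        if not in_fence and len(line) > limit:
            hits.append((number, len(line)))
    return hits


if __name__ == "__main__":
    # Usage: python check_columns.py FILE [FILE ...]
    for name in sys.argv[1:]:
        for number, length in long_lines(Path(name).read_text(encoding="utf-8")):
            print(f"{name}:{number}: {length} columns")
```

For example, you could run `python check_columns.py README.md docs/connecting/overview.md`. Link-heavy lines, such as the re-wrapped Acknowledgements line in `README.md`, would still be reported. A stricter check would therefore need an exception for long URLs.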
85 changes: 61 additions & 24 deletions README.md
@@ -14,29 +14,60 @@

 ## Overview

-Supavisor is a scalable, cloud-native Postgres connection pooler. A Supavisor cluster is capable of proxying millions of Postgres end-client connections into a stateful pool of native Postgres database connections.
+Supavisor is a scalable, cloud-native Postgres connection pooler. A Supavisor
+cluster is capable of proxying millions of Postgres end-client connections into
+a stateful pool of native Postgres database connections.

-For database managers, Supavisor simplifies the task of managing Postgres clusters by providing easy configuration of highly available Postgres clusters ([todo](#future-work)).
+For database managers, Supavisor simplifies the task of managing Postgres
+clusters by providing easy configuration of highly available Postgres clusters
+([todo](#future-work)).

 ## Motivation

 We have several goals with Supavisor:

-- **Zero-downtime scaling**: we want to scale Postgres server compute with zero-downtime. To do this, we need an external Pooler that can buffer and re-route requests while the resizing operation is in progress.
-- **Handling modern connection demands**: We need a Pooler that can absorb millions of connections. We often see developers connecting to Postgres from Serverless environments, and so we also need something that works with both TCP and HTTP protocols.
-- **Efficiency**: Our customers pay for database processing power, and our goal is to maximize their database capacity. While PgBouncer is resource-efficient, it still consumes some resources on the database instance. By moving connection pooling to a dedicated cluster adjacent to tenant databases, we can free up additional resources to better serve customer queries.
+- **Zero-downtime scaling**: we want to scale Postgres server compute with
+zero-downtime. To do this, we need an external Pooler that can buffer and
+re-route requests while the resizing operation is in progress.
+- **Handling modern connection demands**: We need a Pooler that can absorb
+millions of connections. We often see developers connecting to Postgres from
+Serverless environments, and so we also need something that works with both TCP
+and HTTP protocols.
+- **Efficiency**: Our customers pay for database processing power, and our goal
+is to maximize their database capacity. While PgBouncer is resource-efficient,
+it still consumes some resources on the database instance. By moving connection
+pooling to a dedicated cluster adjacent to tenant databases, we can free up
+additional resources to better serve customer queries.

 ## Architecture

-Supavisor was designed to work in a cloud computing environment as a highly available cluster of nodes. Tenant configuration is stored in a highly available Postgres database. Configuration is loaded from the Supavisor database when a tenant connection pool is initiated.
-
-Connection pools are dynamic. When a tenant client connects to the Supavisor cluster the tenant pool is started and all connections to the tenant database are established. The process ID of the new tenant pool is then distributed to all nodes of the cluster and stored in an in-memory key-value store. Subsequent tenant client connections live on the inbound node but connection data is proxied from the pool node to the client connection node as needed.
-
-Because the count of Postgres connections is constrained only one tenant connection pool should be alive in a Supavisor cluster. In the case of two simultaneous client connections starting a pool, as the pool process IDs are distributed across the cluster, eventually one of those pools is gracefully shutdown.
-
-The dynamic nature of tenant database connection pools enables high availability in the event of node outages. Pool processes are monitored by each node. If a node goes down that process ID is removed from the cluster. Tenant clients will then start a new pool automatically as they reconnect to the cluster.
-
-This design enables blue-green or rolling deployments as upgrades require. A single VPC / multiple availability zone topologies is possible and can provide for greater redundancy when load balancing queries across read replicas are supported ([todo](#future-work)).
+Supavisor was designed to work in a cloud computing environment as a highly
+available cluster of nodes. Tenant configuration is stored in a highly available
+Postgres database. Configuration is loaded from the Supavisor database when a
+tenant connection pool is initiated.
+
+Connection pools are dynamic. When a tenant client connects to the Supavisor
+cluster the tenant pool is started and all connections to the tenant database
+are established. The process ID of the new tenant pool is then distributed to
+all nodes of the cluster and stored in an in-memory key-value store. Subsequent
+tenant client connections live on the inbound node but connection data is
+proxied from the pool node to the client connection node as needed.
+
+Because the count of Postgres connections is constrained only one tenant
+connection pool should be alive in a Supavisor cluster. In the case of two
+simultaneous client connections starting a pool, as the pool process IDs are
+distributed across the cluster, eventually one of those pools is gracefully
+shutdown.
+
+The dynamic nature of tenant database connection pools enables high availability
+in the event of node outages. Pool processes are monitored by each node. If a
+node goes down that process ID is removed from the cluster. Tenant clients will
+then start a new pool automatically as they reconnect to the cluster.
+
+This design enables blue-green or rolling deployments as upgrades require. A
+single VPC / multiple availability zone topologies is possible and can provide
+for greater redundancy when load balancing queries across read replicas are
+supported ([todo](#future-work)).

 <p align="center">
 <img src="https://user-images.githubusercontent.com/8291514/230757493-669bf563-084c-4705-b22e-38d398f4ec05.svg#gh-light-mode-only">
@@ -68,23 +99,27 @@ This design enables blue-green or rolling deployments as upgrades require. A sin
 - NOT run in a serverless environment
 - NOT dependant on Kubernetes
 - Observable
-- Easily understand throughput by tenant, tenant database or individual connection
+- Easily understand throughput by tenant, tenant database or individual
+connection
 - Prometheus `/metrics` endpoint
 - Manageable
 - OpenAPI spec at `/api/openapi`
 - SwaggerUI at `/swaggerui`
 - Highly available
-- When deployed as a Supavisor cluster and a node dies connection pools should be quickly spun up or already available on other nodes when clients reconnect
+- When deployed as a Supavisor cluster and a node dies connection pools should
+be quickly spun up or already available on other nodes when clients reconnect
 - Connection buffering
 - Brief connection buffering for transparent database restarts or failovers

 ## Future Work

 - Load balancing
 - Queries can be load balanced across read-replicas
-- Load balancing is independant of Postgres high-availability management (see below)
+- Load balancing is independent of Postgres high-availability management (see
+below)
 - Query caching
-- Query results are optionally cached in the pool cluster and returned before hitting the tenant database
+- Query results are optionally cached in the pool cluster and returned before
+hitting the tenant database
 - Session pooling
 - Like `PgBouncer`
 - Multi-protocol Postgres query interface
@@ -96,8 +131,9 @@ This design enables blue-green or rolling deployments as upgrades require. A sin
 - Health checks
 - Push button read-replica configuration
 - Config as code
-- Not only for the supavisor cluster but tenant databases and tenant database clusters as well
-- Pulumi / terraform support
+- Not only for the Supavisor cluster but tenant databases and tenant database
+clusters as well
+- Pulumi / Terraform support

 ## Benchmarks

@@ -152,16 +188,17 @@ tps = 189.228103 (without initial connection time)
 - Supavisor two node cluster
 - 64vCPU / 246RAM
 - Ubuntu 22.04.2 aarch64
-- 1_003_200 concurrent client connection
-- 20_000+ QPS
+- 1 003 200 concurrent client connection
+- 20 000+ QPS
 - 400 tenant Postgres connection
-- `select * from (values (1, 'one'), (2, 'two'), (3, 'three')) as t (num,letter);`
+- `SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) AS t (num, letter);`
 - ~50% CPU utilization (pool owner node)
 - 7.8G RAM usage

 ## Acknowledgements

-[José Valim](https://github.com/josevalim) and the [Dashbit](https://dashbit.co/) team were incredibly helpful in informing the design decisions for Supavisor.
+[José Valim](https://github.com/josevalim) and the [Dashbit](https://dashbit.co/) team were incredibly helpful in informing
+the design decisions for Supavisor.

 ## Inspiration

2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-1.1.64
+1.1.65
9 changes: 6 additions & 3 deletions docs/configuration/pool_modes.md
@@ -1,4 +1,5 @@
-Configure the `mode_type` on the `user` to set how Supavisor connection pools will behave.
+Configure the `mode_type` on the `user` to set how Supavisor connection pools
+will behave.

 The `mode_type` can be one of:

@@ -8,11 +9,13 @@ The `mode_type` can be one of:

 ## Transaction Mode

-`transaction` mode assigns a connection to a client for the duration of a single transaction.
+`transaction` mode assigns a connection to a client for the duration of a single
+transaction.

 ## Session Mode

-`session` mode assigns a connection to a client for the duration of the client connection.
+`session` mode assigns a connection to a client for the duration of the client
+connection.

 ## Native Mode

18 changes: 12 additions & 6 deletions docs/configuration/tenants.md
@@ -1,8 +1,11 @@
-All configuration options for a tenant are stored on the `tenant` record in the metadata database used by Supavisor.
+All configuration options for a tenant are stored on the `tenant` record in the
+metadata database used by Supavisor.

-A `tenant` is looked via the `external_id` discovered in the incoming client connection.
+A `tenant` is looked via the `external_id` discovered in the incoming client
+connection.

-All `tenant` fields and their types are defined in the `Supavisor.Tenants.Tenant` module.
+All `tenant` fields and their types are defined in the
+`Supavisor.Tenants.Tenant` module.

 ## Field Descriptions

@@ -22,13 +25,16 @@ All `tenant` fields and their types are defined in the `Supavisor.Tenants.Tenant

 `upstream_verify` - how to verify the ssl certificate

-`upstream_tls_ca` - the ca certificate to use when connecting to the database server
+`upstream_tls_ca` - the ca certificate to use when connecting to the database
+server

 `enforce_ssl` - enforce an SSL connection on client connections

-`require_user` - require client connection credentials to match `user` credentials in the metadata database
+`require_user` - require client connection credentials to match `user`
+credentials in the metadata database

-`auth_query` - the query to use when matching credential agains a client connection
+`auth_query` - the query to use when matching credential agains a client
+connection

 `default_pool_size` - the default size of the database pool

15 changes: 10 additions & 5 deletions docs/configuration/users.md
@@ -1,6 +1,8 @@
-All configuration options for a tenant `user` are stored on the `user` record in the metadata database used by Supavisor.
+All configuration options for a tenant `user` are stored on the `user` record in
+the metadata database used by Supavisor.

-All `user` fields and their types are defined in the `Supavisor.Tenants.User` module.
+All `user` fields and their types are defined in the `Supavisor.Tenants.User`
+module.

 ## Field Descriptions

@@ -10,12 +12,15 @@ All `user` fields and their types are defined in the `Supavisor.Tenants.User` mo

 `db_user_alias` - client connection user will also match this user record

-`is_manager` - these credentials are used to perform management queries against the tenant database
+`is_manager` - these credentials are used to perform management queries against
+the tenant database

 `mode_type` - the pool mode type

-`pool_size` - the database connection pool size used to override `default_pool_size` on the `tenant`
+`pool_size` - the database connection pool size used to override
+`default_pool_size` on the `tenant`

-`pool_checkout_timeout` - the maximum duration allowed for a client connection to checkout a database connection from the pool
+`pool_checkout_timeout` - the maximum duration allowed for a client connection
+to checkout a database connection from the pool

 `max_clients` - the maximum amount of client connections allowed for this user
13 changes: 9 additions & 4 deletions docs/connecting/authentication.md
@@ -1,16 +1,20 @@
-When a client connection is established Supavisor needs to verify the credentials of the connection.
+When a client connection is established Supavisor needs to verify the
+credentials of the connection.

 Credential verificiation is done either via `user` records or an `auth_query`.

 ## Tenant User Record

-If no `auth_query` exists on the `tenant` record credentials will be looked up from a `user` and verified against the client connection string credentials.
+If no `auth_query` exists on the `tenant` record credentials will be looked up
+from a `user` and verified against the client connection string credentials.

 There must be one or more `user` records for a `tenant` where `is_manager` is `false`.

 ## Authentication Query

-If the `user` in the client connection is not found for a `tenant` it will use the `user` where `is_manager` is `true` and the `auth_query` on the `tenant` to return matching credentials from the tenant database.
+If the `user` in the client connection is not found for a `tenant` it will use
+the `user` where `is_manager` is `true` and the `auth_query` on the `tenant` to
+return matching credentials from the tenant database.

 A simple `auth_query` can be:

@@ -43,7 +47,8 @@ REVOKE ALL ON FUNCTION supavisor.get_auth(p_usename TEXT) FROM PUBLIC;
 GRANT EXECUTE ON FUNCTION supavisor.get_auth(p_usename TEXT) TO supavisor;
 ```

-Update the `auth_query` on the `tenant` and it will use this query to match against client connection credentials.
+Update the `auth_query` on the `tenant` and it will use this query to match
+against client connection credentials.

 ```sql
 SELECT * FROM supavisor.get_auth($1)
9 changes: 6 additions & 3 deletions docs/connecting/overview.md
@@ -1,6 +1,8 @@
-To connect to a tenant database Supavisor needs to look up the tenant with an `external_id`.
+To connect to a tenant database Supavisor needs to look up the tenant with an
+`external_id`.

-You can connect to Supavisor just like you connect to Postgres except we need to include the `external_id` in the connection string.
+You can connect to Supavisor just like you connect to Postgres except we need to
+include the `external_id` in the connection string.

 Supavisor parses the `external_id` from a connection in one three ways:

@@ -14,7 +16,8 @@ Supavisor parses the `external_id` from a connection in one three ways:
 ## Username

-Include the `external_id` in the username. The `external_id` is found after the `.` in the username:
+Include the `external_id` in the username. The `external_id` is found after
+the `.` (dot) in the username:

 ```
 psql postgresql://postgres.dev_tenant:postgres@localhost:6543/postgres
9 changes: 6 additions & 3 deletions docs/deployment/fly.md
@@ -6,15 +6,18 @@ Type the following command in your terminal:
 fly launch
 ```

-Choose a name for your app when prompted, then answer "yes" to the following question:
+Choose a name for your app when prompted, then answer "yes" to the following
+question:

 ```bash
 Would you like to copy its configuration to the new app? (y/N)
 ```

-Next, select an organization and choose a region. You don't need to deploy the app yet.
+Next, select an organization and choose a region. You don't need to deploy the
+app yet.

-Since the pooler uses an additional port (7654) for the PostgreSQL protocol, you need to reserve an IP address:
+Since the pooler uses an additional port (7654) for the PostgreSQL protocol, you
+need to reserve an IP address:

 ```bash
 fly ips allocate-v4
2 changes: 1 addition & 1 deletion docs/development/docs.md
@@ -14,6 +14,6 @@ Build and serve the documentation locally with:

 `mkdocs serve`

-Production documentation is built on merge into `main` with the Github Action:
+Production documentation is built on merge into `main` with the GitHub Action:

 `/.github/workflows/docs.yml`
8 changes: 6 additions & 2 deletions docs/development/installation.md
@@ -1,10 +1,14 @@
-Before starting, set up the database where Supavisor will store tenants' data. The following command will pull a Docker image with PostgreSQL 14 and run it on port 6432:
+Before starting, set up the database where Supavisor will store tenants' data.
+The following command will pull a Docker image with PostgreSQL 14 and run it on
+port 6432:

 ```
 docker-compose -f ./docker-compose.db.yml up
 ```

-> `Supavisor` stores tables in the `supavisor` schema. The schema should be automatically created by the `dev/postgres/00-setup.sql` file. If you encounter issues with migrations, ensure that this schema exists.
+> `Supavisor` stores tables in the `supavisor` schema. The schema should be
+> automatically created by the `dev/postgres/00-setup.sql` file. If you
+> encounter issues with migrations, ensure that this schema exists.

 Next, get dependencies and apply migrations:
