Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: connection pool metrics in admin server #3420

Merged
merged 4 commits into from
Apr 24, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,8 @@ This project adheres to [Semantic Versioning](http://semver.org/).
- #2676, Performance improvement on bulk json inserts, around 10% increase on requests per second by removing `json_typeof` from write queries - @steve-chavez
- #3214, Log connection pool events on log-level=info - @steve-chavez
- #3435, Add log-level=debug, for development purposes - @steve-chavez
- #1526, Add `/metrics` endpoint on admin server - @steve-chavez
- Exposes connection pool metrics, schema cache metrics

### Fixed

Expand Down
13 changes: 13 additions & 0 deletions docs/references/admin_server.rst
Original file line number Diff line number Diff line change
Expand Up @@ -39,3 +39,16 @@ Ready
-----

In addition, the ``ready`` endpoint checks the state of the :ref:`connection_pool` and the :ref:`schema_cache`. A request will return ``200 OK`` if both are good or ``503`` if not.

.. code-block:: bash

curl -I "http://localhost:3001/ready"

.. code-block:: http

HTTP/1.1 200 OK

Metrics
=======

Provides :ref:`metrics`.
97 changes: 94 additions & 3 deletions docs/references/observability.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3,10 +3,15 @@
Observability
#############

.. contents::
:depth: 1
:local:
:backlinks: none

.. _pgrst_logging:

Logging
-------
Logs
====

PostgREST logs basic request information to ``stdout``, including the authenticated user if available, the requesting IP address and user agent, the URL requested, and HTTP response status.

Expand Down Expand Up @@ -40,7 +45,7 @@ For diagnostic information about the server itself, PostgREST logs to ``stderr``
Currently PostgREST doesn't log the SQL commands executed against the underlying database.

Database Logs
~~~~~~~~~~~~~
-------------

To find the SQL operations, you can watch the database logs. By default PostgreSQL does not keep these logs, so you'll need to make the configuration changes below.

Expand Down Expand Up @@ -81,6 +86,92 @@ Restart the database and watch the log file in real-time to understand how HTTP
docker run -v "$(pwd)/init.sh":"/docker-entrypoint-initdb.d/init.sh" -d postgres
docker logs -f <container-id>

.. _metrics:

Metrics
=======

The ``metrics`` endpoint on the :ref:`admin_server` endpoint provides metrics in `Prometheus text format <https://prometheus.io/docs/instrumenting/exposition_formats/#text-based-format>`_.

.. code-block:: bash

curl "http://localhost:3001/metrics"

# HELP pgrst_schema_cache_query_time_seconds The query time in seconds of the last schema cache load
# TYPE pgrst_schema_cache_query_time_seconds gauge
pgrst_schema_cache_query_time_seconds 1.5937927e-2
# HELP pgrst_schema_cache_loads_total The total number of times the schema cache was loaded
# TYPE pgrst_schema_cache_loads_total counter
pgrst_schema_cache_loads_total 1.0
...

Schema Cache Metrics
--------------------

Metrics related to the :ref:`schema_cache`.

pgrst_schema_cache_query_time_seconds
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

======== =======
**Type** Gauge
======== =======

The query time in seconds of the last schema cache load.

pgrst_schema_cache_loads_total
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

======== =======
**Type** Counter
======== =======

The total number of times the schema cache was loaded.

Connection Pool Metrics
-----------------------

Metrics related to the :ref:`connection_pool`.

pgrst_db_pool_timeouts_total
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

======== =======
**Type** Counter
======== =======

The total number of pool connection timeouts.

pgrst_db_pool_available
~~~~~~~~~~~~~~~~~~~~~~~

======== =======
**Type** Gauge
======== =======

Available connections in the pool.

pgrst_db_pool_waiting
~~~~~~~~~~~~~~~~~~~~~

======== =======
**Type** Gauge
======== =======

Requests waiting to acquire a pool connection

pgrst_db_pool_max
~~~~~~~~~~~~~~~~~

======== =======
**Type** Gauge
======== =======

Max pool connections.

Traces
======

Server Version
--------------

Expand Down
10 changes: 6 additions & 4 deletions nix/overlays/haskell-packages.nix
Original file line number Diff line number Diff line change
Expand Up @@ -36,12 +36,14 @@ let
# - When adding a new library version here, postgrest.cabal and stack.yaml must also be updated
#
# Notes:
# - This should NOT be the first place to start managing dependencies. Check postgrest.cabal.
# - When adding a new package version here, you have to update stack:
# + For stack.yml add:
# - When adding a new package version here, update cabal.
# + Update postgrest.cabal with the package version
# + Update cabal.project.freeze. Just set it to the current timestamp then run `cabal build`. It will tell you the correct timestamp for the index state.
# - When adding a new package version here, you have to update stack.
# + To update stack.yaml add:
# extra-deps:
# - <package>-<ver>
# + For stack.yml.lock, CI should report an error with the correct lock, copy/paste that one into the file
# + For stack.yaml.lock, CI should report an error with the correct lock, copy/paste that one into the file
# - To modify and try packages locally, see "Working with locally modified Haskell packages" in the Nix README.

# Before upgrading fuzzyset to 0.3, check: https://github.com/PostgREST/postgrest/issues/3329
Expand Down
2 changes: 2 additions & 0 deletions postgrest.cabal
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@ library
PostgREST.Error
PostgREST.Logger
PostgREST.MediaType
PostgREST.Metrics
PostgREST.Observation
PostgREST.Query
PostgREST.Query.QueryBuilder
Expand Down Expand Up @@ -124,6 +125,7 @@ library
, optparse-applicative >= 0.13 && < 0.19
, parsec >= 3.1.11 && < 3.2
, postgresql-libpq >= 0.10
, prometheus-client >= 1.1.1 && < 1.2.0
, protolude >= 0.3.1 && < 0.4
, regex-tdfa >= 1.2.2 && < 1.4
, retry >= 0.7.4 && < 0.10
Expand Down
5 changes: 5 additions & 0 deletions src/PostgREST/Admin.hs
Original file line number Diff line number Diff line change
Expand Up @@ -19,11 +19,13 @@ import Network.Socket.ByteString

import PostgREST.AppState (AppState)
import PostgREST.Config (AppConfig (..))
import PostgREST.Metrics (metricsToText)
import PostgREST.Observation (Observation (..))

import qualified PostgREST.AppState as AppState
import qualified PostgREST.Config as Config


import Protolude

runAdmin :: AppConfig -> AppState -> Warp.Settings -> IO ()
Expand Down Expand Up @@ -56,6 +58,9 @@ admin appState appConfig req respond = do
["schema_cache"] -> do
sCache <- AppState.getSchemaCache appState
respond $ Wai.responseLBS HTTP.status200 [] (maybe mempty JSON.encode sCache)
["metrics"] -> do
mets <- metricsToText
respond $ Wai.responseLBS HTTP.status200 [] mets
_ ->
respond $ Wai.responseLBS HTTP.status404 [] mempty

Expand Down
25 changes: 15 additions & 10 deletions src/PostgREST/AppState.hs
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@
import qualified Network.Socket as NS
import qualified PostgREST.Error as Error
import qualified PostgREST.Logger as Logger
import qualified PostgREST.Metrics as Metrics
import PostgREST.Observation
import PostgREST.Version (prettyVersion)
import System.TimeIt (timeItT)
Expand Down Expand Up @@ -111,25 +112,28 @@
, stateSocketREST :: NS.Socket
-- | Network socket for the admin UI
, stateSocketAdmin :: Maybe NS.Socket
-- | Logger state
, stateLogger :: Logger.LoggerState
-- | Observation handler
, stateObserver :: ObservationHandler
, stateLogger :: Logger.LoggerState
, stateMetrics :: Metrics.MetricsState

Check warning on line 118 in src/PostgREST/AppState.hs

View check run for this annotation

Codecov / codecov/patch

src/PostgREST/AppState.hs#L117-L118

Added lines #L117 - L118 were not covered by tests
}

type AppSockets = (NS.Socket, Maybe NS.Socket)


init :: AppConfig -> IO AppState
init conf@AppConfig{configLogLevel} = do
loggerState <- Logger.init
let observer = Logger.observationLogger loggerState configLogLevel
init conf@AppConfig{configLogLevel, configDbPoolSize} = do
loggerState <- Logger.init
metricsState <- Metrics.init configDbPoolSize
let observer = liftA2 (>>) (Logger.observationLogger loggerState configLogLevel) (Metrics.observationMetrics metricsState)

pool <- initPool conf observer
(sock, adminSock) <- initSockets conf
state' <- initWithPool (sock, adminSock) pool conf loggerState observer
state' <- initWithPool (sock, adminSock) pool conf loggerState metricsState observer
pure state' { stateSocketREST = sock, stateSocketAdmin = adminSock}

initWithPool :: AppSockets -> SQL.Pool -> AppConfig -> Logger.LoggerState -> ObservationHandler -> IO AppState
initWithPool (sock, adminSock) pool conf loggerState observer = do
initWithPool :: AppSockets -> SQL.Pool -> AppConfig -> Logger.LoggerState -> Metrics.MetricsState -> ObservationHandler -> IO AppState
initWithPool (sock, adminSock) pool conf loggerState metricsState observer = do

appState <- AppState pool
<$> newIORef minimumPgVersion -- assume we're in a supported version when starting, this will be corrected on a later step
Expand All @@ -145,8 +149,9 @@
<*> C.newCache Nothing
<*> pure sock
<*> pure adminSock
<*> pure loggerState
<*> pure observer
<*> pure loggerState
<*> pure metricsState

debWorker <-
let decisecond = 100000 in
Expand All @@ -156,7 +161,7 @@
, debounceEdge = leadingEdge -- runs the worker at the start and the end
}

return appState { debouncedConnectionWorker = debWorker }
return appState { debouncedConnectionWorker = debWorker}

destroy :: AppState -> IO ()
destroy = destroyPool
Expand Down
4 changes: 4 additions & 0 deletions src/PostgREST/Logger.hs
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,10 @@ observationLogger loggerState logLevel obs = case obs of
o@(SchemaCacheLoadedObs _) -> do
when (logLevel >= LogDebug) $ do
logWithZTime loggerState $ observationMessage o
PoolRequest ->
pure ()
PoolRequestFullfilled ->
pure ()
o ->
logWithZTime loggerState $ observationMessage o

Expand Down
54 changes: 54 additions & 0 deletions src/PostgREST/Metrics.hs
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
module PostgREST.Metrics
( init
, MetricsState (..)
, observationMetrics
, metricsToText
) where

import qualified Data.ByteString.Lazy as LBS
import qualified Hasql.Pool.Observation as SQL

import qualified Prometheus as Prom

import PostgREST.Observation

import Protolude

data MetricsState =
MetricsState Prom.Counter Prom.Gauge Prom.Gauge Prom.Gauge Prom.Counter Prom.Gauge

init :: Int -> IO MetricsState
init configDbPoolSize = do
poolTimeouts <- Prom.register $ Prom.counter (Prom.Info "pgrst_db_pool_timeouts_total" "The total number of pool connection timeouts")
poolAvailable <- Prom.register $ Prom.gauge (Prom.Info "pgrst_db_pool_available" "Available connections in the pool")
poolWaiting <- Prom.register $ Prom.gauge (Prom.Info "pgrst_db_pool_waiting" "Requests waiting to acquire a pool connection")
poolMaxSize <- Prom.register $ Prom.gauge (Prom.Info "pgrst_db_pool_max" "Max pool connections")
schemaCacheLoads <- Prom.register $ Prom.counter (Prom.Info "pgrst_schema_cache_loads_total" "The total number of times the schema cache was loaded")
schemaCacheQueryTime <- Prom.register $ Prom.gauge (Prom.Info "pgrst_schema_cache_query_time_seconds" "The query time in seconds of the last schema cache load")
Prom.setGauge poolMaxSize (fromIntegral configDbPoolSize)
pure $ MetricsState poolTimeouts poolAvailable poolWaiting poolMaxSize schemaCacheLoads schemaCacheQueryTime

observationMetrics :: MetricsState -> ObservationHandler
observationMetrics (MetricsState poolTimeouts poolAvailable poolWaiting _ schemaCacheLoads schemaCacheQueryTime) obs = case obs of
(PoolAcqTimeoutObs _) -> do
Prom.incCounter poolTimeouts
(HasqlPoolObs (SQL.ConnectionObservation _ status)) -> case status of
SQL.ReadyForUseConnectionStatus -> do
Prom.incGauge poolAvailable
SQL.InUseConnectionStatus -> do
Prom.decGauge poolAvailable
SQL.TerminatedConnectionStatus _ -> do
Prom.decGauge poolAvailable
SQL.ConnectingConnectionStatus -> pure ()
PoolRequest ->
Prom.incGauge poolWaiting
PoolRequestFullfilled ->
Prom.decGauge poolWaiting
SchemaCacheLoadedObs resTime -> do
Prom.incCounter schemaCacheLoads
Prom.setGauge schemaCacheQueryTime resTime
_ ->
pure ()

metricsToText :: IO LBS.ByteString
metricsToText = Prom.exportMetricsAsText
3 changes: 3 additions & 0 deletions src/PostgREST/Observation.hs
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,8 @@
| QueryErrorCodeHighObs SQL.UsageError
| PoolAcqTimeoutObs SQL.UsageError
| HasqlPoolObs SQL.Observation
| PoolRequest
| PoolRequestFullfilled

type ObservationHandler = Observation -> IO ()

Expand Down Expand Up @@ -125,6 +127,7 @@
SQL.ReleaseConnectionTerminationReason -> "release"
SQL.NetworkErrorConnectionTerminationReason _ -> "network error" -- usage error is already logged, no need to repeat the same message.
)
_ -> mempty

Check warning on line 130 in src/PostgREST/Observation.hs

View check run for this annotation

Codecov / codecov/patch

src/PostgREST/Observation.hs#L130

Added line #L130 was not covered by tests
where
showMillis :: Double -> Text
showMillis x = toS $ showFFloat (Just 1) (x * 1000) ""
Expand Down
7 changes: 7 additions & 0 deletions src/PostgREST/Query.hs
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@ import PostgREST.Config (AppConfig (..),
import PostgREST.Config.PgVersion (PgVersion (..))
import PostgREST.Error (Error)
import PostgREST.MediaType (MediaType (..))
import PostgREST.Observation (Observation (..))
import PostgREST.Plan (ActionPlan (..),
CallReadPlan (..),
CrudPlan (..),
Expand Down Expand Up @@ -77,10 +78,16 @@ data QueryResult
runQuery :: AppState.AppState -> AppConfig -> AuthResult -> ApiRequest -> ActionPlan -> SchemaCache -> PgVersion -> Bool -> ExceptT Error IO QueryResult
runQuery _ _ _ _ (NoDb x) _ _ _ = pure $ NoDbResult x
runQuery appState config AuthResult{..} apiReq (Db plan) sCache pgVer authenticated = do
let observer = AppState.getObserver appState

lift $ observer PoolRequest

dbResp <- lift $ do
let transaction = if prepared then SQL.transaction else SQL.unpreparedTransaction
AppState.usePool appState (transaction isoLvl txMode $ runExceptT dbHandler)

lift $ observer PoolRequestFullfilled

resp <-
liftEither . mapLeft Error.PgErr $
mapLeft (Error.PgError authenticated) dbResp
Expand Down
14 changes: 14 additions & 0 deletions test/io/test_io.py
Original file line number Diff line number Diff line change
Expand Up @@ -1427,3 +1427,17 @@ def test_multiple_func_settings(defaultenv):
response = postgrest.session.post("/rpc/multiple_func_settings_test")

assert response.text == '[{"work_mem":"5000kB","statement_timeout":"10s"}]'


def test_admin_metrics(defaultenv):
"Should get metrics from the admin endpoint"

with run(env=defaultenv, port=freeport()) as postgrest:
response = postgrest.admin.get("/metrics")
assert response.status_code == 200
assert "pgrst_schema_cache_query_time_seconds" in response.text
assert "pgrst_schema_cache_loads_total" in response.text
assert "pgrst_db_pool_max" in response.text
assert "pgrst_db_pool_waiting" in response.text
assert "pgrst_db_pool_available" in response.text
assert "pgrst_db_pool_timeouts_total" in response.text
Loading
Loading