Fix wait for PostgreSQL in draupnir-finalise-image (#80)
The draupnir-finalise-image script waits for PostgreSQL to start
accepting connections before issuing commands against it.

It starts postgres via pg_ctl with the wait (-w) flag. This waits for
PostgreSQL to accept connections, but by default it only waits 60 seconds.
If startup takes longer than 60 seconds, pg_ctl exits nonzero and the
script exits.

After the pg_ctl start command, the script then looped for up to 10
minutes, checking the PostgreSQL log for the message that it was ready for
connections. This loop could never function as intended: with the wait
flag, pg_ctl either returns once PostgreSQL is accepting connections or
exits nonzero, in which case the script exits as well.

Removing the loop and raising the pg_ctl wait timeout from its default to
10 minutes provides the intended behaviour.
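
The reasoning above can be illustrated with a minimal sketch. It assumes the script runs with errexit (`set -e`) enabled, which the commit message implies but does not show, and it uses a hypothetical `fake_pg_ctl_start` function in place of the real `pg_ctl -w ... start` call so that it runs without PostgreSQL installed.

```shell
#!/usr/bin/env bash
# Sketch only: fake_pg_ctl_start is a hypothetical stand-in for
# `pg_ctl -w ... start`. It simply returns the exit status it is given.
# Assumption: the real script runs with errexit (set -e) enabled.

run_script() {
  # Run the simulated script in its own bash process so that set -e
  # behaves as it would in a standalone script.
  bash -c '
    set -e
    fake_pg_ctl_start() { return "$1"; }

    fake_pg_ctl_start "$1"            # nonzero here aborts the script
    echo "reached log-polling loop"   # only runs if the start succeeded
  ' _ "$1" || echo "script exited before the loop"
}

run_script 0   # simulated start succeeded within the timeout
run_script 1   # simulated start timed out
```

In both cases the polling loop has nothing left to decide: either the start already succeeded, or the script has already exited.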
dyson authored Apr 1, 2020
1 parent 6e3198f commit f2bb249
Showing 3 changed files with 16 additions and 20 deletions.
6 changes: 4 additions & 2 deletions CHANGELOG.md
@@ -1,8 +1,10 @@
Changelog
=========

-Unreleased
-----------
+5.0.1
+-----
+- Use pg_ctl wait with a timeout in draupnir-finalise-image script to wait until
+PostgreSQL is ready to accept connections.

5.0.0
-----
2 changes: 1 addition & 1 deletion DRAUPNIR_VERSION
@@ -1 +1 @@
-5.0.0
+5.0.1
28 changes: 11 additions & 17 deletions cmd/draupnir-finalise-image
@@ -100,23 +100,17 @@ EOF
LOG_FILE="/var/log/postgresql/image_${ID}"

# Start postgres
-sudo -u postgres $PG_CTL -w -D "$UPLOAD_PATH" -o "-p $PORT" -l "${LOG_FILE}" start
-
-# We need to wait for postgres to boot and announce that the recovery has
-# completed. Ideally WAL recovery shouldn't take long, but for high volume
-# databases Postgres needs a window to catch-up from the last checkpoint.
-TIMEOUT=600 # 10m
-sudo -u postgres touch "${LOG_FILE}" # otherwise we'll fail grep'ing the file
-until grep "database system is ready to accept connections" "${LOG_FILE}"
-do
-  if [ $(( TIMEOUT-- )) -eq 0 ];
-  then
-    cat "${LOG_FILE}" >&2
-    echo "Postgres recovery failed, timed out waiting for recovery" >&2
-    exit 255
-  fi
-  sleep 1
-done
+
+# We need to wait (-w) for postgres to boot and accept
+# connections before continuing. Ideally WAL recovery shouldn't take long, but
+# for high volume databases Postgres needs a window to catch-up from the last
+# checkpoint.
+
+# If startup doesn't complete within the timeout (-t <seconds>) then pg_ctl
+# exits with a nonzero exit status. Note that the startup will continue in the
+# background and may eventually succeed - all the nonzero exit has done here is
+# notify that it didn't happen within the timeout.
+sudo -u postgres $PG_CTL -w -t 600 -D "$UPLOAD_PATH" -o "-p $PORT" -l "${LOG_FILE}" start

# Create a user to perform admin operations with
sudo -u postgres createuser --port="$PORT" --createdb --createrole --superuser draupnir-admin
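
The removed loop printed the PostgreSQL log to stderr before giving up, whereas a bare pg_ctl call relies on the nonzero exit alone. One way that diagnostic could be kept is sketched below. The `start_or_report` helper is hypothetical and not part of this commit; the error message wording and the 255 status are borrowed from the removed loop for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: start_or_report is a hypothetical helper, not part of the
# commit. It runs the given start command and, if that command exits
# nonzero, prints the log file to stderr and returns 255.

start_or_report() {
  local log_file="$1"
  shift
  if ! "$@"; then
    # Show whatever the server logged so far, if the file is readable.
    [ -r "$log_file" ] && cat "$log_file" >&2
    echo "Postgres did not accept connections within the timeout" >&2
    return 255
  fi
}
```

Under these assumptions it could wrap the new start command, for example `start_or_report "${LOG_FILE}" sudo -u postgres $PG_CTL -w -t 600 -D "$UPLOAD_PATH" -o "-p $PORT" -l "${LOG_FILE}" start`. As the comment in the diff notes, a timeout does not stop the server, so startup may still complete in the background after the helper has reported failure.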