Skip to content

Postgres Point in Time Restore

khalidsattar edited this page Feb 16, 2013 · 3 revisions

This approach is considered more reliable because it uses postgres's WAL files to replay the system back to a point in time we want the restore to be. So assuming the last file system backup was taken 30 mins or even longer like a day, the WAL files can be used to get the system to a state just before the crash. However the set-up is more involved and recovery time make take longer.

The instruction assumes you have access to postgres config files and the user used for doing backups is a postgres super user. So this option may not suite shared hosted machines. The instruction below assume you are doing this as postgres user

Install Scripts

The scripts can be found in within e_luminate eluminate_web/scripts/postgres

mkdir -p ~/scripts

copy all files from the git's eluminate_web/scripts/postgres to the above created folder. And make sure they are executable.

Setup Archiving of WAL files

  1. Edit prostgres config file (eg on unbantu its /etc/postgres/9.1/main/postgresql.conf) and change the following line to these values
archive_mode = on
archive_command = '/var/lib/postgres/scripts/archive-wal "%p" "%f"'
archive_timeout = 1800
sudo /etc/init.d/postgresql restart

The path of archive_command needs to point to ~postgres/scripts. Thearchive_timeout value may vary depending on how active your database is. The value determines how long to wait before forcing an archive of the current WAL file. The archive will happen either when current WAL reached 16M or a time-out. So is depending on how active the database is. You need to balance the time-out with available space and when your base backup will run (typically daily).

Making Base Backups

We should do it at least once in 24 hours so that the hosting service backups this too. In the example below we are taking backups every 6 hours. As this creates a checkpoint, then the time taken of replaying WAL files should be reduced.

Create the cronjob entry

crontab -e 

add line like

* 0-23/4 * * * $HOME/scripts/do_base_backup
* 0 * * *     date >> $HOME/OLD-BACKUPS-RM.log; find $HOME/PG_... -maxdepth 1 -type f -name \*.tgz -mtime +3 -delete >> $HOME/OLD-BACKUPS-RM.log

The 2nd line above deleted backups older than 3 days.

Still to consider is whether we need to save this backup outside the system to safeguard against total system crash.

Restore Process

The restore process in this approach required first to restore the file system backup, then create a restore.conf file which will set to what point in time we want the WAL files to be replayed until and then using that to preform the restore.

Restore Files from base backup

  1. Stop server if its running
sudo /etc/init.d/postgresql stop
  1. Backup old database directory, easiest is to rename the old directory (if there is space to hold two copies). The location of the database directory etc may vary so I will set up variable which can be change to suite your environment
WAL_ARCHIVES=~/WAL_ARCHIVES
PDATA=/var/lib/postgresql/9.1/main   # where postgres stores its data structure
PBACKUPS=~/PG_FS_BACKUPS

mv ${PDATA} ${PDATA}.old
mkdir ${PDATA}
cd ${PDATA}
tar xpzf ${PBACKUPS}/${HOSTNAME}-YYYYMMDDhhmmss.tgz
mkdir pg_xlog
chmod 700 pg_xlog

${HOSTNAME}-YYYYMMDDhhmmss.tgz is the most recent backup you have, though you could go back to older ones if you are restoring to an earlier point then the last backup. If you do, you need to also extract older WAL files (see later section)

You now have a restored but unusable Postgres Database structure, replaying the WALs will make it consistent.

Create a recovery.conf file

The recovery.conf file controls how the recovery of the database will proceed. Its has many directives to control the restore. To restore to the latest version, we simple need one line in it which is where to copy the WAL files from. So continuing from above (ie in the main directory)

echo "restore_command = 'cp ${WAL_ARCHIVES}/%f %p'" > ${PDATA}/recovery.conf

If we need to restore to a particular point then you need

echo "recovery_target_time 'timestamp'" >> ${PDATA}/recovery.conf
echo "recovery_target_inclusive = true" >> ${PDATA}/recovery.conf

example of timestamp is 2013-02-01 22:16 GMT. During restore you may want to edit pg_hba.conf to stop other people accessing the database.

Recovery

/etc/init.d/postgresql start

This will now use the recovery.conf file to preform the restore by replaying WAL files. Once the replay of WAL file has completed. The file will be renamed to restore.done.

Inspect the database to see if things look right.

Restoring from not the most Recent Base Backup

When you restore from the most recent base backup, the WAL files are already present in ${WAL_ARCHIVES} folder. However when that last base backup took place, older WAL files are no longer needed, so teh script archives these into another tar ball to save space. Therefore when using older base backups we need to identify and restore the relevant WAL files back into ${WAL_ARCHIVE} so postgres can find the these during playback. For example in the Backup folder you might see

u-d620-20130216182502.tgz
u-d620-20130216185816.tgz
u-d620-20130216192502.tgz
WALS-20130216182502.tgz
WALS-20130216185816.tgz

We can see that u-d620-20130216192502.tgz is the most recent base backup (16th Feb 19:25). However for the purposes of this example we wish to restore to (16th Feb 18:45). So we will restore the base backup file u-d620-20130216182502.tgz along with WALS-20130216182502.tgz eg

cd ${WAL_ARCHIVES}
tar xpzf ${PBACKUPS}/WALS-20130216182502.tgz

(I believe its safe to leave existing WAL files in ${WAL_ARCHIVES} )

Now to create recover.confand start postgres as explained in previous section. eg

echo "restore_command = 'cp ${WAL_ARCHIVES}/%f %p'" > ${PDATA}/recovery.conf
echo "recovery_target_time '2013-02-16 18:45 GMT'" >> ${PDATA}/recovery.conf
echo "recovery_target_inclusive = true" >> ${PDATA}/recovery.conf

now when postgres is started, it will recover to the time we specified.

Backup logs

There is a log file created during backups, and the file is ~/postgres.log. If the backups are not happening then look at this file as well as your mailbox for error during cron run. If you are missing archived tar ball then see ~/OLD-BACKUPS-RM.log for log of files selected by house keeping process