Initial documentation import.
This adds documentation built with Sphinx, which is basically the content
of the README file, with some editing and a bit more content in some
places.

I left a lot of FIXME in the documentation since there is still much to
do, but at least it will be a nice start.
rdunklau committed Aug 2, 2021
1 parent b2f90f9 commit 4e2f4ed
Showing 13 changed files with 1,392 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/.gitignore
@@ -0,0 +1 @@
_build/
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
34 changes: 34 additions & 0 deletions docs/about.rst
@@ -0,0 +1,34 @@
About PGHoard
=============

Features
--------

* Automatic periodic basebackups
* Automatic transaction log (WAL/xlog) backups (using either ``pg_receivewal``,
  ``archive_command`` or experimental PG native replication protocol support with ``walreceiver``)
* Optional Standalone Hot Backup support
* Cloud object storage support (AWS S3, Google Cloud, OpenStack Swift, Azure, Ceph)
* Backup restoration directly from object storage, compressed and encrypted
* Point-in-time-recovery (PITR)
* Initialize a new standby from object storage backups, automatically configured as
  a replicating hot-standby

Fault-resilience and monitoring
-------------------------------

* Persists over temporary object storage connectivity issues by retrying transfers
* Verifies WAL file headers before upload (backup) and after download (restore),
  so that e.g. files recycled by PostgreSQL are ignored
* Automatic history cleanup (backups and related WAL files older than N days)
* "Archive sync" tool for detecting holes in WAL backup streams and fixing them
* "Archive cleanup" tool for deleting obsolete WAL files from the archive
* Keeps statistics updated in a file on disk (for monitoring tools)
* Creates alert files on disk on problems (for monitoring tools)


Performance
-----------

* Parallel compression and encryption
* WAL pre-fetching on restore
71 changes: 71 additions & 0 deletions docs/architecture.rst
@@ -0,0 +1,71 @@
Architecture
============

PostgreSQL Point-In-Time Recovery (PITR) consists of taking a database
basebackup and archiving every change made after that point in write-ahead log
(WAL) files, which can be replayed to reach the desired recovery point.

PGHoard runs as a daemon which will be responsible for performing the main
tasks of a backup tool for PostgreSQL:

* Taking periodic basebackups
* Archiving the WAL
* Managing backup retention according to a policy

Basebackup
----------

The basebackups are taken by the pghoard daemon directly, with no need for an
external scheduler / crond.

When pghoard is first launched, it will take a basebackup. After that, the
frequency of basebackups is determined by configuration files.

Those basebackups can be taken in one of two ways:

* By copying the files directly from ``PGDATA``, using the
  ``local-tar`` or ``delta`` modes
* By calling ``pg_basebackup``, using the ``basic`` or ``pipe`` modes

See :ref:`configuration_basebackup` for how to configure it.
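
As a rough illustration of the two families of modes above, the following
Python sketch builds a per-site configuration fragment and rejects unknown
modes. Only the mode names and ``basebackup_count`` appear in this
documentation; the key names ``backup_sites``, ``basebackup_mode`` and
``basebackup_interval_hours``, the site name ``default`` and the helper
``make_site_config`` are assumptions made for the example, so check the
configuration reference before relying on them.

```python
import json

# Modes named in this section, grouped by how the basebackup is taken.
DIRECT_COPY_MODES = {"local-tar", "delta"}  # files read directly from PGDATA
PG_BASEBACKUP_MODES = {"basic", "pipe"}     # delegated to pg_basebackup


def make_site_config(mode, interval_hours=24, count=2):
    """Return a hypothetical per-site configuration fragment."""
    if mode not in DIRECT_COPY_MODES | PG_BASEBACKUP_MODES:
        raise ValueError(f"unknown basebackup mode: {mode!r}")
    return {
        "basebackup_mode": mode,                      # assumed key name
        "basebackup_interval_hours": interval_hours,  # assumed key name
        "basebackup_count": count,
    }


# "backup_sites" and the site name "default" are assumptions as well.
config = {"backup_sites": {"default": make_site_config("delta")}}
print(json.dumps(config, indent=2))
```

Validating the mode up front keeps a typo in the configuration from surfacing
only when the first scheduled basebackup runs.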

Archiving
---------

PGHoard supports multiple operating models. If you don't want to modify the
archiving configuration of the backed-up server, or install anything on that
server, ``pghoard`` can fetch the WAL using ``pg_receivewal`` (named
``pg_receivexlog`` before PostgreSQL 10). It also provides its own replication
client as a replacement for ``pg_receivewal``, enabled with the ``walreceiver``
mode. This mode is currently experimental.

PGHoard also supports a traditional ``archive_command`` in the form of the
``pghoard_postgres_command`` utility.


See :ref:`configuration_archiving` for how to configure it.

Retention
---------

``pghoard`` expires the backups according to the configured retention policy.
Whenever there is more than the specified number of backups, older backups will
be removed as well as their associated WAL files.
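
The count-based rule can be sketched in a few lines of Python. This is a
simplified model rather than PGHoard's actual implementation: the ``Backup``
type, its fields and ``split_by_retention`` are hypothetical names, and the
backups are assumed to be ordered from oldest to newest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    name: str       # hypothetical identifier
    start_wal: str  # first WAL segment needed to restore this backup


def split_by_retention(backups, keep):
    """Split backups (ordered oldest to newest) into (expired, kept)."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    cut = max(len(backups) - keep, 0)
    return backups[:cut], backups[cut:]


backups = [
    Backup("b1", "000000010000000000000002"),
    Backup("b2", "000000010000000000000005"),
    Backup("b3", "000000010000000000000009"),
]
expired, kept = split_by_retention(backups, keep=2)
print([b.name for b in expired], [b.name for b in kept])
```

Once the expired backups are known, every WAL segment older than
``kept[0].start_wal`` is no longer needed by any remaining backup.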

Compression and encryption
--------------------------

The PostgreSQL write-ahead log (WAL) and basebackups are compressed with
Snappy (default) in order to ensure good compression speed and relatively small
backup size. Zstandard or LZMA compression is also available. See
:ref:`configuration_compression` for more information.

Encryption is not enabled by default, but PGHoard can encrypt backed-up data at
rest. Each individual file is encrypted and authenticated with file-specific
keys. The file-specific keys are included in the backup, in turn encrypted with
a master RSA private/public key pair.

To enable encryption, follow :ref:`quickstart_encryption` in the quickstart
guide. For a full reference, see :ref:`configuration_encryption`.


Deployment examples
-------------------

FIXME: add schemas showing a deployment of pghoard on the same host with
132 changes: 132 additions & 0 deletions docs/commands.rst
@@ -0,0 +1,132 @@
Commands
========


pghoard
-------

``pghoard`` is the main daemon process that should be run under a service
manager, such as ``systemd`` or ``supervisord``. It handles the backup of
the configured sites.

.. code-block::

    usage: pghoard [-h] [-D] [--version] [-s] [--config CONFIG] [config_file]

    postgresql automatic backup daemon

    positional arguments:
      config_file      configuration file path (for backward compatibility)

    optional arguments:
      -h, --help       show this help message and exit
      -D, --debug      Enable debug logging
      --version        show program version
      -s, --short-log  use non-verbose logging format
      --config CONFIG  configuration file path

.. _commands_restore:

pghoard_restore
---------------

``pghoard_restore`` is a command line tool that can be used to restore a
previous database backup from either ``pghoard`` itself or from one of the
supported object stores. ``pghoard_restore`` can also configure
``recovery.conf`` to use ``pghoard_postgres_command`` as the WAL
``restore_command``.


.. code-block::

    usage: pghoard_restore [-h] [-D] [--status-output-file STATUS_OUTPUT_FILE] [--version]
                           {list-basebackups-http,list-basebackups,get-basebackup} ...

    positional arguments:
      list-basebackups-http
                            List available basebackups from a HTTP source
      list-basebackups
                            List basebackups from an object store
      get-basebackup
                            Download a basebackup from an object store

      -h, --help            show this help message and exit
      -D, --debug           Enable debug logging
      --status-output-file STATUS_OUTPUT_FILE
                            Filename for status output JSON
      --version             show program version

pghoard_archive_cleanup
-----------------------

``pghoard_archive_cleanup`` can be used to clean up any orphan WAL files
from the object store. After the configured number of basebackups has been
exceeded (configuration key ``basebackup_count``), ``pghoard`` deletes the
oldest basebackup and all WAL associated with it. Transient object storage
failures and other interruptions can cause the WAL deletion process to leave
orphan WAL files behind; they can be deleted with this tool.

.. code-block::

    usage: pghoard_archive_cleanup [-h] [--version] [--site SITE] [--config CONFIG] [--dry-run]

      -h, --help       show this help message and exit
      --version        show program version
      --site SITE      pghoard site
      --config CONFIG  pghoard config file
      --dry-run        only list redundant segments and calculate total file size but do not delete
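
To make the idea of an orphan WAL file concrete, here is a minimal Python
sketch of what a dry run could report. It is not the tool's actual logic. It
assumes a single timeline, so that the fixed-width hexadecimal WAL file names
sort in replay order, and ``find_orphan_wal`` and the sample sizes are
hypothetical.

```python
def find_orphan_wal(archived, oldest_backup_start_wal):
    """Return (orphans, total_bytes) for segments no basebackup needs any more.

    ``archived`` maps WAL segment names to their stored size in bytes.
    A segment is treated as an orphan when it sorts before the first WAL
    segment of the oldest remaining basebackup.
    """
    orphans = sorted(name for name in archived if name < oldest_backup_start_wal)
    total = sum(archived[name] for name in orphans)
    return orphans, total


archived = {
    "000000010000000000000003": 4_100_000,
    "000000010000000000000004": 3_900_000,
    "000000010000000000000007": 4_250_000,
}
orphans, total = find_orphan_wal(archived, "000000010000000000000005")
print(orphans, total)
```

Reporting the list and the total size without deleting anything mirrors what
the ``--dry-run`` option is described as doing.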


pghoard_archive_sync
--------------------

``pghoard_archive_sync`` can be used to see if any local files should
be archived but haven't been, or if any of the archived files have unexpected
content and need to be archived again. Its other use case is to determine
whether there are any gaps in the required files in the WAL archive,
from the current WAL file back to the latest basebackup's first WAL file.

.. code-block::

    usage: pghoard_archive_sync [-h] [-D] [--version] [--site SITE] [--config CONFIG]
                                [--max-hash-checks MAX_HASH_CHECKS] [--no-verify] [--create-new-backup-on-failure]

      -h, --help            show this help message and exit
      -D, --debug           Enable debug logging
      --version             show program version
      --site SITE           pghoard site
      --config CONFIG       pghoard config file
      --max-hash-checks MAX_HASH_CHECKS
                            Maximum number of files for which to validate hash in addition to basic existence check
      --no-verify           do not verify archive integrity
      --create-new-backup-on-failure
                            request a new basebackup if verification fails
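
The gap check described above can be sketched in Python. This is an
illustration only and not the tool's implementation. It assumes the default
16 MiB WAL segment size (256 segments per logical log file) and a single
timeline, and ``wal_index``, ``wal_name`` and ``find_gaps`` are hypothetical
helper names.

```python
SEGMENTS_PER_LOG = 0x100  # assumes the default 16 MiB WAL segment size


def wal_index(name):
    """Convert a 24-character WAL file name into (timeline, linear index)."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment name: {name!r}")
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return timeline, log * SEGMENTS_PER_LOG + seg


def wal_name(timeline, index):
    """Inverse of wal_index: build the file name for a linear segment index."""
    log, seg = divmod(index, SEGMENTS_PER_LOG)
    return f"{timeline:08X}{log:08X}{seg:08X}"


def find_gaps(archived, first_required, current):
    """List segments missing from ``archived`` between two segments, inclusive."""
    timeline, start = wal_index(first_required)
    current_timeline, end = wal_index(current)
    if timeline != current_timeline:
        raise ValueError("this sketch only handles a single timeline")
    present = set(archived)
    return [
        wal_name(timeline, i)
        for i in range(start, end + 1)
        if wal_name(timeline, i) not in present
    ]


archived = ["0000000100000000000000FE", "000000010000000100000000"]
gaps = find_gaps(archived, "0000000100000000000000FE", "000000010000000100000000")
print(gaps)
```

Converting names to a linear index is what lets the check step across the
boundary between segment ``FF`` of one log file and segment ``00`` of the next.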

pghoard_create_keys
-------------------

``pghoard_create_keys`` can be used to generate and output encryption keys
in the ``pghoard`` configuration format.

.. code-block::

    usage: pghoard_create_keys [-h] [-D] [--version] [--site SITE] --key-id KEY_ID [--bits BITS] [--config CONFIG]

      -h, --help       show this help message and exit
      -D, --debug      Enable debug logging
      --version        show program version
      --site SITE      backup site
      --key-id KEY_ID  key alias as used with encryption_key_id configuration directive
      --bits BITS      length of the generated key in bits, default 3072
      --config CONFIG  configuration file to store the keys in

pghoard_postgres_command
------------------------

``pghoard_postgres_command`` is a command line tool that can be used as
PostgreSQL's ``archive_command`` or ``restore_command``. It communicates with
the locally running ``pghoard`` webserver to let it know there's a new file that
needs to be compressed, encrypted and stored in an object store (in archive
mode), or the inverse (in restore mode).
56 changes: 56 additions & 0 deletions docs/conf.py
@@ -0,0 +1,56 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
from version import get_project_version


# -- Project information -----------------------------------------------------

project = 'PGHoard'
copyright = '2021, Aiven'
author = 'Aiven'

# The full version, including alpha/beta/rc tags
release = get_project_version('pghoard/version.py')

# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    "sphinx_rtd_theme"
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']