Merge pull request #1127 from NASA-IMPACT/1126-managepy-command-for-database-backups

1126 managepy command for database backups
CarsonDavis authored Dec 10, 2024
2 parents 2b811b6 + a9e63bb commit c889c15
Showing 7 changed files with 826 additions and 33 deletions.
95 changes: 68 additions & 27 deletions README.md
@@ -70,56 +70,97 @@ $ docker-compose -f local.yml run --rm django python manage.py createsuperuser

Create additional users through the admin interface (/admin).

### Loading Fixtures

To load collections:

```bash
$ docker-compose -f local.yml run --rm django python manage.py loaddata sde_collections/fixtures/collections.json
```

### Database Backup and Restore

COSMOS provides dedicated management commands for backing up and restoring your PostgreSQL database. These commands handle both compressed and uncompressed backups and automatically detect your server environment from your configuration.

#### Creating a Database Backup

To create a backup of your database:

```bash
# Create a compressed backup (recommended)
docker-compose -f local.yml run --rm django python manage.py database_backup

# Create an uncompressed backup
docker-compose -f local.yml run --rm django python manage.py database_backup --no-compress

# Specify custom output location
docker-compose -f local.yml run --rm django python manage.py database_backup --output /path/to/output.sql
```

The backup command will automatically:
- Detect your server environment (Production/Staging/Local)
- Use database credentials from your environment settings
- Generate a dated filename if no output path is specified
- Compress the backup by default (can be disabled with --no-compress)
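
With no `--output` argument, the backup is written as `<environment>_backup_<YYYYMMDD>.sql` (plus `.gz` when compression is enabled). For example, on a production host (the date shown is illustrative):

```bash
docker-compose -f production.yml run --rm django python manage.py database_backup
# -> production_backup_20241210.sql.gz
```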

#### Restoring from a Database Backup

To restore your database from a backup:

```bash
# Restore from a backup (handles both .sql and .sql.gz files)
docker-compose -f local.yml run --rm django python manage.py database_restore path/to/backup.sql[.gz]
```

The restore command will:
- Automatically detect if the backup is compressed (.gz)
- Terminate existing database connections
- Drop and recreate the database
- Restore all data from the backup
- Handle all database credentials from your environment settings
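
For example, to load the compressed production backup generated above into your local database (the filename is illustrative):

```bash
docker-compose -f local.yml run --rm django python manage.py database_restore production_backup_20241210.sql.gz
```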

#### Working with Remote Servers

When working with production or staging servers:

1. First, SSH into the appropriate server:
```bash
# For production
ssh user@production-server
cd /path/to/project

# For staging
ssh user@staging-server
cd /path/to/project
```

2. Then run the backup command with the production configuration:
```bash
docker-compose -f production.yml run --rm django python manage.py database_backup
```

3. Copy the backup to your local machine:
```bash
scp user@remote-server:/path/to/backup.sql.gz ./local-backup.sql.gz
```

4. Finally, restore locally:
```bash
docker-compose -f local.yml run --rm django python manage.py database_restore local-backup.sql.gz
```

#### Alternative Methods

While the database_backup and database_restore commands are the recommended approach, there are alternative methods available:

##### Using JSON Fixtures (for smaller datasets)
If you're working with a smaller dataset, you can use Django's built-in fixtures:

```bash
# Create a backup excluding content types
docker-compose -f production.yml run --rm --user root django python manage.py dumpdata \
--natural-foreign --natural-primary \
--exclude=contenttypes --exclude=auth.Permission \
--indent 2 \
--output /app/backups/prod_backup-$(date +%Y%m%d).json

# Restore from a fixture
docker-compose -f local.yml run --rm django python manage.py loaddata /path/to/backup.json
```

Note: For large databases (>1.5GB), the database_backup and database_restore commands are strongly recommended over JSON fixtures as they handle large datasets more efficiently.

## Additional Commands

7 changes: 7 additions & 0 deletions compose/local/django/Dockerfile
@@ -38,13 +38,20 @@ WORKDIR ${APP_HOME}

# Install required system dependencies
RUN apt-get update && apt-get install --no-install-recommends -y \
wget \
gnupg \
# psycopg2 dependencies
libpq-dev \
# Translations dependencies
gettext \
# pycurl dependencies
libcurl4-openssl-dev \
libssl-dev \
# PostgreSQL 15
&& sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt bullseye-pgdg main" > /etc/apt/sources.list.d/pgdg.list' \
&& wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - \
&& apt-get update \
&& apt-get install -y postgresql-15 postgresql-client-15 \
# cleaning up unused files
&& apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false \
&& rm -rf /var/lib/apt/lists/*
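
The `postgresql-15`/`postgresql-client-15` packages provide the PostgreSQL 15 client tools (such as `pg_dump`) that the new `database_backup` and `database_restore` management commands rely on. A quick way to confirm the client tools are present in the image (a generic check, not part of the repo) is:

```bash
docker-compose -f local.yml run --rm django pg_dump --version
```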
14 changes: 8 additions & 6 deletions compose/production/django/Dockerfile
@@ -23,7 +23,6 @@ COPY ./requirements .
RUN pip wheel --wheel-dir /usr/src/app/wheels \
-r ${BUILD_ENVIRONMENT}.txt


# Python 'run' stage
FROM python AS python-run-stage

@@ -39,16 +38,22 @@ WORKDIR ${APP_HOME}
RUN addgroup --system django \
&& adduser --system --ingroup django django


# Install required system dependencies
RUN apt-get update && apt-get install --no-install-recommends -y \
wget \
gnupg \
# psycopg2 dependencies
libpq-dev \
# Translations dependencies
gettext \
# pycurl dependencies
libcurl4-openssl-dev \
libssl-dev \
# PostgreSQL 15
&& sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt bullseye-pgdg main" > /etc/apt/sources.list.d/pgdg.list' \
&& wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - \
&& apt-get update \
&& apt-get install -y postgresql-15 postgresql-client-15 \
# cleaning up unused files
&& apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false \
&& rm -rf /var/lib/apt/lists/*
@@ -61,25 +66,22 @@ COPY --from=python-build-stage /usr/src/app/wheels /wheels/
RUN pip install --no-cache-dir --no-index --find-links=/wheels/ /wheels/* \
&& rm -rf /wheels/


COPY --chown=django:django ./compose/production/django/entrypoint /entrypoint
RUN sed -i 's/\r$//g' /entrypoint
RUN chmod +x /entrypoint


COPY --chown=django:django ./compose/production/django/start /start
RUN sed -i 's/\r$//g' /start
RUN chmod +x /start

COPY --chown=django:django ./compose/production/django/celery/worker/start /start-celeryworker
RUN sed -i 's/\r$//g' /start-celeryworker
RUN chmod +x /start-celeryworker


COPY --chown=django:django ./compose/production/django/celery/beat/start /start-celerybeat
RUN sed -i 's/\r$//g' /start-celerybeat
RUN chmod +x /start-celerybeat


COPY ./compose/production/django/celery/flower/start /start-flower
RUN sed -i 's/\r$//g' /start-flower
RUN chmod +x /start-flower
142 changes: 142 additions & 0 deletions sde_collections/management/commands/database_backup.py
@@ -0,0 +1,142 @@
"""
Management command to backup PostgreSQL database.
Usage:
docker-compose -f local.yml run --rm django python manage.py database_backup
docker-compose -f local.yml run --rm django python manage.py database_backup --no-compress
docker-compose -f local.yml run --rm django python manage.py database_backup --output /path/to/output.sql
docker-compose -f production.yml run --rm django python manage.py database_backup
"""

import enum
import gzip
import os
import shutil
import socket
import subprocess
from contextlib import contextmanager
from datetime import datetime

from django.conf import settings
from django.core.management.base import BaseCommand


class Server(enum.Enum):
PRODUCTION = "PRODUCTION"
STAGING = "STAGING"
UNKNOWN = "UNKNOWN"


def detect_server() -> Server:
hostname = socket.gethostname().upper()
if "PRODUCTION" in hostname:
return Server.PRODUCTION
elif "STAGING" in hostname:
return Server.STAGING
return Server.UNKNOWN


@contextmanager
def temp_file_handler(filename: str):
"""Context manager to handle temporary files, ensuring cleanup."""
try:
yield filename
finally:
if os.path.exists(filename):
os.remove(filename)


class Command(BaseCommand):
help = "Creates a PostgreSQL backup using pg_dump"

def add_arguments(self, parser):
parser.add_argument(
"--no-compress",
action="store_true",
help="Disable backup file compression (enabled by default)",
)
parser.add_argument(
"--output",
type=str,
help="Output file path (default: auto-generated based on server name and date)",
)

def get_backup_filename(self, server: Server, compress: bool, custom_output: str = None) -> tuple[str, str]:
"""Generate backup filename and actual dump path.
Args:
server: Server enum indicating the environment
compress: Whether the output should be compressed
custom_output: Optional custom output path
Returns:
tuple[str, str]: A tuple containing (final_filename, temp_filename)
- final_filename: The name of the final backup file (with .gz if compressed)
- temp_filename: The name of the temporary dump file (always without .gz)
"""
if custom_output:
# Ensure the output directory exists
output_dir = os.path.dirname(custom_output)
if output_dir:
os.makedirs(output_dir, exist_ok=True)

if compress:
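                # e.g. --output backups/db.sql with compression -> ("backups/db.sql.gz", "backups/db.sql")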
return custom_output + (".gz" if not custom_output.endswith(".gz") else ""), custom_output.removesuffix(
".gz"
)
return custom_output, custom_output
else:
date_str = datetime.now().strftime("%Y%m%d")
temp_filename = f"{server.value.lower()}_backup_{date_str}.sql"
final_filename = f"{temp_filename}.gz" if compress else temp_filename
return final_filename, temp_filename

def run_pg_dump(self, output_file: str, env: dict) -> None:
"""Execute pg_dump with given parameters."""
db_settings = settings.DATABASES["default"]
cmd = [
"pg_dump",
"-h",
db_settings["HOST"],
"-U",
db_settings["USER"],
"-d",
db_settings["NAME"],
"--no-owner",
"--no-privileges",
"-f",
output_file,
]
subprocess.run(cmd, env=env, check=True)

def compress_file(self, input_file: str, output_file: str) -> None:
"""Compress input file to output file using gzip."""
with open(input_file, "rb") as f_in:
with gzip.open(output_file, "wb") as f_out:
shutil.copyfileobj(f_in, f_out)

def handle(self, *args, **options):
server = detect_server()
compress = not options["no_compress"]
backup_file, dump_file = self.get_backup_filename(server, compress, options.get("output"))

env = os.environ.copy()
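        # pg_dump reads the connection password from the PGPASSWORD environment variable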
env["PGPASSWORD"] = settings.DATABASES["default"]["PASSWORD"]

try:
if compress:
with temp_file_handler(dump_file):
self.run_pg_dump(dump_file, env)
self.compress_file(dump_file, backup_file)
else:
self.run_pg_dump(backup_file, env)

self.stdout.write(
self.style.SUCCESS(
f"Successfully created {'compressed ' if compress else ''}backup for {server.value}: {backup_file}"
)
)
except subprocess.CalledProcessError as e:
self.stdout.write(self.style.ERROR(f"Backup failed on {server.value}: {str(e)}"))
except Exception as e:
self.stdout.write(self.style.ERROR(f"Error during backup process: {str(e)}"))
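
The companion `database_restore` command is one of the seven changed files but is not reproduced on this page. Based on the behavior the README describes (detect compression, terminate connections, drop and recreate the database, restore the dump), a minimal sketch of that flow might look like the following; this is an assumption-laden illustration, not the actual implementation merged in this PR:

```python
# Hypothetical sketch of the restore flow described in the README -- not the
# actual database_restore implementation from this PR.
import gzip
import os
import shutil
import subprocess

from django.conf import settings
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Restores the PostgreSQL database from a .sql or .sql.gz backup (sketch)"

    def add_arguments(self, parser):
        parser.add_argument("backup_file", type=str, help="Path to the backup file")

    def handle(self, *args, **options):
        db = settings.DATABASES["default"]
        env = os.environ.copy()
        env["PGPASSWORD"] = db["PASSWORD"]
        backup = options["backup_file"]

        # Decompress .gz backups to a plain .sql file first
        sql_file = backup
        if backup.endswith(".gz"):
            sql_file = backup.removesuffix(".gz")
            with gzip.open(backup, "rb") as f_in, open(sql_file, "wb") as f_out:
                shutil.copyfileobj(f_in, f_out)

        base = ["-h", db["HOST"], "-U", db["USER"]]

        # Terminate open connections, then drop and recreate the database
        terminate_sql = (
            "SELECT pg_terminate_backend(pid) FROM pg_stat_activity "
            f"WHERE datname = '{db['NAME']}' AND pid <> pg_backend_pid();"
        )
        subprocess.run(["psql", *base, "-d", "postgres", "-c", terminate_sql], env=env, check=True)
        subprocess.run(["dropdb", *base, "--if-exists", db["NAME"]], env=env, check=True)
        subprocess.run(["createdb", *base, db["NAME"]], env=env, check=True)

        # Load the dump into the freshly created database
        subprocess.run(["psql", *base, "-d", db["NAME"], "-f", sql_file], env=env, check=True)
        self.stdout.write(self.style.SUCCESS(f"Restored database from {backup}"))
```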