feat(backup): add new command dataset backup for server-side backups #5571

Merged
merged 8 commits into from
Feb 12, 2024

Contributor

@j33ty j33ty commented Jan 26, 2024

Description

This PR introduces a backup command for server-side dataset backups: sanity backup -h. The command is primarily intended to be a consumer of the Backup APIs and has very little client-side logic. If you are a Sanity colleague, kindly refer to #prj-dataset-backups for more context.

Usage

The user experience of the CLI has been kept very similar to that of sanity export. The downloaded tar file should be importable through sanity dataset import.

Interaction with server-side backups

; sanity backup -h
usage: sanity backup [--default] [-v|--version] [-d|--debug] [-h|--help] <command> [<args>]

Commands:
   disable   Disable backup for a dataset.
   download  Download a dataset backup to a local file.
   enable    Enable backup for a dataset.
   list      List available backups for a dataset.

See 'sanity help backup <command>' for specific information on a subcommand.

Enable backup for a dataset

Usage

; sanity backup enable -h
usage: sanity backup enable [DATASET_NAME]

   Enable backup for a dataset.

Examples
  sanity backup enable DATASET_NAME

Sample Output

; sanity backup enable production
Enabled daily backups for dataset production.
*Retention policies may apply depending on your plan and agreement.*

List backups

Usage

; sanity backup list -h
usage: sanity backup list [DATASET_NAME]

   List available backups for a dataset.

Options
  --limit <int>     Maximum number of backups returned. Default 30.
  --after <string>  Only return backups after this timestamp (inclusive)
  --before <string> Only return backups before this timestamp (exclusive). Cannot be younger than <after> if specified.

Examples
  sanity backup list DATASET_NAME
  sanity backup list DATASET_NAME --limit 50
  sanity backup list DATASET_NAME --after 2024-01-01 --limit 10
  sanity backup list DATASET_NAME --after 2024-01-01 --before 2024-01-10

Sample Output

; sanity backup list
? Select the dataset name: production
┌──────────┬────────────┬─────────────────────────────────────────────────┐
│ RESOURCE │ CREATED AT │ BACKUP ID                                       │
├──────────┼────────────┼─────────────────────────────────────────────────┤
│ Dataset  │ 2024-01-25 │ 2024-01-25-005f0043-b2c2-4b9b-95a4-4c65222bbdd2 │
│ Dataset  │ 2024-01-25 │ 2024-01-25-40a9c750-142a-473c-8584-0d49fc5b8e6b │
│ Dataset  │ 2024-01-26 │ 2024-01-26-006fd426-4cc6-435f-bbe3-c4ee50068e3b │
│ Dataset  │ 2024-01-26 │ 2024-01-26-58922f75-36cc-4161-8995-3124e9c0434a │
│ Dataset  │ 2024-01-26 │ 2024-01-26-8cb692e4-7ab7-46b9-9917-bf4342074477 │
│ Dataset  │ 2024-01-26 │ 2024-01-26-addfa998-ba2f-4531-95c2-e3633ec8c80b │
└──────────┴────────────┴─────────────────────────────────────────────────┘
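
To illustrate how the --after (inclusive), --before (exclusive), and --limit options combine, here is a minimal TypeScript sketch. It filters an in-memory list purely for illustration; the CLI is described as a thin consumer of the Backup APIs, so the real filtering presumably happens server-side. The BackupEntry shape and the filterBackups helper are hypothetical and not part of the PR.

```typescript
// Hypothetical shape of one row in the `sanity backup list` output.
interface BackupEntry {
  id: string
  createdAt: string // ISO date, e.g. "2024-01-25"
}

// Mirrors the documented flags; the names are illustrative.
interface ListOptions {
  limit?: number // help text states a default of 30
  after?: string // inclusive lower bound
  before?: string // exclusive upper bound
}

// Illustrative only: applies the documented bounds to an in-memory list.
function filterBackups(backups: BackupEntry[], options: ListOptions = {}): BackupEntry[] {
  const {limit = 30, after, before} = options
  return backups
    .filter((b) => after === undefined || Date.parse(b.createdAt) >= Date.parse(after))
    .filter((b) => before === undefined || Date.parse(b.createdAt) < Date.parse(before))
    .slice(0, limit)
}
```

With --after 2024-01-01 --before 2024-01-10, a backup created on 2024-01-01 is kept while one created on 2024-01-10 is dropped.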

Download a backup

Usage

; sanity backup download -h
usage: sanity backup download [DATASET_NAME]

   Download a dataset backup to a local file.

Options
  --backup-id <string> The backup ID to download. (required)
  --out <string>       The file or directory path the backup should download to.
  --overwrite          Allows overwriting of existing backup file.
  --concurrency <num>  Concurrent number of backup item downloads. (max: 24)

Examples
  sanity backup download DATASET_NAME --backup-id 2024-01-01-backup-1
  sanity backup download DATASET_NAME --backup-id 2024-01-01-backup-2 --out /path/to/file
  sanity backup download DATASET_NAME --backup-id 2024-01-01-backup-3 --out /path/to/file --overwrite

Sample Output

; sanity backup download production --backup-id 2024-01-25-005f0043-b2c2-4b9b-95a4-4c65222bbdd2 --out backup/ --overwrite --concurrency 5
╭───────────────────────────────────────────────────────────╮
│                                                           │
│ Downloading backup for:                                   │
│ projectId: h5hc8cgs                                       │
│ dataset: production                                       │
│ backupId: 2024-01-25-005f0043-b2c2-4b9b-95a4-4c65222bbdd2 │
│                                                           │
╰───────────────────────────────────────────────────────────╯

Downloading backup to "/Users/ry/Documents/sanity/sanity/dev/test-studio/out/production-backup-2024-02-02-19083f95-1741-4e29-997a-76b56c773e24.tar.gz"
✔ [2.2s] Setting up backup environment...
✔ [5.4s] Reading backup files... (545/545)
✔ [2.3s] Downloading documents and assets... (545/545)
✔ [130ms] Archiving files into a tarball, 4.16 MB bytes written...
✔ [1ms] Cleaning up temporary files at /var/folders/1/foo/A/backup-1707275511960-78377
✔ Backup download complete [10.2s]

Disable backup for a dataset

Usage

; sanity backup disable -h
usage: sanity backup disable [DATASET_NAME]

   Disable backup for a dataset.

Examples
  sanity backup disable DATASET_NAME

Sample Output

; sanity backup disable production
Disabled daily backups for dataset 

Potential Optimizations

  1. For sanity backup download, we first iterate over all the backup files and only then download them. This may not scale when the number of files is huge. To handle this, we could start downloading files as soon as they are read.
  2. For sanity backup download, if possible, stream downloaded files directly into the tarball without writing them to disk first.
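
The first optimization could be sketched roughly as follows: rather than collecting every file entry before downloading, the listing is consumed as an async iterator and each download starts as soon as its entry is read, with a cap on the number of in-flight downloads. This is a sketch under stated assumptions, not the PR's implementation; downloadWhileReading, the FileEntry shape, and the download callback are all hypothetical.

```typescript
// Hypothetical entry yielded while the backup file listing is being read.
interface FileEntry {
  name: string
  url: string
}

// Starts downloads as entries arrive, keeping at most `concurrency` in flight.
// A failed download rejects the whole run; retries are out of scope here.
async function downloadWhileReading(
  entries: AsyncIterable<FileEntry>,
  download: (entry: FileEntry) => Promise<void>,
  concurrency: number,
): Promise<number> {
  const inFlight = new Set<Promise<void>>()
  let completed = 0

  for await (const entry of entries) {
    const task: Promise<void> = download(entry).then(() => {
      completed += 1
      inFlight.delete(task)
    })
    inFlight.add(task)

    // Back-pressure: stop reading further entries until a slot frees up.
    if (inFlight.size >= concurrency) {
      await Promise.race(inFlight)
    }
  }

  // Wait for the remaining downloads before reporting the total.
  await Promise.all(inFlight)
  return completed
}
```

Because reading pauses while all slots are busy, memory use stays bounded by the concurrency limit rather than by the total number of files.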

Notes for release

Since we are not marketing this feature at the moment, it is okay to omit these commits from the public release notes.


@j33ty j33ty force-pushed the feat/dataset-backups-cli branch 2 times, most recently from 69c224d to e260037 Compare January 26, 2024 22:27

socket-security bot commented Jan 26, 2024

New dependencies detected. Learn more about Socket for GitHub ↗︎

Package New capabilities Transitives Size Publisher
npm/@types/[email protected] None +2 88.1 kB types
npm/[email protected] Transitive: environment, filesystem +19 279 kB ctalkington
npm/[email protected] filesystem Transitive: environment +36 1.48 MB ctalkington
npm/[email protected] None 0 54.7 kB dirtyhairy

View full report↗︎

@j33ty j33ty force-pushed the feat/dataset-backups-cli branch from e260037 to b5031c9 Compare January 29, 2024 15:46
@j33ty j33ty force-pushed the feat/dataset-backups-cli branch from 5c6cd3f to e7bd85a Compare February 5, 2024 13:10
@j33ty j33ty force-pushed the feat/dataset-backups-cli branch from a6e4371 to b9d56e0 Compare February 7, 2024 03:24
@j33ty j33ty marked this pull request as ready for review February 7, 2024 10:04
@j33ty j33ty requested a review from a team as a code owner February 7, 2024 10:04
@j33ty j33ty requested review from binoy14 and removed request for a team February 7, 2024 10:04
@j33ty j33ty self-assigned this Feb 9, 2024
@j33ty j33ty requested review from bjoerge and binoy14 February 9, 2024 12:20
Member

@bjoerge bjoerge left a comment

Looks great! Left a few comments, mostly nice-to-haves.

import {CliCommandGroupDefinition} from '@sanity/cli'

// defaultApiVersion is the backend API version used for dataset backup.
export const defaultApiVersion = 'vX'
Member

Should this not be a proper API version instead of vX? If it's intentional to keep it at vX, can you add a comment explaining why and what it would take to move it to an official version?

Contributor Author

This should be released with the vX Content Lake API version. Added the reason inline:

// First version of the backup API is vX since this feature is not yet released
// and formal API documentation is pending.

import cleanupTmpDir from '../../actions/backup/cleanupTmpDir'
import {defaultApiVersion} from './backupGroup'

const debug = require('debug')('sanity:backup')
Member

Not a blocker, but ideally this should be

Suggested change
- const debug = require('debug')('sanity:backup')
+ import createDebug from 'debug'
+ const debug = createDebug('sanity:backup')

Contributor Author

Fixed it.

const DEFAULT_DOWNLOAD_CONCURRENCY = 10
const MAX_DOWNLOAD_CONCURRENCY = 24

type DownloadBackupOptions = {
Member

(Nit, so feel free to ignore: we tend to use interface when we can)

Contributor Author

Changed type to interface.
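
For context, here is a minimal sketch of how the two constants in this hunk might bound the --concurrency flag, with the options declared as an interface as suggested in the review. The fields shown on DownloadBackupOptions and the resolveConcurrency helper are assumptions for illustration; the PR's actual validation may differ (for example, it could clamp the value instead of throwing).

```typescript
const DEFAULT_DOWNLOAD_CONCURRENCY = 10
const MAX_DOWNLOAD_CONCURRENCY = 24

// Illustrative subset of fields; the PR's actual fields are not shown in this thread.
interface DownloadBackupOptions {
  backupId: string
  concurrency: number
}

// Falls back to the default when the flag is absent and rejects values outside 1..24.
function resolveConcurrency(flag?: number): number {
  if (flag === undefined) return DEFAULT_DOWNLOAD_CONCURRENCY
  if (!Number.isInteger(flag) || flag < 1 || flag > MAX_DOWNLOAD_CONCURRENCY) {
    throw new Error(`concurrency should be between 1 and ${MAX_DOWNLOAD_CONCURRENCY}`)
  }
  return flag
}

const exampleOptions: DownloadBackupOptions = {
  backupId: '2024-01-01-backup-1',
  concurrency: resolveConcurrency(5),
}
```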

// Create a unique temporary directory to store files before bundling them into the archive at outputPath.
// Temporary directories are normally deleted at the end of backup process, any unexpected exit may leave them
// behind, hence it is important to create a unique directory for each attempt.
const tmpOutDir = await mkdtemp(path.join(tmpdir(), `backup-`))
Member

It would be great if we could add sanity- to the temp directory name here, so we leave a hint about who made it.

Contributor Author

Good point. Added it.
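
A minimal sketch of what the prefixed temporary directory could look like, assuming Node's fs/promises API. The exact prefix string and the createTmpOutDir and withTmpOutDir helpers are hypothetical; the merged code may name things differently.

```typescript
import {mkdtemp, rm} from 'node:fs/promises'
import {tmpdir} from 'node:os'
import {join} from 'node:path'

// Assumed prefix based on the review suggestion; the merged name may differ.
const TMP_DIR_PREFIX = 'sanity-backup-'

// mkdtemp appends six random characters to the prefix, so each attempt
// gets its own directory even if an earlier run left one behind.
async function createTmpOutDir(): Promise<string> {
  return mkdtemp(join(tmpdir(), TMP_DIR_PREFIX))
}

// Runs `work` with a fresh directory and removes it afterwards, even if `work` throws.
async function withTmpOutDir<T>(work: (dir: string) => Promise<T>): Promise<T> {
  const dir = await createTmpOutDir()
  try {
    return await work(dir)
  } finally {
    await rm(dir, {recursive: true, force: true})
  }
}
```

A leftover directory named sanity-backup-XXXXXX in the system temp folder is then easy to attribute to the CLI.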

Contributor

@binoy14 binoy14 left a comment

Most of these are nit-picky comments or questions. Not a blocker.

  context: CliCommandContext,
  args: CliCommandArguments,
): Promise<[SanityClient, DownloadBackupOptions]> {
  const flags = args.extOptions
Contributor

Take a look at this PR. I believe we want to use yargs to parse the flags: it enforces types, and in the future we want to make sure all flags are defined properly. It will also help remove some of the type-checking code below.

Contributor Author

Good point. Updated code to follow this pattern.

  helpText,
  action: async (args, context) => {
    const {output, chalk} = context
    const flags = args.extOptions
Contributor

Same comment here: it would be good to re-parse the arguments using yargs.

Contributor Author

This is fixed as well.

  } catch (error) {
    const msg = error.statusCode
      ? error.response.body.message
      : error.message || error.statusMessage
Contributor

I was able to get this message. Maybe add a default message if none of the message fields are populated?

Screenshot 2024-02-09 at 12 00 01 PM

Contributor Author

Fixed in the latest commit so that the proper error is printed.
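
One way the suggested fallback could look is sketched below. It reads the same fields as the snippet under review but uses optional chaining and ends with a default string. The MaybeHttpError shape, the getErrorMessage helper, and the default text are assumptions for illustration, not the code from the fixing commit.

```typescript
// Hypothetical shape covering the fields read in the reviewed snippet.
interface MaybeHttpError {
  statusCode?: number
  statusMessage?: string
  message?: string
  response?: {body?: {message?: string}}
}

// Assumed wording; the actual default message is not shown in the thread.
const DEFAULT_ERROR_MESSAGE = 'Unknown error'

// Picks the most specific message available and never returns undefined.
function getErrorMessage(error: unknown): string {
  const err = (error ?? {}) as MaybeHttpError
  // Prefer the API response body when the error carries an HTTP status code.
  const fromBody = err.statusCode ? err.response?.body?.message : undefined
  return fromBody || err.message || err.statusMessage || DEFAULT_ERROR_MESSAGE
}
```

Unlike the original ternary, this version also falls through to message and statusMessage when an HTTP error arrives without a response body.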

codebymatt and others added 8 commits February 12, 2024 18:02
…et backup command, refactor enable/disable commands, fix backup list CLI to use correct response type and error handling
…selection for backup ID and dataset names, add progress bar
…le names that contain a path segment to prevent archiving failure
… spinner, void using long names for temporary dir to prevent hitting max length limit, handle archive warning, enable compression of archived file by default
…rency safe download of documents, refactor code into modules that can be easily tested, improve progress tracking for dataset backup, install progress-stream correctly, address review feedback, fix API usage for list backups
…odule namespace in imports, use interface in place of type, handle unhandled rejection
@j33ty j33ty force-pushed the feat/dataset-backups-cli branch from 6a27918 to b0622b9 Compare February 12, 2024 17:08
Contributor Author

j33ty commented Feb 12, 2024

I have rebased onto the latest next branch, which includes the new linter config, and reformatted the files in this PR.

Contributor

@binoy14 binoy14 left a comment

LGTM, thanks!

@j33ty j33ty added this pull request to the merge queue Feb 12, 2024
Merged via the queue into next with commit f04c76e Feb 12, 2024
40 checks passed
@j33ty j33ty deleted the feat/dataset-backups-cli branch February 12, 2024 20:38