-
Notifications
You must be signed in to change notification settings - Fork 508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update MA getting started #8797
Changes from 9 commits
1d02534
1d9408c
88794d1
f5a32b9
ce387eb
ac3f873
be3feb4
b1ed1cd
33fd044
42d8273
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,331 @@ | ||
--- | ||
layout: default | ||
title: Quickstart: Data migration | ||
nav_order: 10 | ||
--- | ||
|
||
# Getting started: Data migration | ||
|
||
This quickstart outlines how to deploy Migration Assistant for OpenSearch and execute an existing data migration using `Reindex-from-Snapshot` (RFS). It uses AWS for illustrative purposes. However, the steps can be modified for use with other cloud providers. | ||
|
||
|
||
## Prerequisites and assumptions | ||
|
||
Before using this quickstart, make sure you fulfill the following prerequisites: | ||
|
||
* Verify that your migration path [is supported](https://opensearch.org/docs/latest/migrations/is-migration-assistant-right-for-you/#supported-migration-paths). Note that we test with the exact versions specified, but you should be able to migrate data on alternative minor versions as long as the major version is supported. | ||
* The source cluster must be deployed Amazon Simple Storage Service (Amazon S3) plugin. | ||
* The target cluster must be deployed. | ||
|
||
The steps in this guide assume the following: | ||
|
||
* In this guide, a snapshot will be taken and stored in Amazon S3; the following assumptions are made about this snapshot: | ||
* The `_source` flag is enabled on all indexes to be migrated. | ||
* The snapshot includes the global cluster state (`include_global_state` is `true`). | ||
* Shard sizes of up to approximately 80 GB are supported. Larger shards cannot be migrated. If this presents challenges for your migration, contact the [migration team](https://opensearch.slack.com/archives/C054JQ6UJFK). | ||
* Migration Assistant will be installed in the same AWS Region and have access to both the source snapshot and target cluster. | ||
|
||
--- | ||
|
||
## Step 1: Installing Bootstrap on an Amazon EC2 instance (~10 minutes) | ||
|
||
To begin your migration, use the following steps to install a `bootstrap` box on an Amazon Elastic Compute Cloud (Amazon EC2) instance. The instance uses AWS CloudFormation to create and manage the stack. | ||
|
||
1. Log in to the target AWS account in which you want to deploy Migration Assistant. | ||
2. From the browser where you are logged in to your target AWS account, right-click [here](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?templateURL=https://solutions-reference.s3.amazonaws.com/migration-assistant-for-amazon-opensearch-service/latest/migration-assistant-for-amazon-opensearch-service.template&redirectId=SolutionWeb) to load the CloudFormation template from a new browser tab. | ||
3. Follow the CloudFormation stack wizard: | ||
* **Stack Name:** `MigrationBootstrap` | ||
* **Stage Name:** `dev` | ||
* Choose **Next** after each step > **Acknowledge** > **Submit**. | ||
4. Verify that the Bootstrap stack exists and is set to `CREATE_COMPLETE`. This process takes around 10 minutes to complete. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Line 39: Acknowledge what on the fourth screen? Please clarify. Or is it a UI element: "Acknowledge"? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Acknowledge as a UI element. |
||
--- | ||
|
||
## Step 2: Setting up Bootstrap instance access (~5 minutes) | ||
|
||
Use the following steps to set up Bootstrap instance access: | ||
|
||
1. After deployment, find the EC2 instance ID for the `bootstrap-dev-instance`. | ||
2. Create an AWS Identity and Access Management (IAM) policy using the following snippet, replacing `<aws-region>`, `<aws-account>`, `<stage>`, and `<ec2-instance-id>` with your information: | ||
|
||
```json | ||
{ | ||
"Version": "2012-10-17", | ||
"Statement": [ | ||
{ | ||
"Effect": "Allow", | ||
"Action": "ssm:StartSession", | ||
"Resource": [ | ||
"arn:aws:ec2:<aws-region>:<aws-account>:instance/<ec2-instance-id>", | ||
"arn:aws:ssm:<aws-region>:<aws-account>:document/BootstrapShellDoc-<stage>-<aws-region>" | ||
] | ||
} | ||
] | ||
} | ||
``` | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should there be a {% include copy-curl.html %} here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes. I'll add those to the code examples here. |
||
3. Name the policy, for example, `SSM-OSMigrationBootstrapAccess`, and then create the policy by selecting **Create policy**. | ||
|
||
--- | ||
|
||
## Step 3: Logging in to Bootstrap and building Migration Assistant (~15 minutes) | ||
Check failure on line 71 in _migrations/getting-started-data-migration.md GitHub Actions / vale[vale] _migrations/getting-started-data-migration.md#L71
Raw output
|
||
|
||
Next, log in to to Bootstrap and build Migration Assistant using the following steps. | ||
Check failure on line 73 in _migrations/getting-started-data-migration.md GitHub Actions / vale[vale] _migrations/getting-started-data-migration.md#L73
Raw output
|
||
Naarcha-AWS marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
### Prerequisites | ||
|
||
To use these steps, make sure you fulfill the following prerequisites: | ||
|
||
* The AWS Command Line Interface (AWS CLI) and AWS Session Manager plugin are installed on your instance. | ||
* The AWS credentials are configured (`aws configure`) for your instance. | ||
|
||
### Steps | ||
|
||
1. Load AWS credentials into your terminal. | ||
2. Log in to the instance using the following command, replacing `<instance-id>` and `<aws-region>` with your instance ID and Region: | ||
|
||
```bash | ||
aws ssm start-session --document-name BootstrapShellDoc-<stage>-<aws-region> --target <instance-id> --region <aws-region> [--profile <profile-name>] | ||
``` | ||
|
||
3. Once logged in, run the following command from the shell of the Bootstrap instance in the `/opensearch-migrations` directory: | ||
|
||
```bash | ||
./initBootstrap.sh && cd deployment/cdk/opensearch-service-migration | ||
``` | ||
|
||
4. After a successful build, note the path for infrastructure deployment, which will be used in the next step. | ||
|
||
Naarcha-AWS marked this conversation as resolved.
Show resolved
Hide resolved
|
||
--- | ||
|
||
## Step 4: Configuring and deploying RFS (~20 minutes) | ||
|
||
Use the following steps to configure and deploy RFS: | ||
|
||
1. Add the target cluster password to AWS Secrets Manager as an unstructured string. Be sure to copy the secret Amazon Resource Name (ARN) for use during deployment. | ||
2. From the same shell as the Bootstrap instance, modify the `cdk.context.json` file located in the `/opensearch-migrations/deployment/cdk/opensearch-service-migration` directory: | ||
|
||
```json | ||
{ | ||
"migration-assistant": { | ||
"vpcId": "<TARGET CLUSTER VPC ID>", | ||
"targetCluster": { | ||
"endpoint": "<TARGET CLUSTER ENDPOINT>", | ||
"auth": { | ||
"type": "basic", | ||
"username": "<TARGET CLUSTER USERNAME>", | ||
"passwordFromSecretArn": "<TARGET CLUSTER PASSWORD SECRET>" | ||
} | ||
}, | ||
"sourceCluster": { | ||
"endpoint": "<SOURCE CLUSTER ENDPOINT>", | ||
"auth": { | ||
"type": "basic", | ||
"username": "<TARGET CLUSTER USERNAME>", | ||
"passwordFromSecretArn": "<TARGET CLUSTER PASSWORD SECRET>" | ||
} | ||
}, | ||
"reindexFromSnapshotExtraArgs": "<RFS PARAMETERS (see below)>", | ||
"stage": "dev", | ||
"otelCollectorEnabled": true, | ||
"migrationConsoleServiceEnabled": true, | ||
"reindexFromSnapshotServiceEnabled": true, | ||
"migrationAssistanceEnabled": true | ||
} | ||
} | ||
``` | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should there be a {% include copy-curl.html %} here? |
||
The source and target cluster authorization can be configured to have no authorization, `basic` with a username and password, or `sigv4`. | ||
|
||
3. Bootstrap the account with the following command: | ||
|
||
```bash | ||
cdk bootstrap --c contextId=migration-assistant --require-approval never | ||
``` | ||
|
||
4. Deploy the stacks: | ||
|
||
```bash | ||
cdk deploy "*" --c contextId=migration-assistant --require-approval never --concurrency 5 | ||
``` | ||
|
||
5. Verify that all CloudFormation stacks were installed successfully. | ||
|
||
### RFS parameters | ||
|
||
If you're creating a snapshot using migration tooling, these parameters are automatically configured. If you're using an existing snapshot, modify the `reindexFromSnapshotExtraArgs` setting with the following values: | ||
|
||
```bash | ||
--s3-repo-uri s3://<bucket-name>/<repo> --s3-region <region> --snapshot-name <name> | ||
``` | ||
|
||
You will also need to give the `migrationconsole` and `reindexFromSnapshot` TaskRoles permissions to the S3 bucket. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Line 162: I assume we mean "S3 bucket". Otherwise, "EC2 instance". There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should be S3 bucket. |
||
--- | ||
|
||
## Step 5: Deploying Migration Assistant | ||
Check failure on line 166 in _migrations/getting-started-data-migration.md GitHub Actions / vale[vale] _migrations/getting-started-data-migration.md#L166
Raw output
|
||
|
||
To deploy Migration Assistant, use the following steps: | ||
|
||
1. Bootstrap the account: | ||
|
||
```bash | ||
cdk bootstrap --c contextId=migration-assistant --require-approval never --concurrency 5 | ||
``` | ||
2. Deploy the stacks when `cdk.context.json` is fully configured: | ||
|
||
```bash | ||
cdk deploy "*" --c contextId=migration-assistant --require-approval never --concurrency 3 | ||
``` | ||
|
||
These commands deploy the following stacks: | ||
|
||
* Migration Assistant network stack | ||
* Reindex From Snapshot stack | ||
* Migration console stack | ||
|
||
--- | ||
|
||
## Step 6: Accessing the migration console | ||
|
||
Run the following command to access the migration console: | ||
|
||
```bash | ||
./accessContainer.sh migration-console dev <region> | ||
``` | ||
|
||
|
||
`accessContainer.sh` is located in `/opensearch-migrations/deployment/cdk/opensearch-service-migration/` on the Bootstrap instance. To learn more, see [Accessing the migration console]. | ||
`{: .note} | ||
|
||
--- | ||
|
||
## Step 7: Verifying the connection to the source and target clusters | ||
|
||
To verify the connection to the clusters, run the following command: | ||
|
||
```bash | ||
console clusters connection-check | ||
``` | ||
|
||
You should receive the following output: | ||
|
||
``` | ||
* **Source Cluster:** Successfully connected! | ||
* **Target Cluster:** Successfully connected! | ||
``` | ||
|
||
To learn more about migration console commands, see [Migration commands]. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add link |
||
|
||
--- | ||
|
||
## Step 8: Snapshot creation | ||
|
||
Run the following to initiate creating a snapshot from the source cluster: | ||
Naarcha-AWS marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
```bash | ||
console snapshot create [...] | ||
``` | ||
|
||
To check on the progress of the snapshot creation, use: | ||
Naarcha-AWS marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
```bash | ||
console snapshot status [...] | ||
``` | ||
|
||
To learn more information about the snapshot, run the following command: | ||
|
||
```bash | ||
console snapshot status --deep-check [...] | ||
``` | ||
|
||
Wait for snapshot creation to complete before moving to step 9. | ||
|
||
To learn more about snapshot creation, see [Snapshot Creation]. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Line 244: Add link |
||
--- | ||
|
||
## Step 9: Metadata migration | ||
|
||
Run the following command to migrate metadata: | ||
|
||
```bash | ||
console metadata migrate [...] | ||
``` | ||
|
||
For more information, see [Metadata migration]. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Line 256: Add link |
||
--- | ||
|
||
## Step 10: RFS document migration | ||
|
||
You can now use RFS to migrate documents from your original cluster: | ||
|
||
1. To start the migration from RFS, start a `backfill` using the following command: | ||
|
||
```bash | ||
console backfill start | ||
``` | ||
|
||
2. _(Optional)_ To speed up the migration, increase the number of documents processed at a simultaneously by using the following command: | ||
|
||
```bash | ||
console backfill scale <NUM_WORKERS> | ||
``` | ||
|
||
3. To check the status of the documentation backfill, use the following command: | ||
|
||
```bash | ||
console backfill status | ||
``` | ||
|
||
4. If you need to stop the backfill process, use the following command: | ||
|
||
```bash | ||
console backfill stop | ||
``` | ||
|
||
For more information, see [Backfill execution]. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Line 288: Add link |
||
--- | ||
|
||
## Step 11: Backfill monitoring | ||
|
||
Use the following command for detailed monitoring of the backfill process: | ||
|
||
```bash | ||
console backfill status --deep-check | ||
``` | ||
|
||
You should receive the following output: | ||
|
||
```json | ||
BackfillStatus.RUNNING | ||
Running=9 | ||
Pending=1 | ||
Desired=10 | ||
Shards total: 62 | ||
Shards completed: 46 | ||
Shards incomplete: 16 | ||
Shards in progress: 11 | ||
Shards unclaimed: 5 | ||
``` | ||
|
||
Logs and metrics are available in Amazon CloudWatch in the `OpenSearchMigrations` log group. | ||
|
||
--- | ||
|
||
## Step 12: Verify that all documents were migrated | ||
|
||
Use the following query in CloudWatch Logs Insights to identify failed documents: | ||
|
||
```bash | ||
fields @message | ||
| filter @message like "Bulk request succeeded, but some operations failed." | ||
| sort @timestamp desc | ||
| limit 10000 | ||
``` | ||
|
||
If any failed documents are identified, you can index the failed documents directly as opposed to using RFS. | ||
|
||
For more information, see [Backfill migration]. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add link |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 22: Related to my comment above, where Amazon S3 first appears on its own (not as part of a proper name), it should be defined as "Amazon Simple Storage Service (Amazon S3)". Thereafter, it's fine to use either Amazon S3 or just S3.