Update MA getting started.
Signed-off-by: Archer <[email protected]>
Naarcha-AWS committed Nov 22, 2024
1 parent b50a3eb commit 1d02534
Showing 2 changed files with 329 additions and 262 deletions.
329 changes: 329 additions & 0 deletions _migrations/getting-started-data-migration.md
---
layout: default
title: Quickstart - Data migration
nav_order: 10
---

# Getting started: Data migration

This document outlines how to deploy the Migration Assistant and migrate existing data using `reindex-from-snapshot` (RFS). It uses AWS for illustration; however, you can modify the steps for use with other cloud providers.

This quickstart does not include steps for deploying and capturing live traffic, which is necessary for a zero-downtime migration.
{: .note}

## Prerequisites and assumptions

Before using this quickstart, make sure you fulfill the following prerequisites:

* Verify that your migration path [is supported](https://opensearch.org/docs/latest/migrations/is-migration-assistant-right-for-you/#supported-migration-paths). Note that we test with the exact versions specified, but you should be able to migrate data on alternative minor versions as long as the major version is supported.
* The source cluster must be deployed with the S3 plugin.
* The target cluster must be deployed.

Using this quickstart assumes the following:

* In this guide, a snapshot is taken and stored in S3. The following assumptions are made about this snapshot:
  * The `_source` flag is enabled on all indexes to be migrated.
  * The snapshot includes the global cluster state (`include_global_state` is `true`).
  * Shard sizes of up to approximately 80 GB are supported. Larger shards cannot be migrated. If this is a blocker, consult the migrations team.
* Migration Assistant will be installed in the same region and have access to both the source snapshot and target cluster.

---

## Step 1: Installing bootstrap on an AWS EC2 instance (approximately 10 minutes)

To begin your migration, use the following steps to install bootstrap on an AWS EC2 instance.

1. Log in to the target AWS account where you want to deploy the Migration Assistant.
2. From the browser where you are logged in to your target AWS account, right-click [here](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?templateURL=https://solutions-reference.s3.amazonaws.com/migration-assistant-for-amazon-opensearch-service/latest/migration-assistant-for-amazon-opensearch-service.template&redirectId=SolutionWeb) and open the link in a new browser tab to load the AWS CloudFormation template.
3. Follow the CloudFormation stack wizard:
   * **Stack Name:** `MigrationBootstrap`
   * **Stage Name:** `dev`
   * Select **Next** on each page, select the acknowledgment on the fourth page, and then select **Submit**.
4. Verify that the bootstrap stack exists and that its status is `CREATE_COMPLETE`. This process takes approximately 10 minutes.
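If you prefer the command line to the console, you can check the stack status described in the preceding step with the AWS CLI. The following is a minimal sketch that assumes the AWS CLI is installed and configured for the target account and that you used the stack name `MigrationBootstrap`; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking the bootstrap stack from the AWS CLI.
# Assumes the AWS CLI is installed and configured for the target AWS account.

# Prints the current CloudFormation status of the stack (default: MigrationBootstrap).
bootstrap_stack_status() {
  local stack_name="${1:-MigrationBootstrap}"
  aws cloudformation describe-stacks \
    --stack-name "$stack_name" \
    --query "Stacks[0].StackStatus" \
    --output text
}

# Succeeds (exit status 0) only when the stack reports CREATE_COMPLETE.
bootstrap_stack_ready() {
  [ "$(bootstrap_stack_status "$@")" = "CREATE_COMPLETE" ]
}
```

For example, `bootstrap_stack_ready && echo "Bootstrap stack is ready"`. To block until creation finishes instead of checking repeatedly, the AWS CLI also provides `aws cloudformation wait stack-create-complete --stack-name MigrationBootstrap`.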

---

## Step 2: Setting up bootstrap instance access (approximately 5 minutes)

Use the following steps to set up bootstrap instance access:

1. After deployment, find the EC2 instance ID for the `bootstrap-dev-instance`.
2. Create an IAM policy using the following snippet, replacing `<aws-region>`, `<aws-account>`, `<stage>`, and `<ec2-instance-id>` with your values:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ssm:StartSession",
      "Resource": [
        "arn:aws:ec2:<aws-region>:<aws-account>:instance/<ec2-instance-id>",
        "arn:aws:ssm:<aws-region>:<aws-account>:document/SSM-<stage>-BootstrapShell"
      ]
    }
  ]
}
```

3. Name the policy (for example, `SSM-OSMigrationBootstrapAccess`), and then create it by selecting **Create policy**.
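Filling in the placeholders by hand is error-prone, so the following minimal sketch renders the same policy from shell arguments. The function name is hypothetical; the policy body matches the preceding snippet.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the bootstrap access policy with placeholders filled in.
# Usage: render_bootstrap_policy <aws-region> <aws-account> <stage> <ec2-instance-id>
render_bootstrap_policy() {
  if [ "$#" -ne 4 ]; then
    echo "usage: render_bootstrap_policy <aws-region> <aws-account> <stage> <ec2-instance-id>" >&2
    return 1
  fi
  local region="$1" account="$2" stage="$3" instance_id="$4"
  # The heredoc expands the variables into the same policy shown in Step 2.
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ssm:StartSession",
      "Resource": [
        "arn:aws:ec2:${region}:${account}:instance/${instance_id}",
        "arn:aws:ssm:${region}:${account}:document/SSM-${stage}-BootstrapShell"
      ]
    }
  ]
}
EOF
}
```

For example, with illustrative values, `render_bootstrap_policy us-east-1 123456789012 dev i-0123456789abcdef0 > bootstrap-access-policy.json` writes the policy to a file (the file name is an assumption). You can then create the policy from the CLI with `aws iam create-policy --policy-name SSM-OSMigrationBootstrapAccess --policy-document file://bootstrap-access-policy.json` instead of using the console.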

---

## Step 3: Logging in to and building the bootstrap instance (approximately 15 minutes)

Next, log in to the bootstrap instance and build it using the following steps.

### Prerequisites

To use these steps, make sure you fulfill the following prerequisites:

* The AWS CLI and the AWS Session Manager plugin are installed on your instance.
* The AWS credentials are configured (`aws configure`) for your instance.
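Before running the login command, you can confirm these prerequisites from a terminal. The following is a minimal sketch with hypothetical function names: it checks that the `aws` and `session-manager-plugin` executables are on your `PATH` and that `aws sts get-caller-identity` can resolve your configured credentials.

```bash
#!/usr/bin/env bash
# Prints each named command that is not found on PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Hypothetical prerequisite check for Step 3.
check_bootstrap_prereqs() {
  if ! missing_tools aws session-manager-plugin; then
    echo "Install the missing tools listed above before continuing." >&2
    return 1
  fi
  # Fails if no valid AWS credentials are configured.
  aws sts get-caller-identity >/dev/null
}
```

Run `check_bootstrap_prereqs`; a zero exit status means both tools were found and your credentials resolved to an AWS identity.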

### Steps

1. Load AWS credentials into your terminal.
2. Log in to the instance using the following command, replacing `<instance-id>` and `<aws-region>`:

```bash
aws ssm start-session --document-name SSM-dev-BootstrapShell --target <instance-id> --region <aws-region> [--profile <profile-name>]
```
3. After logging in, run the following command from the shell of the bootstrap instance, in the `/opensearch-migrations` directory:
```bash
./initBootstrap.sh && cd deployment/cdk/opensearch-service-migration
```
4. After a successful build, note the path used for infrastructure deployment; you will need it in the next step.

---

## Step 4: Configuring and deploying for RFS (approximately 20 minutes)

Use the following steps to configure and deploy RFS:

1. Add the target cluster password to AWS Secrets Manager as an unstructured string. Be sure to copy the secret ARN for use during deployment.
2. From the same shell as the bootstrap instance, modify the `cdk.context.json` file located in the `/opensearch-migrations/deployment/cdk/opensearch-service-migration` directory:

```json
{
  "migration-assistant": {
    "vpcId": "<TARGET CLUSTER VPC ID>",
    "targetCluster": {
      "endpoint": "<TARGET CLUSTER ENDPOINT>",
      "auth": {
        "type": "basic",
        "username": "<TARGET CLUSTER USERNAME>",
        "passwordFromSecretArn": "<TARGET CLUSTER PASSWORD SECRET>"
      }
    },
    "sourceCluster": {
      "endpoint": "<SOURCE CLUSTER ENDPOINT>",
      "auth": {
        "type": "basic",
        "username": "<SOURCE CLUSTER USERNAME>",
        "passwordFromSecretArn": "<SOURCE CLUSTER PASSWORD SECRET>"
      }
    },
    "reindexFromSnapshotExtraArgs": "<RFS PARAMETERS (see below)>",
    "stage": "dev",
    "otelCollectorEnabled": true,
    "migrationConsoleServiceEnabled": true,
    "reindexFromSnapshotServiceEnabled": true,
    "migrationAssistanceEnabled": true
  }
}
```

The source and target cluster authorization can be configured to have no authorization, `basic` with a username and password, or `sigv4`.

3. Bootstrap the account with the following command:

```bash
cdk bootstrap --c contextId=migration-assistant --require-approval never
```

4. Deploy the stacks:

```bash
cdk deploy "*" --c contextId=migration-assistant --require-approval never --concurrency 5
```

5. Verify that all CloudFormation stacks were installed successfully.
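A deployment that still contains unfilled placeholders tends to fail late, so it can help to check `cdk.context.json` before running `cdk deploy`. The following minimal sketch (the function name is hypothetical) lists any remaining `<...>` placeholder strings, such as `<TARGET CLUSTER ENDPOINT>`, together with their line numbers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: lists "<...>" placeholders left in a CDK context file.
# Returns 0 if none remain, 1 if any remain, and 2 if the file is missing.
unfilled_placeholders() {
  local file="${1:-cdk.context.json}"
  if [ ! -f "$file" ]; then
    echo "File not found: $file" >&2
    return 2
  fi
  # grep exits 0 when it finds a match, so a match means placeholders remain.
  if grep -nE '<[^<>"]+>' "$file"; then
    return 1
  fi
  return 0
}
```

Run `unfilled_placeholders` from the `/opensearch-migrations/deployment/cdk/opensearch-service-migration` directory. The pattern would also flag a legitimate value containing angle brackets, which is unlikely for endpoints, VPC IDs, and ARNs.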

### `ReindexFromSnapshot` parameters

If you're creating a snapshot using migration tooling, these parameters are auto-configured. If you're using an existing snapshot, modify the `reindexFromSnapshotExtraArgs` setting with the following values:

```bash
--s3-repo-uri s3://<bucket-name>/<repo> --s3-region <region> --snapshot-name <name>
```

You also need to grant the `migrationconsole` and `reindexFromSnapshot` task roles permission to access the S3 bucket.
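The `reindexFromSnapshotExtraArgs` value is a single string of command-line flags. The following minimal sketch (hypothetical function name) assembles it from its parts so that the flag names match the preceding example exactly.

```bash
#!/usr/bin/env bash
# Hypothetical helper: builds the value for "reindexFromSnapshotExtraArgs".
# Usage: rfs_extra_args <bucket-name> <repo> <region> <snapshot-name>
rfs_extra_args() {
  local bucket="$1" repo="$2" region="$3" snapshot="$4"
  if [ -z "$bucket" ] || [ -z "$repo" ] || [ -z "$region" ] || [ -z "$snapshot" ]; then
    echo "usage: rfs_extra_args <bucket-name> <repo> <region> <snapshot-name>" >&2
    return 1
  fi
  # "--" stops printf from treating the leading "--" of the format as an option.
  printf -- '--s3-repo-uri s3://%s/%s --s3-region %s --snapshot-name %s\n' \
    "$bucket" "$repo" "$region" "$snapshot"
}
```

With illustrative names, `rfs_extra_args my-bucket my-repo us-east-1 my-snapshot` prints `--s3-repo-uri s3://my-bucket/my-repo --s3-region us-east-1 --snapshot-name my-snapshot`, which you can paste into the `reindexFromSnapshotExtraArgs` field of `cdk.context.json`.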

---

## Step 5: Deploying the Migration Assistant

To deploy the Migration Assistant to EC2, use the following steps:

1. Bootstrap the account:

```bash
cdk bootstrap --c contextId=migration-assistant --require-approval never --concurrency 5
```
2. Deploy the stacks when `cdk.context.json` is fully configured:

```bash
cdk deploy "*" --c contextId=migration-assistant --require-approval never --concurrency 3
```

These commands deploy the following stacks:

* Migration assistant network stack
* RFS stack
* Migration console stack

---

## Step 6: Accessing the migration console

Run the following command to access the migration console:

```bash
./accessContainer.sh migration-console dev <region>
```


`accessContainer.sh` is located in `/opensearch-migrations/deployment/cdk/opensearch-service-migration/` on the bootstrap instance. To learn more, see [Accessing the migration console].
{: .note}

---

## Step 7: Checking the connection to the source and target clusters

To verify the connection to the clusters, run:

```bash
console clusters connection-check
```

You should receive the following output:

```
* **Source Cluster:** Successfully connected!
* **Target Cluster:** Successfully connected!
```

To learn more about migration console commands, see [Migration commands].

---

## Step 8: Snapshot creation

Run the following command to create a snapshot of the source cluster:

```bash
console snapshot create [...]
```

To check on the progress of the snapshot creation, use:

```bash
console snapshot status [...]
```

To learn more details about the snapshot, use:

```bash
console snapshot status --deep-check [...]
```

Wait for the snapshot to complete before moving to step 9.
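Rather than rerunning the status command by hand, you can poll it. The following is a generic retry helper (hypothetical, not part of the Migration Assistant). Because this guide does not show the exact text that `console snapshot status` prints when a snapshot finishes, the completion check itself is left to you.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retries a command until it succeeds or the attempt limit is reached.
# Usage: wait_until <interval-seconds> <max-attempts> <command> [args...]
wait_until() {
  local interval="$1" max_attempts="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Gave up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
}
```

For example, `wait_until 60 120 snapshot_complete` checks once a minute for up to about two hours, where `snapshot_complete` is a hypothetical function you write that inspects the output of `console snapshot status` and returns 0 once the snapshot has finished.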

To learn more about snapshot creation, see [Snapshot Creation].

---

## Step 9: Metadata migration

Run the following command to migrate metadata:

```bash
console metadata migrate [...]
```

To learn more, see [Metadata migration].

---

## Step 10: RFS document migration

You can now use RFS to migrate documents from your source cluster:

1. To start the RFS migration, start a backfill using the following command:

```bash
console backfill start
```

2. If you need more workers to handle the number of documents in your cluster, use the following command:

```bash
console backfill scale <NUM_WORKERS>
```

3. To check the status of the document backfill, use the following command:

```bash
console backfill status
```

4. To stop the backfill process, use the following command:

```bash
console backfill stop
```

To learn more, see [Backfill Execution].

---

## Step 11: Backfill monitoring

Use the following command for detailed monitoring:

```bash
console backfill status --deep-check
```

You should receive output similar to the following:

```
BackfillStatus.RUNNING
Running=9
Pending=1
Desired=10
Shards total: 62
Shards completed: 46
Shards incomplete: 16
Shards in progress: 11
Shards unclaimed: 5
```

Logs and metrics are available in CloudWatch in the `OpenSearchMigrations` log group.
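To turn the deep-check output into a single progress figure, you can parse the `Shards total` and `Shards completed` lines. This minimal sketch assumes those lines appear exactly as in the preceding sample output; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "console backfill status --deep-check" output on stdin
# and prints how many shards have completed, as a count and a percentage.
# Exits non-zero if no "Shards total" line is found.
backfill_progress() {
  awk -F': *' '
    /^Shards total:/     { total = $2 }
    /^Shards completed:/ { completed = $2 }
    END {
      if (total == 0) exit 1
      printf "%d/%d shards completed (%.1f%%)\n", completed, total, 100 * completed / total
    }'
}
```

For example, `console backfill status --deep-check | backfill_progress`. For the sample output shown earlier in this step, it prints `46/62 shards completed (74.2%)`.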

---

## Step 12: Verifying that all documents were migrated

Use the following query in CloudWatch Logs Insights to identify failed documents:

```
fields @message
| filter @message like "Bulk request succeeded, but some operations failed."
| sort @timestamp desc
| limit 10000
```

If the query identifies any failed documents, you can index those documents directly instead of using RFS.
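You can also run the same query from the AWS CLI instead of the console. This minimal sketch uses `aws logs start-query` and `aws logs get-query-results`. It assumes that the log group is named `OpenSearchMigrations`, as described in Step 11, and searches the last 24 hours by default; the function name and defaults are hypothetical, so adjust them if your setup differs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: runs the failed-document query with CloudWatch Logs Insights.
# Usage: run_failed_docs_query [log-group-name] [hours-to-search]
run_failed_docs_query() {
  local log_group="${1:-OpenSearchMigrations}" hours="${2:-24}"
  local end start query_id
  end=$(date +%s)
  start=$((end - hours * 3600))
  # start-query returns immediately with a query ID; the query runs asynchronously.
  query_id=$(aws logs start-query \
    --log-group-name "$log_group" \
    --start-time "$start" \
    --end-time "$end" \
    --query-string 'fields @message | filter @message like "Bulk request succeeded, but some operations failed." | sort @timestamp desc | limit 10000' \
    --query queryId \
    --output text) || return 1
  # Give the query time to run; rerun get-query-results if its status is still "Running".
  sleep 10
  aws logs get-query-results --query-id "$query_id"
}
```

For example, `run_failed_docs_query OpenSearchMigrations 48` searches the last two days. Polling with a short delay keeps the sketch simple; for large log groups, check the `status` field in the results and call `get-query-results` again until the query completes.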

For more information, see [Backfill migration].