Migration assistant intro #8691

Merged (15 commits), Nov 26, 2024
Changes from 10 commits
59 changes: 0 additions & 59 deletions _migrations/Solution-Overview.md

This file was deleted.

57 changes: 34 additions & 23 deletions _migrations/index.md
@@ -9,51 +9,62 @@

# Migration Assistant for OpenSearch

Migration Assistant for OpenSearch helps you perform an end-to-end, zero-downtime migration to OpenSearch from other search providers. It supports the following scenarios:

- **Metadata migration**: Migrating cluster metadata, such as index settings, aliases, and templates.
- **Backfill migration**: Migrating existing or historical data from a source to a target cluster.
- **Live traffic migration**: Replicating live, ongoing traffic from a source cluster to a target cluster.
- **Comparative tooling**: Comparing the performance and behaviors of an existing cluster with a prospective new one.

This user guide focuses on conducting a comprehensive migration involving both existing and live data with zero downtime and the option to back out of a migration.

Migration strategies are not universally applicable. This guide provides a detailed methodology based on the assumptions described throughout and emphasizes the robust engineering practices needed for a successful migration.
{: .tip }

## Key components of Migration Assistant

The following are the key components of Migration Assistant.

### Elasticsearch/OpenSearch source

The source cluster runs Elasticsearch or OpenSearch and is hosted on Amazon EC2 instances or a similar compute environment. A proxy that interacts with the source cluster is placed either in front of the cluster or directly on its coordinating nodes.

### Migration management console

The migration management console provides a migration-specific CLI and a variety of tools that streamline the migration process. Except for cleaning up migration resources, you can complete every part of a migration through this console.

### Traffic capture proxy

This component is designed for HTTP RESTful traffic. It forwards traffic to the source cluster and also splits and channels this traffic to a stream-processing service for later playback.
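To make the proxy's dual role concrete, the following is a minimal Python sketch, not the actual proxy implementation. It assumes that each captured record holds both the request and the source cluster's response, and it uses an in-memory `queue.Queue` as a stand-in for the stream-processing service. The names `CaptureProxy`, `HttpRequest`, `HttpResponse`, and `CapturedExchange` are hypothetical.

```python
import queue
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class HttpRequest:
    method: str
    path: str
    body: bytes = b""


@dataclass(frozen=True)
class HttpResponse:
    status: int
    body: bytes = b""


@dataclass(frozen=True)
class CapturedExchange:
    """One request/response pair as it might be published to the stream."""
    sequence: int
    request: HttpRequest
    response: HttpResponse


class CaptureProxy:
    """Hypothetical capture proxy: forwards each request to the source cluster
    and publishes a copy of the exchange for later replay."""

    def __init__(self, forward: Callable[[HttpRequest], HttpResponse],
                 stream: "queue.Queue[CapturedExchange]") -> None:
        self._forward = forward  # stand-in for the HTTP call to the source cluster
        self._stream = stream    # stand-in for a Kafka topic
        self._sequence = 0       # preserves the original request order

    def handle(self, request: HttpRequest) -> HttpResponse:
        # The client always receives the source cluster's response.
        response = self._forward(request)
        # Publish a copy so that it can be replayed against the target later.
        self._stream.put(CapturedExchange(self._sequence, request, response))
        self._sequence += 1
        return response
```

This sketch handles one request at a time; a real proxy serving concurrent connections would need to synchronize the sequence counter or rely on the stream's own ordering guarantees.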

### Traffic Replayer

Acting as a traffic simulation tool, the Traffic Replayer replays recorded request traffic to a target cluster, mirroring source traffic patterns. It links original requests and their responses to those directed at the target cluster, facilitating comparative analysis.
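The pairing of source and target responses can be sketched as follows. This is a hypothetical illustration, not the Traffic Replayer's code: `Recorded`, `ReplayResult`, `replay`, and `summarize` are invented names, the target cluster is represented by a plain callable that returns an HTTP status code, and only status codes are compared. A fuller comparison could also look at response bodies and latency.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Recorded:
    """A captured source request plus the source cluster's status (hypothetical shape)."""
    method: str
    path: str
    body: bytes
    source_status: int


@dataclass(frozen=True)
class ReplayResult:
    recorded: Recorded
    target_status: int

    @property
    def matches(self) -> bool:
        # A mismatch flags a behavioral difference worth investigating.
        return self.recorded.source_status == self.target_status


def replay(records: Iterable[Recorded],
           send_to_target: Callable[[str, str, bytes], int]) -> List[ReplayResult]:
    """Replay records in their original order, pairing each with the target's status."""
    results = []
    for rec in records:
        status = send_to_target(rec.method, rec.path, rec.body)
        results.append(ReplayResult(rec, status))
    return results


def summarize(results: List[ReplayResult]) -> Tuple[int, int]:
    """Return (total, mismatched) counts for a quick comparison."""
    mismatched = sum(1 for r in results if not r.matches)
    return len(results), mismatched
```

Replaying in the original order matters because later requests, such as searches, can depend on the effects of earlier writes.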

### Metadata migration tool

The metadata migration tool is integrated into the migration CLI and can also be used independently to migrate cluster metadata, including index mappings, index configuration settings, templates, component templates, and aliases.
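As a rough sketch of one part of metadata migration, the following hypothetical function turns the response of a `GET <index>/_settings` request (in its default nested, non-flat format) into a create-index request body. It assumes that settings the cluster manages itself, such as `index.uuid` and `index.creation_date`, must be stripped before the index is created on the target. The exact set of settings the metadata migration tool handles may differ, and `to_create_index_body` is an invented name.

```python
import copy
from typing import Any, Dict, Optional

# Index settings that the cluster assigns itself. Including them in a
# create-index request typically causes an error. Illustrative, not exhaustive.
SYSTEM_MANAGED_KEYS = ("uuid", "creation_date", "version", "provided_name")


def to_create_index_body(settings_response: Dict[str, Any], index: str,
                         mappings: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a create-index body for `index` from a GET <index>/_settings response."""
    # Deep-copy so that the caller's source-cluster data is never mutated.
    index_settings = copy.deepcopy(settings_response[index]["settings"]["index"])
    for key in SYSTEM_MANAGED_KEYS:
        index_settings.pop(key, None)
    body: Dict[str, Any] = {"settings": {"index": index_settings}}
    if mappings is not None:
        body["mappings"] = mappings
    return body
```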

### Reindex-from-Snapshot

Reindex-from-Snapshot (RFS) reindexes data from an existing snapshot. Workers on Amazon Elastic Container Service (Amazon ECS) coordinate the migration of documents from the snapshot and reindex them in parallel to the target cluster.
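The parallel reindexing idea can be sketched with threads that claim shards from a shared work queue and send batches to the `_bulk` API in its newline-delimited JSON format. This is a simplified, hypothetical model: documents are passed in memory rather than read from snapshot files, `send_bulk` stands in for an HTTP call to the target cluster, `run_workers` and `bulk_body` are invented names, and the coordination mechanism that RFS actually uses between ECS workers may differ.

```python
import json
import queue
import threading
from typing import Callable, Dict, Iterable, List, Tuple

Doc = Tuple[str, Dict]  # (document _id, _source)


def bulk_body(index: str, docs: Iterable[Doc]) -> str:
    """Build an NDJSON _bulk body: one action line and one source line per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the _bulk API requires a trailing newline


def run_workers(shards: Dict[Tuple[str, int], List[Doc]],
                send_bulk: Callable[[str], None],
                num_workers: int = 4,
                batch_size: int = 500) -> None:
    """Hypothetical coordinator: each worker claims whole shards from a shared queue."""
    work: "queue.Queue[Tuple[str, int]]" = queue.Queue()
    for shard_key in shards:
        work.put(shard_key)

    def worker() -> None:
        while True:
            try:
                index, shard_id = work.get_nowait()  # claim the next unprocessed shard
            except queue.Empty:
                return  # no work left
            docs = shards[(index, shard_id)]
            for start in range(0, len(docs), batch_size):
                send_bulk(bulk_body(index, docs[start:start + batch_size]))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Claiming whole shards keeps each unit of work independent, so adding workers increases parallelism without any coordination inside a shard.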

### Target cluster

The destination cluster for migration or comparison in an A/B test.

### Architecture overview

The Migration Assistant architecture is based on AWS Cloud infrastructure, but most tools are designed to be cloud independent. A local containerized version of this solution is also available.

The design deployed in AWS is as follows:

![Migration architecture overview]({{site.url}}{{site.baseurl}}/images/migrations/migration-architecture-overview.svg)

1. Client traffic is directed to the existing cluster.
2. An Application Load Balancer (ALB) with capture proxies relays traffic to the source cluster while replicating it to Amazon Managed Streaming for Apache Kafka (Amazon MSK).
3. Using the migration console, you can initiate metadata migration to establish indexes, templates, component templates, and aliases on the target cluster.
4. With continuous traffic capture in place, you can use Reindex-from-Snapshot (RFS) to backfill existing data from a snapshot of the source cluster to the target cluster.
5. Once RFS is complete, the Traffic Replayer replays the captured traffic from Amazon MSK to the target cluster.
6. The performance and behavior of traffic sent to the source and target clusters are compared by reviewing logs and metrics.
7. After you confirm that the target cluster’s functionality meets expectations, clients are redirected to the new target cluster.
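The ordering constraints in the workflow above can be captured as a small state machine. This hypothetical sketch (`Phase` and `MigrationPlan` are invented names) only enforces that each phase starts after the previous one completes; for example, replay cannot begin until the backfill is done, and clients are not redirected until the clusters have been compared.

```python
from enum import IntEnum
from typing import List


class Phase(IntEnum):
    """Workflow phases in the order they must complete (hypothetical model)."""
    CAPTURE_STARTED = 1    # capture proxies relay to the source and replicate to Kafka
    METADATA_MIGRATED = 2  # indexes, templates, and aliases exist on the target
    BACKFILL_COMPLETE = 3  # Reindex-from-Snapshot has finished
    REPLAY_RUNNING = 4     # the Traffic Replayer is replaying captured traffic
    VALIDATED = 5          # logs and metrics of both clusters were compared
    CUTOVER = 6            # clients now point at the target cluster


class MigrationPlan:
    def __init__(self) -> None:
        self.completed: List[Phase] = []

    def advance(self, phase: Phase) -> None:
        """Record `phase` as done, refusing to skip or repeat a phase."""
        if len(self.completed) == len(Phase):
            raise ValueError("migration is already complete")
        expected = Phase(len(self.completed) + 1)
        if phase is not expected:
            raise ValueError(f"cannot enter {phase.name} before {expected.name}")
        self.completed.append(phase)
```

Starting capture before the backfill is what lets the replayer fill in writes that arrive after the snapshot is taken.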