Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(sink): refactor es and opensearch to support async #17746

Merged
merged 22 commits into from
Oct 10, 2024

Conversation

xxhZs
Copy link
Contributor

@xxhZs xxhZs commented Jul 18, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

refactor es and opensearch with https://docs.rs/opensearch/latest/opensearch/ and https://docs.rs/elasticsearch/latest/elasticsearch/

they support async, and their sink_debuple defaults to true

#17073

resolve #17443

The new sink is exactly the same as the original creation parameters. For now, we keep two copies of the code (elasticsearch and elasticsearch_rust), after which we can delete the old one.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@xxhZs xxhZs requested a review from a team as a code owner July 18, 2024 10:17
@xxhZs xxhZs requested a review from lmatz July 18, 2024 10:17
Copy link
Contributor

@wenym1 wenym1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After we change to use the rust implementation by default, we can also run the es e2e so that the new logic implemented in this PR can be covered in the test.

use super::{DummySinkCommitCoordinator, Sink, SinkError, SinkParam, SinkWriterParam};
use crate::sink::Result;

pub const OPENSEARCH_SINK: &str = "opensearch_rust";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this PR we can just deprecate the original java implementation and always use the rust implementation when the connector is opensearch or elastic-search. As long as the with option is the same as the java implementation, we can switch to the rust implementation without any compatibility issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already made the parameters consistent, but not sure about migrating in this pr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling it _rust doesn't look good because users shouldn't care about the implementation language.

We should either

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with yim and decided to replace the previous implementation in this pr, so rust's will be renamed opensearch, and java's will remain, or is it called opensearch_v1?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the new impl is called opensearch, and replaces the java impl, I think java impl will become dead code and we can directly remove it?

src/connector/src/sink/elasticsearch_opensearch_common.rs Outdated Show resolved Hide resolved
src/connector/src/sink/opensearch.rs Outdated Show resolved Hide resolved
@xxhZs xxhZs force-pushed the xxh/es-opensearch-sink branch from a663b69 to 88264e5 Compare September 23, 2024 07:18
@graphite-app graphite-app bot requested a review from a team September 23, 2024 07:40
@xxhZs xxhZs requested a review from wenym1 September 29, 2024 05:00
Copy link
Contributor

@wenym1 wenym1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any test or CI to validate whether the current implementation works?

src/connector/src/sink/remote.rs Outdated Show resolved Hide resolved
@xxhZs
Copy link
Contributor Author

xxhZs commented Sep 29, 2024

Do we have any test or CI to validate whether the current implementation works?

Yes, since we've changed our name to elasticsearch, the original tests can be used on the new implementation

@xxhZs xxhZs force-pushed the xxh/es-opensearch-sink branch 2 times, most recently from 5787b79 to 0086ea9 Compare September 29, 2024 07:58
@xxhZs xxhZs force-pushed the xxh/es-opensearch-sink branch from 0086ea9 to 66e902d Compare September 29, 2024 08:07
fix ci

fix ci

fix ci

fix ci
@xxhZs xxhZs force-pushed the xxh/es-opensearch-sink branch from 66e902d to 202f0e4 Compare September 29, 2024 08:41
Copy link
Contributor

@wenym1 wenym1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM.

{"took":3,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":7,"relation":"eq"},"max_score":1.0,"hits":[{"_index":"test2","_type":"_doc","_id":"5_5-2","_score":1.0,"_source":{"d":"1970-01-01","st":{"st1":5,"st2":5},"t":"20:00:00.000000","ts":"1970-01-01 00:00:00.000000","tz":"1970-01-01 00:00:00.000000","v1":5,"v2":2,"v3":"5-2"}},{"_index":"test2","_type":"_doc","_id":"8_8-2","_score":1.0,"_source":{"d":"1970-01-01","st":{"st1":8,"st2":8},"t":"20:00:00.000000","ts":"1970-01-01 00:00:00.000000","tz":"1970-01-01 00:00:00.000000","v1":8,"v2":2,"v3":"8-2"}},{"_index":"test2","_type":"_doc","_id":"3_3-2","_score":1.0,"_source":{"d":"1970-01-01","st":{"st1":3,"st2":3},"t":"00:00:00.123456","ts":"1970-01-01 00:00:00.123456","tz":"1970-01-01 00:00:00.123456","v1":3,"v2":2,"v3":"3-2"}},{"_index":"test2","_type":"_doc","_id":"13_13-2","_score":1.0,"_source":{"d":"1970-01-01","st":{"st1":13,"st2":13},"t":"20:00:00.123456","ts":"1970-01-01 20:00:00.123456","tz":"1970-01-01 20:00:00.123456","v1":13,"v2":2,"v3":"13-2"}},{"_index":"test2","_type":"_doc","_id":"2_2-2","_score":1.0,"_source":{"d":"1970-01-01","st":{"st1":2,"st2":2},"t":"00:00:00.000000","ts":"1970-01-01 00:00:00.000000","tz":"1970-01-01 00:00:00.000000","v1":2,"v2":2,"v3":"2-2"}},{"_index":"test2","_type":"_doc","_id":"1_1-2","_score":1.0,"_source":{"d":"1970-01-01","st":{"st1":1,"st2":1},"t":"00:00:00.000000","ts":"1970-01-01 00:00:00.000000","tz":"1970-01-01 00:00:00.000000","v1":1,"v2":2,"v3":"1-2"}},{"_index":"test2","_type":"_doc","_id":"1_1-50","_score":1.0,"_source":{"d":"2000-01-01","st":{"st1":1,"st2":1},"t":"00:00:00.123456","ts":"2000-01-01 00:00:00.123456","tz":"2000-01-01 00:00:00.123456","v1":1,"v2":50,"v3":"1-50"}}]}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What causes the change in the result? Is the changes expected?

Copy link
Contributor Author

@xxhZs xxhZs Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I noticed a missing column in the previous test

@graphite-app graphite-app bot requested a review from a team September 29, 2024 09:10
Comment on lines 74 to 75
{ ElasticSearchJava, ElasticSearchJavaSink, "elasticsearch_v1" }
{ OpensearchJava, OpenSearchJavaSink, "opensearch_v1"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask why to keep _v1?

Copy link
Contributor Author

@xxhZs xxhZs Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there are some users using java es, I'd like to be able to keep him for a while, in case something goes wrong with the migration to v2, so that they can use v1 first. Then delete v1 when v2 is more stable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify,

  • For users already using connector = elasticsearch, when they upgrade, they will switch to v2 immediately.
  • If they meet problems, they can manually create new sinks with elasticsearch_v1 as a temporary workaround.

Is this what you want?

The main concern for me is that when we offer elasticsearch_v1, it might be hard to delete the legacy implementation in the future. (It might be possible if we only make it an undocumented feature and tell only some users about it...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes yes, I'm leaning towards,Only problems that are difficult to solve with v2 will make users try v1, and no updates will be made after v1, he may just exist for a version or two as a transition

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For existing users, if they encounter any problem in this new rust implementation, instead of letting user recreate a sink with connector as _v1, I think we can make a patch and build a separate image to change back to use the java implementation for that users, and meanwhile investigate the fix the rust implementation.

If this is feasible, we can exclude the _v1 from the sink list.

Copy link
Contributor

@wenym1 wenym1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the PR.

Comment on lines 74 to 75
{ ElasticSearchJava, ElasticSearchJavaSink, "elasticsearch_v1" }
{ OpensearchJava, OpenSearchJavaSink, "opensearch_v1"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For existing users, if they encounter any problem in this new rust implementation, instead of letting user recreate a sink with connector as _v1, I think we can make a patch and build a separate image to change back to use the java implementation for that users, and meanwhile investigate the fix the rust implementation.

If this is feasible, we can exclude the _v1 from the sink list.

@xxhZs xxhZs requested a review from hzxa21 October 9, 2024 03:13
Copy link
Collaborator

@hzxa21 hzxa21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for Cargo.lock

@xxhZs xxhZs added this pull request to the merge queue Oct 10, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 10, 2024
@xxhZs xxhZs added this pull request to the merge queue Oct 10, 2024
Merged via the queue into main with commit 840c37b Oct 10, 2024
29 of 30 checks passed
@xxhZs xxhZs deleted the xxh/es-opensearch-sink branch October 10, 2024 09:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

reimplement elastic-search and open-search sink with rust sdk
4 participants