feat(sink): support es sink struct and refactor es sink #14231
Conversation
How about adding a test case for decimal to the integration tests?
Decimal will be converted to text.
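To illustrate the behavior discussed here, a minimal sketch of carrying a decimal through JSON as a string rather than a number (the function name and field layout are illustrative assumptions, not the actual sink code):

```rust
// Hypothetical sketch: encode a decimal as a JSON string so precision is
// not lost (most JSON parsers treat bare numbers as f64).
fn decimal_to_json_field(name: &str, decimal: &str) -> String {
    // The decimal is kept as text, e.g. {"price":"123.4500000001"}
    format!("{{\"{}\":\"{}\"}}", name, decimal)
}

fn main() {
    let doc = decimal_to_json_field("price", "123.4500000001");
    println!("{doc}");
}
```

This is one common way sinks serialize high-precision decimals; an integration test would assert that the emitted document contains the quoted text form.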
Rest LGTM.
@wenym1 PTAL as well.
Shall we have some e2e tests to cover the struct case of es?
@@ -134,8 +140,8 @@ pub enum CustomJsonType {
// The internal order of the struct should follow the insertion order.
// The decimal needs verification and calibration.
Doris(HashMap<String, (u8, u8)>),
// Bigquery's json need date is string.
Bigquery,
// Es's json need jsonb is struct
Why is the `Bigquery` case removed?
Because I found that after some refactoring of our code to include the data types, Bigquery is no longer a special option and can use the common creation logic shared with the other sinks.
src/connector/src/sink/remote.rs
@@ -174,11 +188,12 @@ async fn validate_remote_sink(param: &SinkParam) -> Result<()> {
| DataType::Jsonb
| DataType::Bytea
| DataType::List(_)
| DataType::Struct(_)
The `Struct` type is only valid for the es sink. We should reject the struct type for other sinks.
`array` only supports the `int16`, `int32`, `int64`, `float`, `double`, and `varchar` element types. How about rejecting the other types for other sinks as well?
#13866
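The two validation suggestions above can be sketched together. This is a hedged, std-only sketch with illustrative names (a mock `DataType` and `validate_column`), not the actual `validate_remote_sink` code in remote.rs:

```rust
// Mock of the sink column validation proposed in the review: allow Struct
// only for the elasticsearch sink, and restrict array element types.
#[derive(Debug, PartialEq)]
enum DataType {
    Int16,
    Int32,
    Int64,
    Float32,
    Float64,
    Varchar,
    Struct(Vec<DataType>),
    List(Box<DataType>),
}

fn validate_column(sink: &str, ty: &DataType) -> Result<(), String> {
    match ty {
        // Struct is rejected unless the sink is elasticsearch.
        DataType::Struct(_) if sink != "elasticsearch" => {
            Err(format!("struct type is not supported by the {sink} sink"))
        }
        // Arrays only accept a small set of element types.
        DataType::List(elem) => match **elem {
            DataType::Int16
            | DataType::Int32
            | DataType::Int64
            | DataType::Float32
            | DataType::Float64
            | DataType::Varchar => Ok(()),
            _ => Err(format!("unsupported array element type: {elem:?}")),
        },
        _ => Ok(()),
    }
}

fn main() {
    assert!(validate_column("elasticsearch", &DataType::Struct(vec![])).is_ok());
    assert!(validate_column("jdbc", &DataType::Struct(vec![])).is_err());
    assert!(validate_column("jdbc", &DataType::List(Box::new(DataType::Int32))).is_ok());
    println!("validation sketch ok");
}
```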
@@ -207,6 +210,17 @@ private void bindSink(
String connectorName = getConnectorName(sinkParam);
SinkFactory sinkFactory = SinkUtils.getSinkFactory(connectorName);
sink = sinkFactory.createWriter(tableSchema, sinkParam.getPropertiesMap());
if (connectorName.equals("elasticsearch")) {
It's unnecessary to set the special schema here, since the schema is fixed and we should not access it in the es sink.
The schema here is for use by the StreamChunkDeserializer, which needs our mock schema, not the original one, to deserialize the StreamChunk.
I see.
Previously we assumed the stream chunk schema is the same as the sink's logical schema. Now we have broken that assumption. If so, I think we should have a separate field, named something like `payload_schema`, in the `StartSink` proto. The StreamChunkDeserializer will then use the schema in that field instead of the one in the `SinkParam`.
On the Rust side, when we send the initial start sink request, for the es sink we fill in the special schema, and for other sinks we fill in the original sink schema, so that on the Java side we don't need this special logic.
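A minimal sketch of the Rust-side choice proposed here. The `payload_schema` function, the `Field` type, and the fixed (doc id, doc) schema for es are all assumptions for illustration, not the actual proto or connector code:

```rust
// Hypothetical: pick the payload schema to send in the StartSink request.
// For es, the payload is always the processed (doc id, doc body) pair;
// other sinks keep their original logical schema.
#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    ty: String,
}

fn payload_schema(connector: &str, sink_schema: Vec<Field>) -> Vec<Field> {
    if connector == "elasticsearch" {
        vec![
            Field { name: "doc_id".into(), ty: "varchar".into() },
            Field { name: "doc".into(), ty: "jsonb".into() },
        ]
    } else {
        sink_schema
    }
}

fn main() {
    let orig = vec![Field { name: "v".into(), ty: "int32".into() }];
    // Non-es sinks pass their schema through unchanged.
    assert_eq!(payload_schema("jdbc", orig.clone()), orig);
    // The es sink always gets the fixed two-column payload schema.
    assert_eq!(payload_schema("elasticsearch", orig).len(), 2);
}
```

With this shape, the Java-side StreamChunkDeserializer can always read the schema from the same field, and the `connectorName.equals("elasticsearch")` special case disappears.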
@@ -198,13 +191,9 @@ public EsSink(EsSinkConfig config, TableSchema tableSchema) {
this.bulkProcessor = createBulkProcessor(this.requestTracker);

primaryKeyIndexes = new ArrayList<Integer>();
The `primaryKeyIndexes` field should not be used any more and should be removed.
If we still need a schema from the upstream, why don't we need the pk indexes, even though the schema is (varchar, jsonb) in most cases?
In this PR, all processing is handled on the Rust side, including generating the doc id and processing the content body. On the Java side we just build and send the request from the processed doc id and content body, so the pk indexes are not used on the Java side.
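The division of work described here can be sketched as follows. This is an illustrative simplification (the function name, the `_`-joined doc id, and the row layout are assumptions), not the actual converter code:

```rust
// Sketch: Rust produces the (doc id, json body) pair per row; Java only
// forwards it to elasticsearch.
fn build_es_request(pk_values: &[&str], json_body: &str) -> (String, String) {
    // Join the primary key values to form the document id, e.g. "1_foo".
    let doc_id = pk_values.join("_");
    (doc_id, json_body.to_string())
}

fn main() {
    let (id, doc) = build_es_request(&["1", "foo"], "{\"v\":1}");
    assert_eq!(id, "1_foo");
    assert_eq!(doc, "{\"v\":1}");
}
```

Since the pair is fully formed before it crosses the language boundary, the Java side no longer needs the pk indexes at all.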
final String key = (String) row.get(0);
String doc = (String) row.get(1);
Concerned about the forward compatibility.
src/connector/src/sink/remote.rs
enum StreamChunkConverter {
    Es(EsStreamChunkConverter),
    Other,
}
I don't quite get the design. If the logic is specific to es, maybe we can implement it inside the es sink.
Like:
> In this PR, all processing is handled on the Rust side, including generating the doc id and processing the content body. On the Java side we just build and send the request from the processed doc id and content body, so the pk indexes are not used on the Java side.

The goal is to make it easy to implement the struct support of the es sink.
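To make the dispatch concrete, here is a hedged, self-contained sketch of how such a converter could behave. Types are simplified stand-ins (the real `EsStreamChunkConverter` operates on `StreamChunk`, and the json encoding here is a placeholder):

```rust
// Sketch: the Es variant rewrites each row into the (doc id, json doc)
// shape before sending; other sinks pass the chunk through unchanged.
type Row = Vec<String>;

enum StreamChunkConverter {
    Es { pk_indices: Vec<usize> },
    Other,
}

impl StreamChunkConverter {
    fn convert(&self, rows: Vec<Row>) -> Vec<Row> {
        match self {
            StreamChunkConverter::Es { pk_indices } => rows
                .into_iter()
                .map(|row| {
                    // Build the doc id from the pk columns, e.g. "1_foo".
                    let doc_id = pk_indices
                        .iter()
                        .map(|i| row[*i].clone())
                        .collect::<Vec<_>>()
                        .join("_");
                    // Placeholder json encoding of the whole row.
                    let doc = format!("{{\"fields\":{:?}}}", row);
                    vec![doc_id, doc]
                })
                .collect(),
            StreamChunkConverter::Other => rows,
        }
    }
}

fn main() {
    let conv = StreamChunkConverter::Es { pk_indices: vec![0] };
    let out = conv.convert(vec![vec!["1".into(), "a".into()]]);
    assert_eq!(out[0][0], "1");
    assert_eq!(out[0].len(), 2);
}
```

Keeping the enum in remote.rs (rather than inside the es sink) lets the shared remote-sink writer apply the conversion uniformly before chunks are handed to the Java side.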
Rest LGTM. Thanks for the PR!
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Currently, rows are converted to an id + json row on the Rust side and packaged into a chunk; afterwards Java's es sink directly forwards our json.
This also fixes the issue that decimal will be converted to text.
complete #14110
fix #13992
Checklist
./risedev check (or alias, ./risedev c)
Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.