-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(sink): Use s3 sink to replace the original snowflake backend implementation #18996
Conversation
If we are going to implement sink into snowflake directly, we need to change the backend implementation but keep the frontend part for compatibility. Thus I think the current PR is safe because it is invisible to the user. The only one concern is that previously the snowflake sink is paid feature, we now implement it using file sink and have to make it non-paid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xxhZs is more familiar with how to set up snowflake environment. You may ask him to help for the testing.
e2e_test/s3/fs_sink.py
Outdated
@@ -199,12 +200,12 @@ def _table(): | |||
test_timestamp, | |||
test_timestamptz | |||
from {_table()} WITH ( | |||
connector = 's3', | |||
connector = 'snowflake', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is weird to test snowflake sink in fs sink test. Let's keep the previous codes instead.
@@ -897,6 +897,9 @@ static CONNECTORS_COMPATIBLE_FORMATS: LazyLock<HashMap<String, HashMap<Format, V | |||
FileSink::<S3Sink>::SINK_NAME => hashmap!( | |||
Format::Plain => vec![Encode::Parquet, Encode::Json], | |||
), | |||
FileSink::<SnowflakeSink>::SINK_NAME => hashmap!( | |||
Format::Plain => vec![Encode::Parquet, Encode::Json], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prior to this PR, if FORMAT ... ENCODE ...
is not specified by user (See the example here), json encode will be used while after this PR, it will be an error. Let's keep the behavior backward compatible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the encoder of the snowflake sink was previously hardcoded as a json encoder, json is the default here. In other words, even without this pr, when the snowflake sink supports other encodings in the future, it will also need to add the FORMAT ... ENCODE ... field. Can we regard this pr as an enhancement to the snowflake sink, that is, changing the encoding it supports from json to json and parquet, so users need to add the FORMAT ... ENCODE ... field.
Let's keep the behavior backward compatible
This means under this pr, we need to give a default value for FileSink's format_desc, at least the default value needs to be used when the sink is snowflake, which I think is a bit strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to give a default value for FileSink's format_desc when connector is snowflake
There are actually two dimensions of compatibility here.
In the frontend syntax part, we can achieve full compatibility with the previous snowflake sink; but in the implementation part, since the file sink implements batching, the new snowflake sink is still different from the previous one.
It's not hard to implement full compatibility in frontend syntax .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep it paid and modify this in a separate PR if needed.
We have a demo integration test for snowflake sink. We should at least manually run this demo to make sure the steps are still working if we don't have a CI test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!I'll be testing to see if it's working these days.
src/license/src/feature.rs
Outdated
@@ -46,7 +46,6 @@ macro_rules! for_all_features { | |||
{ TestPaid, Paid, "A dummy feature that's only available on paid tier for testing purposes." }, | |||
{ TimeTravel, Paid, "Query historical data within the retention period."}, | |||
{ GlueSchemaRegistry, Paid, "Use Schema Registry from AWS Glue rather than Confluent." }, | |||
{ SnowflakeSink, Paid, "Delivering data to SnowFlake." }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need to set sw sink as a paid feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, paid or non-paid is a bit strange, I don’t have any preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think twice, we may keep it paid, because user does not aware of this change. cc @neverchanje @hzxa21 for decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested to work properly, also need to modify the relevant configuration in integration_tests/snowflake_sink
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
As the backend implementation of snowflake sink is almost the same as file sink, I think it's ok to only keep the frontend part of snowflake sink.
Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.