-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
discussion: append-only sink #7045
Comments
It looks like the problem is not only about "append-only sink", but rather about the syntax to define a Sink. In other words: would Flink's syntax be better? |
@tabVersion is working on a PRD describing the user-facing procedure. |
I am considering the pros & cons of our current approach i.e. the Problem: Whether the sink should include hidden columns? #7035My answer is definitely "No". Hidden columns, as its name implies, these columns are only used by RisingWave internally and should not be exposed to users in any form, including materialized views and sinks. Problem: How do users define the primary key?This can be discussed in 3 cases:
Problem: how do users control whether the sink is append-only or upsert? (this issue)I think this problem can be divided into 2 parts. First, whether a SQL query is append-only or upsert is not controlled by users. Instead, it's a property of the query itself and could only be derived by our optimizer. By the way, Second, whether a Sink connector is append-only or upsert is also not controlled by users. It is an intrinsic property of the Sink e.g. Kafka/Pulsar is append-only, while Redis/JDBC is updatable, etc. When creating a Sink with a query, what we should do is check whether they are compatible: if the query is append-only, either append-only or updatable sink works; but for updatable queries, the sink connector must be updatable as well. A special case is JDBC connector, as mentioned in the issue description, Flink's JDBC connector decides the behavior according to PK's existence in the users' definition. For us, check Case 3 in the previous section: we can just ask for the external system for the PKs. SummaryIt looks like our current syntax can still work, so how about keeping it as is now? |
I personally feel that
For now, I think the only reason for me to accept |
Yes, that's also my concern. From either us or the user's perspective, query with |
Link to the doc: https://www.notion.so/risingwave-labs/Sink-User-Behavior-Guideline-be17dc1d7c89415388bb7bb43effcbe9 |
This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned. |
As we encountered several issues with deducted internal primary keys that do not match the downstream's primary key, leading to panic when creating sinks, we are adjusting the sink's position.
|
In #7035, our system will derive hidden columns as the primary key for the user which is even not declared by the user.
In Flink, there are three kinds of sinks encoding that express the changes of a table,
ApendOnly
,Retractable
andUpsert
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/concepts/dynamic_tables/#table-to-stream-conversion. The sink we have exactly implemented is exactly upsert sink. We will derive a primary key for the stream and operate the table with the key. Our system actually converts the append-only stream to an upsert stream implicitly by adding the row_id column, knowing that all operations are inserts and the keys will never conflict.So we need to introduce append-only sink
Unresolved question: how do users control whether the sink is append-only or upsert?
In Flink, users can easily define the primary key of the external dynamic table(sink) https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/jdbc/.
But in our system, the primary key on the MVew or Sink is inferred internally, which is a black box for users.
E.g. (I did not experient in flink SQL, but I think It is like that) In flink we can easily create an append-only sink for aggregation results which is not easy to expressed in our system:
Because the user does not define the primary key of the result table, all historical results will appear in the result table and the behavior is semantically consistent with the statement "SELECT ... INSERT ..."
But Risingwave use "create materialized view" statement, which means it can just maintain the newest result. So it produces an upsert stream naturally.
The text was updated successfully, but these errors were encountered: