Postgres CDC Source with a column type mismatch causes the Compute Node to fail #19408
Comments
Let me take a look.
Fixed in #19408. Now the cluster will reject this type of mismatch.
Thank you for the lightning-fast fix. I see this change validates the table prior to creation, which I understand will prevent the CREATE TABLE statement from running. That is good. I'm still rather concerned that the Compute Node crashes completely when a single Actor or Job does. Is there anything that can be done to make it more resilient?
This is actually by design. Rather than forcing the system to keep operating under erroneous conditions, we prefer that it expose issues as early as possible when something unexpected happens, so that we can fix them promptly. Imagine if our system did not raise a clear error for the issue above and instead silently processed the incorrect data as NULL. Users would be puzzled as to why the data they read is NULL, instead of immediately realizing that it is a system issue. This is a design choice for streaming workloads, since streaming computations essentially run in the background and are asynchronous. Excessive resiliency or fault tolerance (not referring to availability here) might actually increase the potential risks in the system. From another perspective, our clients' onboarding process usually starts with a PoC, and only after stabilization do we proceed to a full deployment. We want the cluster to expose as many potential issues as possible during the PoC phase, reducing urgent or latent problems in production. Of course, such a design requires us to cover as many corner cases as possible during the CREATE TABLE phase, and to provide clearer error messages and prompts. We will continue to make efforts in this area. I hope this explanation eases your concerns.
That does make sense. In our case, there was a discrepancy in column types between our production environment and our test environments, which is what led to the error. Hopefully the change will protect us, but we'll also be more cautious in the future.
Describe the bug
If a Postgres table has a timestamp column and the corresponding RisingWave table is created with a timestamptz column, the CREATE TABLE statement succeeds, but the Actor job that runs will fail and the entire Compute Node will crash. When running in standalone mode, RisingWave also crashes.
This seems to happen only when the table in question already has data in it. If the PG table is empty when the RW source and table are created, and data is INSERTed into the PG table afterwards, there is no error.
Error message/log
To Reproduce
In Postgres:
In RisingWave:
Expected behavior
Compute Node should not crash.
CREATE TABLE statement should not complete if the table is incompatible.
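The second expectation amounts to a schema check that runs before the table is created. The following is only a minimal sketch of that idea, not RisingWave's actual validation: the compatibility map, the type names in it, and the functions find_mismatches and validate_before_create are all hypothetical and exist purely for illustration.

```python
# Hypothetical compatibility map: upstream Postgres type -> declared types
# that would be accepted for it. Illustrative only; these are NOT the rules
# RisingWave actually applies.
COMPATIBLE = {
    "integer": {"int", "bigint"},
    "text": {"varchar"},
    "timestamp": {"timestamp"},
    "timestamptz": {"timestamptz"},
}


def find_mismatches(upstream, declared):
    """Compare declared column types against upstream column types.

    Both arguments map column name -> type name.
    Returns a list of problem descriptions; an empty list means compatible.
    """
    problems = []
    for name, declared_type in declared.items():
        upstream_type = upstream.get(name)
        if upstream_type is None:
            problems.append(f"column {name!r} not found upstream")
        elif declared_type not in COMPATIBLE.get(upstream_type, set()):
            problems.append(
                f"column {name!r}: upstream {upstream_type}, "
                f"declared {declared_type}"
            )
    return problems


def validate_before_create(upstream, declared):
    """Raise ValueError up front instead of failing later at runtime."""
    problems = find_mismatches(upstream, declared)
    if problems:
        raise ValueError("incompatible table definition: " + "; ".join(problems))
```

With this sketch, declaring ts as timestamptz against an upstream timestamp column raises an error before any streaming job starts, which matches the case reported here.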
How did you deploy RisingWave?
Docker Compose
The version of RisingWave
PostgreSQL 13.14.0-RisingWave-2.0.1 (0d15632)
Additional context
No response