Support a method to reload a CDC table without recreating it #12753

Closed
StrikeW opened this issue Oct 11, 2023 · 10 comments

StrikeW (Contributor) commented Oct 11, 2023

Is your feature request related to a problem? Please describe.

When we try to recover a CDC table job and the upstream binlog/WAL file is missing, the job currently cannot be recovered. Right now, the only option is to drop the table and recreate it to run a full replication. However, rebuilding a CDC table can be tedious, because many downstream streaming jobs may depend on it.

One of our customers has asked us to support a method to reload the CDC table without recreating it.

Describe the solution you'd like

Some ideas:

  • Trigger a full replication internally when we detect during recovery that the upstream binlog file is missing. How to detect that the binlog file is missing still needs some investigation.
  • Introduce a new SQL command, for example a REFRESH command, to allow users to trigger a full replication for a specific CDC table. This is mainly for operational scenarios (a syntax sketch follows the references below).

reference:
DB2: https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-refresh-table
PG: https://www.postgresql.org/docs/current/sql-refreshmaterializedview.html
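
A rough sketch of what such a REFRESH command could look like (hypothetical syntax, not implemented; orders_cdc stands in for any existing CDC table):

-- Hypothetical syntax, for illustration only: trigger a full re-sync of an existing CDC table from its upstream
REFRESH TABLE orders_cdc;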

Describe alternatives you've considered

No response

Additional context

related: #12313, #14060

github-actions bot added this to the release-1.4 milestone on Oct 11, 2023
BugenZhao (Member) commented Oct 11, 2023

This is somewhat similar to Alter MV. IIUC, the main motivation is to preserve all downstream streaming jobs. The problem then is: how do we decide which changes should be yielded to these downstreams if there is already data loss?

  • Delete all records in the RW table and re-synchronize with the upstream. This would result in a significant amount of downtime and unnecessary changes to the sinks.
  • Directly re-synchronize with the upstream and rely on the on-conflict handling to overwrite data. Will there be dirty data visible to users?
  • Somehow find the "diff" that represents the lost data. This introduces a brand-new job pattern and may require a complete design.

liurenjie1024 (Contributor) commented:

Trigger a full replication internally when we detect during recovery that the upstream binlog file is missing. How to detect that the binlog file is missing still needs some investigation.

By full replication, do you mean doing a full scan of the source table?

hzxa21 (Collaborator) commented Oct 12, 2023

When we try to recover a CDC table job and the upstream binlog/WAL file is missing, the job currently cannot be recovered.

Why is the upstream binlog/WAL file missing in the first place? Is this an expected situation?

I guess this is due to binlog/WAL retention? Correct me if I am wrong, but the job cannot be recovered only if the missing upstream binlog/WAL files are "unconsumed" by RW. If that is the case, it means something unexpected is happening in the RW cluster that causes it to lag in consuming the binlog/WAL files. That sounds rare to me, and dropping & recreating the CDC table sounds acceptable.
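
(For reference, the binlog retention window on MySQL 8.0 is controlled by the standard binlog_expire_logs_seconds variable; a quick way to inspect and extend it, shown here only to illustrate the retention scenario:)

# Check the current binlog retention; MySQL 8.0 defaults to 2592000 seconds (30 days)
mysql> SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';

# Extend the retention to e.g. 7 days to give a lagging consumer more headroom
mysql> SET GLOBAL binlog_expire_logs_seconds = 604800;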

hzxa21 (Collaborator) commented Oct 13, 2023

I tried to reproduce the binlog-missing case via the following steps:

  1. Use docker compose under the integration test to create MySQL/RW pods, and create sample tables and MVs.
  2. Run the following commands in MySQL:

mysql> show binary logs;
+---------------+-----------+-----------+
| Log_name      | File_size | Encrypted |
+---------------+-----------+-----------+
| binlog.000003 |       157 | No        |
+---------------+-----------+-----------+

mysql> purge binary logs before now();

# Reset to a number larger than the binlog index seen above
mysql> reset master to 4;

  3. Check the RW cluster and run a batch query.

Here are my findings:

  1. RW cluster is still up after the binlog is cleaned.
  2. New changes to the MySQL table are not reflected in the RW table.
  3. I saw the following errors in the compute node:
2023-10-13 06:44:56,867 ERROR [blc-mysql:3306] mysql.MySqlStreamingChangeEventSource:1094 - Error during binlog processing. Last offset stored = {transaction_id=null, ts_sec=1697179409, file=binlog.000004, pos=1882, server_id=1, event=1}, binlog reader near position = binlog.000004/1882
2023-10-13 06:44:56,871 ERROR [blc-mysql:3306] pipeline.ErrorHandler:35 - Producer failure
io.debezium.DebeziumException: could not find next log; the first event 'binlog.000004' at 1042, the last event read from './binlog.000004' at 1882, the last byte read from './binlog.000004' at 1882. Error code: 1236; SQLSTATE: HY000.
        at io.debezium.connector.mysql.MySqlStreamingChangeEventSource.wrap(MySqlStreamingChangeEventSource.java:1194) ~[debezium-connector-mysql-1.9.7.Final.jar:1.9.7.Final]
        at io.debezium.connector.mysql.MySqlStreamingChangeEventSource$ReaderThreadLifecycleListener.onCommunicationFailure(MySqlStreamingChangeEventSource.java:1239) [debezium-connector-mysql-1.9.7.Final.jar:1.9.7.Final]
        at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:1079) [mysql-binlog-connector-java-0.27.2.jar:0.27.2]
        at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:631) [mysql-binlog-connector-java-0.27.2.jar:0.27.2]
        at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:932) [mysql-binlog-connector-java-0.27.2.jar:0.27.2]
        at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: com.github.shyiko.mysql.binlog.network.ServerException: could not find next log; the first event 'binlog.000004' at 1042, the last event read from './binlog.000004' at 1882, the last byte read from './binlog.000004' at 1882.
        at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:1043) ~[mysql-binlog-connector-java-0.27.2.jar:0.27.2]
        ... 3 more
2023-10-13 06:44:56,872 INFO  [blc-mysql:3306] mysql.MySqlStreamingChangeEventSource:1210 - Stopped reading binlog after 0 events, last recorded offset: {transaction_id=null, ts_sec=1697179409, file=binlog.000004, pos=1882, server_id=1, event=1}
2023-10-13 06:44:57,112 INFO  [rw-dbz-engine-runner-1008] embedded.EmbeddedEngine:847 - Stopping the task and engine
2023-10-13 06:44:57,113 INFO  [rw-dbz-engine-runner-1008] common.BaseSourceTask:243 - Stopping down connector
2023-10-13T06:44:57.161128057Z  INFO risingwave_storage::hummock::event_handler::uploader: epoch 5239229393076224 is sealed
2023-10-13T06:44:57.16114017Z  INFO risingwave_storage::hummock::event_handler::uploader: epoch 5239229393076224 to seal has no data
2023-10-13 06:44:57,216 INFO  [debezium-mysqlconnector-RW_CDC_1008-change-event-source-coordinator] pipeline.ChangeEventSourceCoordinator:175 - Finished streaming
2023-10-13 06:44:57,222 INFO  [pool-5-thread-1] jdbc.JdbcConnection:962 - Connection gracefully closed
2023-10-13 06:44:57,225 ERROR [rw-dbz-engine-runner-1008] core.DbzCdcEngineRunner:83 - engine#1008 terminated with error. message: Error while trying to run connector class 'io.debezium.connector.mysql.MySqlConnector'
org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.
        at io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:50) ~[debezium-core-1.9.7.Final.jar:1.9.7.Final]
        at io.debezium.connector.mysql.MySqlStreamingChangeEventSource$ReaderThreadLifecycleListener.onCommunicationFailure(MySqlStreamingChangeEventSource.java:1239) ~[debezium-connector-mysql-1.9.7.Final.jar:1.9.7.Final]
        at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:1079) ~[mysql-binlog-connector-java-0.27.2.jar:0.27.2]
        at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:631) ~[mysql-binlog-connector-java-0.27.2.jar:0.27.2]
        at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:932) ~[mysql-binlog-connector-java-0.27.2.jar:0.27.2]
        at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: io.debezium.DebeziumException: could not find next log; the first event 'binlog.000004' at 1042, the last event read from './binlog.000004' at 1882, the last byte read from './binlog.000004' at 1882. Error code: 1236; SQLSTATE: HY000.
        at io.debezium.connector.mysql.MySqlStreamingChangeEventSource.wrap(MySqlStreamingChangeEventSource.java:1194) ~[debezium-connector-mysql-1.9.7.Final.jar:1.9.7.Final]
        ... 5 more
Caused by: com.github.shyiko.mysql.binlog.network.ServerException: could not find next log; the first event 'binlog.000004' at 1042, the last event read from './binlog.000004' at 1882, the last byte read from './binlog.000004' at 1882.
        at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:1043) ~[mysql-binlog-connector-java-0.27.2.jar:0.27.2]
        ... 3 more

I am using this git commit (git-5a7e0883e10e6cba54a70888717b3d2152bc9a61) since this is the locally cached one and my box takes a long time to pull the latest image for some reason.

StrikeW (Contributor, Author) commented Oct 16, 2023

Hi @hzxa21, thanks for the effort. However, I doubt this is the correct way to reproduce the binlog-missing case caused by retention.

The io.debezium.DebeziumException: could not find next log; error is indeed caused by a binlog file missing on the MySQL side, but here the file is missing because of the reset master command, which forcefully removes the currently active binlog file.
The active binlog file would not be removed by PURGE BINARY LOGS (nor by the retention policy, IMO).

mysql> show master status;
+---------------+----------+--------------+------------------+-------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------+
| binlog.000009 |    13947 |              |                  |                   |
+---------------+----------+--------------+------------------+-------------------+
1 row in set (0.01 sec)

mysql> PURGE BINARY LOGS BEFORE now();
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+------------------------------------------------------------------------+
| Level   | Code | Message                                                                |
+---------+------+------------------------------------------------------------------------+
| Warning | 1868 | file ./binlog.000009 was not purged because it is the active log file. |
+---------+------+------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> reset master to 10;
Query OK, 0 rows affected (0.02 sec)

The io.debezium.DebeziumException: could not find next log exception occurs in the connector after the reset master command.

In summary, if a binlog file is accidentally lost, rather than removed by the retention policy, then the following behavior is expected, and in that case we do need to drop and rebuild the CDC table:

2. New changes to the MySQL table are not reflected in the RW table.
3. I saw the following errors in the compute node:

StrikeW (Contributor, Author) commented Oct 16, 2023

This is somewhat similar to Alter MV. IIUC, the main motivation is to preserve all downstream streaming jobs. The problem then is: how do we decide which changes should be yielded to these downstreams if there is already data loss?

Right now we should focus on the case where the binlog file is deleted by the MySQL retention policy. In this case, the binlog position recorded in the persisted source offset may have been purged by MySQL, so the connector cannot resume correctly upon recovery and the CDC table no longer syncs with its upstream.

I found that Debezium can emit heartbeat events to the downstream. With the help of heartbeat events, we can keep the source offset up to date with the upstream even when the upstream table has not had any updates for a long time, so the offset can be restored successfully during recovery. I will implement support for heartbeat events first to solve the problem for our customer.

Heartbeat messages are useful for monitoring whether the connector is receiving change events from the database. Heartbeat messages might help decrease the number of change events that need to be re-sent when a connector restarts.
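
As a sketch of how this might be wired up (the debezium.-prefixed passthrough option and the exact WITH-clause parameters below are assumptions for illustration; Debezium's own property is heartbeat.interval.ms):

-- Sketch only: the debezium.heartbeat.interval.ms passthrough below is an assumption for illustration.
-- Debezium's native heartbeat.interval.ms property makes the connector emit a heartbeat event
-- periodically, so the stored source offset keeps advancing even when the captured table is idle.
CREATE TABLE orders_cdc (
    order_id INT PRIMARY KEY,
    order_date TIMESTAMP
) WITH (
    connector = 'mysql-cdc',
    hostname = 'mysql',
    port = '3306',
    username = 'root',
    password = '123456',
    database.name = 'mydb',
    table.name = 'orders',
    debezium.heartbeat.interval.ms = '10000'
);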

BugenZhao (Member) commented:

This is somewhat similar to Alter MV. IIUC, the main motivation is to preserve all downstream streaming jobs. The problem then is: how do we decide which changes should be yielded to these downstreams if there is already data loss?

Did you accidentally reference this? 👀

StrikeW (Contributor, Author) commented Oct 18, 2023

This is somewhat similar to Alter MV. IIUC, the main motivation is to preserve all downstream streaming jobs. The problem then is: how do we decide which changes should be yielded to these downstreams if there is already data loss?

Did you accidentally reference this? 👀

No. I think the problem you mentioned is more general and complex, for example when a compute node has been down for a long time. But right now we can narrow the scope and assume that the compute node won't be down for longer than the binlog expiration time, so that the connector can restore its offset upon recovery.

StrikeW (Contributor, Author) commented Jan 4, 2024

We had another round of discussion about this issue. Let me migrate it to this thread.
The background is that when our CDC connector runs into trouble, the CDC table can miss some upstream data.

@fuyufjh: How do you plan to implement “resync”? In particular, will you delete and re-insert all rows from the streaming perspective?
  • Yes - then it is no better than drop cascade and recreating all objects.
  • No - then it will lead to inconsistent data and may cause panics.

For the case where the user confirms that their upstream table is append-only, @hzxa21 and I had a discussion. We can provide a command that allows the user to trigger a full re-sync with the upstream table, and introduce a new type of ConflictBehavior for the MV operator that compares the old row (if it exists) with the new row for a primary key. This way we can recognize no-op Upsert events and avoid emitting redundant UD and UI events to the downstream (a plain-SQL illustration of this idea follows at the end of this comment). The idea is similar to #14339.

Since this new type of conflict handling introduces overhead, it should be switched off after the re-sync is done.
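
To illustrate the "no-op upsert" idea in plain SQL (table and column names here are hypothetical): during a re-sync, only rows that are new or whose content differs from what the CDC table already stores need to yield events to the downstream; identical rows can be dropped silently.

-- Illustration only, with hypothetical names: rows from the upstream snapshot that are
-- either absent from the CDC table or different from the stored row are the ones that
-- should produce downstream events; the rest are no-op upserts.
SELECT s.*
FROM upstream_snapshot AS s
LEFT JOIN orders_cdc AS t ON s.order_id = t.order_id
WHERE t.order_id IS NULL
   OR s.order_date IS DISTINCT FROM t.order_date
   OR s.amount IS DISTINCT FROM t.amount;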


This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

StrikeW closed this as not planned on Jun 12, 2024.