bug: end-to-end source test failed: Interrupted while stopping coordinator #15739

Closed
fuyufjh opened this issue Mar 18, 2024 · 1 comment

fuyufjh commented Mar 18, 2024

Describe the bug

https://buildkite.com/risingwavelabs/pull-request/builds/44654#018e4f81-15e2-4e5c-8fe6-6d292d9304ee

Note that there are 2 logs in the next section. The 2nd seems to be caused by the 1st, and it eventually caused the CN node to exit.

Error message/log

2024-03-18T03:13:18.52456753Z ERROR risingwave_connector_node: Interrupted while stopping coordinator: java.lang.InterruptedException
	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2109)
	at java.base/java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1454)
	at java.base/java.util.concurrent.Executors$DelegatedExecutorService.awaitTermination(Executors.java:709)
	at io.debezium.pipeline.ChangeEventSourceCoordinator.stop(ChangeEventSourceCoordinator.java:308)
	at io.debezium.connector.common.BaseSourceTask.stop(BaseSourceTask.java:289)
	at io.debezium.connector.common.BaseSourceTask.stop(BaseSourceTask.java:273)
	at io.debezium.embedded.EmbeddedEngine.stopTaskAndCommitOffset(EmbeddedEngine.java:1047)
	at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:759)
	at io.debezium.embedded.ConvertingEngineBuilder$2.run(ConvertingEngineBuilder.java:229)
	at com.risingwave.connector.source.core.DbzCdcEngine.run(DbzCdcEngine.java:64)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
 thread="rw-dbz-engine-runner-2" class="io.debezium.connector.common.BaseSourceTask"
2024-03-18T03:13:18.525471586Z ERROR risingwave_connector_node: engine#2 terminated with error. message: Error while trying to stop the task and commit the offsets: org.apache.kafka.connect.errors.ConnectException: Interrupted while stopping coordinator, failing the task
	at io.debezium.connector.common.BaseSourceTask.stop(BaseSourceTask.java:296)
	at io.debezium.connector.common.BaseSourceTask.stop(BaseSourceTask.java:273)
	at io.debezium.embedded.EmbeddedEngine.stopTaskAndCommitOffset(EmbeddedEngine.java:1047)
	at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:759)
	at io.debezium.embedded.ConvertingEngineBuilder$2.run(ConvertingEngineBuilder.java:229)
	at com.risingwave.connector.source.core.DbzCdcEngine.run(DbzCdcEngine.java:64)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
 thread="rw-dbz-engine-runner-2" class="com.risingwave.connector.source.core.DbzCdcEngineRunner"
Mon Mar 18 03:18:18 UTC 2024 [risedev]: Program exited with 139

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

No response

Additional context

No response

@fuyufjh fuyufjh added the type/bug Something isn't working label Mar 18, 2024
@github-actions github-actions bot added this to the release-1.8 milestone Mar 18, 2024
@StrikeW StrikeW modified the milestones: release-1.8, release-1.9 Apr 8, 2024

StrikeW commented Apr 10, 2024

2024-03-18T03:13:18.521158054Z  INFO risingwave_connector_node: Engine#2: JNI sender broken detected, stop the engine thread="Thread-1" class="com.risingwave.connector.source.core.JniDbzSourceHandler"
2024-03-18T03:13:18.521262817Z  INFO risingwave_connector_node: Stopping the embedded engine thread="Thread-1" class="io.debezium.embedded.EmbeddedEngine"
2024-03-18T03:13:18.521379047Z  INFO risingwave_connector_node: Waiting for PT0.001S for connector to stop thread="Thread-1" class="io.debezium.embedded.EmbeddedEngine"
2024-03-18T03:13:18.522470024Z  INFO risingwave_connector_node: Stopping the task and engine thread="rw-dbz-engine-runner-2" class="io.debezium.embedded.EmbeddedEngine"
2024-03-18T03:13:18.522535726Z  INFO risingwave_connector_node: Stopping down connector thread="rw-dbz-engine-runner-2" class="io.debezium.connector.common.BaseSourceTask"
2024-03-18T03:13:18.522678066Z  INFO risingwave_connector_node: engine#2 terminated thread="Thread-1" class="com.risingwave.connector.source.core.DbzCdcEngineRunner"
2024-03-18T03:13:18.522723105Z  INFO risingwave_connector::source::cdc::source::reader: end of jni call runJniDbzSourceThread source_id=2
2024-03-18T03:13:18.52456753Z ERROR risingwave_connector_node: Interrupted while stopping coordinator: java.lang.InterruptedException
	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2109)
	at java.base/java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1454)
	at java.base/java.util.concurrent.Executors$DelegatedExecutorService.awaitTermination(Executors.java:709)
	at io.debezium.pipeline.ChangeEventSourceCoordinator.stop(ChangeEventSourceCoordinator.java:308)
	at io.debezium.connector.common.BaseSourceTask.stop(BaseSourceTask.java:289)
	at io.debezium.connector.common.BaseSourceTask.stop(BaseSourceTask.java:273)
	at io.debezium.embedded.EmbeddedEngine.stopTaskAndCommitOffset(EmbeddedEngine.java:1047)
	at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:759)
	at io.debezium.embedded.ConvertingEngineBuilder$2.run(ConvertingEngineBuilder.java:229)
	at com.risingwave.connector.source.core.DbzCdcEngine.run(DbzCdcEngine.java:64)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
 thread="rw-dbz-engine-runner-2" class="io.debezium.connector.common.BaseSourceTask"
2024-03-18T03:13:18.525471586Z ERROR risingwave_connector_node: engine#2 terminated with error. message: Error while trying to stop the task and commit the offsets: org.apache.kafka.connect.errors.ConnectException: Interrupted while stopping coordinator, failing the task
	at io.debezium.connector.common.BaseSourceTask.stop(BaseSourceTask.java:296)
	at io.debezium.connector.common.BaseSourceTask.stop(BaseSourceTask.java:273)
	at io.debezium.embedded.EmbeddedEngine.stopTaskAndCommitOffset(EmbeddedEngine.java:1047)
	at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:759)
	at io.debezium.embedded.ConvertingEngineBuilder$2.run(ConvertingEngineBuilder.java:229)
	at com.risingwave.connector.source.core.DbzCdcEngine.run(DbzCdcEngine.java:64)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
 thread="rw-dbz-engine-runner-2" class="com.risingwave.connector.source.core.DbzCdcEngineRunner"

These logs are expected when the connector is being dropped.

As I recall, the ALTER command drops the old stream job. The CN log shows that the CDC connector did detect the broken channel, after which it waits briefly for the connector to exit (Waiting for PT0.001S for connector to stop). In the meantime, we call executor.shutdownNow(), which interrupts the connector thread while it is still blocked in awaitTermination inside ChangeEventSourceCoordinator.stop, hence the InterruptedException in the stack trace.
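
The interaction described above can be reproduced in isolation. The following is a minimal sketch using only java.util.concurrent, not the actual RisingWave or Debezium code: the class name, the method name, and the `runner` and `coordinator` executors are hypothetical stand-ins for the engine runner pool and the coordinator's internal pool. It shows that calling shutdownNow() on the outer pool interrupts a thread that is waiting in awaitTermination on an inner pool.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; none of these names come from RisingWave or Debezium.
class ShutdownNowInterruptDemo {

    /** Returns true if the runner thread was interrupted while waiting for the inner pool to stop. */
    static boolean runnerInterruptedDuringStop() throws InterruptedException {
        ExecutorService runner = Executors.newSingleThreadExecutor();      // stands in for the engine runner pool
        ExecutorService coordinator = Executors.newSingleThreadExecutor(); // stands in for the coordinator's pool
        CountDownLatch coordinatorBusy = new CountDownLatch(1);
        CountDownLatch stopping = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);

        // Inner worker that does not finish quickly, so the stop path has to wait for it.
        coordinator.execute(() -> {
            coordinatorBusy.countDown();
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException ignored) {
                // Interrupted by the final cleanup below; just exit.
            }
        });

        // Runner task: the "stop" path shuts the inner pool down and waits for it to terminate.
        runner.execute(() -> {
            try {
                coordinatorBusy.await();
                coordinator.shutdown();   // graceful: does not interrupt the running task
                stopping.countDown();
                coordinator.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                // Same exception type as in the log above.
                interrupted.set(true);
            }
        });

        stopping.await();
        runner.shutdownNow();             // interrupts the runner thread during its wait
        runner.awaitTermination(5, TimeUnit.SECONDS);

        coordinator.shutdownNow();        // cleanup: wake the sleeping inner worker
        coordinator.awaitTermination(5, TimeUnit.SECONDS);
        return interrupted.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("runner interrupted while stopping: " + runnerInterruptedDuringStop());
    }
}
```

If the real code path behaves like this sketch, the error logs are a side effect of the shutdown ordering rather than a sign of data loss, which is consistent with treating them as expected noise.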

We may find ways to eliminate these logs.

@StrikeW StrikeW modified the milestones: release-1.9, release-1.10 May 13, 2024
@StrikeW StrikeW closed this as completed Jul 8, 2024
@StrikeW StrikeW closed this as not planned Jul 8, 2024