-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
driver reports connection errors for decommissioned nodes and a delay for cassandra-stress, resulted with: "Command did not complete within 12870 seconds!" #401
Comments
Scylla version: Kernel Version: Issue descriptionReproduced again. Cassandra stress test tries to connect to decommissioned nodes.
This list contains just 3 nodes for illustration—more nodes terminated before connection. Please take a look at the logs for more details. OS / Image: Test:
|
Packages
Scylla version:
6.3.0~dev-20241217.01cdba9a9894
with build-idf5cdbc08a2634f6f378e901fbb10a27fc164783e
Kernel Version:
6.8.0-1018-azure
Issue description
Describe your issue in detail and steps it took to produce it.
Run a 3 hours longevity on Azure.
The node
longevity-10gb-3h-master-db-node-413a3a9b-eastus-2
was decommissioned at 2:About 2 hours later, 2 nemesis that issues a rolling restart of all cluster nodes are executed:
Then c-s got errors for node-2 (that was decommissioned ~ 2 hours ago) like:
Multiple connection errors (that might be expected) are reported for some other nodes as well, during these rolling restarts:
Another unclear connection issues are found for node-10 that was decommissioned at 03:22 -
node-10 removal:
connection errors:
node-10 private address is also reported long afterwards:
The cassandra-stress eventually failed the test for an error of:
Impact
Describe the impact this issue causes to the user.
How frequently does it reproduce?
Describe the frequency with how this issue can be reproduced.
Installation details
Cluster size: 6 nodes (Standard_L8s_v3)
Scylla Nodes used in this run:
OS / Image:
/subscriptions/6c268694-47ab-43ab-b306-3c5514bc4112/resourceGroups/scylla-images/providers/Microsoft.Compute/images/scylla-6.3.0-dev-x86_64-2024-12-18T02-02-40
(azure: undefined_region)Test:
longevity-10gb-3h-azure-test
Test id:
413a3a9b-fe7b-4e5e-b864-6f1f26628226
Test name:
scylla-master/longevity/longevity-10gb-3h-azure-test
Test method:
longevity_test.LongevityTest.test_custom_time
Test config file(s):
Logs and commands
$ hydra investigate show-monitor 413a3a9b-fe7b-4e5e-b864-6f1f26628226
$ hydra investigate show-logs 413a3a9b-fe7b-4e5e-b864-6f1f26628226
Logs:
Jenkins job URL
Argus
The text was updated successfully, but these errors were encountered: