fix: use rabbit quorum queues in lieu of ha #294

LukeRepko · 2024-06-04T20:53:19Z

We define the use of quorum queues via kustomize as the default queue type for the named vhosts, but the oslo_messaging_rabbit config opt of rabbit_ha_queues: true was set, taking precedence. We actually do not want to use HA (mirrored) queues, as they are being deprecated, and will be removed in newer versions of RMQ (4.x being released EOY 2024). The use of HA queues in genestack up to this point was the result of sane but no longer ideal defaults set by openstack-helm that were carried forth.

This explicitly disables rabbit_ha_queues, and then enables rabbit_quorum_queue. Removing the related rabbit vhost is required for this change prior to re-deploying a given openstack service.

Example of re-deploying nova when making this change; note how we remove the queue, vhost, and user:

kubectl -n openstack delete queues.rabbitmq.com nova-queue
kubectl -n openstack delete vhosts.rabbitmq.com nova-vhost
kubectl -n openstack delete users.rabbitmq.com nova
helm upgrade --install nova ./nova

NOTE: Several helm upgrades may be required due to a race condition with the operator removing the vhost. Uninstalling first may be easier, but do so carefully.
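The repeated upgrades mentioned in the note can be wrapped in a small retry loop. This is only a sketch: the `retry` helper, the attempt count, and the delay are hypothetical, and it assumes `helm` exits non-zero when the vhost race is hit, which may not hold in every case.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: retry <max_attempts> <delay_seconds> <command...>
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  # Re-run the command until it exits 0.
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example invocation (values are illustrative, not taken from this PR):
#   retry 5 15 helm upgrade --install nova ./nova
```

A successful exit from helm does not by itself prove the vhost was recreated, so it is still worth checking the RabbitMQ resources afterwards.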

Other changes:

add: rabbit_transient_quorum_queue, which is newly available in 2024.1. We will want to begin using this to make transient queues reliable
add: use_queue_manager, which is newly available in 2024.1. We will want to begin using this when available to de-obfuscate named queues in rabbit
add: rabbit_interval_max to reconnect faster after a node outage
fix: send heartbeats more frequently; clients should mark a given node as down about 30s more quickly (default was 60s)
fix: set kombu_reconnect_delay lower to help avoid multiple code paths not being traversed when a RMQ node goes down
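For a quick one-off test outside the committed overrides, the core options could be passed as `--set` flags. The sketch below is an assumption-heavy illustration: the `quorum_flags` helper is hypothetical, the `conf.<service>.oslo_messaging_rabbit` values path is assumed from the usual openstack-helm chart layout, and `heartbeat_timeout_threshold=30` is a guess at the heartbeat setting described above rather than a value confirmed by this PR.

```shell
# Hypothetical helper: print helm --set flags for one service's
# oslo_messaging_rabbit section (values path is an assumption).
quorum_flags() {
  local prefix="conf.$1.oslo_messaging_rabbit"
  # Disable mirrored (HA) queues and enable quorum queues, as this PR does.
  echo "--set ${prefix}.rabbit_ha_queues=false"
  echo "--set ${prefix}.rabbit_quorum_queue=true"
  # Assumed option and value for detecting a down node sooner (default 60s).
  echo "--set ${prefix}.heartbeat_timeout_threshold=30"
}

# Example invocation (left unquoted on purpose so the flags word-split):
#   helm upgrade --install nova ./nova $(quorum_flags nova)
```

The remaining options in the list (`rabbit_interval_max`, `kombu_reconnect_delay`, and the 2024.1 additions) would follow the same pattern; their values are omitted here because the description does not state them.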

LukeRepko marked this pull request as draft June 4, 2024 20:53

LukeRepko marked this pull request as ready for review June 4, 2024 21:16

LukeRepko force-pushed the sjcdev branch from 136b749 to 05926c0 Compare June 4, 2024 21:22

LukeRepko requested review from cloudnull and sulochan June 4, 2024 21:32

cloudnull approved these changes Jun 4, 2024

cloudnull merged commit ec0ff9e into rackerlabs:main Jun 4, 2024
13 checks passed

LukeRepko deleted the sjcdev branch September 5, 2024 14:48
