allow more than 1 PgBouncer replicas #622
Comments
@low-on-mana is this really safe to use in an Airflow environment? I was wondering the same thing: it would be good to have some kind of backup if one PgBouncer replica fails (during k8s node patching or whatever). The official chart also uses a hardcoded replicas: 1. I've tried to understand how multiple PgBouncer replicas would affect the deployment (connections to the DB, etc.) but couldn't find any links or tutorials explaining this multi-replica PgBouncer setup. Would it also require customizing values such as maxClientConnections and poolSize? E.g. if you set replicas to 3, would you need to adjust these values accordingly (divide by 3?). Does anyone have experience with this?
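To make the "divide by 3" question concrete, here is a minimal sketch of that arithmetic. It assumes each PgBouncer replica keeps its own independent pools (PgBouncer instances do not share state), so the limits that reach PostgreSQL are multiplied by the replica count. The function name and the budget numbers are hypothetical, chosen only for illustration, and whether an even split is the right policy is exactly the open question above.

```python
def per_replica_limits(total_client_connections: int,
                       total_pool_size: int,
                       replicas: int) -> dict:
    """Split cluster-wide connection budgets evenly across PgBouncer replicas."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return {
        # Floor division keeps the summed limits at or below the total budget.
        "maxClientConnections": total_client_connections // replicas,
        "poolSize": total_pool_size // replicas,
    }


# Illustrative budgets only, not chart defaults.
limits = per_replica_limits(total_client_connections=1000,
                            total_pool_size=20,
                            replicas=3)
print(limits)  # {'maxClientConnections': 333, 'poolSize': 6}
```

One trade-off to keep in mind: with an exact even split, losing one replica leaves the survivors without headroom for the clients that reconnect to them, so some over-provisioning per replica may be worth considering.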
This issue has been automatically marked as stale because it has not had activity in 60 days. Thank you for your contributions. Issues never become stale if any of the following is true:
@low-on-mana @jurovee I agree that having multiple PgBouncer replicas would (in theory) be great for redundancy, especially during node outages/upgrades. The problem is that any disruption to the database connection during a transaction will result in airflow raising an error, which I doubt airflow will gracefully recover from. (NOTE: airflow uses SQLAlchemy in "pessimistic" pooling mode with the pre-ping approach, which can't handle mid-transaction failures.) That is to say, more PgBouncer replicas actually increase the possibility of airflow trying to use a connection to a PgBouncer Pod that is no longer active (and crashing as a result). We would need to investigate getting airflow to use a different SQLAlchemy pooling mode (one that allows mid-transaction failures to be resolved gracefully) before we can increase PgBouncer replicas.
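The limitation of the pre-ping approach can be illustrated with a toy model. This is a sketch, not SQLAlchemy's implementation: `FakeConnection`, `PrePingPool`, and `DeadBackend` are hypothetical names, and the `alive` flag stands in for a PgBouncer Pod going away. In real SQLAlchemy the behaviour is enabled with `create_engine(..., pool_pre_ping=True)`. The point is only that a check at checkout catches connections that died while idle, but cannot help once a transaction is already in flight.

```python
class DeadBackend(Exception):
    """Raised when a statement is sent over a connection whose proxy is gone."""


class FakeConnection:
    def __init__(self) -> None:
        self.alive = True  # flips to False when the (simulated) proxy pod dies

    def execute(self, statement: str) -> str:
        if not self.alive:
            raise DeadBackend(statement)
        return f"ok: {statement}"


class PrePingPool:
    """Toy pool that tests each connection at checkout and replaces dead ones."""

    def __init__(self) -> None:
        self._idle = [FakeConnection()]
        self.replaced = 0

    def checkout(self) -> FakeConnection:
        conn = self._idle.pop() if self._idle else FakeConnection()
        try:
            conn.execute("SELECT 1")  # the "pre-ping"
        except DeadBackend:
            conn = FakeConnection()  # transparently swap in a fresh connection
            self.replaced += 1
        return conn

    def checkin(self, conn: FakeConnection) -> None:
        self._idle.append(conn)


pool = PrePingPool()

# Case 1: the proxy dies while the connection sits idle in the pool.
conn = pool.checkout()
pool.checkin(conn)
conn.alive = False
conn = pool.checkout()            # pre-ping fails, connection is replaced
first = conn.execute("BEGIN")     # succeeds on the fresh connection

# Case 2: the proxy dies after checkout, in the middle of a transaction.
conn.alive = False
try:
    conn.execute("UPDATE example SET x = 1")
    mid_transaction_failed = False
except DeadBackend:
    mid_transaction_failed = True  # pre-ping never had a chance to run

print(first, pool.replaced, mid_transaction_failed)
```

In the second case the application itself would have to catch the error and retry the whole transaction, which is the recovery behaviour questioned in the comment above.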
@thesuperzapper Forgive me, but why do you say higher "PgBouncer replicas actually increases the possibility of airflow trying to use a[n inactive] connection"? I'm chasing HA on this particular component also, and want to understand the risk you're describing.
Checks
User-Community Airflow Helm Chart
Chart Version
latest
Kubernetes Version
Helm Version
Description
We are using the latest version of this chart in production for airflow 2.3.0 (we did this migration a few days ago).
One of the issues we faced is related to PgBouncer.
Kubernetes rescheduled the PgBouncer pod to another node and, since only 1 pod was running, one task failed, which we had to retry manually later.
We could set safe_to_evict false or add a pod disruption budget as another solution, but the best option would be to make PgBouncer HA by running multiple pods.
Can we have 2 pods for HA PgBouncer?
charts/charts/airflow/templates/pgbouncer/pgbouncer-deployment.yaml
Line 24 in 420eae2
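For reference, the two stopgaps mentioned in the description (a pod disruption budget and the safe-to-evict setting) might look roughly like the manifests built below. This is a sketch under assumptions: the resource name and the label selector are hypothetical and would have to match the labels the chart actually puts on the PgBouncer Pod. The manifests are emitted as JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json

# Hypothetical labels; replace with the real labels on the PgBouncer Pod.
PGBOUNCER_LABELS = {"app": "airflow", "component": "pgbouncer"}

# With a single replica, minAvailable: 1 blocks voluntary evictions
# (for example a node drain) of the PgBouncer Pod.
pod_disruption_budget = {
    "apiVersion": "policy/v1",
    "kind": "PodDisruptionBudget",
    "metadata": {"name": "airflow-pgbouncer"},  # hypothetical name
    "spec": {
        "minAvailable": 1,
        "selector": {"matchLabels": PGBOUNCER_LABELS},
    },
}

# Pod annotation that tells the cluster autoscaler not to evict the Pod
# when scaling a node down.
safe_to_evict_annotation = {
    "cluster-autoscaler.kubernetes.io/safe-to-evict": "false",
}

print(json.dumps(pod_disruption_budget, indent=2))
print(json.dumps({"metadata": {"annotations": safe_to_evict_annotation}}, indent=2))
```

Both settings only cover voluntary disruptions; neither helps when a node fails outright, which is why multiple replicas are being requested here.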
Relevant Logs
No response
Custom Helm Values
No response