Adjust node pools in data-processing clusters #330

Open
gfr10598 opened this issue Apr 6, 2021 · 6 comments
Labels
bug, pipeline

Comments

@gfr10598
Contributor

gfr10598 commented Apr 6, 2021

The most recent change to the etl k8s configs failed to launch pods in staging because I had set the 8-core node pool to 0 instances.

Also, utilization is low in all projects because there are more nodes than the requested pods need.

All the pools should be updated to use appropriate auto-scaling configs.

autolabel bot added the review/triage label Apr 6, 2021
@gfr10598
Contributor Author

gfr10598 commented Apr 6, 2021

Today I am:
changing the sandbox default pool to allow 1 node per zone, and removing zone us-east1-d.
changing the staging 8-core parser-pool1 to allow 0-2 nodes per zone.
changing the staging 4-core parser-pool to allow 0-1 nodes per zone.
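
Since these limits are per zone, the cluster-wide bounds for a pool are the per-zone bounds multiplied by the number of zones. Below is a minimal Python sketch of that arithmetic for the two staging pools. The zone list is an assumption (only us-east1-d is named in this thread), and `PoolAutoscaling` is a hypothetical helper, not part of the etl configs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PoolAutoscaling:
    """Hypothetical model of one node pool's per-zone autoscaling limits."""
    name: str
    zones: Tuple[str, ...]
    min_per_zone: int
    max_per_zone: int

    def total_range(self) -> Tuple[int, int]:
        """Return (min, max) node counts summed over all zones."""
        n = len(self.zones)
        return (self.min_per_zone * n, self.max_per_zone * n)

    def can_run_pods(self) -> bool:
        """A pool capped at 0 nodes can never host a pod."""
        return self.max_per_zone > 0 and len(self.zones) > 0


# Assumed zone list; only us-east1-d is named in this issue.
ZONES = ("us-east1-b", "us-east1-c", "us-east1-d")

staging_pools = [
    PoolAutoscaling("parser-pool1", ZONES, 0, 2),  # 8-core pool
    PoolAutoscaling("parser-pool", ZONES, 0, 1),   # 4-core pool
]

for pool in staging_pools:
    lo, hi = pool.total_range()
    print(f"{pool.name}: {lo}-{hi} nodes total, can run pods: {pool.can_run_pods()}")
```

A pool whose maximum is 0, like the 8-core pool in the original report, fails the `can_run_pods` check, which is the condition that left the staging pods unlaunched.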

Tomorrow, I intend to change prod to set up auto-scaling for parser, default, and gardener pools.

@gfr10598
Contributor Author

gfr10598 commented Apr 6, 2021

Removing zone us-east1-d from the default pool in mlab-sandbox made gardener unschedulable because of the location of its persistent volume. I restored us-east1-d and adjusted auto-scaling to allow 1-2 nodes per zone.

Later changed to 0-2 nodes per zone.
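
The failure mode above can be sketched as a zone check: a pod that mounts a zonal persistent volume can only run on a node in that volume's zone. Below is a minimal Python sketch, assuming gardener's volume lives in us-east1-d (implied by the fix above, not confirmed) and using hypothetical zone lists and function names.

```python
from typing import Sequence


def is_schedulable(pool_zones: Sequence[str], volume_zone: str) -> bool:
    """True if the pool can place a node in the zone holding the volume."""
    return volume_zone in pool_zones


# Assumption: the gardener persistent volume is in us-east1-d.
VOLUME_ZONE = "us-east1-d"

# Hypothetical zone lists before and after us-east1-d was restored.
without_d = ["us-east1-b", "us-east1-c"]
with_d = ["us-east1-b", "us-east1-c", "us-east1-d"]

print(is_schedulable(without_d, VOLUME_ZONE))  # False
print(is_schedulable(with_d, VOLUME_ZONE))     # True
```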

@gfr10598
Contributor Author

gfr10598 commented Apr 6, 2021

Subsequently updated mlab-sandbox parser-pool1 to allow 0-2 nodes per zone as well.

laiyi-ohlsen added the bug and pipeline labels and removed the review/triage label Apr 12, 2021
@laiyi-ohlsen

@gfr10598 can you enumerate the steps that are needed before the next gardener release to production?

@gfr10598
Contributor Author

I just deleted the mlab-staging parser-pool1 node pool from the data-processing cluster. K8s had restarted the parsers, and they were instantiated in the wrong node pool. Deleting the pool should prevent this from happening again.
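
For context, Kubernetes places a pod only on nodes whose labels match every entry in the pod's `nodeSelector`, and a pod without a selector may land on any pool with room. The Python sketch below illustrates that matching rule only. The thread does not say how the etl deployments select nodes, so the selectors here are hypothetical, and using the GKE node label `cloud.google.com/gke-nodepool` is an assumption for illustration.

```python
from typing import Dict

# GKE sets this label on each node to the name of its node pool.
POOL_LABEL = "cloud.google.com/gke-nodepool"


def selector_matches(node_selector: Dict[str, str], node_labels: Dict[str, str]) -> bool:
    """Every key/value in the selector must be present on the node."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())


node_in_pool1 = {POOL_LABEL: "parser-pool1"}
node_in_pool = {POOL_LABEL: "parser-pool"}

pinned = {POOL_LABEL: "parser-pool"}  # hypothetical selector pinning to one pool
unpinned: Dict[str, str] = {}         # no selector: any node matches

print(selector_matches(pinned, node_in_pool1))    # False
print(selector_matches(pinned, node_in_pool))     # True
print(selector_matches(unpinned, node_in_pool1))  # True
```

Deleting the unwanted pool, as done here, removes the wrong target entirely; a selector is one alternative way to express the same constraint.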

@gfr10598
Contributor Author

See m-lab/etl#985 related to propagating errors from etl to gardener.
