Feature request: try different zones if the one specified in config does not have enough resources #335

vlad-ivanov-name · 2022-06-10T12:21:47Z

What feature do you want to see added?

Hello,

would it be possible to have an option to use multiple zones for launching agents? Right now, if a zone is exhausted, the plugin does not handle it well: it creates many agents (more than the limit set in the config), and all of them fail to start. Deleting them works, but it also triggers an exception within Jenkins. Here's a log snippet:

2022-06-10 10:50:47.296+0000 [id=4267]	INFO	c.g.j.p.c.ComputeEngineComputerLauncher#launch: Launch failed while waiting for operation operation-1654858230641-5e115b4fdb4d9-33d8d6e8-ec232c0f to complete. Operation error was The zone 'projects/censored/zones/us-west1-b' does not have enough resources available to fulfill the request.  '(resource type:compute)'.

It would be good to have an option to try different zones from a preconfigured set if one zone doesn't have enough resources.
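As a rough illustration of the requested behaviour, here is a minimal sketch of falling back through a preconfigured list of zones. Everything in it is hypothetical: `ZoneFallbackLauncher`, `ZoneExhaustedException`, and the `launchInZone` callback are illustrative names only and are not part of the plugin or of any Google client library.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: none of these names exist in the plugin.
final class ZoneFallbackLauncher {

    /** Hypothetical stand-in for the "zone does not have enough resources" launch failure. */
    static final class ZoneExhaustedException extends RuntimeException {
        ZoneExhaustedException(String zone) {
            super("The zone '" + zone + "' does not have enough resources available");
        }
    }

    /**
     * Tries each zone in order and returns the first successful launch result.
     * Only ZoneExhaustedException triggers a fallback; any other failure propagates.
     */
    static <T> Optional<T> launchInFirstAvailableZone(
            List<String> zones, Function<String, T> launchInZone) {
        for (String zone : zones) {
            try {
                return Optional.of(launchInZone.apply(zone));
            } catch (ZoneExhaustedException e) {
                // This zone is out of capacity; move on to the next candidate.
            }
        }
        return Optional.empty(); // every configured zone was exhausted
    }
}
```

An empty result means every configured zone was exhausted, so the caller can decide whether to give up or try again later instead of creating more agents.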

Exception when deleting an agent that failed to start

2022-06-10 10:50:15.593+0000 [id=3700]	WARNING	h.i.i.InstallUncaughtExceptionHandler#handleException: Caught unhandled exception with ID 95d9316f-d4ab-4107-a055-c25f7c2f40a8
com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
{
  "code" : 404,
  "errors" : [ {
    "domain" : "global",
    "message" : "The resource 'projects/censored/zones/us-west1-b/instances/jenkins-agent-dynamic-jtiut7' was not found",
    "reason" : "notFound"
  } ],
  "message" : "The resource 'projects/censored/zones/us-west1-b/instances/jenkins-agent-dynamic-jtiut7' was not found"
}
	at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:150)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1056)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
	at com.google.cloud.graphite.platforms.plugin.client.ComputeWrapper.deleteInstance(ComputeWrapper.java:116)
	at com.google.cloud.graphite.platforms.plugin.client.ComputeClient.terminateInstanceAsync(ComputeClient.java:323)
	at com.google.jenkins.plugins.computeengine.ComputeEngineInstance._terminate(ComputeEngineInstance.java:136)
	at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:88)
	at com.google.jenkins.plugins.computeengine.ComputeEngineComputer.doDoDelete(ComputeEngineComputer.java:181)
	at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710)
	at org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:398)
	at org.kohsuke.stapler.Function$InstanceFunction.invoke(Function.java:410)
	at org.kohsuke.stapler.interceptor.RequirePOST$Processor.invoke(RequirePOST.java:78)
	at org.kohsuke.stapler.PreInvokeInterceptedFunction.invoke(PreInvokeInterceptedFunction.java:26)
	at org.kohsuke.stapler.Function.bindAndInvoke(Function.java:208)
...
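
One way the unhandled exception above could be avoided is to treat a 404 on delete as "the instance is already gone". The sketch below is only an illustration under assumptions: `TolerantDelete`, `HttpStatusException`, and `InstanceDeleter` are hypothetical stand-ins so that it compiles without dependencies. In the trace above the real exception is `GoogleJsonResponseException`, which exposes the HTTP status through `getStatusCode()`.

```java
import java.io.IOException;

// Hypothetical sketch; the types here are stand-ins, not the plugin's classes.
final class TolerantDelete {

    /** Stand-in for an HTTP error that carries a status code. */
    static final class HttpStatusException extends IOException {
        final int statusCode;

        HttpStatusException(int statusCode, String message) {
            super(message);
            this.statusCode = statusCode;
        }
    }

    /** Stand-in for the call that deletes an instance. */
    interface InstanceDeleter {
        void delete(String instanceName) throws IOException;
    }

    /**
     * Deletes the instance, treating 404 Not Found as "already gone".
     * Returns true if the delete call succeeded, false if the instance did not exist.
     */
    static boolean deleteIfExists(InstanceDeleter deleter, String instanceName)
            throws IOException {
        try {
            deleter.delete(instanceName);
            return true;
        } catch (HttpStatusException e) {
            if (e.statusCode == 404) {
                return false; // the instance never started or was already removed
            }
            throw e; // other errors are still real failures
        }
    }
}
```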

Upstream changes

No response


yarinkos · 2022-06-12T10:19:39Z

@vlad-ivanov-name, is this another root cause besides the GCP compute bug? Could it be a quota issue?
It might be an ongoing issue (https://stackoverflow.com/questions/52684656/the-zone-does-not-have-enough-resources-available-to-fulfill-the-request-the-re).
I was just thinking out loud.

vlad-ivanov-name · 2022-06-12T10:51:14Z

Yeah, I checked the quota -- that's not it. I don't think it's a bug per se; the instance Jenkins was trying to spin up needs a GPU, and it's common to see GPU resources exhausted within a particular zone.

craigwatson · 2022-06-14T12:30:46Z

We also saw similar issues in the London region at roughly the same time - this was indeed caused by resource exhaustion inside GCP, and not by any project quotas.

Ideally, the plug-in would catch this failure and keep retrying (potentially with some kind of exponential backoff?) until it succeeds.
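
For illustration, a retry loop with exponential backoff could look roughly like the sketch below. `BackoffRetry` and its parameters are hypothetical names, not plugin APIs, and for simplicity the sketch retries on any exception, whereas a real implementation would presumably retry only on the resource-exhaustion error.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of retrying with exponential backoff; not plugin code.
final class BackoffRetry {

    /** Delay before retrying after attempt number `attempt` (0-based): base * 2^attempt, capped at maxMillis. */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        // Cap the shift so the multiplier stays bounded for large attempt numbers.
        long factor = 1L << Math.min(attempt, 30);
        return Math.min(maxMillis, baseMillis * factor);
    }

    /**
     * Calls the action up to maxAttempts times, sleeping between failed attempts.
     * Rethrows the last failure if every attempt fails.
     */
    static <T> T retry(Callable<T> action, int maxAttempts, long baseMillis, long maxMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(delayMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        throw last;
    }
}
```

The cap (`maxMillis`) keeps the wait from growing without bound, and the attempt limit keeps a persistently exhausted zone from blocking a build forever.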

Alternatively, the plug-in could use Instance Groups to keep track of the pool of VMs, although I imagine that would involve a fair amount of work internally as the pool management logic would change.

spiegelm · 2023-04-19T16:07:29Z

We would love to have this feature. About once a week our builds get stuck with ZONE_RESOURCE_POOL_EXHAUSTED in some zone of europe-west1 and require manual intervention to unblock pull requests or releases.

I guess some cases could already be fixed by round-robin over all zones of this region. Our builds do not need a specific zone; any zone would be sufficient.
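
A round-robin over the zones of a region could be sketched as follows. `RoundRobinZones` is a hypothetical helper, not something the plugin provides, and the zone list is passed in by the caller.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of rotating through zones; not plugin code.
final class RoundRobinZones {
    private final List<String> zones;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinZones(List<String> zones) {
        if (zones.isEmpty()) {
            throw new IllegalArgumentException("at least one zone is required");
        }
        this.zones = List.copyOf(zones);
    }

    /** Returns the next zone in rotation; safe to call from multiple threads. */
    String nextZone() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), zones.size());
        return zones.get(index);
    }
}
```

Rotation alone only spreads launches across zones. It would still need to be combined with some failure handling so that a launch in an exhausted zone is followed by an attempt in the next one.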

YOO629 · 2024-11-13T04:06:16Z

+1 would love to have this feature

vlad-ivanov-name added the "enhancement" (New feature or request) label on Jun 10, 2022
