Would it be possible to have an option to use multiple zones for launching agents? Right now, if a zone is exhausted, the plugin does not handle it well: it creates many agents (more than the limit set in the config), and all of them fail to start. Deleting them works, but it also triggers an exception within Jenkins. Here's a log snippet:
2022-06-10 10:50:47.296+0000 [id=4267] INFO c.g.j.p.c.ComputeEngineComputerLauncher#launch: Launch failed while waiting for operation operation-1654858230641-5e115b4fdb4d9-33d8d6e8-ec232c0f to complete. Operation error was The zone 'projects/censored/zones/us-west1-b' does not have enough resources available to fulfill the request. '(resource type:compute)'.
It would be good to have an option to try different zones from a preconfigured set if one zone doesn't have enough resources.
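To make the request concrete, here is a minimal sketch of the fallback behaviour, not the plugin's actual code. `ZoneFallbackLauncher`, `ZoneExhaustedException`, and the `createInstanceInZone` callback are hypothetical names introduced for illustration, and the second zone is only an example; a real implementation would have to map the Compute Engine operation error onto such an exception.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical exception standing in for the "zone does not have enough
// resources available" operation error shown in the log above.
class ZoneExhaustedException extends RuntimeException {
    ZoneExhaustedException(String zone) {
        super("Zone " + zone + " does not have enough resources");
    }
}

// Hypothetical helper: tries a preconfigured, ordered set of zones.
final class ZoneFallbackLauncher {
    private final List<String> zones;

    ZoneFallbackLauncher(List<String> zones) {
        if (zones.isEmpty()) {
            throw new IllegalArgumentException("at least one zone is required");
        }
        this.zones = List.copyOf(zones);
    }

    // Calls createInstanceInZone for each zone in order and returns the first
    // successful result. Returns empty if every zone reported exhaustion.
    // Any other exception is not caught and propagates to the caller.
    Optional<String> launch(Function<String, String> createInstanceInZone) {
        for (String zone : zones) {
            try {
                return Optional.of(createInstanceInZone.apply(zone));
            } catch (ZoneExhaustedException e) {
                // This zone has no capacity; fall through to the next one.
            }
        }
        return Optional.empty();
    }
}
```

With zones `us-west1-b` and `us-west1-a` configured, a launch that fails with exhaustion in `us-west1-b` would be attempted once more in `us-west1-a` instead of leaving a dead agent behind.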
Exception when deleting an agent that failed to start
2022-06-10 10:50:15.593+0000 [id=3700] WARNING h.i.i.InstallUncaughtExceptionHandler#handleException: Caught unhandled exception with ID 95d9316f-d4ab-4107-a055-c25f7c2f40a8
com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
{
"code" : 404,
"errors" : [ {
"domain" : "global",
"message" : "The resource 'projects/censored/zones/us-west1-b/instances/jenkins-agent-dynamic-jtiut7' was not found",
"reason" : "notFound"
} ],
"message" : "The resource 'projects/censored/zones/us-west1-b/instances/jenkins-agent-dynamic-jtiut7' was not found"
}
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:150)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1056)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.graphite.platforms.plugin.client.ComputeWrapper.deleteInstance(ComputeWrapper.java:116)
at com.google.cloud.graphite.platforms.plugin.client.ComputeClient.terminateInstanceAsync(ComputeClient.java:323)
at com.google.jenkins.plugins.computeengine.ComputeEngineInstance._terminate(ComputeEngineInstance.java:136)
at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:88)
at com.google.jenkins.plugins.computeengine.ComputeEngineComputer.doDoDelete(ComputeEngineComputer.java:181)
at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710)
at org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:398)
at org.kohsuke.stapler.Function$InstanceFunction.invoke(Function.java:410)
at org.kohsuke.stapler.interceptor.RequirePOST$Processor.invoke(RequirePOST.java:78)
at org.kohsuke.stapler.PreInvokeInterceptedFunction.invoke(PreInvokeInterceptedFunction.java:26)
at org.kohsuke.stapler.Function.bindAndInvoke(Function.java:208)
...
Upstream changes
No response
Yeah, I checked the quota; that's not it. I don't think it's a bug per se. The instance Jenkins was trying to spin up needs a GPU, and it's common to see GPU resources exhausted within a particular zone.
We also saw similar issues in the London region at roughly the same time. This was indeed caused by resource exhaustion inside GCP, and not by any project quotas.
Ideally, the plug-in would catch this failure and keep retrying (potentially with some kind of exponential backoff) until it succeeded.
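A rough sketch of what retrying with exponential backoff could look like follows; it is an illustration under assumptions, not the plug-in's code. `BackoffRetry`, `delayFor`, and `call` are hypothetical names, the attempt count is bounded rather than unlimited, and this version retries on any exception, whereas a real one would probably retry only on resource exhaustion.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical helper illustrating bounded retries with exponential backoff.
final class BackoffRetry {

    // Delay before retry number `attempt` (0-based): base doubled `attempt`
    // times, capped at max.
    static Duration delayFor(int attempt, Duration base, Duration max) {
        Duration delay = base;
        for (int i = 0; i < attempt && delay.compareTo(max) < 0; i++) {
            delay = delay.multipliedBy(2);
        }
        return delay.compareTo(max) > 0 ? max : delay;
    }

    // Runs the operation up to maxAttempts times, sleeping between failed
    // attempts. Rethrows the last failure if no attempt succeeds.
    static <T> T call(Callable<T> operation, int maxAttempts, Duration base, Duration max)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                last = e;
                // Sleep only if another attempt will follow.
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(delayFor(attempt, base, max).toMillis());
                }
            }
        }
        throw last;
    }
}
```

Capping the delay keeps a long outage from pushing the wait between attempts to unreasonable lengths, and bounding the attempts avoids blocking a provisioning thread forever.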
Alternatively, the plug-in could use Instance Groups to keep track of the pool of VMs, although I imagine that would involve a fair amount of work internally as the pool management logic would change.
We would love to have this feature. About once a week, our builds get stuck with ZONE_RESOURCE_POOL_EXHAUSTED in some zone of europe-west1 and require manual intervention to unblock pull requests or releases.
I guess some cases could already be fixed by round-robining over all zones of the region. Our builds don't depend on a specific zone; any zone would be sufficient.
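For illustration, round-robin zone selection could be as small as the sketch below. `RoundRobinZones` and `nextZone` are hypothetical names, not part of the plugin, and the zone list in the usage note is just an example.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: hands out zones in rotation so that consecutive
// launches are spread over every configured zone.
final class RoundRobinZones {
    private final List<String> zones;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinZones(List<String> zones) {
        if (zones.isEmpty()) {
            throw new IllegalArgumentException("at least one zone is required");
        }
        this.zones = List.copyOf(zones);
    }

    // Returns the next zone in rotation. The atomic counter makes this safe
    // to call from several threads; floorMod keeps the index valid even after
    // the counter overflows.
    String nextZone() {
        int index = Math.floorMod(next.getAndIncrement(), zones.size());
        return zones.get(index);
    }
}
```

With `europe-west1-b`, `europe-west1-c`, and `europe-west1-d` configured, four consecutive launches would target b, c, d, and then b again, so a single exhausted zone affects at most every third launch.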