Add spotVM and maxRunDuration feature to VM provisioning #492

Open
wants to merge 20 commits into
base: develop

@gbhat618 gbhat618 commented Dec 11, 2024

Summary

This PR aims to enhance the VM provisioning capabilities of the google-compute-engine-plugin by introducing support for Spot VMs and maxRunDuration. This effort seeks to address the issues reported in #408, #473, and #358.

Enhancements

  • Spot VM Support: Incorporates the option to use Spot VMs, providing a cost-effective and efficient alternative to preemptible VMs without the automatic deletion after 24 hours.
  • maxRunDuration: Implements the maxRunDuration feature, enabling the automatic deletion of VMs after a specified duration to optimize resource utilization and cost.

Rationale

Although this plugin already has a background cleanup job for leftover orphan agent VMs, setting maxRunDuration ensures that VMs are still deleted even if the Jenkins controller itself has crashed. It is also an optional setting with no default enforced, so this feature does not replace or interfere with the existing background job.

Technical Background

The plugin now supports the following VM options:

  • Standard VMs: Regular VM instances with guaranteed availability by GCP SLA and standard pricing. Also supports maxRunDuration.
  • Spot VMs: An updated version of Preemptible VMs, offering exactly the same cost benefits without the restriction of automatic deletion after 24 hours.
  • Preemptible VMs: Cost-effective VM instances with limited availability, reclaimed by GCE after 24 hours.
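
As a rough illustration of how the three options could differ in the request sent to GCE, here is a minimal sketch that builds the `scheduling` part of an instance insert request as a plain map. The class, enum, and method names are hypothetical and are not the plugin's code. The field names (`provisioningModel`, `preemptible`, `maxRunDuration`, `instanceTerminationAction`) follow the Compute Engine `scheduling` resource as I understand it, and ignoring the duration for Preemptible VMs is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: these names are illustrative, not the plugin's classes.
class SchedulingSketch {
    enum ProvisioningKind { STANDARD, SPOT, PREEMPTIBLE }

    /**
     * Builds the "scheduling" portion of an instance insert request as a plain map.
     * maxRunDurationSeconds <= 0 means "not set". Assumption: the duration is
     * ignored for PREEMPTIBLE, which keeps its legacy behavior.
     */
    static Map<String, Object> scheduling(ProvisioningKind kind, long maxRunDurationSeconds) {
        Map<String, Object> s = new LinkedHashMap<>();
        switch (kind) {
            case STANDARD:
                s.put("provisioningModel", "STANDARD");
                break;
            case SPOT:
                s.put("provisioningModel", "SPOT");
                break;
            case PREEMPTIBLE:
                // Legacy flag, so existing configurations behave as before.
                s.put("preemptible", true);
                break;
        }
        if (kind != ProvisioningKind.PREEMPTIBLE && maxRunDurationSeconds > 0) {
            s.put("maxRunDuration", Map.of("seconds", maxRunDurationSeconds));
            // Assumption: DELETE is the termination action, matching the PR's goal
            // of automatically deleting VMs once the duration expires.
            s.put("instanceTerminationAction", "DELETE");
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(scheduling(ProvisioningKind.SPOT, 1200));
        System.out.println(scheduling(ProvisioningKind.STANDARD, 600));
        System.out.println(scheduling(ProvisioningKind.PREEMPTIBLE, 0));
    }
}
```

The durations used in `main` (1200s for Spot, 600s for Standard) mirror the manual test cases listed under Testing Status.
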
Why Maintain Support for Both Spot and Preemptible VMs?

Despite the similarities between Spot and Preemptible VMs, retaining support for both options is essential for the following reasons:

  • Non-Deprecation by GCP: Google Cloud Platform does not plan to deprecate Preemptible VMs.
  • Backward Compatibility: Many Jenkins controllers are configured to use Preemptible VMs. Dropping support will disrupt existing setups.
  • Clear Differentiation: Retaining both options avoids confusion and simplifies the user experience, requiring no changes upon plugin upgrade.

Testing Status

Manual Tests: Screen recording and screenshots

User experience configuration page

provisioning_type_impl_2.mov

VM Provisioned in GCP

Description | Content
Spot VM provisioned with max run duration 1200s | (screenshot)
Standard VM provisioned with max run duration 600s | (screenshot)
Preemptible VM as before | (screenshot)

Unit tests: cover the scheduling() method, ensuring compatibility with both the old and new configurations.

New integration test for the Spot VM with Max Duration: logs
[INFO] --- surefire:3.2.2:test (default-test) @ google-compute-engine ---
[INFO] Using auto detected provider org.apache.maven.surefire.junitplatform.JUnitPlatformProvider
[INFO]
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.google.jenkins.plugins.computeengine.integration.SpotVmProvisioningWithMaxRunDurationCasCIT
=== Starting com.google.jenkins.plugins.computeengine.integration.SpotVmProvisioningWithMaxRunDurationCasCIT
   0.042 [id=20]	INFO	o.jvnet.hudson.test.WarExploder#explode: Exploding /Users/gbhat/.m2/repository/org/jenkins-ci/main/jenkins-war/2.452.3/jenkins-war-2.452.3.war into /Users/gbhat/CBProjects/google-compute-engine-plugin/target/jenkins-for-test
   1.657 [id=20]	INFO	o.jvnet.hudson.test.JenkinsRule#createWebServer: Running on http://localhost:50634/jenkins/
   1.861 [id=35]	INFO	jenkins.InitReactorRunner$1#onAttained: Started initialization
   2.403 [id=36]	WARNING	hudson.ClassicPluginStrategy#createClassJarFromWebInfClasses: Created /var/folders/80/w8mmdy513sb3y9fp5fr7vvqc0000gn/T/jenkins10513192598629104995/plugins/powershell/WEB-INF/lib/classes.jar; update plugin to a version created with a newer harness
   2.416 [id=36]	INFO	hudson.PluginManager#considerDetachedPlugin: Loading a detached plugin as a dependency: /var/folders/80/w8mmdy513sb3y9fp5fr7vvqc0000gn/T/jenkins10513192598629104995/plugins/jakarta-mail-api.jpi
   2.426 [id=36]	INFO	hudson.PluginManager#considerDetachedPlugin: Loading a detached plugin as a dependency: /var/folders/80/w8mmdy513sb3y9fp5fr7vvqc0000gn/T/jenkins10513192598629104995/plugins/mailer.jpi
   2.430 [id=36]	INFO	hudson.PluginManager#considerDetachedPlugin: Loading a detached plugin as a dependency: /var/folders/80/w8mmdy513sb3y9fp5fr7vvqc0000gn/T/jenkins10513192598629104995/plugins/jakarta-activation-api.jpi
   2.434 [id=36]	INFO	hudson.PluginManager#considerDetachedPlugin: Loading a detached plugin as a dependency: /var/folders/80/w8mmdy513sb3y9fp5fr7vvqc0000gn/T/jenkins10513192598629104995/plugins/display-url-api.jpi
   2.437 [id=36]	INFO	hudson.PluginManager#considerDetachedPlugin: Loading a detached plugin as a dependency: /var/folders/80/w8mmdy513sb3y9fp5fr7vvqc0000gn/T/jenkins10513192598629104995/plugins/instance-identity.jpi
   2.443 [id=36]	INFO	hudson.PluginManager#considerDetachedPlugin: Loading a detached plugin as a dependency: /var/folders/80/w8mmdy513sb3y9fp5fr7vvqc0000gn/T/jenkins10513192598629104995/plugins/matrix-auth.jpi
   2.460 [id=36]	INFO	hudson.PluginManager#considerDetachedPlugin: Loading a detached plugin as a dependency: /var/folders/80/w8mmdy513sb3y9fp5fr7vvqc0000gn/T/jenkins10513192598629104995/plugins/echarts-api.jpi
   2.684 [id=36]	INFO	hudson.PluginManager#considerDetachedPlugin: Loading a detached plugin as a dependency: /var/folders/80/w8mmdy513sb3y9fp5fr7vvqc0000gn/T/jenkins10513192598629104995/plugins/junit.jpi
   2.690 [id=36]	INFO	hudson.PluginManager#considerDetachedPlugin: Loading a detached plugin as a dependency: /var/folders/80/w8mmdy513sb3y9fp5fr7vvqc0000gn/T/jenkins10513192598629104995/plugins/jquery3-api.jpi
   2.697 [id=36]	INFO	hudson.PluginManager#considerDetachedPlugin: Loading a detached plugin as a dependency: /var/folders/80/w8mmdy513sb3y9fp5fr7vvqc0000gn/T/jenkins10513192598629104995/plugins/matrix-project.jpi
   2.707 [id=36]	INFO	hudson.PluginManager#considerDetachedPlugin: Loading a detached plugin as a dependency: /var/folders/80/w8mmdy513sb3y9fp5fr7vvqc0000gn/T/jenkins10513192598629104995/plugins/checks-api.jpi
   2.713 [id=36]	INFO	hudson.PluginManager#considerDetachedPlugin: Loading a detached plugin as a dependency: /var/folders/80/w8mmdy513sb3y9fp5fr7vvqc0000gn/T/jenkins10513192598629104995/plugins/command-launcher.jpi
   2.718 [id=36]	INFO	hudson.PluginManager#considerDetachedPlugin: Loading a detached plugin as a dependency: /var/folders/80/w8mmdy513sb3y9fp5fr7vvqc0000gn/T/jenkins10513192598629104995/plugins/jdk-tool.jpi
   2.726 [id=36]	INFO	hudson.PluginManager#considerDetachedPlugin: Loading a detached plugin as a dependency: /var/folders/80/w8mmdy513sb3y9fp5fr7vvqc0000gn/T/jenkins10513192598629104995/plugins/mina-sshd-api-common.jpi
   2.737 [id=36]	INFO	hudson.PluginManager#considerDetachedPlugin: Loading a detached plugin as a dependency: /var/folders/80/w8mmdy513sb3y9fp5fr7vvqc0000gn/T/jenkins10513192598629104995/plugins/mina-sshd-api-core.jpi
   2.748 [id=36]	INFO	hudson.PluginManager#considerDetachedPlugin: Loading a detached plugin as a dependency: /var/folders/80/w8mmdy513sb3y9fp5fr7vvqc0000gn/T/jenkins10513192598629104995/plugins/sshd.jpi
   2.755 [id=36]	INFO	hudson.PluginManager#considerDetachedPlugin: Loading a detached plugin as a dependency: /var/folders/80/w8mmdy513sb3y9fp5fr7vvqc0000gn/T/jenkins10513192598629104995/plugins/javax-mail-api.jpi
   3.003 [id=49]	INFO	jenkins.InitReactorRunner$1#onAttained: Listed all plugins
   3.010 [id=36]	INFO	j.b.api.BouncyCastlePlugin#start: /Users/gbhat/CBProjects/google-compute-engine-plugin/target/tmp/j h15818329323505612236/plugins/bouncycastle-api/WEB-INF/optional-lib not found; for non RealJenkinsRule this is fine and can be ignored.
   3.779 [id=46]	INFO	jenkins.InitReactorRunner$1#onAttained: Prepared all plugins
   3.784 [id=46]	INFO	jenkins.InitReactorRunner$1#onAttained: Started all plugins
   3.784 [id=42]	INFO	jenkins.InitReactorRunner$1#onAttained: Augmented all extensions
   4.229 [id=35]	INFO	jenkins.InitReactorRunner$1#onAttained: System config loaded
   4.258 [id=35]	INFO	jenkins.InitReactorRunner$1#onAttained: System config adapted
   4.259 [id=35]	INFO	jenkins.InitReactorRunner$1#onAttained: Loaded all jobs
   4.259 [id=50]	INFO	jenkins.InitReactorRunner$1#onAttained: Configuration for all jobs updated
   4.328 [id=54]	INFO	jenkins.InitReactorRunner$1#onAttained: Completed initialization
   4.373 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#init: init
  53.506 [id=20]	WARNING	i.j.p.casc.BaseConfigurator#createAttribute: Can't handle class com.google.jenkins.plugins.computeengine.InstanceConfiguration#sshKeyCredential: type is abstract but not Describable.
  53.515 [id=20]	WARNING	i.j.p.casc.BaseConfigurator#createAttribute: Can't handle class com.google.jenkins.plugins.computeengine.InstanceConfiguration#sshKeyCredential: type is abstract but not Describable.
  53.528 [id=20]	INFO	c.g.j.p.c.ComputeEngineCloud#provision: Provisioning node from configs [com.google.jenkins.plugins.computeengine.InstanceConfiguration@500d3818] for excess workload of 1 units of label 'integration'
  54.750 [id=20]	INFO	c.g.j.p.c.ComputeEngineCloud#availableNodeCapacity: Found capacity for 10 nodes in cloud integration
  54.753 [id=20]	INFO	c.g.j.p.c.InstanceConfiguration#instance: User selected to use an autogenerated ssh key pair
  55.995 [id=20]	INFO	c.g.j.p.c.InstanceConfiguration#provision: Sent insert request for instance configuration [max-run-duration]
  56.015 [id=115]	INFO	c.g.j.p.c.ComputeEngineComputerLauncher#launch: Launch will wait 300000 for operation operation-1734367070702-62965cd28cc13-7d493207-2d2aa9bc to complete...
  56.030 [id=118]	INFO	c.g.j.p.c.ComputeEngineCloud#lambda$getPlannedNodeFuture$0: Waiting 300000ms for node max-run-duration-xvylg5 to connect
  64.385 [id=115]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Launching instance: max-run-duration-xvylg5
  64.386 [id=115]	INFO	c.g.j.p.c.ComputeEngineCloud#log: bootstrap
  64.386 [id=115]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Getting keypair...
  64.387 [id=115]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Using autogenerated ssh keypair
  64.387 [id=115]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Authenticating as jenkins
  65.869 [id=115]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Connecting to 34.73.37.82 on port 22, with timeout 10000.
  75.898 [id=115]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Failed to connect via ssh: The kexTimeout (10000 ms) expired.
  75.906 [id=115]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Waiting for SSH to come up. Sleeping 5.
  82.686 [id=115]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Connecting to 34.73.37.82 on port 22, with timeout 10000.
  83.238 [id=115]	WARNING	c.g.j.p.c.ComputeEngineCloud#log: An error occured: There was a problem while connecting to 34.73.37.82:22
  89.782 [id=115]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Connecting to 34.73.37.82 on port 22, with timeout 10000.
  92.775 [id=115]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Connected via SSH.
  94.275 [id=115]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Verifying: /usr/bin/java -fullversion
  96.132 [id=115]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Copying agent.jar to: /tmp
 106.796 [id=115]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Launching Jenkins agent via plugin SSH: /usr/bin/java -jar /tmp/agent.jar
 142.183 [id=115]	INFO	c.g.j.p.c.ComputeEngineComputer#onConnected: Instance max-run-duration-xvylg5 is preemptive, setting up preemption listener
 142.202 [id=118]	INFO	c.g.j.p.c.ComputeEngineCloud#lambda$getPlannedNodeFuture$0: 86172ms elapsed waiting for node max-run-duration-xvylg5 to connect
 143.721 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#testMaxRunDurationDeletesAndNoNewBuilds: Instance: max-run-duration-xvylg5
 143.721 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#testMaxRunDurationDeletesAndNoNewBuilds: instance scheduling configs are correct
 233.283 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: Legacy code started this job.  No cause information is available
 233.283 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: Running as SYSTEM
 233.284 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: Building remotely on max-run-duration-xvylg5 (integration) in workspace /tmp/workspace/test0
 233.284 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: [test0] $ /bin/sh -xe /tmp/jenkins5711445835290691383.sh
 233.284 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: + echo hello world
 233.284 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: hello world
 233.284 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: Finished: SUCCESS
 233.284 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#testMaxRunDurationDeletesAndNoNewBuilds: first build completed onmax-run-duration-xvylg5
 285.657 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#testMaxRunDurationDeletesAndNoNewBuilds: proceeding to 2nd build, after no remaining instances
 294.272 [id=67]	INFO	c.g.j.p.c.ComputeEngineCloud#provision: Provisioning node from configs [com.google.jenkins.plugins.computeengine.InstanceConfiguration@500d3818] for excess workload of 1 units of label 'integration'
 295.271 [id=67]	INFO	c.g.j.p.c.ComputeEngineCloud#availableNodeCapacity: Found capacity for 10 nodes in cloud integration
 295.272 [id=67]	INFO	c.g.j.p.c.InstanceConfiguration#instance: User selected to use an autogenerated ssh key pair
 297.504 [id=67]	INFO	c.g.j.p.c.InstanceConfiguration#provision: Sent insert request for instance configuration [max-run-duration]
 297.507 [id=181]	INFO	c.g.j.p.c.ComputeEngineComputerLauncher#launch: Launch will wait 300000 for operation operation-1734367312193-62965db8da883-a37d2ee9-34784052 to complete...
 297.520 [id=67]	INFO	h.s.NodeProvisioner$StandardStrategyImpl#apply: Started provisioning max-run-duration-5jhvlc from gce-integration with 1 executors. Remaining excess workload: 0
 297.525 [id=184]	INFO	c.g.j.p.c.ComputeEngineCloud#lambda$getPlannedNodeFuture$0: Waiting 300000ms for node max-run-duration-5jhvlc to connect
 313.965 [id=181]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Launching instance: max-run-duration-5jhvlc
 313.965 [id=181]	INFO	c.g.j.p.c.ComputeEngineCloud#log: bootstrap
 313.966 [id=181]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Getting keypair...
 313.967 [id=181]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Using autogenerated ssh keypair
 313.968 [id=181]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Authenticating as jenkins
 315.521 [id=181]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Connecting to 35.229.64.102 on port 22, with timeout 10000.
 319.802 [id=181]	WARNING	c.g.j.p.c.ComputeEngineCloud#log: An error occured: There was a problem while connecting to 35.229.64.102:22
 326.321 [id=181]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Connecting to 35.229.64.102 on port 22, with timeout 10000.
 326.591 [id=181]	WARNING	c.g.j.p.c.ComputeEngineCloud#log: An error occured: There was a problem while connecting to 35.229.64.102:22
 333.316 [id=181]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Connecting to 35.229.64.102 on port 22, with timeout 10000.
 336.386 [id=181]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Connected via SSH.
 337.521 [id=181]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Verifying: /usr/bin/java -fullversion
 339.650 [id=181]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Copying agent.jar to: /tmp
 342.939 [id=181]	INFO	c.g.j.p.c.ComputeEngineCloud#log: Launching Jenkins agent via plugin SSH: /usr/bin/java -jar /tmp/agent.jar
 377.883 [id=181]	INFO	c.g.j.p.c.ComputeEngineComputer#onConnected: Instance max-run-duration-5jhvlc is preemptive, setting up preemption listener
 377.895 [id=184]	INFO	c.g.j.p.c.ComputeEngineCloud#lambda$getPlannedNodeFuture$0: 80375ms elapsed waiting for node max-run-duration-5jhvlc to connect
 478.854 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: Started
 478.854 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: [Pipeline] Start of Pipeline
 478.855 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: [Pipeline] node
 478.855 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: Still waiting to schedule task
 478.855 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: ‘max-run-duration-5jhvlc’ is offline
 478.855 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: Running on max-run-duration-5jhvlc in /tmp/workspace/p
 478.855 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: [Pipeline] {
 478.855 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: [Pipeline] sh
 478.855 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: + date
 478.855 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: Mon Dec 16 16:44:41 UTC 2024
 478.855 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: [Pipeline] }
 478.856 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: [Pipeline] // node
 478.856 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: [Pipeline] End of Pipeline
 478.856 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#printLogsAndReturnAgentName: Finished: SUCCESS
 478.856 [id=20]	INFO	c.g.j.p.c.i.SpotVmProvisioningWithMaxRunDurationCasCIT#testMaxRunDurationDeletesAndNoNewBuilds: second build completed on max-run-duration-5jhvlc
 478.980 [id=224]	INFO	hudson.remoting.Request$2#run: Failed to send back a reply to the request RPCRequest:hudson.remoting.RemoteClassLoader$IClassLoader.fetch3[java.lang.String](2): hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@8103fa6:max-run-duration-5jhvlc": channel is already closed
 479.131 [id=20]	WARNING	o.j.h.t.RemainingActivityListener#onTearDown: PlaceholderExecutable:ExecutorStepExecution.PlaceholderTask{label=max-run-duration-5jhvlc,context=CpsStepContext[3:node]:Owner[p/1:p #1]} still seems to be running, which could break deletion of log files or metadata
 479.150 [id=20]	INFO	hudson.lifecycle.Lifecycle#onStatusUpdate: Stopping Jenkins
 479.957 [id=184]	WARNING	o.j.p.w.s.s.ExecutorStepExecution$RemovedNodeListener#cancelOwnerExecution
java.io.IOException: cannot find current thread
    at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:295)
    at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:75)
    at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$RemovedNodeListener.cancelOwnerExecution(ExecutorStepExecution.java:380)
    at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$RemovedNodeListener.onDeleted(ExecutorStepExecution.java:355)
    at jenkins.model.NodeListener.lambda$fireOnDeleted$2(NodeListener.java:97)
    at jenkins.util.Listeners.lambda$notify$0(Listeners.java:59)
    at jenkins.util.Listeners.notify(Listeners.java:70)
    at jenkins.model.NodeListener.fireOnDeleted(NodeListener.java:97)
    at jenkins.model.Nodes.removeNode(Nodes.java:297)
    at jenkins.model.Jenkins.removeNode(Jenkins.java:2257)
    at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:91)
    at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.lambda$done$5(OnceRetentionStrategy.java:142)
    at hudson.model.Queue._withLock(Queue.java:1409)
    at hudson.model.Queue.withLock(Queue.java:1283)
    at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.lambda$done$6(OnceRetentionStrategy.java:137)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
    at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
 479.993 [id=20]	INFO	hudson.lifecycle.Lifecycle#onStatusUpdate: Jenkins stopped
 480.062 [id=20]	INFO	o.j.h.t.TemporaryDirectoryAllocator#dispose: deleting /Users/gbhat/CBProjects/google-compute-engine-plugin/target/tmp/j h15818329323505612236
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 481.2 s -- in com.google.jenkins.plugins.computeengine.integration.SpotVmProvisioningWithMaxRunDurationCasCIT
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  08:07 min
[INFO] Finished at: 2024-12-16T22:14:56+05:30
[INFO] ------------------------------------------------------------------------

(cloudbees internal ticket link for back reference)


Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests that demonstrate the feature works or that fix the issue

@gbhat618
Author

👋 @jglick , @Vlatombe , @nevingeorgesunny

return "Spot VM";
}

public FormValidation doCheckMaxRunDurationSeconds(@QueryParameter String value) {

Check warning

Code scanning / Jenkins Security Scan

Stapler: Missing POST/RequirePOST annotation Warning

Potential CSRF vulnerability: If DescriptorImpl#doCheckMaxRunDurationSeconds connects to user-specified URLs, modifies state, or is expensive to run, it should be annotated with @POST or @RequirePOST
return "Spot VM";
}

public FormValidation doCheckMaxRunDurationSeconds(@QueryParameter String value) {

Check warning

Code scanning / Jenkins Security Scan

Stapler: Missing permission check Warning

Potential missing permission check in DescriptorImpl#doCheckMaxRunDurationSeconds
return "Standard";
}

public FormValidation doCheckMaxRunDurationSeconds(@QueryParameter String value) {

Check warning

Code scanning / Jenkins Security Scan

Stapler: Missing POST/RequirePOST annotation Warning

Potential CSRF vulnerability: If DescriptorImpl#doCheckMaxRunDurationSeconds connects to user-specified URLs, modifies state, or is expensive to run, it should be annotated with @POST or @RequirePOST
return "Standard";
}

public FormValidation doCheckMaxRunDurationSeconds(@QueryParameter String value) {

Check warning

Code scanning / Jenkins Security Scan

Stapler: Missing permission check Warning

Potential missing permission check in DescriptorImpl#doCheckMaxRunDurationSeconds
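
For context on the warnings above: in Jenkins plugins, scanner findings of this kind on `doCheck*` methods are commonly addressed by annotating the method with `@POST` (or `@RequirePOST`) and adding a permission check such as `Jenkins.get().checkPermission(...)`. Which permission is appropriate here is not stated in this PR. The stand-alone sketch below shows only the side-effect-free validation logic such a method might wrap. The class name, the messages, and the treatment of blank or zero as "no limit" are assumptions; the real method returns Jenkins' `FormValidation`.

```java
// Hypothetical stand-alone sketch; the real method returns Jenkins' FormValidation.
class MaxRunDurationCheck {

    /** Returns null when the value is acceptable, otherwise an error message. */
    static String check(String value) {
        if (value == null || value.trim().isEmpty()) {
            return null; // assumption: the setting is optional, blank means "no limit"
        }
        final long seconds;
        try {
            seconds = Long.parseLong(value.trim());
        } catch (NumberFormatException e) {
            return "Max run duration must be a whole number of seconds";
        }
        if (seconds < 0) {
            return "Max run duration must not be negative";
        }
        return null; // assumption: 0 is accepted and treated as "not set"
    }

    public static void main(String[] args) {
        System.out.println(check("1200"));
        System.out.println(check("ten minutes"));
        System.out.println(check("-5"));
    }
}
```

Because the check only parses its input and never touches state or remote URLs, the CSRF warning is low risk in practice, but annotating the method is still the usual way to silence it.
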
@gbhat618
Author

gbhat618 commented Dec 12, 2024

recording and screenshots moved to the summary at the top ⬆️

@gbhat618
Author

gbhat618 commented Dec 12, 2024

I will work on the testing as follows:

  • check and update automated tests (or write new tests)
  • JCasC-based testing
  • check what happens to agents in Jenkins if the VM is deleted in GCP due to maxRunDuration (maybe Jenkins will mark the agent as unavailable; I will test and update the notes).

Edit: all these points are completed ✔️ (automated tests are written, both unit tests and an integration test). When maxRunDuration deletes the VM on the GCP side, the controller also deletes the agent, which is correct and causes no issues for builds ✅ (already covered in the integration test).

@@ -58,7 +58,7 @@ public static void teardown() throws IOException {

@Test
public void testGetImage() throws Exception {
- Image image = client.getImage("debian-cloud", "debian-9-stretch-v20180820");
+ Image image = client.getImage("debian-cloud", "debian-12-bookworm-v20241210");
Author

debian-9 is already deprecated, and this call was failing with 404 "image not found" errors:

➜  ~ gcloud compute images list --project=debian-cloud --no-standard-images --show-deprecated | grep debian-9-stretch-v20180820
debian-9-stretch-v20180820                         debian-cloud  debian-9            DEPRECATED  READY

? String.format(
"projects/%s/global/images/%s", BOOT_DISK_PROJECT_ID, System.getenv("GOOGLE_BOOT_DISK_IMAGE_NAME"))
- : "projects/debian-cloud/global/images/family/debian-9";
+ : "projects/debian-cloud/global/images/family/debian-12";
Author

@gbhat618 gbhat618 Dec 13, 2024

It is better to support GOOGLE_BOOT_DISK_PROJECT_ID and GOOGLE_BOOT_DISK_IMAGE_NAME for Linux too, because the base Debian images don't have Java installed.

See the new documentation file added in this PR, integration-tests.md.
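
The fallback discussed in this comment can be sketched as a small function that prefers a custom image when the two environment variables are set and otherwise uses the Debian 12 image family. This is only an illustration: the class and method names are hypothetical, and requiring both variables to be non-empty is an assumption, since the condition before the `?` in the diff is not shown.

```java
import java.util.Map;

// Hypothetical sketch of the image selection; not the test suite's actual helper.
class BootDiskImageSketch {
    static final String DEFAULT_IMAGE = "projects/debian-cloud/global/images/family/debian-12";

    /** Picks the boot disk image from the given environment, falling back to Debian 12. */
    static String bootDiskImage(Map<String, String> env) {
        String project = env.get("GOOGLE_BOOT_DISK_PROJECT_ID");
        String image = env.get("GOOGLE_BOOT_DISK_IMAGE_NAME");
        // Assumption: both variables must be present to select a custom image.
        if (project != null && !project.isEmpty() && image != null && !image.isEmpty()) {
            return String.format("projects/%s/global/images/%s", project, image);
        }
        return DEFAULT_IMAGE;
    }

    public static void main(String[] args) {
        // Uses the real process environment when run directly.
        System.out.println(bootDiskImage(System.getenv()));
    }
}
```

Passing the environment as a map keeps the function easy to exercise in a unit test without changing real environment variables.
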

}
return null;
}

Author

The test credentials are created using this constructor, which passes an empty id:
https://github.com/jenkinsci/google-oauth-plugin/blob/5367a8a1b9c97e14a5188150f897fa91da1d9777/src/main/java/com/google/jenkins/plugins/credentials/oauth/GoogleRobotPrivateKeyCredentials.java#L65-L70

which ends up at https://github.com/jenkinsci/credentials-plugin/blob/1e306e8f6c54f8eeef973b3387b8c223bd60ae1c/src/main/java/com/cloudbees/plugins/credentials/common/IdCredentials.java#L87-L96
and therefore gets a random UUID.

It seems that at some point the google-credential-plugin acquired this behavior (sending a blank id instead of the project id).

As the integration tests in this repository are not run often, this was never noticed.
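
To make the effect concrete, here is a minimal sketch of the fallback described above: when a credential is constructed with a blank id, the id is replaced with a random UUID, so a later lookup by project id finds nothing. The class below is a hypothetical stand-in written from the description in this comment, not a copy of the linked code.

```java
import java.util.UUID;

// Hypothetical stand-in for the empty-id fallback described in the comment above.
class CredentialIdSketch {

    /** Returns the given id, or a freshly generated UUID when the id is null or empty. */
    static String fixEmptyId(String id) {
        return (id == null || id.isEmpty()) ? UUID.randomUUID().toString() : id;
    }

    public static void main(String[] args) {
        String projectId = "my-gcp-project"; // illustrative value only
        String storedId = fixEmptyId("");     // what a blank id turns into
        // A test that looks the credential up by project id would not match.
        System.out.println("stored id equals project id: " + storedId.equals(projectId));
    }
}
```
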

@gbhat618 gbhat618 marked this pull request as ready for review December 16, 2024 18:19
@gbhat618
Author

👋 Moved the PR from draft to ready for review; please help review @jglick , @Vlatombe , @nevingeorgesunny

@rsandell rsandell added the enhancement New feature or request label Dec 17, 2024
@rsandell
Member

I am leaving my review as purely comments because I am going on vacation and don't want to block this while I'm off.

@jglick
Member

jglick commented Dec 18, 2024

(@Artmorse has also worked on this plugin recently.)

@@ -66,8 +65,11 @@
*/
@Log
public class ComputeEngineCloudRestartPreemptedIT {
@ClassRule
public static Timeout timeout = new Timeout(20 * TEST_TIMEOUT_MULTIPLIER, TimeUnit.MINUTES);
Author

Using the Timeout rule wasn't working; the JenkinsRule was still timing out at 180s, so I am fixing it by setting a property.

Member

add @WithTimeout(1200) at test level.
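
The exchange above concerns which timeout actually applies to a test. The sketch below models one plausible precedence: a per-test override (such as the value of `@WithTimeout`) wins, then a system property, then the 180s default reported in the thread. The class and method names are hypothetical, and both the precedence order and the property name `jenkins.test.timeout` are assumptions about the test harness rather than something confirmed in this PR.

```java
// Hypothetical model of timeout resolution; not the Jenkins test harness code.
class TestTimeoutSketch {
    static final int DEFAULT_SECONDS = 180; // default observed in the thread

    /**
     * annotationSeconds: per-test override, or null when the test has none.
     * propertyValue: raw value of a timeout system property, or null when unset.
     */
    static int effectiveTimeout(Integer annotationSeconds, String propertyValue) {
        if (annotationSeconds != null) {
            return annotationSeconds;
        }
        if (propertyValue != null) {
            try {
                return Integer.parseInt(propertyValue.trim());
            } catch (NumberFormatException e) {
                // Unparseable property: fall through to the default.
            }
        }
        return DEFAULT_SECONDS;
    }

    public static void main(String[] args) {
        // Assumption: "jenkins.test.timeout" is the property the harness reads.
        String property = System.getProperty("jenkins.test.timeout");
        System.out.println(effectiveTimeout(null, property));
        System.out.println(effectiveTimeout(1200, property));
    }
}
```
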

@gbhat618 gbhat618 requested a review from rsandell December 18, 2024 14:26
@@ -236,6 +245,19 @@ public void setCreateSnapshot(boolean createSnapshot) {
this.createSnapshot = createSnapshot && this.oneShot;
}

/**
* Required for JCasC Compatibility,
Member

JCasC never guarantees backwards compatibility, and users should be aware of that. IMO fix the test instead.


package com.google.jenkins.plugins.computeengine.ui.helpers;

public enum ProvisioningTypeValue {
Member

Using an enum here kind of defeats the purpose of having extensibility. I think it is not needed anyway as it is only used for casc compatibility which is not usually kept as per Bobby. But just in case you face this use case in the future, I would recommend either adding a specific check method returning boolean on the base implementation, or a marker interface that PreemptibleVm could implement
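
The two alternatives suggested in this review comment can be sketched as follows: a boolean check method on the base type, or a marker interface that the preemptible implementation adds. All type names here are hypothetical stand-ins for the plugin's provisioning types, chosen only to show why either approach stays open to new implementations where a closed enum would not.

```java
// Hypothetical sketch of the reviewer's two suggestions; names are illustrative.
class ProvisioningDemo {
    public static void main(String[] args) {
        ProvisioningTypeSketch[] types = {
            new StandardSketch(), new PreemptibleVmSketch(), new ThirdPartySketch()
        };
        for (ProvisioningTypeSketch t : types) {
            System.out.println(t.getClass().getSimpleName()
                    + " checkMethod=" + t.isPreemptible()
                    + " marker=" + (t instanceof PreemptibleMarker));
        }
    }
}

/** Option 1: a check method on the base type, overridden only where needed. */
abstract class ProvisioningTypeSketch {
    boolean isPreemptible() {
        return false;
    }
}

/** Option 2: a marker interface carrying no methods. */
interface PreemptibleMarker {}

class StandardSketch extends ProvisioningTypeSketch {}

class PreemptibleVmSketch extends ProvisioningTypeSketch implements PreemptibleMarker {
    @Override
    boolean isPreemptible() {
        return true;
    }
}

/** A type added later (for example by another plugin) needs no change to an enum. */
class ThirdPartySketch extends ProvisioningTypeSketch {}
```

With an enum, `ThirdPartySketch` would have no value to return unless the enum itself were edited, which is the extensibility concern raised above.
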

@@ -590,6 +627,11 @@ public static SshConfiguration defaultSshConfiguration() {
return SshConfiguration.builder().customPrivateKeyCredentialsId("").build();
}

@SuppressWarnings("unused")
Member

Suggested change
- @SuppressWarnings("unused")
+ @SuppressWarnings("unused") // jelly

@@ -943,6 +985,11 @@ public FormValidation doCheckNumExecutorsStr(
return FormValidation.ok();
}

@SuppressWarnings("unused")
Member

Suggested change
- @SuppressWarnings("unused")
+ @SuppressWarnings("unused") // jelly

Comment on lines +60 to +65
assertEquals("Zero configurations found", 1, cloud.getConfigurations().size());
InstanceConfiguration configuration = cloud.getConfigurations().get(0);
assertEquals(
"Provisioning type is wrong",
ProvisioningTypeValue.PREEMPTIBLE,
configuration.getProvisioningType().getValue());
Member

Assuming you get rid of the enum (or not)

Suggested change
- assertEquals("Zero configurations found", 1, cloud.getConfigurations().size());
- InstanceConfiguration configuration = cloud.getConfigurations().get(0);
- assertEquals(
- "Provisioning type is wrong",
- ProvisioningTypeValue.PREEMPTIBLE,
- configuration.getProvisioningType().getValue());
+ assertThat(cloud.getConfigurations(), contains(instanceOf(PreemptibleVm.class)));

Labels
enhancement New feature or request
4 participants