-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tasks: Lots of cleanup #590
Conversation
e8c5abc
to
d21fb87
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commit message: consistentcy
commit message: we're also running tests-scan and friends directly in the main container (in addition to statistics)
# qcow overlays on tmpfs | ||
'--tmpfs=/tmp:size=14g', | ||
'--env=TEST_OVERLAY_DIR=/tmp', | ||
|
||
# image cache | ||
'--env=COCKPIT_IMAGES_DATA_DIR=/cache/images', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like I'm missing the corresponding change in run-local.sh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's right here, in this very commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't understand how this works differently (ie: args given to outside container and then forwarded inside, with job-runner.toml created by shellscript).
I feel like it would be easier to understand (in run-local.sh
) if the never-changing parts of the run-args are listed directly in the toml file and we only use the variable substitution for the parts we change when we're mocking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, of course -- this also should go into job-runner, not just the "queue run" tasks container, so that it reflects what's going on in production. Also agreed about the "never-changing parts", I cleaned this up.
859e8a0
to
f41df47
Compare
OK, this works now. I adjusted the commit messages, too. However, I'd like to queue it after #598, as it's gonna conflict (a little bit), and is less important. |
@allisonkarlitskaya This is the "first stage" review -- it may likely get some minor fixups when I deploy it, I just wanted us to agree before I change anything in prod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these are all nice cleanups — but let's see how the rollout goes. Please see the comments about how job-runner.toml in run-local.sh could be easier to understand...
One thing I'd wish for is less user-setup stuff in the containerfile.
# qcow overlays on tmpfs | ||
'--tmpfs=/tmp:size=14g', | ||
'--env=TEST_OVERLAY_DIR=/tmp', | ||
|
||
# image cache | ||
'--env=COCKPIT_IMAGES_DATA_DIR=/cache/images', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't understand how this works differently (ie: args given to outside container and then forwarded inside, with job-runner.toml created by shellscript).
I feel like it would be easier to understand (in run-local.sh
) if the never-changing parts of the run-args are listed directly in the toml file and we only use the variable substitution for the parts we change when we're mocking.
Replace them with env variables in `job-config.toml`/OpenShift. `$TEST_OVERLAY_DIR` was already set. Also clean up the job-runner.toml construction in run-local.sh: Move the constant parts of `container.run-args` into the template, and only use `$run_args` for test specific adjustments.
With job-runner, all qemu jobs moved to separate containers. So we don't need the libvirt related cleanup any more. Note: We still need to retain the image-prune call. issue-scan/image-refresh only prunes S3, not the local cache. We may want to teach it to at some point.
These now just do `{tests,issue}-scan`, `run-queue`, `job-runner`, and test statistics. None of these uses a lot of resources, the expensive ones (tests and image refreshes) moved to job-runner.toml. So drop all the memory/shm configuration and obsolete env vars. Also drop the custom network. If inter-container socket leakage is still an issue, it'll need to be done to the job containers, but hopefully that was fixed long ago.🤞 Drop the remaining `Environment=` indirection. The env vars can just be expanded in-place when writing the service file. This was already done for many of them. Still keep importing /dev/kvm into the container. It doesn't strictly need this, but current bots `run-queue` uses the presence of this as an indication whether or not it can consume tasks from the public/rhel queues. This needs to be fixed first (which needs some thought). So dropping that will happen later.
This is a deployment script on the host, it has no place inside the container.
This separates files which affect the cockpit/tasks container image (and hence need an image refresh) from the files for deployment and tests. This simplifies and robustifies the "did the container change?" check in the tests workflow.
This is hopelessly outdated and useless. Production does not use that, and local developers just use toolbox or podman-run -- they wouldn't have any of these secrets directories anyway.
Adjust it to current reality of secrets, job-runner, and CentOS CI resources. Drop the remaining `$` prefixes from commands for consistency and working copy&paste. Move up the "using with toolbox" section, as it belongs to the deployment part. The bottom is documentation how our system works.
Updated for two issues that I overlooked. The /dev/kvm was a funny one.. I deployed this now, and sent cockpit-project/bots#6079 to confirm both an image refresh and a test run, and it's happy now. Update: I sent a queue-run fix to cockpit-project/bots#6080 . If that lands soon, I'm happy to revert the revert 😁 Update 2: Ah, this has to land first. Me refreshing the tasks container now broke bots tests, as they use the new container with the old run-local.sh code. I'll drop /dev/kvm as a follow-up. |
With job-runner, the bits that actually need /dev/kvm don't run in the "queue monitor" task container any more, but in separate containers. We want to drop importing /dev/kvm into the monitor container [1], so extend the check to also accept the presence of a job-runner configuration. With that we can drop the check for `COCKPIT_TESTMAP_INJECT` -- that was just a hack, and the cockpituous integration tests define `JOB_RUNNER_CONFIG` already. [1] cockpit-project/cockpituous#590
With job-runner, the bits that actually need /dev/kvm don't run in the "queue monitor" task container any more, but in separate containers. We want to drop importing /dev/kvm into the monitor container [1], so extend the check to also accept the presence of a job-runner configuration. With that we can drop the check for `COCKPIT_TESTMAP_INJECT` -- that was just a hack, and the cockpituous integration tests define `JOB_RUNNER_CONFIG` already. [1] cockpit-project/cockpituous#590
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the cleanup.
Let's get this in.
With job-runner, the bits that actually need /dev/kvm don't run in the "queue monitor" task container any more, but in separate containers. We want to drop importing /dev/kvm into the monitor container [1], so extend the check to also accept the presence of a job-runner configuration. With that we can drop the check for `COCKPIT_TESTMAP_INJECT` -- that was just a hack, and the cockpituous integration tests define `JOB_RUNNER_CONFIG` already. [1] cockpit-project/cockpituous#590
Rendered README: https://github.com/cockpit-project/cockpituous/blob/cleanups/tasks/README.md
Blocked on Convert .format() to f-strings bots#6035 and test: Drop rhel-8* from supported OSes, fix for ruff 0.3.1 (fixes failing "tox" test) cockpit#20149