fix: randomize lxc container names #651

mikemccracken · 2024-11-07T04:59:44Z

What type of PR is this?

bug

Which issue does this PR fix:

When concurrent stacker runs on the same system build containers with the same name, each run happens inside a mount namespace, and the path given as the roots dir is the same while the volume actually mounted there is different, both stackers can acquire the file lock at $rootdir/.lock. Each then starts a container named $name, and the two race to set up the lxc control socket. That socket is named after the container name and the rootfs path, both of which are identical here.

What does this PR do / Why do we need it:

The fix is to add some randomness to the lxc container name, which ensures that the socket won't clash. This should not affect other uses of the image name, which will still use the un-randomized name.
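As a rough illustration of the idea (not the code in this PR), the sketch below appends a random suffix to the name used for the lxc container while leaving the image name untouched. `uniqueContainerName` and `socketKey` are hypothetical helpers, and the `rootsDir/name/command` layout is only an assumed stand-in for "named after the container name and the rootfs path".

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// uniqueContainerName is a hypothetical helper, not stacker's actual code.
// It appends a random hex suffix to the image name so that two concurrent
// builds of the same image end up with different lxc container names.
func uniqueContainerName(imageName string) (string, error) {
	suffix := make([]byte, 4) // 4 random bytes become 8 hex characters
	if _, err := rand.Read(suffix); err != nil {
		return "", fmt.Errorf("generating random suffix: %w", err)
	}
	return imageName + "-" + hex.EncodeToString(suffix), nil
}

// socketKey is an assumed stand-in for the control socket name. The PR only
// says the name is derived from the container name and the rootfs path.
func socketKey(rootsDir, containerName string) string {
	return rootsDir + "/" + containerName + "/command"
}

func main() {
	const rootsDir, imageName = "/stacker/roots", "app"

	// Same path and same name: both runs compute the same socket key.
	fmt.Println("unrandomized keys clash:",
		socketKey(rootsDir, imageName) == socketKey(rootsDir, imageName))

	a, err := uniqueContainerName(imageName)
	if err != nil {
		panic(err)
	}
	b, err := uniqueContainerName(imageName)
	if err != nil {
		panic(err)
	}

	// The keys differ unless the two random suffixes happen to match.
	fmt.Println("run A socket:", socketKey(rootsDir, a))
	fmt.Println("run B socket:", socketKey(rootsDir, b))
	// The image name itself is never modified.
	fmt.Println("image name:", imageName)
}
```

Only the name handed to lxc changes in this sketch; anything keyed on the image name would keep using the original string.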

If an issue # is not available please add repro steps and logs showing the issue:

#645

Testing done on this change:

manual only, CI pending

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

mikemccracken · 2024-11-07T05:00:38Z

Draft because it might need work if it breaks CI. The test suite is not reliable on my system right now, so I am just using the GitHub runners.

When concurrent stacker runs on the same system build containers with the same name, each run happens inside a mount namespace, and the path given as the roots dir is the same while the volume actually mounted there is different, both stackers can acquire the file lock at $rootdir/.lock. Each then starts a container named $name, and the two race to set up the lxc control socket. That socket is named after the container name and the rootfs path, both of which are identical here.

The fix is to add some randomness to the lxc container name, which ensures that the socket won't clash. This should not affect other uses of the image name, which will still use the un-randomized name.

Signed-off-by: Michael McCracken <[email protected]>

mikemccracken · 2024-12-04T22:19:33Z

The CI failures are real: there is now a case where a build creates a new layer when it shouldn't. empty-layer.bats checks for this, and that is what's failing.

However, it isn't obvious to me what exactly is going wrong. I spent half a day looking at it before Thanksgiving and didn't reach a conclusion. Will revisit later.

Maybe the Right Way is to have lxc include the mount ns id in its socket naming code?
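For reference, here is a minimal sketch of how a process can read its own mount namespace id on Linux, via the `/proc/self/ns/mnt` symlink, whose target looks like `mnt:[4026531840]`. It is written in Go purely for illustration. `parseNsLink` and `mountNsID` are hypothetical names, and nothing here reflects how lxc names its socket today.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNsLink extracts the inode number from a mount namespace link target
// such as "mnt:[4026531840]". Hypothetical helper for illustration only.
func parseNsLink(target string) (uint64, error) {
	const prefix, suffix = "mnt:[", "]"
	if !strings.HasPrefix(target, prefix) || !strings.HasSuffix(target, suffix) {
		return 0, fmt.Errorf("unexpected namespace link %q", target)
	}
	return strconv.ParseUint(target[len(prefix):len(target)-len(suffix)], 10, 64)
}

// mountNsID reads the calling process's mount namespace id (Linux only).
func mountNsID() (uint64, error) {
	target, err := os.Readlink("/proc/self/ns/mnt")
	if err != nil {
		return 0, err
	}
	return parseNsLink(target)
}

func main() {
	id, err := mountNsID()
	if err != nil {
		fmt.Println("could not read mount namespace id:", err)
		return
	}
	// Two processes in different mount namespaces would see different ids,
	// so a socket name that included this value would not clash.
	fmt.Println("mount namespace id:", id)
}
```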

mikemccracken requested review from rchincha, smoser and hallyn as code owners November 7, 2024 04:59

mikemccracken marked this pull request as draft November 7, 2024 04:59

mikemccracken force-pushed the 2024.11.01/main/uniquify-container-names branch from 669c12a to 74db868 Compare November 7, 2024 19:11
