Shell provisioner random script failures #12908

Open
Stromweld opened this issue Apr 8, 2024 · 7 comments

Comments

@Stromweld

Stromweld commented Apr 8, 2024

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Overview of the Issue

When using the shell provisioner with an array of scripts, I randomly get an error when Packer tries to run one of them.

Reproduction Steps

Run the Bento builds from https://github.com/chef/bento:

git clone https://github.com/chef/bento
cd bento
packer init -upgrade ./packer_templates
packer build -only=virtualbox-iso.vm -var-file=os_pkrvars/fedora/fedora-39-x86_64.pkrvars.hcl ./packer_templates
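
Because the failure is intermittent, one build may not trigger it. The following is a minimal sketch of a retry loop that re-runs a command until it fails; the helper name `run_until_failure` and the attempt count are hypothetical and not part of Bento or Packer.

```shell
#!/bin/sh
# Re-run a command until it fails or the attempt budget is exhausted.
# Usage: run_until_failure MAX_ATTEMPTS COMMAND [ARGS...]
# (hypothetical helper, not part of Packer or Bento)
run_until_failure() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        echo "attempt $attempt of $max_attempts"
        if ! "$@"; then
            echo "command failed on attempt $attempt"
            return 1
        fi
        attempt=$((attempt + 1))
    done
    echo "no failure in $max_attempts attempts"
    return 0
}

# Example invocation (assumes packer is installed and the bento checkout
# is the current directory):
#   export PACKER_LOG=1
#   run_until_failure 10 packer build -on-error=abort -only=virtualbox-iso.vm \
#       -var-file=os_pkrvars/fedora/fedora-39-x86_64.pkrvars.hcl ./packer_templates
```

The loop stops at the first non-zero exit, so the logs and any leftover VM belong to the failing run.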

Packer version

v1.10.2

Operating system and Environment details

Ubuntu 22.04 GitHub Actions runner. https://github.com/chef/bento/actions/runs/8601528699/job/23569023182

Log Fragments and crash.log files

2024-04-08T14:15:51Z: ==> virtualbox-ovf.vm: Provisioning with shell script: ../../packer_templates/scripts/_common/motd.sh
2024-04-08T14:16:28Z: ==> virtualbox-ovf.vm: sh: /tmp/script_2235.sh: No such file or directory
2024-04-08T14:16:28Z: ==> virtualbox-ovf.vm: Provisioning step had errors: Running the cleanup provisioner, if present...
2024-04-08T14:16:28Z: ==> virtualbox-ovf.vm: Cleaning up floppy disk...
2024-04-08T14:16:28Z: ==> virtualbox-ovf.vm: Deregistering and deleting imported VM...
2024-04-08T14:16:29Z: ==> virtualbox-ovf.vm: Deleting output directory...
2024-04-08T14:16:29Z: Build 'virtualbox-ovf.vm' errored after 1 minute 33 seconds: Script exited with non-zero exit status: 1. Allowed exit codes are: [0]

@Stromweld Stromweld added the bug label Apr 8, 2024
@lbajolet-hashicorp
Contributor

Hi @Stromweld,

Thanks for the bug report! This does look like a Packer problem. With the shell provisioner, we create temporary scripts that we copy to the target's /tmp directory before executing them through ssh.

If you're able to reliably reproduce this error, could you run the build with --debug or something like --on-error=ask (or abort, whichever you prefer)? That way you will be able to ssh into the VM.
I'm interested in knowing whether the shell script was actually copied, whether it was truncated for whatever reason, and whether it ended up in the right place.
It might also be worth looking at the mounts at the same time; I can't rule out /tmp being shadowed by another mount.
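
As a rough sketch of those checks, something like the following could be run inside the VM once you have a shell on it. The function name `check_uploaded_script` is hypothetical, and the path in the usage comment is simply the one from the log above.

```shell
#!/bin/sh
# Report whether an uploaded script exists, how large it is, and which
# filesystem backs its directory. Intended to be run inside the guest.
# (hypothetical helper, not part of Packer)
check_uploaded_script() {
    script=$1
    dir=$(dirname "$script")
    if [ -f "$script" ]; then
        size=$(wc -c < "$script" | tr -d ' ')
        echo "present: $script ($size bytes)"
    else
        echo "missing: $script"
    fi
    # df -P prints the filesystem and mount point containing the directory,
    # which helps spot /tmp being shadowed by another mount.
    df -P "$dir" | awk 'NR == 2 { print "mount: " $1 " on " $6 }'
    [ -f "$script" ]
}

# Example invocation inside the VM:
#   check_uploaded_script /tmp/script_2235.sh
```

Comparing the reported size with the size of the script on the host is one way to spot truncation.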

Let me know what you find, and if you need a hand!

@Stromweld
Author

It seems to happen most often on my Bento Fedora builds when they get to the build-tools_fedora.sh script.
I updated the reproduction steps accordingly. I'll give that a try and see if I can find more information.

@Stromweld
Author

I've also seen some random behavior where a script is executed and its contents are printed in red, but the stream doesn't show any further output and the commands don't seem to actually run. This happens most often on the Fedora 14 build when it gets to the vagrant script that installs the vagrant user's insecure key: the wget command doesn't show any output, and Vagrant isn't able to SSH into the resulting box.

@Stromweld
Author

@lbajolet-hashicorp I was able to replicate it with PACKER_LOG=1 set to get debug output: https://github.com/chef/bento/actions/runs/8649779119/job/23716839452

@lbajolet-hashicorp
Contributor

Hey @Stromweld,

Looking back at this now: thanks for replicating the problem and sharing the logs, but on their own they won't help us troubleshoot it. The logs are verbose, but not detailed enough to show the root cause, especially if this occurs randomly.

Would you be able to produce a minimal template that we can run locally on a hypervisor? It can be extracted from bento, no problem with that, but ideally I'd like something that doesn't have too many dependencies/local files so we can iterate efficiently on this.

I have a hunch that our scp failed to write the file in place, but it's hard to say exactly what the problem is without a live VM to debug on; with one, we could connect to it and look at the filesystem after each step.
The chmod command seems to fail though, and I'm surprised this doesn't end the process at that point. The exit code is non-zero, but I'm not sure it carries any specific meaning other than "it failed".
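
One general shell behavior that can let a failing chmod go unnoticed, offered only as an illustration and not as a statement about how Packer chains its commands here: commands joined with `;` run regardless of earlier failures, while `&&` stops at the first failure. The function names and the path below are hypothetical.

```shell
#!/bin/sh
# Hypothetical path that does not exist, standing in for a script that was
# never uploaded.
missing=/tmp/hypothetical_missing_script_$$.sh

# ';' ignores chmod's failure: the following command still runs.
run_with_semicolon() {
    chmod +x "$1" 2>/dev/null; echo "next step ran"
}

# '&&' short-circuits: the following command is skipped when chmod fails,
# and the function returns chmod's non-zero status.
run_with_and() {
    chmod +x "$1" 2>/dev/null && echo "next step ran"
}

# Example:
#   run_with_semicolon "$missing"   # prints "next step ran"
#   run_with_and "$missing"         # prints nothing, returns non-zero
```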

All in all, I need your help here. If you are able to reliably replicate this problem and share some configuration we can run to troubleshoot it, we can look into it. Otherwise it will be exceedingly hard for us to investigate, and we cannot prioritise this in its current state.

Thanks in advance, and apologies I didn't come back to you earlier!

@Stromweld
Author

I can try; whether and when it fails appears to be random. The one that seems to fail most often is for Parallels when installing the guest tools: if dependencies are installed first via apt, dnf, etc., it seems to do that and then skip the rest of the script. In this case the build succeeded but testing fails because the prl_fs driver isn't available. I'll try to put something together tomorrow.

@lbajolet-hashicorp
Contributor

This would be awesome, many thanks @Stromweld !
