Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retries to autoboot in ipxe script: #87

Closed
wants to merge 3 commits into from

Conversation

jacobweinstock
Copy link
Member

Description

This gives the autoboot (in the embedded script) a chance to overcome transient network issues or similiar.

Why is this needed

Fixes: #

How Has This Been Tested?

How are existing users impacted? What migration steps/scripts do we need?

Checklist:

I have:

  • updated the documentation and/or roadmap (if required)
  • added unit or e2e tests
  • provided instructions on how to upgrade

@codecov
Copy link

codecov bot commented Jul 18, 2023

Codecov Report

Merging #87 (ded010a) into main (6eefab8) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main      #87   +/-   ##
=======================================
  Coverage   97.25%   97.25%           
=======================================
  Files           5        5           
  Lines         437      437           
=======================================
  Hits          425      425           
  Misses          8        8           
  Partials        4        4           

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

chrisdoherty4
chrisdoherty4 previously approved these changes Jul 24, 2023
This gives the autoboot a chance to overcome
transient network issues or similiar.

Signed-off-by: Jacob Weinstock <[email protected]>
@jacobweinstock jacobweinstock added the do-not-merge Mergify: Block Merging label Aug 2, 2023
@jacobweinstock jacobweinstock removed the do-not-merge Mergify: Block Merging label Sep 11, 2023
@jacobweinstock
Copy link
Member Author

jacobweinstock commented Sep 11, 2023

Pending a basic test from @nshalman , thanks Nahum!

@nshalman
Copy link
Member

It's actually not behaving as expected for me. I'll see if I can figure out why.

autoboot
set attempt:int32 0
:retry
iseq ${attempt} 5 && error-with-autoboot
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be iseq ${attempt} 5 && goto error-with-autoboot

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, yes! good catch. will update

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Member

@nshalman nshalman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't work when tested as written. comment with a possible fix.

@nshalman
Copy link
Member

The more I think about this, the less I like the automated retries, but I do like the offer of a shell after a failure.
In the situation where a machine ought to be booting from disk, but is accidentally booting from the network, 5 retries could burn a lot of boot time. An offer of a shell to an operator watching after failure on the other hand might be hugely helpful for debugging while being less intrusive.

Also, I've been unable to get the retry logic to work. :(

@jacobweinstock
Copy link
Member Author

The more I think about this, the less I like the automated retries, but I do like the offer of a shell after a failure. In the situation where a machine ought to be booting from disk, but is accidentally booting from the network, 5 retries could burn a lot of boot time. An offer of a shell to an operator watching after failure on the other hand might be hugely helpful for debugging while being less intrusive.

Also, I've been unable to get the retry logic to work. :(

Agreed, i think this is part of the reason i had this on hold for a while. the trade-offs of retrying weren't sitting right with me.

@nshalman
Copy link
Member

Agreed, i think this is part of the reason i had this on hold for a while. the trade-offs of retrying weren't sitting right with me.

I'm glad you never merged it as-is.

@nshalman
Copy link
Member

I did test the following which did appear to behave as I would expect:

diff --git a/binary/script/embed.ipxe b/binary/script/embed.ipxe
index b2c582b..5c85080 100644
--- a/binary/script/embed.ipxe
+++ b/binary/script/embed.ipxe
@@ -25,7 +25,9 @@ echo Booting from net${idx}...
 autoboot net${idx}

 :autoboot
-autoboot
+autoboot ||
+prompt --key 0x02 --timeout 2000 autoboot failed. Press Ctrl-B for the iPXE command line... && shell
+exit

 :boot-with-vlan
 set idx:int32 0

So I think I'm a 👎 on both the idea and the implementation of the retry logic, but I would be open to the brief offer of a shell after failure.

@jacobweinstock jacobweinstock added the do-not-merge Mergify: Block Merging label Sep 11, 2023
@jacobweinstock
Copy link
Member Author

closing, please see conversation above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge Mergify: Block Merging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants