-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sled-agent] add preliminary "write boot disk OS" http endpoints #4633
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
// A previous write is done, but we're | ||
// immedately starting a new one, so discard the | ||
// previous result. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha I had a note here asking if it's worth documenting whether we want an explicit clear step, but you have it right below!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, in hindsight since we did this for both the SP and wicketd, it seems reasonable to do it here too. Changed in 8a65c20.
} | ||
})?; | ||
|
||
let mut buf = vec![0; disk_block_size]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(General note, mostly for myself, that this can't be Vec::with_capacity(disk_block_size)
, because read_exact
takes &mut [u8]
which will be empty in the with_capacity
case.)
/// Current progress of an OS image being written to disk. | ||
#[derive( | ||
Debug, Clone, Copy, PartialEq, Eq, Deserialize, JsonSchema, Serialize, | ||
)] | ||
#[serde(tag = "state", rename_all = "snake_case")] | ||
pub enum BootDiskOsWriteProgress { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering whether the update engine might be a good fit here, it seems like this is at the same state of evolution we were with the installinator, before which we ended up writing the update engine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, I didn't even think of that, but you're definitely right. It kinda feels like overkill since there's no consumer for the spec, etc., I think? I don't know off the top of my head which would be more approachable; I'm inclined to leave this as-is but add a comment that if we come back to this and need it to do more we could consider replacing the hand rolled state machine - seem reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that sounds fine.
Tested on madrid (and demo'd today); only ran into one small problem which I fixed in 865d281: sled-agent was hardcoding a 1 MiB request limit, so that commit moves the hard-coding to sled-agent's Testing notes from madrid are below. After setting up madrid:
BRM44220001 # shasum zfs.img
16be0b2ae1bf30e3b330b5ebb9e8140544db5df6 zfs.img
# A
BRM44220001 # dd status=none if=/dev/rdsk/c1t00A0750132752F12d0s0 bs=4096 count=204801 | shasum -
16be0b2ae1bf30e3b330b5ebb9e8140544db5df6 -
# B
BRM44220001 # dd status=none if=/dev/rdsk/c7t00A0750132748127d0s0 bs=4096 count=204801 | shasum -
7dff835fbfd5d5a92fa7a428637021baeeec4b4f - Confirmed the GET endpoint for both slots shows no update: BRM44220001 # curl http://[fd00:1122:3344:102::1]:12345/boot-disk/A/os/write/status
{"status":"no_update_started"}
BRM44220001 # curl http://[fd00:1122:3344:102::1]:12345/boot-disk/B/os/write/status
{"status":"no_update_started"} Start an update to slot B: BRM44220001 # curl -X POST -T zfs.img 'http://[fd00:1122:3344:102::1]:12345/boot-disk/B/os/write?update_id=5ef7e0d5-9892-4806-a767-1df71d9d83bb&sha3_256_digest=6d01294beba485946d512afe9cbb82a657bf0c72eb884f22357dea902e50b62e' Polling status every second while it runs: {"status":"no_update_started"}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"receiving_uploaded_image","bytes_received":98821728}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"receiving_uploaded_image","bytes_received":388933376}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"receiving_uploaded_image","bytes_received":680485856}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"writing_image_to_disk","bytes_written":37359616}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"writing_image_to_disk","bytes_written":121917440}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"writing_image_to_disk","bytes_written":206319616}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"writing_image_to_disk","bytes_written":289697792}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"writing_image_to_disk","bytes_written":374255616}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"writing_image_to_disk","bytes_written":458887168}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"writing_image_to_disk","bytes_written":543580160}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"writing_image_to_disk","bytes_written":628396032}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"writing_image_to_disk","bytes_written":712097792}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"writing_image_to_disk","bytes_written":796778496}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"validating_written_image","bytes_read":24424448}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"validating_written_image","bytes_read":76963840}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"validating_written_image","bytes_read":129142784}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"validating_written_image","bytes_read":226209792}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"validating_written_image","bytes_read":302428160}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"validating_written_image","bytes_read":369467392}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"validating_written_image","bytes_read":440737792}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"validating_written_image","bytes_read":506818560}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"validating_written_image","bytes_read":576548864}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"validating_written_image","bytes_read":628310016}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"validating_written_image","bytes_read":680747008}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"validating_written_image","bytes_read":733007872}}
{"status":"in_progress","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb","progress":{"state":"validating_written_image","bytes_read":785166336}}
{"status":"complete","update_id":"5ef7e0d5-9892-4806-a767-1df71d9d83bb"} Confirming the disk has the BRM44220001 # dd if=/dev/rdsk/c7t00A0750132748127d0s0 bs=4096 count=204801 | shasum -
16be0b2ae1bf30e3b330b5ebb9e8140544db5df6 - Also manually tested several of the edge cases covered by the unit tests:
|
This PR adds two new endpoints to
sled-agent
:POST /boot-disk/{boot_disk}/os/write
to start a new update to one of the two boot disksGET /boot-disk/{boot_disk}/os/write/status
to get the status of the most-recently-started update to the specified boot diskThe actual drive-writing-machinery is extracted from
installinator
intoinstallinator-common
, which is now a new dependency ofsled-agent
. The bulk of the changes in this PR center around being able to run that drive-write from within the context of the pair of dropshot endpoints above, plus a bit of glue for unit testing with "in-memory disks".TODO before merging: