CASMHMS-6233: Paradise: Adapted SMD to BMC fw changing the behavior of the /Power endpoint read over redfish #157
👋 Hey! Here is the image we built for you (Artifactory Link):
artifactory.algol60.net/csm-docker/unstable/cray-smd:2.27.0-20240716181358.d6ee9db
Use podman or docker to pull it down and inspect locally:
podman pull artifactory.algol60.net/csm-docker/unstable/cray-smd:2.27.0-20240716181358.d6ee9db
Or, use this script to pull the image from the build server to a dev system:
Dev System Pull Script
#!/usr/bin/env bash
IMAGE=artifactory.algol60.net/csm-docker/unstable/cray-smd:2.27.0-20240716181358.d6ee9db
podman run --rm --network host \
quay.io/skopeo/stable copy \
--src-tls-verify=false \
--dest-tls-verify=false \
--dest-username "$(kubectl -n nexus get secret nexus-admin-credential -o json | jq -r '.data.username | @base64d')" \
--dest-password "$(kubectl -n nexus get secret nexus-admin-credential -o json | jq -r '.data.password | @base64d')" \
docker://$IMAGE \
docker://registry.local/$IMAGE
Snyk Report: Coming soon
Software Bill of Materials:
cosign download sbom artifactory.algol60.net/csm-docker/unstable/cray-smd:2.27.0-20240716181358.d6ee9db > container_image.spdx
If you don't have cosign, then you can get it here. Note: this SHA is the merge of f27e855 and the PR base branch. Good luck and make rocket go now! 🌮 🚀
👋 Hey! Here is the image we built for you (Artifactory Link):
artifactory.algol60.net/csm-docker/unstable/cray-smd-hmth-test:2.27.0-20240717154709.d6ee9db
Use podman or docker to pull it down and inspect locally:
podman pull artifactory.algol60.net/csm-docker/unstable/cray-smd-hmth-test:2.27.0-20240717154709.d6ee9db
Or, use this script to pull the image from the build server to a dev system:
Dev System Pull Script
#!/usr/bin/env bash
IMAGE=artifactory.algol60.net/csm-docker/unstable/cray-smd-hmth-test:2.27.0-20240717154709.d6ee9db
podman run --rm --network host \
quay.io/skopeo/stable copy \
--src-tls-verify=false \
--dest-tls-verify=false \
--dest-username "$(kubectl -n nexus get secret nexus-admin-credential -o json | jq -r '.data.username | @base64d')" \
--dest-password "$(kubectl -n nexus get secret nexus-admin-credential -o json | jq -r '.data.password | @base64d')" \
docker://$IMAGE \
docker://registry.local/$IMAGE
Snyk Report: Coming soon
Software Bill of Materials:
cosign download sbom artifactory.algol60.net/csm-docker/unstable/cray-smd-hmth-test:2.27.0-20240717154709.d6ee9db > container_image.spdx
If you don't have cosign, then you can get it here. Note: this SHA is the merge of f27e855 and the PR base branch. Good luck and make rocket go now! 🌮 🚀
Summary and Scope
In BMC fw 1.17, Foxconn changed the behavior of reads of the /Power endpoint in the ProcessorModule_0 chassis over Redfish. In previous versions, the read failed and returned no data when the data was not yet available (i.e., when node power was off, or shortly after node power was turned on). SMD was coded to retry until the read succeeded, up to four times. When the read finally succeeded, the data returned was always correct, because the BMC fw did not allow it to be read until it was available.
With BMC fw 1.17, the /Power endpoint no longer returns an error when the data is not available (again, when node power is off or shortly after it is turned on). If the data is not yet available on the BMC, the BMC fw simply returns zero or nil for it. As a result, the mini-discover that runs after a node power-on event will not read the data needed for power capping: the read succeeds with zero or nil values, so the existing retry-on-failure logic never triggers.
The change in this PR inspects the data returned by the read and retries the read up to four times if it is zero or nil. It sleeps for 10, 20, 40, and 80 seconds before the first, second, third, and fourth retry, respectively. These values were determined empirically, balancing total sleep time against ensuring that the data is eventually read. Another goto statement was used to avoid significant refactoring of the code.
Many code comments were also updated to reflect the new behavior of this Redfish endpoint.
Issues and Related PRs
Testing
Tested on:
Test description:
Test cases:
run_hms_ct_tests.sh
Test checklist:
Pull Request Checklist