CASMHMS-6233: Paradise: Adapted SMD to BMC fw changing the behavior of the /Power endpoint read over redfish #157

Merged on Jul 18, 2024 (3 commits)

@jwlv jwlv commented Jul 15, 2024

Summary and Scope

In BMC fw 1.17, Foxconn changed the behavior of reading the /Power endpoint in the ProcessorModule_0 chassis over Redfish. In previous versions, the read simply failed and returned no data when the data was not yet available (i.e., when node power was off, or shortly after node power was turned on). SMD was coded to retry until the read succeeded, up to four times. When the read did finally succeed, the data returned was always correct, because the BMC fw did not allow it to be read until it was available. With BMC fw 1.17, the /Power endpoint no longer returns an error when the data is not available (again, when node power is off or shortly after it is turned on). Instead, the BMC fw returns zero or nil for any data that is not yet available. This means that the mini-discover that runs after a node power on event is received can complete its read without obtaining the data needed for power capping.
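To illustrate the new failure mode, here is a minimal sketch of how a caller might decide that a successful /Power read still carries no usable data. The type and function names (`powerControl`, `powerPayload`, `hasPowerCapData`) are hypothetical and are not SMD's actual code, and checking `PowerCapacityWatts` is an assumption about which field matters; the real check in pkg/redfish/rfcomponents.go may look at different fields.

```go
package main

import "encoding/json"

// powerControl is a hypothetical, trimmed-down view of one PowerControl entry
// in a Redfish /Power payload. A pointer is used so that an absent or null
// value (nil) can be told apart from an explicit 0.
type powerControl struct {
	PowerCapacityWatts *int `json:"PowerCapacityWatts"`
}

// powerPayload is a hypothetical wrapper for the part of /Power used here.
type powerPayload struct {
	PowerControl []powerControl `json:"PowerControl"`
}

// hasPowerCapData reports whether body looks like it carries usable power
// capping data. It returns false if the body cannot be parsed, if there are
// no PowerControl entries, or if any entry's capacity is nil or zero.
// Assumption: a zero or nil capacity means "BMC data not available yet".
func hasPowerCapData(body []byte) bool {
	var p powerPayload
	if err := json.Unmarshal(body, &p); err != nil {
		return false
	}
	if len(p.PowerControl) == 0 {
		return false
	}
	for _, pc := range p.PowerControl {
		if pc.PowerCapacityWatts == nil || *pc.PowerCapacityWatts == 0 {
			return false
		}
	}
	return true
}
```

With the older firmware behavior, a check like this would not have been needed, since a read that succeeded at all implied valid data.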

The change in this PR inspects the data returned by the read and, if it is zero or nil, retries the read up to 4 times. It sleeps for 10, 20, 40, and 80 seconds, respectively, before each successive retry. These values were determined empirically, balancing total sleep time against ensuring that the data is eventually read. Another goto statement was used to avoid significant refactoring of the code.

Many code comments were also updated to reflect the new behavior of this Redfish endpoint.

Issues and Related PRs

Testing

Tested on:

  • tyr

Test description:

Test cases:

  • Manual discover against Paradise BMC with node power on
  • Manual discover against Paradise BMC with node power off
  • Manual discover against non-Paradise BMC with node power on
  • Many auto mini-discovers of Paradise BMC after receiving node power on event
  • Verified that all smoke and functional smd tests still pass by executing run_hms_ct_tests.sh

Test checklist:

  • Were the install/upgrade-based validation checks/tests run (goss tests/install-validation doc)? Y
  • Were continuous integration tests run? If not, why? Y
  • Was upgrade tested? If not, why? Y
  • Was downgrade tested? If not, why? Y

Pull Request Checklist

  • Version number(s) incremented, if applicable
  • Copyrights updated
  • License file intact
  • Target branch correct
  • CHANGELOG.md updated
  • Testing is appropriate and complete, if applicable

@jwlv jwlv requested review from a team as code owners July 15, 2024 16:22

github-actions bot commented Jul 15, 2024

👋 Hey! Here is the image we built for you (Artifactory Link):

artifactory.algol60.net/csm-docker/unstable/cray-smd:2.27.0-20240716181358.d6ee9db

Use podman or docker to pull it down and inspect locally:

podman pull artifactory.algol60.net/csm-docker/unstable/cray-smd:2.27.0-20240716181358.d6ee9db

Or, use this script to pull the image from the build server to a dev system:

Dev System Pull Script

Note the following script only applies to systems running CSM 1.2 or later.

#!/usr/bin/env bash

IMAGE=artifactory.algol60.net/csm-docker/unstable/cray-smd:2.27.0-20240716181358.d6ee9db

podman run --rm --network host  \
    quay.io/skopeo/stable copy \
    --src-tls-verify=false \
    --dest-tls-verify=false \
    --dest-username "$(kubectl -n nexus get secret nexus-admin-credential -o json | jq -r '.data.username | @base64d')" \
    --dest-password "$(kubectl -n nexus get secret nexus-admin-credential -o json | jq -r '.data.password | @base64d')" \
    docker://$IMAGE \
    docker://registry.local/$IMAGE

Snyk Report

Coming soon

Software Bill of Materials
cosign download sbom artifactory.algol60.net/csm-docker/unstable/cray-smd:2.27.0-20240716181358.d6ee9db > container_image.spdx

If you don't have cosign, then you can get it here.

Note: this SHA is the merge of f27e855 and the PR base branch. Good luck and make rocket go now! 🌮 🚀


github-actions bot commented Jul 16, 2024

👋 Hey! Here is the image we built for you (Artifactory Link):

artifactory.algol60.net/csm-docker/unstable/cray-smd-hmth-test:2.27.0-20240717154709.d6ee9db

Use podman or docker to pull it down and inspect locally:

podman pull artifactory.algol60.net/csm-docker/unstable/cray-smd-hmth-test:2.27.0-20240717154709.d6ee9db

Or, use this script to pull the image from the build server to a dev system:

Dev System Pull Script

Note the following script only applies to systems running CSM 1.2 or later.

#!/usr/bin/env bash

IMAGE=artifactory.algol60.net/csm-docker/unstable/cray-smd-hmth-test:2.27.0-20240717154709.d6ee9db

podman run --rm --network host  \
    quay.io/skopeo/stable copy \
    --src-tls-verify=false \
    --dest-tls-verify=false \
    --dest-username "$(kubectl -n nexus get secret nexus-admin-credential -o json | jq -r '.data.username | @base64d')" \
    --dest-password "$(kubectl -n nexus get secret nexus-admin-credential -o json | jq -r '.data.password | @base64d')" \
    docker://$IMAGE \
    docker://registry.local/$IMAGE

Snyk Report

Coming soon

Software Bill of Materials
cosign download sbom artifactory.algol60.net/csm-docker/unstable/cray-smd-hmth-test:2.27.0-20240717154709.d6ee9db > container_image.spdx

If you don't have cosign, then you can get it here.

Note: this SHA is the merge of f27e855 and the PR base branch. Good luck and make rocket go now! 🌮 🚀

@jwlv jwlv merged commit 17f8b52 into master Jul 18, 2024
15 checks passed
@jwlv jwlv deleted the CASMHMS-6233.1.6 branch July 18, 2024 16:57