This page walks the user through setting up the Cray LiveCD in order to install Cray System Management (CSM).
- Boot installation environment
- Download and extract the CSM tarball
- Create system configuration
- Import the CSM Tarball
- Validate the LiveCD
- Next topic
This section walks the user through booting and connecting to the LiveCD.
Before proceeding, the user must obtain the CSM tarball containing the LiveCD.
NOTE: Each step denotes where its commands must run: external# refers to a server that is not the Cray, whereas pit# refers to the LiveCD itself.
Any steps run on an external server require that server to have the following tools:
- ipmitool
- ssh
- tar
NOTE: The CSM tarball will be fetched from the external server in the Download and extract the CSM tarball step using curl or scp. If a web server is not installed, then scp is the backup option.
- (external#) Download the CSM software release from the public Artifactory instance.
  NOTES:
  - The curl option -C - is used to allow partial downloads. These tarballs are large; in the event of a connection disruption, the same curl command can be used to continue the disrupted download.
  - If air-gapped or behind a strict firewall, then the tarball must be obtained from the medium delivered by Cray-HPE. For these cases, copy or download the tarball to the working directory and then proceed to the next step. The tarball will need to be fetched with scp during the Download and extract the CSM tarball step.
- (external#) Set the CSM release version.
  Example release versions:
  - An alpha build: CSM_RELEASE=1.6.0-alpha.99
  - A release candidate: CSM_RELEASE=1.6.0-rc.1
  - A stable release: CSM_RELEASE=1.6.0
      CSM_RELEASE=<value>
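A mistyped release version only shows up later as a failed download, so it can help to check the value first. The sketch below uses a hypothetical helper, is_csm_release, which is not part of CSM. It assumes that only the three forms shown above (alpha, release candidate, and stable) are valid; other forms may exist.

```shell
# Hypothetical helper, not part of CSM: succeeds if the argument looks like
# one of the three example forms (X.Y.Z, X.Y.Z-alpha.N, or X.Y.Z-rc.N).
# Assumption: no other release version forms need to be accepted.
is_csm_release() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+(-(alpha|rc)\.[0-9]+)?$'
}

# Example use:
#   is_csm_release "${CSM_RELEASE}" || echo "Unexpected CSM_RELEASE: ${CSM_RELEASE}" >&2
```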
- (external#) Download the CSM tarball.
  NOTE: CSM does NOT support the use of proxy servers for anything other than downloading artifacts from external endpoints. Using http_proxy or https_proxy in any way other than the following examples will cause many failures in subsequent steps.
  - Without proxy:
      curl -C - -f -O "https://release.algol60.net/$(awk -F. '{print "csm-"$1"."$2}' <<< ${CSM_RELEASE})/csm/csm-${CSM_RELEASE}.tar.gz"
  - With HTTPS proxy:
      https_proxy=https://example.proxy.net:443 curl -C - -f -O "https://release.algol60.net/$(awk -F. '{print "csm-"$1"."$2}' <<< ${CSM_RELEASE})/csm/csm-${CSM_RELEASE}.tar.gz"
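The download URL in the curl commands above is built from CSM_RELEASE: awk keeps the first two dot-separated fields to name the release series directory, and the full version names the tarball. The sketch below isolates that derivation in a hypothetical helper, csm_release_url, which is not part of CSM. It can be used to print and inspect the URL before downloading.

```shell
# Hypothetical helper, not part of CSM: print the download URL for a release.
# The body runs in a subshell so its variables do not leak into the caller.
csm_release_url() (
    release="$1"
    # Keep the first two dot-separated fields: 1.6.0-rc.1 -> csm-1.6
    series="$(printf '%s\n' "${release}" | awk -F. '{print "csm-"$1"."$2}')"
    printf 'https://release.algol60.net/%s/csm/csm-%s.tar.gz\n' "${series}" "${release}"
)

# Example use:
#   curl -C - -f -O "$(csm_release_url "${CSM_RELEASE}")"
```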
- (external#) Extract the LiveCD from the tarball.
    OUT_DIR="$(pwd)/csm-temp"
    mkdir -pv "${OUT_DIR}"
    tar -C "${OUT_DIR}" --wildcards --no-anchored --transform='s/.*\///' -xzvf "csm-${CSM_RELEASE}.tar.gz" 'pre-install-toolkit-*.iso'
- (external#) Start a typescript and set the PS1 variable to record timestamps.
  NOTE: Typescripts help triage if problems are encountered.
    script -a "boot.livecd.$(date +%Y-%m-%d).txt"
    export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
- (external#) Follow one of the procedures below based on the vendor for the ncn-m001 node:
  - HPE iLO BMCs
    Prepare a server on the network to host the pre-install-toolkit ISO file, if the current server is insufficient. Then follow the HPE iLO BMCs procedure to boot the RemoteISO before returning here.
  - Gigabyte BMCs and Intel BMCs
    Create a USB stick using the following procedure.
    - (external#) Get cray-site-init from the tarball.
        OUT_DIR="$(pwd)/csm-temp"
        mkdir -pv "${OUT_DIR}"
        tar -C "${OUT_DIR}" --wildcards --no-anchored --transform='s/.*\///' -xzvf "csm-${CSM_RELEASE}.tar.gz" 'cray-site-init-*.rpm'
    - (external#) Install the write-livecd.sh script:
      - RPM-based systems:
          rpm -Uvh --force "${OUT_DIR}"/cray-site-init*.rpm
      - Non-RPM-based systems (requires bsdtar):
          bsdtar xvf "${OUT_DIR}"/cray-site-init-*.rpm --include '*write-livecd.sh' -C "${OUT_DIR}"
          mv -v "${OUT_DIR}"/usr/local/bin/write-livecd.sh "${OUT_DIR}"
          rmdir -pv "${OUT_DIR}/usr/local/bin/"
      - Non-RPM-based systems (requires rpm2cpio):
          rpm2cpio "${OUT_DIR}"/cray-site-init-*.rpm | cpio -idmv
          mv -v ./usr/local/bin/write-livecd.sh "${OUT_DIR}"
          rm -vrf ./usr
    - Follow Bootstrap a LiveCD USB and then return here.
On first login, the LiveCD will prompt the administrator to change the password.
- (pit#) Log in.
  NOTE: The initial password is empty.
  At the login prompt, enter root as the username. Because the initial password is blank, press Enter at each of the first two password prompts. The LiveCD will force a new password to be set.
    Password:            <------- just press Enter here for a blank password
    You are required to change your password immediately (administrator enforced)
    Changing password for root.
    Current password:    <------- press Enter here, again, for a blank password
    New password:        <------- type new password
    Retype new password: <------- retype new password
    Welcome to the CRAY Pre-Install Toolkit (LiveOS)
- (pit#) Configure the site-link (lan0), DNS, and gateway IP addresses.
  NOTE: The site_ip, site_gw, and site_dns values must come from the local network administration or authority.
  - Set the site_ip variable.
    Set the site_ip value in CIDR format (A.B.C.D/N):
        site_ip=<IP CIDR>
  - Set the site_gw and site_dns variables.
    Set the site_gw and site_dns values in IPv4 dotted decimal format (A.B.C.D):
        site_gw=<Gateway IP address>
        site_dns=<DNS IP address>
  - Set the site_nics variable.
    The site_nics value or values are found while the user is in the LiveCD (for example, site_nics='p2p1 p2p2 p2p3' or site_nics=em1).
        site_nics='<site NIC or NICs>'
  - Set the SYSTEM_NAME variable.
    SYSTEM_NAME is the name of the system. This will only be used for the PIT hostname. This variable is capitalized because it will be used in a subsequent section.
        SYSTEM_NAME=<system name>
  - Run the csi-setup-lan0.sh script to set up the site link and set the hostname.
    NOTES:
    - All of the /root/bin/csi-* scripts can be run without parameters to display usage statements.
    - The hostname is auto-resolved based on reverse DNS.
        /root/bin/csi-setup-lan0.sh "${SYSTEM_NAME}" "${site_ip}" "${site_gw}" "${site_dns}" "${site_nics}"
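Because site_ip must be in CIDR format while site_gw and site_dns must be plain dotted decimal addresses, swapping the two formats is an easy mistake. The sketch below shows one way to check the values before running csi-setup-lan0.sh. The helpers is_ipv4 and is_cidr are hypothetical and are not part of the LiveCD tooling. They check only the syntax of the values, not whether the addresses are correct for the site.

```shell
# Hypothetical helpers, not part of the LiveCD tooling. Each function body runs
# in a subshell, so "exit" only ends the function and no variables leak out.

# Succeeds if $1 is an IPv4 address in dotted decimal format (A.B.C.D).
is_ipv4() (
    case "$1" in
        ''|*[!0-9.]*|.*|*.|*..*) exit 1 ;;   # empty, bad characters, or empty field
    esac
    IFS=.
    set -- $1                                # split the address on dots
    [ "$#" -eq 4 ] || exit 1
    for octet in "$@"; do
        [ "${#octet}" -le 3 ] && [ "${octet}" -le 255 ] || exit 1
    done
)

# Succeeds if $1 is in CIDR format (A.B.C.D/N) with N between 0 and 32.
is_cidr() (
    case "$1" in
        */*) ;;
        *) exit 1 ;;
    esac
    addr="${1%/*}"
    prefix="${1##*/}"
    case "${prefix}" in
        ''|*[!0-9]*) exit 1 ;;
    esac
    [ "${#prefix}" -le 2 ] && [ "${prefix}" -le 32 ] || exit 1
    is_ipv4 "${addr}"
)

# Example use:
#   is_cidr "${site_ip}" && is_ipv4 "${site_gw}" && is_ipv4 "${site_dns}" \
#       || echo "Check the site_ip, site_gw, and site_dns values" >&2
```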
- (pit#) Verify that the assigned IP address was successfully applied to lan0.
    wicked ifstatus --verbose lan0
  NOTE: The output from the above command must say leases: ipv4 static granted. If the IPv4 address was not granted, then go back and recheck the variable values. The output will indicate that the IP address failed to assign, which can happen if the given IP address is already taken on the connected network.
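The check described in the note can also be scripted so that the result is recorded in the typescript. The sketch below uses a hypothetical helper, lease_granted, which is not part of the LiveCD. It assumes that the word leases: and the text ipv4 static granted appear on the same line, separated only by whitespace; adjust the pattern if the actual output differs.

```shell
# Hypothetical helper, not part of the LiveCD: reads "wicked ifstatus" output on
# standard input and succeeds only if it reports a granted static IPv4 lease.
# Assumption: "leases:" and "ipv4 static granted" are separated by whitespace.
lease_granted() {
    grep -Eq 'leases:[[:space:]]+ipv4 static granted'
}

# Example use:
#   wicked ifstatus --verbose lan0 | lease_granted \
#       || echo "lan0 has no granted static IPv4 lease" >&2
```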
- (pit#) Mount the PITDATA partition.
  Use either the RemoteISO or the USB option below, depending on how the LiveCD was connected in the Boot the LiveCD step.
  - RemoteISO
    Use a local disk for PITDATA:
        disk="$(lsblk -l -o SIZE,NAME,TYPE,TRAN -e7 -e11 -d -n | grep -v usb | sort -h | awk '{print $2}' | xargs -I {} bash -c "if ! grep -Fq {} /proc/mdstat; then echo {}; fi" | head -n 1)"
        echo "Using ${disk}"
        parted --wipesignatures -m --align=opt --ignore-busy -s "/dev/${disk}" -- mklabel gpt mkpart primary ext4 2048s 100%
        partprobe "/dev/${disk}"
        mkfs.ext4 -L PITDATA "/dev/${disk}1"
        mount -vL PITDATA
  - USB
    Mount the USB data partition:
        mount -vL PITDATA
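The disk selection in the RemoteISO option is dense: it drops USB-attached disks, sorts the rest by size, and picks the smallest disk whose name does not appear in /proc/mdstat. The sketch below restates that logic in a hypothetical helper, pick_pitdata_disk, which is not part of the LiveCD. It reads the lsblk listing from standard input so that the logic can be tried against sample data without touching any disk.

```shell
# Hypothetical helper, not part of the LiveCD, mirroring the selection logic of
# the RemoteISO command: ignore USB disks, sort smallest first, and print the
# first disk name that does not appear in the MD RAID status file.
#   stdin: lines of "SIZE NAME TYPE TRAN" (lsblk -l -o SIZE,NAME,TYPE,TRAN -d -n)
#   $1:    path to a file in /proc/mdstat format
pick_pitdata_disk() {
    grep -v usb | sort -h | awk '{print $2}' | while read -r name; do
        if ! grep -Fq "${name}" "$1"; then
            echo "${name}"
            break
        fi
    done
}

# Example use:
#   lsblk -l -o SIZE,NAME,TYPE,TRAN -e7 -e11 -d -n | pick_pitdata_disk /proc/mdstat
```

Printing the candidate this way before running parted gives a chance to confirm that the intended disk is about to be wiped.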
These variables will need to be set for many procedures within the CSM installation process.
NOTE: This sets some variables that were already set. These should be set again anyway.
- (pit#) Set the variables.
  - Set the PITDATA variable.
        export PITDATA="$(lsblk -o MOUNTPOINT -nr /dev/disk/by-label/PITDATA)"
  - Set the CSM_RELEASE variable.
    The value is based on the version of the CSM release being installed.
    Example release versions:
    - An alpha build: CSM_RELEASE=1.6.0-alpha.99
    - A release candidate: CSM_RELEASE=1.6.0-rc.1
    - A stable release: CSM_RELEASE=1.6.0
        export CSM_RELEASE=<value>
  - Set the CSM_PATH variable.
    After the CSM release tarball has been expanded, this will be the path to its base directory.
        export CSM_PATH="${PITDATA}/csm-${CSM_RELEASE}"
  - Set the SYSTEM_NAME variable.
    This is the user-friendly name for the system. For example, for eniac-ncn-m001, SYSTEM_NAME should be set to eniac.
        export SYSTEM_NAME=<value>
- (pit#) Update /etc/environment.
  This ensures that these variables will be set in all future shells on the PIT node.
    cat << EOF >/etc/environment
    CSM_RELEASE=${CSM_RELEASE}
    CSM_PATH=${PITDATA}/csm-${CSM_RELEASE}
    GOSS_BASE=${GOSS_BASE}
    PITDATA=${PITDATA}
    SYSTEM_NAME=${SYSTEM_NAME}
    EOF
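The here-document expands each variable when it is written, so a variable that is unset at that moment is stored as an empty value without any warning. The sketch below shows one way to guard against that with a hypothetical helper, require_vars, which is not part of the LiveCD. GOSS_BASE is left out of the example call because this page does not show where it is set.

```shell
# Hypothetical helper, not part of the LiveCD: succeeds only if every named
# variable is set and non-empty, and reports each one that is not.
# The body runs in a subshell so its own variables do not leak into the caller.
require_vars() (
    missing=0
    for name in "$@"; do
        eval "value=\${${name}:-}"     # look up the variable called ${name}
        if [ -z "${value}" ]; then
            echo "Variable ${name} is not set" >&2
            missing=1
        fi
    done
    [ "${missing}" -eq 0 ]
)

# Example use, before writing /etc/environment:
#   require_vars CSM_RELEASE CSM_PATH PITDATA SYSTEM_NAME || echo "Set the missing variables first" >&2
```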
- (pit#) Create the admin directory for the typescripts and administrative scratch work.
    mkdir -pv "$(lsblk -o MOUNTPOINT -nr /dev/disk/by-label/PITDATA)/prep/admin"
    ls -l "$(lsblk -o MOUNTPOINT -nr /dev/disk/by-label/PITDATA)/prep/admin"
- (pit#) Exit the typescript and log out.
    exit
    exit
- (pit#) Exit the console.
  This is done by typing the key sequence tilde, period (that is, ~.). If the console was accessed over an SSH session (that is, the user used SSH to log into another server, and from there used ipmitool to access the console), then press tilde twice followed by a period (that is, ~~.) in order to prevent exiting the parent SSH session.
- (external#) Copy the typescript to the running LiveCD.
    scp boot.livecd.*.txt root@eniac-ncn-m001:/tmp/
- (external#) SSH into the LiveCD.
    livecd=eniac-ncn-m001.example.company.com
    ssh root@"${livecd}"
- (pit#) Copy the previous typescript and start a new one.
    cp -pv /tmp/boot.livecd.*.txt "${PITDATA}/prep/admin"
    script -af "${PITDATA}/prep/admin/csm-install.$(date +%Y-%m-%d).txt"
    export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
- (pit#) Print information about the booted PIT image for logging purposes.
  Having this information in the typescript can be helpful if problems are encountered during the install.
    /root/bin/metalid.sh
  Expected output looks similar to the following (the versions in the example below may differ). There should be no errors.
    = PIT Identification = COPY/CUT START =======================================
    VERSION=ed97205-1706718622724
    TIMESTAMP=2024-01-31_16:30:22
    CRAY-Site-Init build signature...
    Build Commit : c9a07e366151a72be71d168061d08bd97da5344c-heads-v1.32.4
    Build Time : 2023-10-20T14:53:30Z
    Go Version : go1.19
    Version : v1.32.4
    Platform : linux/amd64
    canu-1.8.0-1.x86_64
    ilorest-4.2.0.0-20.x86_64
    metal-basecamp-1.2.6-1.x86_64
    metal-ipxe-2.4.7-1.noarch
    metal-init-1.4.6-1.noarch
    metal-nexus-1.3.1-3.38.0_1.x86_64
    metal-observability-1.0.9-1.x86_64
    = PIT Identification = COPY/CUT END =========================================
- Download and install the latest documentation and scripts RPMs; see Check for latest documentation.
- (pit#) Download the CSM tarball.
  - From Cray using curl:
    NOTES:
    - The curl option -C - is used to allow partial downloads. These tarballs are large; in the event of a connection disruption, the same curl command can be used to continue the disrupted download.
    - CSM does NOT support the use of proxy servers for anything other than downloading artifacts from external endpoints. Using http_proxy or https_proxy in any way other than the following examples will cause many failures in subsequent steps.
    Without proxy:
        curl -C - -f -o "/var/www/ephemeral/csm-${CSM_RELEASE}.tar.gz" \
          "https://release.algol60.net/$(awk -F. '{print "csm-"$1"."$2}' <<< ${CSM_RELEASE})/csm/csm-${CSM_RELEASE}.tar.gz"
    With HTTPS proxy:
        https_proxy=https://example.proxy.net:443 curl -C - -f -o "/var/www/ephemeral/csm-${CSM_RELEASE}.tar.gz" \
          "https://release.algol60.net/$(awk -F. '{print "csm-"$1"."$2}' <<< ${CSM_RELEASE})/csm/csm-${CSM_RELEASE}.tar.gz"
  - Using scp from the external server used in Prepare installation environment server:
        scp "<external-server>:/<path>/csm-${CSM_RELEASE}.tar.gz" /var/www/ephemeral/
- (pit#) Extract the tarball.
    tar -zxvf "${PITDATA}/csm-${CSM_RELEASE}.tar.gz" -C "${PITDATA}"
- (pit#) Install/update the RPMs necessary for the CSM installation.
  NOTE: --no-gpg-checks is used because the repository contained within the tarball does not provide a GPG key.
  - Check for and update cray-site-init, metal-init, and metal-ipxe.
    NOTES:
    - cray-site-init provides csi, a tool for creating and managing configurations, as well as orchestrating the handoff and deploy of the final non-compute node.
    - metal-init provides several scripts in /root/bin used for fresh installations.
    - metal-ipxe provides boot parameters for the NCNs, as well as EFI binaries for PXE/iPXE/HTTP booting.
        zypper --plus-repo "${CSM_PATH}/rpm/cray/csm/noos" --no-gpg-checks update -y cray-site-init metal-init metal-ipxe
- (pit#) Get the artifact versions.
    KUBERNETES_VERSION="$(find ${CSM_PATH}/images/kubernetes -name '*.squashfs' -exec basename {} .squashfs \; | awk -F '-' '{print $(NF-1)}')"
    echo "${KUBERNETES_VERSION}"
    CEPH_VERSION="$(find ${CSM_PATH}/images/storage-ceph -name '*.squashfs' -exec basename {} .squashfs \; | awk -F '-' '{print $(NF-1)}')"
    echo "${CEPH_VERSION}"
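The two commands above take the version from the image file name: basename strips the directory and the .squashfs suffix, and awk prints the second-to-last field when splitting on hyphens. The sketch below isolates that step in a hypothetical helper, image_version, which is not part of CSM. It assumes file names of the form name-version-architecture.squashfs, as used later on this page; the version 6.1.94 in the comment is an illustrative value only.

```shell
# Hypothetical helper, not part of CSM: print the version embedded in an image
# file name of the form <name>-<version>-<arch>.squashfs.
# Example (illustrative version): storage-ceph-6.1.94-x86_64.squashfs -> 6.1.94
image_version() {
    basename "$1" .squashfs | awk -F '-' '{print $(NF-1)}'
}
```

Because the version is counted from the end of the name, the hyphen in storage-ceph does not affect the result.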
- (pit#) Copy the NCN images from the expanded tarball.
  NOTE: This hard-links the files to do this copy as fast as possible, as well as to mitigate space waste on the USB stick.
    mkdir -pv "${PITDATA}/data/k8s/" "${PITDATA}/data/ceph/"
    rsync -rltDP --delete "${CSM_PATH}/images/kubernetes/" --link-dest="${CSM_PATH}/images/kubernetes/" "${PITDATA}/data/k8s/${KUBERNETES_VERSION}"
    rsync -rltDP --delete "${CSM_PATH}/images/storage-ceph/" --link-dest="${CSM_PATH}/images/storage-ceph/" "${PITDATA}/data/ceph/${CEPH_VERSION}"
- (pit#) Modify the NCN images with SSH keys and root passwords.
  The following substeps provide the most commonly used defaults for this process. For more advanced options, see Set NCN Image Root Password, SSH Keys, and Timezone on PIT Node.
  - Generate SSH keys.
    NOTE: The code block below assumes there is an RSA key without a passphrase. This step can be customized to use a passphrase if desired.
        ssh-keygen -N "" -t rsa
  - Export the password hash for root that is needed for the ncn-image-modification.sh script.
    This will set the NCN root user password to be the same as the root user password on the PIT.
        export SQUASHFS_ROOT_PW_HASH="$(awk -F':' /^root:/'{print $2}' < /etc/shadow)"
  - Inject these into the NCN images by running ncn-image-modification.sh from the CSM documentation RPM.
        NCN_MOD_SCRIPT=$(rpm -ql docs-csm | grep ncn-image-modification.sh)
        echo "${NCN_MOD_SCRIPT}"
        "${NCN_MOD_SCRIPT}" -p \
           -d /root/.ssh \
           -k "/var/www/ephemeral/data/k8s/${KUBERNETES_VERSION}/kubernetes-${KUBERNETES_VERSION}-$(uname -i).squashfs" \
           -s "/var/www/ephemeral/data/ceph/${CEPH_VERSION}/storage-ceph-${CEPH_VERSION}-$(uname -i).squashfs"
- (pit#) Log the currently installed PIT packages.
  Having this information in the typescript can be helpful if problems are encountered during the install. This command was run once in a previous step; running it again now is intentional.
    /root/bin/metalid.sh
  Expected output looks similar to the following (the versions in the example below may differ). There should be no errors.
    = PIT Identification = COPY/CUT START =======================================
    VERSION=ed97205-1706718622724
    TIMESTAMP=2024-01-31_16:30:22
    CRAY-Site-Init build signature...
    Build Commit : c9a07e366151a72be71d168061d08bd97da5344c-heads-v1.32.4
    Build Time : 2023-10-20T14:53:30Z
    Go Version : go1.19
    Version : v1.32.4
    Platform : linux/amd64
    canu-1.8.0-1.x86_64
    ilorest-4.2.0.0-20.x86_64
    metal-basecamp-1.2.6-1.x86_64
    metal-ipxe-2.4.8-1.noarch
    metal-init-1.4.6-1.noarch
    metal-nexus-1.3.1-3.38.0_1.x86_64
    metal-observability-1.0.9-1.x86_64
    = PIT Identification = COPY/CUT END =========================================
Create the system configuration using one of the following options:
- Create System Configuration Using Cluster Discovery Service: In this dynamic discovery process, the system and its connections are discovered and compared with the SHCD to create the system configuration files. This method is highly recommended for new installations.
- Create System Configuration Using SHCD: This method relies on the SHCD data to create the system configuration files.
The following steps require that the Create system configuration procedure has completed successfully.
- (pit#) Upload the CSM tarball's RPMs and container images to the local Nexus instance.
    /srv/cray/metal-provision/scripts/nexus/setup-nexus.sh -s
- Add the local Zypper repositories for noos and the current SLES distribution.
  NOTE: The ${releasever_major} and ${releasever_minor} variables are interpolated by Zypper; the URI is intentionally wrapped in single quotes to prevent the shell from interpolating them. Zypper will replace these variables with the currently running distribution's major and minor version numbers.
    zypper addrepo --no-gpgcheck --refresh http://packages/repository/csm-noos csm-noos
    zypper addrepo --no-gpgcheck --refresh 'http://packages/repository/csm-sle-${releasever_major}sp${releasever_minor}' 'csm-sle'
- (pit#) Ensure any new, updated packages pertinent to the CSM install are installed.
  NOTES:
  - The csm-testing package provides the necessary tests and their dependencies for validating the pre-installation, installation, and more.
  - The iuf-cli package provides iuf, a command line interface to the Install and Upgrade Framework.
    zypper --no-gpg-checks install -y canu craycli csm-testing hpe-csm-goss-package iuf-cli platform-utils
- (pit#) Verify that the LiveCD is ready by running the preflight tests.
    csi pit validate --livecd-preflight
  If any tests fail, they need to be investigated. After actions have been taken to rectify the tests (for example, editing configuration or CSI inputs), restart from the beginning of the Initialize the LiveCD procedure.
  The following test failure may be ignored if the management network switches have not been configured. This is often the case when the system is being installed with CSM for the first time. Configuring switches is covered in the next topic.
    Result: FAIL
    Source: /opt/cray/tests/install/livecd/suites/livecd-preflight-tests.yaml
    Test Name: sls_input.json IPs Correct
    Description: Extracts the switch IP addresses from sls_input.json and pings them to ensure they are accurate.
    Test Summary: check_sls_file_ips: exit-status: Error: Command execution timed out (20s)
    Execution Time: 0.000002214 seconds
    Node: eniac-pit
- Save the prep directory for re-use.
  This needs to be copied off the system and either stored in a secure location or in a secured Git repository. There are secrets in this directory that should not be accidentally exposed.
After completing this procedure, proceed to configure the management network switches.