Add module 7 and 8
cptmorgan-rh committed Dec 18, 2024
1 parent 01dd425 commit 7412576
Showing 3 changed files with 278 additions and 6 deletions.
10 changes: 8 additions & 2 deletions content/modules/ROOT/nav.adoc
@@ -43,9 +43,15 @@
** xref:module-06.adoc#commonerrors[Common Errors]
** xref:module-06.adoc#singleerrors[Searching for Specific Errors]
* xref:module-07.adoc[7. Reviewing Cluster Upgrades]
** xref:module-07.adoc#gettingstarted[Getting Started]
** xref:module-07.adoc#partialupgrade[Partial Upgrade]
* xref:module-08.adoc[8. Reviewing Installed Operators]
** xref:module-08.adoc#operators[Operators]
** xref:module-08.adoc#csv[ClusterServiceVersion]
** xref:module-08.adoc#subscription[Subscriptions]
** xref:module-08.adoc#installplan[InstallPlan]
* xref:module-09.adoc[9. OCP networking - Traffic is not distributed among Pod replicas]
** xref:module-09.adoc#configureomc[Configure `omc` to use the correct _must-gather_]
104 changes: 102 additions & 2 deletions content/modules/ROOT/pages/module-07.adoc
@@ -1,4 +1,104 @@
= Reviewing Cluster Upgrades
:prewrap!:

When reviewing a must-gather, it is important to start with the output of the `omc get clusterversion` command to identify the current cluster version, whether an upgrade is currently progressing, any errors in the Status field, and whether any failed upgrades could be causing issues.

[#gettingstarted]
To get started, we will run the `omc get clusterversion` command, then run it a second time with `-o yaml`. We specifically want to look at the `history` section, which lists every upgrade ever performed on the cluster. In the example below we see three upgrades, with the most recent upgrade, to `4.13.43`, showing a state of `Partial`.

.clusterversion
====
[source,bash]
----
$ omc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.27   True        False         17d     Cluster version is 4.14.27
$ omc get clusterversion -o yaml
...
  history:
  - completionTime: null
    image: fr2.icr.io/armada-master/ocp-release:4.13.43-x86_64
    startedTime: "2024-07-04T06:21:36Z"
    state: Partial
    verified: false
    version: 4.13.43
  - completionTime: "2024-07-04T06:21:36Z"
    image: fr2.icr.io/armada-master/ocp-release:4.12.58-x86_64
    startedTime: "2024-06-26T16:25:53Z"
    state: Partial
    verified: false
    version: 4.12.58
  - completionTime: "2024-06-26T16:25:53Z"
    image: fr2.icr.io/armada-master/ocp-release:4.12.56-x86_64
    startedTime: "2024-06-05T17:06:12Z"
    state: Partial
    verified: false
    version: 4.12.56
...
----
====
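The `Partial` entries can also be pulled out of the history mechanically. The snippet below is a minimal sketch: the embedded sample stands in for the real `omc get clusterversion -o yaml` output so that it is self-contained, and against an actual must-gather you would pipe the command output instead.

.finding-partial-upgrades
====
[source,bash]
----
# Sample slice of the history; in practice pipe `omc get clusterversion -o yaml`.
history_yaml=$(cat <<'EOF'
  history:
  - completionTime: null
    state: Partial
    verified: false
    version: 4.13.43
  - completionTime: "2024-07-04T06:21:36Z"
    state: Completed
    verified: false
    version: 4.12.58
EOF
)
# Within each history entry "state:" appears before "version:", so remember
# the state and print the version once it arrives.
partials=$(printf '%s\n' "$history_yaml" |
  awk '$1 == "state:" {s = $2} $1 == "version:" && s == "Partial" {print $2}')
echo "$partials"    # -> 4.13.43
----
====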

[#partialupgrade]
A `Partial` upgrade is the result of manifests failing to apply, objects not being updated or deleted, or missing items that cause the upgrade to loop as it tries to progress past the issue. This can leave Cluster Operators on an older version, as seen in the following example from an actual customer case.

.clusteroperators
====
[source,bash]
----
$ omc get clusteroperators
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
console                                    4.13.43   True        False         False      1d
csi-snapshot-controller                    4.13.43   True        False         False      239d
dns                                        4.12.40   True        False         False      239d
image-registry                             4.13.43   True        False         False      134d
ingress                                    4.13.43   True        False         False      8d
insights                                   4.13.43   True        False         False      99d
kube-apiserver                             4.13.43   True        False         False      239d
kube-controller-manager                    4.13.43   True        False         False      239d
kube-scheduler                             4.13.43   True        False         False      239d
kube-storage-version-migrator              4.13.43   True        False         False      8d
marketplace                                4.13.43   True        False         False      239d
monitoring                                 4.13.43   True        False         False      159d
network                                    4.12.40   True        False         False      239d
node-tuning                                4.13.43   True        False         False      8d
openshift-apiserver                        4.13.43   True        False         False      239d
openshift-controller-manager               4.13.43   True        False         False      239d
openshift-samples                          4.13.43   True        False         False      8d
operator-lifecycle-manager                 4.13.43   True        False         False      239d
operator-lifecycle-manager-catalog         4.13.43   True        False         False      239d
operator-lifecycle-manager-packageserver   4.13.43   True        False         False      17h
service-ca                                 4.13.43   True        False         False      239d
storage                                    4.13.43   True        False         False      239d
----
====
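One quick way to spot the laggards is to filter the table for any row whose VERSION column differs from the target release. The snippet below is a minimal sketch with sample rows embedded; against a must-gather you would pipe the body of the `omc get clusteroperators` output instead.

.finding-stale-clusteroperators
====
[source,bash]
----
target=4.13.43
# Sample rows standing in for the body of `omc get clusteroperators`.
stale=$(cat <<'EOF' | awk -v t="$target" '$2 != t {print $1, $2}'
console   4.13.43   True   False   False   1d
dns       4.12.40   True   False   False   239d
network   4.12.40   True   False   False   239d
EOF
)
echo "$stale"
----
====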

After reviewing the ClusterVersion and the Cluster Operators, we next move to the `cluster-version-operator` pod located in the `openshift-cluster-version` namespace. There you can review the logs to see where the upgrade is stalling. In the example below, the logs show the upgrade process getting stuck on the DNS and Network Operators, which matches what we see in the ClusterOperator status.

.cluster-version-operator
====
[source,bash]
----
I0718 14:20:33.366086 1 sync_worker.go:978] Precreated resource clusteroperator "network" (511 of 615)
I0718 14:20:33.404169 1 sync_worker.go:978] Precreated resource clusteroperator "dns" (522 of 615)
I0718 14:20:33.404237 1 sync_worker.go:708] Dropping status report from earlier in sync loop
I0718 14:20:33.404254 1 sync_worker.go:987] Running sync for namespace "openshift-network-operator" (505 of 615)
I0718 14:20:33.449849 1 sync_worker.go:1007] Done syncing for namespace "openshift-network-operator" (505 of 615)
I0718 14:20:33.449956 1 sync_worker.go:708] Dropping status report from earlier in sync loop
I0718 14:20:33.450181 1 sync_worker.go:987] Running sync for customresourcedefinition "networks.operator.openshift.io" (506 of 615)
I0718 14:20:33.498554 1 sync_worker.go:1007] Done syncing for customresourcedefinition "networks.operator.openshift.io" (506 of 615)
I0718 14:20:33.498601 1 sync_worker.go:708] Dropping status report from earlier in sync loop
I0718 14:20:33.498614 1 sync_worker.go:987] Running sync for customresourcedefinition "egressrouters.network.operator.openshift.io" (507 of 615)
I0718 14:20:33.545449 1 sync_worker.go:1007] Done syncing for customresourcedefinition "egressrouters.network.operator.openshift.io" (507 of 615)
I0718 14:20:33.545495 1 sync_worker.go:708] Dropping status report from earlier in sync loop
I0718 14:20:33.545507 1 sync_worker.go:987] Running sync for customresourcedefinition "operatorpkis.network.operator.openshift.io" (508 of 615)
I0718 14:20:33.593751 1 sync_worker.go:1007] Done syncing for customresourcedefinition "operatorpkis.network.operator.openshift.io" (508 of 615)
I0718 14:20:33.593790 1 sync_worker.go:708] Dropping status report from earlier in sync loop
I0718 14:20:33.593799 1 sync_worker.go:987] Running sync for clusterrolebinding "default-account-cluster-network-operator" (509 of 615)
I0718 14:20:33.641898 1 sync_worker.go:1007] Done syncing for clusterrolebinding "default-account-cluster-network-operator" (509 of 615)
I0718 14:20:33.642013 1 sync_worker.go:708] Dropping status report from earlier in sync loop
I0718 14:20:33.642033 1 sync_worker.go:987] Running sync for deployment "openshift-network-operator/network-operator" (510 of 615)
I0718 14:20:33.696357 1 sync_worker.go:1007] Done syncing for deployment "openshift-network-operator/network-operator" (510 of 615)
I0718 14:20:33.696477 1 sync_worker.go:987] Running sync for clusteroperator "network" (511 of 615)
E0718 14:20:33.696795 1 task.go:117] error running apply for clusteroperator "network" (511 of 615): Cluster operator network is updating version
----
====
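The cluster-version-operator uses klog, which prefixes each line with its severity, so error lines all begin with `E`. A minimal sketch of filtering them out is shown below; the embedded lines are a sample, and against a must-gather you would read the pod log from the `openshift-cluster-version` namespace instead (the pod name varies per gather).

.filtering-cvo-errors
====
[source,bash]
----
# Sample log lines; klog severity is the first character (I=info, E=error).
errors=$(cat <<'EOF' | grep '^E'
I0718 14:20:33.696357 1 sync_worker.go:1007] Done syncing for deployment "openshift-network-operator/network-operator" (510 of 615)
E0718 14:20:33.696795 1 task.go:117] error running apply for clusteroperator "network" (511 of 615): Cluster operator network is updating version
EOF
)
echo "$errors"
----
====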
170 changes: 168 additions & 2 deletions content/modules/ROOT/pages/module-08.adoc
@@ -1,4 +1,170 @@
= Reviewing Installed Operators
:prewrap!:

[#operators]
To view all of the Operators installed on a cluster, we will use the `omc get operators` command, which outputs every Operator as shown in the following example. By default all Operators are shown, but you can narrow the list by specifying a namespace with the `-n` option.

.operators
====
[source,bash]
----
$ omc get operators
NAME                                                              AGE
ansible-automation-platform-operator.aap                          1y
citrix-ingress-controller-operator.openshift-operators            1y
cluster-kube-descheduler-operator.openshift-kube-descheduler-op   1y
datagrid.openshift-operators                                      256d
falcon-operator-rhmp.falcon-operator                              1y
falcon-operator.falcon-operator                                   1y
grafana-operator.openshift-operators                              1d
openshift-gitops-operator.openshift-operators                     1y
openshift-pipelines-operator-rh.openshift-operators               1y
portworx-certified.openshift-operators                            1y
quay-operator.openshift-operators                                 1y
----
====
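Note that the NAME column combines the Operator's package name and a namespace, separated by the last dot. The sketch below splits the two apart with sample rows embedded; against a must-gather you would pipe the NAME column of `omc get operators` instead.

.splitting-operator-names
====
[source,bash]
----
# Split "<name>.<namespace>" on the last dot; sample rows stand in for
# the NAME column of `omc get operators`.
pairs=$(cat <<'EOF' | awk '{n = ns = $1; sub(/\.[^.]*$/, "", n); sub(/^.*\./, "", ns); print n, ns}'
ansible-automation-platform-operator.aap
quay-operator.openshift-operators
EOF
)
echo "$pairs"
----
====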

[#csv]
Next we will look at the ClusterServiceVersion (CSV), which represents a particular version of an Operator running on a cluster. It includes metadata such as the name, description, version, repository link, labels, and icon. It also declares owned and required CRDs, cluster requirements, and an install strategy that tells the Operator Lifecycle Manager (OLM) how to create the required resources and set up the Operator as a Deployment.

In this example we will look at the CSVs installed in the `aap` namespace.

.csv
====
[source,bash]
----
$ omc get csv -n aap
NAME                                DISPLAY                                  VERSION              REPLACES                            PHASE
aap-operator.v2.4.0-0.1692675723    Ansible Automation Platform              2.4.0+0.1692675723   aap-operator.v2.3.0-0.1692727374    Succeeded
datagrid-operator.v8.5.1            Data Grid                                8.5.1                datagrid-operator.v8.5.0            Succeeded
falcon-operator.v0.6.2              CrowdStrike Falcon Platform - Operator   0.6.2                                                    Succeeded
grafana-operator.v5.12.0            Grafana Operator                         5.12.0               grafana-operator.v5.11.0            Succeeded
openshift-gitops-operator.v1.12.5   Red Hat OpenShift GitOps                 1.12.5               openshift-gitops-operator.v1.12.4   Succeeded
portworx-operator.v24.1.1           Portworx Enterprise                      24.1.1               portworx-operator.v24.1.0           Succeeded
quay-operator.v3.9.8                Red Hat Quay                             3.9.8                quay-operator.v3.9.6                Succeeded
----
====
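When an install has gone wrong, the PHASE column will read something other than `Succeeded` (for example `Failed` or `Pending`). A minimal sketch of flagging those rows follows; the `Failed` entry here is fabricated purely for illustration.

.finding-unhealthy-csvs
====
[source,bash]
----
# Print any CSV whose final column (PHASE) is not Succeeded.
# Sample rows; the Failed entry is made up for the example.
bad=$(cat <<'EOF' | awk '$NF != "Succeeded" {print $1, $NF}'
quay-operator.v3.9.8       Red Hat Quay       3.9.8   quay-operator.v3.9.6       Succeeded
example-operator.v1.0.0    Example Operator   1.0.0   example-operator.v0.9.0    Failed
EOF
)
echo "$bad"    # -> example-operator.v1.0.0 Failed
----
====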

[#subscription]
A Subscription represents an intent to install an Operator. It is the custom resource that relates an Operator to a CatalogSource. A Subscription describes which channel of an Operator package to subscribe to and whether to perform updates automatically or manually. If set to automatic, OLM manages and upgrades the Operator to ensure the latest version is always running in the cluster.

In this example we will look at the Subscription for the `ansible-automation-platform-operator`, which shows us the channel, installPlanApproval, name, source, sourceNamespace, and startingCSV.

Additionally, under `status`, it provides a reference to the InstallPlan.

.subscription
====
[source,bash]
----
$ omc get subscriptions -n aap ansible-automation-platform-operator -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  creationTimestamp: "2023-06-29T21:11:28Z"
  generation: 5
  labels:
    operators.coreos.com/ansible-automation-platform-operator.aap: ""
  name: ansible-automation-platform-operator
  namespace: aap
  resourceVersion: "700220891"
  uid: fe232c2b-5c33-405a-929b-419a4191aeee
spec:
  channel: stable-2.4-cluster-scoped
  installPlanApproval: Manual
  name: ansible-automation-platform-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: aap-operator.v2.3.0-0.1686242173
status:
  catalogHealth:
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: certified-operators
      namespace: openshift-marketplace
      resourceVersion: "700220797"
      uid: bd368dbe-e081-42d9-b9e6-278ee26d372a
    healthy: true
    lastUpdated: "2024-04-28T06:45:35Z"
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: community-operators
      namespace: openshift-marketplace
      resourceVersion: "700220851"
      uid: 753a9d96-bc3c-4499-aecd-3f68e9420a3d
    healthy: true
    lastUpdated: "2024-04-28T06:45:35Z"
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: redhat-marketplace
      namespace: openshift-marketplace
      resourceVersion: "700220809"
      uid: 77a4db7f-6abf-4e49-9ed1-a30a89c46d2d
    healthy: true
    lastUpdated: "2024-04-28T06:45:35Z"
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: redhat-operators
      namespace: openshift-marketplace
      resourceVersion: "700220844"
      uid: a786e921-ac4c-4a06-a119-577591414821
    healthy: true
    lastUpdated: "2024-04-28T06:45:35Z"
  conditions:
  - lastTransitionTime: "2024-04-28T06:45:35Z"
    message: all available catalogsources are healthy
    reason: AllCatalogSourcesHealthy
    status: "False"
    type: CatalogSourcesUnhealthy
  - lastTransitionTime: "2023-09-07T22:13:50Z"
    reason: RequiresApproval
    status: "True"
    type: InstallPlanPending
  currentCSV: aap-operator.v2.4.0-0.1693440031
  installPlanGeneration: 5
  installPlanRef:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-pqvgf
    namespace: aap
    resourceVersion: "427979356"
    uid: e4880b85-62bc-4a9d-ba46-affeb6244577
  installedCSV: aap-operator.v2.4.0-0.1692675723
  installplan:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-pqvgf
    uuid: e4880b85-62bc-4a9d-ba46-affeb6244577
  lastUpdated: "2024-04-28T06:45:35Z"
  state: UpgradePending
----
====
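The InstallPlan name under `status.installPlanRef` is what connects the Subscription to the object we examine next, and it can be extracted with a simple pipeline. The sketch below is self-contained: the here-doc mimics the relevant slice of the `-o yaml` output, which you would pipe from `omc` against a real must-gather.

.extracting-the-installplan-name
====
[source,bash]
----
# Pull the InstallPlan name out of the Subscription's status.installPlanRef.
plan=$(cat <<'EOF' | grep -A 3 'installPlanRef:' | awk '$1 == "name:" {print $2}'
  installPlanGeneration: 5
  installPlanRef:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-pqvgf
    namespace: aap
EOF
)
echo "$plan"    # -> install-pqvgf
----
====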

[#installplan]
Finally, we will look at the InstallPlan, which defines the set of resources to be created in order to install or upgrade to a specific version of a ClusterService, as defined by a CSV.

.installplan
====
[source,yaml]
----
apiVersion: operators.coreos.com/v1alpha1
kind: InstallPlan
metadata:
  creationTimestamp: "2023-09-07T22:13:33Z"
  generateName: install-
  generation: 1
  labels:
    operators.coreos.com/ansible-automation-platform-operator.aap: ""
  name: install-pqvgf
  namespace: aap
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: Subscription
    name: ansible-automation-platform-operator
    uid: fe232c2b-5c33-405a-929b-419a4191aeee
  resourceVersion: "427979731"
  uid: e4880b85-62bc-4a9d-ba46-affeb6244577
spec:
  approval: Manual
  approved: false
  clusterServiceVersionNames:
  - aap-operator.v2.4.0-0.1693440031
  generation: 5
----
====
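Because this Subscription uses `installPlanApproval: Manual`, the upgrade waits until someone approves the plan, so `spec.approved: false` is the key line here. A minimal sketch of checking it (a sample slice of the spec is embedded; against a must-gather you would pipe the InstallPlan YAML from `omc` instead):

.checking-installplan-approval
====
[source,bash]
----
# Read spec.approved from the InstallPlan; "false" on a Manual plan means
# the upgrade is stalled waiting on a human.
approved=$(cat <<'EOF' | awk '$1 == "approved:" {print $2}'
spec:
  approval: Manual
  approved: false
EOF
)
echo "$approved"    # -> false
----
====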
