OCPBUGS-19690: Enable host network to access host sysctls #497

@yuumasato
Member

yuumasato commented Mar 15, 2024

  • With HostNetwork: true, the sysctl net.core.bpf_jit_harden becomes visible to the scanner container.
    The pod below demonstrates access to the sysctls:
apiVersion: v1
kind: Pod
metadata:
  name: list-sysctls
spec:
  hostNetwork: true
  volumes:
    - name: host
      hostPath:
        path: /
        type: Directory
  containers:
  - name: list
    command:
      - cat
      - /host/proc/sys/net/ipv6/conf/all/accept_ra
      - /host/proc/sys/net/core/bpf_jit_harden
    image: registry.access.redhat.com/ubi8/ubi-minimal
    securityContext:
      runAsUser: 0
      privileged: true
    volumeMounts:
    - name: host
      mountPath: /host

$ oc create -f list-sysctls-proc.yaml
$ oc logs list-sysctls

  • DNSPolicy: ClusterFirstWithHostNet allows the Compliance Operator (CO) to upload to the resultserver; otherwise we get the following error:
    {"level":"info","ts":"2024-03-15T18:45:57Z","logger":"cmd","msg":"Trying to upload to resultserver","url":"https://upstream-rhcos4-high-worker-rs:8443/"}
    {"level":"error","ts":"2024-03-15T18:45:57Z","logger":"cmd","msg":"Failed to upload results to server","error":"Post \"https://upstream-rhcos4-high-worker-rs:8443/\": dial tcp: lookup upstream-rhcos4-high-worker-rs on 10.0.0.2:53: no such host","stacktrace":"github.com/ComplianceAsCode/compliance-operator/cmd/manager.uploadToResultServer.func1\n\tgithub.com/ComplianceAsCode/compliance-operator/cmd/manager/resultcollector.go:316\ngithub.com/cenkalti/backoff/v4.RetryNotifyWithTimer.Operation.withEmptyData.func1\n\tgithub.com/cenkalti/backoff/[email protected]/retry.go:18\ngithub.com/cenkalti/backoff/v4.doRetryNotify[...]\n\tgithub.com/cenkalti/backoff/[email protected]/retry.go:88\ngithub.com/cenkalti/backoff/v4.RetryNotifyWithTimer\n\tgithub.com/cenkalti/backoff/[email protected]/retry.go:61\ngithub.com/cenkalti/backoff/v4.RetryNotify\n\tgithub.com/cenkalti/backoff/[email protected]/retry.go:49\ngithub.com/cenkalti/backoff/v4.Retry\n\tgithub.com/cenkalti/backoff/[email protected]/retry.go:38\ngithub.com/ComplianceAsCode/compliance-operator/cmd/manager.uploadToResultServer\n\tgithub.com/ComplianceAsCode/compliance-operator/cmd/manager/resultcollector.go:299\ngithub.com/ComplianceAsCode/compliance-operator/cmd/manager.handleCompleteSCAPResults.func1\n\tgithub.com/ComplianceAsCode/compliance-operator/cmd/manager/resultcollector.go:390"}

  • Use the content from "Re-enable runtime check on network related sysctls" (content#11722) to check whether the scanner container can access the sysctls correctly:
    oc compliance bind -S default-auto-apply -N test profile/upstream-rhcos4-moderate

EDIT: I have re-tested, and DNSPolicy: ClusterFirstWithHostNet indeed solves the "no such host" error when uploading to the resultserver.
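
As a rough illustration of what the list-sysctls pod above reads, the sketch below maps a dotted sysctl name to its file under the host mount. The helper names `sysctl_path` and `read_sysctl` are hypothetical and are not part of the Compliance Operator; the `/host` prefix is an assumption taken from the pod's mountPath.

```python
from pathlib import Path
from typing import Optional

# Assumed mount point of the host filesystem, mirroring mountPath: /host above.
HOST_ROOT = Path("/host")


def sysctl_path(name: str, host_root: Path = HOST_ROOT) -> Path:
    """Map a dotted sysctl name to its file under <host_root>/proc/sys.

    Simplification: assumes no component contains a dot, which holds for the
    keys in this PR but not for, e.g., VLAN interface names such as eth0.100.
    """
    return host_root / "proc" / "sys" / Path(*name.split("."))


def read_sysctl(name: str, host_root: Path = HOST_ROOT) -> Optional[str]:
    """Return the sysctl value as text, or None if the file cannot be read."""
    try:
        return sysctl_path(name, host_root).read_text().strip()
    except OSError:
        # Covers a missing key as well as permission errors.
        return None


if __name__ == "__main__":
    for key in ("net.ipv6.conf.all.accept_ra", "net.core.bpf_jit_harden"):
        print(key, "=", read_sysctl(key))
```

A `None` result only means the file could not be read from the current container; it does not by itself say whether the key is set on the host.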

@openshift-ci-robot
Collaborator

@yuumasato: This pull request references Jira Issue OCPBUGS-19690, which is invalid:

  • expected the bug to target either version "4.16." or "openshift-4.16.", but it targets "4.15.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

  • With HostNetwork: true the sysctl net.core.bpf_jit_harden becomes visible to the scanner container.

  • But with HostNetwork: true, the CO fails to upload to resultserver.
  • DNSPolicy: ClusterFirstWithHostNet is my unsuccessful attempt to fix that.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@Vincent056

nice finding!

@BhargaviGudi
Collaborator

/hold for test

@BhargaviGudi
Collaborator

BhargaviGudi commented Apr 17, 2024

Verification passed with 4.16.0-0.nightly-2024-04-16-195622 + compliance-operator with PR #497 code + PR #11722 code

  1. Install CO
$ oc get pb
NAME              CONTENTIMAGE                                 CONTENTFILE         STATUS
ocp4              ghcr.io/complianceascode/k8scontent:latest   ssg-ocp4-ds.xml     VALID
rhcos4            ghcr.io/complianceascode/k8scontent:latest   ssg-rhcos4-ds.xml   VALID
upstream-ocp4     ghcr.io/complianceascode/k8scontent:11722    ssg-ocp4-ds.xml     VALID
upstream-rhcos4   ghcr.io/complianceascode/k8scontent:11722    ssg-rhcos4-ds.xml   VALID
  2. Create a custom wrscan MCP
  3. Create the auto-rem-ss ScanSetting to scan only the wrscan MCP (roles: wrscan)
$ oc get ss auto-rem-ss -oyaml
apiVersion: compliance.openshift.io/v1alpha1
autoApplyRemediations: true
autoUpdateRemediations: true
kind: ScanSetting
maxRetryOnTimeout: 3
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"compliance.openshift.io/v1alpha1","autoApplyRemediations":true,"autoUpdateRemediations":true,"kind":"ScanSetting","maxRetryOnTimeout":3,"metadata":{"annotations":{},"creationTimestamp":"2023-09-25T02:05:43Z","generation":1,"name":"auto-rem-ss","namespace":"openshift-compliance","resourceVersion":"43973","uid":"29426481-7cd1-48f0-a3cf-934c96f651eb"},"rawResultStorage":{"pvAccessModes":["ReadWriteOnce"],"rotation":5,"size":"2Gi"},"roles":["wrscan"],"scanTolerations":[{"operator":"Exists"}],"schedule":"0 1 * * *","showNotApplicable":false,"strictNodeScan":false,"timeout":"30m"}
  creationTimestamp: "2024-04-17T10:14:11Z"
  generation: 1
  name: auto-rem-ss
  namespace: openshift-compliance
  resourceVersion: "108142"
  uid: b3a50385-baad-43cf-8ac3-2fb3f1c502a6
rawResultStorage:
  pvAccessModes:
  - ReadWriteOnce
  rotation: 5
  size: 2Gi
roles:
- wrscan
scanTolerations:
- operator: Exists
schedule: 0 1 * * *
showNotApplicable: false
strictNodeScan: false
suspend: false
timeout: 30m
  4. Create the ScanSettingBinding (ssb)
$ oc compliance bind -N rhcos4-high-test -S auto-rem-ss profile/upstream-rhcos4-high
Creating ScanSettingBinding rhcos4-high-test
$ oc get scan
NAME                          PHASE   RESULT
upstream-rhcos4-high-wrscan   DONE    NON-COMPLIANT
  5. All the rules with auto-remediations are applied after 3 rounds of rescans.
$ oc compliance rerun-now scansettingbinding rhcos4-high-test
Rerunning scans from 'rhcos4-high-test': upstream-rhcos4-high-wrscan
Re-running scan 'openshift-compliance/upstream-rhcos4-high-wrscan'
$ oc get ccr -l compliance.openshift.io/automated-remediation=,compliance.openshift.io/check-status=FAIL  
No resources found in openshift-compliance namespace.

@BhargaviGudi
Collaborator

/unhold

@BhargaviGudi
Collaborator

/label qe-approved

@BhargaviGudi
Collaborator

/lgtm

@yuumasato
Member Author

@BhargaviGudi Thank you for testing this.

I re-tested and cannot reproduce the error I mentioned in the PR description.
I was probably doing something wrong before.

@yuumasato
Member Author

Below are some of the collected runtime objects; they now match the static configuration.

<unix-sys:sysctl_item id="100008715" status="exists">
  <unix-sys:name>net.core.bpf_jit_harden</unix-sys:name>
  <unix-sys:value>2</unix-sys:value>
</unix-sys:sysctl_item>
<ind-sys:textfilecontent_item id="100008714" status="exists">
  <ind-sys:filepath>/etc/sysctl.d/75-sysctl_net_core_bpf_jit_harden.conf</ind-sys:filepath>
  <ind-sys:path>/etc/sysctl.d</ind-sys:path>
  <ind-sys:filename>75-sysctl_net_core_bpf_jit_harden.conf</ind-sys:filename>
  <ind-sys:pattern>^[\s]*net.core.bpf_jit_harden[\s]*=[\s]*(.*)[\s]*$</ind-sys:pattern>
  <ind-sys:instance datatype="int">1</ind-sys:instance>
  <ind-sys:line>^[\s]*net.core.bpf_jit_harden[\s]*=[\s]*(.*)[\s]*$</ind-sys:line>
  <ind-sys:text>net.core.bpf_jit_harden=2</ind-sys:text>
  <ind-sys:subexpression>2</ind-sys:subexpression>
</ind-sys:textfilecontent_item>
<unix-sys:sysctl_item id="100008621" status="exists">
  <unix-sys:name>net.ipv6.conf.default.accept_ra</unix-sys:name>
  <unix-sys:value>0</unix-sys:value>
</unix-sys:sysctl_item>
<ind-sys:textfilecontent_item id="100008620" status="exists">
  <ind-sys:filepath>/etc/sysctl.d/75-sysctl_net_ipv6_conf_default_accept_ra.conf</ind-sys:filepath>
  <ind-sys:path>/etc/sysctl.d</ind-sys:path>
  <ind-sys:filename>75-sysctl_net_ipv6_conf_default_accept_ra.conf</ind-sys:filename>
  <ind-sys:pattern>^[\s]*net.ipv6.conf.default.accept_ra[\s]*=[\s]*(.*)[\s]*$</ind-sys:pattern>
  <ind-sys:instance datatype="int">1</ind-sys:instance>
  <ind-sys:line>^[\s]*net.ipv6.conf.default.accept_ra[\s]*=[\s]*(.*)[\s]*$</ind-sys:line>
  <ind-sys:text>net.ipv6.conf.default.accept_ra=0</ind-sys:text>
  <ind-sys:subexpression>0</ind-sys:subexpression>
</ind-sys:textfilecontent_item>
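
The comparison implied by these items can be sketched as follows. This is an illustrative Python approximation, not how OpenSCAP evaluates the check: the function names `configured_value` and `runtime_matches_config` are hypothetical, and unlike the OVAL pattern above, the dots in the key are escaped so they match literally.

```python
import re
from typing import Optional


def configured_value(key: str, conf_text: str) -> Optional[str]:
    """Extract the value assigned to `key` in sysctl.d-style text, if any."""
    pattern = re.compile(
        r"^[\s]*" + re.escape(key) + r"[\s]*=[\s]*(.*)[\s]*$",
        re.MULTILINE,
    )
    match = pattern.search(conf_text)
    # The greedy group may capture trailing spaces, so strip them.
    return match.group(1).strip() if match else None


def runtime_matches_config(key: str, runtime_value: str, conf_text: str) -> bool:
    """True when the runtime sysctl value equals the statically configured one."""
    return configured_value(key, conf_text) == runtime_value.strip()


if __name__ == "__main__":
    # Values taken from the collected items above.
    conf = "net.core.bpf_jit_harden=2\n"
    print(runtime_matches_config("net.core.bpf_jit_harden", "2", conf))
```

With the values from the collected objects (runtime `2`, configured `net.core.bpf_jit_harden=2`), the comparison holds; a key that is absent from the file yields `None` and therefore never matches.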

@yuumasato yuumasato requested review from Vincent056 and rhmdnd and removed request for mrogers950 April 19, 2024 16:20

@Vincent056 Vincent056 left a comment

/lgtm

openshift-ci bot commented Jun 20, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BhargaviGudi, Vincent056, yuumasato

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [BhargaviGudi,Vincent056]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yuumasato
Member Author

/jira refresh

@openshift-ci-robot
Collaborator

@yuumasato: This pull request references Jira Issue OCPBUGS-19690, which is invalid:

  • expected the bug to target either version "4.17." or "openshift-4.17.", but it targets "4.16.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

@yuumasato
Member Author

/jira refresh

@openshift-ci-robot
Collaborator

@yuumasato: This pull request references Jira Issue OCPBUGS-19690, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.17.0) matches configured target version for branch (4.17.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @xiaojiey

@openshift-ci openshift-ci bot requested a review from xiaojiey June 20, 2024 16:34
@rhmdnd

rhmdnd commented Jun 25, 2024

The ROSA failure here looks like a provisioning/setup issue before the test even runs. Attempting to recheck since I'm not convinced the failure is due to this patch.

@rhmdnd

rhmdnd commented Jun 25, 2024

/test e2e-rosa

'hostNetwork: true' grants access to the host's sysctl configurations.
'dnsPolicy: ClusterFirstWithHostNet' is required to access services.
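
The pairing described in this commit message can be checked mechanically. The following is a minimal sketch, not the operator's code: `dns_policy_warnings` is a hypothetical helper that inspects a pod spec given as a plain dict. It relies on documented Kubernetes behavior: dnsPolicy defaults to ClusterFirst when omitted, and for host-network pods ClusterFirst falls back to the node's resolver unless ClusterFirstWithHostNet is set explicitly.

```python
from typing import Any, Dict, List


def dns_policy_warnings(pod_spec: Dict[str, Any]) -> List[str]:
    """Return warnings for a host-network pod that may not resolve cluster services."""
    warnings: List[str] = []
    host_network = bool(pod_spec.get("hostNetwork", False))
    # Kubernetes applies ClusterFirst when dnsPolicy is not specified.
    dns_policy = pod_spec.get("dnsPolicy", "ClusterFirst")
    # "None" is not flagged here because it depends on a custom dnsConfig.
    if host_network and dns_policy in ("ClusterFirst", "Default"):
        warnings.append(
            "hostNetwork is true but dnsPolicy is %r; cluster service names "
            "may not resolve. Consider ClusterFirstWithHostNet." % dns_policy
        )
    return warnings


if __name__ == "__main__":
    print(dns_policy_warnings({"hostNetwork": True}))
    print(dns_policy_warnings(
        {"hostNetwork": True, "dnsPolicy": "ClusterFirstWithHostNet"}
    ))
```

A spec with `hostNetwork: true` and no dnsPolicy produces one warning, which corresponds to the "no such host" lookup failure quoted in the PR description; adding `dnsPolicy: ClusterFirstWithHostNet` clears it.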
@yuumasato yuumasato force-pushed the enable_host_network_for_sysctls branch from a10228d to f05e870 Compare June 26, 2024 08:10
@openshift-ci openshift-ci bot removed the lgtm label Jun 26, 2024
@yuumasato
Member Author

Rebased onto the latest master; let's see how testing goes.

🤖 To deploy this PR, run the following command:

make catalog-deploy CATALOG_IMG=ghcr.io/complianceascode/compliance-operator-catalog:497

@Vincent056

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jun 28, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit f3e5a91 into ComplianceAsCode:master Jun 28, 2024
14 checks passed
@openshift-ci-robot
Collaborator

@yuumasato: Jira Issue OCPBUGS-19690: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-19690 has been moved to the MODIFIED state.

@yuumasato yuumasato deleted the enable_host_network_for_sysctls branch June 28, 2024 15:50