
taskrun pods not scheduled due to affinity issues #8422

Open
alanreynosov opened this issue Dec 6, 2024 · 1 comment

Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

alanreynosov commented Dec 6, 2024

Expected Behavior

The TaskRun pod is scheduled and runs.

Actual Behavior

The TaskRun pod is not scheduled because its pod affinity conditions are not met.

Steps to Reproduce the Problem

  1. Create a kn function: func create myfunc -l go
  2. Deploy the kn function: func deploy --remote --registry ttl.sh

Additional Info

  • Kubernetes version:

    Output of kubectl version:

kubectl version
Client Version: v1.29.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.0+k3s1
  • Tekton Pipeline version:

    Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

tkn version
Client version: 0.38.1
Pipeline version: v0.66.0

This is running on a vcluster.

With the latest Tekton version I get this affinity error:

0/8 nodes are available: 1 Too many pods, 3 node(s) didn't match Pod's node affinity/selector, 4 node(s) didn't match pod affinity rules. preemption: 0/8 nodes are available: 1 No preemption victims found for incoming pod, 7 Preemption is not helpful for scheduling.
kubectl get po
NAME                                                READY   STATUS    RESTARTS   AGE
testn-pack-upload-pipeline-run-77rlh-scaffold-pod   0/1     Pending   0          46m
affinity-assistant-82b5fdbd2b-0                     1/1     Running   0          46m
kubectl get po demo-pack-upload-pipeline-run-test-scaffold-pod -ojson | jq -r '.status.conditions[]'        
{
  "lastProbeTime": null,
  "lastTransitionTime": "2024-12-09T16:48:04Z",
  "message": "0/4 nodes are available: 1 node(s) had untolerated taint {node-pool-dev: dev}, 3 node(s) didn't match pod affinity rules. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.",
  "reason": "Unschedulable",
  "status": "False",
  "type": "PodScheduled"
}
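As an aside, the "untolerated taint {node-pool-dev: dev}" part of that message is separate from the affinity problem and can be worked around with a toleration in Tekton's pod template. A minimal sketch (the PipelineRun and Pipeline names are hypothetical, and the taint's effect is not shown in the message, so it is omitted to match any effect):

apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: example-run        # hypothetical name
spec:
  pipelineRef:
    name: example-pipeline # hypothetical name
  taskRunTemplate:
    podTemplate:
      tolerations:
        - key: node-pool-dev
          operator: Equal
          value: dev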

Checking the pending pod's affinity conditions, I find this:

  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/component: affinity-assistant
              app.kubernetes.io/instance: affinity-assistant-65ba113202
          topologyKey: kubernetes.io/hostname

So evidently no running pod carries labels matching the required condition.
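A quick way to confirm the mismatch (a sketch; the pod name is taken from the output above, and the label keys assume Tekton's standard affinity-assistant labels):

# Labels on the running affinity-assistant pod:
kubectl get pods -l app.kubernetes.io/component=affinity-assistant --show-labels

# Instance label the pending pod requires:
kubectl get po testn-pack-upload-pipeline-run-77rlh-scaffold-pod \
  -o jsonpath='{.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[0].labelSelector.matchLabels}'

If the instance label differs between the two outputs, the scheduler can never place the pod.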

Why am I not reporting this at the Knative project repo? Because this works more or less fine on Tekton Pipelines version 0.49.0.
Why not keep using that version? Because runs intermittently finish or get stuck waiting on something; I am not sure what, since there is no affinity report, but when a run fails the tekton-pipelines controller logs an error about a missing pod:

{"severity":"error","timestamp":"2024-12-04T10:11:40.379Z","logger":"tekton-pipelines-controller","caller":"controller/controller.go:566","message":"Reconcile error","commit":"c802069","knative.dev/controller":"github.com.tektoncd.pipeline.pkg.reconciler.taskrun.Reconciler","knative.dev/kind":"tekton.dev.TaskRun","knative.dev/traceid":"8e4d29ab-74fa-49d8-a405-0f318cb59f99","knative.dev/key":"default/devfunc-pack-upload-pipeline-run-228jf-scaffold","duration":0.004444159,"error":"pods \"devfunc-pack-upload-pipeline-run-228jf-scaffold-pod\" not found","stacktrace":"knative.dev/pkg/controller.(*Impl).handleErr\n\tknative.dev/[email protected]/controller/controller.go:566\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\tknative.dev/[email protected]/controller/controller.go:543\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\tknative.dev/[email protected]/controller/controller.go:491"}

edit:

it also fails when running on GKE
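For anyone blocked by this, one possible workaround (a sketch, not a fix for the underlying bug) is to disable the affinity assistant through Tekton's feature-flags ConfigMap, which removes the pod-affinity requirement at the cost of the co-scheduling guarantee for TaskRuns sharing a workspace. This assumes the default tekton-pipelines namespace:

kubectl patch configmap feature-flags -n tekton-pipelines \
  --type merge -p '{"data":{"disable-affinity-assistant":"true"}}'

On recent releases the newer coschedule flag in the same ConfigMap covers this behavior as well.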

alanreynosov added the kind/bug label Dec 6, 2024
metacoma commented Dec 16, 2024

I have encountered the same issue:

$ tkn version
Client version: 0.39.0
Pipeline version: v0.66.0
Dashboard version: v0.53.0
$ kubectl version
Client Version: v1.31.0
Kustomize Version: v5.4.2
Server Version: v1.31.3+k3s1
$ kubectl get pods
NAME                                                        READY   STATUS    RESTARTS   AGE
affinity-assistant-6177472940-0                             1/1     Running   0          8m42s
my-function-pack-git-pipeline-run-dns8x-fetch-sources-pod   0/1     Pending   0          8m42s

$ kubectl get po my-function-pack-git-pipeline-run-dns8x-fetch-sources-pod  -o yaml
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-12-16T17:18:35Z"
    message: '0/1 nodes are available: 1 node(s) didn''t match pod affinity rules.
      preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort
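The same label check applies here (a sketch; pod names are from the output above, and the dot-escaping in jsonpath follows kubectl's convention for label keys):

# Instance label on the running assistant:
kubectl get pod affinity-assistant-6177472940-0 \
  -o jsonpath='{.metadata.labels.app\.kubernetes\.io/instance}'

# Instance label the pending pod requires:
kubectl get pod my-function-pack-git-pipeline-run-dns8x-fetch-sources-pod \
  -o jsonpath='{.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[0].labelSelector.matchLabels}'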

metacoma added a commit to mindwm/mindwm-gitops that referenced this issue Dec 16, 2024