Run finally pipeline even if task is failed at the validation #8314

Merged
merged 4 commits into from
Oct 29, 2024

Conversation

divyansh42
Member

@divyansh42 divyansh42 commented Oct 7, 2024

Fixes: #7330

Changes

Currently, if a task in a pipeline consumes a result from a previous task and that previous task fails to produce the result, the pipeline fails without running its finally tasks.
These changes handle tasks that fail validation by recording them in a new field, ValidationFailedTasks, on the PipelineRunFacts struct. The field holds every task for which no TaskRun was created because the task failed the controller's validation step.
These changes also remove the logic that raised an error as soon as any validation failed. Instead of returning a permanent error, the controller now removes all the tasks that were still to be scheduled. The skip logic changes as well: if any validation fails, the task is skipped and the pipeline stops running.
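
The scheduling behaviour described above can be sketched roughly as follows. This is a minimal illustration with simplified stand-in types, not the actual controller code: only the names PipelineRunFacts and ValidationFailedTasks come from this description, while Task, its fields, and nextToSchedule are hypothetical and ignore DAG ordering entirely.

```go
package main

import "fmt"

// Task is a hypothetical, simplified stand-in for a pipeline task.
type Task struct {
	Name    string
	Finally bool // true if the task is declared in the finally section
	Done    bool // true once the task has finished or been skipped
}

// PipelineRunFacts is a simplified stand-in for the struct named in this PR.
// The real struct has more fields; the shape used here is an assumption.
type PipelineRunFacts struct {
	Tasks []Task
	// ValidationFailedTasks lists tasks for which no TaskRun was created
	// because validation failed.
	ValidationFailedTasks []string
}

// nextToSchedule is a hypothetical helper. It returns the pending regular
// tasks, unless a validation failure was recorded or no regular tasks
// remain, in which case it returns the pending finally tasks instead.
func (f *PipelineRunFacts) nextToSchedule() []string {
	var regular, finally []string
	for _, t := range f.Tasks {
		if t.Done {
			continue
		}
		if t.Finally {
			finally = append(finally, t.Name)
		} else {
			regular = append(regular, t.Name)
		}
	}
	if len(f.ValidationFailedTasks) > 0 || len(regular) == 0 {
		return finally
	}
	return regular
}

func main() {
	facts := &PipelineRunFacts{Tasks: []Task{
		{Name: "produce", Done: true},
		{Name: "consume"},
		{Name: "deploy"},
		{Name: "cleanup", Finally: true},
	}}
	fmt.Println(facts.nextToSchedule()) // prints [consume deploy]

	// "consume" fails validation: record it instead of returning an error.
	facts.ValidationFailedTasks = append(facts.ValidationFailedTasks, "consume")
	fmt.Println(facts.nextToSchedule()) // prints [cleanup]
}
```

In this sketch, recording the failure rather than returning an error is what lets the finally task still be picked up on the next scheduling pass.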

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • Has Tests included if any functionality added or changed
  • pre-commit Passed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including functionality, content, code)
  • Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

NONE

@tekton-robot tekton-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesnt merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 7, 2024
@divyansh42
Member Author

/kind bug

@tekton-robot tekton-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 7, 2024
@divyansh42
Member Author

/test check-pr-has-kind-label

@tekton-robot
Collaborator

@divyansh42: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-tekton-pipeline-alpha-integration-tests
  • /test pull-tekton-pipeline-beta-integration-tests
  • /test pull-tekton-pipeline-build-tests
  • /test pull-tekton-pipeline-integration-tests
  • /test pull-tekton-pipeline-unit-tests

The following commands are available to trigger optional jobs:

  • /test pull-tekton-pipeline-go-coverage

Use /test all to run all jobs.

In response to this:

/test check-pr-has-kind-label

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@divyansh42
Member Author

/test pull-tekton-pipeline-go-coverage-df

@tekton-robot
Collaborator

@divyansh42: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-tekton-pipeline-alpha-integration-tests
  • /test pull-tekton-pipeline-beta-integration-tests
  • /test pull-tekton-pipeline-build-tests
  • /test pull-tekton-pipeline-integration-tests
  • /test pull-tekton-pipeline-unit-tests

The following commands are available to trigger optional jobs:

  • /test pull-tekton-pipeline-go-coverage

Use /test all to run all jobs.

In response to this:

/test pull-tekton-pipeline-go-coverage-df

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 91.7% 91.8% 0.1
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.2% -0.5
pkg/reconciler/pipelinerun/resources/pipelinerunstate.go 99.6% 99.2% -0.4

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 91.7% 91.8% 0.1
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.2% -0.5
pkg/reconciler/pipelinerun/resources/pipelinerunstate.go 99.6% 99.2% -0.4
pkg/reconciler/pipelinerun/resources/resultrefresolution.go 99.2% 98.4% -0.8

@divyansh42 divyansh42 force-pushed the fix-finally-not-running branch from 57012ed to 9cb5f91 Compare October 8, 2024 07:06
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 91.7% 91.8% 0.1
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.2% -0.5
pkg/reconciler/pipelinerun/resources/pipelinerunstate.go 99.6% 99.2% -0.4

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 91.7% 91.8% 0.1
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.2% -0.5
pkg/reconciler/pipelinerun/resources/pipelinerunstate.go 99.6% 99.2% -0.4
pkg/reconciler/pipelinerun/resources/resultrefresolution.go 99.2% 98.4% -0.8

@divyansh42 divyansh42 force-pushed the fix-finally-not-running branch from 9cb5f91 to 5265441 Compare October 10, 2024 16:01
@divyansh42 divyansh42 changed the title WIP: Run finally pipeline even if task is failed at the validation Run finally pipeline even if task is failed at the validation Oct 10, 2024
@tekton-robot tekton-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 10, 2024
@divyansh42
Member Author

/hold cancel

@divyansh42
Member Author

/cc @vdemeester @afrittoli @chitrangpatel
Can I please get reviews on these changes? Thanks in advance.

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 91.7% 91.8% 0.1
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.2% -0.5
pkg/reconciler/pipelinerun/resources/pipelinerunstate.go 99.6% 99.2% -0.4

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 91.7% 91.8% 0.1
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.2% -0.5
pkg/reconciler/pipelinerun/resources/pipelinerunstate.go 99.6% 99.2% -0.4

Presently, if one of the tasks in a pipeline consumes a result from a previous task
but the previous task failed to produce the result, the pipeline fails without running
the finally tasks. These changes handle tasks that failed in the validation
step.

Signed-off-by: divyansh42 <[email protected]>
@divyansh42 divyansh42 force-pushed the fix-finally-not-running branch from 5265441 to ceb2f6c Compare October 10, 2024 16:52
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 91.7% 91.8% 0.1
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.2% -0.5
pkg/reconciler/pipelinerun/resources/pipelinerunstate.go 99.6% 99.2% -0.4

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 91.7% 91.8% 0.1
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.2% -0.5
pkg/reconciler/pipelinerun/resources/pipelinerunstate.go 99.6% 99.2% -0.4

@afrittoli
Member

/test pull-tekton-pipeline-go-coverage-df

@tekton-robot
Collaborator

@afrittoli: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-tekton-pipeline-alpha-integration-tests
  • /test pull-tekton-pipeline-beta-integration-tests
  • /test pull-tekton-pipeline-build-tests
  • /test pull-tekton-pipeline-integration-tests
  • /test pull-tekton-pipeline-unit-tests

The following commands are available to trigger optional jobs:

  • /test pull-tekton-pipeline-go-coverage

Use /test all to run all jobs.

In response to this:

/test pull-tekton-pipeline-go-coverage-df

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 91.7% 91.7% 0.0
pkg/reconciler/pipelinerun/resources/pipelinerunresolution.go 96.7% 96.2% -0.5
pkg/reconciler/pipelinerun/resources/pipelinerunstate.go 99.6% 99.2% -0.4
pkg/reconciler/pipelinerun/resources/resultrefresolution.go 99.2% 98.4% -0.8

@divyansh42
Member Author

@afrittoli, I have made the changes as per the discussion that happened over the Slack channel. Could you please take a look again?
Will squash the commits once all the reviews are resolved. Thank you.

Member

@afrittoli afrittoli left a comment

Thanks for the updates, this looks good to me.
It may be good to add more test cases, but the current coverage seems enough to merge this.
/approve

// The PipelineRun should be marked as failed due to InvalidTaskResultReference.
// The PipelineRun should be marked as failed
Member

I guess the reason for failure will be available at TaskRun level, is that right?

Member Author

I was also concerned about not showing the exact reason for the failure. When I checked the implementation of the GetPipelineConditionStatus function, I found that it sets the status based only on whether tasks failed or succeeded, so the reported reason is more generalized.
The reason will not be available at the TaskRun level, because no TaskRun is created.
Do you have any suggestions for handling this better, so that we can report a proper reason for the failure?
Earlier we failed right away, so we got the exact error.

Member Author

One idea I can think of is to make ValidationFailed a map that stores each failed task along with its reason. Inside the GetPipelineConditionStatus function, we can then check ValidationFailed and set the status accordingly.
If this looks good, I can try the implementation.

Member Author

One question with this approach: when setting the reason in GetPipelineConditionStatus, how do we decide the reason if multiple tasks failed validation?
Maybe we can set it to a generic error like https://github.com/tektoncd/pipeline/blob/main/pkg/apis/pipeline/v1/pipelinerun_types.go#L385
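
To make the map idea and the multiple-failure question concrete, here is a rough sketch under stated assumptions. It is not the merged implementation and does not reproduce GetPipelineConditionStatus. The function failureReason, the map shape, and the genericReason string are all hypothetical placeholders; in particular, genericReason does not claim to be the constant behind the link above. The sketch passes a single task's reason through unchanged and falls back to the generic reason when several tasks failed.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// genericReason is a hypothetical placeholder, not a Tekton constant.
const genericReason = "ValidationFailed"

// failureReason is a hypothetical helper. It takes a map of task name to
// validation failure reason and returns the reason and message to report.
func failureReason(validationFailed map[string]string) (reason, message string) {
	if len(validationFailed) == 0 {
		return "", ""
	}
	names := make([]string, 0, len(validationFailed))
	for name := range validationFailed {
		names = append(names, name)
	}
	// Map iteration order is not deterministic, so sort for a stable message.
	sort.Strings(names)

	if len(names) == 1 {
		// A single failure: keep the task's own reason.
		return validationFailed[names[0]], fmt.Sprintf("task %q failed validation", names[0])
	}
	// Several failures: use the generic reason and list every task.
	return genericReason, "tasks failed validation: " + strings.Join(names, ", ")
}

func main() {
	one := map[string]string{"consume": "InvalidTaskResultReference"}
	reason, msg := failureReason(one)
	fmt.Printf("%s: %s\n", reason, msg)

	many := map[string]string{
		"consume": "InvalidTaskResultReference",
		"deploy":  "InvalidTaskResultReference",
	}
	reason, msg = failureReason(many)
	fmt.Printf("%s: %s\n", reason, msg)
}
```

Listing the failed task names in the message keeps the detail that a single generic reason would otherwise lose.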

@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 23, 2024
@afrittoli
Member

@tektoncd/core-maintainers can we have a second review please?

@vdemeester
Member

/lgtmz

@vdemeester
Member

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 29, 2024
@tekton-robot tekton-robot merged commit a161298 into tektoncd:main Oct 29, 2024
14 checks passed
@divyansh42
Member Author

Thank you @afrittoli @vdemeester for the reviews 🙏

@vdemeester
Member

@afrittoli should we backport it to LTS(es) ? (I missed it for 0.65.0...)

@vdemeester
Member

/cherry-pick release-v0.65.x

@vdemeester
Member

/cherry-pick release-v0.62.x

@vdemeester
Member

/cherry-pick release-v0.59.x

@vdemeester
Member

/cherry-pick release-v0.56.x

@tekton-robot
Collaborator

@vdemeester: new pull request created: #8367

In response to this:

/cherry-pick release-v0.65.x

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot
Collaborator

@vdemeester: new pull request created: #8368

In response to this:

/cherry-pick release-v0.62.x

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot
Collaborator

@vdemeester: new pull request created: #8369

In response to this:

/cherry-pick release-v0.59.x

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot
Collaborator

@vdemeester: new pull request created: #8370

In response to this:

/cherry-pick release-v0.56.x

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@divyansh42
Member Author

@afrittoli @vdemeester is it possible to have a patch release for this? I see 0.65.0 is already released.

@vdemeester
Member

@divyansh42 yes, we'll just wait for a few more bugfixes 👼🏼

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesnt merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Finally tasks not triggered in case of task failing due to missing results
4 participants