-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TEP-0097 breakpoint before steps for taskrun #7391
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,7 +13,8 @@ weight: 108 | |
- [Breakpoint on Failure](#breakpoint-on-failure) | ||
- [Failure of a Step](#failure-of-a-step) | ||
- [Halting a Step on failure](#halting-a-step-on-failure) | ||
- [Exiting breakpoint](#exiting-breakpoint) | ||
- [Exiting onfailure breakpoint](#exiting-onfailure-breakpoint) | ||
- [Breakpoint before step](#breakpoint-before-step) | ||
- [Debug Environment](#debug-environment) | ||
- [Mounts](#mounts) | ||
- [Debug Scripts](#debug-scripts) | ||
|
@@ -59,12 +60,26 @@ stopping write of the `<step-no>.err` file and waiting on a signal by the user t | |
In this breakpoint, which is essentially a limbo state the TaskRun finds itself in, the user can interact with the step | ||
environment using a CLI or an IDE. | ||
|
||
#### Exiting breakpoint | ||
#### Exiting onfailure breakpoint | ||
|
||
To exit a step which has been paused upon failure, the step would wait on a file similar to `<step-no>.breakpointexit` which | ||
would unpause and exit the step container. eg: Step 0 fails and is paused. Writing `0.breakpointexit` in `/tekton/run` | ||
would unpause and exit the step container. | ||
|
||
### Breakpoint before step | ||
|
||
|
||
TaskRun will be stuck waiting for user debugging before the step execution. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. NIT: The |
||
When beforeStep-Breakpoint takes effect, the user can see the following information | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. NIT: what's "beforeStep-Breakpoint"? If it is the name of a configuration option or variable please use the inline code-block, else please use a description in plain english i.e. "When the breakpoint before step takes effect" |
||
from the corresponding step container log: | ||
``` | ||
debug before step breakpoint has taken effect, waiting for user's decision: | ||
1) continue, use cmd: /tekton/debug/scripts/debug-beforestep-continue | ||
2) fail-continue, use cmd: /tekton/debug/scripts/debug-beforestep-fail-continue | ||
``` | ||
1. Executing /tekton/debug/scripts/debug-beforestep-continue will continue to execute the step program | ||
2. Executing /tekton/debug/scripts/debug-beforestep-fail-continue will not continue to execute the task, and will mark the step as failed | ||
Comment on lines
+80
to
+81
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. NIT: please wrap the file-paths with inline code blocks (e.g. `code block`) |
||
|
||
## Debug Environment | ||
|
||
Additional environment augmentations made available to the TaskRun Pod to aid in troubleshooting and managing step lifecycle. | ||
|
@@ -80,7 +95,13 @@ to reflect step number. eg: Step 0 will have `/tekton/debug/info/0`, Step 1 will | |
### Debug Scripts | ||
|
||
`/tekton/debug/scripts/debug-continue` : Mark the step as completed with success by writing to `/tekton/run`. eg: User wants to exit | ||
breakpoint for failed step 0. Running this script would create `/tekton/run/0` and `/tekton/run/0.breakpointexit`. | ||
onfailure breakpoint for failed step 0. Running this script would create `/tekton/run/0` and `/tekton/run/0/out.breakpointexit`. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. NIT: breakpoint on failure |
||
|
||
`/tekton/debug/scripts/debug-fail-continue` : Mark the step as completed with failure by writing to `/tekton/run`. eg: User wants to exit | ||
breakpoint for failed step 0. Running this script would create `/tekton/run/0.err` and `/tekton/run/0.breakpointexit`. | ||
onfailure breakpoint for failed step 0. Running this script would create `/tekton/run/0` and `/tekton/run/0/out.breakpointexit.err`. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ditto |
||
|
||
`/tekton/debug/scripts/debug-beforestep-continue` : Mark the step continue to execute by writing to `/tekton/run`. eg: User wants to exit | ||
before step breakpoint for before step 0. Running this script would create `/tekton/run/0` and `/tekton/run/0/out.beforestepexit`. | ||
|
||
`/tekton/debug/scripts/debug-beforestep-fail-continue` : Mark the step not continue to execute by writing to `/tekton/run`. eg: User wants to exit | ||
before step breakpoint for before step 0. Running this script would create `/tekton/run/0` and `/tekton/run/0/out.beforestepexit.err`. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -284,4 +284,54 @@ There are known issues with the existing implementation of sidecars: | |
but an Error when the sidecar exits with an error. This is only apparent when | ||
using `kubectl` to get the pods of a TaskRun, not when describing the Pod | ||
using `kubectl describe pod ...` nor when looking at the TaskRun, but can be | ||
quite confusing. | ||
quite confusing. | ||
|
||
## Breakpoint on Failure | ||
|
||
Halting a TaskRun execution on Failure of a step. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. NIT: s/Failure/failure |
||
|
||
### Failure of a Step | ||
|
||
The entrypoint binary is used to manage the lifecycle of a step. Steps are aligned beforehand by the TaskRun controller | ||
allowing each step to run in a particular order. This is done using `-wait_file` and the `-post_file` flags. The former | ||
let's the entrypoint binary know that it has to wait on creation of a particular file before starting execution of the step. | ||
And the latter provides information on the step number and signal the next step on completion of the step. | ||
|
||
On success of a step, the `-post-file` is written as is, signalling the next step which would have the same argument given | ||
for `-wait_file` to resume the entrypoint process and move ahead with the step. | ||
|
||
On failure of a step, the `-post_file` is written with appending `.err` to it denoting that the previous step has failed with | ||
and error. The subsequent steps are skipped in this case as well, marking the TaskRun as a failure. | ||
|
||
### Halting a Step on failure | ||
|
||
The failed step writes `<step-no>.err` to `/tekton/run` and stops running completely. To be able to debug a step we would | ||
need it to continue running (not exit), not skip the next steps and signal health of the step. By disabling step skipping, | ||
stopping write of the `<step-no>.err` file and waiting on a signal by the user to disable the halt, we would be simulating a | ||
"breakpoint". | ||
|
||
In this breakpoint, which is essentially a limbo state the TaskRun finds itself in, the user can interact with the step | ||
environment using a CLI or an IDE. | ||
|
||
### Exiting onfailure breakpoint | ||
|
||
To exit a step which has been paused upon failure, the step would wait on a file similar to `<step-no>.breakpointexit` which | ||
would unpause and exit the step container. eg: Step 0 fails and is paused. Writing `0.breakpointexit` in `/tekton/run` | ||
would unpause and exit the step container. | ||
|
||
## Breakpoint before step | ||
|
||
TaskRun will be stuck waiting for user debugging before the step execution. | ||
|
||
### Halting a Step before execution | ||
|
||
The step program will be executed after all the `-wait_file` monitoring ends. If want the user to enter the debugging before the step is executed, | ||
need to pass a parameter `debug_before_step` to `entrypoint`, | ||
and `entrypoint` will end the monitoring of `waitFiles` back pause, | ||
waiting to listen to the `/tekton/run/0/out.beforestepexit` file | ||
|
||
### Exiting before step breakpoint | ||
|
||
`entrypoint` listening `/tekton/run/{{ stepID }}/out.beforestepexit` or `/tekton/run/{{ stepID }}/out.beforestepexit.err` to | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. NIT: |
||
decide whether to proceed this step, `out.beforestepexit` means continue with step, | ||
`out.beforestepexit.err` means do not continue with the step. |
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,6 +26,7 @@ import ( | |
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
"k8s.io/apimachinery/pkg/runtime/schema" | ||
"k8s.io/apimachinery/pkg/types" | ||
"k8s.io/apimachinery/pkg/util/sets" | ||
"k8s.io/utils/clock" | ||
"knative.dev/pkg/apis" | ||
duckv1 "knative.dev/pkg/apis/duck/v1" | ||
|
@@ -121,6 +122,9 @@ type TaskBreakpoints struct { | |
// failed step will not exit | ||
// +optional | ||
OnFailure string `json:"onFailure,omitempty"` | ||
// +optional | ||
// +listType=atomic | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. NIT: I guess this could be "set" as well, since it doesn't make sense to have the same step twice, and it's also enforece at validation time |
||
BeforeSteps []string `json:"beforeSteps,omitempty"` | ||
} | ||
|
||
// NeedsDebugOnFailure return true if the TaskRun is configured to debug on failure | ||
|
@@ -131,14 +135,28 @@ func (trd *TaskRunDebug) NeedsDebugOnFailure() bool { | |
return trd.Breakpoints.OnFailure == EnabledOnFailureBreakpoint | ||
} | ||
|
||
// NeedsDebugBeforeStep return true if the step is configured to debug before execution | ||
func (trd *TaskRunDebug) NeedsDebugBeforeStep(stepName string) bool { | ||
if trd.Breakpoints == nil { | ||
return false | ||
} | ||
beforeStepSets := sets.NewString(trd.Breakpoints.BeforeSteps...) | ||
return beforeStepSets.Has(stepName) | ||
} | ||
|
||
// StepNeedsDebug return true if the step is configured to debug | ||
func (trd *TaskRunDebug) StepNeedsDebug(stepName string) bool { | ||
return trd.NeedsDebugOnFailure() | ||
return trd.NeedsDebugOnFailure() || trd.NeedsDebugBeforeStep(stepName) | ||
} | ||
|
||
// NeedsDebug return true if defined onfailure or have any before, after steps | ||
func (trd *TaskRunDebug) NeedsDebug() bool { | ||
return trd.NeedsDebugOnFailure() | ||
return trd.NeedsDebugOnFailure() || trd.HaveBeforeSteps() | ||
} | ||
|
||
// HaveBeforeSteps return true if have any before steps | ||
func (trd *TaskRunDebug) HaveBeforeSteps() bool { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. NIT: to be consistent with other method names, this should be |
||
return trd.Breakpoints != nil && len(trd.Breakpoints.BeforeSteps) > 0 | ||
} | ||
|
||
// TaskRunInputs holds the input values that this task was invoked with. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, but it seems unrelated to this PR.
Is the reason for making
CheckForBreakpointOnFailure
public is to provide unit test coverage?