aggregate or separately process output artifacts of parallel pods #6805
Comments
I think this is one of the intended uses of key-only artifacts. Could you achieve this by writing each output to a unique key within bucket storage? The aggregating/grouping pod can then just list the keys within the bucket to find the needed files:

```yaml
spec:
entrypoint: dag
templates:
- name: dag
dag:
tasks:
# generate output artifacts of parallel pods
- name: gen
template: gen-artifact
arguments:
parameters:
- name: message
value: "{{item}}"
withItems: ["1", "2", "3"]
outputs:
artifacts:
- name: output
path: /output
s3:
key: "{{workflow.name}}/{{item}}"
# aggregate
- name: aggregate
template: aggregate
- name: gen-artifact
inputs:
...
outputs:
artifacts:
- name: output
path: /output
...
- name: aggregate
inputs:
artifacts:
- name: input
path: /inputs
s3:
key: "{{workflow.name}}" The use of |
Thanks for your reply!
Sorry for replying to you with my company account. "jixinchi" is my company account and "clumsy456" is my personal account. I should have used my company account to raise this issue.
You are correct, it is in the template:

```yaml
spec:
entrypoint: dag
templates:
- name: dag
dag:
tasks:
# generate output artifacts of parallel pods
- name: gen
template: gen-artifact
arguments:
parameters:
- name: message
value: "{{item}}"
withItems: ["1", "2", "3"]
# aggregate
- name: aggregate
template: aggregate
- name: gen-artifact
inputs:
...
outputs:
artifacts:
- name: output
path: /output
s3:
key: "{{workflow.name}}/{{item}}"
...
- name: aggregate
inputs:
artifacts:
- name: input
path: /inputs
s3:
key: "{{workflow.name}}" |
I'm not sure if the `{{workflow.name}}` and `{{item}}` tags work when using …

What we discussed above is only about the "aggregate" case. Do you have any comments on the "separately process" case?
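On the `{{item}}` question: one variant that sidesteps it is to thread the loop item through an input parameter, since input parameters do resolve inside the template. A minimal sketch under that assumption (image and command are placeholders):

```yaml
- name: gen-artifact
  inputs:
    parameters:
      - name: message
  container:
    image: alpine:latest
    command: [sh, -c]
    args: ["echo {{inputs.parameters.message}} > /output"]
  outputs:
    artifacts:
      - name: output
        path: /output
        s3:
          # the parameter carries the loop item, so each pod writes a unique key
          key: "{{workflow.name}}/{{inputs.parameters.message}}"
```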
I think most patterns should be supported using key-only artifacts.
I've tried this workflow. Key-only artifacts only support cases where you already know how many artifacts will be generated and how to use them before running the workflow. It is not suitable for …
Do you want to set up a 30-minute chat?
I am from China, and it's 00:10 now...
My email is [email protected].
What about using a data template with S3? That allows you to drop a number of artifacts into a bucket, then use the approach described in https://argoproj.github.io/argo-workflows/data-sourcing-and-transformation/
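For reference, a sketch along the lines of that page: a `data` template lists the artifact keys under a prefix, and a downstream step fans out over the result with `withParam`. The bucket name, prefix, and filter expression here are assumptions:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: data-transformation-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: list-artifacts
            template: list-artifacts
        - - name: process
            template: process-one
            # one pod per key returned by the data template
            withParam: "{{steps.list-artifacts.outputs.result}}"
            arguments:
              artifacts:
                - name: file
                  s3:
                    key: "{{item}}"
    - name: list-artifacts
      data:
        source:
          # list the keys stored under this artifact location
          artifactPaths:
            name: input
            s3:
              bucket: my-bucket
              key: "{{workflow.name}}"
        transformation:
          # optionally narrow the listing with an expression
          - expression: 'filter(data, {# endsWith ".txt"})'
    - name: process-one
      inputs:
        artifacts:
          - name: file
            path: /file
      container:
        image: alpine:latest
        command: [sh, -c]
        args: ["cat /file"]
```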
The data template is really a good method for solving similar problems. However, the …
I think that is not what the data template does. It basically allows you to list artifacts, and then start a new process for each artifact.
As shown in the doc you provided, the data template can only process each artifact with the methods available in the …
@clumsy456 Hi, I used the approach you mentioned, but I sometimes get an error.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: artifact-passing-
spec:
entrypoint: artifact-example
templates:
- name: artifact-example
steps:
- - name: generate-artifact
template: whalesay
- - name: consume-artifact
template: print-message
arguments:
artifacts:
# bind message to the hello-art artifact
# generated by the generate-artifact step
- name: message
from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"
- name: whalesay
container:
image: docker/whalesay:latest
command: [sh, -c]
args: ["cowsay hello world | tee /tmp/hello_world.txt"]
volumeMounts:
- name: out
mountPath: /tmp
volumes:
- name: out
emptyDir: { }
outputs:
artifacts:
# generate hello-art artifact from /tmp/hello_world.txt
# artifacts can be directories as well as files
- name: hello-art
path: /tmp/hello_world.txt
- name: print-message
inputs:
artifacts:
# unpack the message input artifact
# and put it at /tmp/message
- name: message
path: /tmp/message
container:
image: alpine:latest
command: [sh, -c]
args: ["cat /tmp/message"]
```
The above is just my proposal; it is not implemented...
Is there a workaround for this problem? The issue summary indicates:

> If a dag/task includes a withItems/withParam parameter, we are not able to resolve its output artifacts.

Does that mean dag/tasks don't work with …? I see #6899 has been closed without merging, so I'm wondering what I can do to get …
Is this related to #13678? We see the same error when a non-existing optional artifact is consumed in an optional argument, even without any looping.
Summary
If a dag/task includes a withItems/withParam parameter, we are not able to resolve its output artifacts, as shown in #1625. I suggest a syntax to aggregate or separately process the output artifacts of parallel pods.
Use Cases
There are two situations in which the output artifacts of parallel pods are used.
separately process
I suggest that task2 should include a `withParam` parameter which refers to the output of task1, so that the number of pods of task2 will be equal to that of task1.
- If `withParam` is `{{tasks.A.outputs.artifacts.path}}`, we can use `{{item}}` to refer to the output artifact;
- if `withParam` is `{{tasks.A.outputs.artifacts}}`, we can use `{{item.path1}}` and `{{item.path2}}` to refer to multiple output artifacts of one pod;
- if `withParam` is `{{tasks.A.outputs}}`, we can use `{{item.parameters.param}}` and `{{item.artifacts.path}}` to refer to the output parameter and artifact of one pod.
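For illustration, a minimal sketch of the first variant of this proposed syntax; note it is hypothetical and not supported by Argo Workflows, and the task and template names are placeholders:

```yaml
# Hypothetical syntax from this proposal -- not supported by Argo Workflows.
- name: A
  template: gen-artifact
  withItems: ["1", "2", "3"]
- name: B
  template: process-one
  # one pod of B per output artifact of A
  withParam: "{{tasks.A.outputs.artifacts.path}}"
  arguments:
    artifacts:
      - name: input
        # {{item}} refers to the output artifact of one pod of A
        from: "{{item}}"
```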
aggregate
I suggest adding a `fromMulti` field to tasks.arguments.artifacts and a `pathMulti` field to templates.inputs.artifacts, respectively. `pathMulti` is just a template with `{{index}}`; the actual paths in the aggregation pod will be /tmp/0_path, /tmp/1_path and /tmp/2_path.
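Again for illustration, a hypothetical sketch of `fromMulti` and `pathMulti` (neither exists in Argo Workflows; the names reuse the examples above):

```yaml
# Hypothetical syntax from this proposal -- not supported by Argo Workflows.
# In the DAG task:
- name: aggregate
  template: aggregate
  arguments:
    artifacts:
      - name: input
        # collect the "output" artifact of every parallel pod of task A
        fromMulti: "{{tasks.A.outputs.artifacts.output}}"
# ...and in the template definition:
- name: aggregate
  inputs:
    artifacts:
      - name: input
        # a path template expanded per artifact: /tmp/0_path, /tmp/1_path, ...
        pathMulti: "/tmp/{{index}}_path"
  container:
    image: alpine:latest
    command: [sh, -c]
    args: ["cat /tmp/*_path"]
```

Message from the maintainers: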
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.