What problem are you trying to solve?
When implementing large workflows at scale, we often want only the failed tasks to be re-run after a failure, since this preserves resources for other jobs on the infrastructure where our workflows are running.
Describe the solution you'd like
Implementing caching on tasks and/or using results storage can help avoid re-running successful tasks in large workflows (caching docs). It would be helpful for users to have a template for implementing this for large workloads, along with reporting/logging that shows where tasks are failing and why.
Results (TBD) (currently uses local storage)
Potential 1.0 reference: https://discourse.prefect.io/t/how-to-resume-mapped-task-runs-from-failure-at-scale-or-limit-the-amount-of-allowed-runs-that-may-fail/715
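To make the request concrete, here is a minimal sketch of the behavior such a template could demonstrate. It uses only the Python standard library and is not the Prefect API: `RESULT_STORE`, `cache_key`, `run_mapped`, and the `double` task are all hypothetical names, and the in-memory dict stands in for whatever results storage is eventually chosen. Successful results are stored under a key derived from the task name and input, failures are logged with their cause, and a second run only executes the inputs that have no stored result.

```python
import hashlib
import json
import logging
from typing import Any, Callable, Dict, List, Tuple

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("workflow")

# Hypothetical in-memory result store; a real setup would use persistent storage.
RESULT_STORE: Dict[str, Any] = {}


def cache_key(task_name: str, item: Any) -> str:
    """Derive a stable key from the task name and its (JSON-serializable) input."""
    payload = json.dumps({"task": task_name, "input": item}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_mapped(
    task: Callable[[Any], Any], items: List[Any]
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Run `task` over `items`, skipping inputs whose result is already stored.

    Returns (results, failures); failures maps repr(input) to an error message.
    """
    results: Dict[str, Any] = {}
    failures: Dict[str, str] = {}
    for item in items:
        key = cache_key(task.__name__, item)
        if key in RESULT_STORE:
            logger.info("cache hit for %s(%r), skipping", task.__name__, item)
            results[repr(item)] = RESULT_STORE[key]
            continue
        try:
            value = task(item)
        except Exception as exc:
            # Report where the task failed and why, then move on to the next input.
            logger.error("%s(%r) failed: %s", task.__name__, item, exc)
            failures[repr(item)] = f"{type(exc).__name__}: {exc}"
            continue
        RESULT_STORE[key] = value  # only successful results are stored
        results[repr(item)] = value
    return results, failures


# Usage: input 3 fails on the first run, then succeeds once the cause is fixed.
CALLS: List[int] = []
BROKEN = {3}


def double(x: int) -> int:
    CALLS.append(x)  # record every real execution
    if x in BROKEN:
        raise ValueError(f"bad input {x}")
    return x * 2


first_results, first_failures = run_mapped(double, [1, 2, 3])
BROKEN.clear()  # simulate fixing the underlying problem
second_results, second_failures = run_mapped(double, [1, 2, 3])
```

On the second run, only the previously failed input is executed again; inputs 1 and 2 are served from the store. A real template would presumably swap the dict for persistent results storage and use the orchestrator's own caching hooks.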
Describe alternatives you've considered
None at the moment.
Documentation, Adoption, Migration Strategy
No response