Add Cost Estimator Using Past Statistics for Schedule Generator #3156

Merged: 19 commits, Jan 1, 2025

@Xiao-zhen-Liu Xiao-zhen-Liu commented Dec 15, 2024

This PR introduces the CostEstimator trait which estimates the cost of a region, given some resource units.

  • The cost estimator is used by CostBasedScheduleGenerator to calculate the cost of a schedule during search.
  • Currently we only consider one type of schedule for each region plan: a total order of the regions. The cost of the schedule (and hence of the region plan) is therefore the sum of the costs of its regions.
  • The resource units are currently passed as placeholders because we assume a region will have all the resources when doing the estimation. The units may be used in the future if we consider different methods of schedule-generation. For example, if we allow two regions to run concurrently, the units will be split in half for each region.
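The summation described above can be sketched as follows. This is a minimal Python illustration, not the code in this PR: `Region`, `estimate_cost`, `FixedCostEstimator`, and `schedule_cost` are hypothetical names chosen for the example, and the resource units are passed through unused, mirroring their placeholder role.

```python
from dataclasses import dataclass
from typing import Mapping, Protocol, Sequence


@dataclass(frozen=True)
class Region:
    """Hypothetical stand-in for a region; only an id is needed here."""
    region_id: int


class CostEstimator(Protocol):
    """Anything that can estimate the cost of one region."""

    def estimate_cost(self, region: Region, resource_units: int) -> float:
        ...


class FixedCostEstimator:
    """Toy estimator (an assumption for this sketch) backed by a lookup table."""

    def __init__(self, costs: Mapping[int, float]) -> None:
        self._costs = costs

    def estimate_cost(self, region: Region, resource_units: int = 1) -> float:
        # resource_units is a placeholder: each region is assumed to get
        # all resources, so the value does not change the estimate.
        return self._costs[region.region_id]


def schedule_cost(
    schedule: Sequence[Region],
    estimator: CostEstimator,
    resource_units: int = 1,
) -> float:
    """Cost of a totally ordered schedule: the sum of its region costs."""
    return sum(estimator.estimate_cost(r, resource_units) for r in schedule)
```

Because the schedule is a total order, regions never overlap, so no term in the sum depends on another. If concurrent regions were allowed, the estimator would presumably need the per-region share of `resource_units`, and the schedule cost would no longer be a plain sum.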

A DefaultCostEstimator implementation is also added, which uses past execution statistics to estimate the wall-clock runtime of a region:

  • The runtime of each region is represented by the runtime of its longest-running operator.
  • The runtime of each operator is estimated using the statistics from the latest successful execution of the workflow.
  • If no such statistics exist (e.g., on the first execution, or if all past executions failed), we fall back to using the number of materialized edges as the cost.
  • Added test cases using mock MySQL data.
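The estimation rule and its fallback can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the `DefaultCostEstimator` implementation: the `Region` fields, the `estimate_region_cost` function, and the shape of `past_runtimes` (a map from operator id to runtime in seconds) are all hypothetical, and treating an operator that is missing from the statistics as having zero runtime is a choice made only for this example.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class Region:
    """Hypothetical region: its operators and its materialized edge count."""
    operator_ids: Sequence[str]
    num_materialized_edges: int


def estimate_region_cost(
    region: Region,
    past_runtimes: Optional[Mapping[str, float]],
) -> float:
    """Estimate a region's cost from the latest successful run, if there is one."""
    if not past_runtimes:
        # No usable statistics (first execution, or all past runs failed):
        # fall back to the number of materialized edges.
        return float(region.num_materialized_edges)
    # The region's runtime is that of its longest-running operator.
    # Operators absent from the statistics count as 0.0 (an assumption).
    return max(
        (past_runtimes.get(op, 0.0) for op in region.operator_ids),
        default=0.0,
    )
```

Note that the two branches return values in different units (seconds versus an edge count), so in this sketch a cost is only comparable with other costs computed by the same branch.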

@Yicong-Huang Yicong-Huang left a comment

LGTM!

@Xiao-zhen-Liu Xiao-zhen-Liu merged commit bf6ffc9 into master Jan 1, 2025
8 checks passed
@Xiao-zhen-Liu Xiao-zhen-Liu deleted the xiaozhen-add-cost-estimator branch January 1, 2025 10:57