Fetch active tasks from memory in SeekableStreamSupervisor #16098
base: master
This pull request has been marked as stale due to 60 days of inactivity.
This pull request/issue has been closed due to lack of activity. If you think that
@AmatyaAvadhanula, the change here makes sense to me.
LGTM 🚀
@AmatyaAvadhanula, the SeekableStreamSupervisor also makes calls to taskStorage.getTask(). I wonder if these calls should also first check for those tasks in memory. If yes, then we should probably just remove TaskStorage from SeekableStreamSupervisor, use TaskQueryTool instead, and route everything through it.
The TaskQueryTool can decide if a task should be served from memory or storage.
What do you think?
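To make the suggestion concrete, here is a minimal sketch of a memory-first lookup. It is not Druid's actual TaskQueryTool or TaskStorage API: `TaskPayload`, `MemoryFirstTaskLookup`, and the `storageLookup` function are hypothetical names invented for illustration, and the in-memory map stands in for whatever structure holds the active tasks.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical stand-in for a task payload; this is not Druid's Task class. */
final class TaskPayload {
    final String id;
    final String dataSource;

    TaskPayload(String id, String dataSource) {
        this.id = id;
        this.dataSource = dataSource;
    }
}

/** Hypothetical facade: serve a task from memory when possible, otherwise from storage. */
final class MemoryFirstTaskLookup {
    private final Map<String, TaskPayload> activeTasksInMemory;
    private final Function<String, Optional<TaskPayload>> storageLookup;
    private int storageCalls = 0;

    MemoryFirstTaskLookup(
        Map<String, TaskPayload> activeTasksInMemory,
        Function<String, Optional<TaskPayload>> storageLookup
    ) {
        this.activeTasksInMemory = activeTasksInMemory;
        this.storageLookup = storageLookup;
    }

    /** Checks the in-memory active tasks first and only then falls back to storage. */
    Optional<TaskPayload> getTask(String taskId) {
        TaskPayload inMemory = activeTasksInMemory.get(taskId);
        if (inMemory != null) {
            return Optional.of(inMemory);
        }
        storageCalls++; // counted only to show how often the fallback is hit
        return storageLookup.apply(taskId);
    }

    int getStorageCalls() {
        return storageCalls;
    }
}
```

With this shape, the caller never decides where a task comes from; only tasks that are no longer active would reach the storage fallback.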
…e_tasks_from_memory
The SeekableStreamSupervisor fetches the task payloads for every active task in its datasource twice per RunNotice.
In large clusters, this can make a RunNotice take a long time when it could otherwise complete within a couple of seconds.
With hundreds of supervisors, this amounts to 4 * (number of supervisors) calls to the metadata store every minute, each fetching all the active task payloads for a datasource. Serving these from memory can significantly reduce the load on the database in such cases.
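As a rough illustration of the idea, the sketch below keeps active tasks in an in-memory map and filters them by datasource without touching the metadata store. The class and method names (`ActiveTask`, `InMemoryActiveTasks`, `forDataSource`, `metadataCallsPerMinute`) are hypothetical and do not reflect the code in this PR; the factor of 4 calls per supervisor per minute is taken directly from the description above.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

/** Hypothetical stand-in for an active task; this is not Druid's Task class. */
final class ActiveTask {
    final String id;
    final String dataSource;

    ActiveTask(String id, String dataSource) {
        this.id = id;
        this.dataSource = dataSource;
    }
}

/** Hypothetical in-memory view of the currently active tasks. */
final class InMemoryActiveTasks {
    private final Map<String, ActiveTask> tasksById = new ConcurrentHashMap<>();

    void add(ActiveTask task) {
        tasksById.put(task.id, task);
    }

    void remove(String taskId) {
        tasksById.remove(taskId);
    }

    /** Returns the active tasks of one datasource using only in-memory state. */
    List<ActiveTask> forDataSource(String dataSource) {
        return tasksById.values().stream()
            .filter(task -> task.dataSource.equals(dataSource))
            .collect(Collectors.toList());
    }

    /** Metadata store calls per minute when every fetch goes to the database (4 per supervisor). */
    static long metadataCallsPerMinute(long supervisors) {
        return 4L * supervisors;
    }
}
```

Under the figure quoted in the description, 300 supervisors would issue 1,200 such metadata store calls per minute; answering them from a structure like the one above would replace each call with an in-memory scan.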
This PR has: