[Deployments] Sync workflow before activity tqs and batch updates #7003

Closed
dnr wants to merge 4 commits

Member

@dnr dnr commented Dec 18, 2024

What changed?

  • When the "current" state changes and the deployment workflow updates the user data of its task queues, update workflow task queues before activity task queues.
  • Call the sync user data activity in batches of 100, waiting for propagation after each batch.
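
The batching in the second bullet can be sketched roughly as below. This is a minimal illustration, not the code from this PR: `splitIntoBatches` and the `tq-N` names are hypothetical, and the sync activity call and the propagation wait are only indicated in a comment.

```go
package main

import "fmt"

// batchSize mirrors the "batches of 100" mentioned in the description.
const batchSize = 100

// splitIntoBatches is a hypothetical helper: it cuts items into consecutive
// slices of at most size elements, preserving order.
func splitIntoBatches[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	var out [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

func main() {
	// Made-up task queue names, only to have something to batch.
	queues := make([]string, 250)
	for i := range queues {
		queues[i] = fmt.Sprintf("tq-%d", i)
	}
	for i, b := range splitIntoBatches(queues, batchSize) {
		// In the workflow, each batch would be one sync activity call
		// followed by a wait for propagation; here we only print sizes.
		fmt.Printf("batch %d: %d task queues\n", i, len(b))
	}
}
```

Keeping each activity input bounded this way is what limits both the payload size and the amount of work repeated when a single activity attempt is retried.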

Why?

  • Updating workflow task queues before activity task queues is required to handle activities on unpinned workflows. Otherwise a workflow can be transitioned to a deployment by an activity start and then moved back by a workflow task start.
  • Batching avoids oversized activity payloads and limits how much work is redone on retries.
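
One way to picture the proposed ordering is as two phases: workflow task queues first, every other type second. The sketch below assumes a local `TaskQueueType` stand-in rather than the real `enumspb` enum, and `syncPhases` is a hypothetical name. Keys are sorted so the iteration order is deterministic, which workflow code requires.

```go
package main

import (
	"fmt"
	"sort"
)

// TaskQueueType is a local stand-in for the server's task queue type enum.
type TaskQueueType int

const (
	TypeWorkflow TaskQueueType = iota + 1
	TypeActivity
	// TypeNexus is included only to show that the predicate below groups
	// every non-workflow type into the second phase.
	TypeNexus
)

// syncPhases (hypothetical) returns the types to sync in each phase:
// workflow first, then everything else. Empty phases are skipped.
func syncPhases(byType map[TaskQueueType]struct{}) [][]TaskQueueType {
	keys := make([]TaskQueueType, 0, len(byType))
	for k := range byType {
		keys = append(keys, k)
	}
	// Map iteration order is random in Go, so sort for determinism.
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })

	var phases [][]TaskQueueType
	for _, doWorkflow := range []bool{true, false} {
		var phase []TaskQueueType
		for _, t := range keys {
			if doWorkflow == (t == TypeWorkflow) {
				phase = append(phase, t)
			}
		}
		if len(phase) > 0 {
			phases = append(phases, phase)
		}
	}
	return phases
}

func main() {
	byType := map[TaskQueueType]struct{}{
		TypeActivity: {},
		TypeNexus:    {},
		TypeWorkflow: {},
	}
	for i, phase := range syncPhases(byType) {
		// A real implementation would sync and wait for propagation here.
		fmt.Printf("phase %d: %v\n", i+1, phase)
	}
}
```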

How did you test it?

Existing tests.

Potential risks

The workflow logic change is guarded by a GetVersion call.
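
For readers unfamiliar with this kind of guard, here is a simplified, self-contained model of how a version check keeps already-running executions on the original code path. It does not use the Temporal SDK: `execution`, `getVersion`, and `syncPath` are illustrative stand-ins, and the real `workflow.GetVersion` has more behavior than this toy captures.

```go
package main

import "fmt"

// Version and DefaultVersion imitate the SDK's workflow.Version and
// workflow.DefaultVersion (-1); they are local stand-ins.
type Version int

const DefaultVersion Version = -1

// execution is a toy stand-in for one workflow execution's history.
type execution struct {
	markers        map[string]Version // versions already recorded, by change ID
	predatesChange bool               // history was written before the change shipped
}

// getVersion is a simplified model of a version guard: reuse a recorded
// version if there is one, report DefaultVersion for executions that passed
// this point before the change existed, otherwise record maxSupported.
func (e *execution) getVersion(changeID string, minSupported, maxSupported Version) Version {
	v, ok := e.markers[changeID]
	switch {
	case ok:
		// Replay: keep whatever was recorded the first time.
	case e.predatesChange:
		v = DefaultVersion
	default:
		v = maxSupported
		e.markers[changeID] = v
	}
	if v < minSupported || v > maxSupported {
		panic(fmt.Sprintf("version %d for %q outside [%d, %d]", v, changeID, minSupported, maxSupported))
	}
	return v
}

// syncPath mirrors the shape of the guard in this PR: DefaultVersion keeps
// the original logic, anything newer takes the updated logic.
func syncPath(e *execution) string {
	if e.getVersion("syncToTaskQueues", DefaultVersion, 0) == DefaultVersion {
		return "original"
	}
	return "updated"
}

func main() {
	old := &execution{markers: map[string]Version{}, predatesChange: true}
	fresh := &execution{markers: map[string]Version{}}
	fmt.Println("old execution:", syncPath(old))
	fmt.Println("new execution:", syncPath(fresh))
}
```

The point of the guard is that an execution never switches paths midway: whatever version it saw first is what it sees on every later replay.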

@dnr dnr requested a review from a team as a code owner December 18, 2024 00:16
})
}
if workflow.GetVersion(ctx, "syncToTaskQueues", workflow.DefaultVersion, 0) == workflow.DefaultVersion {
	err = d.syncToTaskQueues1(ctx)
Collaborator

Nit: should we call these syncToTaskQueues0 and syncToTaskQueues1 to match the version?

Member Author

sure, makes sense

// Sync and wait for workflow task queues, then activity task queues, to ensure that
// workflows don't bounce between deployments due to activity/wft starts.
enumspb.TASK_QUEUE_TYPE_WORKFLOW,
enumspb.TASK_QUEUE_TYPE_ACTIVITY,
Collaborator

We want to support Nexus tasks too. They should be synced with activities, I believe.

Member Author

ok

byType := d.State.TaskQueueFamilies[tqName].TaskQueues
for _, tqTypeInt := range workflow.DeterministicKeys(byType) {
	tqType := enumspb.TaskQueueType(tqTypeInt)
	if doWorkflow == (tqType == enumspb.TASK_QUEUE_TYPE_WORKFLOW) {
Collaborator

I thought about this more. I no longer think syncing the workflow type first will help, because it creates another race condition that might be even worse: the workflow moves to the new build via a WFT and then schedules an activity, but the activity goes to the previous build because the activity TQ user data is not yet updated.

I think it's best to keep things simple and sync all types in a single step. That also reduces the number of user data writes. For now, we can accept the back-and-forth between versions for unpinned workflows while the sync has not yet propagated to all partitions. That should be fine, because it already happens all the time in the unversioned case right now. Later, I think we can improve this by making the sync two-phased: prepare, then sync.
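
The argument above can be checked with a toy model. It assumes, purely for illustration, that a task is routed to the new deployment once its task queue has received the updated user data and to the old one before that. `propagated`, `route`, and `mixed` are hypothetical names, not server code.

```go
package main

import "fmt"

// propagated records whether the new deployment's user data has reached
// each task queue type. Toy state, not a real server structure.
type propagated struct {
	workflow bool
	activity bool
}

// route says which deployment a task lands on in this toy model: the new
// one once its task queue has the update, the old one before that.
func route(updated bool) string {
	if updated {
		return "new"
	}
	return "old"
}

// mixed reports whether a workflow task and an activity task started in
// this state would land on different deployments.
func mixed(p propagated) bool {
	return route(p.workflow) != route(p.activity)
}

func main() {
	windows := []struct {
		name string
		p    propagated
	}{
		{"workflow synced, activity pending", propagated{workflow: true}},
		{"activity synced, workflow pending", propagated{activity: true}},
		{"both synced", propagated{workflow: true, activity: true}},
	}
	for _, w := range windows {
		fmt.Printf("%s: wft -> %s, activity -> %s, mixed = %v\n",
			w.name, route(w.p.workflow), route(w.p.activity), mixed(w.p))
	}
}
```

In this model either ordering has an intermediate window with mixed routing, so choosing an order changes which race occurs rather than removing it.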

Collaborator

ShahabT commented Jan 14, 2025

It seems this won't land before we clone to entity workflows with new names. Let's close it, but once the new workflows are added, add the batching part to the new workflows only. The workflow vs. activity separation is not needed anymore.

Member Author

dnr commented Jan 14, 2025

Sounds good to me

@dnr dnr closed this Jan 14, 2025