Skip to content

v1.23.0

Compare
Choose a tag to compare
@temporal-cicd temporal-cicd released this 22 Mar 17:29
· 1341 commits to main since this release

Release Highlights

Breaking Changes

github.com/gogo/protobuf has been replaced with google.golang.org/protobuf

We've fully replaced the use of gogo/protobuf with the official google protobuf runtime. This has both developmental and operational impacts as prior to Server version v1.23.0 our protobuf code generator allowed invalid UTF-8 data to be stored as proto strings. This isn't allowed by the proto3 spec, so if you're running a custom-built temporal server and think some tenant may store arbitrary binary data in our strings you should set -tags protolegacy when compiling the server. If you use our Makefile this is already done.

If you don't and see an error like grpc: error unmarshalling request: string field contains invalid UTF-8 then you will need to enable this when building the server. If you're unsure then you should specify it anyways as there's no harm in doing so unless you relied on the protobuf compiler to ensure all strings were valid UTF-8.

Developers using our protobuf-generated code will notice that:

  • time.Time in proto structs will now be [timestamppb.Timestamp](https://pkg.go.dev/google.golang.org/[email protected]/types/known/timestamppb#section-documentation)
  • time.Duration will now be [durationpb.Duration](https://pkg.go.dev/google.golang.org/protobuf/types/known/durationpb)
  • V2-generated structs embed locks, so you cannot dereference them.
  • Proto enums will, when formatted to JSON, now be in SCREAMING_SNAKE_CASE rather than PascalCase.
    • If trying to deserialize old JSON with PascalCase to proto use [go.temporal.io/api/temporalproto](https://pkg.go.dev/go.temporal.io/api/temporalproto)
  • google/protobuf objects, or objects embedding these protos, cannot be compared using reflect.DeepEqual or anything that uses it. This includes testify and mock equality testers!
    • If you need reflect.DeepEqual for any reason you can use go.temporal.io/api/temporalproto.DeepEqual instead
    • If you need testify require/assert compatible checkers you can use the go.temporal.io/server/common/testing/protorequire, go.temporal.io/server/common/testing/protoassert packages
    • If you need matchers for gomock we have a helper under go.temporal.io/server/common/testing/protomock

New System Search Attributes

We added two new system search attributes: ParentWorkflowId and ParentRunId. If you have previously created custom search attributes with one of these names, attempts to set them will start to fail. We suggest updating your workflows to not set those search attributes, delete those search attributes and then upgrade Temporal to this version.

Alternatively, you can also set the dynamic config system.supressErrorSetSystemSearchAttribute to true. When this dynamic config is set values for system search attributes will be ignored instead of causing your workflow to fail. Please use this as temporary workaround, because it could hide real issue in users workflows.

Schema changes

Before upgrading your Temporal Cluster to v1.23.0, you must upgrade your core and visibility schemas to the following:

  • Core:
    • MySQL schema v1.11
    • PostgreSQL schema v1.11
    • Cassandra schema v1.9
  • Visibility:
    • Elasticsearch schema v6
    • MySQL schema 1.4
    • PostgreSQL schema v1.4

Please see our upgrade documentation for the necessary steps to upgrade your schemas.

Deprecation Announcements

  • We've replaced all individual metrics describing Commands (e.g. complete_workflow_command, continue_as_new_command etc.) with a single metric called command which has a tag “commandType” describing the specific command type (see #4995)
  • Standard visibility will be deprecated in the next release v1.24.0 along with old config key names, i.e. this is the last minor version to support them. Please refer to v1.20.0 release notes for upgrade instructions, and also refer to v1.21.0 release notes for config key changes.
  • In advanced visibility, the LIKE operator will no longer be supported in v1.24.0. It never did what it was meant to do, and only added confusing behavior when used with Elasticsearch.

Golang version

  • Upgraded golang to 1.21

Batch workflow reset by Build ID

For situations where an operator wants to handle a bad deployment using workflow
reset, the batch reset operation can now reset to before the first workflow task
processed by a specific build id. This is based on reset points that are created
when build id changes between workflow tasks. Note that this also applies across
continue-as-new.

This operation is not currently supported by a released version of the CLI, but
you can use it through the gRPC API directly, e.g. using the Go SDK:

client.WorkflowService().StartBatchOperationRequest(ctx, &workflowservice.StartBatchOperationRequest{
	JobId:           uuid.New(),
	Namespace:       "my-namespace",
	// Select workflows that were touched by a specific build id:
	VisibilityQuery: fmt.Sprintf(`BuildIds = "unversioned:bad-build"`),
	Reason:          "reset bad build",
	Operation: &workflowservice.StartBatchOperationRequest_ResetOperation{
		ResetOperation: &batch.BatchOperationReset{
			Identity: "bad build resetter",
			Options: &commonpb.ResetOptions{
				Target: &commonpb.ResetOptions_BuildId{
					BuildId: "bad-build",
				},
				ResetReapplyType: enumspb.RESET_REAPPLY_TYPE_SIGNAL,
			},
		},
	},
})

History Task DLQ

We've added a DLQ to history service to handle poison pills in transfer / timer queues and other history task queues including visibility and replication queues. You can see our operators guide for more details.

If you want tasks experiencing unexpected errors to go to the DLQ after a certain number of failures you can set the history.TaskDLQUnexpectedErrorAttempts dynamic config.

Approximately FIFO Task Queueing

Once this feature is enabled, our task queues will be roughly FIFO.

This is disabled by default in 1.23, as we continue testing it but expect that it’ll be enabled by default in 1.24. To enable it the following config should be set to a short duration (e.g. 5sec) from its current default value (10yrs): "matching.backlogNegligibleAge"

We've added the following metrics as part of this effort:

  • poll_latency - this is a per-task-queue histogram of the duration between worker poll request and response (with or without task) calculated from the Matching server’s perspective
  • task_dispatch_latency - this is a histogram of schedule_to_start time from Matching's perspective, broken down by task queue and task source (backlog vs history)

Global Persistence Rate Limiting

We've added the ability to specify global (cluster level) rate limiting value for the persistence layer. You can configure by specifying the following dynamic config values:

  • frontend.persistenceGlobalMaxQPS
  • history.persistenceGlobalMaxQPS
  • matching.persistenceGlobalMaxQPS
  • worker.persistenceGlobalMaxQPS

You can also specify this on the per-namespace level using

  • frontend.persistenceGlobalNamespaceMaxQPS
  • history.persistenceGlobalNamespaceMaxQPS
  • matching.persistenceGlobalNamespaceMaxQPS
  • worker.persistenceGlobalNamespaceMaxQPS

Please be aware that this functionality is experimental. This global rate limiting isn’t workload aware but shard-aware; we currently allocate this QPS to each pod based on the number of shards they own rather than the demands of the workload, so pods with many low-workload shards will have a higher allocation of this limit than pods with fewer but more active workloads. If you plan to use this you will want to set the QPS value with some headroom (like 25%) to account for this.

Renamed Deadlock-detector metrics

The metrics exported by the deadlock detector were renamed to use a dd_ prefix to avoid confusion with other lock latency metrics. Affected metrics: dd_cluster_metadata_lock_latency, dd_cluster_metadata_callback_lock_latency, dd_shard_controller_lock_latency, dd_shard_lock_latency, dd_namespace_registry_lock_latency.

Visibility Prefix Search

Visibility API now supports prefix search by using the keyword STARTS_WITH. Eg: WorkflowType STARTS_WITH 'hello_world'. Check the Visibility documentation for additional information on supported operators.

Helpful links to get you started with Temporal

Temporal Docs
Server
Docker Compose
Helm Chart

Docker images for this release (use tag v1.23.0)

Server
Server With Auto Setup (what is Auto-Setup?)
Admin-Tools

New Contributors

Full Changelog: v1.22.6...v1.23.0