Add CompressionLevel and make v2 Kafka sink default #19169
base: main
Conversation
Force-pushed from 53b6051 to c9da936
Force-pushed from c9da936 to de9c4ec
The `kafka_sink_config` option allows configuration of a changefeed's message delivery, Kafka server version, and batching parameters.

You can configure flushing, acknowledgments, compression, and concurrency behavior of changefeeds running to a Kafka sink with the following:

- Set the [`changefeed.sink_io_workers` cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-changefeed-sink-io-workers) to configure the number of concurrent workers used by changefeeds in the cluster when sending requests to a Kafka sink. When you set `changefeed.sink_io_workers`, it will not affect running changefeeds; [pause the changefeed]({% link {{ page.version.version }}/pause-job.md %}), set `changefeed.sink_io_workers`, and then [resume the changefeed]({% link {{ page.version.version }}/resume-job.md %}). Note that this cluster setting will also affect changefeeds running to [Google Cloud Pub/Sub](#google-cloud-pub-sub) sinks and [webhook sinks](#webhook-sink).
Does it make sense to add this now that the v2 Kafka sink is the default? (This paragraph is included for Pub/Sub + Webhook too.)
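For concreteness, here is a minimal sketch of the pause/set/resume sequence described in the quoted paragraph above. The job ID and worker count are hypothetical placeholders, not values from this PR:

```sql
-- Hypothetical job ID and worker count, for illustration only.
-- Per the guidance above, changing changefeed.sink_io_workers does not
-- affect changefeeds that are already running, so pause first, then resume.
PAUSE JOB 1068727743697870849;
SET CLUSTER SETTING changefeed.sink_io_workers = 32;
RESUME JOB 1068727743697870849;
```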
@@ -154,6 +146,7 @@ Field | Type | Description | Default |
`"Version"` | [`STRING`]({% link {{ page.version.version }}/string.md %}) | Sets the appropriate Kafka cluster version, which can be used to connect to [Kafka versions < v1.0](https://docs.confluent.io/platform/current/installation/versions-interoperability.html) (`kafka_sink_config='{"Version": "0.8.2.0"}'`). | `"1.0.0.0"` |
<a name="kafka-required-acks"></a>`"RequiredAcks"` | [`STRING`]({% link {{ page.version.version }}/string.md %}) | Specifies what a successful write to Kafka is. CockroachDB [guarantees at least once delivery of messages]({% link {{ page.version.version }}/changefeed-messages.md %}#ordering-and-delivery-guarantees) — this value defines the **delivery**. The possible values are: <br><br>`"ONE"`: a write to Kafka is successful once the leader node has committed and acknowledged the write. Note that this has the potential risk of dropped messages if the leader node acknowledges before replicating to a quorum of other Kafka nodes, but then fails.<br><br>`"NONE"`: no Kafka brokers are required to acknowledge that they have committed the message. This will decrease latency and increase throughput, but comes at the cost of lower consistency.<br><br>`"ALL"`: a quorum must be reached (that is, most Kafka brokers have committed the message) before the leader can acknowledge. This is the highest consistency level. {% include {{ page.version.version }}/cdc/kafka-acks.md %} | `"ONE"` |
<a name="kafka-compression"></a>`"Compression"` | [`STRING`]({% link {{ page.version.version }}/string.md %}) | Sets a compression protocol that the changefeed should use when emitting events. The possible values are: `"NONE"`, `"GZIP"`, `"SNAPPY"`, `"LZ4"`, `"ZSTD"`. | `"NONE"` |
<span class="version-tag">New in v24.3:</span> `"CompressionLevel"` | [`INT`]({% link {{ page.version.version }}/int.md %}) | Sets the level of compression. This determines the tradeoff between compression ratio and compression speed, i.e., how much the data size is reduced (better compression) and how quickly the compression process completes. The compression protocols have the following ranges:<br>`GZIP`:<ul><li>`0`: No compression</li><li>`1` to `9`: Best speed to best compression</li><li>`-1`: Default</li><li>`-2`: [Huffman-only compression](https://en.wikipedia.org/wiki/Huffman_coding)</li></ul>`ZSTD`:<ul><li>`1`: Fastest</li><li>`2`: Default</li><li>`3`: Better compression</li><li>`4`: Best compression</li></ul>`LZ4`:<ul><li>`0`: Fast compression (default)</li><li>`512 * N`: Level `N`, where `N` is between `1` and `9`. The higher the level, the better the compression.</li></ul>**Note:** If you have the `changefeed.new_kafka_sink.enabled` cluster setting disabled, `CompressionLevel` will not affect `LZ4` compression. `SNAPPY` does not support `CompressionLevel`. | `GZIP`: `-1`<br><br>`ZSTD`: `2`<br><br>`LZ4`: `0` |
would you mind cleaning this up a bit? capitalization, more words, english, etc. eg - 0: No Compression
also there's a gzip compression level -3: stateless compression
also, i just found that in kafkav2, it won't let you set compression level < 0. this should be a known issue i guess. just filed an issue for it: cockroachdb/cockroach#136492
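To make the fields in the table above concrete, here is a hedged sketch of a `CREATE CHANGEFEED` statement that sets `Compression`, `CompressionLevel`, and `RequiredAcks` together. The table name and Kafka URI are placeholders; the field names and value ranges are the ones documented in the table:

```sql
-- Placeholder table and broker address; the JSON keys come from the
-- kafka_sink_config fields documented above. GZIP level 9 trades speed
-- for the best compression ratio, and "ALL" requires a broker quorum
-- to acknowledge each write.
CREATE CHANGEFEED FOR TABLE movr.rides
  INTO 'kafka://broker.example.com:9092'
  WITH kafka_sink_config = '{"Compression": "GZIP", "CompressionLevel": 9, "RequiredAcks": "ALL"}';
```

Note that, per the review comment above, the v2 Kafka sink currently rejects negative `CompressionLevel` values (such as the `GZIP` default of `-1`); see cockroachdb/cockroach#136492.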
Fixes DOC-11339, DOC-10867, DOC-10830, DOC-10700
This PR:
- Updates the `kafka_sink_config` option with the `CompressionLevel` field in v24.3.
- Adds `changefeed.sink-io-workers` under Kafka for the default v2 sink.

Rendered Preview

https://deploy-preview-19169--cockroachdb-docs.netlify.app/docs/v24.3/changefeed-sinks.html#kafka-sink-configuration