Add CompressionLevel and make v2 Kafka sink default #19169

Open · wants to merge 1 commit into main

Conversation

@kathancox (Contributor) commented Nov 21, 2024

Fixes DOC-11339, DOC-10867, DOC-10830, DOC-10700

This PR:

  • Updates the kafka_sink_config option with the CompressionLevel field in v24.3.
  • Removes the note for the cluster setting to enable the v2 Kafka sink, because this is the default in v24.3.
  • Adds the cluster setting changefeed.sink_io_workers under Kafka for the default v2 sink.

Rendered Preview

https://deploy-preview-19169--cockroachdb-docs.netlify.app/docs/v24.3/changefeed-sinks.html#kafka-sink-configuration


Files changed:


@kathancox marked this pull request as ready for review November 21, 2024 21:26
The `kafka_sink_config` option allows configuration of a changefeed's message delivery, Kafka server version, and batching parameters.
You can configure flushing, acknowledgments, compression, and concurrency behavior of changefeeds running to a Kafka sink with the following:

- Set the [`changefeed.sink_io_workers` cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#setting-changefeed-sink-io-workers) to configure the number of concurrent workers used by changefeeds in the cluster when sending requests to a Kafka sink. Changing `changefeed.sink_io_workers` does not affect running changefeeds; [pause the changefeed]({% link {{ page.version.version }}/pause-job.md %}), set `changefeed.sink_io_workers`, and then [resume the changefeed]({% link {{ page.version.version }}/resume-job.md %}), as sketched below. Note that this cluster setting also affects changefeeds running to [Google Cloud Pub/Sub](#google-cloud-pub-sub) sinks and [webhook sinks](#webhook-sink).
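
A minimal sketch of that pause/set/resume sequence, assuming a hypothetical job ID (`1234567890`) and a worker count of `8`:

```sql
-- Pause the changefeed first; the job ID here is hypothetical.
PAUSE JOB 1234567890;

-- Adjust the number of concurrent sink workers for the cluster.
SET CLUSTER SETTING changefeed.sink_io_workers = 8;

-- Resume the changefeed so it picks up the new worker count.
RESUME JOB 1234567890;
```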
@kathancox (Contributor, Author) commented:

Does it make sense to add this now that the v2 Kafka sink is the default? (This paragraph is included for Pub/Sub + Webhook too.)

@@ -154,6 +146,7 @@ Field | Type | Description | Default
`"Version"` | [`STRING`]({% link {{ page.version.version }}/string.md %}) | Sets the appropriate Kafka cluster version, which can be used to connect to [Kafka versions < v1.0](https://docs.confluent.io/platform/current/installation/versions-interoperability.html) (`kafka_sink_config='{"Version": "0.8.2.0"}'`). | `"1.0.0.0"`
<a name="kafka-required-acks"></a>`"RequiredAcks"` | [`STRING`]({% link {{ page.version.version }}/string.md %}) | Specifies what counts as a successful write to Kafka. CockroachDB [guarantees at least once delivery of messages]({% link {{ page.version.version }}/changefeed-messages.md %}#ordering-and-delivery-guarantees) — this value defines the **delivery**. The possible values are: <br><br>`"ONE"`: a write to Kafka is successful once the leader node has committed and acknowledged the write. Note that this carries a risk of dropped messages: if the leader node acknowledges before replicating to a quorum of other Kafka nodes and then fails, messages can be lost.<br><br>`"NONE"`: no Kafka brokers are required to acknowledge that they have committed the message. This will decrease latency and increase throughput, but comes at the cost of lower consistency.<br><br>`"ALL"`: a quorum must be reached (that is, most Kafka brokers have committed the message) before the leader can acknowledge. This is the highest consistency level. {% include {{ page.version.version }}/cdc/kafka-acks.md %} | `"ONE"`
<a name="kafka-compression"></a>`"Compression"` | [`STRING`]({% link {{ page.version.version }}/string.md %}) | Sets a compression protocol that the changefeed should use when emitting events. The possible values are: `"NONE"`, `"GZIP"`, `"SNAPPY"`, `"LZ4"`, `"ZSTD"`. | `"NONE"`
<span class="version-tag">New in v24.3:</span> `"CompressionLevel"` | [`INT`]({% link {{ page.version.version }}/int.md %}) | Sets the level of compression. This determines the tradeoff between compression ratio and compression speed, i.e., how much the data size is reduced (better compression) and how quickly the compression process completes. The compression protocols have the following ranges:<br>`GZIP`:<ul><li>`0` no compression</li><li>`1` to `9` best speed to best compression</li><li>`-1` default</li><li>`-2` [Huffman-only compression](https://en.wikipedia.org/wiki/Huffman_coding)</li></ul>`ZSTD`:<ul><li>`1` fastest</li><li>`2` default</li><li>`3` better compression</li><li>`4` best compression</li></ul>`LZ4`:<ul><li>`0` fast (default)</li><li>`512 * N` Level N, where N is between `1` and `9`. The higher the number, the better the compression</li></ul>**Note:** If the `changefeed.new_kafka_sink.enabled` cluster setting is disabled, `CompressionLevel` will not affect `LZ4` compression. `SNAPPY` does not support `CompressionLevel`. | `GZIP`: `-1`<br><br>`ZSTD`: `2`<br><br>`LZ4`: `0`
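
For reference, a hedged sketch of a changefeed statement combining these fields; the table name and broker address are placeholders:

```sql
-- Hypothetical table and broker; GZIP level 9 trades speed for
-- the best compression ratio.
CREATE CHANGEFEED FOR TABLE movr.rides
  INTO 'kafka://localhost:9092'
  WITH kafka_sink_config = '{"Compression": "GZIP", "CompressionLevel": 9}';
```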
A reviewer commented:

Would you mind cleaning this up a bit? Capitalization, more words, English, etc. E.g., `0`: No compression.

Also, there's a GZIP compression level `-3`: stateless compression.

Also, I just found that in the v2 Kafka sink, it won't let you set a compression level < 0. This should be documented as a known issue, I guess. Just filed an issue for it: cockroachdb/cockroach#136492

The reviewer followed up:

Also, the formula for LZ4 is actually 2^(8 + N), where N is between 1 and 9 (inclusive), and 0 = "fast".

Might be easier to just list the values, tbh.

[Screenshot: LZ4 compression level values, 2024-12-02]
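
As a quick illustration of that 2^(8 + N) mapping, a sketch in SQL (the column names are arbitrary):

```sql
-- Enumerate the LZ4 CompressionLevel values implied by 2^(8 + N)
-- for N = 1..9: 512, 1024, 2048, ..., 131072 (0 means "fast").
SELECT n, 1 << (8 + n) AS lz4_compression_level
  FROM generate_series(1, 9) AS g(n);
```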
