diff --git a/docs/data-management/automatic-compaction.md b/docs/data-management/automatic-compaction.md
index b63c0013c453..93c5538d5b66 100644
--- a/docs/data-management/automatic-compaction.md
+++ b/docs/data-management/automatic-compaction.md
@@ -136,7 +136,7 @@ For more details on each of the specs in an auto-compaction configuration, see [
 Compaction tasks may be interrupted when they interfere with ingestion. For example, this occurs when an ingestion task needs to write data to a segment for a time interval locked for compaction. If there are continuous failures that prevent compaction from making progress, consider one of the following strategies:
 * Enable [concurrent append and replace tasks](#enable-concurrent-append-and-replace) on your datasource and on the ingestion tasks.
-* Set `skipOffsetFromLatest` to reduce the chance of conflicts between ingestion and compaction. See more details in this section below.
+* Set `skipOffsetFromLatest` to reduce the chance of conflicts between ingestion and compaction. See more details in [Skip latest segments from compaction](#skip-latest-segments-from-compaction).
 * Increase the priority value of compaction tasks relative to ingestion tasks. Only recommended for advanced users. This approach can cause ingestion jobs to fail or lag. To change the priority of compaction tasks, set `taskPriority` to the desired priority value in the auto-compaction configuration. For details on the priority values of different task types, see [Lock priority](../ingestion/tasks.md#lock-priority).

 The Coordinator compacts segments from newest to oldest. In the auto-compaction configuration, you can set a time period, relative to the end time of the most recent segment, for segments that should not be compacted. Assign this value to `skipOffsetFromLatest`. Note that this offset is not relative to the current time but to the latest segment time. For example, if you want to skip over segments from five days prior to the end time of the most recent segment, assign `"skipOffsetFromLatest": "P5D"`.
@@ -149,7 +149,7 @@ You can use concurrent append and replace to safely replace the existing data in
 To do this, you need to update your datasource to allow concurrent append and replace tasks:

-* If you're using the API, include the following `taskContext` in your API call: `"useConcurrentLocks": "true"`
+* If you're using the API, include the following `taskContext` property in your API call: `"useConcurrentLocks": "true"`
 * If you're using the UI, enable **Allow concurrent compactions (experimental)** in the **Compaction config** for your datasource.
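+
+For example, a compaction config payload that sets this property through the API might look like the following sketch. The datasource name and the `skipOffsetFromLatest` value are illustrative:
+
+```json
+{
+  "dataSource": "your_datasource",
+  "skipOffsetFromLatest": "P1D",
+  "taskContext": {
+    "useConcurrentLocks": "true"
+  }
+}
+```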

 You'll also need to update your ingestion jobs to include a task lock.
diff --git a/docs/ingestion/concurrent-append-replace.md b/docs/ingestion/concurrent-append-replace.md
index b0f7f23209ea..1a67b2ca87be 100644
--- a/docs/ingestion/concurrent-append-replace.md
+++ b/docs/ingestion/concurrent-append-replace.md
@@ -26,18 +26,20 @@ title: Concurrent append and replace
 Concurrent append and replace is an [experimental feature](../development/experimental.md) available for JSON-based batch and streaming. It is not currently available for SQL-based ingestion.
 :::

-This feature allows you to safely replace the existing data in an interval of a datasource while new data is being appended to that interval. One of the most common applications of this is appending new data (using say streaming ingestion) to an interval while compaction of that interval is already in progress.
+Concurrent append and replace safely replaces the existing data in an interval of a datasource while new data is being appended to that interval. One of the most common applications of this feature is appending new data (for example, using streaming ingestion) to an interval while compaction of that interval is already in progress.

-To set up concurrent append and replace, you need to use the context flag `useConcurrentLocks`. Druid will then determine the correct lock type for you, either append or replace. Although can set the type of lock manually, we don't recommend it.
+To set up concurrent append and replace, use the context flag `useConcurrentLocks`. Druid will then determine the correct lock type for you, either append or replace. Although you can set the type of lock manually, we don't recommend it.

 ## Update the compaction settings

-If want to append data to a datasource while compaction is running, you need to enable concurrent append and replace for the datasource by updating the compaction settings.
+If you want to append data to a datasource while compaction is running, you need to enable concurrent append and replace for the datasource by updating the compaction settings.

 ### Update the compaction settings with the UI

 In the **Compaction config** for a datasource, enable **Allow concurrent compactions (experimental)**.

+For details on accessing the compaction config in the UI, see [Enable automatic compaction with the web console](automatic-compaction.md#web-console).
+
 ### Update the compaction settings with the API

 Add the `taskContext` like you would any other automatic compaction setting through the API:
@@ -57,7 +59,7 @@ curl --location --request POST 'http://localhost:8081/druid/coordinator/v1/confi

 You also need to configure the ingestion job to allow concurrent tasks.

-You can provide the context parameter through the API like any other parameter for ingestion job or through the UI.
+You can provide the context parameter like any other parameter for ingestion jobs through the API or the UI.

 ### Add a task lock using the Druid console

@@ -70,7 +72,7 @@ Add the following JSON snippet to your supervisor or ingestion spec if you're us
 ```json
 "context": {
    "useConcurrentLocks": true
-} 
+}
 ```
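+
+For example, in a streaming supervisor spec, the `context` object sits at the top level of the spec. The following is a sketch using a Kafka supervisor, with unrelated fields elided:
+
+```json
+{
+  "type": "kafka",
+  "spec": { ... },
+  "context": {
+    "useConcurrentLocks": true
+  }
+}
+```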
@@ -88,7 +90,7 @@ When setting task lock types manually, you need to ensure the following:

 Additionally, keep the following in mind:

-- Concurrent append and replace fails if the task with `APPEND` lock uses a coarser segment granularity than the task with the `REPLACE` lock. For example, if the `APPEND` task uses a segment granularity of YEAR and the `REPLACE` task uses a segment granularity of MONTH, you should not use concurrent append and replace.
+- Concurrent append and replace fails if the task with the `APPEND` lock uses a coarser segment granularity than the task with the `REPLACE` lock. For example, if the `APPEND` task uses a segment granularity of YEAR and the `REPLACE` task uses a segment granularity of MONTH, you should not use concurrent append and replace.
 - Only a single task can hold a `REPLACE` lock on a given interval of a datasource.
@@ -96,10 +98,10 @@ Additionally, keep the following in mind:

 #### Add a task lock type to your ingestion job

-Next, you need to configure the task lock type for your ingestion job:
+You configure the task lock type for your ingestion job as follows:

-- For streaming jobs, the context parameter goes in your supervisor spec, and the lock type is always `APPEND`
-- For legacy JSON-based batch ingestion, the context parameter goes in your ingestion spec, and the lock type can be either `APPEND` or `REPLACE`.
+- For streaming jobs, the `taskLockType` context parameter goes in your supervisor spec, and the lock type is always `APPEND`.
+- For classic JSON-based batch ingestion, the `taskLockType` context parameter goes in your ingestion spec, and the lock type can be either `APPEND` or `REPLACE`.

 You can provide the context parameter through the API like any other parameter for ingestion job or through the UI.
diff --git a/docs/ingestion/native-batch.md b/docs/ingestion/native-batch.md
index e7d01d450a50..fc234cce0a23 100644
--- a/docs/ingestion/native-batch.md
+++ b/docs/ingestion/native-batch.md
@@ -95,7 +95,7 @@ The `maxNumConcurrentSubTasks` in the `tuningConfig` determines the number of co
 By default, JSON-based batch ingestion replaces all data in the intervals in your `granularitySpec` for any segment that it writes to. If you want to add to the segment instead, set the `appendToExisting` flag in the `ioConfig`. JSON-based batch ingestion only replaces data in segments where it actively adds data. If there are segments in the intervals for your `granularitySpec` that don't have data from a task, they remain unchanged. If any existing segments partially overlap with the intervals in the `granularitySpec`, the portion of those segments outside the interval for the new spec remain visible.

-You can also perform concurrent appends and replaces. For more information, see [Concurrent append and replace](./concurrent-append-replace.md)
+You can also perform concurrent append and replace tasks. For more information, see [Concurrent append and replace](./concurrent-append-replace.md).
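+
+For example, a parallel batch task spec that appends to existing segments while holding an `APPEND` lock might look like the following sketch, with unrelated fields elided:
+
+```json
+{
+  "type": "index_parallel",
+  "spec": {
+    "ioConfig": {
+      "type": "index_parallel",
+      "appendToExisting": true,
+      ...
+    },
+    ...
+  },
+  "context": {
+    "taskLockType": "APPEND"
+  }
+}
+```

 #### Fully replacing existing segments using tombstones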