
Commit: Apply suggestions from code review
Co-authored-by: Victoria Lim <[email protected]>
317brian and vtlim authored Jan 25, 2024
1 parent b5bfdf3 commit d3566fe
Showing 3 changed files with 14 additions and 12 deletions.
4 changes: 2 additions & 2 deletions docs/data-management/automatic-compaction.md
@@ -136,7 +136,7 @@ For more details on each of the specs in an auto-compaction configuration, see [
Compaction tasks may be interrupted when they interfere with ingestion. For example, this occurs when an ingestion task needs to write data to a segment for a time interval locked for compaction. If there are continuous failures that prevent compaction from making progress, consider one of the following strategies:

* Enable [concurrent append and replace tasks](#enable-concurrent-append-and-replace) on your datasource and on the ingestion tasks.
- * Set `skipOffsetFromLatest` to reduce the chance of conflicts between ingestion and compaction. See more details in this section below.
+ * Set `skipOffsetFromLatest` to reduce the chance of conflicts between ingestion and compaction. See more details in [Skip latest segments from compaction](#skip-latest-segments-from-compaction).
* Increase the priority value of compaction tasks relative to ingestion tasks. Only recommended for advanced users. This approach can cause ingestion jobs to fail or lag. To change the priority of compaction tasks, set `taskPriority` to the desired priority value in the auto-compaction configuration. For details on the priority values of different task types, see [Lock priority](../ingestion/tasks.md#lock-priority).

The Coordinator compacts segments from newest to oldest. In the auto-compaction configuration, you can set a time period, relative to the end time of the most recent segment, for segments that should not be compacted. Assign this value to `skipOffsetFromLatest`. Note that this offset is not relative to the current time but to the latest segment time. For example, if you want to skip over segments from five days prior to the end time of the most recent segment, assign `"skipOffsetFromLatest": "P5D"`.
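
As a minimal sketch, the offset described above goes in the auto-compaction configuration alongside the other settings this section mentions. The datasource name and the `taskPriority` value here are illustrative assumptions, not values taken from this diff:

```json
{
  "dataSource": "your_datasource",
  "skipOffsetFromLatest": "P5D",
  "taskPriority": 25
}
```

With this fragment, segments whose intervals end within five days of the latest segment's end time are left uncompacted.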
@@ -149,7 +149,7 @@ You can use concurrent append and replace to safely replace the existing data in

To do this, you need to update your datasource to allow concurrent append and replace tasks:

- * If you're using the API, include the following `taskContext` in your API call: `"useConcurrentLocks": "true"`
+ * If you're using the API, include the following `taskContext` property in your API call: `"useConcurrentLocks": "true"`
* If you're using the UI, enable **Allow concurrent compactions (experimental)** in the **Compaction config** for your datasource.

You'll also need to update your ingestion jobs to include a task lock.
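
For the API option above, the `taskContext` property sits in the datasource's auto-compaction configuration. This is an illustrative sketch — the datasource name is an assumption, and only the `taskContext` contents are taken from this document:

```json
{
  "dataSource": "your_datasource",
  "taskContext": {
    "useConcurrentLocks": "true"
  }
}
```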
20 changes: 11 additions & 9 deletions docs/ingestion/concurrent-append-replace.md
@@ -26,18 +26,20 @@ title: Concurrent append and replace
Concurrent append and replace is an [experimental feature](../development/experimental.md) available for JSON-based batch and streaming. It is not currently available for SQL-based ingestion.
:::

- This feature allows you to safely replace the existing data in an interval of a datasource while new data is being appended to that interval. One of the most common applications of this is appending new data (using say streaming ingestion) to an interval while compaction of that interval is already in progress.
+ Concurrent append and replace safely replaces the existing data in an interval of a datasource while new data is being appended to that interval. One of the most common applications of this feature is appending new data (using, say, streaming ingestion) to an interval while compaction of that interval is already in progress.

- To set up concurrent append and replace, you need to use the context flag `useConcurrentLocks`. Druid will then determine the correct lock type for you, either append or replace. Although can set the type of lock manually, we don't recommend it.
+ To set up concurrent append and replace, use the context flag `useConcurrentLocks`. Druid will then determine the correct lock type for you, either append or replace. Although you can set the type of lock manually, we don't recommend it.

## Update the compaction settings

- If want to append data to a datasource while compaction is running, you need to enable concurrent append and replace for the datasource by updating the compaction settings.
+ If you want to append data to a datasource while compaction is running, you need to enable concurrent append and replace for the datasource by updating the compaction settings.

### Update the compaction settings with the UI

In the **Compaction config** for a datasource, enable **Allow concurrent compactions (experimental)**.

For details on accessing the compaction config in the UI, see [Enable automatic compaction with the web console](automatic-compaction.md#web-console).

### Update the compaction settings with the API

Add the `taskContext` like you would any other automatic compaction setting through the API:
@@ -57,7 +59,7 @@ curl --location --request POST 'http://localhost:8081/druid/coordinator/v1/confi

You also need to configure the ingestion job to allow concurrent tasks.

- You can provide the context parameter through the API like any other parameter for ingestion job or through the UI.
+ You can provide the context parameter like any other parameter for ingestion jobs through the API or the UI.

### Add a task lock using the Druid console

@@ -70,7 +72,7 @@ Add the following JSON snippet to your supervisor or ingestion spec if you're us
```json
"context": {
  "useConcurrentLocks": true
}
```


@@ -88,18 +90,18 @@ When setting task lock types manually, you need to ensure the following:

Additionally, keep the following in mind:

- - Concurrent append and replace fails if the task with `APPEND` lock uses a coarser segment granularity than the task with the `REPLACE` lock. For example, if the `APPEND` task uses a segment granularity of YEAR and the `REPLACE` task uses a segment granularity of MONTH, you should not use concurrent append and replace.
+ - Concurrent append and replace fails if the task with the `APPEND` lock uses a coarser segment granularity than the task with the `REPLACE` lock. For example, if the `APPEND` task uses a segment granularity of YEAR and the `REPLACE` task uses a segment granularity of MONTH, you should not use concurrent append and replace.

- Only a single task can hold a `REPLACE` lock on a given interval of a datasource.

- Multiple tasks can hold `APPEND` locks on a given interval of a datasource and append data to that interval simultaneously.
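
If you do set lock types manually despite the recommendation above, the context parameter looks like the following sketch. The `taskLockType` parameter is named later in this document; the value shown is one of the two types the rules above allow:

```json
"context": {
  "taskLockType": "APPEND"
}
```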

#### Add a task lock type to your ingestion job

- Next, you need to configure the task lock type for your ingestion job:
+ You configure the task lock type for your ingestion job as follows:

- - For streaming jobs, the context parameter goes in your supervisor spec, and the lock type is always `APPEND`
- - For legacy JSON-based batch ingestion, the context parameter goes in your ingestion spec, and the lock type can be either `APPEND` or `REPLACE`.
+ - For streaming jobs, the `taskLockType` context parameter goes in your supervisor spec, and the lock type is always `APPEND`.
+ - For classic JSON-based batch ingestion, the `taskLockType` context parameter goes in your ingestion spec, and the lock type can be either `APPEND` or `REPLACE`.

You can provide the context parameter like any other parameter for ingestion jobs through the API or the UI.
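
For the streaming case above, a supervisor spec with the context parameter might look like the following. This is a minimal sketch: the supervisor type and the elided `spec` contents are assumptions, and only the `context` block is taken from this document:

```json
{
  "type": "kafka",
  "spec": {
    "ioConfig": {},
    "tuningConfig": {}
  },
  "context": {
    "taskLockType": "APPEND"
  }
}
```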

2 changes: 1 addition & 1 deletion docs/ingestion/native-batch.md
@@ -95,7 +95,7 @@ The `maxNumConcurrentSubTasks` in the `tuningConfig` determines the number of co

By default, JSON-based batch ingestion replaces all data in the intervals in your `granularitySpec` for any segment that it writes to. If you want to add to the segment instead, set the `appendToExisting` flag in the `ioConfig`. JSON-based batch ingestion only replaces data in segments where it actively adds data. If there are segments in the intervals for your `granularitySpec` that don't have data from a task, they remain unchanged. If any existing segments partially overlap with the intervals in the `granularitySpec`, the portion of those segments outside the interval for the new spec remain visible.
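
The `appendToExisting` flag described above sits in the `ioConfig` of the ingestion spec. A minimal illustrative fragment, with the other `ioConfig` fields omitted and the `type` value assumed for parallel batch ingestion:

```json
"ioConfig": {
  "type": "index_parallel",
  "appendToExisting": true
}
```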

- You can also perform concurrent appends and replaces. For more information, see [Concurrent append and replace](./concurrent-append-replace.md)
+ You can also perform concurrent append and replace tasks. For more information, see [Concurrent append and replace](./concurrent-append-replace.md).


#### Fully replacing existing segments using tombstones
