docs: add concurrent compaction docs #15218
## Examples

The following examples demonstrate potential use cases in which auto-compaction may improve your Druid performance. For more details, see [Compaction strategies](../data-management/compaction.md#compaction-guidelines). The examples in this section do not change the underlying data.

### Change segment granularity

## Concurrent append and replace

:::info
Concurrent append and replace is an [experimental feature](../development/experimental.md) and is not currently available for SQL-based ingestion.
:::

This feature allows you to safely replace the existing data in an interval of a datasource while new data is being appended to that interval. One of the most common applications of this is appending new data (for example, using streaming ingestion) to an interval while compaction of that interval is already in progress.

To set up concurrent append and replace, ensure that your ingestion jobs use the appropriate lock types:

- The append task (with `appendToExisting` set to `true`) has `taskLockType` set to `APPEND` in the task context.
- The replace task (with `appendToExisting` set to `false`) has `taskLockType` set to `REPLACE` in the task context.
- The segment granularity of the append task is equal to or finer than the segment granularity of the replace task.
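
For example, in snippet form (the full configuration steps follow below), the append task's context contains:

```json
"context": {
  "taskLockType": "APPEND"
}
```

while the replace task's context contains:

```json
"context": {
  "taskLockType": "REPLACE"
}
```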

:::info
When using concurrent append and replace, keep the following in mind:

- Concurrent append and replace fails if the task holding the `APPEND` lock uses a coarser segment granularity than the task holding the `REPLACE` lock. For example, if the `APPEND` task uses a segment granularity of YEAR and the `REPLACE` task uses a segment granularity of MONTH, you should not use concurrent append and replace.
- Only a single task can hold a `REPLACE` lock on a given interval of a datasource.
- Multiple tasks can hold `APPEND` locks on a given interval of a datasource and append data to that interval simultaneously.
:::

### Configure concurrent append and replace

##### Update the compaction settings with the API

Prepare your datasource for concurrent append and replace by setting its task lock type to `REPLACE`.
Add the `taskContext` like you would any other automatic compaction setting through the API:

```shell
curl --location --request POST 'http://localhost:8081/druid/coordinator/v1/config/compaction' \
--header 'Content-Type: application/json' \
--data-raw '{
    "dataSource": "YOUR_DATASOURCE",
    "taskContext": {
        "taskLockType": "REPLACE"
    }
}'
```
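
To confirm the new setting, you can read the configuration back; a sketch, assuming the corresponding `GET` variant of the same Coordinator endpoint:

```shell
curl --location --request GET 'http://localhost:8081/druid/coordinator/v1/config/compaction/YOUR_DATASOURCE'
```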

##### Update the compaction settings with the UI

In the **Compaction config** for a datasource, set **Allow concurrent compactions (experimental)** to **True**.

#### Add a task lock type to your ingestion job

Next, you need to configure the task lock type for your ingestion job:

- For streaming jobs, the context parameter goes in your supervisor spec, and the lock type is always `APPEND`.
- For legacy JSON-based batch ingestion, the context parameter goes in your ingestion spec, and the lock type can be either `APPEND` or `REPLACE`.

You can provide the context parameter through the API like any other parameter for your ingestion job, or through the UI.

##### Add the task lock type through the API

Add the following JSON snippet to your supervisor or ingestion spec if you're using the API:

```json
"context": {
   "taskLockType": LOCK_TYPE
}
```

The `LOCK_TYPE` depends on what you're trying to accomplish.

Set `taskLockType` to `APPEND` if either of the following is true:

- Dynamic partitioning with append to existing is set to `true`
- The ingestion job is a streaming ingestion job
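
For example, a streaming supervisor spec carries the context at the top level of the spec; here is an abridged sketch (the `kafka` type, topic, and datasource are hypothetical placeholders, with most fields omitted):

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": { "dataSource": "YOUR_DATASOURCE" },
    "ioConfig": { "topic": "YOUR_TOPIC" }
  },
  "context": {
    "taskLockType": "APPEND"
  }
}
```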

If you have multiple append ingestion jobs targeting the same datasource and want them to run simultaneously, also include the following context parameter:

```json
"useSharedLock": "true"
```

Keep in mind that `taskLockType` takes precedence over `useSharedLock`. Do not use `useSharedLock` with `REPLACE` task locks.
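
Put together, the context for one of several simultaneous append jobs would look like this (a sketch combining the two snippets above):

```json
"context": {
  "taskLockType": "APPEND",
  "useSharedLock": "true"
}
```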

Set `taskLockType` to `REPLACE` if you're replacing data. For example, if you use any of the following partitioning types, use `REPLACE`:

- hash partitioning
- range partitioning
- dynamic partitioning with append to existing set to `false`
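
For instance, an abridged sketch of a native batch (`index_parallel`) task that replaces data using range partitioning; the dimension name is a hypothetical placeholder:

```json
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "index_parallel",
      "partitionsSpec": {
        "type": "range",
        "partitionDimensions": ["YOUR_DIMENSION"],
        "targetRowsPerSegment": 5000000
      }
    }
  },
  "context": {
    "taskLockType": "REPLACE"
  }
}
```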

##### Add a task lock using the Druid console

As part of the **Load data** wizard for classic batch (JSON-based) ingestion and streaming ingestion, you can configure the task lock type during the **Publish** step:

- If you set **Append to existing** to **True**, you can then set **Allow concurrent append tasks (experimental)** to **True**.
- If you set **Append to existing** to **False**, you can then set **Allow concurrent replace tasks (experimental)** to **True**.

## Learn more

See the following topics for more information: