-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support batch ingestion in bulk API (#12457) #13306
Conversation
❌ Gradle check result for e404b2c: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
server/src/main/java/org/opensearch/action/bulk/BulkRequest.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/action/bulk/BatchIngestionOption.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/action/bulk/BatchIngestionOption.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/ingest/CompoundProcessor.java
Outdated
Show resolved
Hide resolved
❌ Gradle check result for 6dd9e4d: Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
server/src/main/java/org/opensearch/action/bulk/BatchIngestionOption.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/action/bulk/BatchIngestionOption.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/action/bulk/BatchIngestionOption.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/action/bulk/BatchIngestionOption.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/action/bulk/BulkRequest.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/action/bulk/BulkRequest.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/rest/action/document/RestBulkAction.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/action/bulk/BatchIngestionOption.java
Outdated
Show resolved
Hide resolved
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #13306 +/- ##
============================================
+ Coverage 71.42% 71.45% +0.03%
- Complexity 59978 60862 +884
============================================
Files 4985 5046 +61
Lines 282275 286403 +4128
Branches 40946 41489 +543
============================================
+ Hits 201603 204640 +3037
- Misses 63999 64824 +825
- Partials 16673 16939 +266 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from a code perspective. I would still like @reta and/or @navneet1v to evaluate this and whether it addresses performance concerns raised on the RFC discussion.
Signed-off-by: Liyun Xiu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(can be in a future PR)
Handle and add tests for batch_size
= 0 and -1.
modules/ingest-common/src/yamlRestTest/resources/rest-api-spec/test/ingest/70_bulk.yml
Show resolved
Hide resolved
❕ Gradle check result for e2fb585: UNSTABLE
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
Signed-off-by: Liyun Xiu <[email protected]>
server/src/main/java/org/opensearch/action/bulk/BulkRequest.java
Outdated
Show resolved
Hide resolved
❌ Gradle check result for 334e0c1: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
org.opensearch.action.admin.indices.create.RemoteSplitIndexIT.testCreateSplitIndex Looks like setup timed out, not sure it's a flake. |
server/src/main/java/org/opensearch/action/bulk/BulkRequest.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/action/bulk/BulkRequest.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Andriy Redko <[email protected]> Signed-off-by: Liyun Xiu <[email protected]>
Signed-off-by: Liyun Xiu <[email protected]>
❌ Gradle check result for 3cc7f41: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 68cabe1: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
The backport to
To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-13306-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 1219c568248fafa479d67a1eaa6e3e2d9748701e
# Push it to GitHub
git push --set-upstream origin backport/backport-13306-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x Then, create a pull request where the |
@chishui this will need a manual backport pls |
…13462) * Support batch ingestion in bulk API (#12457) (#13306) * [PoC][issues-12457] Support Batch Ingestion Signed-off-by: Liyun Xiu <[email protected]> * Rewrite batch interface and handle error and metrics Signed-off-by: Liyun Xiu <[email protected]> * Remove unnecessary change Signed-off-by: Liyun Xiu <[email protected]> * Revert some unnecessary test change Signed-off-by: Liyun Xiu <[email protected]> * Keep executeBulkRequest main logic untouched Signed-off-by: Liyun Xiu <[email protected]> * Add UT Signed-off-by: Liyun Xiu <[email protected]> * Add UT & yamlRest test, fix BulkRequest se/deserialization Signed-off-by: Liyun Xiu <[email protected]> * Add missing java docs Signed-off-by: Liyun Xiu <[email protected]> * Remove Writable from BatchIngestionOption Signed-off-by: Liyun Xiu <[email protected]> * Add more UTs Signed-off-by: Liyun Xiu <[email protected]> * Fix spotlesscheck Signed-off-by: Liyun Xiu <[email protected]> * Rename parameter name to batch_size Signed-off-by: Liyun Xiu <[email protected]> * Add more rest yaml tests & update rest spec Signed-off-by: Liyun Xiu <[email protected]> * Remove batch_ingestion_option and only use batch_size Signed-off-by: Liyun Xiu <[email protected]> * Throw invalid request exception for invalid batch_size Signed-off-by: Liyun Xiu <[email protected]> * Update server/src/main/java/org/opensearch/action/bulk/BulkRequest.java Co-authored-by: Andriy Redko <[email protected]> Signed-off-by: Liyun Xiu <[email protected]> * Remove version constant Signed-off-by: Liyun Xiu <[email protected]> --------- Signed-off-by: Liyun Xiu <[email protected]> Signed-off-by: Liyun Xiu <[email protected]> Co-authored-by: Andriy Redko <[email protected]> (cherry picked from commit 1219c56) * Adjust changelog item position to trigger CI Signed-off-by: Liyun Xiu <[email protected]> --------- Signed-off-by: Liyun Xiu <[email protected]>
…earch-project#13306) * [PoC][issues-12457] Support Batch Ingestion Signed-off-by: Liyun Xiu <[email protected]> * Rewrite batch interface and handle error and metrics Signed-off-by: Liyun Xiu <[email protected]> * Remove unnecessary change Signed-off-by: Liyun Xiu <[email protected]> * Revert some unnecessary test change Signed-off-by: Liyun Xiu <[email protected]> * Keep executeBulkRequest main logic untouched Signed-off-by: Liyun Xiu <[email protected]> * Add UT Signed-off-by: Liyun Xiu <[email protected]> * Add UT & yamlRest test, fix BulkRequest se/deserialization Signed-off-by: Liyun Xiu <[email protected]> * Add missing java docs Signed-off-by: Liyun Xiu <[email protected]> * Remove Writable from BatchIngestionOption Signed-off-by: Liyun Xiu <[email protected]> * Add more UTs Signed-off-by: Liyun Xiu <[email protected]> * Fix spotlesscheck Signed-off-by: Liyun Xiu <[email protected]> * Rename parameter name to batch_size Signed-off-by: Liyun Xiu <[email protected]> * Add more rest yaml tests & update rest spec Signed-off-by: Liyun Xiu <[email protected]> * Remove batch_ingestion_option and only use batch_size Signed-off-by: Liyun Xiu <[email protected]> * Throw invalid request exception for invalid batch_size Signed-off-by: Liyun Xiu <[email protected]> * Update server/src/main/java/org/opensearch/action/bulk/BulkRequest.java Co-authored-by: Andriy Redko <[email protected]> Signed-off-by: Liyun Xiu <[email protected]> * Remove version constant Signed-off-by: Liyun Xiu <[email protected]> --------- Signed-off-by: Liyun Xiu <[email protected]> Signed-off-by: Liyun Xiu <[email protected]> Co-authored-by: Andriy Redko <[email protected]>
…earch-project#13306) * [PoC][issues-12457] Support Batch Ingestion Signed-off-by: Liyun Xiu <[email protected]> * Rewrite batch interface and handle error and metrics Signed-off-by: Liyun Xiu <[email protected]> * Remove unnecessary change Signed-off-by: Liyun Xiu <[email protected]> * Revert some unnecessary test change Signed-off-by: Liyun Xiu <[email protected]> * Keep executeBulkRequest main logic untouched Signed-off-by: Liyun Xiu <[email protected]> * Add UT Signed-off-by: Liyun Xiu <[email protected]> * Add UT & yamlRest test, fix BulkRequest se/deserialization Signed-off-by: Liyun Xiu <[email protected]> * Add missing java docs Signed-off-by: Liyun Xiu <[email protected]> * Remove Writable from BatchIngestionOption Signed-off-by: Liyun Xiu <[email protected]> * Add more UTs Signed-off-by: Liyun Xiu <[email protected]> * Fix spotlesscheck Signed-off-by: Liyun Xiu <[email protected]> * Rename parameter name to batch_size Signed-off-by: Liyun Xiu <[email protected]> * Add more rest yaml tests & update rest spec Signed-off-by: Liyun Xiu <[email protected]> * Remove batch_ingestion_option and only use batch_size Signed-off-by: Liyun Xiu <[email protected]> * Throw invalid request exception for invalid batch_size Signed-off-by: Liyun Xiu <[email protected]> * Update server/src/main/java/org/opensearch/action/bulk/BulkRequest.java Co-authored-by: Andriy Redko <[email protected]> Signed-off-by: Liyun Xiu <[email protected]> * Remove version constant Signed-off-by: Liyun Xiu <[email protected]> --------- Signed-off-by: Liyun Xiu <[email protected]> Signed-off-by: Liyun Xiu <[email protected]> Co-authored-by: Andriy Redko <[email protected]>
@@ -26,6 +26,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), | |||
- Add cluster setting to dynamically configure the buckets for filter rewrite optimization. ([#13179](https://github.com/opensearch-project/OpenSearch/pull/13179)) | |||
- [Tiered Caching] Add a dynamic setting to disable/enable disk cache. ([#13373](https://github.com/opensearch-project/OpenSearch/pull/13373)) | |||
- [Remote Store] Add capability of doing refresh as determined by the translog ([#12992](https://github.com/opensearch-project/OpenSearch/pull/12992)) | |||
- [Batch Ingestion] Add `batch_size` to `_bulk` API. ([#12457](https://github.com/opensearch-project/OpenSearch/issues/12457)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't link point to PR instead of github issue link here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously PRs are usually pointed to, but I have no problem (and personally prefer) that the issue be linked in the changelog when a good issue does exist.
Description
This PR is to enable batch ingestion in
_bulk
API. Please refer to RFC for proposal and discussion. It includes three major changes:_bulk
APIbatch_ingestion_option
: it has two options:disabled
(default option) andenabled
maximum_batch_size
: batch size. If there are 100 documents in a_bulk
API for ingest, andmaximum_batch_size
is set to 20, then, there will be 5 batches in total with 20 documents in each batch.batchExecute
inProcessor
,Pipeline
, andCompoundProcessor
, so that they can process documents in batches.IngestService
, documents are processed in batch flow andbatchExecute
ofPipeline
will be called.Related Issues
Resolves #[Issue number to be closed when this PR is merged]
#12457
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.