[Pull-based Ingestion][WIP] Introduce the new pull-based ingestion engine, APIs, and Kafka plugin #16958

yupeng9 · 2025-01-06T19:02:51Z

Description

This PR implements the basics of the pull-based ingestion described in this RFC, including:

- The APIs for the pull-based ingestion source
- A Kafka plugin that implements the ingestion source API
- A new IngestionEngine that pulls data from the ingestion sources
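
To make the division of labor between the three pieces concrete, here is a minimal sketch of how a pull-based source API and an engine-side loop could fit together. Every name in it (`ShardConsumer`, `RawMessage`, `InMemoryShardConsumer`, `PullLoop`) is hypothetical and chosen for illustration; none of them is taken from the PR, and the in-memory consumer only stands in for a real source plugin such as Kafka. The `byte[]` payload follows the commit "use byte[] for message payload type".

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// NOTE: all types below are hypothetical illustrations, not the PR's API.

/** One message read from a stream partition: its position plus an opaque payload. */
final class RawMessage {
    final long offset;
    final byte[] payload;

    RawMessage(long offset, byte[] payload) {
        this.offset = offset;
        this.payload = payload;
    }
}

/** Source-side API: the engine asks for data, the source never pushes it. */
interface ShardConsumer {
    /** Returns up to maxMessages messages starting at fromOffset (possibly none). */
    List<RawMessage> readNext(long fromOffset, int maxMessages);
}

/** In-memory stand-in for a source plugin such as Kafka. */
final class InMemoryShardConsumer implements ShardConsumer {
    private final List<byte[]> log = new ArrayList<>();

    void append(byte[] payload) {
        log.add(payload);
    }

    @Override
    public List<RawMessage> readNext(long fromOffset, int maxMessages) {
        List<RawMessage> out = new ArrayList<>();
        for (long i = Math.max(0L, fromOffset); i < log.size() && out.size() < maxMessages; i++) {
            out.add(new RawMessage(i, log.get((int) i)));
        }
        return out;
    }
}

/** Engine-side loop: pull batches until the source is drained. */
final class PullLoop {
    private PullLoop() {}

    /** Decodes every payload into sink and returns the next offset to read. */
    static long drain(ShardConsumer consumer, long startOffset, int batchSize, List<String> sink) {
        long next = startOffset;
        while (true) {
            List<RawMessage> batch = consumer.readNext(next, batchSize);
            if (batch.isEmpty()) {
                return next;
            }
            for (RawMessage message : batch) {
                sink.add(new String(message.payload, StandardCharsets.UTF_8));
                next = message.offset + 1;
            }
        }
    }
}
```

In this sketch the engine owns the read position, so a different source could be swapped in by implementing `ShardConsumer` alone.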

This PR is still a work in progress: a few improvements remain, and test coverage needs to increase.

Related Issues

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2025-01-06T19:44:38Z

❌ Gradle check result for 16dd9d0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Bukhtawar · 2025-01-08T08:22:46Z

server/src/main/java/org/opensearch/index/engine/IngestionEngine.java

+            String clientId = engineConfig.getIndexSettings().getNodeName()
+                + "-"
+                + engineConfig.getIndexSettings().getIndex().getName()
+                + "-"
+                + engineConfig.getShardId().getId();


Should we use IDs instead of names here, such as the index UUID and node ID?

This is mainly for monitoring and operations; for example, Kafka supports quotas set per client ID. As long as we can uniquely identify a streaming consumer, that is sufficient. Any suggestions?
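
The trade-off in this exchange can be illustrated with a small standalone sketch. `ConsumerClientIds` and both of its methods are hypothetical helpers written for this example; only the `name-name-shardId` concatenation in `fromNames` mirrors the diff above, and `fromIds` is one possible reading of the reviewer's suggestion rather than anything the PR adopts.

```java
/** Hypothetical helper comparing two ways to build a consumer client ID. */
final class ConsumerClientIds {
    private ConsumerClientIds() {}

    /** Mirrors the diff above: nodeName + "-" + indexName + "-" + shardId. */
    static String fromNames(String nodeName, String indexName, int shardId) {
        return nodeName + "-" + indexName + "-" + shardId;
    }

    /** Reviewer's alternative (assumed shape): identifiers instead of names. */
    static String fromIds(String nodeId, String indexUuid, int shardId) {
        return String.join("-", nodeId, indexUuid, Integer.toString(shardId));
    }
}
```

For example, `ConsumerClientIds.fromNames("node-1", "logs", 0)` yields `node-1-logs-0`. One thing to watch with any hyphen-joined scheme is that the components may themselves contain hyphens: node `a` with index `b-c` and node `a-b` with index `c` both produce `a-b-c-0`. Whether that matters depends on how strictly the ID must identify a single consumer for quota and monitoring purposes.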

Bukhtawar

I'm curious how the FGAC (fine-grained access control) security model would work, especially with the security plugin, which intercepts transport actions to validate whether authorised users can perform bulk actions on certain indices. Is the intent to handle permissions at the Kafka partition level?
Another aspect is maintaining Kafka checkpoints durably. I have yet to read that part, but it would be good to understand how we are handling failovers and recoveries.
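
This thread does not say how the PR persists checkpoints, so the following is only a sketch of one common pattern for the failover question: record the last processed offset together with each durable commit, and resume from the offset after it on recovery. `OffsetCheckpoint`, the `last_processed_offset` key, and the `Map` standing in for durable storage are all assumptions made for illustration.

```java
import java.util.Map;

/** Hypothetical checkpoint tracker; the Map stands in for durable commit metadata. */
final class OffsetCheckpoint {
    // Assumed key name, not taken from the PR.
    static final String KEY = "last_processed_offset";

    private final Map<String, String> durableStore;
    private long lastProcessed = -1L; // -1 means nothing processed yet

    OffsetCheckpoint(Map<String, String> durableStore) {
        this.durableStore = durableStore;
    }

    /** Called after a message has been applied in memory (not yet durable). */
    void markProcessed(long offset) {
        lastProcessed = offset;
    }

    /** Persists the checkpoint; everything up to lastProcessed is now durable. */
    void commit() {
        durableStore.put(KEY, Long.toString(lastProcessed));
    }

    /** On recovery: the first offset that must be read again from the stream. */
    static long resumeFrom(Map<String, String> durableStore) {
        String saved = durableStore.get(KEY);
        return saved == null ? 0L : Long.parseLong(saved) + 1L;
    }
}
```

Under this scheme, messages processed after the last commit are read again after a crash, which gives at-least-once delivery; indexing would need to be idempotent for replays to be harmless.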

server/src/main/java/org/opensearch/plugins/IngestionConsumerPlugin.java

server/src/main/java/org/opensearch/indices/ingest/package-info.java

server/src/main/java/org/opensearch/indices/ingest/StreamPoller.java

server/src/main/java/org/opensearch/index/engine/IngestionEngine.java

plugins/ingestion-kafka/build.gradle

github-actions · 2025-01-12T23:13:10Z

❌ Gradle check result for 6d86683: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions bot added the enhancement (Enhancement or improvement to existing feature or request) and Indexing (Indexing, Bulk Indexing and anything related to indexing) labels Jan 6, 2025

Bukhtawar reviewed Jan 8, 2025

Bukhtawar reviewed Jan 8, 2025

andrross reviewed Jan 8, 2025

yupeng9 added 26 commits January 12, 2025 14:45

local update

07b63dc

Signed-off-by: Yupeng Fu <[email protected]>

add batch_start/end to stream poller

b584b6f

Signed-off-by: Yupeng Fu <[email protected]>

add index settings

cea7d62

Signed-off-by: Yupeng Fu <[email protected]>

local change

98d1b53

Signed-off-by: Yupeng Fu <[email protected]>

pass docmapper

814dde9

Signed-off-by: Yupeng Fu <[email protected]>

basic recovery

785805c

Signed-off-by: Yupeng Fu <[email protected]>

add kafka ingestion as plugin

ac92804

Signed-off-by: Yupeng Fu <[email protected]>

add integration test for kafka plugin

666bf07

Signed-off-by: Yupeng Fu <[email protected]>

cleanup

a7c3b49

Signed-off-by: Yupeng Fu <[email protected]>

use byte[] for message payload type

aeec744

Signed-off-by: Yupeng Fu <[email protected]>

javadocs

3e0f96c

Signed-off-by: Yupeng Fu <[email protected]>

add ingestionEngineTest

254de1a

Signed-off-by: Yupeng Fu <[email protected]>

test recovery test in ingestionEngineTest

d090510

Signed-off-by: Yupeng Fu <[email protected]>

unit tests for kafka plugin

7ed996b

Signed-off-by: Yupeng Fu <[email protected]>

style fix

8de3b64

Signed-off-by: Yupeng Fu <[email protected]>

add license

00f972e

Signed-off-by: Yupeng Fu <[email protected]>

more unit tests

1fdd58e

Signed-off-by: Yupeng Fu <[email protected]>

cleanup

5bd382a

Signed-off-by: Yupeng Fu <[email protected]>

use a blocking queue to pass polled messages to the processor for processing

8cff2b1

Signed-off-by: Yupeng Fu <[email protected]>

address comments also remove security policy from bootstrap files

02f242f

Signed-off-by: Yupeng Fu <[email protected]>

support _op_type in message processing

32f8b84

Signed-off-by: Yupeng Fu <[email protected]>

simplify ingestion source class

92b1b01

Signed-off-by: Yupeng Fu <[email protected]>

address more comments

f8b3f05

Signed-off-by: Yupeng Fu <[email protected]>

kafka client sha

250f6c7

Signed-off-by: Yupeng Fu <[email protected]>

fix style

66cdffa

Signed-off-by: Yupeng Fu <[email protected]>

more style fix

6d86683

Signed-off-by: Yupeng Fu <[email protected]>
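
Commit 8cff2b1 above mentions handing polled messages to the processor through a blocking queue. The sketch below shows that general producer/consumer pattern with `java.util.concurrent.ArrayBlockingQueue`; the class name `QueueHandoffDemo`, the sentinel-based shutdown, and the queue capacity are assumptions for illustration and are not taken from the PR's `StreamPoller`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical demo of a poller thread handing payloads to a processor thread. */
final class QueueHandoffDemo {
    // Sentinel compared by reference, so a genuinely empty payload is still delivered.
    private static final byte[] END_OF_STREAM = new byte[0];

    private QueueHandoffDemo() {}

    /** Polls the given messages on the calling thread, processes them on another. */
    static List<String> run(List<String> messages, int capacity) {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(capacity);
        List<String> processed = new ArrayList<>();

        Thread processor = new Thread(() -> {
            try {
                while (true) {
                    byte[] payload = queue.take(); // blocks while the queue is empty
                    if (payload == END_OF_STREAM) {
                        return;
                    }
                    processed.add(new String(payload, StandardCharsets.UTF_8));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "processor");
        processor.start();

        try {
            for (String message : messages) {
                // put() blocks while the queue is full, which applies backpressure to the poller.
                queue.put(message.getBytes(StandardCharsets.UTF_8));
            }
            queue.put(END_OF_STREAM);
            processor.join(); // join() makes the processor's writes visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while handing off messages", e);
        }
        return processed;
    }
}
```

A bounded queue is the relevant design choice here: if processing falls behind, the poller stalls in `put()` instead of buffering an unbounded number of messages in memory.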

yupeng9 force-pushed the pull-ingestion branch from ad8a00f to 6d86683 Compare January 12, 2025 22:48

This was referenced Jan 13, 2025

[AUTOCUT] Gradle Check Flaky Test Report for ResourceAwareTasksTests #14293

Open

[AUTOCUT] Gradle Check Flaky Test Report for IndexingIT #14302

Open
