[Pull-based Ingestion][WIP] Introduce the new pull-based ingestion engine, APIs, and Kafka plugin #16958

Open: wants to merge 31 commits into main

@yupeng9 commented Jan 6, 2025

Description

This PR implements the basics of the pull-based ingestion described in this RFC, including:

  1. The APIs for the pull-based ingestion source
  2. A Kafka plugin that implements the ingestion source API
  3. A new IngestionEngine that pulls data from the ingestion sources
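To make the pull model concrete, here is a minimal Java sketch of how a pluggable source API and an engine loop could fit together. Every name in it (`IngestionShardConsumer`, `readNext`, `InMemoryPartition`, `pullAll`) is hypothetical and not taken from this PR. An in-memory list stands in for a Kafka partition, and `long` offsets stand in for whatever pointer type the real API uses. The point is the division of labor: the plugin only knows how to read from a position, while the engine owns the loop and decides where to resume.

```java
import java.util.ArrayList;
import java.util.List;

public class PullIngestionSketch {

    /** A message read from a stream, tagged with its position (e.g., a Kafka offset). */
    record Message(long offset, String payload) {}

    /** Hypothetical source API: read up to maxMessages starting at fromOffset. */
    interface IngestionShardConsumer {
        List<Message> readNext(long fromOffset, int maxMessages);
    }

    /** In-memory stand-in for a Kafka partition, for illustration only. */
    static final class InMemoryPartition implements IngestionShardConsumer {
        private final List<String> log;

        InMemoryPartition(List<String> log) {
            this.log = log;
        }

        @Override
        public List<Message> readNext(long fromOffset, int maxMessages) {
            List<Message> batch = new ArrayList<>();
            for (long i = fromOffset; i < log.size() && batch.size() < maxMessages; i++) {
                batch.add(new Message(i, log.get((int) i)));
            }
            return batch;
        }
    }

    /** Hypothetical engine loop: pull batches and "index" them until the source is drained. */
    static List<String> pullAll(IngestionShardConsumer consumer, int batchSize) {
        List<String> indexed = new ArrayList<>();
        long next = 0;
        while (true) {
            List<Message> batch = consumer.readNext(next, batchSize);
            if (batch.isEmpty()) {
                break; // a real engine would back off and poll again instead of stopping
            }
            for (Message m : batch) {
                indexed.add(m.payload()); // stand-in for applying the document to the shard
                next = m.offset() + 1;    // advance the pointer past the last applied message
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        IngestionShardConsumer source = new InMemoryPartition(List.of("doc-a", "doc-b", "doc-c"));
        System.out.println(pullAll(source, 2)); // prints [doc-a, doc-b, doc-c]
    }
}
```

A real engine would also persist the resume position durably rather than starting from offset 0, which is the recovery question raised later in this thread.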

This is currently a work in progress; a few improvements remain, and test coverage still needs to increase.

Related Issues

Resolves #16927 #16929 #16928

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

github-actions bot added the enhancement and Indexing labels Jan 6, 2025
Contributor

github-actions bot commented Jan 6, 2025

❌ Gradle check result for 16dd9d0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Comment on lines 147 to 151
String clientId = engineConfig.getIndexSettings().getNodeName()
+ "-"
+ engineConfig.getIndexSettings().getIndex().getName()
+ "-"
+ engineConfig.getShardId().getId();
Collaborator


Should we use IDs instead of names, such as the index UUID and node ID?

Author


This is mainly for monitoring and operations; for example, Kafka supports quotas set by client ID. As long as the client ID uniquely identifies a streaming consumer, it is sufficient. Any suggestions?
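Since the stated requirement is that the client ID uniquely identify a streaming consumer, one property worth checking is whether the `-`-joined format can collide. Node and index names may themselves contain `-`, so two different (node, index, shard) combinations can produce the same string. The Java sketch below reproduces this with made-up names and shows one hypothetical unambiguous encoding that prefixes each component with its length. Neither helper exists in the PR, and whether such names occur in practice depends on a cluster's naming conventions.

```java
import java.util.List;

public class ClientIdSketch {

    /** Mirrors the snippet under review: nodeName-indexName-shardId. */
    static String joinedClientId(String nodeName, String indexName, int shardId) {
        return nodeName + "-" + indexName + "-" + shardId;
    }

    /**
     * Hypothetical unambiguous variant: each component is prefixed with its length,
     * so a '-' inside a name can never be mistaken for a separator.
     */
    static String lengthPrefixedClientId(String nodeName, String indexName, int shardId) {
        StringBuilder sb = new StringBuilder();
        for (String part : List.of(nodeName, indexName, Integer.toString(shardId))) {
            sb.append(part.length()).append('-').append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two different consumers with made-up names that contain '-'.
        String a = joinedClientId("node-a", "logs", 0); // "node-a-logs-0"
        String b = joinedClientId("node", "a-logs", 0); // "node-a-logs-0"
        System.out.println(a.equals(b)); // true: the joined IDs collide

        String c = lengthPrefixedClientId("node-a", "logs", 0); // "6-node-a4-logs1-0"
        String d = lengthPrefixedClientId("node", "a-logs", 0); // "4-node6-a-logs1-0"
        System.out.println(c.equals(d)); // false: the length-prefixed IDs differ
    }
}
```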

Collaborator

@Bukhtawar left a comment


I'm curious how the fine-grained access control (FGAC) security model would work, especially with the security plugin, which intercepts transport actions to validate whether authorised users can perform bulk actions on certain indices. Is the intent to handle permissions at the Kafka partition level?
Another aspect is maintaining Kafka checkpoints durably. I have yet to read that part, but it would be good to understand how we are handling failovers and recoveries.
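The checkpoint and recovery question can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not the PR's design: it assumes the consumed offset is committed atomically together with the indexed documents, and that documents are keyed by ID so replaying a message overwrites rather than duplicates. Under those assumptions a crash loses only uncommitted work, and recovery replays from the last committed offset + 1, which gives at-least-once delivery made safe by idempotent writes. `DurableShard`, `run`, and `commit` are hypothetical names, and the list index plays the role of the Kafka offset.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CheckpointSketch {

    /** A stream message; its position in the log list is its offset. */
    record Message(String docId, String body) {}

    /** State that survives a "crash": committed docs plus the offset they cover. */
    static final class DurableShard {
        final Map<String, String> committedDocs = new HashMap<>();
        long committedOffset = -1; // offset of the last message included in the last commit
    }

    /**
     * Resumes after the checkpoint and commits every commitEvery messages.
     * If crashAfter >= 0, stops after applying that many messages (a simulated crash),
     * losing whatever was applied since the last commit.
     */
    static void run(DurableShard shard, List<Message> log, int commitEvery, int crashAfter) {
        Map<String, String> pending = new HashMap<>(shard.committedDocs);
        long lastApplied = shard.committedOffset;
        int applied = 0;
        for (int i = (int) (shard.committedOffset + 1); i < log.size(); i++) {
            if (applied == crashAfter) {
                return; // simulated crash: `pending` changes are never committed
            }
            Message m = log.get(i);
            pending.put(m.docId(), m.body()); // keyed by doc ID, so a replay overwrites
            lastApplied = i;
            applied++;
            if (applied % commitEvery == 0) {
                commit(shard, pending, lastApplied);
            }
        }
        commit(shard, pending, lastApplied);
    }

    /** Docs and offset are committed together, so the checkpoint never runs ahead of the data. */
    static void commit(DurableShard shard, Map<String, String> docs, long offset) {
        shard.committedDocs.clear();
        shard.committedDocs.putAll(docs);
        shard.committedOffset = offset;
    }

    public static void main(String[] args) {
        List<Message> log = List.of(
                new Message("1", "v1"), new Message("2", "v1"),
                new Message("1", "v2"), new Message("3", "v1"));
        DurableShard shard = new DurableShard();
        run(shard, log, 2, 3); // commits offsets 0-1, applies offset 2, then crashes
        System.out.println(shard.committedOffset); // 1
        run(shard, log, 2, -1); // recovery: replays from offset 2
        System.out.println(shard.committedOffset); // 3
        System.out.println(shard.committedDocs.get("1")); // v2
    }
}
```

The key design choice is that the offset and the data share one commit; persisting the offset separately and ahead of the data would instead risk skipping messages after a failover.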

Signed-off-by: Yupeng Fu <[email protected]>
Contributor

❌ Gradle check result for 6d86683: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Labels: enhancement, Indexing
Projects: none yet
Development: successfully merging this pull request may close these issues:
[Feature Request] Pull-based ingestion source APIs
5 participants