From db0e7c2c888040050dd04a4f24bb08dd0f0aa7b3 Mon Sep 17 00:00:00 2001
From: David Venable
Date: Thu, 21 Mar 2024 17:29:28 -0500
Subject: [PATCH 1/2] Adds a configuration for the Data Prepper S3 source workers field.

Signed-off-by: David Venable
---
 _data-prepper/pipelines/configuration/sources/s3.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/_data-prepper/pipelines/configuration/sources/s3.md b/_data-prepper/pipelines/configuration/sources/s3.md
index 47641b67a2..b9dc27fed5 100644
--- a/_data-prepper/pipelines/configuration/sources/s3.md
+++ b/_data-prepper/pipelines/configuration/sources/s3.md
@@ -104,6 +104,7 @@ Option | Required | Type | Description
 `s3_select` | No | [s3_select](#s3_select) | The Amazon S3 Select configuration.
 `scan` | No | [scan](#scan) | The S3 scan configuration.
 `delete_s3_objects_on_read` | No | Boolean | When `true`, the S3 scan attempts to delete S3 objects after all events from the S3 object are successfully acknowledged by all sinks. `acknowledgments` should be enabled when deleting S3 objects. Default is `false`.
+`workers` | No | Integer | Configures the number of worker threads that the source uses to read data from S3. We recommend leaving this value at the default unless your S3 objects are all very small (less than 1MB). Performance may decrease for larger S3 objects. This currently only affects SQS-based sources. Defaults to 1.

 ## sqs

From 2f0f1cc507ba477de0c907fd06d5f3b1bb72bb2f Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Fri, 22 Mar 2024 14:58:32 -0500
Subject: [PATCH 2/2] Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 _data-prepper/pipelines/configuration/sources/s3.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_data-prepper/pipelines/configuration/sources/s3.md b/_data-prepper/pipelines/configuration/sources/s3.md
index b9dc27fed5..165933887d 100644
--- a/_data-prepper/pipelines/configuration/sources/s3.md
+++ b/_data-prepper/pipelines/configuration/sources/s3.md
@@ -104,7 +104,7 @@ Option | Required | Type | Description
 `s3_select` | No | [s3_select](#s3_select) | The Amazon S3 Select configuration.
 `scan` | No | [scan](#scan) | The S3 scan configuration.
 `delete_s3_objects_on_read` | No | Boolean | When `true`, the S3 scan attempts to delete S3 objects after all events from the S3 object are successfully acknowledged by all sinks. `acknowledgments` should be enabled when deleting S3 objects. Default is `false`.
-`workers` | No | Integer | Configures the number of worker threads that the source uses to read data from S3. We recommend leaving this value at the default unless your S3 objects are all very small (less than 1MB). Performance may decrease for larger S3 objects. This currently only affects SQS-based sources. Defaults to 1.
+`workers` | No | Integer | Configures the number of worker threads that the source uses to read data from S3. Leave this value at the default unless your S3 objects are smaller than 1 MB. Performance may decrease for larger S3 objects. This setting only affects SQS-based sources. Default is `1`.

 ## sqs