S3 loader cannot access S3 across AWS regions (sink/load data to/from remote region) #283

donnyding · 2023-06-29T06:44:50Z

In 1.x, the KinesisConfig (inStream) and the S3Config (outStream) each have their own AWS region setting. That means you can run the s3-loader in region A to consume data from a Kinesis data stream (region A) and persist raw/enriched events to S3 in region B.
Since 2.x, region is a global setting outside of the "input" and "output" sections. The code always takes the region from there, even when I configure a custom S3 endpoint.

The AWS client SDK provides an interface to turn on global bucket access, but snowplow-s3-loader does not expose this setting in its pipeline configuration. Please review and fix this.

donnyding · 2023-06-29T06:46:05Z

As a workaround, we can always force global bucket access.
client.setForceGlobalBucketAccessEnabled(true);

jbeemster · 2023-06-29T07:04:58Z

Hi @donnyding would you be able to share a bit more about the use-case you are trying to solve here and why you would want to read from Kinesis in one region and write to S3 in a different region?

As for the feature itself, if you have the bandwidth, we are always happy to review Pull Requests!

donnyding · 2023-06-29T09:19:40Z

Hi @jbeemster,
Usage scenario:
To improve high availability, we plan to set up similar environments in two AWS regions. The health check API can be used in the traffic routing policy, so event payloads are routed across the two regions with no duplicated data. It is better for the enrichment processing to persist raw/enriched events to a single, global S3 storage area.
That's why I consume data from a Kinesis data stream (region A) and sink data to S3 (region B).

As a workaround, I can force global bucket access through the AWS client SDK interface, but it's not a perfect solution.

It's possible to separate the region setting for the input and output sections in the configuration file, just like Snowplow-OSS does in v1.0. Alternatively, add a new configuration item in the output section that lets the customer enable or disable global bucket access. Does that make sense?
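
To make the first option concrete, here is a minimal sketch of the lookup I have in mind: the output section carries an optional region that overrides the global one. All names here (`S3RegionSketch`, `OutputConfig`, `resolveS3Region`, `forceGlobalBucketAccess`) are hypothetical and are not the loader's actual config classes; it only illustrates the fallback, not how the S3 client is built.

```java
import java.util.Optional;

public class S3RegionSketch {

    /** Hypothetical "output" section: optional region override plus a global-access toggle. */
    static final class OutputConfig {
        final Optional<String> region;          // empty means "use the global region"
        final boolean forceGlobalBucketAccess;  // assumed new flag, off by default

        OutputConfig(Optional<String> region, boolean forceGlobalBucketAccess) {
            this.region = region;
            this.forceGlobalBucketAccess = forceGlobalBucketAccess;
        }
    }

    /** Region to use for the S3 client: the output override if present, else the global region. */
    static String resolveS3Region(String globalRegion, OutputConfig output) {
        return output.region.orElse(globalRegion);
    }

    public static void main(String[] args) {
        // Kinesis in us-east-1 (global setting), S3 bucket in eu-west-1 (output override).
        OutputConfig remoteBucket = new OutputConfig(Optional.of("eu-west-1"), false);
        // No override: behaves like the current 2.x configuration.
        OutputConfig sameRegion = new OutputConfig(Optional.empty(), false);

        System.out.println(resolveS3Region("us-east-1", remoteBucket));
        System.out.println(resolveS3Region("us-east-1", sameRegion));
    }
}
```

With a fallback like this, existing 2.x configurations that only set the global region would keep working unchanged.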
