This repository has been archived by the owner on Jun 22, 2022. It is now read-only.

feat(data-lake): add scaffolding for stochastic-flink and DataLake Stack #103

Draft
wants to merge 1 commit into
base: master
from


sam-goodwin
Contributor

Closes #84
Closes #95

This change adds a DataLake Stack that persists all observed Domain Events, issued Commands, and Store state changes in a Bounded Context. The data is collected into a single Kinesis Stream and processed by an Apache Flink application running on a Kinesis Analytics managed Flink cluster. The application consumes from the stream, partitions the data by type and time, and stores it in S3 as encrypted JSON and Parquet files. The partitions are then updated in the corresponding AWS Glue Tables so that they can be queried from Athena, Spark, and Hadoop (or any other Hive-compatible consumer). Data can also be configured to load into an AWS Timestream instance for fast time-series analysis.
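To illustrate the "partition by type and time" step, here is a minimal sketch of how a record could be mapped to a Hive-style S3 prefix. It is written in TypeScript for readability and is not the Flink job itself. The `LakeRecord` shape, the `partitionPrefix` and `groupByPartition` helpers, and the `year=/month=/day=/hour=` layout are all assumptions made for this example; the PR does not specify them.

```typescript
// Hypothetical record shape: the PR does not define one.
type RecordKind = "event" | "command" | "store";

interface LakeRecord {
  kind: RecordKind; // which category of data this is
  name: string; // e.g. the event, command, or store name
  timestamp: Date; // when the record was observed
}

// Zero-pad a number to two digits.
function pad(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Build a Hive-style partition prefix (key=value path segments) in UTC.
// The layout is an assumption, chosen because Hive-compatible consumers
// can discover partitions from this naming scheme.
function partitionPrefix(record: LakeRecord): string {
  const t = record.timestamp;
  return [
    record.kind,
    record.name,
    "year=" + t.getUTCFullYear(),
    "month=" + pad(t.getUTCMonth() + 1),
    "day=" + pad(t.getUTCDate()),
    "hour=" + pad(t.getUTCHours()),
  ].join("/");
}

// Group a batch of records by the partition they belong to.
function groupByPartition(records: LakeRecord[]): Record<string, LakeRecord[]> {
  const groups: Record<string, LakeRecord[]> = {};
  for (const r of records) {
    const key = partitionPrefix(r);
    if (!groups[key]) {
      groups[key] = [];
    }
    groups[key].push(r);
  }
  return groups;
}
```

Each distinct prefix produced this way would correspond to one partition that needs to be added to (or updated in) the matching Glue table.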

TODO:

  • Implement AWS Glue and S3 Sink in `stochastic-flink`
  • Implement AWS Timestream Sink in `stochastic-flink`
  • Enhance the DataLake construct so that each table can be configured individually; use mapped types to map the BoundedContext to DataLakeProps.
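
As a rough sketch of the mapped-types idea in the last TODO item: the type below derives an optional, per-table configuration object from the shape of a bounded context. Everything here is hypothetical. `TableConfig`, `OrdersContext`, `withDefaults`, and this particular definition of `DataLakeProps` are illustrations only, and the real `BoundedContext` and `DataLakeProps` types may look quite different.

```typescript
// Hypothetical per-table settings.
interface TableConfig {
  format: "json" | "parquet";
  timestream: boolean; // whether to also load this table into Timestream
}

// Hypothetical bounded context, reduced to the names of its members.
interface OrdersContext {
  events: { OrderPlaced: unknown; OrderShipped: unknown };
  commands: { PlaceOrder: unknown };
  stores: { Orders: unknown };
}

// Mapped type: for every category in the context, and every name within
// that category, allow an optional partial override of TableConfig.
type DataLakeProps<BC> = {
  [Category in keyof BC]?: {
    [Name in keyof BC[Category]]?: Partial<TableConfig>;
  };
};

// Only tables that deviate from the defaults need to be listed.
// A misspelled key such as `OrderPlacd` would be a compile-time error.
const props: DataLakeProps<OrdersContext> = {
  events: { OrderPlaced: { format: "json" } },
  stores: { Orders: { timestream: true } },
};

const defaults: TableConfig = { format: "parquet", timestream: false };

// Merge an optional override onto the defaults.
function withDefaults(override?: Partial<TableConfig>): TableConfig {
  return { ...defaults, ...override };
}

const orderPlaced = withDefaults(props.events?.OrderPlaced);
const placeOrder = withDefaults(props.commands?.PlaceOrder);
```

The appeal of this approach is that the configuration keys stay in sync with the bounded context definition: renaming or removing an event surfaces as a type error in the DataLake configuration rather than as a silently ignored setting.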

@sam-goodwin sam-goodwin added the aws-serverless Related to serverless applications on AWS label Jul 15, 2021
@sam-goodwin sam-goodwin added this to the 1.0 milestone Jul 15, 2021
@sam-goodwin sam-goodwin requested a review from ryan-mars July 15, 2021 18:57
@sam-goodwin sam-goodwin self-assigned this Jul 15, 2021
@ryan-mars
Owner

Is a Data Lake per Bounded Context an ideal pattern? Shouldn't there be one (or a few) Data Lake(s) in an org, all fed by the various Bounded Contexts?

Why might it make sense to have one DL per BC?

Successfully merging this pull request may close these issues.

AWS Timestream Construct: Generate Glue tables for events
2 participants