This repository has been archived by the owner on Jun 22, 2022. It is now read-only.
feat(data-lake): add scaffolding for stochastic-flink and DataLake Stack #103
Closes #84
Closes #95
This change adds a DataLake Stack that persists all observed Domain Events, Issued Commands and Store state changes in a Bounded Context. The data is collected into a single Kinesis Stream and processed by an Apache Flink application running on a Kinesis Analytics managed Flink cluster. This application consumes from the stream, partitions the data by type and time, and stores it in S3 as encrypted JSON and Parquet files. These partitions are then updated in the corresponding AWS Glue Tables so that they can be queried from Athena, Spark and Hadoop (or any other Hive-compatible consumer). Data can also be configured to load into an AWS Timestream instance to enable fast time-series analysis.
TODO:
stochastic-flink
BoundedContext
toDataLakeProps
.
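
For reviewers, here is a rough sketch of what "partitions the data by type and time" could look like as a Hive-style S3 prefix. It is not taken from this PR: the `LakeRecord` shape, the `partitionPrefix` function, and the `kind`/`type`/`year`/`month`/`day`/`hour` key names are all assumptions made for illustration, and the actual layout written by the Flink application may differ.

```typescript
// Hypothetical record shape; the real types in this repository may differ.
type RecordKind = "event" | "command" | "store";

interface LakeRecord {
  kind: RecordKind;   // Domain Event, Command, or Store state change
  type: string;       // name of the concrete event/command/store type
  timestamp: string;  // ISO 8601 timestamp of the observation
}

// Zero-pad a number to two digits so partition values sort lexicographically.
function pad2(n: number): string {
  return (n < 10 ? "0" : "") + n;
}

// Build a Hive-style partition prefix (key=value/...) from a record.
// Hive-compatible consumers can map each key=value segment to a partition column.
function partitionPrefix(record: LakeRecord): string {
  const t = new Date(record.timestamp);
  if (Number.isNaN(t.getTime())) {
    throw new Error("invalid timestamp: " + record.timestamp);
  }
  return [
    "kind=" + record.kind,
    "type=" + record.type,
    "year=" + t.getUTCFullYear(),
    "month=" + pad2(t.getUTCMonth() + 1), // getUTCMonth() is zero-based
    "day=" + pad2(t.getUTCDate()),
    "hour=" + pad2(t.getUTCHours()),
  ].join("/") + "/";
}

// Example usage with a made-up event type.
const prefix = partitionPrefix({
  kind: "event",
  type: "OrderPlaced",
  timestamp: "2021-03-04T05:06:07Z",
});
console.log(prefix);
```

Using UTC for every time component keeps the prefix independent of the time zone of whichever machine computes it, which matters when several workers write into the same partitions.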