Data movement #2297
Jesse-Bakker started this conversation in Feature Requests
Transactional consistency
To ensure transactional consistency, transactions need to be ordered
and, if supported by the sink, committed together. One option for this
is for the pipeline to act on
Transformations can happen in parallel, as long as the ordering of
transformations is restored before pushing the data to the sink,
something like:
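The diagram or snippet that followed is missing from this copy. A minimal Python sketch of the idea — transactions tagged with sequence numbers, transformed in parallel, then passed through a reorder buffer so they reach the sink in the original order — might look like this (all names are mine, not Dozer's):

```python
import heapq

class ReorderBuffer:
    """Restores transaction order after parallel transformation.

    Transactions enter tagged with a monotonically increasing sequence
    number; transformed results may arrive out of order, but a result is
    only released to the sink once every earlier sequence number has
    been released.
    """
    def __init__(self):
        self._heap = []       # min-heap of (seq, txn), ordered by seq
        self._next_seq = 0    # next sequence number the sink may see

    def push(self, seq, txn):
        heapq.heappush(self._heap, (seq, txn))

    def drain(self):
        """Yield transactions that are ready, in sequence order."""
        while self._heap and self._heap[0][0] == self._next_seq:
            _, txn = heapq.heappop(self._heap)
            self._next_seq += 1
            yield txn

# Results arriving out of order are held back until the gap is filled.
buf = ReorderBuffer()
buf.push(1, "txn-1")
assert list(buf.drain()) == []              # txn-0 not seen yet
buf.push(0, "txn-0")
assert list(buf.drain()) == ["txn-0", "txn-1"]
```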
Denormalization
There are two main shapes that denormalization may take. Let's take the following example schema:
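The example schema itself is missing from this copy. Judging from the rest of the post (an `address` table with a parent it is denormalized into), a minimal schema consistent with it — table and column names are my assumptions — could be:

```sql
CREATE TABLE customer (
    id   INT PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE address (
    id          INT PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customer (id),
    street      TEXT NOT NULL,
    city        TEXT NOT NULL
);
```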
To keep the Dozer pipeline as stateless as possible, denormalization
state is pushed to the sink database, and changes to the denormalized
sink table are applied in a trigger-like fashion.
There are two main ways this might be denormalized, depending on the purpose and sink type. (These are terms I came up with, so don't try to Google them.)
Aggregation denormalization
Example json record:
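The JSON record is missing from this copy. Assuming a parent record with its `address` rows embedded as an array (field names are my assumptions), it might look like:

```json
{
  "id": 1,
  "name": "Alice",
  "addresses": [
    { "id": 10, "street": "1 Main St", "city": "Springfield" }
  ]
}
```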
This is most common for sinks like MongoDB, where data might be replicated for an application-specific cache. An insert into the source `address` table would cause an update to be performed in MongoDB like:
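The update itself is missing from this copy. Assuming the addresses are embedded as an array inside the parent document (collection and field names are my assumptions), a mongosh sketch of such a trigger-like update could be:

```javascript
// Hypothetical: push the newly inserted address row into the matching
// parent document's embedded array.
db.customer.updateOne(
  { id: newAddress.customer_id },
  {
    $push: {
      addresses: {
        id: newAddress.id,
        street: newAddress.street,
        city: newAddress.city
      }
    }
  }
);
```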
Expressing this denormalization in SQL would require implementing something like `json_agg`.

Analytical denormalization
Example denormalized schema:
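The schema is missing from this copy. For the analytical shape, the parent and child tables are typically flattened into one wide table; a sketch consistent with the rest of the post (names are my assumptions) could be:

```sql
-- One row per (parent, address) pair, with parent columns repeated.
CREATE TABLE customer_addresses (
    customer_id   INT  NOT NULL,
    customer_name TEXT NOT NULL,
    address_id    INT  NOT NULL,
    street        TEXT NOT NULL,
    city          TEXT NOT NULL,
    PRIMARY KEY (customer_id, address_id)
);
```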
An insert into the source `address` table would cause an update to be performed in Postgres like:
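The statement itself is missing from this copy. Assuming a flat denormalized table keyed on the parent/child pair (table and column names are my assumptions), an upsert sketch in Postgres could be:

```sql
-- Hypothetical trigger-style upsert: join the new address row to its
-- parent and write the flattened result into the sink table.
INSERT INTO customer_addresses
            (customer_id, customer_name, address_id, street, city)
SELECT c.id, c.name, a.id, a.street, a.city
FROM address a
JOIN customer c ON c.id = a.customer_id
WHERE a.id = $1  -- id of the newly inserted address row
ON CONFLICT (customer_id, address_id) DO UPDATE
    SET street = EXCLUDED.street,
        city   = EXCLUDED.city;
```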