DynamoDB: Add ctk load table interface for processing CDC events #247
# TODO: Make configurable.
create_stream=True,
iterator_type="TRIM_HORIZON",
sleep_time_no_records=0.2,
Yes. What the comment says.
iterator_type="TRIM_HORIZON" (currently hard-coded) means it will always read the stream from its starting point, i.e. the oldest record still retained. Sure enough, this is probably the most important detail to make configurable.
07b1974 adds a few options to configure details when connecting to the Kinesis Stream.
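To illustrate why the iterator type matters, here is a minimal sketch of how a configurable start position could translate into parameters for the Kinesis GetShardIterator API. The helper name shard_iterator_params and its arguments are hypothetical and not taken from this patch; the AT_TIMESTAMP iterator type is left out for brevity.

```python
from typing import Optional

# Start positions covered by this sketch (a subset of what GetShardIterator accepts).
ITERATOR_TYPES = {
    "TRIM_HORIZON",           # oldest record still retained in the shard
    "LATEST",                 # only records written after the iterator is created
    "AT_SEQUENCE_NUMBER",     # start exactly at a given sequence number
    "AFTER_SEQUENCE_NUMBER",  # start right after a given sequence number
}


def shard_iterator_params(
    stream_name: str,
    shard_id: str,
    iterator_type: str = "TRIM_HORIZON",
    seqno: Optional[str] = None,
) -> dict:
    """Build GetShardIterator request parameters (hypothetical helper)."""
    if iterator_type not in ITERATOR_TYPES:
        raise ValueError(f"Unsupported iterator type: {iterator_type}")
    params = {
        "StreamName": stream_name,
        "ShardId": shard_id,
        "ShardIteratorType": iterator_type,
    }
    # Sequence-number based iterators need an explicit starting point.
    if iterator_type in {"AT_SEQUENCE_NUMBER", "AFTER_SEQUENCE_NUMBER"}:
        if seqno is None:
            raise ValueError(f"{iterator_type} requires a sequence number")
        params["StartingSequenceNumber"] = seqno
    return params
```

The resulting dictionary could be handed to a boto3 Kinesis client via get_shard_iterator(**params). Whether the patch talks to Kinesis this way is not visible in this conversation.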
# TODO: Make configurable.
create_stream=True,
buffer_time=0.01,
Ditto.
07b1974 adds a few options to configure details when connecting to the Kinesis Stream.
Let's go.
In contrast to the Lambda-based processor implementation, this is a standalone implementation that can be used in any Python environment, managed or not.
New options: batch-size, create, create-shards, start, seqno, idle-sleep, buffer-time.
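As a rough illustration of how such options might be turned into typed settings, here is a minimal sketch. The StreamOptions class and its from_mapping helper are hypothetical. The pairing of option names with the previously hard-coded parameters (create with create_stream, start with iterator_type, idle-sleep with sleep_time_no_records) is an assumption based on name similarity. The defaults for idle-sleep and buffer-time echo the hard-coded values quoted in the review; all other defaults are placeholders.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


def _to_bool(value: str) -> bool:
    """Interpret common truthy spellings; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


# Option name -> (attribute name, converter). Names follow the commit message.
CONVERTERS = {
    "batch-size": ("batch_size", int),
    "create": ("create", _to_bool),
    "create-shards": ("create_shards", int),
    "start": ("start", str),
    "seqno": ("seqno", str),
    "idle-sleep": ("idle_sleep", float),
    "buffer-time": ("buffer_time", float),
}


@dataclass
class StreamOptions:
    batch_size: int = 100        # placeholder default, not from the patch
    create: bool = True          # assumed counterpart of create_stream=True
    create_shards: int = 1       # placeholder default, not from the patch
    start: str = "TRIM_HORIZON"  # assumed counterpart of iterator_type
    seqno: Optional[str] = None
    idle_sleep: float = 0.2      # previously sleep_time_no_records=0.2
    buffer_time: float = 0.01    # previously buffer_time=0.01

    @classmethod
    def from_mapping(cls, raw: Mapping[str, str]) -> "StreamOptions":
        """Convert string-valued options into typed settings."""
        kwargs = {}
        for name, value in raw.items():
            if name not in CONVERTERS:
                raise ValueError(f"Unknown option: {name}")
            attribute, convert = CONVERTERS[name]
            kwargs[attribute] = convert(value)
        return cls(**kwargs)


options = StreamOptions.from_mapping({"batch-size": "50", "create": "false"})
```

Rejecting unknown option names early keeps typos such as idle_sleep from being silently ignored.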
About
Running DynamoDB CDC events through Kinesis and processing them using an AWS Lambda is cumbersome more often than not, and not too suitable for collaboration and development purposes. This patch provides a standalone implementation, as a sister to the corresponding full-load implementation, DynamoDB Table Loader.
Documentation
Preview: https://cratedb-toolkit--247.org.readthedocs.build/io/dynamodb/cdc.html
Synopsis
Use AWS for real, or exercise using LocalStack.
Install
pip install 'cratedb-toolkit[kinesis] @ git+https://github.com/crate/cratedb-toolkit.git@dynamodb-cdc-standalone'
Details
This data nozzle taps into Change data capture for DynamoDB Streams, in this case using Kinesis Data Streams for maximum universality. Using Kinesis isn't a bad idea: we will also use it to ingest other event/record types in the future, thus the protocol identifier kinesis+dynamodb+cdc://. On the egress side, towards CrateDB, it will use the data/aux column strategy.
It doesn't mean it's not cloud-ready. It is just more universal, because it can be used in an ad hoc / standalone operations mode, in development sandboxes, and can also be invoked in any other managed Python environment at your disposal.
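To make the layered protocol identifier more tangible, here is a minimal sketch that takes such a URL apart using only the standard library. Everything after the scheme (host, port, stream name in the path, options in the query string) is a hypothetical layout chosen for illustration; this description does not specify it.

```python
from urllib.parse import parse_qsl, urlsplit


def describe_source_url(url: str) -> dict:
    """Split a layered source URL into its parts (illustrative helper)."""
    parts = urlsplit(url)
    return {
        # "kinesis+dynamodb+cdc" -> transport, origin database, event type
        "layers": parts.scheme.split("+"),
        "host": parts.hostname,
        "port": parts.port,
        "stream": parts.path.lstrip("/"),
        "options": dict(parse_qsl(parts.query)),
    }


info = describe_source_url(
    "kinesis+dynamodb+cdc://localhost:4566/example-stream?create=true&idle-sleep=0.5"
)
print(info["layers"])
```

Reading the scheme as a stack of layers is what would leave room for other event/record types on the same Kinesis transport later on.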
Backlog I
create_stream, iterator_type, sleep_time_no_records, etc.
Backlog II
See DynamoDB: General backlog #231.
/cc @juanpardo, @hlcianfagna, @hammerhead, @wierdvanderhaar, @karynzv