Kinesis Samza

Kinesis Samza provides a connector that allows Samza processing to run against Amazon Kinesis. The connector makes the use of Kinesis Streams transparent to the Samza application, allowing low-friction migration between Apache Kafka and Amazon Kinesis.

Architecture

The integration architecture of the Kinesis Samza connector is shown in the architecture diagram (Architecture.png). We continue to use YARN to run Samza Containers, each of which creates a fixed number of Samza KinesisSystem-based tasks. Each task uses the Amazon Kinesis Client Library (KCL, https://github.com/awslabs/amazon-kinesis-client) to access one or more Kinesis Shards. A Kinesis Record Processor pushes the Kinesis Events to the enclosing container in the IRecordProcessor.processRecords method, and the YARN Container persists these messages in a queue for subsequent delivery to Samza processing. When the Samza application completes processing, the Container checkpoints the latest record processed per Shard via the KCL application.
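The buffering and checkpointing flow described above can be sketched roughly as follows. This is an illustration only, not the connector's actual code: BufferedRecord and ContainerBuffer are hypothetical names, the KCL itself is not called, and sequence numbers are simplified to long values (real Kinesis sequence numbers are large decimal strings).

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for a Kinesis record (assumption, not a KCL type).
final class BufferedRecord {
    final String shardId;
    final long sequenceNumber; // simplified; real sequence numbers are decimal strings
    final String payload;

    BufferedRecord(String shardId, long sequenceNumber, String payload) {
        this.shardId = shardId;
        this.sequenceNumber = sequenceNumber;
        this.payload = payload;
    }
}

// Hypothetical container-side buffer sitting between a record processor
// and the Samza task that consumes the messages.
final class ContainerBuffer {
    private final BlockingQueue<BufferedRecord> queue = new LinkedBlockingQueue<>();
    private final Map<String, Long> checkpoints = new ConcurrentHashMap<>();

    // Would be called from the record processor's processRecords callback.
    void enqueue(List<BufferedRecord> records) {
        queue.addAll(records);
    }

    // Would be called by the Samza side; returns null when the queue is empty.
    BufferedRecord poll() {
        return queue.poll();
    }

    // Called once a record is fully processed; keeps the highest
    // sequence number seen for each shard.
    void markProcessed(BufferedRecord record) {
        checkpoints.merge(record.shardId, record.sequenceNumber, Long::max);
    }

    // Latest processed sequence number for a shard, or null if none yet.
    Long checkpointFor(String shardId) {
        return checkpoints.get(shardId);
    }
}
```

In this sketch the per-Shard checkpoint only advances after processing completes, which mirrors the order of events described above: enqueue, deliver, then checkpoint.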

Caveats

Samza and Kafka are designed to work with a fixed number of Kafka partitions over the life of the Stream, and allocate a single Samza task per partition. Because Kinesis has an elastic and virtually unlimited number of Shards (which resemble Kafka partitions in some ways), a task may be delivered messages from different Shards if the Kinesis Stream is resharded at any point. It is also possible that responsibility for a Shard could be transferred to another Samza task at any time. Because of this transfer capability, Samza tasks must not store state locally in process, as the partition keys which map to a container can change at any time. Instead, consider offloading state management to an external datastore which can accommodate the message volume required.
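One way to follow this advice is to route all per-key state through a store abstraction instead of task fields. The sketch below is a minimal illustration under assumptions: ExternalStateStore, InMemoryStateStore and CountingTask are hypothetical names, and the in-memory map only stands in for a real external datastore.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction over an external datastore (assumption).
interface ExternalStateStore {
    long increment(String partitionKey, long delta);
    long get(String partitionKey);
}

// In-memory stand-in used for illustration only; a real deployment
// would back this interface with a remote datastore.
final class InMemoryStateStore implements ExternalStateStore {
    private final Map<String, Long> counts = new ConcurrentHashMap<>();

    @Override
    public long increment(String partitionKey, long delta) {
        return counts.merge(partitionKey, delta, Long::sum);
    }

    @Override
    public long get(String partitionKey) {
        return counts.getOrDefault(partitionKey, 0L);
    }
}

// Hypothetical task that holds no per-key state in its own fields,
// so any task instance can take over any partition key.
final class CountingTask {
    private final ExternalStateStore store;

    CountingTask(ExternalStateStore store) {
        this.store = store;
    }

    // Counts one message for the given partition key and returns the new total.
    long process(String partitionKey) {
        return store.increment(partitionKey, 1L);
    }
}
```

Because the count lives in the store rather than in the task, a second CountingTask that takes over the same partition key after a reshard continues from the existing total instead of starting from zero.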

renato2099/SamzaKinesis
