Code repository for the upcoming O'Reilly book: Mastering Kafka Streams and ksqlDB by Mitch Seymour
Edition | Kafka Streams version | ksqlDB version | Publication Date | Branch |
---|---|---|---|---|
Early Release | 2.6.0 | 0.12.0 | May 2020 | early-release |
1st Edition | 2.7.0 | 0.14.0 | February 2021 | 1st-edition |
main | 2.7.2 | 0.25.0 | May 2022 | master |
The Streams API is not compatible with Kafka clusters running older Kafka versions (0.7, 0.8, 0.9).
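As a minimal sketch of how the note above could be turned into a guard, the following plain-Java helper flags broker version strings that fall in the release lines named there (0.7, 0.8, 0.9). The class and method names are hypothetical and are not part of Kafka or this repository. The check only mirrors the list in the note and makes no claim about versions the note does not mention.

```java
import java.util.List;

public class StreamsBrokerCompatibility {
    // Release lines the note above names as incompatible with the Streams API.
    private static final List<String> INCOMPATIBLE_LINES = List.of("0.7", "0.8", "0.9");

    // Returns true when brokerVersion belongs to one of the listed release lines,
    // e.g. "0.9" or "0.9.0.1". Hypothetical helper for illustration only.
    static boolean isListedAsIncompatible(String brokerVersion) {
        for (String line : INCOMPATIBLE_LINES) {
            if (brokerVersion.equals(line) || brokerVersion.startsWith(line + ".")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String version : List.of("0.8.2.2", "0.9.0.1", "0.10.2.1", "2.7.2")) {
            String verdict = isListedAsIncompatible(version)
                    ? "listed as incompatible"
                    : "not in the incompatible list";
            System.out.println(version + " -> " + verdict);
        }
    }
}
```

Matching on the release line plus a trailing dot keeps a version such as 0.10.x from being mistaken for the 0.1 prefix of an older line.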
Confluent Platform and Apache Kafka Compatibility
Confluent Platform | Apache Kafka® | Release Date | Standard End of Support | Platinum End of Support |
---|---|---|---|---|
7.1.x | 3.1.x | April 5, 2022 | April 5, 2024 | April 5, 2025 |
7.0.x | 3.0.x | October 27, 2021 | October 27, 2023 | October 27, 2024 |
6.2.x | 2.8.x | June 8, 2021 | June 8, 2023 | June 8, 2024 |
6.1.x | 2.7.x | February 9, 2021 | February 9, 2023 | February 9, 2024 |
6.0.x | 2.6.x | September 24, 2020 | September 24, 2022 | September 24, 2023 |
5.5.x | 2.5.x | April 24, 2020 | April 24, 2022 | April 24, 2023 |
5.4.x | 2.4.x | January 10, 2020 | January 10, 2022 | January 10, 2023 |
5.3.x | 2.3.x | July 19, 2019 | July 19, 2021 | July 19, 2022 |
5.2.x | 2.2.x | March 28, 2019 | March 28, 2021 | March 28, 2022 |
5.1.x | 2.1.x | December 14, 2018 | December 14, 2020 | December 14, 2021 |
5.0.x | 2.0.x | July 31, 2018 | July 31, 2020 | July 31, 2021 |
4.1.x | 1.1.x | April 16, 2018 | April 16, 2020 | April 16, 2021 |
4.0.x | 1.0.x | November 28, 2017 | November 28, 2019 | November 28, 2020 |
3.3.x | 0.11.0.x | August 1, 2017 | August 1, 2019 | August 1, 2020 |
3.2.x | 0.10.2.x | March 2, 2017 | March 2, 2019 | March 2, 2020 |
3.1.x | 0.10.1.x | November 15, 2016 | November 15, 2018 | November 15, 2019 |
3.0.x | 0.10.0.x | May 24, 2016 | May 24, 2018 | May 24, 2019 |
2.0.x | 0.9.0.x | December 7, 2015 | December 7, 2017 | December 7, 2018 |
1.0.0 | – | February 25, 2015 | February 25, 2017 | February 25, 2018 |
CFK Version | Compatible Confluent Platform Versions | Compatible Kubernetes Versions | Release Date | End of Support |
---|---|---|---|---|
2.3.x | 7.0.x, 7.1.x | 1.18 - 1.23 (OpenShift 4.6 - 4.10) | April 5, 2022 | April 5, 2023 |
2.2.x | 6.2.x, 7.0.x | 1.17 - 1.22 (OpenShift 4.6 - 4.9) | November 3, 2021 | November 3, 2022 |
2.1.x | 6.0.x, 6.1.x, 6.2.x | 1.17 - 1.22 (OpenShift 4.6 - 4.9) | October 12, 2021 | October 12, 2022 |
2.0.x | 6.0.x, 6.1.x, 6.2.x | 1.15 - 1.20 | May 12, 2021 | May 12, 2022 |
Spring Cloud Stream Version | Spring for Apache Kafka Version | Spring Integration for Apache Kafka Version | kafka-clients | Spring Boot | Spring Cloud |
---|---|---|---|---|---|
3.1.x (2020.0.x) | 2.6.x | 5.4.x | 2.6.x | 2.4.x | 2020.0.x |
3.0.x (Horsham)* | 2.5.x, 2.3.x | 3.3.x, 3.2.x | 2.5.x, 2.3.x | 2.3.x, 2.2.x | Hoxton* |
- Chapter 1 - A Rapid Introduction to Kafka
- Chapter 2 - Getting Started with Kafka Streams
- Chapter 3 - Stateless Processing (Sentiment Analysis of Cryptocurrency Tweets)
- Chapter 4 - Stateful Processing (Video game leaderboard)
- Chapter 5 - Windows and Time (Patient Monitoring / Infection detection application)
- Chapter 6 - Advanced State Management
- Chapter 7 - Processor API (Digital Twin / IoT application)
- Chapter 8 - Getting Started with ksqlDB
- Chapter 9 - Data Integration with ksqlDB and Kafka Connect
- Chapter 10 - Stream Processing Basics with ksqlDB (Netflix Change Tracking - Part I)
- Chapter 11 - Intermediate Stream Processing with ksqlDB (Netflix Change Tracking - Part II)
- Chapter 12 - The Road to Production
- Kafka Streams and ksqlDB greatly simplify the process of building stream processing applications
- As an added benefit, they are also both extremely fun to use
- Kafka is the fourth-fastest-growing tech skill mentioned in job postings from 2014 to 2019. Sharpening your skills in this area has career benefits
- By learning Kafka Streams and ksqlDB, you will be well prepared to tackle a wide range of business problems, including streaming ETL, data enrichment, anomaly detection, data masking, data filtering, and more
- Star this repo
- Follow @kafka_book on Twitter (character limits... sigh)
- Provide feedback on the book and code by either:
  - filling out a 6-question survey, or
  - emailing [email protected]
- Subscribe to an early preview of additional chapters from the website, kafka-streams-book.com
- Share the book, website, and/or code with your friends
For comparison, see the Confluent white paper “Five Stages to Streaming Platform Adoption,” which offers a different perspective: a five-stage streaming maturity model with distinct criteria for each stage.
Kafka Streams is optimized for processing unbounded datasets quickly and efficiently, and is therefore a great solution for problems in low-latency, time-critical domains. A few example use cases include:
- Financial data processing (Flipkart), purchase monitoring, fraud detection
- Algorithmic trading
- Stock market/crypto exchange monitoring
- Real-time inventory tracking and replenishment (Walmart)
- Event booking, seat selection (Ticketmaster)
- Email delivery tracking and monitoring (Mailchimp)
- Video game telemetry processing (Activision, the publisher of Call of Duty)
- Search indexing (Yelp)
- Geospatial tracking/calculations (e.g., distance comparison, arrival projections)
- Smart Home/IoT sensor processing (sometimes called AIoT, or the Artificial Intelligence of Things)
- Change data capture (Red Hat)
- Sports broadcasting/real-time widgets (Gracenote)
- Real-time ad platforms (Pinterest)
- Predictive healthcare, vitals monitoring (Children’s Healthcare of Atlanta)
- Chat infrastructure (Slack), chat bots, virtual assistants
- Machine learning pipelines (Twitter) and platforms (Kafka Graphs)
The list goes on and on, but the common characteristic across all of these examples is that they require (or at least benefit from) real-time decision making or data processing. The spectrum of these use cases, and others you will encounter in the wild, is really quite fascinating. On one end of the spectrum, you may be processing streams at the hobbyist level by analyzing sensor output from a Smart Home device. On the other end, you could use Kafka Streams in a healthcare setting to monitor and react to changes in a trauma victim’s condition, as Children’s Healthcare of Atlanta has done.
Kafka Streams is also a great choice for building microservices on top of real-time event streams. It not only simplifies typical stream processing operations (filtering, joining, windowing, and transforming data), but as you will see in “Interactive Queries”, it is also capable of exposing the state of a stream using a feature called interactive queries. The state of a stream could be an aggregation of some kind (e.g., the total number of views for each video in a streaming platform) or even the latest representation for a rapidly changing entity in your event stream (e.g., the latest stock price for a given stock symbol).
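To make the two kinds of state mentioned above more concrete, here is a minimal sketch that uses plain Java collections rather than the Kafka Streams API. It assumes events arrive as an in-memory list, and every name in it (`StreamStateSketch`, `countByKey`, `latestByKey`, the sample keys and prices) is hypothetical. It only illustrates what "an aggregation" and "the latest representation" mean for a keyed stream, not how Kafka Streams stores or serves that state.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamStateSketch {
    // Aggregation: running count of events per key,
    // e.g. the total number of views for each video.
    static Map<String, Long> countByKey(List<String> keys) {
        Map<String, Long> counts = new HashMap<>();
        for (String key : keys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    // Latest representation: each new event overwrites the previous value
    // for its key, e.g. the most recent price for a stock symbol.
    static Map<String, Double> latestByKey(List<Map.Entry<String, Double>> events) {
        Map<String, Double> latest = new HashMap<>();
        for (Map.Entry<String, Double> event : events) {
            latest.put(event.getKey(), event.getValue());
        }
        return latest;
    }

    public static void main(String[] args) {
        // Hypothetical view events, keyed by video ID.
        Map<String, Long> views = countByKey(List.of("video-1", "video-2", "video-1"));
        System.out.println("views for video-1: " + views.get("video-1"));

        // Hypothetical price updates, in arrival order.
        Map<String, Double> prices = latestByKey(List.of(
                Map.entry("ABC", 10.0),
                Map.entry("XYZ", 5.5),
                Map.entry("ABC", 10.25)));
        System.out.println("latest ABC price: " + prices.get("ABC"));
    }
}
```

In both cases the state is a keyed lookup that is updated one event at a time, which is the kind of view a microservice could serve to its callers.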
Now that you have some idea of who is using Kafka Streams and what kinds of use cases it is well suited for, let’s take a quick look at Kafka Streams’ architecture before we start writing any code.