spark-structured-streaming-on-iae-to-scylladb

Spark Structured Streaming from IBM Message Hub to ScyllaDB using Spark on IBM Analytics Engine

Introduction

This project takes the continuous data set that dataset-generator produces to IBM Message Hub (Kafka) and stores it in IBM Compose ScyllaDB, using Apache Spark Structured Streaming on IBM Analytics Engine.

!WARNING! This project only works with Apache Spark 2.2.x whereas IBM Analytics Engine has Apache Spark 2.3.x. This project will not yet run on IBM Analytics Engine. See here for details of the upstream issue.
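Because of this version constraint, it can help to fail fast when the wrong Spark version is in use. The following is a minimal sketch, not part of this project: the function name `is_supported_spark` is hypothetical, and how you obtain the version string (for example from your build definition or cluster documentation) is left as an assumption.

```shell
# Hypothetical helper: succeed only for Apache Spark 2.2.x version strings.
is_supported_spark() {
    case "$1" in
        2.2.*) return 0 ;;   # any 2.2 patch release
        *)     return 1 ;;   # 2.3.x and everything else is rejected
    esac
}
```

For example, `is_supported_spark "2.3.0" || echo "unsupported Spark version"` prints the message, whereas `is_supported_spark "2.2.1"` succeeds silently.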

Acknowledgements

This project is based on https://github.com/polomarcus/Spark-Structured-Streaming-Examples

Prerequisites

You have an IBM Cloud account
You have followed the instructions in the project kafka-producer-for-simulated-data to create a continuous stream of data on Kafka
You have an IBM Compose ScyllaDB instance running in IBM Cloud
You have an IBM Analytics Engine (1.1) instance running in IBM Cloud
You have SBT 1.2.1+ installed (instructions)
You have the Cassandra cqlsh command installed and configured (instructions)

Optional

You have Scala knowledge (you will only need this if you want to change the demo functionality)

Setup

Clone this project

git clone https://github.com/ibm-cloud-streaming-retail-demo/spark-structured-streaming-on-iae-to-scylladb
cd spark-structured-streaming-on-iae-to-scylladb/

Copy the template files:

cp ./jaas_mh.conf_template jaas_mh.conf
cp ./cassandra.properties_template cassandra.properties
cp ./messagehub.properties_template messagehub.properties
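Re-running the cp commands above would overwrite files you have already edited with credentials. As an optional alternative, the sketch below copies each template only when its target does not exist yet. The function name `copy_templates` is hypothetical and not part of this project; it assumes the template files follow the `*_template` naming shown above.

```shell
# Hypothetical helper: copy every *_template file in a directory to its
# target name, leaving targets that already exist untouched.
# Usage: copy_templates [dir]   (defaults to the current directory)
copy_templates() {
    dir="${1:-.}"
    for tpl in "$dir"/*_template; do
        [ -e "$tpl" ] || continue            # the glob matched nothing
        target="${tpl%_template}"            # strip the _template suffix
        if [ -e "$target" ]; then
            echo "keeping existing $(basename "$target")"
        else
            cp "$tpl" "$target"
            echo "created $(basename "$target")"
        fi
    done
}
```

Calling `copy_templates` from the project root would create jaas_mh.conf, cassandra.properties and messagehub.properties on the first run and leave them alone afterwards.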

Edit jaas_mh.conf with your Message Hub username and password
Edit cassandra.properties with your ScyllaDB connection details
Edit messagehub.properties with your Message Hub connection details
Open a cqlsh session and paste the contents of schema.cql to create the ScyllaDB schema.
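Before moving on, you may want to confirm that the three edited configuration files are actually in place. The following is a minimal sketch under that assumption; the function name `check_config_files` is hypothetical and not part of this project, and it only checks that each file exists and is non-empty, not that its contents are valid.

```shell
# Hypothetical helper: report which of the three edited config files are
# missing or empty. Returns non-zero if any of them is.
# Usage: check_config_files [dir]   (defaults to the current directory)
check_config_files() {
    dir="${1:-.}"
    missing=0
    for f in jaas_mh.conf cassandra.properties messagehub.properties; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

One way to use it is as a guard, for example `check_config_files && echo "configuration files present"`.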

Running

rm -rf checkpoint/ && SBT_OPTS="-XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=6G -Xmx6G" sbt clean run

Developing

Developed with IntelliJ IDEA.
