Welcome to the real-time graph ETL for modern data pipelines with Quine and Apache Cassandra workshop! In this two-hour workshop, we show how to combine a scalable database, Apache Cassandraā¢
, with a powerful real-time streaming graph processor, Quine
.
ā²ļø Duration : 2 hours
š Level : Intermediate
Using Astra DB, the cloud-based Cassandra-as-a-Service platform delivered by DataStax, we will cover the very first steps for every developer who wants to try to learn a new database: creating tables and CRUD operations.
Quine, from the team at thatDot, is a real-time streaming graph interpreter. A server-side program that consumes data, builds a graph structure, and runs live computation on that graph to answer questions or compute results, and then stream them out.
It doesn't matter if you join our workshop live or you prefer to do it at your own pace, we have you covered. In this repository, you'll find everything you need for this workshop:
- Objectives
- Frequently asked questions
- Materials for the Session
- Create your Database
- Setup Quine
- Graph Exploration
- Homework
- What's NEXT
1ļøā£ Give you an understanding of Quine streaming graph
2ļøā£ Show how to configure Quine to use Cassandra or Astra DB
3ļøā£ Provide an overview of Quine's Architecture
4ļøā£ Have you run a streaming scenario to detect an anomaly in the graph
š Have fun with an interactive session!
1ļøā£ Can I run this workshop on my computer?
There is nothing preventing you from running the workshop on your own machine. If you do so, you will need the following installed on your local system:
- git
- Java 11
2ļøā£ What other prerequisites are required?
- You will need enough *real estate* on screen, we will ask you to open a few windows and it would not fit on mobiles (tablets should be OK)
- You will need an Astra account: don't worry, we'll work through that in the following
- As "Intermediate level" we expect you to know what Java is. If you don't know what a streaming graph is, or haven't used a graph system before, that's ok...we'll cover that.
3ļøā£ Do I need to pay for anything for this workshop?
No. All tools and services we provide here are FREE. FREE not only during the session but also after.
4ļøā£ Will I get a certificate or badge if I attend this workshop?
Attending the session is not enough. You need to complete the homework detailed below and you will get a nice badge that you can share on LinkedIn or anywhere else *(open badge specification)*
It doesn't matter if you join our workshop live or you prefer to work at your own pace, we have you covered. In this repository, you'll find everything you need for this workshop:
ASTRA DB
is the simplest way to run Cassandra with zero operations at all - just push the button and get your cluster. No credit card required, 40M read/write operations and about 80GB storage monthly for free - sufficient to run small production workloads. If you end your credits the databases will pause, no charge
Leveraging the database creation guide, create a database. Right-Click the button with Open in a new TAB.
Field | Value |
---|---|
Database Name | workshops |
Keyspace Name | quine |
Regions | Select GOOGLE CLOUD , then Moncks Corner (us-east1) OR an Area close to you, then a region with no LOCK š icons, those are the region you can use for free. |
ā¹ļø Note: If you already have a database
workshops
, simply add a keyspacequine
using theAdd Keyspace
button on the bottom right hand corner of db dashboard page.
While the database is being created, you will also get a Security token:
save it somewhere safe, as it will be needed later in other workshops (In particular the string starting with AstraCS:...
.)
The status will change from Pending
to Active
when the database is ready, this will only take 2-3 minutes. You will also receive an email when it is ready.
ā ļø ImportantThe instructor will show you on screen how to create a token but will have to destroy to token immediately for security reasons.
These instructions were written using Java 11.10. To run Quine locally, follow the steps below. Or, you can run it in GitPod:
The Astra Shell tool will be included in the GitPod build. Once the environment has completed its initial build, you should see the following at the bottom of the terminal:
Open A NEW TERMINAL and run: astra setup
Open a new terminal by clicking the plus ( + ) icon on the upper-right corner of the current terminal window. Then run:
astra setup
You will be prompted to enter (paste) your token.
ā¢ Enter your token (starting with AstraCS) :
Once you it enter, you should see output similar to below:
[[email protected]]
> ASTRA_DB_APPLICATION_TOKEN=AstraCS:AAAAAAAA
>
> [What's NEXT ?]
> You are all set.(configuration is stored in ~/.astrarc) You can now:
> ā¢ Use any command, 'astra help' will get you the list
> ā¢ Try with 'astra db list'
> ā¢ Enter interactive mode using 'astra'
>
> Happy Coding!
Verify the setup with the following command which should list all the databases:
astra db list
Open a new terminal window and run Quine by executing:
./start.sh
Follow the Download Quine page to download the JAR. Choose/create a directory for Quine, and copy the JAR to another location:
mkdir -p ~/local/quine
cp quine-1.3.2.jar ~/local/quine/
Create a quine.conf
file inside the quine directory:
cd ~/local/quine
touch quine.conf
Edit the quine.conf
file:
quine.store {
# store data in an Apache Cassandra instance
type = cassandra
# the keyspace to use
keyspace = quine
should-create-keyspace = false
should-create-tables = true
replication-factor = 3
write-consistency = LOCAL_QUORUM
read-consistency = LOCAL_QUORUM
local-datacenter = ${ASTRA_DB_REGION}
write-timeout = "10s"
read-timeout = "10s"
}
datastax-java-driver {
advanced {
auth-provider {
class = PlainTextAuthProvider
username = "token"
password = ${ASTRA_DB_APP_TOKEN}"
}
}
basic {
cloud {
secure-connect-bundle = "/workspace/workshop-streaming-graph-quine/secure-connect-workshops.zip"
}
}
}
type = cassandra
- If the type is not specified, Quine defaults to use RocksDB.
should-create-keyspace = false
- Remember keyspaces can only be created in Astra via the dashboard.
replication-factor = 3
- Defaults to 1 if not set, which will not work with Astra DB.
write-consistency = LOCAL_QUORUM
- Minimum consistency level required by Astra.
read-consistency = LOCAL_QUORUM
- Minimum consistency level required by Astra.
local-datacenter = "us-east1"
- Set your Astra DB cloud region as the local DC.
username = "token"
- No need to mess with this. Just leave it as the literal word "token."
password
- A valid token for an Astra DB cluster.
secure-connect-bundle
- A valid, local file location of a downloaded Astra secure connect bundle. The driver gets the Astra DB hostname from the secure bundle, so there is no need to specify endpoints separately.
Download the "Password Spraying" recipe from Github. Move the resulting YAML file to your quine
directory.
mkdir ~/local/quine
cp password-spraying-workshop.yml ~/local/quine/
Then, download the sample data file" and move the JSON file to your quine
directory.
mkdir ~/local/quine
cp attempts.json ~/local/quine/
Alternatively, if you have cloned this repository and are working in the repo directory locally, you can run the get_quine.sh
script. If successful, the script will download the latest version of Quine and the sample data for this workshop, and the script will output something similar to what you see below.
āÆ ./get_quine.sh
HTTP request sent, awaiting response... 200 OK
Length: 220704180 (210M) [application/octet-stream]
Saving to: āquine-1.3.2.jarā
quine-1.3.2.jar 100%[=========================================================>] 210.48M 20.5MB/s in 22s
2022-10-06 11:13:39 (9.78 MB/s) - āquine-1.3.2.jarā saved [220704180/220704180]
HTTP request sent, awaiting response... 200 OK
Length: 82243423 (78M) [application/json]
Saving to: āattempts.jsonā
attempts.json 100%[=========================================================>] 78.43M 15.4MB/s in 5.0s
2022-10-06 11:13:49 (15.8 MB/s) - āattempts.jsonā saved [82243423/82243423]
To run Quine using the Password Spray recipe, invoke the JAR with Java, while passing the quine.conf
as a config.file
JVM parameter, while also specifying the recipe, like this:
cd ~/local/quine
java -Dconfig.file=quine.conf -jar quine-1.3.2.jar -r password-spraying-workshop.yml --force-config
If Quine starts correctly, it should produce output similar to below:
2022-06-15 15:11:52,666 WARN [NotFromActor] [s0-io-4] com.datastax.oss.driver.internal.core.cql.CqlRequestHandler - Query '[0 values] CREATE TABLE IF NOT EXISTS journals (quine_id blob,timestamp bigint,data blob,PRIMARY KEY(quine_id,timestamp)) WITH CLUSTERING ORDER BY (timestamp ASC) AND compaction={'class':'TimeWindowCompactionStrategy'}' generated server side warning(s): Ignoring provided values [compaction] as they are not supported for Table Properties (ignored values are: [additional_write_policy, bloom_filter_fp_chance, caching, cdc, compaction, compression, crc_check_chance, dclocal_read_repair_chance, extensions, gc_grace_seconds, id, max_index_interval, memtable_flush_period_in_ms, min_index_interval, nodesync, read_repair, read_repair_chance, speculative_retry])
Graph is ready!
Application state loaded.
Quine app web server available at http://0.0.0.0:8080
Quine should then start ingesting the data stream automatically, displaying its progress as it moves along.
If the output does not read:
Graph is ready!
Application state loaded.
Quine app web server available at http://locahost:8080
Then look for exceptions.
If you see this error:
Downloading Astra secure connect bundle...
Picked up JAVA_TOOL_OPTIONS: -Xmx3489m
[WARN ] - Database workshops has not been found
[WARN ] - Database 'workshops' has not been found.
[ERROR] - NOT_FOUND: Database 'workshops' has not been found.
Download of SCB failed!
...this means that your token is not associated with a database named 'workshops'. Edit the .env
file with the following GitPod command:
gp open .env
Enter the correct name for your database, and rerun start.sh
.
If you see an error:
com.datastax.oss.driver.api.core.servererrors.InvalidQueryException: Clustering key columns must exactly match columns in CLUSTERING ORDER BY directive
...this means that one of tables (likely the snapshots
table) failed to CREATE properly. From GitPod, you can fix this by using the Astra Shell. If your DB name is different from "workshops," replace it as appropriate.
astra db cqlsh workshops
Check to ensure the snapshots
table exists:
cqlsh> use quine;
cqlsh> desc quine;
If not, execute this CQL in Astra Shell (or cqlsh if you're running locally) to create it:
CREATE TABLE quine.snapshots (
quine_id blob,
timestamp bigint,
multipart_index int,
data blob,
multipart_count int,
PRIMARY KEY (quine_id, timestamp, multipart_index)
) WITH CLUSTERING ORDER BY (timestamp DESC, multipart_index ASC)
AND additional_write_policy = '99PERCENTILE'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'BLOCKING'
AND speculative_retry = '99PERCENTILE';
If Quine starts, but you see a message which looks like this:
Standing Query STANDING-1 already exists
Standing Query STANDING-2 already exists
Ingest Stream INGEST-1 already exists
...that means that you'll need to clear the existing data before proceeding. You can do this by running the truncate_tables.sh
script from within GitPod.
Once you seen the message indicating that an anomaly has been found, you should be able to Ctrl+click or Command click to open the Quine Graph Explorer with a query for that pattern match. However, sometimes the new tab will open and the page load will fail due to an extra double-quote on the end of the query. Simply click on the query text well, and press End to go to the end of the line. If you see a double quote there, backspace over it.
You can now use Quine's visual graph explorer in a web browser, and create/traverse data with either Gremlin or Cypher: http://localhost:8080/
The attempts.json
contains the pattern for one password spraying attack. Quine will produce an alert in the console when the standing query matches the pattern. The alert will look similar to this:
2022-10-07 09:37:46,255 Standing query `alert` match: {"meta":{"isPositiveMatch":true,"resultId":"7242b979-03c2-2bc3-9879-13661e8359b5"},"data":{"QuineUILink":"Password Spraying Attack: http://localhost:8080/#MATCH%20(user)-[:ORIGINATED]-%3E(attempt1%20%7BoutcomeResult:%22FAILURE%22%7D)-[:NEXT]-%3E(attempt2%20%7BoutcomeResult:%22FAILURE%22%7D)-[:NEXT]-%3E(attempt3%20%7BoutcomeResult:%22FAILURE%22%7D)-[:NEXT]-%3E(attempt4%20%7BoutcomeResult:%22FAILURE%22%7D)-[:NEXT]-%3E(attempt5%20%7BoutcomeResult:%22SUCCESS%22%7D)%20WHERE%20id(attempt1)=%22cb73fb14-4686-3913-8cd8-7d4d608b53d5%22%20RETURN%20DISTINCT%20user%2Cattempt1%2Cattempt2%2Cattempt3%2Cattempt4%2Cattempt5"}}
When the alert arrives, hold down the command key on a MAC and right-click the link to open Quine.
You should be able to arrange the graph in your browser into a shape similar to this:
Explore the graph using the Exploration UI to follow the path of the attack in the event stream.
To stop Quine, you can simply hit Ctrl+c. Or, you can run the stop.sh
file from a terminal (locally or from within GitPod).
To submit the homework,
- Write a new recipe or extend an existing one and take SCREENSHOT(s).
- Visit the Quine documentation site
- Go through the Getting Started -> Quine Recipes tutorial
- Create a new recipe from scratch, or...
- Modify an existing recipe to do something new
- Email the SCREENSHOT(s) of Quine running your recipe in the browser and the recipe YAML file
EMAIL:
To: [email protected], [email protected]
Subject: Quine Homework
We've just scratched the surface of what you can do using Astra DB, built on Apache Cassandra. Go take a look at DataStax for Developers to see what else is possible. There's plenty to dig into!
Congratulations: you made to the end of today's workshop.
... and see you at our next workshop!
Sincerely yours, The DataStax Developers