Benchmarks for Jelly-JVM using Apache Jena and RDF4J.
See the Jelly website for more information about Jelly and its documentation.
You need at least Java 17 to compile the benchmark code. To run the benchmarks, we strongly recommend using the latest stable version of a modern JRE, preferably GraalVM in JIT mode.
This code was tested only on Linux. It may work on other platforms, but this has not been verified.
Use sbt to compile the project into a single JAR file:
sbt assembly
The resulting JAR will be placed in target/scala-3.*/benchmarks-assembly-*.jar. Use the path to the built JAR as the second argument to the benchmark scripts.
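For convenience, you can capture the path to the built JAR in a shell variable and pass it to the benchmark scripts (a small sketch; it assumes exactly one assembly JAR was built):
JAR=$(ls target/scala-3.*/benchmarks-assembly-*.jar)
echo "$JAR"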
The benchmarks can read RDF data from gzip-compressed Jelly files (preferred) or from compressed .tar.gz files. RiverBench datasets are distributed in both of these formats (see documentation), but Jelly is much faster to read from. You can manually download the datasets, but we recommend using the built-in download utility:
java -cp [path-to-benchmark-jar] eu.ostrzyciel.jelly.benchmark.runDownloadDatasetsUtility [profile] [version] [size] [output-directory]
- [profile] is the RiverBench profile to download, for example stream-mixed. See a list of available profiles here.
- [version] is the version of RiverBench to use, for example 2.1.0. We strongly recommend using the latest stable version (not dev!) to ensure reproducibility.
- [size] is the size of the datasets to download, for example: 10K, 100K, 1M, full. If you are not sure, use full. The benchmark scripts can use only a part of the downloaded dataset.
- [output-directory] is the directory where the datasets will be downloaded.
The script will download the datasets as .jelly.gz files.
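For example, to download the full-size datasets of the stream-mixed-rdfstar profile from RiverBench 2.1.0 into a ./datasets directory (the JAR path and output directory are illustrative – adjust them to your setup):
java -cp target/scala-3.*/benchmarks-assembly-*.jar eu.ostrzyciel.jelly.benchmark.runDownloadDatasetsUtility stream-mixed-rdfstar 2.1.0 full ./datasets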
The scripts directory contains scripts that automate running the benchmarks. The scripts assume you are using the stream-mixed-rdfstar profile of RiverBench – if you are using a different profile, you will need to modify the DATASETS variable in the scripts accordingly (see the sketch below). The same goes for any changes to benchmark parameters.
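As a purely hypothetical sketch (the actual variable format and dataset names depend on the script and on your chosen profile – inspect the script before editing), the change could look like this:
# In the relevant file under scripts/, list the datasets of your chosen profile, e.g.:
DATASETS="my-dataset-1 my-dataset-2"  # hypothetical dataset names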
./scripts/flat_size.sh [path-to-java-executable] [path-to-benchmark-jar] [directory-with-datasets]
- Runs the serialization size benchmark for flat RDF streams (streams of triples/quads). This benchmark runs with both Jena and RDF4J.
- [path-to-java-executable] is the path to the Java executable to use, for example: /usr/bin/java.
- [path-to-benchmark-jar] is the path to the compiled benchmark JAR (see: Compiling).
- [directory-with-datasets] is the directory with the downloaded datasets (see: Downloading the datasets).
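For example (the JAR path uses a glob and the dataset directory is illustrative; the same three-argument invocation pattern applies to the other size and serialization/deserialization scripts below):
./scripts/flat_size.sh /usr/bin/java target/scala-3.*/benchmarks-assembly-*.jar ./datasets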
./scripts/grouped_size.sh [path-to-java-executable] [path-to-benchmark-jar] [directory-with-datasets]
- Runs the serialization size benchmark for grouped RDF streams (streams of graphs/datasets). This benchmark runs with both Jena and RDF4J.
./scripts/flat_ser_des.sh [path-to-java-executable] [path-to-benchmark-jar] [directory-with-datasets]
- Runs the serialization/deserialization throughput benchmark for flat RDF streams (streams of RDF triples or quads). This benchmark runs only with Jena.
./scripts/flat_ser_des_rdf4j.sh [path-to-java-executable] [path-to-benchmark-jar] [directory-with-datasets]
- Runs the serialization/deserialization throughput benchmark for flat RDF streams (streams of RDF triples or quads). This benchmark runs only with RDF4J.
./scripts/grouped_ser_des.sh [path-to-java-executable] [path-to-benchmark-jar] [directory-with-datasets]
- Runs the serialization/deserialization speed benchmark for grouped RDF streams (streams of RDF graphs or datasets). This benchmark runs only with Jena.
./scripts/grouped_ser_des_rdf4j.sh [path-to-java-executable] [path-to-benchmark-jar] [directory-with-datasets]
- Runs the serialization/deserialization speed benchmark for grouped RDF streams (streams of RDF graphs or datasets). This benchmark runs only with RDF4J.
./scripts/grpc_throughput.sh [path-to-java-executable] [path-to-benchmark-jar] [directory-with-datasets] [server-port]
- Runs the end-to-end throughput benchmark using gRPC, on Jena.
- If you want to emulate specific network conditions for your benchmark (e.g., limited bandwidth, increased latency), see the section on network emulation below.
- [server-port] is the port on which the gRPC server will listen and to which the gRPC client will connect. If you want to use network emulation, set this to a port that you have configured in the network emulation script.
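For example, to run the benchmark on the unthrottled port from the network emulation setup described below (paths are illustrative):
./scripts/grpc_throughput.sh /usr/bin/java target/scala-3.*/benchmarks-assembly-*.jar ./datasets 8420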
./scripts/grpc_latency.sh [path-to-java-executable] [path-to-benchmark-jar] [directory-with-datasets] [server-port]
- Runs the end-to-end latency benchmark using gRPC, on Jena.
./scripts/kafka_throughput.sh [path-to-java-executable] [path-to-benchmark-jar] [directory-with-datasets] [broker-port-for-producer] [broker-port-for-consumer]
- Runs the end-to-end throughput benchmark using Apache Kafka, on Jena.
- This script requires a running Kafka cluster. See the section on running Kafka below.
- [broker-port-for-producer] and [broker-port-for-consumer] are the ports on which the Kafka broker is listening, used by the producer and the consumer, respectively. If you want to use network emulation, set these to ports that you have configured in the network emulation script.
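For example, one possible combination is an unthrottled producer connection on port 9092 and a consumer connection on the emulated port 9093 (paths are illustrative; see the Kafka and network emulation sections below for the port setup):
./scripts/kafka_throughput.sh /usr/bin/java target/scala-3.*/benchmarks-assembly-*.jar ./datasets 9092 9093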
./scripts/kafka_latency.sh [path-to-java-executable] [path-to-benchmark-jar] [directory-with-datasets] [broker-port-for-producer] [broker-port-for-consumer]
- Runs the end-to-end latency benchmark using Apache Kafka, on Jena.
- This script requires a running Kafka cluster. See the section on running Kafka below.
For Kafka benchmarks, you need a local Kafka cluster. You can run it yourself, but we recommend using the provided Docker Compose configuration. You can find it, along with further instructions, in the scripts/kafka directory. By default, the broker will listen on ports 9092, 9093, and 9094.
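Assuming the provided configuration works with the standard Docker Compose workflow (check the instructions in scripts/kafka for the exact commands), starting and stopping the cluster could look like this:
cd scripts/kafka
docker compose up -d   # start the Kafka broker in the background
docker compose down    # stop and remove the containers when you are done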
To emulate specific network conditions in gRPC and Kafka benchmarks, you can use the netem kernel module in Linux.
In our benchmarks we used the following configuration:
- Ports 9092 and 8420: no emulation
- Ports 9093 and 8421: 100 Mbit/s bandwidth, 10 ms latency
- Ports 9094 and 8422: 50 Mbit/s bandwidth, 15 ms latency
To apply this configuration, run:
./scripts/set_netem.sh 10 100 9093 8421 2
./scripts/set_netem.sh 15 50 9094 8422 3
You can verify the emulated latencies with netcat:
time nc -zw30 localhost 9092
time nc -zw30 localhost 9093
time nc -zw30 localhost 9094
The commands should show round-trip latencies of approximately 0 ms, 20 ms, and 30 ms, respectively.
To remove the network emulation rules afterwards, run:
sudo tc qdisc del dev lo root
The code in this repository was written by Piotr Sowiński (Ostrzyciel).
This project is licensed under the Apache License 2.0. See the LICENSE file for details.