Build and Debug

Background

Druid contribution guidelines

Python

Druid requires Python (possibly version 2) plus pyyaml. Using Python 3 requires edits to two Python scripts.

Java

Druid officially uses Java 8. However, developers have found that everything except a few extensions works well with Java 11. If you launch Druid with a version other than 8, you'll be asked to set an environment variable:

export DRUID_SKIP_JAVA_CHECK=1

The tutorial appears to work fine with Java 11 and 14.
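If you switch between JDKs, a small bash helper can decide whether the variable is needed. This is a sketch: the function names are hypothetical, and it assumes `java -version` prints the version in double quotes on its first line, as OpenJDK builds do.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for deciding whether Druid's Java 8 check must be skipped.

# Print the major version of a Java version string:
# "1.8.0_292" -> 8, "11.0.12" -> 11, "17" -> 17.
java_major_version() {
  local ver="$1"
  # Java 8 and earlier use the old "1.<major>..." scheme.
  if [[ "$ver" == 1.* ]]; then
    ver="${ver#1.}"
  fi
  # Drop everything from the first ".", "_" or "-" onwards.
  echo "${ver%%[._-]*}"
}

# Print the quoted version from the first line of `java -version`,
# e.g. openjdk version "11.0.12" 2021-07-20 -> 11.0.12
installed_java_version() {
  java -version 2>&1 | head -n 1 | sed -E 's/.*"([^"]+)".*/\1/'
}

# Export DRUID_SKIP_JAVA_CHECK=1 unless the given (or installed) Java is 8.
maybe_skip_java_check() {
  local major
  major="$(java_major_version "${1:-$(installed_java_version)}")"
  if [[ "$major" == "8" ]]; then
    unset DRUID_SKIP_JAVA_CHECK
  else
    export DRUID_SKIP_JAVA_CHECK=1
  fi
}
```

Source the file rather than executing it (the export only survives in the current shell when sourced), then run `maybe_skip_java_check` before launching Druid.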

Java 8

Other products you use may need a newer Java. If you must use Java 8, then on the Mac use jenv and brew to manage multiple JDKs. See this post. Also, see this post for Java 8 in particular.

brew install jenv
brew tap adoptopenjdk/openjdk
brew install --cask adoptopenjdk8
jenv add /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home

Java Version in Eclipse

If you use the newest Eclipse, you must change the JVM used to run Druid. The newest release comes configured to use Java 16 by default, which strictly enforces module encapsulation, so you'll get the following exception:

...module java.base does not "opens java.lang" to unnamed module...

See this StackOverflow article.

The solution is to configure Eclipse to use Java 11 instead.

Build Scripts

mvn clean package -Pdist -Pskip-static-checks -Pskip-tests -Dmaven.javadoc.skip=true -T1.0C

Note that if the Java version is not 8 (say, 11), the above build will appear to succeed, but the tar.gz file is not produced.

Or, define an alias that skips tests and static checks:

# Druid aliases
alias druid-quick="mvn -T 8 -DskipTests -Dforbiddenapis.skip=true -Dcheckstyle.skip=true -Dpmd.skip=true -Dmaven.javadoc.skip=true -Danimal.sniffer.skip=true -Denforcer.skip=true -Dspotbugs.skip=true clean install"
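Since a non-Java-8 build can appear to succeed without producing the tarball, it may be worth checking for it explicitly after the `-Pdist` build. A bash sketch; `check_druid_dist` is a hypothetical name, and the path is the one the install steps below unpack.

```bash
# Hypothetical helper: verify that the distribution tarball was produced.
# Usage: check_druid_dist <path to druid source> <druid version>
check_druid_dist() {
  local dev="$1" ver="$2"
  local tarball="$dev/distribution/target/apache-druid-${ver}-bin.tar.gz"
  if [[ -f "$tarball" ]]; then
    echo "$tarball"
  else
    echo "Missing $tarball; was the build run with Java 8 and -Pdist?" >&2
    return 1
  fi
}
```

For example, `check_druid_dist ~/git/druid 0.23.0-SNAPSHOT` prints the tarball path on success and fails with a message otherwise.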

Standalone Launch

See the documentation.

Create a directory for the Druid install, say ~/bin.

export DRUID_DEV=<path to druid>
export BIN_DIR=~/bin
export DRUID_VER=0.23.0-SNAPSHOT
export DRUID_HOME=$BIN_DIR/apache-druid-${DRUID_VER}
mkdir -p $BIN_DIR
cd $BIN_DIR
tar -xzf $DRUID_DEV/distribution/target/apache-druid-${DRUID_VER}-bin.tar.gz
cd $DRUID_HOME

It can be handy to put the environment variables in the shell startup script (.zshrc on the Mac), and the other commands in a script to run after each build.
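A sketch of such a post-build script, written as a bash function. The name is hypothetical, it assumes DRUID_DEV, BIN_DIR and DRUID_VER are set as above, and it deletes any earlier install of the same version, including its var directory and any ingested data.

```bash
# Hypothetical post-build step: unpack the freshly built tarball into
# $BIN_DIR, replacing any earlier install of the same version.
install_druid() {
  if [[ -z "$DRUID_DEV" || -z "$BIN_DIR" || -z "$DRUID_VER" ]]; then
    echo "Set DRUID_DEV, BIN_DIR and DRUID_VER first" >&2
    return 1
  fi
  local tarball="$DRUID_DEV/distribution/target/apache-druid-${DRUID_VER}-bin.tar.gz"
  local home="$BIN_DIR/apache-druid-${DRUID_VER}"
  if [[ ! -f "$tarball" ]]; then
    echo "No tarball at $tarball" >&2
    return 1
  fi
  mkdir -p "$BIN_DIR" || return 1
  # Deletes the old install, including var/ and any ingested data.
  rm -rf "$home"
  tar -xzf "$tarball" -C "$BIN_DIR" || return 1
  echo "Installed to $home"
}
```

After each build, `install_druid && cd "$DRUID_HOME"` leaves you ready to launch.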

Note: if using Jupyter, its default port (8888) is the same as Druid's default port. Change one of them. For Jupyter:

jupyter notebook --port 9000

If using Java 11 with the micro-quickstart configuration ($DRUID_HOME/conf/supervise/single-server/micro-quickstart.conf), add the following line:

--add-exports=java.base/jdk.internal.ref=ALL-UNNAMED

To the following file:

$DRUID_HOME/conf/druid/single-server/micro-quickstart/router/jvm.config

The line may be needed for the other services as well. Druid will fail with a clear log message if you need to add it elsewhere.

And set the following:

export DRUID_SKIP_JAVA_CHECK=1
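If several services end up needing the line, editing each jvm.config by hand gets tedious. A bash sketch of an idempotent helper that appends an option to every service's jvm.config under a config directory; the name is hypothetical, and adding it to all services (rather than only those whose logs ask for it) is an assumption.

```bash
# Hypothetical helper: append a JVM option to each <service>/jvm.config
# under a Druid config directory, skipping files that already contain it.
# Usage: add_jvm_option <config dir> <option>
add_jvm_option() {
  local confdir="$1" opt="$2" f
  for f in "$confdir"/*/jvm.config; do
    # Directories such as _common have no jvm.config.
    [[ -f "$f" ]] || continue
    # -x: whole line, -F: fixed string, -e: option text may start with "-"
    if ! grep -qxF -e "$opt" "$f"; then
      # Add a newline first if the file does not end with one.
      if [[ -s "$f" && -n "$(tail -c 1 "$f")" ]]; then
        printf '\n' >> "$f"
      fi
      printf '%s\n' "$opt" >> "$f"
    fi
  done
}
```

For example: `add_jvm_option "$DRUID_HOME/conf/druid/single-server/micro-quickstart" --add-exports=java.base/jdk.internal.ref=ALL-UNNAMED`. Running it twice changes nothing the second time.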

Launch locally:

cd $DRUID_HOME
./bin/start-micro-quickstart

Visit the UI: http://localhost:8888

This configuration appears to include the needed ZooKeeper and metadata database.
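Rather than refreshing the browser until the UI appears, you can poll the router. A bash sketch, assuming the router answers `GET /status/health` with `true` once it is up and that curl is installed; the function name and retry defaults are arbitrary.

```bash
# Hypothetical helper: poll a Druid health endpoint until it answers "true".
# Usage: wait_for_druid [url] [attempts] [delay seconds]
wait_for_druid() {
  local url="${1:-http://localhost:8888/status/health}"
  local attempts="${2:-60}" delay="${3:-2}" i
  for ((i = 1; i <= attempts; i++)); do
    # -s: silent, -f: treat HTTP errors as failures
    if [[ "$(curl -sf "$url" 2>/dev/null)" == "true" ]]; then
      echo "Druid is up at ${url%/status/health}"
      return 0
    fi
    sleep "$delay"
  done
  echo "Gave up waiting for $url" >&2
  return 1
}
```

Run `./bin/start-micro-quickstart` in one terminal and `wait_for_druid` in another.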

Druid Configuration

Every system has its own way of handling configuration, and that configuration has to work in both production and development environments. Druid runs each of its services as a distinct Java process. The chain of events is:

$DRUID_HOME/bin/start-micro-quickstart - A shell script which invokes the supervise script with the configuration to run.
$DRUID_HOME/conf/supervise/single-server/micro-quickstart.conf - The configuration read by the Perl supervise script, with one line per service, such as:
broker bin/run-druid broker conf/druid/single-server/micro-quickstart

Each service is configured via a specific directory, pointed to by the above files. For example, $DRUID_HOME/conf/druid/single-server/micro-quickstart. There is one directory per service. Within a directory, say broker, there are three files:

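jvm.config - JVM options passed on the Java command line.
main.config - The main class and its arguments (for example, org.apache.druid.cli.Main server broker), appended to the Java command line.
runtime.properties - Druid properties that configure Druid itself. Druid reads this file from the class path, which is why the service directory is placed on the class path.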

In production, a script assembles this information into a launch command. Our job, when running in the debugger, is to use the IDE to do this work.
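To make the chain concrete, here is a bash sketch that prints the java command a service directory implies. It imitates, but is not, Druid's bin/run-druid: the function name is hypothetical, it assumes one JVM option per line in jvm.config and the main class plus arguments on the first line of main.config, and it leaves out the optional hadoop-xml and ../_common class path entries.

```bash
# Hypothetical sketch: print the java command implied by a service directory.
# Usage: druid_launch_command <druid home> <config dir> <service>
druid_launch_command() {
  local home="$1" confdir="$2" service="$3"
  local svcdir="$confdir/$service"
  # Service directory first, then shared config, then Druid's jars.
  # runtime.properties and common.runtime.properties are found via these
  # directories on the class path, not passed on the command line.
  local cp="$svcdir:$confdir/_common:$home/lib/*"
  local -a jvm_opts main_args
  local line
  # jvm.config: one JVM option per line; skip blank lines.
  while IFS= read -r line || [[ -n "$line" ]]; do
    if [[ -n "${line//[[:space:]]/}" ]]; then
      jvm_opts+=("$line")
    fi
  done < "$svcdir/jvm.config"
  # main.config: main class followed by its arguments, on one line.
  read -r -a main_args < "$svcdir/main.config"
  echo java "${jvm_opts[@]}" -cp "$cp" "${main_args[@]}"
}
```

For example, `druid_launch_command "$DRUID_HOME" "$DRUID_HOME/conf/druid/single-server/micro-quickstart" broker` shows roughly what the IDE launch configuration below has to reproduce.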

Configure Eclipse

Druid must run as four processes (at least). We don't want to launch all four from the IDE. A trick from Gian is to start with a pre-built Druid: either one downloaded from the Druid project, or built locally. Let's assume we're using the one we built above. We start by running the "cluster" using the standard script as shown above. Ingest data and ensure all works properly.

First, while the micro-quickstart cluster is running, let's get the configuration we need.

ps aux | grep "[j]ava -server.*apache-druid" > /tmp/services.txt

In your IDE, create a launch configuration. These are the instructions for Eclipse:

Kind: Java Application
Name: Historical
Project: services (the module containing org.apache.druid.cli.Main)
Main Class: org.apache.druid.cli.Main
Program Arguments: server historical
Working Directory: the value of $DRUID_HOME
JVM Arguments: see below
Dependencies/Classpath: see below

JVM Arguments: Here we want to doctor up the arguments we captured above. Pull out the JVM arguments other than the class path and the standard Java setup. Here is an example; double-check that it is valid in your case. Also, note that $DRUID_HOME is not available in Eclipse, so fill in the actual path.

-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
--add-exports java.base/jdk.internal.perf=ALL-UNNAMED
--add-exports jdk.management/com.sun.management.internal=ALL-UNNAMED

Add the two --add-exports lines only if running Java 11 or later.
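Picking the JVM arguments out of /tmp/services.txt by eye is error-prone. A bash sketch of a hypothetical helper that lists them for one captured line; it assumes arguments contain no spaces (ps joins them with spaces) and that the first non-option word after the JVM options is the main class.

```bash
# Hypothetical helper: print the JVM options from one captured `ps aux` line,
# one per line, dropping ps columns, the java binary, the class path, and
# everything from the main class onwards.
# Usage: extract_jvm_args "<one line from /tmp/services.txt>"
extract_jvm_args() {
  local -a words
  local i=0 n
  read -r -a words <<< "$1"
  n=${#words[@]}
  # Skip ps columns up to and including the java executable.
  while (( i < n )) && [[ "${words[i]}" != java && "${words[i]}" != */java ]]; do
    i=$((i + 1))
  done
  i=$((i + 1))
  while (( i < n )); do
    case "${words[i]}" in
      -cp|-classpath|--class-path)
        # Drop the class path option and its value.
        i=$((i + 2)); continue ;;
      --add-exports|--add-opens|--add-modules)
        # Space-separated form: keep the option and its value together.
        printf '%s %s\n' "${words[i]}" "${words[i+1]}"
        i=$((i + 2)); continue ;;
      -*)
        printf '%s\n' "${words[i]}" ;;
      *)
        # First non-option word is the main class: stop here.
        break ;;
    esac
    i=$((i + 1))
  done
}
```

For the historical, something like `extract_jvm_args "$(grep 'server historical' /tmp/services.txt)"` prints a list you can paste into the JVM Arguments box after review.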

Some other settings you might want to use:

-Dlog4j.configurationFile=<some path>/log4j2.xml

The command line we captured earlier identifies config files we need on the class path. Go to the Dependencies tab, click on Classpath, then "Advanced", and select "Add External Folder". Repeat to add each of the following:

$DRUID_HOME/conf/druid/single-server/micro-quickstart/historical
$DRUID_HOME/conf/druid/single-server/micro-quickstart/_common

Oddly, the following entries on the captured command line don't seem to exist, so there is no need to add them:

$DRUID_HOME/conf/druid/single-server/micro-quickstart/_common/hadoop-xml
$DRUID_HOME/conf/druid/single-server/micro-quickstart/../_common
$DRUID_HOME/conf/druid/single-server/micro-quickstart/../_common/hadoop-xml

Move the two folders you added to the top of the classpath list to mimic the captured command line.

To verify that all is good, click "Show Command Line". Ignore the Eclipse-provided items in the class path. Ensure that the rest looks like the command line we captured above.
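To see which class path entries from the captured command line actually exist, a small bash helper can check each one. This is a sketch; the name is hypothetical, and since relative entries resolve against the current directory, run it from $DRUID_HOME.

```bash
# Hypothetical helper: report whether each entry of a ":"-separated class
# path exists. Wildcard entries (lib/*) are reported but not checked.
check_classpath() {
  local -a entries
  local e
  IFS=: read -r -a entries <<< "$1"
  for e in "${entries[@]}"; do
    if [[ "$e" == *'*' ]]; then
      printf 'wildcard %s\n' "$e"
    elif [[ -e "$e" ]]; then
      printf 'ok %s\n' "$e"
    else
      printf 'missing %s\n' "$e"
    fi
  done
}
```

For example, `(cd "$DRUID_HOME" && check_classpath "<class path copied from /tmp/services.txt>")`.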

Launch from Eclipse

Shut down the cluster by typing ^C (control-C) in the console window where the cluster is running.

Now, find the supervise config file mentioned above, say $DRUID_HOME/conf/supervise/single-server/micro-quickstart.conf. Comment out the service we want to run from the IDE, say historical:

#historical bin/run-druid historical conf/druid/single-server/micro-quickstart
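Editing the supervise file by hand each time gets old, especially when you later comment out the broker as well. A bash sketch of a helper that comments a service out or back in; the name is hypothetical, and allowing an optional `!pNN ` priority prefix before the service name is an assumption about the supervise file format.

```bash
# Hypothetical helper: comment out (off) or restore (on) a service line
# in a supervise config file.
# Usage: toggle_service <conf file> <service> on|off
toggle_service() {
  local conf="$1" service="$2" mode="$3" tmp expr
  # Optional "!pNN " priority prefix (assumed supervise syntax).
  local prefix='(!p[0-9]+[[:space:]]+)?'
  case "$mode" in
    off) expr='s/^('"$prefix$service"'[[:space:]])/#\1/' ;;
    on)  expr='s/^#('"$prefix$service"'[[:space:]])/\1/' ;;
    *)   echo "usage: toggle_service <conf> <service> on|off" >&2; return 1 ;;
  esac
  tmp="$(mktemp)" || return 1
  # Edit via a temp file: portable across BSD (macOS) and GNU sed.
  if ! sed -E "$expr" "$conf" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  # Copy back with cat so the original file keeps its permissions.
  cat "$tmp" > "$conf"
  rm -f "$tmp"
}
```

For example, `toggle_service "$DRUID_HOME/conf/supervise/single-server/micro-quickstart.conf" historical off`, and `on` to put it back.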

Launch the cluster again:

cd $DRUID_HOME
./bin/start-micro-quickstart

Wait for the services to start, then use the "Services" tab in the Druid UI to ensure that all services except historical are running.

Use the launch configuration created earlier to launch our historical node.

Note: be sure to launch the Druid cluster first, then the debug historical. The Druid script will fail if the historical is already running.

Making Changes

As you change the code, you'll want to test those changes.

Single-Server Changes in IDE

The easiest case is when you change a single server, say a historical. Just make the change and relaunch the server from your IDE as above. This works as long as no API between services changes and you don't change code that runs on more than one server. (Some query code runs in both the broker and data nodes.)

Additional Debugging Config

Druid is pretty aggressive about the number of threads it wants to create. Unfortunately, during debugging, the large thread count obscures what we care about, and Druid takes time to create resources we likely won't use. We can reduce the thread count by adding the following to _common/common.runtime.properties:

# For debugging

druid.broker.http.numMaxThreads=3
druid.global.http.numMaxThreads=3
druid.server.http.minThreads=2

The last line depends on work in progress not yet submitted to the master branch.

Multiple Servers in IDE

If you change code that has to be deployed in both the historical and broker, then an easy solution is to repeat what was done above, but also comment out the broker. Relaunch the cluster. Duplicate the debug config for the historical. Change the launch name to "Broker". Change the program arguments to:

server broker

Then, change the class path to point to the broker configuration. Remove the historical directory and add:

$DRUID_HOME/conf/druid/single-server/micro-quickstart/broker

Now, start the Druid cluster, then launch both the broker and historical within the IDE. Changes you make to common code are immediately available for testing.

Complete Rebuild

The crude-but-effective way to deploy any change is to completely rebuild Druid, delete your old install directory, install a new version and reingest data. Of course, this is also painfully slow.

Rebuild, Keep Data

We can refine the above by skipping the reingest step. To do this, rename the old install directory to a different name, say, druid-old. Install a new Druid as above. Then, copy over the required files from the old version:

export OLD_DRUID=<path to renamed old install>
cp $OLD_DRUID/conf/supervise/single-server/micro-quickstart.conf $DRUID_HOME/conf/supervise/single-server
OLD_VAR_DRUID=$OLD_DRUID/var/druid
VAR_DRUID=$DRUID_HOME/var/druid
mkdir -p $VAR_DRUID
mv $OLD_DRUID/var/zk $DRUID_HOME/var
mv $OLD_VAR_DRUID/segments $VAR_DRUID
mv $OLD_VAR_DRUID/segment-cache $VAR_DRUID
mv $OLD_VAR_DRUID/task $VAR_DRUID
mv $OLD_VAR_DRUID/metadata.db $VAR_DRUID

You may want to wrap the above in a script.
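A sketch of such a script as a bash function. The name is hypothetical; it assumes the new install has not been started yet (so its var directory holds none of these items), and it skips items the old install lacks rather than failing.

```bash
# Hypothetical wrapper for the steps above: carry the supervise file,
# ZooKeeper state, segments, task data and Derby metadata into a new install.
# Usage: migrate_druid_data <old install> <new install>
migrate_druid_data() {
  local old="$1" new="$2" d
  if [[ ! -d "$old/var/druid" || ! -d "$new" ]]; then
    echo "usage: migrate_druid_data <old install> <new install>" >&2
    return 1
  fi
  # Keep the supervise file, which may have services commented out.
  cp "$old/conf/supervise/single-server/micro-quickstart.conf" \
     "$new/conf/supervise/single-server/" || return 1
  mkdir -p "$new/var/druid" || return 1
  if [[ -d "$old/var/zk" ]]; then
    mv "$old/var/zk" "$new/var/" || return 1
  fi
  for d in segments segment-cache task metadata.db; do
    if [[ -e "$old/var/druid/$d" ]]; then
      mv "$old/var/druid/$d" "$new/var/druid/" || return 1
    else
      echo "skipping missing $old/var/druid/$d" >&2
    fi
  done
}
```

For example, `migrate_druid_data "$OLD_DRUID" "$DRUID_HOME"` after installing the new build.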

Preparing a PR

Start by running checkstyle on its own, since it is far too painful to do it as part of a build:

mvn checkstyle:checkstyle

Run a full build. Be sure that no Druid cluster is running: it seems that the Curator tests will fail if Druid's "micro" cluster has already started a ZK.
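A bash sketch of a guard that refuses to build while something listens on ZooKeeper's default port, 2181, which the micro-quickstart ZK uses. The function names are hypothetical, `port_in_use` relies on bash's /dev/tcp redirection, and `mvn clean install` as the full build command is an assumption; substitute whatever full build you normally run.

```bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
port_in_use() {
  local port="$1" host="${2:-127.0.0.1}"
  # bash's /dev/tcp pseudo-device: the redirect succeeds only on connect.
  (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}

# Refuse to run the full build while ZooKeeper's default port is taken.
full_build() {
  if port_in_use 2181; then
    echo "Port 2181 is in use; stop the Druid cluster before building." >&2
    return 1
  fi
  # Assumed full-build command; substitute your usual one.
  mvn clean install
}
```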
