Build and Debug
Druid requires Python (apparently Python 2) plus pyyaml. Using Python 3 requires edits to two Python scripts.
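For example, assuming Python 2's pip is available as pip2 on your system:

pip2 install pyyaml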
Druid officially uses Java 8. However, developers have found that everything except a few extensions work well with Java 11. If you launch Druid with a version other than 8 you'll be asked to set an environment variable:
export DRUID_SKIP_JAVA_CHECK=1
The tutorial appears to work fine with Java 11 and 14.
Other products have newer dependencies. If you must use Java 8, then on the Mac, use jenv and brew to manage Java versions. See this post. Also, see this post for Java 8 in particular.
brew install jenv
brew tap adoptopenjdk/openjdk
brew install --cask adoptopenjdk8
jenv add /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home
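Then pin Java 8 for the Druid source tree (a sketch; the registered version name varies by install, so check jenv versions first):

cd <path to druid>
jenv local 1.8
jenv version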
If you use the newest Eclipse, you must change the JVM it runs on: the newest version comes configured to use Java 16 by default. However, that version is strict about module enforcement and you'll get the following exception:
...module java.base does not "opens java.lang" to unnamed module...
See this StackOverflow article.
The solution is to configure Eclipse to use Java 11 instead.
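One way to do this is to point Eclipse at a Java 11 JVM in eclipse.ini, above the -vmargs line (a sketch; the JDK path here is an example and depends on your install):

-vm
/Library/Java/JavaVirtualMachines/adoptopenjdk-11.jdk/Contents/Home/bin/java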
mvn clean package -Pdist -Pskip-static-checks -Pskip-tests -Dmaven.javadoc.skip=true -T1.0C
Note that if the Java version is not 8 (is, say, 11), the above build will appear to work, but the tar.gz file is not produced.
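A quick way to check whether the tarball was actually produced (run from the Druid source root):

ls distribution/target/apache-druid-*-bin.tar.gz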
Or:
# Druid aliases
alias druid-quick="mvn -T 8 -DskipTests -Dforbiddenapis.skip=true -Dcheckstyle.skip=true -Dpmd.skip=true -Dmaven.javadoc.skip=true -Danimal.sniffer.skip=true -Denforcer.skip=true -Dspotbugs.skip=true clean install"
See the documentation.
Create a directory for the Druid install, say ~/bin.
export DRUID_DEV=<path to druid>
export BIN_DIR=~/bin
export DRUID_VER=0.23.0-SNAPSHOT
export DRUID_HOME=$BIN_DIR/apache-druid-${DRUID_VER}
mkdir $BIN_DIR
cd $BIN_DIR
tar -xzf $DRUID_DEV/distribution/target/apache-druid-${DRUID_VER}-bin.tar.gz
cd $DRUID_HOME
It can be handy to put the environment variables in the shell startup script (.zshrc on the Mac), and the other commands in a script to run after each build.
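For example, a hypothetical ~/bin/refresh-druid script (assumes the variables above are exported in .zshrc):

#!/bin/zsh
# refresh-druid: replace the installed Druid with the latest local build.
rm -rf $DRUID_HOME
cd $BIN_DIR
tar -xzf $DRUID_DEV/distribution/target/apache-druid-${DRUID_VER}-bin.tar.gz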
Note: if using Jupyter, it turns out that Jupyter's default port (8888) is the same as Druid's default port. Change one of them. For Jupyter:
jupyter notebook --port 9000
If using Java 11, add the following line:

--add-exports=java.base/jdk.internal.ref=ALL-UNNAMED

to the following file:

$DRUID_HOME/conf/druid/single-server/micro-quickstart/router/jvm.config

The line may be needed for the other services (those listed in $DRUID_HOME/conf/supervise/single-server/micro-quickstart.conf) as well. Druid will fail with a clear log message if you need to add it elsewhere.
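If you do hit that, a hypothetical one-liner to append the flag to every service's jvm.config:

for f in $DRUID_HOME/conf/druid/single-server/micro-quickstart/*/jvm.config; do
  echo '--add-exports=java.base/jdk.internal.ref=ALL-UNNAMED' >> $f
done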
And set the following:
export DRUID_SKIP_JAVA_CHECK=1
Launch locally:
cd $DRUID_HOME
./bin/start-micro-quickstart
Visit the UI: http://localhost:8888
This version appears to include the needed ZooKeeper and metadata database.
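As a quick sanity check (assuming the default port), each Druid service answers a status request; through the router:

curl http://localhost:8888/status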
Every system has its unique way to handle configuration. Configuration has to work in both the production and development environments. Druid runs each of its services as a distinct Java process. The chain of events is:
- $DRUID_HOME/bin/micro-quickstart - a shell script which invokes the supervise script with the configuration to run.
- $DRUID_HOME/conf/supervise/single-server/micro-quickstart.conf - a config file, read by the Perl-based supervise script, which lists the launch command for each service:
broker bin/run-druid broker conf/druid/single-server/micro-quickstart
Each service is configured via a specific directory, pointed to by the above files; for example, $DRUID_HOME/conf/druid/single-server/micro-quickstart. There is one directory per service. Within a directory, say broker, there are three files:
- jvm.config - JVM options passed on the Java command line.
- main.config - the JVM command-line arguments that select the "main" routine.
- runtime.properties - Druid properties, passed as JVM -D settings, to configure Druid itself.
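For reference, jvm.config holds one option per line. A broker's file looks roughly like this (illustrative values, not the shipped defaults):

-server
-Xms512m
-Xmx512m
-XX:MaxDirectMemorySize=768m
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager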
In production, a script assembles this information into a launch command. Our job, when running in the debugger, is to use the IDE to do this work.
Druid must run as four processes (at least). We don't want to launch all four from the IDE. A trick from Gian is to start with a pre-built Druid: either one downloaded from the Druid project, or built locally. Let's assume we're using the one we built above. We start by running the "cluster" using the standard script as shown above. Ingest data and ensure all works properly.
First, while the micro-quickstart cluster is running, let's get the configuration we need.
ps aux | grep "java -server.*apache-druid" > /tmp/services.txt
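The captured command line is long; splitting it into one argument per line makes it much easier to read:

tr ' ' '\n' < /tmp/services.txt | less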
In your IDE, create a launch configuration. These are the instructions for Eclipse:
- Kind: Java Application
- Name: Historical
- Project: services (?)
- Main Class: org.apache.druid.cli.Main
- Program Arguments: server historical
- Working Directory: the value of $DRUID_HOME
- JVM Arguments: see below
- Dependencies/Classpath: see below
JVM Arguments: Here we want to doctor up the arguments we captured above. Pull out the JVM arguments other than the class path and the standard Java setup. Here is an example; double-check that it is valid in your case. Also, note that $DRUID_HOME is not available in Eclipse, so fill in the actual path.
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
--add-exports java.base/jdk.internal.perf=ALL-UNNAMED
--add-exports jdk.management/com.sun.management.internal=ALL-UNNAMED
Add the last two lines (the --add-exports options) only if running Java 11 or later.
Some other settings you might want to use:
-Dlog4j.configurationFile=<some path>/log4j2.xml
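If you go that route, a minimal log4j2.xml might look like this (a sketch; adjust logger levels and layout to taste):

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
  <Appenders>
    <Console name="Console" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="Console"/>
    </Root>
  </Loggers>
</Configuration>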
The command line we captured earlier identifies config files we need on the class path. Go to the Dependencies tab, click on Classpath, then "Advanced", and select "Add External Folder". Repeat to add each of the following:
$DRUID_HOME/conf/druid/single-server/micro-quickstart/historical
$DRUID_HOME/conf/druid/single-server/micro-quickstart/_common
Oddly, the following entries, though listed on the captured command line, don't seem to actually exist:
$DRUID_HOME/conf/druid/single-server/micro-quickstart/_common/hadoop-xml
$DRUID_HOME/conf/druid/single-server/micro-quickstart/../_common
$DRUID_HOME/conf/druid/single-server/micro-quickstart/../_common/hadoop-xml
Move these entries to the top of the classpath list to mimic the captured command line.
To verify that all is good, click "Show Command Line". Ignore the Eclipse-provided items in the class path. Ensure that the rest looks like the command line we captured above.
Shut down the cluster by typing ^C (Control-C) in the console window where the cluster is running.
Now, find the supervise config file mentioned above, $DRUID_HOME/conf/supervise/single-server/micro-quickstart.conf, and comment out the process we want to run from the IDE, say historical:
#historical bin/run-druid historical conf/druid/single-server/micro-quickstart
Launch the cluster again:
cd $DRUID_HOME
./bin/start-micro-quickstart
Wait for the services to start, then use the "Services" tab in the Druid UI to ensure that all services except historical are running.
Use the launch configuration created earlier to launch our historical node.
Note: be sure to launch the Druid cluster first, then the debug historical. The Druid script will fail if the historical is already running.
As you change the code, you'll want to test those changes.
The easiest case is when you change a single server, say a historical. Just make the change and relaunch the server from your IDE as above. This works as long as the API does not change and you don't change code that runs on two servers. (Some query code runs in both the broker and data nodes.)
Druid is pretty aggressive about the number of threads it wants to create. Unfortunately, during debugging, the large thread count obscures what we care about, and takes time to create resources we won't likely use. We can reduce the thread count by adding the following to _common/common.runtime.properties:
# For debugging
druid.broker.http.numMaxThreads=3
druid.global.http.numMaxThreads=3
druid.server.http.minThreads=2
The last line depends on work in progress not yet submitted to the master branch.
If you change code that has to be deployed in both the historical and broker, then an easy solution is to repeat what was done above, but also comment out the broker. Relaunch the cluster. Duplicate the debug config for the historical. Change the launch name to "Broker". Change the arguments:
server broker
Then, change the class path to point to the broker configuration: remove the historical directory and add:

$DRUID_HOME/conf/druid/single-server/micro-quickstart/broker
Now, start the Druid cluster, then launch both the broker and historical within the IDE. Changes you make to common code are immediately available for testing.
The crude-but-effective way to deploy any change is to completely rebuild Druid, delete your old install directory, install a new version and reingest data. Of course, this is also painfully slow.
We can refine the above by skipping the reingest step. To do this, rename the old install directory to a different name, say druid-old. Install a new Druid as above. Then, copy over the required files from the old version:
export OLD_DRUID=<path to renamed old install>
cp $OLD_DRUID/conf/supervise/single-server/micro-quickstart.conf $DRUID_HOME/conf/supervise/single-server
mkdir $DRUID_HOME/var
mv $OLD_DRUID/var/zk $DRUID_HOME/var
OLD_VAR_DRUID=$OLD_DRUID/var/druid
VAR_DRUID=$DRUID_HOME/var/druid
mkdir $DRUID_HOME/var/druid
mv $OLD_VAR_DRUID/segments $VAR_DRUID
mv $OLD_VAR_DRUID/segment-cache $VAR_DRUID
mv $OLD_VAR_DRUID/task $VAR_DRUID
mv $OLD_VAR_DRUID/metadata.db $VAR_DRUID
You may want to wrap the above in a script.
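For example, a hypothetical wrapper (assumes OLD_DRUID and DRUID_HOME are exported; stops on the first error):

#!/bin/zsh
# upgrade-druid: migrate state from the renamed old install to a fresh one.
set -e
cp $OLD_DRUID/conf/supervise/single-server/micro-quickstart.conf $DRUID_HOME/conf/supervise/single-server
mkdir -p $DRUID_HOME/var/druid
mv $OLD_DRUID/var/zk $DRUID_HOME/var
for item in segments segment-cache task metadata.db; do
  mv $OLD_DRUID/var/druid/$item $DRUID_HOME/var/druid
done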
Start with just running checkstyle: it is far too painful to do it as part of a build:
mvn checkstyle:checkstyle
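You can also limit the run to the module you're working on (the module name here is just an example):

mvn checkstyle:checkstyle -pl processing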
Run a full build. Be sure that no Druid cluster is running: it seems that the Curator tests will fail if Druid's "micro" cluster has already started a ZK.