Integration Test Analysis
The first thing to note about Druid integration tests is that they are a mess. The purpose of this page is to sort out that mess.
Integration tests live in the `druid-integration-tests` module.
Maven defines a lifecycle for projects. Druid integration tests map into that lifecycle (somewhat incorrectly) as follows:
| Maven Lifecycle Phase | Integration Test Mapping |
|---|---|
| process-resources | Copy `${project.build.outputDirectory}/wikipedia_hadoop_azure_input_index_task_template.json` to `${project.build.outputDirectory}/wikipedia_hadoop_azure_input_index_task.sh` |
| " | Copy `${project.build.outputDirectory}/wikipedia_hadoop_s3_input_index_task_template.json` to `${project.build.outputDirectory}/wikipedia_hadoop_s3_input_index_task.json` |
| " | Copy `${project.build.outputDirectory}/copy_resources_template.sh` to `target/gen-scripts/copy_resources.sh` |
| ... | |
| pre-integration-test | Sets a number of env vars, then runs `build_run_cluster.sh` to build Docker images and start them. |
| integration-test | `verify` goal: invokes the `DruidTestRunnerFactory` via TestNG. |
| post-integration-test | Sets a number of env vars, then runs `stop_cluster.sh` to stop the Docker containers. |
| ... | |
| verify | Verifies the results of the integration tests. |
Notes:
- WTF is going on with copying a JSON file to an sh file?
- The `process-resources` phase occurs early; it is not clear whether those files are yet in the output directory.
- The `copy_resources.sh` file was copied into the source tree. An in-flight PR moves it into the `target` tree, where all artifacts should live.
- Since test output is verified in the `verify` phase, the `README.md` for the integration tests says to run to the `verify` phase.
The profiles which affect integration tests:
- `hadoop3`: sets a number of Maven variables to include Hadoop 3 jars.
- `integration-tests` (in the `druid-integration-tests` module): builds Docker images and runs tests.
- `integration-test` (in `distribution`): builds the integration test output directory via `integration-test-assembly.xml`.

Notes:
- Note the two spellings of the profile: `integration-test` (singular) and `integration-tests` (plural). The `integration-test` profile has the same name as a Maven phase, which is a bit confusing.
The `pre-integration-test` phase invokes `build_run_cluster.sh`, which invokes the copied `copy_resources.sh` (generated from `copy_resources_template.sh`), which in turn invokes another Maven build:
mvn -DskipTests -T1C -Danimal.sniffer.skip=true -Dcheckstyle.skip=true -Ddruid.console.skip=true -Denforcer.skip=true -Dforbiddenapis.skip=true -Dmaven.javadoc.skip=true -Dpmd.skip=true -Dspotbugs.skip=true install -Pintegration-test
Notes:
- The above line refers to `-Pintegration-test` (singular), but `integration-test` is a Maven phase. There is a profile called `integration-tests` (plural) in the above list.
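One way to sort out which profiles each module actually defines is the Maven help plugin; a quick check along these lines (the module list is illustrative):

mvn help:all-profiles -pl distribution,integration-tests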
The `pom.xml` file contains this section to copy certain resources to the output directory:
<plugin>
<artifactId>maven-resources-plugin</artifactId>
<groupId>org.apache.maven.plugins</groupId>
<configuration>
<outputDirectory>${project.build.outputDirectory}</outputDirectory>
<resources>
<resource>
<directory>script</directory>
<includes>copy_resources_template.sh</includes>
<filtering>true</filtering>
</resource>
<resource>
<directory>src/test/resources/hadoop/</directory>
<includes>*template.json</includes>
<filtering>true</filtering>
</resource>
<resource>
<directory>src/test/resources</directory>
<filtering>false</filtering>
</resource>
<resource>
<directory>src/main/resources</directory>
<filtering>false</filtering>
</resource>
</resources>
</configuration>
</plugin>
Some issues with the above:
- Per this documentation, the last two entries above are either wrong or unnecessary. The "standard" resources are copied automatically to the correct output locations.
- Per this documentation, and the earlier reference, the list of resources must be within an `execution` section that specifies the `copy-resources` goal and binds to a phase. Since that is not done here, it is very likely that the entire section is a no-op. However, an inspection of the output suggests that the rules did run, but it is not clear in which phase.
- According to this documentation, the target `${project.build.outputDirectory}` is the `target/classes` folder, which is decidedly not the place to put resources.
- If the above ran (which it probably didn't), it would create two copies of the main resources, put the test resources into the main classes folder, and add shell scripts to the classes folder. This is probably all wrong.
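A quick way to check what the copy rules actually produced is to inspect the compile output directory after a build (path relative to the Druid source root):

ls integration-tests/target/classes | head -n 20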
The Docker build process is convoluted and just plain wrong in many ways.
The `integration-tests/pom.xml` file:
- Incorrectly copies both test and compile resources into the compile target directory.
- Adds other resources to the compile target directory (but not within a subdirectory, polluting the root namespace as a result).
- Launches the build process in the `pre-integration-test` phase: `build_run_cluster.sh`.
- (Somehow) copies `script/copy_resources_template.sh` to `gen-scripts/copy_resources.sh` (in the source tree, not target!)
`build_run_cluster.sh`:
- Creates a shared directory (as in, shared into the container) at `~/shared` (note: this is outside of the build hierarchy!)
- Creates a file called `docker_ip` within the `integration-tests/docker` source directory (not in the target directory!)
- Creates keys in the `integration-tests/docker/tls` source directory (should use target).
- Runs `gen-script/copy-resources.sh` (copied above).
- Runs `script/docker_build_containers.sh`.
- Runs `stop_cluster.sh` to stop the cluster.
- Runs `script/docker_run_cluster.sh` to start the cluster.
- Runs `script/copy_hadoop_resources.sh` to do what the name suggests.
Issues:
- Incorrect use of resources.
- Unnecessary copying of a file.
- Derived files are placed in the source directory tree.
- Scripts are scattered in multiple locations.
The tests make use of Docker Compose (`docker-compose`) to run the cluster. A single image is used for all services.
`docker_compose_args.sh` defines the function `getComposeArgs()` to select the YAML files to use based on:
- `DRUID_INTEGRATION_TEST_GROUP`: one of the test groups.
- `DRUID_INTEGRATION_TEST_INDEXER`: either `indexer` or `middleManager`. (The `indexer` option supports a subset of tests.)
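A rough sketch of the selection logic (not the actual script; the real function has many more cases, and the file names here are taken from the directory listing later on this page):

getComposeArgs() {
  if [ "$DRUID_INTEGRATION_TEST_INDEXER" = "indexer" ]; then
    # Only a subset of test groups supports the indexer
    echo "-f docker-compose.cli-indexer.yml"
  elif [ "$DRUID_INTEGRATION_TEST_GROUP" = "high-availability" ]; then
    echo "-f docker-compose.high-availability.yml"
  else
    # Default: the standard cluster definition
    echo "-f docker-compose.yml"
  fi
}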
Actual configuration is done in a large number of `docker/docker-compose*.yml` files. These files define some number of Druid services, sometimes running the same service twice on different ports. These files are similar to those described in the documentation. The basic idea is that there are environment variables that map to config file entries, with a process to generate the files from the environment variables.
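The convention, illustrated with a hypothetical property (see `setupConfig` later on this page for the actual mechanism):

# An env var with the druid_ prefix, set in a compose file or env file...
export druid_server_http_numThreads=100
# ...becomes this line in the generated runtime.properties:
#   druid.server.http.numThreads=100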
There is one Druid or external service (in the Docker sense) per container. Services include:
- Zookeeper plus Kafka
- Metadata storage (MySQL)
- Coordinator (one or two)
- Overlord (one or two)
- Broker
- Router
- Custom node role
- Etc.
Comments:
- Druid uses an ad-hoc way to define the various services. Compose provides "profiles", which would be a simpler approach.
- Druid uses an ad-hoc assortment of environment variables to configure services, but Compose provides a "config" option which is more general, and an "env file" feature which may be a simpler way to set environment variables.
Suggestions:
- Use the test group name as a profile. Use profiles to enable services. Reduce the resulting number of YAML files and thus the large amount of duplication in the current design.
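A sketch of how this could look (the profile and service names are hypothetical; the `profiles` key and `--profile` flag are standard Compose features):

# In a single docker-compose.yml, tag each optional service with the test
# groups that need it, e.g.:
#
#   druid-overlord-two:
#     profiles: ["high-availability"]
#
# Then a test group run selects only its services:
docker compose -f docker-compose.yml --profile high-availability up -d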
The `distribution/pom.xml` file, for the `integration-tests` profile, invokes a Druid command that appears to bypass Maven to populate the local Maven repository with a selected set of dependencies. The class in question is `PullDependencies`, which explains the implementation, but not the purpose, of this effort.
The dependencies populated are mostly Druid build artifacts. Shouldn't these already be in the local repository as a result of the `install` action? Or perhaps `install` comes too late, and so the integration tests need to pull the artifacts from a location other than the build itself? If so, aren't we then pulling artifacts other than those we are building and trying to test?
In fact, Maven appears to build projects recursively: each project is built through the entire lifecycle, as shown by inspecting the details of a build:
[INFO] --- maven-install-plugin:2.3.1:install (default) @ druid-lookups-cached-global ---
[INFO] Installing /Users/paul/git/druid/extensions-core/lookups-cached-global/target/druid-lookups-cached-global-0.23.0-SNAPSHOT.jar to /Users/paul/.m2/repository/org/apache/druid/extensions/druid-lookups-cached-global/0.23.0-SNAPSHOT/druid-lookups-cached-global-0.23.0-SNAPSHOT.jar
[INFO] Installing /Users/paul/git/druid/extensions-core/lookups-cached-global/pom.xml to /Users/paul/.m2/repository/org/apache/druid/extensions/druid-lookups-cached-global/0.23.0-SNAPSHOT/druid-lookups-cached-global-0.23.0-SNAPSHOT.pom
The above is for one Druid project; others follow the same pattern. In order to compile, Maven must pull dependencies and install them into the local repository.
As a result, the local repository already contains all the needed artifacts. If the `integration-tests` profile somehow found it must use a tool to download them, then something is wrong. Perhaps the integration tests are built without the install step? Perhaps the module placement is wrong and modules are missing? Whatever the reason, the logic is wrong.
Furthermore, if the code above actually downloads Druid build artifacts from a repository, then the local Maven repository is corrupted: its contents are not what was built. A subsequent incremental build will put the build in an inconsistent state as some artifacts will be from a source other than the local source code. (It should go without saying that a corrupt build is seldom a helpful situation.)
Proposed change: remove the tool invocation. Determine how to use the artifacts from the current build.
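As a sanity check on the claim that the artifacts are already present locally, one can inspect the local repository after an `install` run (artifact and version taken from the log lines above):

ls ~/.m2/repository/org/apache/druid/extensions/druid-lookups-cached-global/0.23.0-SNAPSHOT/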
The command line from the `README.md` is:
mvn verify -P integration-tests
This seems to say:
- Run the `package` step using the `integration-tests` profile.
- Run to the `verify` phase, which includes the `integration-test` phase that runs the tests.
- Include the `verify` phase, which checks the results of the integration tests.
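In practice the README also adds options to select a single test group and, for cloud tests, an override config file. A representative invocation (the group name is just an example; confirm the exact flags against the README for a given Druid version):

mvn verify -P integration-tests \
    -Dgroups=query \
    -Doverride.config.path=/path/to/credentials.conf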
See `integration-tests/docker/Dockerfile` and `script/docker_build_containers.sh`. The local working directory is `$SHARED_DIR/docker`.
Typical command line:
docker build -t druid/cluster --build-arg ZK_VERSION --build-arg KAFKA_VERSION --build-arg CONFLUENT_VERSION --build-arg MYSQL_VERSION --build-arg MARIA_VERSION --build-arg MYSQL_DRIVER_CLASSNAME $SHARED_DIR/docker
This is:
- `-t druid/cluster`: set a tag on the image.
- `--build-arg ZK_VERSION`: set a build argument (normally of the form `KEY=value`).
- `$SHARED_DIR/docker`: specify the build context (the working directory for the build).
Note the form of the `--build-arg` options: no `=value` is given, so Docker takes the value of each argument from the calling environment (if set). The more explicit form would be `--build-arg ZK_VERSION=$ZK_VERSION`.
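Spelled out with explicit values, the same build would look like this:

docker build -t druid/cluster \
    --build-arg ZK_VERSION=$ZK_VERSION \
    --build-arg KAFKA_VERSION=$KAFKA_VERSION \
    --build-arg CONFLUENT_VERSION=$CONFLUENT_VERSION \
    --build-arg MYSQL_VERSION=$MYSQL_VERSION \
    --build-arg MARIA_VERSION=$MARIA_VERSION \
    --build-arg MYSQL_DRIVER_CLASSNAME=$MYSQL_DRIVER_CLASSNAME \
    $SHARED_DIR/docker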
Thus, the context is `~/shared/docker`, which contains:
Dockerfile docker-compose.yml
base-setup.sh docker_ip
client_tls druid.sh
docker-compose.base.yml environment-configs
docker-compose.cli-indexer.yml ldap-configs
docker-compose.druid-hadoop.yml run-mysql.sh
docker-compose.high-availability.yml schema-registry
docker-compose.ldap-security.yml service-supervisords
docker-compose.query-error-test.yml supervisord.conf
docker-compose.query-retry-test.yml test-data
docker-compose.schema-registry-indexer.yml tls
docker-compose.schema-registry.yml wiki-simple-lookup.json
docker-compose.security.yml
The prior scripts set up this directory.
The `Dockerfile` itself:
- Base image is the target JDK version: `FROM openjdk:$JDK_VERSION as druidbase`
- Generates a single Docker image called `druidbase`.
- Requires a set of arguments:
  - JDK_VERSION
  - KAFKA_VERSION (not used)
  - ZK_VERSION (not used)
  - APACHE_ARCHIVE_MIRROR_HOST
  - MYSQL_VERSION
  - MARIA_VERSION
  - MYSQL_DRIVER_CLASSNAME
  - CONFLUENT_VERSION
- Uses a combination of files copied in and downloaded files that are then modified.
- Starts and stops MySQL twice:
  - Once to create the metadata store (surprisingly, we have no existing script for this).
  - A second time to run Druid to create the metastore tables.
- Performs some operations in a script, others via Docker RUN commands.
- Uses Perl (!) to adjust Kafka properties.
- Exposes a great number of ports, including those used by Druid and ZK.
- Entry point "work dir" is `/var/lib/druid`.
- Has a complex entry point rather than just using a script.
The distribution `Dockerfile`, for comparison:
- Resides in `distribution/docker`.
- Does its own build inside a Docker image (!)
- Uses a different set of base images than those used for testing.
The Docker compose files configure the cluster. There is one file per test group; in fact, the test group is defined by the Compose file it uses. For example, `docker-compose.high-availability.yml` is the file for the `high-availability` test group.
Each of these files depends on a base file, `docker-compose.base.yml`, which configures the common services and sets up defaults for the Druid services.
The test containers do not use the out-of-the-box Druid configs or launch scripts: it is all bespoke. The process is:
- A `druid.sh` file exists in the public Docker image to allow configs to be set via environment variables.
- That file was cloned, and extended, in the integration tests to add functionality.
- The Dockerfile wraps that script to create TLS keys and to register sample data.
- A set of supervisord service scripts do the work of the launch.
For launch:
- Generate TLS keys for the server instance.
- Use the env vars (set in the Docker compose files) to edit a bespoke set of config files in `/tmp/conf`.
- Launch MySQL, install some S3 keys, etc. for sample data, then shut it down. (Note that this is done for every service, so it is done multiple times per cluster.)
- Set up some config variables used by Druid to point to the configuration files.
- Launch supervisord with the service launch script.
- In `druid.conf`, assemble the command line from 7 different environment variables.
Expanded:
- Each test group is run separately as a distinct Maven job and requires a full build of Druid (actually two builds, as above). Each provides a set of options as described in `README.md`: "`-Doverride.config.path=<PATH_TO_FILE>` with your Cloud credentials".
- The `integration-tests/pom.xml` launches the `build_run_cluster.sh` script to build and run the cluster. Much of this is explained above. For the config file: `<DRUID_INTEGRATION_TEST_OVERRIDE_CONFIG_PATH>${override.config.path}</DRUID_INTEGRATION_TEST_OVERRIDE_CONFIG_PATH>`
- The above builds the containers and the shared directories, as explained above.
- The above calls `integration-tests/script/docker_run_cluster.sh` to start the cluster.
- `docker_run_cluster.sh` sources `script/docker_compose_args.sh` to map from the test group name to the Docker compose file and other args.
- Checks for the existence of `DRUID_INTEGRATION_TEST_OVERRIDE_CONFIG_PATH` for certain tests.
Missing:
- How are the supervisord `.conf` files placed in the container?
- The `pom.xml` file is "one pass": it starts the cluster and runs the tests. How do we get specific test group launches?
Druid configuration is difficult to work with in normal times, and Docker makes the problem worse. Basically, Druid config consists of a set of static files. The distribution includes a variety of configurations. Tests want to override various properties. But, Druid has no form of configuration inheritance, forcing the code into a variety of ad-hoc solutions.
Config layers, in order of priority:
- `DRUID_INTEGRATION_TEST_OVERRIDE_CONFIG_PATH`
- Base compose environment variables.
- Group-specific compose environment variables.
- Hard-coded defaults.
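The override file is the one passed via `-Doverride.config.path`. A hypothetical example of its contents, using the `druid_*` env-var naming convention described above (the keys are illustrative only, not copied from the README):

druid_storage_type=s3
druid_storage_bucket=my-test-bucket
druid_s3_accessKey=...
druid_s3_secretKey=...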
The Docker compose files define a set of services, e.g. `docker-compose.cli-indexer.yml`.
For each service, the compose files reference one or more "environment configs." Example:
druid-overlord:
  extends:
    file: docker-compose.base.yml
    service: druid-overlord
  environment:
    - DRUID_INTEGRATION_TEST_GROUP=${DRUID_INTEGRATION_TEST_GROUP}
  depends_on:
    - druid-metadata-storage
    - druid-zookeeper-kafka
- These configs define specially-encoded environment variables
The images use supervisor
to run processes. An important feature
of Supervisor is that the set of program(s) to run is defined statically as a set of files. This is
a challenge when creating a single image that can run multiple kinds of services. Druid works around
this by only mounting the service configs needed for a given container. This works, but has much
redundancy. See this article for
more background.
There are three services other than Druid:
- ZooKeeper
- Kafka
- MySQL
It seems that ZK and Kafka are run together in one container, MySQL in another.
- Each resides in `/usr/local/<proj>`.
- A supervisord script launches the service. Example for Kafka:
[program:kafka]
command=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
priority=0
stdout_logfile=/shared/logs/kafka.log
- `docker-compose.base.yml` defines the container:
druid-zookeeper-kafka:
  image: org.apache.druid/test:${DRUID_VERSION}
  container_name: druid-zookeeper-kafka
  ...
  volumes:
    - ${HOME}/shared:/shared
    - ./service-supervisords/zookeeper.conf:/usr/lib/druid/conf/zookeeper.conf
    - ./service-supervisords/kafka.conf:/usr/lib/druid/conf/kafka.conf
  env_file:
    - ./environment-configs/common
- `docker-compose.<group>.yaml` defines the actual service as a reference:
druid-zookeeper-kafka:
  extends:
    file: ../../docker/docker-compose.base.yml
    service: druid-zookeeper-kafka
- The `environment-configs/common` file defines Druid config: it is not clear why (or if) it is needed for the non-Druid services.
- The entry point (check) does (what) to launch `supervisord`.
- The common config file, `/etc/supervisor/conf.d/supervisord.conf`, includes any config files in `/usr/lib/druid/conf/*.conf`:
[supervisord]
nodaemon=true
logfile = /shared/logs/supervisord.log
[include]
files = /usr/lib/druid/conf/*.conf
Several scripts start MySQL, do stuff, and shut it down. The problem here is that the database is not shared: creating it in one image doesn't actually affect the DB created in the actual MySQL image.
- `docker-compose.*.yml` files list all Druid properties as env vars.
- Compose inheritance merges test-specific settings with base settings.
- `environment-configs/common` has common properties for all services, and JVM args.
- These files are composed in the compose files:

env_file:
  - ./environment-configs/common
  - ./environment-configs/overlord
  - ${OVERRIDE_ENV}
- (What does `OVERRIDE_ENV` do? Where is it set?)
- The configs set the location of the shared folders for logs.
- The configs identify the names of the dependent services, such as MySQL or ZK.
- The service-specific env files, such as `coordinator`, identify the service: `DRUID_SERVICE=coordinator`
- Docker compose, and Docker, then set these within the environment of the container.
- The entrypoint uses bits of `druid.sh` to set up the configuration:
ENTRYPOINT /tls/generate-server-certs-and-keystores.sh \
&& . /druid.sh \
# Create druid service config files with all the config variables
&& setupConfig \
# Some test groups require pre-existing data to be setup
&& setupData \
# Export the service config file path to use in supervisord conf file
&& export DRUID_SERVICE_CONF_DIR="$(. /druid.sh; getConfPath ${DRUID_SERVICE})" \
# Export the common config file path to use in supervisord conf file
&& export DRUID_COMMON_CONF_DIR="$(. /druid.sh; getConfPath _common)" \
# Run Druid service using supervisord
&& exec /usr/bin/supervisord -c /etc/supervisor/conf.d/supervisord.conf
- `setupConfig` parses the environment variables to create the config files:
setupConfig()
{
  echo "$(date -Is) configuring service $DRUID_SERVICE"
  # We put all the config in /tmp/conf to allow for a
  # read-only root filesystem
  mkdir -p /tmp/conf/druid
  COMMON_CONF_DIR=$(getConfPath _common)
  SERVICE_CONF_DIR=$(getConfPath ${DRUID_SERVICE})
  mkdir -p $COMMON_CONF_DIR
  mkdir -p $SERVICE_CONF_DIR
  touch $COMMON_CONF_DIR/common.runtime.properties
  touch $SERVICE_CONF_DIR/runtime.properties
  setKey $DRUID_SERVICE druid.host $(resolveip -s $HOSTNAME)
  setKey $DRUID_SERVICE druid.worker.ip $(resolveip -s $HOSTNAME)
  # Write out all the environment variables starting with druid_ to druid service config file
  # This will replace _ with . in the key
  env | grep ^druid_ | while read evar;
  do
    # Can't use IFS='=' to parse since var might have = in it (e.g. password)
    val=$(echo "$evar" | sed -e 's?[^=]*=??')
    var=$(echo "$evar" | sed -e 's?^\([^=]*\)=.*?\1?g' -e 's?_?.?g')
    setKey $DRUID_SERVICE "$var" "$val"
  done
}
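- A worked example of the translation above, runnable on its own (the property name is hypothetical):

evar='druid_server_http_numThreads=100'
val=$(echo "$evar" | sed -e 's?[^=]*=??')                          # -> 100
var=$(echo "$evar" | sed -e 's?^\([^=]*\)=.*?\1?g' -e 's?_?.?g')   # -> druid.server.http.numThreads
echo "$var=$val"                                                   # prints druid.server.http.numThreads=100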
- The Supervisor launch script combines the information to actually launch Druid:
[program:druid-service]
command=java %(ENV_SERVICE_DRUID_JAVA_OPTS)s %(ENV_COMMON_DRUID_JAVA_OPTS)s -cp %(ENV_DRUID_COMMON_CONF_DIR)s:%(ENV_DRUID_SERVICE_CONF_DIR)s:%(ENV_DRUID_DEP_LIB_DIR)s org.apache.druid.cli.Main server %(ENV_DRUID_SERVICE)s
redirect_stderr=true
priority=100
autorestart=false
stdout_logfile=%(ENV_DRUID_LOG_PATH)s
- Based on the long-obsolete TestNG; its documentation link returns a 404.
- `IntegrationTestingConfig`: configuration of the cluster, etc.
  - `ConfigFileConfigProvider` creates an instance from a pile of information from a JSON config file.
  - Injected into the test.
- Startup is via the TestNG class `ITestRunnerFactory` and its subclass `DruidTestRunnerFactory`.
- `testng.xml` provides the list of tests, which reside in `src/test` (`org.apache.druid.tests` and below).
- `TestNGGroup` lists the test groups.
- `SuiteListener` does suite (group?) specific setup.
  - `DruidTestModuleFactory` defines an injector.
- `DruidTestModule` seems to be the only test-specific module.
  - Uses a `Properties`-style config file, with keys under `druid.test.config`.
  - Properties seem to be passed in from the `pom.xml` file.
  - Creates a dummy self node for tests.
- Uses a shared directory at `~/shared` (mounted into the containers):

ls ~/shared
docker hadoop-dependencies logs tasklogs
druid hadoop_xml storage wikiticker-it

`~/shared/docker`:
Dockerfile docker-compose.yml
base-setup.sh docker_ip
client_tls druid.sh
docker-compose.base.yml environment-configs
docker-compose.cli-indexer.yml ldap-configs
docker-compose.druid-hadoop.yml run-mysql.sh
docker-compose.high-availability.yml schema-registry
docker-compose.ldap-security.yml service-supervisords
docker-compose.query-error-test.yml supervisord.conf
docker-compose.query-retry-test.yml test-data
docker-compose.schema-registry-indexer.yml tls
docker-compose.schema-registry.yml wiki-simple-lookup.json
docker-compose.security.yml