Single Process Property Files
This page discusses how Druid manages properties and how the behavior must change to support a Single Process Druid.
Druid allows configuration via a set of Java properties-style configuration files using Guice mechanisms:
public class GuiceInjectors
{
  public static Collection<Module> makeDefaultStartupModules()
  {
    return ImmutableList.of(
        ...
        new PropertiesModule(Arrays.asList("common.runtime.properties", "runtime.properties")),
Here:
- `common.runtime.properties` contains properties common to all Druid services (and to other Druid commands).
- `runtime.properties` contains properties specific to a single service, such as historical.
Druid locates these files by listing the property file directories as part of the class path. (See https://github.com/paul-rogers/druid/wiki/Build-and-Debug#configure-eclipse.) To run a historical node, the following directories appear on the class path:
$DRUID_HOME/conf/druid/single-server/micro-quickstart/historical
$DRUID_HOME/conf/druid/single-server/micro-quickstart/_common
The `_common` folder contains the `common.runtime.properties` file with the common properties.
Within `historical` we have `runtime.properties`. We can now see an issue with running multiple services: since the name `runtime.properties` is shared by all services, only one service's config directory can appear on the class path. This means that the service chosen via the `server` CLI option is entirely dependent on the configuration directory added to the class path. We could even omit the service name and infer it from the class path directory, or from a property within the `runtime.properties` file. The launcher script does something like this. Consider the `main.config` file:
org.apache.druid.cli.Main server historical
This allows the launcher script to pass along the proper command line option for the class path we've selected.
One possible solution for a single-process Druid would be to combine all properties into a single file. In fact, Druid supports this mode with the `druid.properties.file` system property: `<druid> -Ddruid.properties.file=my.properties`. In this case we'd like the properties to be scoped by service type so that different services can live in the same file. Some are:
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
Unfortunately, many are not. From `historical/runtime.properties`:
druid.service=druid/historical
druid.plaintextPort=8083
The result is that we cannot mix properties for different services together in a single properties collection: each must have its own distinct properties map.
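The collision can be seen with a minimal sketch that uses only `java.util.Properties`. The class `PropertyCollisions` and its method are hypothetical and not part of Druid; the broker values are the ones shown in the broker example further down this page.

```java
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Druid: lists keys that two services both define.
class PropertyCollisions
{
  // Returns the sorted set of property names present in both collections.
  static Set<String> collidingKeys(Properties a, Properties b)
  {
    Set<String> shared = new TreeSet<>(a.stringPropertyNames());
    shared.retainAll(b.stringPropertyNames());
    return shared;
  }

  public static void main(String[] args)
  {
    Properties historical = new Properties();
    historical.setProperty("druid.service", "druid/historical");
    historical.setProperty("druid.plaintextPort", "8083");
    historical.setProperty("druid.historical.cache.useCache", "true");

    Properties broker = new Properties();
    broker.setProperty("druid.service", "druid/broker");
    broker.setProperty("druid.plaintextPort", "8082");

    // The unscoped names collide; the service-scoped cache property does not.
    System.out.println(collidingKeys(historical, broker));
  }
}
```

Any key reported here would be overwritten if the two files were merged into one map, which is why each service needs its own properties collection.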
A key question to ask is when properties are used: do we need the service-specific properties before we start running the service itself? This is a difficult question since all startup modules have visibility to both properties files listed above. This means we must look, case by case, to see which properties are actually used where. The Extensions page discusses one use case.
We must ensure that no "startup" module refers to configuration in the service-specific configuration file.
The `JsonConfigProvider` class is used throughout the code to "provide a singleton value of some type from `Properties` bound in Guice." Basically, it maps a property prefix string to a singleton object defined in Guice. From its documentation:
An example might be if the `DruidServerConfig` class were

    public class DruidServerConfig
    {
      @JsonProperty @NotNull public String hostname = null;
      @JsonProperty @Min(1025) public int port = 8080;
    }

And your `Properties` object had in it

    druid.server.hostname=0.0.0.0
    druid.server.port=3333

Then this would bind a singleton instance of a `DruidServerConfig` object with `hostname = "0.0.0.0"` and `port = 3333`.
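The prefix-to-object idea can be illustrated without Guice or Jackson. The following is only a sketch under that assumption: `ServerConfigSketch` and `fromProperties` are hypothetical names, and the real `JsonConfigProvider` performs JSON conversion and validation that this code omits.

```java
import java.util.Properties;

// Hypothetical stand-in for a config class such as DruidServerConfig.
class ServerConfigSketch
{
  final String hostname;
  final int port;

  ServerConfigSketch(String hostname, int port)
  {
    this.hostname = hostname;
    this.port = port;
  }

  // Reads "<prefix>.hostname" and "<prefix>.port"; the port defaults to 8080 when absent.
  static ServerConfigSketch fromProperties(Properties props, String prefix)
  {
    String hostname = props.getProperty(prefix + ".hostname");
    int port = Integer.parseInt(props.getProperty(prefix + ".port", "8080"));
    return new ServerConfigSketch(hostname, port);
  }

  public static void main(String[] args)
  {
    Properties props = new Properties();
    props.setProperty("druid.server.hostname", "0.0.0.0");
    props.setProperty("druid.server.port", "3333");

    ServerConfigSketch config = fromProperties(props, "druid.server");
    System.out.println(config.hostname + ":" + config.port);
  }
}
```

Because the prefix is the only link between the properties and the object, two services that use the same prefix against a shared `Properties` object would receive identical values.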
Here is a typical example:
public class ServerModule implements Module
{ ...
  public void configure(Binder binder)
  {
    JsonConfigProvider.bind(binder, ZK_PATHS_PROPERTY_BASE, ZkPathsConfig.class);
    JsonConfigProvider.bind(binder, "druid", DruidNode.class, Self.class);
  }
Druid uses the Config-Magic implementation to create some config objects via the `ConfigProvider` (https://github.com/apache/druid/blob/master/core/src/main/java/org/apache/druid/guice/ConfigProvider.java) class. Config Magic seems to be a non-Guice way to create and populate config objects from a `Properties` object.
`ConfigProvider` appears to be a Guice-compatible wrapper around Config Magic which also allows substitutions of property values.
Example: the module:
public class DruidProcessingConfigModule implements Module
{
  @Override
  public void configure(Binder binder)
  {
    ConfigProvider.bind(binder, DruidProcessingConfig.class, ImmutableMap.of("base_path", "druid.processing"));
  }
}
The configured class:
public abstract class DruidProcessingConfig ...
{
  @Config({"druid.computation.buffer.size", "${base_path}.buffer.sizeBytes"})
  public HumanReadableBytes intermediateComputeSizeBytesConfigured()
  {
    return DEFAULT_PROCESSING_BUFFER_SIZE_BYTES;
  }
Here, the `@Config` annotation says to first look at `druid.computation.buffer.size`. If not found, look at `${base_path}.buffer.sizeBytes`. If still not found, use the default. Notice the substitution above.
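The lookup order described above can be sketched in plain Java. This is not how Config Magic is implemented (it generates classes); `ConfigFallbackSketch`, `substitute`, and `resolve` are hypothetical names used only to show the key-by-key fallback with `${...}` substitution.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical illustration of "try each key in order, then use the default".
class ConfigFallbackSketch
{
  // Replaces each "${name}" token in the key with its value from the substitutions map.
  static String substitute(String key, Map<String, String> substitutions)
  {
    String result = key;
    for (Map.Entry<String, String> e : substitutions.entrySet()) {
      result = result.replace("${" + e.getKey() + "}", e.getValue());
    }
    return result;
  }

  // Returns the value of the first key that is defined, or defaultValue if none is.
  static String resolve(
      Properties props,
      List<String> keys,
      Map<String, String> substitutions,
      String defaultValue
  )
  {
    for (String key : keys) {
      String value = props.getProperty(substitute(key, substitutions));
      if (value != null) {
        return value;
      }
    }
    return defaultValue;
  }

  public static void main(String[] args)
  {
    Properties props = new Properties();
    props.setProperty("druid.processing.buffer.sizeBytes", "500000000");

    List<String> keys = Arrays.asList("druid.computation.buffer.size", "${base_path}.buffer.sizeBytes");
    Map<String, String> subs = Collections.singletonMap("base_path", "druid.processing");

    // The first key is undefined, so the substituted second key supplies the value.
    System.out.println(resolve(props, keys, subs, "default"));
  }
}
```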
Config Magic does not inject values like Guice. Instead, it generates a new class (or object) that provides the values. The example in its documentation shows an interface as the item that defines the properties; the config factory then creates a concrete class. The documentation is a decade old. It seems that the version Druid uses also allows the input to be a class that provides concrete implementations for the values, which are used if no property value can be found. That is, Config Magic won't generate a method for a property value that is undefined and for which a base class implementation exists.
The Druid flow is:
- The startup module `PropertiesModule` loads properties as described above.
- The startup `ConfigModule` creates the Config Magic `ConfigurationObjectFactory` from the above properties.
- `DruidSecondaryModule` copies the factory from the startup injector into the service injector.
- Service modules call `ConfigProvider` to bind a config object to some Guice key.
- Users of the config obtain properties from Guice injection of the above config object.
One result is that objects configured this way are singletons that start with service-specific values, then fall back to default values. Any such singleton is a challenge for a multi-service server.
The `ConfigModule` can be split. The `ConfigurationObjectFactory` can be created in the proposed root injector and then not recreated in the `DruidSecondaryModule`.
The `DruidNode` class describes a single service. In the current Druid, there is only a single node per process, so the `DruidNode` represents the identity of the process as well. `DruidNode` is JSON-serializable, and is built from properties under the `druid` key in the `runtime.properties` file:
druid.service=druid/broker
druid.plaintextPort=8082
The plumbing is that the service `runtime.properties` resides on the class path and is loaded into the `Properties` set in Guice, from which the `DruidNode` is created via Guice and the `JsonConfigurator`.
An annotation, `@Self`, indicates the `DruidNode` for the process itself. The `ServerModule` provides the binding:
JsonConfigProvider.bind(binder, "druid", DruidNode.class, Self.class);
Each service provides a service name and default ports. For example, in `CliBroker`:
binder.bindConstant().annotatedWith(Names.named("serviceName")).to(
    TieredBrokerConfig.DEFAULT_BROKER_SERVICE_NAME
);
binder.bindConstant().annotatedWith(Names.named("servicePort")).to(8082);
binder.bindConstant().annotatedWith(Names.named("tlsServicePort")).to(8282);
binder.bindConstant().annotatedWith(PruneLoadSpec.class).to(true);
These values become the defaults for the `DruidNode`:
@JsonCreator
public DruidNode(
    @JacksonInject @Named("serviceName") @JsonProperty("service") String serviceName,
    ...
    @JacksonInject @Named("servicePort") @JsonProperty("port") Integer port,
    @JacksonInject @Named("tlsServicePort") @JsonProperty("tlsPort") Integer tlsPort,
The `ObjectMapper` used to deserialize the `DruidNode` has a set of injectable values obtained from Guice, as specified in `DruidSecondaryModule`:
@VisibleForTesting
public static void setupJackson(Injector injector, final ObjectMapper mapper)
{
  mapper.setInjectableValues(new GuiceInjectableValues(injector));
  setupAnnotationIntrospector(mapper, new GuiceAnnotationIntrospector());
}
The three named values above (`serviceName`, `servicePort`, `tlsServicePort`) are used mainly in one place: to create the Druid node. In addition, the `"serviceName"` key is used by `BasicSecurityDruidModule` to learn the kind of the current node:
private static boolean isCoordinator(Injector injector)
{
  final String serviceName;
  try {
    serviceName = injector.getInstance(Key.get(String.class, Names.named("serviceName")));
  }
  catch (Exception e) {
    return false;
  }
  return "druid/coordinator".equals(serviceName);
}
The `@Self` annotation is used in Guice injection to get the node of the current machine. It is used:
- In a number of places to identify the "parent node" for some child task. (See `HadoopDruidIndexerConfig`, `ForkingTaskRunner`, `TaskMaster`, `CliIndexer`, etc.)
- In tests such as `DatasourcePathSpecTest`
- In authentication, such as `KerberosAuthenticator`
- In leader selection, such as `DruidLeaderSelectorProvider`, `K8sDruidLeaderSelector`, `CuratorDruidLeaderSelector`
- To identify the current service, as in `MovingAverageQueryModule`
- For metadata: `StorageNodeModule`
- For misc. tasks in `QueryResource`, `DruidAvaticaProtobufHandler`
- To know the current node in `EmitterModule`, `DruidCoordinator`, `SelfDiscoveryResource`.
- To set up the web server ports in `JettyServerModule`
In single-process mode, multiple "druid nodes" exist in a single process: we break the 1:1 assumption between processes and nodes. Thus, there is not "the" `DruidNode`, but a set of them. It is better to think of the `DruidNode` as being associated with a web app, and there are one or more web apps per process.
Because of this, it may make sense to more closely bind `DruidNode` creation to the service, and have a service-specific injector implementation for Jackson. In this model:
- The properties manager captures "generic" properties into service-specific buckets, perhaps rewriting the names.
- The service defines a provider for its own `DruidNode`, keyed by the service annotation.
- The service provides a Jackson injector which maps the generic names to the service-specific property names.
- The startup injector (used to configure modules, including the service modules) provides some kind of service configuration helper that handles different "modes" (classic, single-process).
- When operating in "classic" mode, the `DruidNode` is bound to `Self` as normal.
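One way to picture "a set of nodes, keyed by service" is the following sketch. Everything in it is hypothetical: `NodeRegistrySketch` and its nested `Node` are simplified stand-ins, not Druid classes, and a real design would use Guice keys and annotations rather than a map of strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Druid.
class NodeRegistrySketch
{
  // Simplified stand-in for DruidNode: just a service name and a port.
  static final class Node
  {
    final String serviceName;
    final int plaintextPort;

    Node(String serviceName, int plaintextPort)
    {
      this.serviceName = serviceName;
      this.plaintextPort = plaintextPort;
    }
  }

  private final Map<String, Node> nodes = new LinkedHashMap<>();

  // Each service registers its own node; duplicate service names are rejected.
  void register(Node node)
  {
    if (nodes.putIfAbsent(node.serviceName, node) != null) {
      throw new IllegalStateException("Duplicate service: " + node.serviceName);
    }
  }

  // Callers must say which service they mean instead of asking for "the" node.
  Node forService(String serviceName)
  {
    Node node = nodes.get(serviceName);
    if (node == null) {
      throw new IllegalArgumentException("Unknown service: " + serviceName);
    }
    return node;
  }

  // Classic mode: exactly one node is registered, so "self" is unambiguous.
  Node self()
  {
    if (nodes.size() != 1) {
      throw new IllegalStateException("No single 'self' node: " + nodes.size() + " registered");
    }
    return nodes.values().iterator().next();
  }

  public static void main(String[] args)
  {
    NodeRegistrySketch registry = new NodeRegistrySketch();
    registry.register(new Node("druid/broker", 8082));
    registry.register(new Node("druid/historical", 8083));
    System.out.println(registry.forService("druid/historical").plaintextPort);
  }
}
```

The `self()` method fails once a second service is registered, which mirrors the point that code depending on `@Self` needs individual attention in a multi-service process.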
The `@Self` annotation tells us where the `DruidNode` is used (see above). Each usage will likely require individual attention. In general:
- Announce the set of services running in a node.
- Security is associated with a process identity separate from the node. (Security does not need ports.)
- Child nodes must be told which service is the parent, not just "the current one."
- Test changes depend entirely on the purpose of the test.
The configuration files are likely to be the most intrusive change. It appears that, today, a user can configure any Druid property in either the common configuration file or in the service-specific file. Once we run multiple services, this can no longer be true since the service-specific configuration must be visible only to the specific service, and not to the common startup logic.
Since Druid today assumes that the service configuration file appears on the class path, we must change this for multi-service. Perhaps we can instead add the path to the root folder, such as `$DRUID_HOME/single-server/micro-quickstart`. If we do that, then Druid will run in two distinct configurations:
- Single-service mode: current class path with entries for common and service-specific configuration.
- Bundled mode: class path contains the root configuration directory; code must work out the paths to the various files.
However, the `_common` folder contains the `log4j2.xml` file as well, read by the logger. So, it seems that `_common` must remain on the class path, but the service directory can be inferred.
To work around the above limitations, we must work with the current property structure, but we cannot put the service configuration files on the class path. Instead, we must load them for each service individually.
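Loading per-service properties on top of the shared ones might look like the sketch below. It assumes a layout in which each service directory (for example `historical`) sits under a common root and holds a `runtime.properties` file; `ServicePropertiesLoader`, `layer`, and `forService` are hypothetical names, and the `druid.zk.service.host` value is only illustrative.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical loader: one Properties object per service, backed by the common ones.
class ServicePropertiesLoader
{
  // Reads service entries from source; keys it does not define fall back to common.
  static Properties layer(Properties common, Reader source)
  {
    Properties props = new Properties(common);
    try {
      props.load(source);
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  // Assumed layout: <rootDir>/<service>/runtime.properties
  static Properties forService(Path rootDir, String service, Properties common)
  {
    Path file = rootDir.resolve(service).resolve("runtime.properties");
    try (Reader reader = Files.newBufferedReader(file)) {
      return layer(common, reader);
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args)
  {
    Properties common = new Properties();
    common.setProperty("druid.zk.service.host", "localhost");

    // In-memory stand-ins for two services' runtime.properties files.
    Properties historical = layer(common, new StringReader("druid.service=druid/historical\ndruid.plaintextPort=8083\n"));
    Properties broker = layer(common, new StringReader("druid.service=druid/broker\ndruid.plaintextPort=8082\n"));

    System.out.println(historical.getProperty("druid.service"));
    System.out.println(broker.getProperty("druid.service"));
    System.out.println(broker.getProperty("druid.zk.service.host"));
  }
}
```

Using the `Properties` defaults mechanism keeps the common properties untouched, so the startup logic never sees service-specific values.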
For bundled mode:
- If `druid.properties.file` is used, the file must contain all properties, including those for specific services. Do not read the class path config files.
- If the normal mechanism is to be used, include only `_common` on the class path.
  - The `common.runtime.properties` file must exist. If not, raise an error.
  - Load `common.runtime.properties` at startup. Use these properties to configure extensions, etc.
  - Remember the directory in which `common.runtime.properties` resides. Assume this is the root of the property file tree.
  - When we launch a service, and create its specific injector, create a new set of properties which include the service-specific properties.
- The
For normal operation, we retain the current behavior, except we tidy up handling of the `druid.properties.file` system property.
- If `druid.properties.file` is used, the file must contain all properties, including those for the desired service. Do not load any class path configuration files.
- If the normal mechanism is to be used, include both `_common` and the service directory on the class path.
- Load both files at startup time.
- When creating the specific service injector, reuse the startup properties.
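The "either the override file or the class path files, never both" rule for normal operation can be stated as a small function. This is a sketch of the proposed rule only, not of Druid's current `PropertiesModule` behavior; `PropertySourceChooser` and `filesToLoad` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the proposed rule for normal (single-service) operation.
class PropertySourceChooser
{
  // If an override file is given, load only that file; otherwise load the two class path files.
  static List<String> filesToLoad(String propertiesFileOverride)
  {
    if (propertiesFileOverride != null && !propertiesFileOverride.isEmpty()) {
      return Collections.singletonList(propertiesFileOverride);
    }
    return Arrays.asList("common.runtime.properties", "runtime.properties");
  }

  public static void main(String[] args)
  {
    // Reads the same system property the page describes: -Ddruid.properties.file=my.properties
    System.out.println(filesToLoad(System.getProperty("druid.properties.file")));
  }
}
```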
Druid reports properties to the log on startup:
2021-11-19T21:10:52,420 INFO [main] org.apache.druid.cli.CliBroker - Starting up with processors[12], memory[268,435,456], maxMemory[4,294,967,296]. Properties follow.
2021-11-19T21:10:52,421 INFO [main] org.apache.druid.cli.CliBroker - * awt.toolkit: sun.lwawt.macosx.LWCToolkit
2021-11-19T21:10:52,421 INFO [main] org.apache.druid.cli.CliBroker - * druid.broker.cache.populateCache: false
...
Note, however, that startup has to be cross-service, but each service will have its own properties. This relates to the other topics for teasing apart per-server and per-service issues.
Some key issues to overcome include:
- Properties reside in multiple files in multiple directories.
- Names vary: there is a combined `coordinator-overlord` name when those two services are combined.
- Names are duplicated: `druid.service` and `druid.plaintextPort`, for example.
- Properties are pooled and injected into services which need them.
- Properties are defined within Druid itself: `servicePort` and `tlsServicePort`. Such names are duplicated across services so the one-and-only Jetty knows where to find them.
This leads to a possible solution: a revised property abstraction.
- Instead of binding `Properties` to Guice's property injector, bind a new class.
- Require that all non-global properties are prefixed by service. `druid.service` becomes `druid.broker.service`, etc.
- The property abstraction manages loading of properties in various ways, and adds the service prefix based on directories.
- Have the property system resolve names based on some rules.
Some of this works only once the Jetty-based web server can handle multiple connectors.
How can the above be done step-by-step?
- Replace `Properties` with a new abstraction that, for now, just mimics `Properties`.
- Add logic to create prefixed names from unprefixed names.
- Allow both prefixed and unprefixed names for service-specific properties.
- Add service prefixes where easily done now. Note the challenges.
- Resolve challenges.
- Modify property service to warn when accessing non-prefixed names. Resolve issues.
- Enforce prefixed names.
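The intermediate stage of this plan, where both prefixed and unprefixed names are accepted and unprefixed access produces a warning, could be sketched as follows. The class `ServicePropertiesSketch` and the exact rewriting rule (insert the service name after the `druid.` root) are assumptions made for illustration; the page does not fix these details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical property abstraction for one service, e.g. service = "broker".
class ServicePropertiesSketch
{
  private static final String ROOT = "druid.";

  private final Properties props;
  private final String service;
  final List<String> warnings = new ArrayList<>();

  ServicePropertiesSketch(Properties props, String service)
  {
    this.props = props;
    this.service = service;
  }

  // Assumed rule: "druid.service" becomes "druid.broker.service"; other names pass through.
  static String prefixed(String name, String service)
  {
    if (!name.startsWith(ROOT) || name.startsWith(ROOT + service + ".")) {
      return name;
    }
    return ROOT + service + "." + name.substring(ROOT.length());
  }

  // Prefers the service-prefixed name; falls back to the unprefixed name with a warning.
  String get(String name)
  {
    String scoped = prefixed(name, service);
    String value = props.getProperty(scoped);
    if (value != null || scoped.equals(name)) {
      return value;
    }
    value = props.getProperty(name);
    if (value != null) {
      warnings.add("Unprefixed property used: " + name);
    }
    return value;
  }

  public static void main(String[] args)
  {
    Properties props = new Properties();
    props.setProperty("druid.broker.service", "druid/broker");
    props.setProperty("druid.plaintextPort", "8082");

    ServicePropertiesSketch broker = new ServicePropertiesSketch(props, "broker");
    System.out.println(broker.get("druid.service"));
    System.out.println(broker.get("druid.plaintextPort"));
    System.out.println(broker.warnings);
  }
}
```

Enforcing prefixed names, the final step, would amount to replacing the fallback branch with an error.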