
Single Process Property Files

Paul Rogers edited this page Dec 16, 2021 · 6 revisions

This page discusses how Druid manages properties and how the behavior must change to support a Single Process Druid.

Property Configuration

Druid allows configuration via a set of Java properties-style configuration files using Guice mechanisms:

public class GuiceInjectors
{
  public static Collection<Module> makeDefaultStartupModules()
  {
    return ImmutableList.of(
        ...
        new PropertiesModule(Arrays.asList("common.runtime.properties", "runtime.properties")),

Here:

  • common.runtime.properties contains properties common to all Druid services (and to other Druid commands)
  • runtime.properties contains properties specific to a single service, such as historical.

This is done by listing the property file directories as part of the class path. (See https://github.com/paul-rogers/druid/wiki/Build-and-Debug#configure-eclipse.) To run a historical node, the following appear on the class path:

$DRUID_HOME/conf/druid/single-server/micro-quickstart/historical
$DRUID_HOME/conf/druid/single-server/micro-quickstart/_common

The _common folder contains the common.runtime.properties file with the common properties.
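The layering of the two files can be pictured with a small stand-alone sketch. It is not Druid's PropertiesModule: the class name LayeredPropertiesSketch and the method loadInOrder are hypothetical, the inputs are in-memory readers rather than class path resources, and it assumes that a later file overrides an earlier one when both define the same key.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: merges property sources in order into one Properties object. */
final class LayeredPropertiesSketch
{
  static Properties loadInOrder(List<? extends Reader> sources)
  {
    Properties merged = new Properties();
    for (Reader source : sources) {
      try {
        // Properties.load() overwrites any key that is already present,
        // so later sources win over earlier ones.
        merged.load(source);
      }
      catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
    return merged;
  }

  public static void main(String[] args)
  {
    // Stand-ins for common.runtime.properties and runtime.properties.
    Reader common = new StringReader("druid.host=localhost\n");
    Reader service = new StringReader("druid.service=druid/historical\ndruid.plaintextPort=8083\n");
    Properties props = loadInOrder(Arrays.asList(common, service));
    System.out.println(props.getProperty("druid.host"));
    System.out.println(props.getProperty("druid.service"));
  }
}
```

Because both files end up in one Properties object, nothing in the result records which file a given key came from.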

Within historical we have runtime.properties. This exposes an issue with running multiple services: since the name runtime.properties is shared by all services, only one service's config directory can appear on the class path. This means that the service chosen via the server CLI option must match the configuration directory added to the class path. We could even omit the service name and infer it from the class path directory, or from a property within the runtime.properties file. In fact, the launcher script does something like this. Consider the main.config file:

org.apache.druid.cli.Main server historical

This allows the launcher script to pass along the proper command line option for the class path we've selected.

Property Naming

One possible solution to a single-process Druid would be to combine all properties into a single file. In fact, Druid supports this mode with the druid.properties.file system property: <druid> -Ddruid.properties.file=my.properties. In this case we'd like the properties to be scoped by service type so different services can live in the same file. Some are:

druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true

Unfortunately, many are not. From historical/runtime.properties:

druid.service=druid/historical
druid.plaintextPort=8083

The result is that we cannot mix properties for different services together in a single properties collection: each must have its own distinct properties map.
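To make the collision concrete, here is a minimal sketch that compares two service property sets and reports the keys they both define with different values. The class PropertyCollisionSketch and its method are hypothetical; the historical values come from the file quoted above, and the broker values are those shown elsewhere on this page.

```java
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that finds keys two services define differently. */
final class PropertyCollisionSketch
{
  /** Returns the keys present in both sets whose values differ, in sorted order. */
  static Set<String> conflictingKeys(Properties first, Properties second)
  {
    Set<String> conflicts = new TreeSet<>();
    for (String key : first.stringPropertyNames()) {
      String other = second.getProperty(key);
      if (other != null && !other.equals(first.getProperty(key))) {
        conflicts.add(key);
      }
    }
    return conflicts;
  }

  public static void main(String[] args)
  {
    Properties historical = new Properties();
    historical.setProperty("druid.service", "druid/historical");
    historical.setProperty("druid.plaintextPort", "8083");
    historical.setProperty("druid.historical.cache.useCache", "true");

    Properties broker = new Properties();
    broker.setProperty("druid.service", "druid/broker");
    broker.setProperty("druid.plaintextPort", "8082");
    broker.setProperty("druid.broker.cache.populateCache", "false");

    // Prints [druid.plaintextPort, druid.service]
    System.out.println(conflictingKeys(historical, broker));
  }
}
```

The service-scoped cache properties do not collide; only the unscoped names do.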

Property Visibility

A key question is when properties are used: do we need the service-specific properties before we start running the service itself? This is difficult to answer since all startup modules can see both properties files listed above. This means we must look, case by case, to see which properties are actually used where. The Extensions page discusses one use case.

We must ensure that no "startup" module refers to configuration in the service-specific configuration file.

JsonConfigProvider

The JsonConfigProvider class is used throughout the code to "provide a singleton value of some type from Properties bound in Guice." Basically, it maps a property prefix string to a singleton object defined in Guice.

An example might be if the DruidServerConfig class were

  public class DruidServerConfig
  {
    @JsonProperty @NotNull public String hostname = null;
    @JsonProperty @Min(1025) public int port = 8080;
  }

And your Properties object had in it

  druid.server.hostname=0.0.0.0
  druid.server.port=3333

Then this would bind a singleton instance of a DruidServerConfig object with hostname = "0.0.0.0" and port = 3333.
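The prefix-to-object idea can be sketched without Guice or Jackson. The following is a simplified illustration only: PrefixBindingSketch, ServerConfig, subProperties and bind are hypothetical names, and the field-by-field copy is written by hand, whereas the @JsonProperty annotations above suggest the real class hands that step to Jackson. Validation (@NotNull, @Min) is omitted.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical, dependency-free illustration of binding a property prefix to a config object. */
final class PrefixBindingSketch
{
  /** Simplified stand-in for the config class in the example above. */
  static final class ServerConfig
  {
    String hostname = null;
    int port = 8080;
  }

  /** Collects "prefix.name=value" entries into a map keyed by "name". */
  static Map<String, String> subProperties(Properties props, String prefix)
  {
    String dotted = prefix + ".";
    Map<String, String> result = new TreeMap<>();
    for (String key : props.stringPropertyNames()) {
      if (key.startsWith(dotted)) {
        result.put(key.substring(dotted.length()), props.getProperty(key));
      }
    }
    return result;
  }

  /** Copies the matching values onto a new config object; unset fields keep their defaults. */
  static ServerConfig bind(Properties props, String prefix)
  {
    Map<String, String> values = subProperties(props, prefix);
    ServerConfig config = new ServerConfig();
    if (values.containsKey("hostname")) {
      config.hostname = values.get("hostname");
    }
    if (values.containsKey("port")) {
      config.port = Integer.parseInt(values.get("port"));
    }
    return config;
  }

  public static void main(String[] args)
  {
    Properties props = new Properties();
    props.setProperty("druid.server.hostname", "0.0.0.0");
    props.setProperty("druid.server.port", "3333");
    ServerConfig config = bind(props, "druid.server");
    System.out.println(config.hostname + ":" + config.port);
  }
}
```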

Here is a typical example:

public class ServerModule implements Module
{ ...
  public void configure(Binder binder)
  {
    JsonConfigProvider.bind(binder, ZK_PATHS_PROPERTY_BASE, ZkPathsConfig.class);
    JsonConfigProvider.bind(binder, "druid", DruidNode.class, Self.class);
  }

Config Magic

Druid uses the Config Magic library to create some config objects via the ConfigProvider (https://github.com/apache/druid/blob/master/core/src/main/java/org/apache/druid/guice/ConfigProvider.java) class. Config Magic seems to be a non-Guice way to create and populate config objects from a Properties object.

ConfigProvider appears to be a Guice-compatible wrapper around config magic which also allows substitutions of property values.

Example: the module:

public class DruidProcessingConfigModule implements Module
{

  @Override
  public void configure(Binder binder)
  {
    ConfigProvider.bind(binder, DruidProcessingConfig.class, ImmutableMap.of("base_path", "druid.processing"));
  }
}

The configured class:

public abstract class DruidProcessingConfig ...
{
  @Config({"druid.computation.buffer.size", "${base_path}.buffer.sizeBytes"})
  public HumanReadableBytes intermediateComputeSizeBytesConfigured()
  {
    return DEFAULT_PROCESSING_BUFFER_SIZE_BYTES;
  }

Here, the @Config annotation says to first look at druid.computation.buffer.size. If not found, look at ${base_path}.buffer.sizeBytes. If still not found, use the default. Notice the substitution above.
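The lookup order described here can be sketched in plain Java. This is an illustration of the rule only, not Config Magic's implementation: the class ConfigFallbackSketch and its methods substitute and resolve are hypothetical, and values are kept as strings rather than converted to HumanReadableBytes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/** Hypothetical illustration of "first matching key wins, else default" with ${...} substitution. */
final class ConfigFallbackSketch
{
  /** Replaces each "${name}" token with its value from the substitutions map. */
  static String substitute(String key, Map<String, String> substitutions)
  {
    String result = key;
    for (Map.Entry<String, String> entry : substitutions.entrySet()) {
      result = result.replace("${" + entry.getKey() + "}", entry.getValue());
    }
    return result;
  }

  /** Tries each key in order after substitution; returns the default if none is defined. */
  static String resolve(
      Properties props,
      List<String> keys,
      Map<String, String> substitutions,
      String defaultValue
  )
  {
    for (String key : keys) {
      String value = props.getProperty(substitute(key, substitutions));
      if (value != null) {
        return value;
      }
    }
    return defaultValue;
  }

  public static void main(String[] args)
  {
    List<String> keys = Arrays.asList("druid.computation.buffer.size", "${base_path}.buffer.sizeBytes");
    Map<String, String> substitutions = Collections.singletonMap("base_path", "druid.processing");

    Properties props = new Properties();
    props.setProperty("druid.processing.buffer.sizeBytes", "500000000");
    System.out.println(resolve(props, keys, substitutions, "default"));
  }
}
```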

Config Magic does not inject values like Guice. Instead, it generates a new class (or object) that provides the values. The example in the Config Magic documentation shows an interface as the item that defines the properties, from which the config factory creates a concrete class. That documentation is a decade old. It seems that the version Druid uses also allows the input to be a class that provides concrete implementations for the values, which are used if no property value can be found. That is, Config Magic won't generate a method for a property value that is undefined and for which a base class implementation exists.

The Druid flow is:

  • The startup module PropertiesModule loads properties as described above.
  • The startup ConfigModule creates the Config Magic ConfigurationObjectFactory from the above properties.
  • DruidSecondaryModule copies the factory from the startup injector into the service injector.
  • Service modules call ConfigProvider to bind a config object to some Guice key.
  • Users of the config obtain properties from Guice injection of the above config object.

One result is that objects configured this way are singletons that start with service-specific values, then fall back to default values. Any such singleton is a challenge for a multi-service server.

Possible Revisions

The ConfigModule can be split. The ConfigurationObjectFactory can be created in the proposed root injector and not then recreated in the DruidSecondaryModule.

DruidNode

The DruidNode class describes a single service. In the current Druid, there is only a single node per process, so the DruidNode represents the identity of the process as well. DruidNode is JSON-serializable, and is built from properties under the druid key in the runtime.properties file:

druid.service=druid/broker
druid.plaintextPort=8082

The plumbing is that the service runtime.properties resides on the class path and is loaded into the Properties set in Guice, from which the DruidNode is created via Guice and the JsonConfigurator.

An annotation, @Self, indicates the DruidNode for the process itself. The ServerModule provides the binding:

    JsonConfigProvider.bind(binder, "druid", DruidNode.class, Self.class);

Each service provides a service name and default ports. For example, in CliBroker:

          binder.bindConstant().annotatedWith(Names.named("serviceName")).to(
              TieredBrokerConfig.DEFAULT_BROKER_SERVICE_NAME
          );
          binder.bindConstant().annotatedWith(Names.named("servicePort")).to(8082);
          binder.bindConstant().annotatedWith(Names.named("tlsServicePort")).to(8282);
          binder.bindConstant().annotatedWith(PruneLoadSpec.class).to(true);

These values become the defaults for the DruidNode:

  @JsonCreator
  public DruidNode(
      @JacksonInject @Named("serviceName") @JsonProperty("service") String serviceName,
      ...
      @JacksonInject @Named("servicePort") @JsonProperty("port") Integer port,
      @JacksonInject @Named("tlsServicePort") @JsonProperty("tlsPort") Integer tlsPort,

The ObjectMapper used to deserialize the DruidNode has a set of injectable values obtained from Guice, as specified in DruidSecondaryModule:

  @VisibleForTesting
  public static void setupJackson(Injector injector, final ObjectMapper mapper)
  {
    mapper.setInjectableValues(new GuiceInjectableValues(injector));
    setupAnnotationIntrospector(mapper, new GuiceAnnotationIntrospector());
  }
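The effect of combining an injected value with a JSON property can be sketched as "use the property if it is set, otherwise the injected default". The code below is a hand-written illustration under that assumption, not Jackson or Guice code: InjectedDefaultsSketch, Node and createNode are hypothetical names, and the property names druid.service and druid.port simply follow the @JsonProperty names in the constructor excerpt above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical illustration of injected defaults that properties may override. */
final class InjectedDefaultsSketch
{
  /** Simplified stand-in for DruidNode. */
  static final class Node
  {
    final String serviceName;
    final int port;

    Node(String serviceName, int port)
    {
      this.serviceName = serviceName;
      this.port = port;
    }
  }

  /**
   * Builds a node from properties, falling back to the named constants a service binds.
   * Assumes the "serviceName" and "servicePort" defaults are always present.
   */
  static Node createNode(Properties props, Map<String, Object> injected)
  {
    String service = props.getProperty("druid.service", (String) injected.get("serviceName"));
    String port = props.getProperty("druid.port");
    int resolvedPort = port != null ? Integer.parseInt(port) : (Integer) injected.get("servicePort");
    return new Node(service, resolvedPort);
  }

  public static void main(String[] args)
  {
    // Defaults of the kind CliBroker binds as named constants.
    Map<String, Object> injected = new HashMap<>();
    injected.put("serviceName", "druid/broker");
    injected.put("servicePort", 8082);

    Properties props = new Properties();
    props.setProperty("druid.port", "9999");

    Node node = createNode(props, injected);
    System.out.println(node.serviceName + " on port " + node.port);
  }
}
```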

The three named bindings (serviceName, servicePort and tlsServicePort) are used mainly in one place: to create the DruidNode. The "serviceName" key is also used by BasicSecurityDruidModule to learn the kind of the current node:

  private static boolean isCoordinator(Injector injector)
  {
    final String serviceName;
    try {
      serviceName = injector.getInstance(Key.get(String.class, Names.named("serviceName")));
    }
    catch (Exception e) {
      return false;
    }

    return "druid/coordinator".equals(serviceName);
  }

The @Self annotation is used in Guice injection to get the node of the current machine. It is used:

  • In a number of places to identify the "parent node" for some child task. (See HadoopDruidIndexerConfig, ForkingTaskRunner, TaskMaster, CliIndexer, etc.)
  • In tests such as DatasourcePathSpecTest
  • In authentication, such as KerberosAuthenticator
  • In leader selection, such as DruidLeaderSelectorProvider, K8sDruidLeaderSelector, CuratorDruidLeaderSelector
  • To identify the current service, as in MovingAverageQueryModule
  • For metadata: StorageNodeModule
  • For misc. tasks in QueryResource, DruidAvaticaProtobufHandler
  • To know the current node in EmitterModule, DruidCoordinator, SelfDiscoveryResource.
  • To set up the web server ports in JettyServerModule

Revisions

In single-process mode, multiple "druid nodes" exist in a single process: we break the 1:1 assumption between processes and nodes. Thus, there is not "the" DruidNode, but a set of them. It is better to think of the DruidNode as being associated with a web app, with one or more web apps per process.

Because of this, it may make sense to more closely bind DruidNode creation to the service, and have a service-specific injector implementation for Jackson. In this model:

  • The properties manager captures "generic" properties into service-specific buckets, perhaps rewriting the names.
  • The service defines a provider for its own DruidNode, keyed by the service annotation.
  • The service provides a Jackson injector which maps the generic names to the service-specific property names.
  • The startup injector (used to configure modules, including the service modules) provides some kind of service configure helper that handles different "modes" (classic, single-process.)
  • When operating in "classic" mode, the DruidNode is bound to Self as normal.
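One way to picture the model in the list above is a registry of nodes keyed by service, where a unique "self" exists only in classic mode. This is a design sketch with hypothetical names (NodeRegistrySketch, Node, register, forService, self); it uses a plain map rather than Guice keys and annotations, which a real implementation would presumably use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical registry holding one node per service in the process. */
final class NodeRegistrySketch
{
  /** Simplified stand-in for DruidNode: just a service name and a port. */
  static final class Node
  {
    final String serviceName;
    final int port;

    Node(String serviceName, int port)
    {
      this.serviceName = serviceName;
      this.port = port;
    }
  }

  private final Map<String, Node> nodes = new LinkedHashMap<>();

  /** Adds a node; each service may be registered only once. */
  void register(Node node)
  {
    if (nodes.putIfAbsent(node.serviceName, node) != null) {
      throw new IllegalStateException("Duplicate service: " + node.serviceName);
    }
  }

  /** Looks up the node for a specific service, as a service-keyed provider would. */
  Node forService(String serviceName)
  {
    Node node = nodes.get(serviceName);
    if (node == null) {
      throw new IllegalArgumentException("Unknown service: " + serviceName);
    }
    return node;
  }

  /** Classic mode only: with exactly one node, it plays the role of the @Self binding. */
  Node self()
  {
    if (nodes.size() != 1) {
      throw new IllegalStateException("No unique 'self' node: " + nodes.keySet());
    }
    return nodes.values().iterator().next();
  }
}
```

In this sketch, code that asks for self() keeps working in classic mode and fails loudly once a second service is registered, which is one way to find the call sites that need individual attention.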

The @Self annotation tells us where the DruidNode is used (see above). Each usage will likely require individual attention. In general:

  • Announce the set of services running in a node.
  • Security is associated with a process identity separate from the node. (Security does not need ports.)
  • Child nodes must be provided with which service is the parent, not just "the current one."
  • Test changes depend entirely on the purpose of the test.

Candidate Revisions

The configuration files are likely to be the most intrusive change. It appears that, today, a user can configure any Druid property in either the common configuration file or in the service-specific file. Once we run multiple services, this can no longer be true since the service-specific configuration must be visible only to the specific service, and not to the common startup logic.

Since Druid today assumes that the service configuration file appears on the class path, we must change this for multi-service. Perhaps we can instead add the path to the root folder, such as $DRUID_HOME/single-server/micro-quickstart. If we do that, then Druid will run in two distinct configurations:

  • Single-service mode: current class path with entries for common and service-specific configuration.
  • Bundled mode: class path contains the root configuration directory, code must work out the paths to the various files.

However, the _common folder contains the log4j2.xml file as well, read by the logger. So, it seems that _common must remain on the class path, but the service directory can be inferred.

Workable Solution

To work around the above limitations, we must work with the current property structure, but we cannot put the service configuration files on the class path. Instead, we must load them for each service individually.

For bundled mode:

  • If druid.properties.file is used, the file must contain all properties, including those for specific services. Do not read the class path config files.
  • If the normal mechanism is to be used, include only _common on the class path.
    • The common.runtime.properties file must exist. If not, raise an error.
    • Load common.runtime.properties at startup. Use these properties to configure extensions, etc.
    • Remember the directory in which common.runtime.properties resides. Assume this is the root of the property file tree.
    • When we launch a service, and create its specific injector, create a new set of properties which include the service-specific properties.
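The bundled-mode steps above might look roughly like the following. It is a sketch under stated assumptions: BundledPropertiesSketch and its methods are hypothetical, files are read from a directory passed in directly rather than located via the class path, and service directories are assumed to be siblings of _common, as in the micro-quickstart layout shown earlier.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical loader: common properties once at startup, service properties per service. */
final class BundledPropertiesSketch
{
  private final Path configRoot;
  private final Properties common;

  BundledPropertiesSketch(Path commonDir)
  {
    Path commonFile = commonDir.resolve("common.runtime.properties");
    if (!Files.isRegularFile(commonFile)) {
      // The common file must exist; otherwise raise an error.
      throw new IllegalStateException("Missing " + commonFile);
    }
    this.common = load(commonFile, new Properties());
    // Assumption: service directories sit next to _common under a shared root.
    this.configRoot = commonDir.toAbsolutePath().getParent();
  }

  /** Properties visible to startup code: common only, returned as a copy. */
  Properties startupProperties()
  {
    Properties copy = new Properties();
    copy.putAll(common);
    return copy;
  }

  /** A fresh property set for one service: common values plus that service's own file. */
  Properties forService(String serviceDir)
  {
    Path serviceFile = configRoot.resolve(serviceDir).resolve("runtime.properties");
    return load(serviceFile, startupProperties());
  }

  private static Properties load(Path file, Properties target)
  {
    try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
      target.load(reader);
      return target;
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Each call to forService returns an independent Properties object, so one service's druid.service or druid.plaintextPort never leaks into another service or into the startup properties.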

For normal operation, we retain the current behavior, except we tidy up handling of the druid.properties.file system property.

  • If druid.properties.file is used, the file must contain all properties, including those for the desired service. Do not load any class path configuration files.
  • If the normal mechanism is to be used, include both _common and the service directory on the class path.
  • Load both files at startup time.
  • When creating the specific service injector, reuse the startup properties.

Other Issues

Druid reports properties to the log on startup:

2021-11-19T21:10:52,420 INFO [main] org.apache.druid.cli.CliBroker - Starting up with processors[12], memory[268,435,456], maxMemory[4,294,967,296]. Properties follow.
2021-11-19T21:10:52,421 INFO [main] org.apache.druid.cli.CliBroker - * awt.toolkit: sun.lwawt.macosx.LWCToolkit
2021-11-19T21:10:52,421 INFO [main] org.apache.druid.cli.CliBroker - * druid.broker.cache.populateCache: false
...

Note, however, that startup has to be cross-service, but each service will have its own properties. This relates to the other topics for teasing apart per-server and per-service issues.

Alternative: Enhanced Property System

Some key issues to overcome include:

  • Properties reside in multiple files in multiple directories.
  • Names vary: there is a combined coordinator-overlord name when those two services are combined.
  • Names are duplicated: druid.service and druid.plaintextPort for example.
  • Properties are pooled and injected into services which need them.
  • Properties are defined within Druid itself: servicePort and tlsServicePort. Such names are duplicated across services so the one-and-only Jetty knows where to find them.

This leads to a possible solution: a revised property abstraction.

  • Instead of binding Properties to Guice's property injector, bind a new class.
  • Require that all non-global properties are prefixed by service. druid.service becomes druid.broker.service, etc.
  • The property abstraction manages loading of properties in various ways, and adds the service prefix based on directories.
  • Have the property system resolve names based on some rules.
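As an illustration of adding the service prefix at load time, the sketch below rewrites druid.service to druid.broker.service and so on, so that several services can share one property collection. The class ServicePrefixSketch and its methods are hypothetical, and the rule used (insert the service name after the leading "druid." unless it is already there) is only one possible naming rule.

```java
import java.util.Properties;

/** Hypothetical rewriting of unscoped property names into service-scoped names. */
final class ServicePrefixSketch
{
  private static final String ROOT = "druid.";

  /** Inserts the service name after "druid." unless the key is already scoped or is not a Druid key. */
  static String prefixed(String key, String service)
  {
    String scoped = ROOT + service + ".";
    if (!key.startsWith(ROOT) || key.startsWith(scoped)) {
      return key;
    }
    return scoped + key.substring(ROOT.length());
  }

  /** Returns a copy of the service's properties with every key scoped to that service. */
  static Properties scope(Properties serviceProps, String service)
  {
    Properties result = new Properties();
    for (String key : serviceProps.stringPropertyNames()) {
      result.setProperty(prefixed(key, service), serviceProps.getProperty(key));
    }
    return result;
  }

  public static void main(String[] args)
  {
    Properties broker = new Properties();
    broker.setProperty("druid.service", "druid/broker");
    broker.setProperty("druid.plaintextPort", "8082");

    Properties historical = new Properties();
    historical.setProperty("druid.service", "druid/historical");
    historical.setProperty("druid.plaintextPort", "8083");

    // After scoping, both services fit in one collection without collisions.
    Properties all = new Properties();
    all.putAll(scope(broker, "broker"));
    all.putAll(scope(historical, "historical"));
    System.out.println(all.getProperty("druid.broker.service"));
    System.out.println(all.getProperty("druid.historical.plaintextPort"));
  }
}
```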

Some of this works only once the Jetty-based web server can handle multiple connectors.

Evolution

How can the above be done step-by-step?

  • Replace Properties with a new abstraction that, for now, just mimics Properties.
  • Add logic to create prefixed names from unprefixed names.
  • Allow both prefixed and unprefixed names for service-specific properties.
  • Add service prefixes where easily done now. Note the challenges.
  • Resolve challenges.
  • Modify property service to warn when accessing non-prefixed names. Resolve issues.
  • Enforce prefixed names.
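
The middle steps of this evolution (accept both forms, warn on unprefixed names, then enforce) could be prototyped with a small lookup class such as the one below. All names here (ServicePropertiesSketch, get, warnings, the enforcePrefix flag) are hypothetical, and warnings are collected in a list rather than written to Druid's logger.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

/** Hypothetical per-service view over a shared property collection. */
final class ServicePropertiesSketch
{
  private final Properties props;
  private final String service;
  private final boolean enforcePrefix;
  private final List<String> warnings = new ArrayList<>();

  ServicePropertiesSketch(Properties props, String service, boolean enforcePrefix)
  {
    this.props = props;
    this.service = service;
    this.enforcePrefix = enforcePrefix;
  }

  /**
   * Looks up a name relative to "druid.", for example "plaintextPort".
   * Prefers druid.<service>.<name>; falls back to druid.<name> unless prefixes are enforced.
   */
  String get(String name)
  {
    String value = props.getProperty("druid." + service + "." + name);
    if (value != null) {
      return value;
    }
    String legacyKey = "druid." + name;
    String legacyValue = props.getProperty(legacyKey);
    if (legacyValue == null) {
      return null;
    }
    if (enforcePrefix) {
      throw new IllegalStateException("Unprefixed property not allowed: " + legacyKey);
    }
    // Record the unprefixed access so that remaining uses can be found and fixed.
    warnings.add(legacyKey);
    return legacyValue;
  }

  /** Unprefixed keys that were read through the fallback path, in access order. */
  List<String> warnings()
  {
    return Collections.unmodifiableList(warnings);
  }
}
```

Starting with enforcePrefix set to false and switching it to true once the warning list stays empty matches the last two steps of the list.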