Single Process Guice Configuration

This page discusses how Druid uses Guice to configure services and how the behavior must change to support a Single Process Druid.

Guice Basics

Druid uses Google Guice to manage dependencies (see the Guice tutorial). Key points:

Configuration is done via an Injector. The "universe" of bindings available to an application is defined by that Injector. Fortunately, Guice allows the creation of multiple, independent Injectors: each call to Guice.createInjector() produces a new one.
A Module is a collection of bindings.

The Grapher extension visualizes the dependency graph.

The module inspection API allows inspecting the contents of modules.

Druid Injection "Phases"

Guice configuration in Druid happens in two phases.

Startup phase: only enough configuration is provided to prepare the CLI commands.
Service phase: substantial additional service-specific configuration is added, along with selected resources from the startup phase.

The challenge in this project is that, if we want to run multiple services, we must perform the service phase independently for each service.

Startup Injector

Configuration starts with a "startup injector":

    final Injector injector = GuiceInjectors.makeStartupInjector();

The startup injector provides a set of service-independent modules defined in GuiceInjectors. Since these are independent of services, we can safely ignore these modules for this project. Next, the code connects the CLI mechanism with the startup Guice configuration:

      final Runnable command = cli.parse(args);
      injector.injectMembers(command);
      command.run();

Note that the code here and elsewhere is simplified: error handling and other non-essential cruft have been removed.

The injectMembers method works around the fact that the CLI created the instance without dependency injection: we ask Guice to inject dependencies into member variables after creation. At this point, all we can inject are resources bound by the startup modules mentioned above. Hence, dependency injection here is still independent of the service. For example, for the base ServerRunnable class:

    @Inject
    @Self
    private DruidNode druidNode;

    @Inject
    private DruidNodeAnnouncer announcer;

    @Inject
    private ServiceAnnouncer legacyAnnouncer;

    @Inject
    private Lifecycle lifecycle;

    @Inject
    private Injector injector;
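
The member-injection step can be illustrated with a minimal, JDK-only model. This is not Guice: the FakeInject annotation, StartupInjector and Command are hypothetical stand-ins, and the model looks bindings up by raw field type, whereas Guice uses a Key (type plus an optional annotation such as @Self).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Toy model of Guice's Injector.injectMembers(); NOT Guice itself.
class MemberInjectionSketch {

  // Hypothetical marker playing the role of @Inject on fields.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface FakeInject {}

  // Stand-in for the startup injector: at most one instance per type.
  static final class StartupInjector {
    private final Map<Class<?>, Object> bindings = new HashMap<>();

    <T> void bind(Class<T> type, T instance) {
      bindings.put(type, instance);
    }

    // Fills every @FakeInject field of an already-constructed object.
    // A missing binding is an error, as it would be in Guice.
    void injectMembers(Object target) {
      for (Field field : target.getClass().getDeclaredFields()) {
        if (!field.isAnnotationPresent(FakeInject.class)) {
          continue;
        }
        Object value = bindings.get(field.getType());
        if (value == null) {
          throw new IllegalStateException("No binding for " + field.getType().getName());
        }
        try {
          field.setAccessible(true);
          field.set(target, value);
        } catch (IllegalAccessException e) {
          throw new IllegalStateException(e);
        }
      }
    }
  }

  // Created by the "CLI" with a plain constructor, like cli.parse(args).
  static final class Command implements Runnable {
    @FakeInject private String serviceName; // hypothetical dependency
    @FakeInject private Integer port;       // hypothetical dependency

    @Override
    public void run() {
      System.out.println("running " + describe());
    }

    String describe() {
      return serviceName + ":" + port;
    }
  }

  public static void main(String[] args) {
    StartupInjector injector = new StartupInjector();
    injector.bind(String.class, "druid/historical");
    injector.bind(Integer.class, 8083);

    Command command = new Command(); // constructed without any injection
    injector.injectMembers(command); // dependencies filled in afterwards
    command.run();                   // prints "running druid/historical:8083"
  }
}
```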

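The list of entries in the startup injector is short. (DruidNode, DruidNodeAnnouncer, ServiceAnnouncer and Lifecycle do not appear in it, so the fields above are presumably filled in later by the per-service injector rather than at this point.)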

Key[type=com.google.inject.Stage, annotation=[none]]: InstanceBindingImpl
Key[type=com.google.inject.Injector, annotation=[none]]: ProviderInstanceBindingImpl
Key[type=java.util.logging.Logger, annotation=[none]]: ProviderInstanceBindingImpl
Key[type=com.fasterxml.jackson.databind.ObjectMapper, annotation=[none]]: LinkedBindingImpl
Key[type=com.fasterxml.jackson.databind.ObjectMapper, annotation=@org.apache.druid.guice.annotations.Json]: ProviderMethodProviderInstanceBindingImpl
Key[type=com.fasterxml.jackson.databind.ObjectMapper, annotation=@org.apache.druid.guice.annotations.JsonNonNull]: ProviderMethodProviderInstanceBindingImpl
Key[type=com.fasterxml.jackson.databind.ObjectMapper, annotation=@org.apache.druid.guice.annotations.Smile]: ProviderMethodProviderInstanceBindingImpl
Key[type=java.util.Properties, annotation=[none]]: InstanceBindingImpl
Key[type=javax.validation.Validator, annotation=[none]]: InstanceBindingImpl
Key[type=org.skife.config.ConfigurationObjectFactory, annotation=[none]]: ProviderMethodProviderInstanceBindingImpl
Key[type=com.google.common.base.Supplier<org.apache.druid.common.config.NullValueHandlingConfig>, annotation=[none]]: ProviderInstanceBindingImpl
Key[type=org.apache.druid.common.config.NullValueHandlingConfig, annotation=[none]]: ProviderInstanceBindingImpl
Key[type=com.google.common.base.Supplier<org.apache.druid.math.expr.ExpressionProcessingConfig>, annotation=[none]]: ProviderInstanceBindingImpl
Key[type=org.apache.druid.math.expr.ExpressionProcessingConfig, annotation=[none]]: ProviderInstanceBindingImpl
Key[type=com.google.common.base.Supplier<org.apache.druid.guice.ExtensionsConfig>, annotation=[none]]: ProviderInstanceBindingImpl
Key[type=org.apache.druid.guice.ExtensionsConfig, annotation=[none]]: ProviderInstanceBindingImpl
Key[type=com.google.common.base.Supplier<org.apache.druid.guice.ModulesConfig>, annotation=[none]]: ProviderInstanceBindingImpl
Key[type=org.apache.druid.guice.ModulesConfig, annotation=[none]]: ProviderInstanceBindingImpl
Key[type=org.apache.druid.guice.JsonConfigurator, annotation=[none]]: ConstructorBindingImpl
Key[type=org.apache.druid.guice.DruidSecondaryModule, annotation=[none]]: ConstructorBindingImpl

Note the Properties key. See Properties for more information.

Jersey Configuration

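Jerseys appears to be a global collection of JSR 311 (JAX-RS) resource classes, presumably the classes that define the REST API. Each service adds its resources with Jerseys.addResource(), as in the Historical's getModules() shown later.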
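
Assuming Jerseys.addResource() behaves like a Guice multibinding (an assumption, not verified here), the mechanism amounts to many modules adding classes to one shared set that the web layer reads later. The plain-Java sketch below models only that collection step; Binder and the resource classes are hypothetical stand-ins, not Druid's or Guice's types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Toy model of a multibinding: several modules add to one shared set.
class ResourceSetSketch {

  // Stand-in for a Guice Binder that only knows how to collect resources.
  static final class Binder {
    private final Set<Class<?>> resources = new LinkedHashSet<>();

    void addResource(Class<?> resourceClass) {
      resources.add(resourceClass);
    }
  }

  // Hypothetical stand-ins for QueryResource, SegmentListerResource, etc.
  static final class QueryResource {}
  static final class SegmentListerResource {}
  static final class HistoricalResource {}

  // Runs each module against the same binder and returns the merged set,
  // in insertion order.
  static Set<Class<?>> collectResources(List<Consumer<Binder>> modules) {
    Binder binder = new Binder();
    for (Consumer<Binder> module : modules) {
      module.accept(binder);
    }
    return Collections.unmodifiableSet(binder.resources);
  }

  public static void main(String[] args) {
    List<Consumer<Binder>> modules = new ArrayList<>();
    modules.add(b -> b.addResource(QueryResource.class)); // e.g. a shared module
    modules.add(b -> {                                     // e.g. the Historical module
      b.addResource(SegmentListerResource.class);
      b.addResource(HistoricalResource.class);
    });
    for (Class<?> resource : collectResources(modules)) {
      System.out.println(resource.getSimpleName());
    }
  }
}
```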

Service Command Execution

The following steps occur to start a service:

The CLI creates a command (runnable) for that service, but without Guice configuration.
Main injects dependencies into that command and runs the command.
The command creates a new per-service injector with two sets of resources:
- A set common to all services (but not used by other Druid CLI commands)
- A set unique to each service
The above process registers (what?) with the Lifecycle class.
A miracle occurs, and the service starts running. (Details obviously needed.)

The GuiceRunnable class (https://github.com/apache/druid/blob/master/services/src/main/java/org/apache/druid/cli/GuiceRunnable.java) is the base class for all service commands. It receives the startup injector during command injection:

  @Inject
  public void configure(Injector injector)
  {
    this.baseInjector = injector;
  }

Note that we inject the startup injector twice: once here and once as a member variable. Not sure if this is a bug or a feature.

Service Configuration

When the command is run, we do the detailed work of building the per-service injector, populating the lifecycle, and starting the service. This all happens in ServerRunnable.run():

  public void run()
  {
    final Injector injector = makeInjector();
    final Lifecycle lifecycle = initLifecycle(injector);
    lifecycle.join();
  }

The above basically boils down to three steps:

Service configuration: makeInjector()
Start the service: initLifecycle(injector)
Wait for the service to complete: lifecycle.join()

Of these, the last is a bit tricky: each service manages its own set of threads, so with multiple services we must launch and wait on all of them, rather than on the single service we wait on today.
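
Launching and waiting on several services might look roughly like the sketch below, which uses only java.util.concurrent. FakeLifecycle is a hypothetical stand-in for Druid's Lifecycle, and the sketch assumes each service's join() can be waited on from its own thread; whether the real services tolerate that is an open question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Sketch: start several service lifecycles and wait for all of them.
class MultiServiceJoinSketch {

  // Hypothetical stand-in for Druid's Lifecycle (start, then block in join).
  static final class FakeLifecycle {
    private final String name;
    private final CountDownLatch stopped = new CountDownLatch(1);
    private volatile boolean started;

    FakeLifecycle(String name) {
      this.name = name;
    }

    void start() {
      started = true;
      System.out.println(name + " started");
    }

    boolean isStarted() {
      return started;
    }

    void stop() {
      stopped.countDown();
    }

    // Blocks until stop() is called, much as lifecycle.join() blocks today.
    void join() throws InterruptedException {
      stopped.await();
    }
  }

  // Starts every lifecycle, then blocks until all have been stopped.
  static void runAll(List<FakeLifecycle> lifecycles) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, lifecycles.size()));
    try {
      List<Future<?>> waits = new ArrayList<>();
      for (FakeLifecycle lifecycle : lifecycles) {
        lifecycle.start();
        waits.add(pool.submit(() -> {
          lifecycle.join();
          return null;
        }));
      }
      for (Future<?> wait : waits) {
        wait.get(); // surfaces a failure from any service's join()
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for services", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("A service failed while running", e.getCause());
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    FakeLifecycle broker = new FakeLifecycle("broker");
    FakeLifecycle historical = new FakeLifecycle("historical");

    // Simulate a shutdown request arriving later on another thread.
    new Thread(() -> {
      try {
        TimeUnit.MILLISECONDS.sleep(100);
      } catch (InterruptedException ignored) {
        Thread.currentThread().interrupt();
      }
      broker.stop();
      historical.stop();
    }).start();

    runAll(Arrays.asList(broker, historical));
    System.out.println("all services stopped");
  }
}
```

If any service's join() throws, runAll() stops waiting and shutdownNow() interrupts the remaining waiters; a real design would need an explicit policy for that case.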

Server Module Configuration

In GuiceRunnable.makeInjector():

  public Injector makeInjector()
  {
    return Initialization.makeInjectorWithModules(baseInjector, getModules());
  }

The baseInjector is the startup injector we saw injected above. The called method is:

  public static Injector makeInjectorWithModules(final Injector baseInjector, Iterable<? extends Module> modules)
  {
    final ModuleList defaultModules = new ModuleList(baseInjector);
    ...
    return Guice.createInjector(Modules.override(intermediateModules).with(extensionModules.getModules()));
  }

The second parameter is a list of modules specific to the service we want to run. The bulk of the method:

Defines a list of modules common to each service.
Overrides above with the per-service modules.
Overrides the above list with extension modules.
Excludes any modules marked for exclusion in the properties file.
Creates a per-service injector.
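
The layering steps above can be modelled with plain maps. In this sketch a "module" is just a name plus key-to-implementation bindings, and each later layer replaces earlier bindings for the same key (the effect of Modules.override). Exclusion is applied as each module is gathered, which is an assumption about ordering. All module and class names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the layering in makeInjectorWithModules(). Not Guice code.
class ModuleLayeringSketch {

  // A "module" here is a name plus the key -> implementation bindings it makes.
  static final class FakeModule {
    final String name;
    final Map<String, String> bindings;

    FakeModule(String name, Map<String, String> bindings) {
      this.name = name;
      this.bindings = bindings;
    }
  }

  // Merges one layer. Excluded modules are skipped; two modules in the
  // same layer binding one key differently is an error, as in Guice.
  static Map<String, String> flatten(List<FakeModule> layer, Set<String> excluded) {
    Map<String, String> result = new LinkedHashMap<>();
    for (FakeModule module : layer) {
      if (excluded.contains(module.name)) {
        continue;
      }
      for (Map.Entry<String, String> b : module.bindings.entrySet()) {
        String previous = result.putIfAbsent(b.getKey(), b.getValue());
        if (previous != null && !previous.equals(b.getValue())) {
          throw new IllegalStateException("Duplicate binding for " + b.getKey());
        }
      }
    }
    return result;
  }

  // Defaults, overridden by per-service modules, overridden by extensions:
  // a later layer replaces an earlier layer's binding for the same key.
  static Map<String, String> buildBindings(
      List<FakeModule> defaults,
      List<FakeModule> perService,
      List<FakeModule> extensions,
      Set<String> excluded) {
    Map<String, String> bindings = flatten(defaults, excluded);
    bindings.putAll(flatten(perService, excluded));
    bindings.putAll(flatten(extensions, excluded));
    return bindings;
  }

  static Map<String, String> mapOf(String... keysAndValues) {
    Map<String, String> m = new LinkedHashMap<>();
    for (int i = 0; i + 1 < keysAndValues.length; i += 2) {
      m.put(keysAndValues[i], keysAndValues[i + 1]);
    }
    return m;
  }

  public static void main(String[] args) {
    List<FakeModule> defaults = Arrays.asList(
        new FakeModule("StorageModule", mapOf("SegmentPusher", "LocalPusher")),
        new FakeModule("LookupModule", mapOf("LookupSerde", "DefaultLookupSerde")));
    List<FakeModule> historical = Collections.singletonList(
        new FakeModule("HistoricalModule", mapOf("QuerySegmentWalker", "ServerManager")));
    List<FakeModule> extensions = Collections.singletonList(
        new FakeModule("S3Extension", mapOf("SegmentPusher", "S3Pusher")));
    Set<String> excluded = new HashSet<>(Collections.singletonList("LookupModule"));

    System.out.println(buildBindings(defaults, historical, extensions, excluded));
    // {SegmentPusher=S3Pusher, QuerySegmentWalker=ServerManager}
  }
}
```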

The above process creates a new injector separate from the startup injector. Most, but not all, resources in the startup injector end up in the per-service injector.

Although Guice offers injector inheritance, the above code does not use it. (It is not clear if this is due to the Guice feature being added after the above code was written, or if Druid needs behavior different from that offered by Guice.)

The last line creates a new injector which essentially copies all modules from the startup injector, and adds a large set shared by all services. The good news is that each service gets its own injector, and so its own set of mappings. The bad news is that, if we allow multiple services to run, the common services will be duplicated across services, possibly resulting in multiple copies of objects which should be singletons.
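
Why the per-service copies duplicate singletons is easy to see in a toy model: the singleton cache belongs to the injector, so two injectors built from the same module definitions each create their own instance. FakeInjector and StoragePool are hypothetical; real Guice scoping is more involved but is likewise per injector.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Each injector has its own singleton cache, so "singletons" are per injector.
class PerInjectorSingletonSketch {

  static final class FakeInjector {
    private final Map<Class<?>, Supplier<?>> providers;
    private final Map<Class<?>, Object> singletons = new HashMap<>();

    FakeInjector(Map<Class<?>, Supplier<?>> providers) {
      this.providers = new HashMap<>(providers); // a copy, as each service makes
    }

    // Singleton scope: create once per *injector*, then reuse.
    <T> T getInstance(Class<T> type) {
      Object instance = singletons.computeIfAbsent(type, t -> providers.get(t).get());
      return type.cast(instance);
    }
  }

  // Hypothetical resource that should really be shared, e.g. a connection pool.
  static final class StoragePool {}

  public static void main(String[] args) {
    Map<Class<?>, Supplier<?>> commonModules = new HashMap<>();
    commonModules.put(StoragePool.class, StoragePool::new);

    FakeInjector broker = new FakeInjector(commonModules);
    FakeInjector historical = new FakeInjector(commonModules);

    // Same injector: one instance. Different injectors: two instances.
    System.out.println(broker.getInstance(StoragePool.class) == broker.getInstance(StoragePool.class));     // true
    System.out.println(broker.getInstance(StoragePool.class) == historical.getInstance(StoragePool.class)); // false
  }
}
```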

The resulting injector contains many hundreds of keys, far too many to list here. It is easier to list those startup keys which do not appear in the per-service injector:

com.google.common.base.Supplier<org.apache.druid.common.config.NullValueHandlingConfig>
com.google.common.base.Supplier<org.apache.druid.guice.ExtensionsConfig>
com.google.common.base.Supplier<org.apache.druid.guice.ModulesConfig>
com.google.common.base.Supplier<org.apache.druid.math.expr.ExpressionProcessingConfig>
org.apache.druid.common.config.NullValueHandlingConfig
org.apache.druid.guice.DruidSecondaryModule
org.apache.druid.guice.ExtensionsConfig
org.apache.druid.guice.ModulesConfig
org.apache.druid.math.expr.ExpressionProcessingConfig

Note also that two of the resources above seem out of place, ExpressionProcessingConfig and NullValueHandlingConfig: these seem more like runtime things than startup things. Another mystery.

`DruidSecondaryModule`

Note that the Properties class does not appear in the list of differences above, which means it was somehow added to the per-service injector. How? The answer is the DruidSecondaryModule class:

public class DruidSecondaryModule implements Module
{
  ...

  @Inject
  public DruidSecondaryModule(
      Properties properties,
      ConfigurationObjectFactory factory,
      @Json ObjectMapper jsonMapper,
      @JsonNonNull ObjectMapper jsonMapperOnlyNonNullValueSerialization,
      @Smile ObjectMapper smileMapper,
      Validator validator
  )
  ...

  @Override
  public void configure(Binder binder)
  {
    binder.install(new DruidGuiceExtensions());
    binder.bind(Properties.class).toInstance(properties);
    binder.bind(ConfigurationObjectFactory.class).toInstance(factory);
    binder.bind(ObjectMapper.class).to(Key.get(ObjectMapper.class, Json.class));
    binder.bind(Validator.class).toInstance(validator);
    binder.bind(JsonConfigurator.class);
  }

Essentially, this module uses constructor injection to capture those resources from the startup injector which should appear in the "secondary" per-service injector. This results in an asymmetry: items that are provided as modules in startup are bound as specific instances here. Nevertheless, we see how the Properties object makes its way from the startup injector to the secondary injector.
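
The hand-off can be modelled in a few lines of plain Java. This is a sketch, not the real class: FakeInjector, SecondaryModule and ModulesConfigStandIn are hypothetical, and only Properties is carried across, whereas the real module also re-binds the ObjectMapper, ConfigurationObjectFactory and Validator. What it shows is that binding a captured instance shares the very same object, while anything not captured is left behind.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Toy model of how DruidSecondaryModule carries selected startup objects
// into the per-service injector. Not Guice code.
class SecondaryModuleSketch {

  // Minimal type-keyed "injector".
  static final class FakeInjector {
    private final Map<Class<?>, Object> bindings = new HashMap<>();

    <T> void bindInstance(Class<T> type, T instance) {
      bindings.put(type, instance);
    }

    <T> T getInstance(Class<T> type) {
      Object value = bindings.get(type);
      if (value == null) {
        throw new IllegalStateException("No binding for " + type.getName());
      }
      return type.cast(value);
    }

    boolean hasBinding(Class<?> type) {
      return bindings.containsKey(type);
    }
  }

  // Hypothetical startup-only object that the secondary module does not copy.
  static final class ModulesConfigStandIn {}

  // Receives startup objects through its constructor, then binds exactly
  // those instances in the new injector (like binder.bind(...).toInstance(...)).
  static final class SecondaryModule {
    private final Properties properties;

    SecondaryModule(Properties properties) {
      this.properties = properties;
    }

    void configure(FakeInjector serviceInjector) {
      serviceInjector.bindInstance(Properties.class, properties);
    }
  }

  public static void main(String[] args) {
    FakeInjector startup = new FakeInjector();
    startup.bindInstance(Properties.class, new Properties());
    startup.bindInstance(ModulesConfigStandIn.class, new ModulesConfigStandIn());

    // "Inject" the secondary module from the startup injector...
    SecondaryModule secondary = new SecondaryModule(startup.getInstance(Properties.class));
    // ...and let it configure the per-service injector.
    FakeInjector service = new FakeInjector();
    secondary.configure(service);

    System.out.println(service.getInstance(Properties.class) == startup.getInstance(Properties.class)); // true
    System.out.println(service.hasBinding(ModulesConfigStandIn.class));                                 // false
  }
}
```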

`ModuleList`

The ModuleList class is a bit more than a list:

Maintain a list of modules
Exclude modules marked for exclusion in the properties file.
Inject dependencies into each module (using the startup injector)
For Druid modules, gather Jackson modules.

ModuleList grabs a few objects from the startup injector for its own use. However, it does not seem to copy startup injector modules to the new module list.

    public ModuleList(Injector baseInjector)
    {
      this.baseInjector = baseInjector;
      this.modulesConfig = baseInjector.getInstance(ModulesConfig.class);
      this.jsonMapper = baseInjector.getInstance(Key.get(ObjectMapper.class, Json.class));
      this.smileMapper = baseInjector.getInstance(Key.get(ObjectMapper.class, Smile.class));
      this.modules = new ArrayList<>();
    }
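
ModuleList's duties can be sketched as below. Every type here is hypothetical: FakeModule.injectFrom() stands in for Guice member injection from the startup injector, the base injector is a plain map, and exclusion matches fully qualified class names, which is assumed rather than taken from the Druid source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of ModuleList's duties. All types here are hypothetical.
class ModuleListSketch {

  // Stand-in for a Guice Module that can receive startup dependencies.
  interface FakeModule {
    default void injectFrom(Map<String, Object> baseInjector) {
      // most modules need nothing from the startup injector
    }
  }

  // Stand-in for DruidModule, which also supplies Jackson modules.
  interface FakeDruidModule extends FakeModule {
    List<String> getJacksonModules();
  }

  static final class ModuleList {
    private final Map<String, Object> baseInjector;
    private final Set<String> excludedClassNames;
    private final List<FakeModule> modules = new ArrayList<>();
    private final List<String> jacksonModules = new ArrayList<>();

    ModuleList(Map<String, Object> baseInjector, Set<String> excludedClassNames) {
      this.baseInjector = baseInjector;
      this.excludedClassNames = excludedClassNames;
    }

    void addModule(FakeModule module) {
      // 1. Skip modules excluded by configuration.
      if (excludedClassNames.contains(module.getClass().getName())) {
        return;
      }
      // 2. Inject the module from the startup (base) injector.
      module.injectFrom(baseInjector);
      // 3. Gather Jackson modules from Druid modules.
      if (module instanceof FakeDruidModule) {
        jacksonModules.addAll(((FakeDruidModule) module).getJacksonModules());
      }
      // 4. Keep the module for the service injector.
      modules.add(module);
    }

    List<FakeModule> getModules() {
      return Collections.unmodifiableList(modules);
    }

    List<String> getJacksonModules() {
      return Collections.unmodifiableList(jacksonModules);
    }
  }

  static final class StorageModule implements FakeModule {
    Object jsonMapper;

    @Override
    public void injectFrom(Map<String, Object> baseInjector) {
      jsonMapper = baseInjector.get("jsonMapper");
    }
  }

  static final class LookupModule implements FakeDruidModule {
    @Override
    public List<String> getJacksonModules() {
      return Collections.singletonList("LookupSerdeJackson");
    }
  }

  public static void main(String[] args) {
    Map<String, Object> startup = new HashMap<>();
    startup.put("jsonMapper", new Object()); // stands in for the @Json ObjectMapper

    Set<String> excluded = new HashSet<>();
    excluded.add(LookupModule.class.getName()); // as if listed for exclusion

    ModuleList list = new ModuleList(startup, excluded);
    list.addModule(new StorageModule());
    list.addModule(new LookupModule());
    System.out.println(list.getModules().size()); // 1
    System.out.println(list.getJacksonModules()); // []
  }
}
```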

Module Overrides

Druid makes heavy use of module "overrides": allowing one bit of code to replace resources defined by another bit. In particular:

The startup injector defines Jackson configuration which DruidSecondaryModule overrides ad hoc by re-binding revised versions of resources available in the startup injector.
The common.runtime.properties file defines properties which can be overridden by the service-specific runtime.properties file.
The Initialization class defines a set of "default" modules which can be overridden by the service-specific getModules() list.
Extensions can override any of the default or per-service modules.
Configuration properties can remove any module defined in the default, per-service, and extension modules.

In an ideal world, overrides would be dynamic: each layer defines its resources, and the lower level overrides some of them while adding new ones. Fortunately, the Properties system works that way. Unfortunately, Guice doesn't (probably for good reasons having to do with dependencies).
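
The dynamic style can be seen with the JDK's own java.util.Properties, whose constructor accepts a defaults table that getProperty() falls back to. The sketch assumes nothing about how Druid actually loads its files; it only illustrates the "each layer overrides some keys and adds others" behavior that Guice lacks. The property values are made up.

```java
import java.util.Properties;

// Shows the "dynamic" override style that java.util.Properties supports:
// a service layer sits on top of a common layer and falls back to it.
class LayeredPropertiesSketch {

  // Returns a view in which service entries win and anything missing
  // falls back to the common layer. The common layer is not modified.
  static Properties layer(Properties common, Properties service) {
    Properties merged = new Properties(common); // common becomes the defaults
    for (String name : service.stringPropertyNames()) {
      merged.setProperty(name, service.getProperty(name));
    }
    return merged;
  }

  public static void main(String[] args) {
    Properties common = new Properties(); // e.g. common.runtime.properties
    common.setProperty("druid.zk.service.host", "localhost");
    common.setProperty("druid.processing.numThreads", "2");

    Properties historical = new Properties(); // e.g. the Historical's runtime.properties
    historical.setProperty("druid.service", "druid/historical");
    historical.setProperty("druid.processing.numThreads", "4");

    Properties merged = layer(common, historical);
    System.out.println(merged.getProperty("druid.service"));               // druid/historical
    System.out.println(merged.getProperty("druid.processing.numThreads")); // 4
    System.out.println(merged.getProperty("druid.zk.service.host"));       // localhost
  }
}
```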

DruidSecondaryModule is a special case: it seems to reconfigure (?) Jackson based on resources that have become available in the per-service configuration. This presents a challenge similar to the one described next.

Guice provides injector inheritance, but "No key may be bound by both an injector and one of its ancestors." Instead, Guice provides Modules.override which "creates a module that overlays override modules over the given modules." Thus, overrides are done "statically" via the list of modules, not "dynamically" in the injector.
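
The difference between the two layering styles can be shown with a toy injector that enforces the quoted rule. FakeInjector and override() are hypothetical models, not Guice's Injector or Modules.override, and the binding names are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Contrasts two ways of layering bindings. Toy model, not Guice.
class InheritanceVsOverrideSketch {

  // Injector with an optional parent, enforcing the rule that no key may be
  // bound by both an injector and one of its ancestors.
  static final class FakeInjector {
    private final FakeInjector parent;
    private final Map<String, String> bindings;

    FakeInjector(FakeInjector parent, Map<String, String> bindings) {
      for (FakeInjector ancestor = parent; ancestor != null; ancestor = ancestor.parent) {
        for (String key : bindings.keySet()) {
          if (ancestor.bindings.containsKey(key)) {
            throw new IllegalStateException("Key already bound in an ancestor: " + key);
          }
        }
      }
      this.parent = parent;
      this.bindings = new HashMap<>(bindings);
    }

    // Looks the key up locally, then in each ancestor in turn.
    String get(String key) {
      for (FakeInjector injector = this; injector != null; injector = injector.parent) {
        String value = injector.bindings.get(key);
        if (value != null) {
          return value;
        }
      }
      throw new IllegalStateException("No binding for " + key);
    }
  }

  // Static override, in the spirit of Modules.override(base).with(overrides):
  // the bindings are combined *before* any injector is created.
  static Map<String, String> override(Map<String, String> base, Map<String, String> overrides) {
    Map<String, String> combined = new HashMap<>(base);
    combined.putAll(overrides);
    return combined;
  }

  public static void main(String[] args) {
    Map<String, String> defaults = new HashMap<>();
    defaults.put("Authenticator", "AllowAllAuthenticator");     // hypothetical default
    Map<String, String> brokerOverride = new HashMap<>();
    brokerOverride.put("Authenticator", "BrokerAuthenticator"); // hypothetical override

    FakeInjector shared = new FakeInjector(null, defaults);
    try {
      new FakeInjector(shared, brokerOverride); // "dynamic" override: rejected
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }

    FakeInjector broker = new FakeInjector(null, override(defaults, brokerOverride));
    System.out.println(broker.get("Authenticator")); // BrokerAuthenticator
  }
}
```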

The default modules appear to include many that should be created once: security, storage, etc. These would seem to be candidates for the common, shared injector. However, any module can be overridden by a service or an extension. This results in a (perhaps hypothetical) case in which the default resources define a singleton which is defined differently for, say, the broker vs. the historical. The module is shared, but the resource is specific. Since the broker (say) depends on the default behavior, and the historical (say) depends on its own override, the default behavior cannot appear in a common parent injector: we'd end up with a key conflict. Instead, the common resource has to be defined in the broker-specific injector.

However, in the normal case (security and storage again), we probably do want only one shared instance. The question is: how can we know the difference between these two cases? We'd want to create the common modules before we create the per-service modules. But we need to know the set of per-service modules to know which common modules to push into the per-service list.

This is the truly hard problem for this project!

Service-Specific Modules

The getModules() method defines the modules unique to each service. For example, for the Historical:

  protected List<? extends Module> getModules()
  {
    return ImmutableList.of(
        new DruidProcessingModule(),
        new QueryableModule(),
        new QueryRunnerFactoryModule(),
        new JoinableFactoryModule(),
        binder -> {
          binder.bindConstant().annotatedWith(Names.named("serviceName")).to("druid/historical");
          binder.bindConstant().annotatedWith(Names.named("servicePort")).to(8083);
          binder.bindConstant().annotatedWith(Names.named("tlsServicePort")).to(8283);
          ...
          binder.bind(ServerManager.class).in(LazySingleton.class);
          binder.bind(SegmentManager.class).in(LazySingleton.class);
          binder.bind(ZkCoordinator.class).in(ManageLifecycle.class);
          ...
          Jerseys.addResource(binder, QueryResource.class);
          Jerseys.addResource(binder, SegmentListerResource.class);
          Jerseys.addResource(binder, HistoricalResource.class);
          LifecycleModule.register(binder, QueryResource.class);
          ...
        },
        ...
    );
  }

There is quite a bit going on; only a few samples are shown, chosen to illustrate the issues we must consider.

Bindings

First, we are advised to "think of Guice as being a map". We see some examples of this basic concept where we bind constants to keys:

          binder.bindConstant().annotatedWith(Names.named("serviceName")).to("druid/historical");

If we run multiple services, then clearly we need a distinct map for each service to manage such bindings.

In other cases, we rely on some indirection. For example, for the segment walker:

          binder.bind(ServerManager.class).in(LazySingleton.class);

ServerManager is then the object supplied wherever the QuerySegmentWalker interface is injected, for example in:

public class QueryLifecycleFactory
{
  ...

  @Inject
  public QueryLifecycleFactory(
      final QueryToolChestWarehouse warehouse,
      final QuerySegmentWalker texasRanger,

As a result, in the Historical, we bind QuerySegmentWalker to ServerManager. In the Broker, by contrast:

          binder.bind(CachingClusteredClient.class).in(LazySingleton.class);

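Note that the code shown contains no explicit binding to the interface. Guice does not infer an implementation for an interface on its own (short of annotations such as @ImplementedBy), so the link is presumably a Guice linked binding, bind(QuerySegmentWalker.class).to(...), in the elided part of each module. Again, this is not an issue if each service has its own injector (Guice map).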
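
A linked binding maps an interface to the implementation each service chooses. The plain-Java sketch below assumes each service has its own injector; SegmentWalker, ServerManagerStandIn and ClusteredClientStandIn are hypothetical stand-ins for QuerySegmentWalker and its implementations, and FakeInjector.bind() only imitates the shape of Guice's bind(...).to(...).

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of linked bindings: interface -> implementation, per injector.
class LinkedBindingSketch {

  // Hypothetical stand-ins; the real Druid classes have many dependencies.
  interface SegmentWalker {}
  static final class ServerManagerStandIn implements SegmentWalker {}
  static final class ClusteredClientStandIn implements SegmentWalker {}

  static final class FakeInjector {
    private final Map<Class<?>, Class<?>> links = new HashMap<>();

    // Mimics the shape of bind(Interface.class).to(Impl.class).
    <T> void bind(Class<T> iface, Class<? extends T> impl) {
      links.put(iface, impl);
    }

    // Follows the link if there is one; otherwise tries the type itself,
    // which fails for an unlinked interface (no constructor to call).
    <T> T getInstance(Class<T> type) {
      Class<?> impl = links.getOrDefault(type, type);
      try {
        return type.cast(impl.getDeclaredConstructor().newInstance());
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Cannot create " + impl.getName(), e);
      }
    }
  }

  public static void main(String[] args) {
    FakeInjector historical = new FakeInjector();
    historical.bind(SegmentWalker.class, ServerManagerStandIn.class);
    FakeInjector broker = new FakeInjector();
    broker.bind(SegmentWalker.class, ClusteredClientStandIn.class);

    System.out.println(historical.getInstance(SegmentWalker.class).getClass().getSimpleName()); // ServerManagerStandIn
    System.out.println(broker.getInstance(SegmentWalker.class).getClass().getSimpleName());     // ClusteredClientStandIn
  }
}
```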

List of Modules

The returned list holds the modules for the service. One of those modules is for the service itself and is defined inline via a lambda:

        binder -> {
          binder.bindConstant().annotatedWith(Names.named("serviceName")).to("druid/historical");
          ...

Druid Guice Extensions

Druid adds several extensions to the basic Guice story. Most of these live in the ModuleList class. ModuleList works with two injectors: the explicit baseInjector and the to-be-created service injector. Essentially, each module added to ModuleList is injected with dependencies defined in the baseInjector, giving a module four stages of configuration:

Empty constructor (no configuration done here)
Member dependency injection from the base (startup) injector.
Guice-called configure() method in which the module adds its configuration to the target service injector.
Guice injects dependencies into other objects using the bindings defined in the module.

In addition, the module list allows modules to be excluded via a configuration setting in ModulesConfig.

The "default" (cross-service) modules are added to a ModuleList directly. The per-service modules are added first to a List, then copied into a ModuleList, ensuring that all modules go through the above process.

Coordinator-as-Overlord Option

Druid allows the coordinator and overlord to run within a single process by setting the druid.coordinator.asOverlord.enabled property. This is done by launching Druid with the server coordinator command and setting that property to also request running as the Overlord. Module configuration within the coordinator works as follows:

Create a "wrapper" module with all the coordinator modules.
If running as Overlord, create another wrapper module with all the Overlord modules.
Exclude the common LookupSerdeModule module.
Overlord properties are copied into the coordinator's runtime.properties file, and the config directory is called coordinator-overlord.
Both the Overlord and Coordinator endpoints are bound to the Coordinator's port (there is no separate port for the Overlord.)

This approach raises questions:

Why create the "wrapper" modules rather than just adding the service modules to the single big list?
Why remove the LookupSerdeModule rather than relying on the override logic to do this for us?
How are the service name and port bound if we can't use the global servicePort and tlsServicePort bindings?

Single-Injector Solution

The multi-injector approach (see below) turns out to be too invasive. An alternative is a single-injector solution: all services are configured via a single "uber" injector. Some challenges to resolve for this approach are:

The same key is used to bind different resources for different services. The current design relies on different configs in different processes to resolve conflicts in these "global" bindings.
The same "properties" key is used to bind different configuration for different services. Each service reuses the same property names (such as druid.service), and again relies on different processes to disambiguate the otherwise-duplicated names. (See Single Process Property Files for a possible solution.)

Evolution

Steps to move in this direction:

Revert ill-fated split injector approach.
Instrument the Guice override mechanism to capture conflicts.
Apply the above to the combined coordinator/overlord config. (Which, fortunately, is the default for micro-quickstart.)
Refactor coordinator and overlord to build a single module list.
Let override remove the duplicate module.
Modify to run the two services on distinct ports. (Needs Jetty, announcer support first.)
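
One way to instrument the override step is to apply the layers in code that records each replaced key. The sketch below does this over plain maps rather than Guice modules; with real modules one would presumably read the bindings through Guice's SPI (for example Elements.getElements()), which is an assumption about the eventual approach. The layer contents are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of an instrumented override chain: apply layers in order and
// record every key whose binding a later layer replaces. Not Guice code.
class OverrideConflictLogSketch {

  static final class Conflict {
    final String layer;
    final String key;
    final String oldValue;
    final String newValue;

    Conflict(String layer, String key, String oldValue, String newValue) {
      this.layer = layer;
      this.key = key;
      this.oldValue = oldValue;
      this.newValue = newValue;
    }

    @Override
    public String toString() {
      return layer + " replaces " + key + ": " + oldValue + " -> " + newValue;
    }
  }

  // Layers are applied in iteration order; pass a LinkedHashMap to keep it.
  static Map<String, String> overlay(Map<String, Map<String, String>> layers, List<Conflict> conflicts) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, Map<String, String>> layer : layers.entrySet()) {
      for (Map.Entry<String, String> binding : layer.getValue().entrySet()) {
        String previous = result.put(binding.getKey(), binding.getValue());
        if (previous != null && !previous.equals(binding.getValue())) {
          conflicts.add(new Conflict(layer.getKey(), binding.getKey(), previous, binding.getValue()));
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> coordinator = new LinkedHashMap<>();
    coordinator.put("serviceName", "druid/coordinator");
    coordinator.put("LookupSerde", "LookupSerdeA");       // hypothetical
    Map<String, String> overlord = new LinkedHashMap<>();
    overlord.put("serviceName", "druid/overlord");
    overlord.put("TaskStorage", "MetadataTaskStorage");   // hypothetical

    Map<String, Map<String, String>> layers = new LinkedHashMap<>();
    layers.put("coordinator", coordinator);
    layers.put("overlord", overlord);

    List<Conflict> conflicts = new ArrayList<>();
    overlay(layers, conflicts);
    conflicts.forEach(System.out::println);
    // overlord replaces serviceName: druid/coordinator -> druid/overlord
  }
}
```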

Alternative - Per-Server and Per-Service Configuration

In a nutshell, the Initialization.makeInjectorWithModules() method must be split. The modules listed in that method are common to all services, while the modules defined in the modules parameter (created by GuiceRunnable.getModules()) must be defined per service.

Fortunately, Guice 2.0 and later support child injectors (Injector.createChildInjector()). The rough design is then to use three injectors:

           initialization injector
                     ^
                     |
               server injector 
               ^            ^
               |            |
  broker injector          historical injector

In this way, objects which differ per service go into the service injector, while those which belong to the server (and are shared across multiple services) go into the server injector. The initialization (startup) injector already exists. The server injector would be created as its child to hold the modules from makeInjectorWithModules(), while each service injector would be a child of the server injector holding the modules created in getModules(). (The server injector is not strictly necessary; presenting it just simplifies the discussion.)
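
The proposed tree can be modelled with a toy injector that delegates to its parent and caches each singleton in the injector that owns its binding. FakeInjector and the binding keys are hypothetical; createChild() only mirrors the idea of Guice's Injector.createChildInjector().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy model of the proposed initialization -> server -> service injector tree.
class InjectorTreeSketch {

  static final class FakeInjector {
    private final FakeInjector parent;
    private final Map<String, Supplier<Object>> providers = new HashMap<>();
    private final Map<String, Object> singletons = new HashMap<>();

    private FakeInjector(FakeInjector parent) {
      this.parent = parent;
    }

    static FakeInjector root() {
      return new FakeInjector(null);
    }

    // Analogous in spirit to Injector.createChildInjector().
    FakeInjector createChild() {
      return new FakeInjector(this);
    }

    // Binds a singleton; rejected if any ancestor already binds the key.
    FakeInjector bind(String key, Supplier<Object> provider) {
      for (FakeInjector a = parent; a != null; a = a.parent) {
        if (a.providers.containsKey(key)) {
          throw new IllegalStateException("Key already bound in an ancestor: " + key);
        }
      }
      providers.put(key, provider);
      return this;
    }

    // Resolves in the injector that owns the binding, so a singleton bound in
    // the server injector is created once and shared by every service child.
    Object get(String key) {
      for (FakeInjector injector = this; injector != null; injector = injector.parent) {
        Supplier<Object> provider = injector.providers.get(key);
        if (provider != null) {
          return injector.singletons.computeIfAbsent(key, k -> provider.get());
        }
      }
      throw new IllegalStateException("No binding for " + key);
    }
  }

  public static void main(String[] args) {
    FakeInjector initialization = FakeInjector.root().bind("properties", Object::new);
    FakeInjector server = initialization.createChild().bind("storage", Object::new);
    FakeInjector broker = server.createChild().bind("serviceName", () -> "druid/broker");
    FakeInjector historical = server.createChild().bind("serviceName", () -> "druid/historical");

    System.out.println(broker.get("storage") == historical.get("storage")); // true
    System.out.println(broker.get("serviceName"));                          // druid/broker
    System.out.println(historical.get("serviceName"));                      // druid/historical
  }
}
```

Because the model also enforces the ancestor rule, it exhibits the catch described below: a service cannot re-bind anything the server injector already binds.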

The trick is that some seemingly-shared resources (such as the LifecycleModule) hold state for a single service and so must be adjusted so that they can live in the service injector. (Which other modules are similar?)

Service-Specific Injection

The code already creates a new injector for the service to run. It uses an ad-hoc way to share objects and modules with the startup injector.

Determine if the current structure will work for bundled services.
Add the per-service properties configuration.

Investigation Results - Not Practical

This approach, while elegant, did not turn out to work in practice because of the way Druid is designed:

The Lifecycle is a shared server-level resource. As services are configured, they add items to the lifecycle, which registers them in a Guice multi-binding. However, if the key has been registered in a base injector, it cannot be changed in a service-specific child injector. And, if it hasn't been registered in the base injector, it seems some code will fail. (Details needed.)
Modules depend on one another. Service-specific modules inject resources that common modules need. This can't be done if the common injector is finalized before the service-specific injectors: to create the common injector, Guice must have visibility into the service resources. Yet, in a split-injector model, service-specific resources are not visible to the common injector.

In short, Druid is not designed for a split injector: there are too many dependencies. Going down this route is likely possible, but would entail rather more work than is practical.
