Metadata Storage

Paul Rogers edited this page Apr 29, 2022 · 1 revision

Basics

Metadata storage uses Apache DBCP to pool DB connections.

MetadataStorageConnectorConfig - configuration for metadata storage, but only for Derby?

MetadataStorageTablesConfig - configuration for table names.

The druid.metadata.storage.type property is set to the name of the preferred supported DB. SQLMetadataStorageDruidModule binds the default choice for many classes to this property, so that Guice returns the implementations associated with the property value. The logic seems to be: if a class is bound to a DB-specific key, that class is used; otherwise, the default ("derby") binding is used. Kind of odd. I think this means that a non-SQL solution could replace all code that uses the DB, while SQL-related code can replace just the lower-level bits.
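
The "DB-specific key, else default" selection described above can be sketched in plain Java. This is not Guice code and not Druid's implementation: `StorageConnector` and `ConnectorRegistry` are hypothetical names, and a map of suppliers stands in for the Guice bindings. Only the "derby" default comes from the notes above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for a DB-specific class chosen by storage type. */
interface StorageConnector {
    String name();
}

/** Illustrative registry: choose by configured type, else fall back to the default. */
final class ConnectorRegistry {
    static final String DEFAULT_TYPE = "derby";

    private final Map<String, Supplier<StorageConnector>> bindings = new HashMap<>();

    /** Registers an implementation under a DB-specific key. */
    void bind(String type, Supplier<StorageConnector> supplier) {
        bindings.put(type, supplier);
    }

    /** Returns the implementation bound to the configured type, or the default one. */
    StorageConnector select(String configuredType) {
        Supplier<StorageConnector> supplier = bindings.get(configuredType);
        if (supplier == null) {
            // No DB-specific binding: use the default ("derby") binding instead.
            supplier = bindings.get(DEFAULT_TYPE);
        }
        if (supplier == null) {
            throw new IllegalStateException("No binding for " + configuredType + " and no default");
        }
        return supplier.get();
    }
}
```

In this sketch, a type with no binding of its own silently resolves to the Derby implementation, which is the behavior that looks odd in the notes above.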

SQLMetadataStorageDruidModule - Base module for a SQL-based metadata store. Defines the common SQL manager and provider classes, leaving the derived types to define the DB-specific classes:

  • MetadataStorage
  • MetadataStorageProvider
  • MetadataStorageConnector
  • SQLMetadataConnector
  • MetadataStorageActionHandlerFactory

MetadataStorage - base class for...?

DerbyMetadataStorage - concrete implementation for Derby.

DerbyMetadataStorageProvider - provider for the above.

DerbyMetadataStorageDruidModule - Guice module to enable Derby.

But, where do we get the concrete implementations for MySQL, etc?

NoopMetadataStorageProvider - provider of a do-nothing metadata store.

CreateTables is a CLI command that creates tables, again using ad-hoc code. No tests.

  • MetadataStorageConnector - interface for each DB connector. Handles a motley collection of low-level DB issues.
    • SQLMetadataConnector - only direct implementation
      • DerbyConnector - "production" Derby connector
        • TestDerbyConnector - test version of the above
      • PostgreSQLConnector
      • TestSQLMetadataConnector for testing

Where is the MySQL version?

While the SQLMetadataConnector contains code to create tables, the DB-specific implementations are not required to use it: they could create tables another way.
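
A minimal sketch of that point, assuming a base class with an overridable table-creation method. All names here (`BaseSqlConnector`, `createSegmentTable`, and the two subclasses) are hypothetical, and the DDL is only recorded, not executed, so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base connector that offers a shared table-creation path. */
abstract class BaseSqlConnector {
    protected final List<String> executed = new ArrayList<>();

    /** Shared path: records generic DDL (a real connector would run it). */
    public void createSegmentTable(String tableName) {
        executed.add("CREATE TABLE " + tableName + " (id VARCHAR(255) NOT NULL, PRIMARY KEY (id))");
    }

    public List<String> executedStatements() {
        return executed;
    }
}

/** Uses the shared table-creation code unchanged. */
final class DefaultPathConnector extends BaseSqlConnector {
}

/** Bypasses the shared code and creates the table its own way. */
final class CustomPathConnector extends BaseSqlConnector {
    @Override
    public void createSegmentTable(String tableName) {
        executed.add("CREATE TABLE IF NOT EXISTS " + tableName + " (id TEXT PRIMARY KEY)");
    }
}
```

Nothing forces a subclass to go through the shared path, so any upgrade logic added to the base method would not automatically apply to every DB.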

Testing

There are no tests for DerbyMetadataStorageDruidModule, MetadataStorageConnectorConfig, DerbyMetadataStorage, or DerbyMetadataStorageProvider. It seems that tests use TestDerbyConnector instead.

Testing of table creation is handled as a test-specific one-off in SQLMetadataConnectorTest.

SQLMetadataConnectorTest performs a light-weight test of MetadataStorageConnector.

Usage

IndexerSQLMetadataStorageCoordinator - Creates the data source table, pending segments table and segment table and does a variety of actions.

SqlSegmentsMetadataManagerProvider - Creates the segment table and does a variety of actions.

Each of the above creates its tables in its lifecycle start method. That is, table creation/upgrade is not centralized: it is distributed. Since there is no effective upgrade, it is not tested.

JacksonConfigManagerModule: handles "dynamic configuration"? Not unit tested. ConfigManager tested in ConfigManagerTest.

Catalog Design

Start with the current structure, and see what we learn.

  • CatalogManager to create the tables and manage the catalog.
  • CatalogModule to create the required singletons.

Tackling the Problem

  • Create unit tests for code without tests.
    • CreateTables
    • Production Derby connector
  • Create a DB driver of some sort to handle table creation and upgrade.

Database Service

The idea is to collect database-related logic in one place, configured in Guice using a lifecycle. The service handles DB creation and evolution.

  • Holds a MetadataStorageConnector
  • Creates DB stuff on lifecycle start.
  • Gracefully exits the DB on lifecycle stop.

Create the wrapper and test. Then, migrate each usage one by one.
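
One possible shape for the wrapper, as a rough sketch only. `DatabaseService` and `TableCreator` are hypothetical names; `TableCreator` is a narrowed stand-in for MetadataStorageConnector and does not reflect its real method signatures. The Guice lifecycle wiring is omitted: `start()` and `stop()` are plain methods that a lifecycle would call.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, narrowed view of a connector: only what this sketch needs. */
interface TableCreator {
    void createTable(String tableName);

    void close();
}

/** Sketch of the proposed service: one owner for table creation and shutdown. */
final class DatabaseService {
    private final TableCreator connector;
    private final List<String> tables;
    private boolean started;

    DatabaseService(TableCreator connector, List<String> tables) {
        this.connector = connector;
        this.tables = new ArrayList<>(tables);
    }

    /** Lifecycle start: create every table in one central place. Safe to call twice. */
    void start() {
        if (started) {
            return;
        }
        for (String table : tables) {
            connector.createTable(table);
        }
        started = true;
    }

    /** Lifecycle stop: release the DB only if the service was started. */
    void stop() {
        if (!started) {
            return;
        }
        connector.close();
        started = false;
    }

    boolean isStarted() {
        return started;
    }
}
```

Because the service depends only on a small interface, it can be tested with a recording fake first, before any existing usage is migrated to it.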

Create a test fixture which builds the DB from Guice, configured to use an in-memory Derby.

ConfigManager

Change to take an injected database service.

Modify ConfigManagerTest to perform real testing against a real DB.
