Credits to [email protected] and [email protected] for the notebook logic behind the database initialization steps. The following architecture will be deployed:

This template completes 99% of the process of deploying an external Hive metastore (Hive version 3.1.0) for Azure Databricks. The remaining 1% is to run a pre-deployed Databricks job once to initialize the external Hive metastore. After a successful deployment, your cluster can connect to the external Hive metastore (backed by Azure SQL Database).
On your local machine:

1. Clone this repository to local.

2. Update the `terraform.tfvars` file and provide a value for each defined variable. Some variables may have default values defined in the `variables.tf` file.

3. For the `db_username` and `db_password` variables in step 2, you can also use environment variables: Terraform automatically looks for environment variables named `TF_VAR_xxxxx`.

   ```bash
   export TF_VAR_db_username=yoursqlserveradminuser
   export TF_VAR_db_password=yoursqlserveradminpassword
   ```

4. Init Terraform and apply to deploy the resources:

   ```bash
   terraform init
   terraform apply
   ```
Now log into the Databricks workspace so that your user is added to it (the identity that deployed the workspace has at least the Contributor role on it, so on first launch it is added as a workspace admin).

Once logged in, the final step is to manually trigger the pre-deployed job: in the Databricks workspace, go to Jobs and run the auto-deployed job only once; this initializes the database with the metastore schema.
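If you prefer to trigger the job without clicking through the UI, a minimal sketch using the Databricks Jobs REST API (2.1) is shown below. `DATABRICKS_HOST`, `DATABRICKS_TOKEN` and `JOB_ID` are placeholder environment variable names (nothing the template creates); substitute your workspace URL, a personal access token and the id of the auto-deployed job.

```python
# Optional alternative to "Run now" in the UI: trigger the pre-deployed
# initialization job via the Databricks Jobs REST API (2.1).
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-<id>.<n>.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]  # personal access token
job_id = int(os.environ["JOB_ID"])      # id of the auto-deployed job

resp = requests.post(
    f"{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={"job_id": job_id},
)
resp.raise_for_status()
print(resp.json())  # contains the run_id of the triggered run
```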
Then you can verify in a notebook:
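For example, a minimal check from a notebook attached to the configured cluster could look like the sketch below (`spark` is provided by the Databricks notebook environment; the `hive_test` database and `sample_table` names are just placeholders, not objects created by the template):

```python
# Create a database and a table; both operations go through the
# external Hive metastore in Azure SQL Database.
spark.sql("CREATE DATABASE IF NOT EXISTS hive_test")
spark.sql("CREATE TABLE IF NOT EXISTS hive_test.sample_table (id INT, name STRING)")

# List what the metastore now knows about.
print(spark.catalog.listDatabases())
print(spark.catalog.listTables("hive_test"))
```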
We can also check inside the SQL database (the metastore) that we have successfully linked the cluster to the external Hive metastore and registered the table there:
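If you would rather inspect the metastore database from a notebook than from the Azure portal, a sketch like the one below reads the standard Hive metastore schema tables (e.g. `DBS`, `TBLS`) over JDBC. The server, database, user and password placeholders are assumptions; substitute the values from your `terraform.tfvars`.

```python
# Hypothetical direct look at the metastore schema: TBLS holds one row
# per table registered in the external Hive metastore.
jdbc_url = (
    "jdbc:sqlserver://<your-sql-server>.database.windows.net:1433;"
    "database=<your-metastore-db>"
)

tbls = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.TBLS")
    .option("user", "<db_username>")
    .option("password", "<db_password>")
    .load()
)
display(tbls)  # each registered table shows up as a row here
```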
Now you can configure all other clusters to use this external metastore by applying the same Spark conf and environment variables as the cold-start cluster.
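As a quick sanity check on any newly configured cluster, you can confirm from a notebook that the external metastore settings were picked up. The key names below follow the standard Databricks external Hive metastore configuration and are assumptions here; adjust them if your template uses different keys.

```python
# Print a few of the (non-secret) external metastore settings from the
# cluster's Spark configuration to confirm they were applied.
conf = spark.sparkContext.getConf()
for key in [
    "spark.sql.hive.metastore.version",
    "spark.hadoop.javax.jdo.option.ConnectionURL",
    "spark.hadoop.javax.jdo.option.ConnectionDriverName",
]:
    print(key, "=", conf.get(key, "<not set>"))
```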
To export the DDL of existing tables (for example, to re-create them against this new metastore), refer to this tutorial: https://kb.databricks.com/metastore/create-table-ddl-for-metastore.html
```python
# Export a CREATE TABLE statement for every table in every database,
# writing one .ddl file per database.
dbs = spark.catalog.listDatabases()
for db in dbs:
    f = open("your_file_name_{}.ddl".format(db.name), "w")
    tables = spark.catalog.listTables(db.name)
    for t in tables:
        DDL = spark.sql("SHOW CREATE TABLE {}.{}".format(db.name, t.name))
        f.write(DDL.first()[0])
        f.write("\n")
    f.close()
```