- Raise an exception when schema contains '.'. (#222)
  - Containing a catalog in `schema` is not allowed anymore.
  - Need to explicitly use `catalog` instead.
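
As a minimal sketch of the rule in #222, assume a hypothetical `qualified_schema` helper that joins the separately configured catalog and schema and refuses a schema that already embeds a catalog. This is not dbt-databricks' actual validation code, and `ValueError` stands in for whatever exception the adapter really raises.

```python
from typing import Optional


def validate_schema(schema: str) -> str:
    """Reject a schema name that embeds a catalog, e.g. 'main.analytics'."""
    if "." in schema:
        raise ValueError(
            f"schema {schema!r} contains '.'; set the catalog with `catalog` instead"
        )
    return schema


def qualified_schema(catalog: Optional[str], schema: str) -> str:
    """Join the separately configured catalog and schema into one name."""
    validate_schema(schema)
    return f"{catalog}.{schema}" if catalog else schema


print(qualified_schema("main", "analytics"))  # main.analytics
```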
- Support Python 3.11 (#233)
- Support `incremental_predicates` (#161)
- Apply connection retry refactor, add defaults with exponential backoff (#137)
- Quote by Default (#241)
- Avoid show table extended command. (#231)
- Use show table extended with table name list for get_catalog. (#237)
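
The exponential-backoff retry in #137 follows a familiar pattern, sketched below. The function names, the 1-second base delay, the 60-second cap, and retrying only on `ConnectionError` are all assumptions for illustration, not the adapter's actual defaults.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Wait before retry n is base * 2**n seconds, never more than `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_backoff(
    fn: Callable[[], T],
    retries: int = 3,
    base: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; on ConnectionError, wait and try again, up to `retries` times."""
    for delay in backoff_delays(retries, base):
        try:
            return fn()
        except ConnectionError:
            sleep(delay)
    return fn()  # last attempt: any error now propagates to the caller


print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Passing `sleep` in as a parameter keeps the helper testable without real waits.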
- Fix copy into macro when passing `expression_list`. (#223)
- Partially revert to fix the case where schema config contains uppercase letters. (#224)
- Show and log a warning when schema contains '.'. (#221)
- Support python model through run command API, currently supported materializations are table and incremental. (dbt-labs/dbt-spark#377, #126)
- Enable Pandas and Pandas-on-Spark DataFrames for dbt python models (dbt-labs/dbt-spark#469, #181)
- Support job cluster in notebook submission method (dbt-labs/dbt-spark#467, #194)
  - In `all_purpose_cluster` submission method, a config `http_path` can be specified in Python model config to switch the cluster where Python model runs.
    ```python
    def model(dbt, _):
        dbt.config(
            materialized='table',
            http_path='...'
        )
        ...
    ```
- Use builtin timestampadd and timestampdiff functions for dateadd/datediff macros if available (#185)
- Implement tests for various Python models (#189)
- Implement testing for `type_boolean` in Databricks (dbt-labs/dbt-spark#471, #188)
- Add a macro to support COPY INTO (#190)
- Apply "Initial refactoring of incremental materialization" (#148)
  - Now dbt-databricks uses `adapter.get_incremental_strategy_macro` instead of the `dbt_spark_get_incremental_sql` macro to dispatch the incremental strategy macro. An overridden `dbt_spark_get_incremental_sql` macro will not work anymore.
- Better interface for python submission (dbt-labs/dbt-spark#452, #178)
- Explicitly close cursors (#163)
- Upgrade databricks-sql-connector to 2.0.5 (#166)
- Embed dbt-databricks and databricks-sql-connector versions in SQL comments (#167)
- Support Python 3.10 (#158)
- Add grants to materializations (dbt-labs/dbt-spark#366, dbt-labs/dbt-spark#381)
- Add `connection_parameters` for databricks-sql-connector connection parameters (#135)
  - This can be used to customize the connection by setting additional parameters.
  - The full list of parameters is documented in Databricks SQL Connector for Python.
  - Currently, the following parameters are reserved for `dbt-databricks`. Please use the normal credential settings instead.
    - server_hostname
    - http_path
    - access_token
    - session_configuration
    - catalog
    - schema
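
The reserved-key rule for `connection_parameters` can be pictured as a simple set intersection. The sketch below uses a hypothetical `check_connection_parameters` helper and a made-up `example_option` key; the adapter's real check and error type may differ.

```python
from typing import Any, Dict

# Keys the entry above reserves for dbt-databricks' own credential settings.
RESERVED_PARAMETERS = frozenset(
    {"server_hostname", "http_path", "access_token",
     "session_configuration", "catalog", "schema"}
)


def check_connection_parameters(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return params unchanged, or raise if any reserved key is present."""
    clashes = sorted(RESERVED_PARAMETERS.intersection(params))
    if clashes:
        raise ValueError(
            "reserved connection parameters: " + ", ".join(clashes)
            + "; use the normal credential settings instead"
        )
    return params


print(check_connection_parameters({"example_option": 1}))  # {'example_option': 1}
```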
- Incremental materialization updated to not drop table first if full refresh for delta lake format, as it already runs create or replace table (dbt-labs/dbt-spark#286, dbt-labs/dbt-spark#287)
- Update `SparkColumn.numeric_type` to return `decimal` instead of `numeric`, since SparkSQL exclusively supports the former (dbt-labs/dbt-spark#380)
- Make minimal changes to support dbt Core incremental materialization refactor (dbt-labs/dbt-spark#402, dbt-labs/dbt-spark#394, #136)
- Add new basic tests `TestDocsGenerateDatabricks` and `TestDocsGenReferencesDatabricks` (#134)
- Set upper bound for `databricks-sql-connector` on Python 3.10 (#154)
  - Note that `databricks-sql-connector` does not officially support Python 3.10 yet.
- Support for Databricks CATALOG as a DATABASE in dbt compilations (#95, #89, #94, #105)
  - Setting an initial catalog with `session_properties` is deprecated and will not work in a future release. Please use `catalog` or `database` to set the initial catalog.
  - When using catalog, the `spark_build_snapshot_staging_table` macro will not be used. If trying to override the macro, override `databricks_build_snapshot_staging_table` instead.
- Block taking jinja2.runtime.Undefined into DatabricksAdapter (#98)
- Avoid using Cursor.schema API when database is None (#100)
- Drop databricks-sql-connector 1.0 (#108)
- Add support for Delta constraints (#71)
- Port testing framework changes from dbt-labs/dbt-spark#299 and dbt-labs/dbt-spark#314 (#70)
- Make internal macros use macro dispatch pattern (#72)
- Support for setting table properties as part of a model configuration (#33, #49)
- Get the session_properties map to work (#57)
- Bump up databricks-sql-connector to 1.0.1 and use the Cursor APIs (#50)
- Inherit from dbt-spark for backward compatibility with spark-utils and other dbt packages (#32, #35)
- Add SQL Endpoint-specific integration tests (#45, #46)
- Make the connection use databricks-sql-connector (#3, #7)
- Make the default file format 'delta' (#14, #16)
- Make the default incremental strategy 'merge' (#23)
- Remove unnecessary stack trace (#10)
- Incremental materialization corrected to respect `full_refresh` config, by using the `should_full_refresh()` macro (#260, #262)
- Add support for Apache Hudi (hudi file format) which supports incremental merge strategies (#187, #210)
- Refactor seed macros: remove duplicated code from dbt-core, and provide clearer logging of SQL parameters that differ by connection method (#249, #250)
- Replace `sample_profiles.yml` with `profile_template.yml`, for use with new `dbt init` (#247)
- Remove official support for python 3.6, which is reaching end of life on December 23, 2021 (dbt-core#4134, #253)
- Add support for structured logging (#251)
- Fix `--store-failures` for tests, by suppressing irrelevant error in `comment_clause()` macro (#232, #233)
- Add support for `on_schema_change` config in incremental models: `ignore`, `fail`, `append_new_columns`. For `sync_all_columns`, removing columns is not supported by Apache Spark or Delta Lake (#198, #226, #229)
- Add `persist_docs` call to incremental model (#224, #234)
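
The four `on_schema_change` modes can be modeled over column names alone. This is a simplified sketch with a hypothetical `resolve_columns` helper: it ignores type changes, and raising on removals under `sync_all_columns` is this sketch's reading of the note above, not necessarily the adapter's exact behavior.

```python
from typing import List

MODES = ("ignore", "fail", "append_new_columns", "sync_all_columns")


def resolve_columns(existing: List[str], incoming: List[str],
                    on_schema_change: str = "ignore") -> List[str]:
    """Return the target table's column list after applying on_schema_change."""
    if on_schema_change not in MODES:
        raise ValueError(f"unknown on_schema_change value: {on_schema_change!r}")
    added = [c for c in incoming if c not in existing]
    removed = [c for c in existing if c not in incoming]
    if on_schema_change == "ignore" or not (added or removed):
        return list(existing)
    if on_schema_change == "fail":
        raise ValueError(f"schema changed: added={added}, removed={removed}")
    if on_schema_change == "sync_all_columns" and removed:
        # Spark and Delta Lake cannot drop these columns, so stop here.
        raise ValueError(f"cannot remove columns: {removed}")
    # append_new_columns (removed columns stay), or sync_all_columns with additions only
    return existing + added


print(resolve_columns(["id", "a"], ["id", "b"], "append_new_columns"))  # ['id', 'a', 'b']
```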
- Enhanced `get_columns_in_relation` method to handle a bug in open-source Delta Lake, which doesn't return schema details in `show table extended in databasename like '*'` query output. This impacts dbt snapshots if the file format is open-source Delta Lake (#207)
- Properly parse columns when there are struct fields, to avoid considering inner fields (#202)
- Add `unique_field` to better understand adapter adoption in anonymous usage tracking (#211)
- @harryharanb (#207)
- @SCouto (#204)
- Add pyodbc import error message to `dbt.exceptions.RuntimeException` to get more detailed information when running `dbt debug` (#192)
- Add support for ODBC Server Side Parameters, allowing options that need to be set with the `SET` statement to be used (#201)
- Add `retry_all` configuration setting to retry all connection issues, not just those the `_is_retryable_error` function identifies as retryable (#194)
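
The effect of `retry_all` is to bypass the retryability check. In the sketch below, `is_retryable_error` is a hypothetical stand-in for `_is_retryable_error` that matches a couple of assumed transient-error substrings; the real function's rules may differ.

```python
def is_retryable_error(exc: Exception) -> bool:
    """Hypothetical stand-in for `_is_retryable_error`: match transient messages."""
    message = str(exc).lower()
    return "pending" in message or "temporarily_unavailable" in message


def should_retry(exc: Exception, retry_all: bool = False) -> bool:
    """With retry_all, retry every connection error; otherwise only transient ones."""
    return retry_all or is_retryable_error(exc)


print(should_retry(RuntimeError("invalid token"), retry_all=True))  # True
```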
- @JCZuurmond (#192)
- @jethron (#201)
- @gregingenii (#194)
- Fix column-level `persist_docs` on Delta tables, add tests (#180)
- Allow user to specify `use_ssl` (#169)
- Allow setting table `OPTIONS` using `config` (#171)
- Add support for column-level `persist_docs` on Delta tables (#84, #170)
- Cast `table_owner` to string to avoid errors generating docs (#158, #159)
- Explicitly cast column types when inserting seeds (#139, #166)
- Parse information returned by `list_relations_without_caching` macro to speed up catalog generation (#93, #160)
- More flexible host passing, `https://` can be omitted (#153)
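
Accepting a host with or without `https://` amounts to normalizing it before building the URL. The helper names and the example hostname below are hypothetical; this only sketches the idea behind #153.

```python
def normalize_host(host: str) -> str:
    """Accept 'https://host/' or just 'host' and return the bare hostname."""
    host = host.strip()
    if host.startswith("https://"):
        host = host[len("https://"):]
    return host.rstrip("/")


def base_url(host: str) -> str:
    """Rebuild the URL so users never have to type the scheme themselves."""
    return "https://" + normalize_host(host)


print(base_url("example.cloud.databricks.com"))  # https://example.cloud.databricks.com
```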
- @friendofasquid (#159)
- @franloza (#160)
- @Fokko (#165)
- @rahulgoyal2987 (#169)
- @JCZuurmond (#171)
- @cristianoperez (#170)
- Update serialization calls to use new API in dbt-core `0.19.1b2` (#150)
- Incremental models have `incremental_strategy: append` by default. This strategy adds new records without updating or overwriting existing records. To update or overwrite existing records, use `merge` or `insert_overwrite` instead, depending on the file format, connection method, and attributes of your underlying data. dbt will try to raise a helpful error if you configure a strategy that is not supported for a given file format or connection. (#140, #141)
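
The difference between `append` and `merge` can be shown on in-memory rows. This sketch uses hypothetical helper names and a `unique_key` argument mirroring dbt's config of the same name; it does not model `insert_overwrite`, which replaces whole partitions.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def incremental_append(target: List[Row], new_rows: List[Row]) -> List[Row]:
    """`append`: add every new row; existing rows are never updated."""
    return target + new_rows


def incremental_merge(target: List[Row], new_rows: List[Row], unique_key: str) -> List[Row]:
    """`merge`: overwrite rows whose key already exists, insert the rest."""
    by_key = {row[unique_key]: row for row in target}
    for row in new_rows:
        by_key[row[unique_key]] = row  # same key: replace; new key: insert
    return list(by_key.values())


old = [{"id": 1, "v": "a"}]
batch = [{"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
print(len(incremental_append(old, batch)))  # 3
print(incremental_merge(old, batch, "id"))  # [{'id': 1, 'v': 'b'}, {'id': 2, 'v': 'c'}]
```

With `append`, the stale row for `id` 1 survives next to the new one, which is why the entry points to `merge` when existing records must change.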
- Capture hard-deleted records in snapshot merge, when `invalidate_hard_deletes` config is set (#109, #126)
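
With `invalidate_hard_deletes`, a snapshot row whose key no longer exists in the source gets its `dbt_valid_to` set instead of staying current forever. The adapter does this inside its snapshot merge statement; the Python below is only a sketch, and the function name, the `id` key, and the in-memory rows are assumptions.

```python
from datetime import datetime
from typing import Any, Dict, List, Set

Row = Dict[str, Any]


def invalidate_hard_deletes(snapshot: List[Row], source_keys: Set[Any],
                            now: datetime, key: str = "id") -> List[Row]:
    """Close current snapshot rows whose key has disappeared from the source."""
    result = []
    for row in snapshot:
        if row["dbt_valid_to"] is None and row[key] not in source_keys:
            row = {**row, "dbt_valid_to": now}  # copy; leave the input untouched
        result.append(row)
    return result
```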
- Users of the `http` and `thrift` connection methods need to install extra requirements: `pip install dbt-spark[PyHive]` (#109, #126)
- Enable `CREATE OR REPLACE` support when using Delta. Instead of dropping and recreating the table, it will keep the existing table, and add a new version as supported by Delta. This will ensure that the table stays available when running the pipeline, and you can track the history.
- Add changelog, issue templates (#119, #120)
- Handle case of 0 retries better for HTTP Spark Connections (#132)
- @danielvdende (#132)
- @Fokko (#125)
- Allow users to specify `auth` and `kerberos_service_name` (#107)
- Add support for ODBC driver connections to Databricks clusters and endpoints (#116)