From b61d83a5837274a522d35126610ebeb271e1d957 Mon Sep 17 00:00:00 2001 From: "Talla, Mohan" Date: Tue, 3 Dec 2024 14:33:32 +0530 Subject: [PATCH 1/3] Updated setup and config pages --- .../connect-data-platform/teradata-setup.md | 20 ++++++++-- .../resource-configs/teradata-configs.md | 37 ++++++++++++------- 2 files changed, 41 insertions(+), 16 deletions(-) diff --git a/website/docs/docs/core/connect-data-platform/teradata-setup.md b/website/docs/docs/core/connect-data-platform/teradata-setup.md index f4ffbe37f35..6c9d2f46a41 100644 --- a/website/docs/docs/core/connect-data-platform/teradata-setup.md +++ b/website/docs/docs/core/connect-data-platform/teradata-setup.md @@ -95,7 +95,6 @@ Parameter | Default | Type | Description `browser_tab_timeout` | `"5"` | quoted integer | Specifies the number of seconds to wait before closing the browser tab after Browser Authentication is completed. The default is 5 seconds. The behavior is under the browser's control, and not all browsers support automatic closing of browser tabs. `browser_timeout` | `"180"` | quoted integer | Specifies the number of seconds that the driver will wait for Browser Authentication to complete. The default is 180 seconds (3 minutes). `column_name` | `"false"` | quoted boolean | Controls the behavior of cursor `.description` sequence `name` items. Equivalent to the Teradata JDBC Driver `COLUMN_NAME` connection parameter. False specifies that a cursor `.description` sequence `name` item provides the AS-clause name if available, or the column name if available, or the column title. True specifies that a cursor `.description` sequence `name` item provides the column name if available, but has no effect when StatementInfo parcel support is unavailable. -`connect_failure_ttl` | `"0"` | quoted integer | Specifies the time-to-live in seconds to remember the most recent connection failure for each IP address/port combination. 
The driver subsequently skips connection attempts to that IP address/port for the duration of the time-to-live. The default value of zero disables this feature. The recommended value is half the database restart time. Equivalent to the Teradata JDBC Driver `CONNECT_FAILURE_TTL` connection parameter. `connect_timeout` | `"10000"` | quoted integer | Specifies the timeout in milliseconds for establishing a TCP socket connection. Specify 0 for no timeout. The default is 10 seconds (10000 milliseconds). `cop` | `"true"` | quoted boolean | Specifies whether COP Discovery is performed. Equivalent to the Teradata JDBC Driver `COP` connection parameter. `coplast` | `"false"` | quoted boolean | Specifies how COP Discovery determines the last COP hostname. Equivalent to the Teradata JDBC Driver `COPLAST` connection parameter. When `coplast` is `false` or omitted, or COP Discovery is turned off, then no DNS lookup occurs for the coplast hostname. When `coplast` is `true`, and COP Discovery is turned on, then a DNS lookup occurs for a coplast hostname. @@ -110,7 +109,7 @@ Parameter | Default | Type | Description `log` | `"0"` | quoted integer | Controls debug logging. Somewhat equivalent to the Teradata JDBC Driver `LOG` connection parameter. This parameter's behavior is subject to change in the future. This parameter's value is currently defined as an integer in which the 1-bit governs function and method tracing, the 2-bit governs debug logging, the 4-bit governs transmit and receive message hex dumps, and the 8-bit governs timing. Compose the value by adding together 1, 2, 4, and/or 8. `logdata` | | string | Specifies extra data for the chosen logon authentication method. Equivalent to the Teradata JDBC Driver `LOGDATA` connection parameter. `logon_timeout` | `"0"` | quoted integer | Specifies the logon timeout in seconds. Zero means no timeout. -`logmech` | `"TD2"` | string | Specifies the logon authentication method. 
Equivalent to the Teradata JDBC Driver `LOGMECH` connection parameter. Possible values are `TD2` (the default), `JWT`, `LDAP`, `KRB5` for Kerberos, or `TDNEGO`. +`logmech` | `"TD2"` | string | Specifies the logon authentication method. Equivalent to the Teradata JDBC Driver `LOGMECH` connection parameter. Possible values are `TD2` (the default), `JWT`, `LDAP`, `BROWSER`, `KRB5` for Kerberos, or `TDNEGO`. `max_message_body` | `"2097000"` | quoted integer | Specifies the maximum Response Message size in bytes. Equivalent to the Teradata JDBC Driver `MAX_MESSAGE_BODY` connection parameter. `partition` | `"DBC/SQL"` | string | Specifies the database partition. Equivalent to the Teradata JDBC Driver `PARTITION` connection parameter. `request_timeout` | `"0"` | quoted integer | Specifies the timeout for executing each SQL request. Zero means no timeout. @@ -210,7 +209,8 @@ For using cross-DB macros, teradata-utils as a macro namespace will not be used, ##### hash - `Hash` macro needs an `md5` function implementation. Teradata doesn't support `md5` natively. You need to install a User Defined Function (UDF): + `Hash` macro needs an `md5` function implementation. Teradata doesn't support `md5` natively. You need to install a User Defined Function (UDF) and optionally specify `md5_udf` [variable](https://docs.getdbt.com/docs/build/project-variables).
+ If not specified the code defaults to using `GLOBAL_FUNCTIONS.hash_md5`. See below instructions on how to install the custom UDF: 1. Download the md5 UDF implementation from Teradata (registration required): https://downloads.teradata.com/download/extensibility/md5-message-digest-udf. 1. Unzip the package and go to `src` directory. 1. Start up `bteq` and connect to your database. @@ -228,6 +228,12 @@ For using cross-DB macros, teradata-utils as a macro namespace will not be used, ```sql GRANT EXECUTE FUNCTION ON GLOBAL_FUNCTIONS TO PUBLIC WITH GRANT OPTION; ``` + To use a custom hash function, add the `md5_udf` variable to `dbt_project.yml`: + ```yaml + vars: + md5_udf: Custom_database_name.hash_method_function + ``` + ##### last_day `last_day` in `teradata_utils`, unlike the corresponding macro in `dbt_utils`, doesn't support `quarter` datepart. @@ -241,6 +247,14 @@ dbt-teradata 1.8.0 and later versions support unit tests, enabling you to valida ## Limitations +### Browser Authentication +When running a dbt job with `logmech` set to "browser", the initial authentication opens a browser window where you must enter your username and password.
+After authentication, this window remains open, requiring you to manually switch back to the dbt console.
+For every subsequent connection, a new browser tab briefly opens, displaying the message "TERADATA BROWSER AUTHENTICATION COMPLETED," and silently reuses the existing session.
+However, the focus stays on the browser window, so you’ll need to manually switch back to the dbt console each time.
+This is the default behavior of the `teradatasql` driver and cannot be avoided at this time.
+To prevent session expiration and the need to re-enter credentials, ensure the authentication browser window stays open until the job is complete. + ### Transaction mode Both ANSI and TERA modes are now supported in dbt-teradata. TERA mode's support is introduced with dbt-teradata 1.7.1, it is an initial implementation. diff --git a/website/docs/reference/resource-configs/teradata-configs.md b/website/docs/reference/resource-configs/teradata-configs.md index 89a2ff76fba..08b442e5b62 100644 --- a/website/docs/reference/resource-configs/teradata-configs.md +++ b/website/docs/reference/resource-configs/teradata-configs.md @@ -348,6 +348,18 @@ If a user sets some key-value pair with value as `'{model}'`, internally this `' - For example, if the model the user is running is `stg_orders`, `{model}` will be replaced with `stg_orders` in runtime. - If no `query_band` is set by the user, the default query_band used will be: ```org=teradata-internal-telem;appname=dbt;``` +## Unit Testing +* Unit testing is supported in dbt-teradata, allowing users to write and execute unit tests using the dbt test command. + * For detailed guidance, refer to the dbt documentation. + +* QVCI must be enabled in the database to run unit tests for views. + * Additional details on enabling QVCI can be found in the General section. + * Without QVCI enabled, unit test support for views will be limited. + * Users might encounter the following database error when testing views without QVCI enabled: + ``` + * [Teradata Database] [Error 3706] Syntax error: Data Type "N" does not match a Defined Type name. 
+ ``` + ## valid_history incremental materialization strategy _This is available in early access_ @@ -361,26 +373,27 @@ In temporal databases, valid time is crucial for applications like historical re unique_key='id', on_schema_change='fail', incremental_strategy='valid_history', - valid_from='valid_from_column', - history_column_in_target='history_period_column' + valid_period='valid_period_col', + use_valid_to_time='no', ) }} ``` The `valid_history` incremental strategy requires the following parameters: -* `valid_from` — Column in the source table of **timestamp** datatype indicating when each record became valid. -* `history_column_in_target` — Column in the target table of **period** datatype that tracks history. +* `unique_key`: The primary key of the model (excluding the valid time components), specified as a column name or list of column names. +* `valid_period`: Name of the model column indicating the period for which the record is considered to be valid. The datatype must be `PERIOD(DATE)` or `PERIOD(TIMESTAMP)`. +* `use_valid_to_time`: Whether the end bound value of the valid period in the input is considered by the strategy when building the valid timeline. Use 'no' if you consider your record to be valid until changed (and supply any value greater than the begin bound for the end bound of the period - a typical convention is `9999-12-31` or `9999-12-31 23:59:59.999999`). Use 'yes' if you know until when the record is valid (typically this is a correction in the history timeline). The valid_history strategy in dbt-teradata involves several critical steps to ensure the integrity and accuracy of historical data management: * Remove duplicates and conflicting values from the source data: * This step ensures that the data is clean and ready for further processing by eliminating any redundant or conflicting records.
- * The process of removing duplicates and conflicting values from the source data involves using a ranking mechanism to ensure that only the highest-priority records are retained. This is accomplished using the SQL RANK() function. -* Identify and adjust overlapping time slices: - * Overlapping time periods in the data are detected and corrected to maintain a consistent and non-overlapping timeline. -* Manage records needing to be overwritten or split based on the source and target data: + * This step removes primary key duplicates (that is, two or more records with the same value for the `unique_key` and the BEGIN() bound of the `valid_period` fields) from the dataset produced by the model. If such duplicates exist, the row with the lowest value for all non-primary-key fields (in the order specified in the model) is retained. Full-row duplicates are always de-duplicated. +* Identify and adjust overlapping time slices (if `use_valid_to_time='yes'`): + * Overlapping time periods in the data are corrected to maintain a consistent and non-overlapping timeline. To do so, the valid period end bound of a record is adjusted to meet the begin bound of the next record with the same `unique_key` value and overlapping `valid_period` value, if any. +* Manage records needing to be adjusted, deleted or split based on the source and target data: * This involves handling scenarios where records in the source data overlap with or need to replace records in the target data, ensuring that the historical timeline remains accurate. -* Utilize the TD_NORMALIZE_MEET function to compact history: - * This function helps to normalize and compact the history by merging adjacent time periods, improving the efficiency and performance of the database. +* Compact history: + * Normalize and compact the history by merging records of adjacent time periods with the same value, optimizing database storage and performance. We use the function TD_NORMALIZE_MEET for this purpose.
* Delete existing overlapping records from the target table: * Before inserting new or updated records, any existing records in the target table that overlap with the new data are removed to prevent conflicts. * Insert the processed data into the target table: @@ -416,9 +429,7 @@ These steps collectively ensure that the valid_history strategy effectively mana ``` -:::info -The target table must already exist before running the model. Ensure the target table is created and properly structured with the necessary columns, including a column that tracks the history with period datatype, before running a dbt model. -::: + ## Common Teradata-specific tasks * *collect statistics* - when a table is created or modified significantly, there might be a need to tell Teradata to collect statistics for the optimizer. It can be done using `COLLECT STATISTICS` command. You can perform this step using dbt's `post-hooks`, e.g.: From 53e9f07179b3ef18e507030942ae34b89dd2da91 Mon Sep 17 00:00:00 2001 From: Mohan Talla Date: Tue, 10 Dec 2024 07:03:46 +0530 Subject: [PATCH 2/3] Update website/docs/docs/core/connect-data-platform/teradata-setup.md Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> --- website/docs/docs/core/connect-data-platform/teradata-setup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/core/connect-data-platform/teradata-setup.md b/website/docs/docs/core/connect-data-platform/teradata-setup.md index 6c9d2f46a41..774f5b5e070 100644 --- a/website/docs/docs/core/connect-data-platform/teradata-setup.md +++ b/website/docs/docs/core/connect-data-platform/teradata-setup.md @@ -209,7 +209,7 @@ For using cross-DB macros, teradata-utils as a macro namespace will not be used, ##### hash - `Hash` macro needs an `md5` function implementation. Teradata doesn't support `md5` natively. 
You need to install a User Defined Function (UDF) and optionally specify `md5_udf` [variable](https://docs.getdbt.com/docs/build/project-variables).
+ `Hash` macro needs an `md5` function implementation. Teradata doesn't support `md5` natively. You need to install a User Defined Function (UDF) and optionally specify `md5_udf` [variable](/docs/build/project-variables).
If not specified the code defaults to using `GLOBAL_FUNCTIONS.hash_md5`. See below instructions on how to install the custom UDF: 1. Download the md5 UDF implementation from Teradata (registration required): https://downloads.teradata.com/download/extensibility/md5-message-digest-udf. 1. Unzip the package and go to `src` directory. From 5b17cf4192425993e66035bdb0e5dd2b6ffe2370 Mon Sep 17 00:00:00 2001 From: Mohan Talla Date: Tue, 10 Dec 2024 07:03:56 +0530 Subject: [PATCH 3/3] Update website/docs/reference/resource-configs/teradata-configs.md Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> --- website/docs/reference/resource-configs/teradata-configs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/teradata-configs.md b/website/docs/reference/resource-configs/teradata-configs.md index 08b442e5b62..8debd1c79ae 100644 --- a/website/docs/reference/resource-configs/teradata-configs.md +++ b/website/docs/reference/resource-configs/teradata-configs.md @@ -348,7 +348,7 @@ If a user sets some key-value pair with value as `'{model}'`, internally this `' - For example, if the model the user is running is `stg_orders`, `{model}` will be replaced with `stg_orders` in runtime. - If no `query_band` is set by the user, the default query_band used will be: ```org=teradata-internal-telem;appname=dbt;``` -## Unit Testing +## Unit testing * Unit testing is supported in dbt-teradata, allowing users to write and execute unit tests using the dbt test command. * For detailed guidance, refer to the dbt documentation.