[DBT-350] fix for dbt errors in CDSW #94

Open
tovganesh wants to merge 7 commits into master
tovganesh (Contributor) commented Sep 28, 2022

Describe your changes

Issue Synopsis:
For dbt projects in a CDSW environment, two classes of errors were observed:

  1. A temporary table was dropped, but the drop was not reflected in the metastore.
  2. A regular view or table was dropped before being re-created or renamed with ALTER TABLE ... RENAME, but the drop was not reflected in the metastore.

Solution:
Issue an INVALIDATE METADATA <object> statement where relevant. Because the object named in the statement may not exist at that point in the dbt flow, any resulting error is caught and ignored.
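
As a rough illustration of the "issue the statement, ignore the failure" idea described above, here is a minimal Python sketch. It is not the code from this PR: the function name invalidate_metadata, the StubCursor class, and the broad exception handling are all assumptions made for the example. It only assumes a DB-API style cursor with an execute() method.

```python
import logging

logger = logging.getLogger(__name__)


def invalidate_metadata(cursor, relation_name):
    """Ask Impala to reload cached metadata for one object.

    Hypothetical helper: returns True if the statement ran, False if it
    raised (for example because the object does not exist yet).
    relation_name must be a trusted identifier; it is interpolated as-is.
    """
    try:
        cursor.execute(f"INVALIDATE METADATA {relation_name}")
        return True
    except Exception as exc:  # assumption: a real adapter would catch a narrower driver error
        logger.debug("ignoring INVALIDATE METADATA failure for %s: %s", relation_name, exc)
        return False


class StubCursor:
    """Stand-in for a real cursor; raises for objects not in `existing`."""

    def __init__(self, existing):
        self.existing = set(existing)
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)
        name = sql.rsplit(" ", 1)[-1]
        if name not in self.existing:
            raise RuntimeError(f"Table not found: {name}")


if __name__ == "__main__":
    cur = StubCursor({"p_strategy.my_first_dbt_model"})
    print(invalidate_metadata(cur, "p_strategy.my_first_dbt_model"))
    print(invalidate_metadata(cur, "p_strategy.not_created_yet"))
```

Returning a boolean instead of re-raising keeps the caller's drop, create, or rename sequence going when the object is simply absent, which is the behaviour the solution describes.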

Internal Jira ticket number or external issue link

DBT-350 (internal): https://jira.cloudera.com/browse/DBT-350

Testing procedure/screenshots(if appropriate):

With a valid EDH profile, run dbt debug, dbt run --full-refresh, and dbt run. Sample runs are as follows:

(dev-dbt-impala) ganesh.venkateshwara@ganesh dbtdemo % dbt debug
07:57:58  Running with dbt=1.1.2
dbt version: 1.1.2
python version: 3.9.12
python path: /Users/ganesh.venkateshwara/code/venv/dev/dev-dbt-impala/bin/python
os info: macOS-12.6-arm64-arm-64bit
Using profiles.yml file at /Users/ganesh.venkateshwara/.dbt/profiles.yml
Using dbt_project.yml file at /Users/ganesh.venkateshwara/code/dbt-examples/dbtdemo/dbt_project.yml

Configuration:
  profiles.yml file [OK found and valid]
  dbt_project.yml file [OK found and valid]

Required dependencies:
 - git [OK found]

Connection:
  host: westeros.edh.cloudera.com
  port: 21050
  schema: p_strategy
  username: None
  Connection test: [OK connection ok]

All checks passed!

(dev-dbt-impala) ganesh.venkateshwara@ganesh dbtdemo % dbt run --full-refresh 
07:08:42  Running with dbt=1.1.2
07:08:42  Found 3 models, 4 tests, 0 snapshots, 0 analyses, 187 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics
07:08:42
07:09:27  Concurrency: 1 threads (target='dev_impala_kerberos')
07:09:27
07:09:27  1 of 3 START table model p_strategy.my_first_dbt_model ......................... [RUN] 
07:10:04  1 of 3 OK created table model p_strategy.my_first_dbt_model .................... [OK in 36.10s] 
07:10:04  2 of 3 START incremental model p_strategy.my_incremental_model ................. [RUN] 
07:11:22  2 of 3 OK created incremental model p_strategy.my_incremental_model ............ [OK in 78.03s] 
07:11:22  3 of 3 START table model p_strategy.my_second_dbt_model ........................ [RUN] 
07:11:59  3 of 3 OK created table model p_strategy.my_second_dbt_model ................... [OK in 37.87s] 
07:12:02
07:12:02  Finished running 2 table models, 1 incremental model in 199.11s.
07:12:02
07:12:02  Completed successfully
07:12:02
07:12:02  Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3

(dev-dbt-impala) ganesh.venkateshwara@ganesh dbtdemo % dbt run 
07:12:40  Running with dbt=1.1.2
07:12:40  Found 3 models, 4 tests, 0 snapshots, 0 analyses, 187 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics
07:12:40
07:13:27  Concurrency: 1 threads (target='dev_impala_kerberos')
07:13:27
07:13:27  1 of 3 START table model p_strategy.my_first_dbt_model ......................... [RUN] 
07:14:02  1 of 3 OK created table model p_strategy.my_first_dbt_model .................... [OK in 34.96s] 
07:14:02  2 of 3 START incremental model p_strategy.my_incremental_model ................. [RUN] 
07:14:31  2 of 3 OK created incremental model p_strategy.my_incremental_model ............ [OK in 29.24s]
07:14:33  3 of 3 START table model p_strategy.my_second_dbt_model ........................ [RUN] 
07:15:13  3 of 3 OK created table model p_strategy.my_second_dbt_model ................... [OK in 39.58s] 
07:15:15
07:15:15  Finished running 2 table models, 1 incremental model in 154.85s. 
07:15:15
07:15:15  Completed successfully
07:15:15
07:15:15  Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3
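
The three-step check shown in the sample runs could be scripted along these lines. This is only a sketch under stated assumptions: it presumes dbt is on the PATH and that the script is started from the dbt project directory, and the names SMOKE_COMMANDS, run_smoke_checks, and the runner parameter are hypothetical, not part of this PR.

```python
import subprocess

# The three commands from the testing procedure, in the order they were run.
SMOKE_COMMANDS = [
    ["dbt", "debug"],
    ["dbt", "run", "--full-refresh"],
    ["dbt", "run"],
]


def run_smoke_checks(runner=subprocess.run):
    """Run each command in turn and stop at the first non-zero exit code.

    `runner` is injectable so the sequence can be exercised without dbt
    installed. Returns the list of commands that completed successfully.
    """
    passed = []
    for cmd in SMOKE_COMMANDS:
        result = runner(cmd)
        if result.returncode != 0:
            break
        passed.append(cmd)
    return passed


if __name__ == "__main__":
    completed = run_smoke_checks()
    print(f"{len(completed)} of {len(SMOKE_COMMANDS)} commands succeeded")
```

Running the plain dbt run after the full refresh matters here because the second run exercises the incremental path against tables that already exist, which is where stale metadata would show up.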

Checklist before requesting a review

  • [x] I have performed a self-review of my code.
  • [x] I have formatted my added/modified code to follow PEP 8.
  • [x] I have checked the Python linter's suggestions to make sure the code is of good quality.

tovganesh changed the title from "fix for dbt errors on EDH" to "fix for dbt errors in CDSW" on Oct 6, 2022
tovganesh marked this pull request as ready for review on October 6, 2022 15:43
tovganesh changed the title from "fix for dbt errors in CDSW" to "[DBT-350] fix for dbt errors in CDSW" on Oct 7, 2022
myloginid requested review from niteshy and removed the request for TapasSenapati, geethamurthy and sujitkp-blr on May 14, 2023 07:11