Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add quotes to field if numeric in safe_cast macro #892

Closed
wants to merge 4 commits into from

Conversation

graciegoheen
Copy link

@graciegoheen graciegoheen commented Jan 29, 2024

resolves #891

Problem

The safe_cast macro currently uses the try_cast function in snowflake, which has the known limitation:

Only works for string expressions.

This means that this does not work: select try_cast(1 as NUMBER);
Screenshot 2024-01-29 at 3 15 38 PM

Even though all of these do:

select try_cast(10000000000000000000000000000000000000 as NUMBER);
select try_cast('1' as NUMBER);
select try_cast(to_numeric(1) as NUMBER) as my_col;
select cast(1 as NUMBER); 

Other warehouses support safe_cast for all data types (select try_cast(1 as NUMBER); works in BigQuery and postgres). For safe_cast to be an effective cross-db macro, it should have consistent behavior regardless of the warehouse.

This PR extends safe_cast to allow for fields that are type numeric by wrapping the field in quotes if it is a number:
select try_cast(1 as NUMBER); -> select try_cast('1' as NUMBER);

Testing

I have tested this by adding this macro to my example unit testing project. With the updated macro this unit test:

unit_tests:
  - name: test_valid_email_address # this is the unique name of the test
    description: my favorite unit test
    model: dim_wizards # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: ref('stg_wizards')
        rows:
          - {WIZARD_ID: 1, EMAIL: [email protected],     EMAIL_TOP_LEVEL_DOMAIN: example.com}
          - {WIZARD_ID: 2, EMAIL: [email protected],     EMAIL_TOP_LEVEL_DOMAIN: unknown.com}
          - {WIZARD_ID: 3, EMAIL: badgmail.com,         EMAIL_TOP_LEVEL_DOMAIN: gmail.com}
          - {WIZARD_ID: 4, EMAIL: missingdot@gmailcom,  EMAIL_TOP_LEVEL_DOMAIN: gmail.com}
      - input: ref('top_level_email_domains')
        rows:
          - {TLD: example.com}
          - {TLD: gmail.com}
      - input: ref('stg_worlds')
        rows: []
    expect: # the expected output given the inputs above
      rows:
        - {WIZARD_ID: 1, IS_VALID_EMAIL_ADDRESS: true}
        - {WIZARD_ID: 2, IS_VALID_EMAIL_ADDRESS: false}
        - {WIZARD_ID: 3, IS_VALID_EMAIL_ADDRESS: false}
        - {WIZARD_ID: 4, IS_VALID_EMAIL_ADDRESS: false}

creates the expected SQL:

with dbt_internal_unit_test_actual AS (
  select
    WIZARD_ID,IS_VALID_EMAIL_ADDRESS, 'actual' as actual_or_expected
  from (
    with  __dbt__cte__stg_wizards as (

-- Fixture for stg_wizards
select 
    
    try_cast('1' as NUMBER)
 AS WIZARD_ID, try_cast(null as VARCHAR) AS WIZARD_NAME, 
    
    try_cast('[email protected]' as VARCHAR)
 AS EMAIL, 
    
    try_cast('example.com' as VARCHAR)
 AS EMAIL_TOP_LEVEL_DOMAIN, try_cast(null as VARCHAR) AS PHONE_NUMBER, try_cast(null as NUMBER) AS WORLD_ID
union all
select 
    
    try_cast('2' as NUMBER)
 AS WIZARD_ID, try_cast(null as VARCHAR) AS WIZARD_NAME, 
    
    try_cast('[email protected]' as VARCHAR)
 AS EMAIL, 
    
    try_cast('unknown.com' as VARCHAR)
 AS EMAIL_TOP_LEVEL_DOMAIN, try_cast(null as VARCHAR) AS PHONE_NUMBER, try_cast(null as NUMBER) AS WORLD_ID
union all
select 
    
    try_cast('3' as NUMBER)
 AS WIZARD_ID, try_cast(null as VARCHAR) AS WIZARD_NAME, 
    
    try_cast('badgmail.com' as VARCHAR)
 AS EMAIL, 
    
    try_cast('gmail.com' as VARCHAR)
 AS EMAIL_TOP_LEVEL_DOMAIN, try_cast(null as VARCHAR) AS PHONE_NUMBER, try_cast(null as NUMBER) AS WORLD_ID
union all
select 
    
    try_cast('4' as NUMBER)
 AS WIZARD_ID, try_cast(null as VARCHAR) AS WIZARD_NAME, 
    
    try_cast('missingdot@gmailcom' as VARCHAR)
 AS EMAIL, 
    
    try_cast('gmail.com' as VARCHAR)
 AS EMAIL_TOP_LEVEL_DOMAIN, try_cast(null as VARCHAR) AS PHONE_NUMBER, try_cast(null as NUMBER) AS WORLD_ID
),  __dbt__cte__stg_worlds as (

-- Fixture for stg_worlds
select try_cast(null as NUMBER) AS WORLD_ID, try_cast(null as VARCHAR) AS WORLD_NAME
    limit 0
),  __dbt__cte__top_level_email_domains as (

-- Fixture for top_level_email_domains
select 
    
    try_cast('example.com' as VARCHAR)
 AS TLD
union all
select 
    
    try_cast('gmail.com' as VARCHAR)
 AS TLD
), wizards as (

    select * from __dbt__cte__stg_wizards

),

worlds as (

    select * from __dbt__cte__stg_worlds

),

accepted_email_domains as (

    select * from __dbt__cte__top_level_email_domains

),

check_valid_emails as (

    select  
        wizards.wizard_id,
        wizards.wizard_name,
        wizards.email,
        wizards.phone_number,
        wizards.world_id,

		coalesce (regexp_like(
            wizards.email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
        )
        = true
        and accepted_email_domains.tld is not null,
        false) as is_valid_email_address

    from wizards
    left join accepted_email_domains
        on wizards.email_top_level_domain = lower(accepted_email_domains.tld)

)

select
    check_valid_emails.wizard_id,
    check_valid_emails.wizard_name,
    check_valid_emails.email,
    check_valid_emails.is_valid_email_address,
    check_valid_emails.phone_number,
    worlds.world_name
from check_valid_emails
left join worlds
    on check_valid_emails.world_id = worlds.world_id
  ) _dbt_internal_unit_test_actual
),
-- Build expected result
dbt_internal_unit_test_expected AS (
  select
    WIZARD_ID, IS_VALID_EMAIL_ADDRESS, 'expected' as actual_or_expected
  from (
    select 
    
    try_cast('1' as NUMBER)
 AS WIZARD_ID, 
    
    try_cast('True' as BOOLEAN)
 AS IS_VALID_EMAIL_ADDRESS
union all
select 
    
    try_cast('2' as NUMBER)
 AS WIZARD_ID, 
    
    try_cast('False' as BOOLEAN)
 AS IS_VALID_EMAIL_ADDRESS
union all
select 
    
    try_cast('3' as NUMBER)
 AS WIZARD_ID, 
    
    try_cast('False' as BOOLEAN)
 AS IS_VALID_EMAIL_ADDRESS
union all
select 
    
    try_cast('4' as NUMBER)
 AS WIZARD_ID, 
    
    try_cast('False' as BOOLEAN)
 AS IS_VALID_EMAIL_ADDRESS
  ) _dbt_internal_unit_test_expected
)
-- Union actual and expected results
select * from dbt_internal_unit_test_actual
union all
select * from dbt_internal_unit_test_expected

Note: Ideally, this should be resolved by snowflake accepting select try_cast(1 as NUMBER);, but this is a work-around in the meantime.

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

Note: I would like to add an integration test for this change in functionality, but don't know the best place for that.

@graciegoheen graciegoheen requested a review from a team as a code owner January 29, 2024 22:20
@cla-bot cla-bot bot added the cla:yes label Jan 29, 2024
Copy link
Contributor

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-snowflake contributing guide.

@graciegoheen
Copy link
Author

@graciegoheen graciegoheen requested a review from jtcohen6 January 29, 2024 22:35
@graciegoheen
Copy link
Author

@MichelleArk should i close this in favor of your fix?

@MichelleArk
Copy link
Contributor

Yes! This change was pulled into #896 here.

@mikealfare mikealfare deleted the fix/try_cast_for_numbers branch July 17, 2024 23:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

don't require double quotes around numbers in unit testing yml
3 participants