Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] duplicates in effectivity satelites #231

Open
pdaszczynski opened this issue Apr 24, 2024 · 4 comments
Open

[BUG] duplicates in effectivity satelites #231

pdaszczynski opened this issue Apr 24, 2024 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@pdaszczynski
Copy link

pdaszczynski commented Apr 24, 2024

Describe the bug
When for two last records in stage for one src_dfk has not been changed src_sfk and between previous and current data processing for this particular src_dfk nothing changed eff_sat always insert one additional record to table.

Environment

dbt version: 1.7.8
automate_dv version: 0.10.2
Database/Platform: postgres

To Reproduce
Steps to reproduce the behavior:

  1. Insert to stage like below, I have deleted *hk columns to not do the mess here
updated_date organisation_id customer_id start_date end_date effective_from load_date
2024-04-12 7 9491 2024-04-12 2024-04-19 2024-04-12 2024-04-24 08:18:21.882583 +00:00
2024-04-19 5 9491 2024-04-19 2024-04-21 2024-04-19 2024-04-24 08:18:21.882583 +00:00
2024-04-21 5 9491 2024-04-21 9999-12-31 2024-04-21 2024-04-24 08:18:21.882583 +00:00
  1. create effectivity satelite
{{ config(materialized='incremental')  }}
{%- set source_model = "stage_organisation_customer" -%}
{%- set src_pk = "CUSTOMER_ORGANISATION_HK" -%}
{%- set src_dfk = "CUSTOMER_HK"       -%}
{%- set src_sfk = "ORGANISATION_HK"         -%}
{%- set src_start_date = "START_DATE" -%}
{%- set src_end_date =   "END_DATE"     -%}
{%- set src_eff = "EFFECTIVE_FROM"    -%}
{%- set src_ldts = "LOAD_DATE"    -%}
{%- set src_source = "RECORD_SOURCE"  -%}
{{ automate_dv.eff_sat(src_pk=src_pk, src_dfk=src_dfk, src_sfk=src_sfk,
                       src_start_date=src_start_date,
                       src_end_date=src_end_date,
                       src_eff=src_eff, src_ldts=src_ldts,
                       src_source=src_source,
                       source_model=source_model) }}
  1. run dbt couple of times
  2. See result in effectivity satelite
customer_hk organisation_hk start_date end_date load_date
003dd617-c12d-444f-f9c8-0f717c3fa982 8f14e45f-ceea-167a-5a36-dedd4bea2543 2024-04-12 2024-04-19 2024-04-24 07:50:07.914253 +00:00
003dd617-c12d-444f-f9c8-0f717c3fa982 e4da3b7f-bbce-2345-d777-2b0674a318d5 2024-04-19 2024-04-21 2024-04-24 07:50:07.914253 +00:00
003dd617-c12d-444f-f9c8-0f717c3fa982 e4da3b7f-bbce-2345-d777-2b0674a318d5 2024-04-21 9999-12-31 2024-04-24 07:50:07.914253 +00:00
003dd617-c12d-444f-f9c8-0f717c3fa982 e4da3b7f-bbce-2345-d777-2b0674a318d5 2024-04-21 9999-12-31 2024-04-24 08:02:21.036765 +00:00
003dd617-c12d-444f-f9c8-0f717c3fa982 e4da3b7f-bbce-2345-d777-2b0674a318d5 2024-04-19 2024-04-21 2024-04-24 08:02:28.049750 +00:00
003dd617-c12d-444f-f9c8-0f717c3fa982 e4da3b7f-bbce-2345-d777-2b0674a318d5 2024-04-21 9999-12-31 2024-04-24 08:02:37.983116 +00:00

Expected behavior
Records like below should not been added all the time when I run dbt

customer_hk organisation_hk start_date end_date load_date
003dd617-c12d-444f-f9c8-0f717c3fa982 e4da3b7f-bbce-2345-d777-2b0674a318d5 2024-04-21 9999-12-31 2024-04-24 08:02:21.036765 +00:00
003dd617-c12d-444f-f9c8-0f717c3fa982 e4da3b7f-bbce-2345-d777-2b0674a318d5 2024-04-19 2024-04-21 2024-04-24 08:02:28.049750 +00:00
003dd617-c12d-444f-f9c8-0f717c3fa982 e4da3b7f-bbce-2345-d777-2b0674a318d5 2024-04-21 9999-12-31 2024-04-24 08:02:37.983116 +00:00

Additional context
Not sure but I think this is because there is also same load_date for records and in final query partition by and order in this shape is not enough

latest_records AS (
    SELECT * FROM (
        SELECT b.CUSTOMER_ORGANISATION_HK, b.CUSTOMER_HK, b.ORGANISATION_HK, b.START_DATE, b.END_DATE, b.EFFECTIVE_FROM, b.LOAD_DATE, b.RECORD_SOURCE,
               ROW_NUMBER() OVER (
                    PARTITION BY b.CUSTOMER_ORGANISATION_HK
                    ORDER BY b.LOAD_DATE DESC
               ) AS row_num
        FROM "dbt_dv"."dv"."eff_sat_customer_organisation" AS b
        where customer_hk in (select customer_hk from stage.stage_organisation_customer where customer_id in (9491))
    ) AS inner_rank
    WHERE row_num = 1
)

AB#5346

@pdaszczynski pdaszczynski added the bug Something isn't working label Apr 24, 2024
@DVAlexHiggs
Copy link
Member

DVAlexHiggs commented Apr 24, 2024

Hello!

Thanks for this report. If possible can you please edit your post and format your data as tables and code as code snippets using markdown? This would make it easier to read and understand what the issue is.

If not we can edit your post for you however this will delay the process.

Thanks!

@pdaszczynski
Copy link
Author

Hi Alex,
I have updated as you requested.

@DVAlexHiggs
Copy link
Member

DVAlexHiggs commented Apr 24, 2024

Thank you.
We believe this could be related to #221 but will need to investigate it further as this should have only affected standard satellites.

@ImBharathwaj
Copy link

Hi @DVAlexHiggs
We are also facing the same issue while running dbt models in incremental manner. This issue persists even after upgrading automate_dv to 0.10.2. When can we expect the fix for this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants