Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

needs to use {{this}} format for incremental #3341

Closed
wants to merge 18 commits into from

Conversation

andrewhong5297
Copy link
Collaborator

@andrewhong5297 andrewhong5297 commented May 14, 2023

Brief comments on the purpose of your changes:

We were getting duplicates with this always running with one day of history, I've changed it to only run if the block_time is in the next day.

For Dune Engine V2

I've checked that:

General checks:

  • I tested the query on dune.com after compiling the model with dbt compile (compiled queries are written to the target directory)
  • I used "refs" to reference other models in this repo and "sources" to reference raw or decoded tables
  • if adding a new model, I added a test
  • the filename is unique and ends with .sql
  • each sql file is a select statement and has only one view, table or function defined
  • column names are lowercase_snake_cased
  • if adding a new model, I edited the dbt project YAML file with new directory path for both models and seeds (if applicable)
  • if wanting to expose a model in the UI (Dune data explorer), I added a post-hook in the JINJA config to add metadata (blockchains, sector/project, name and contributor Dune usernames)

Pricing checks:

  • coin_id represents the ID of the coin on coinpaprika.com
  • all the coins are active on coinpaprika.com (please remove inactive ones)

Join logic:

  • if joining to base table (i.e. ethereum transactions or traces), I looked to make it an inner join if possible

Incremental logic:

  • I used is_incremental & not is_incremental jinja block filters on both base tables and decoded tables
    • where block_time >= date_trunc("day", now() - interval '1 week')
  • if joining to base table (i.e. ethereum transactions or traces), I applied join condition where block_time >= date_trunc("day", now() - interval '1 week')
  • if joining to prices view, I applied join condition where minute >= date_trunc("day", now() - interval '1 week')

@andrewhong5297
Copy link
Collaborator Author

Expected a schema version of "https://schemas.getdbt.com/dbt/manifest/v6.json" in manifest.json, but found "https://schemas.getdbt.com/dbt/manifest/v8.json". Are you running with a different version of dbt?

hmm, seems like something is wrong with config? cc @jeff-dude

Copy link
Collaborator

@Hosuke Hosuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{{this}} provides a more precise incremental updating option.
Original idea is having an overlapping region when incremental update, to ensure 100% coverage of new incoming records, and dbt uses unique_key in the config block to filter out the duplicated/existed records when updating.

@Hosuke Hosuke added the dune team created by dune team label May 15, 2023
@@ -24,7 +24,7 @@ WITH
FROM {{ source('solana','account_activity') }}
WHERE tx_success
{% if is_incremental() %}
AND block_time >= date_trunc("day", now() - interval '1 day')
AND block_time > (SELECT max(day) FROM {{this}})
Copy link
Collaborator

@0xRobin 0xRobin May 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably use >= here to not drop later incremental updates from the same day?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since day is derived from date_trunc(...), it represents the lower bound of the time, I think > will include all events in that date after UTC 00:00.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to update on same day, only next day. To prevent duplicates.

There is the edge case of missing an address if their balance changes later that day but not the next day.

But ignoring that case for now!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah @Hosuke you're right! I was confused but indeed because we compare block_time with the day timestamp this works and there's no real difference between > and >=. 👍

@0xRobin
Copy link
Collaborator

0xRobin commented May 15, 2023

CI issues should be resolved now, I've triggered a re-run

@jeff-dude
Copy link
Member

can you confirm the failing test on duplicates?

@jeff-dude jeff-dude added the WIP work in progress label May 16, 2023
@andrewhong5297
Copy link
Collaborator Author

can you confirm the failing test on duplicates?

Looking

@andrewhong5297
Copy link
Collaborator Author

the test schema is gone, but I really can't see any reason there would be duplicates given we're using a window function.

@andrewhong5297
Copy link
Collaborator Author

@jeff-dude I'm like 100% confident there should be no duplicates - can we merge?

@jeff-dude
Copy link
Member

@jeff-dude I'm like 100% confident there should be no duplicates - can we merge?

if there are dupes here, there'll be dupes in prod. have you tried compiling the model, then placing in CTE and running group by/having > 1 on the unique fields? you can likely start with a subset of data for faster results, then open up more if nothing returns

@Hosuke
Copy link
Collaborator

Hosuke commented May 20, 2023

I think I know the cause of the bug,
there is a tiny time diff in updated_at(1st time in full-refresh, 2nd time in incremental refresh).
Perhaps the engine is hard to compare NULL value for token_mint_address.
Here is an example:
https://dune.com/queries/2493263?d=1

@Hosuke
Copy link
Collaborator

Hosuke commented May 20, 2023

I tried to fix the problem by giving a default empty string '' to NULL token_mint_address, and currently empty string is not equal to NULL in some versions of spark.

I actually found out this problem is an opening issue in dbt: dbt-labs/dbt-adapters#159

@andrewhong5297
Copy link
Collaborator Author

hmm odd. I've run a group by check and got no nulls https://dune.com/queries/2479876?d=1&q=v2+&sidebar=query-explorer

@Hosuke Hosuke added ready-for-final-review and removed WIP work in progress labels May 20, 2023
@andrewhong5297
Copy link
Collaborator Author

thank you @Hosuke !!

@jeff-dude
Copy link
Member

I tried to fix the problem by giving a default empty string '' to NULL token_mint_address, and currently empty string is not equal to NULL in some versions of spark.

I actually found out this problem is an opening issue in dbt: dbt-labs/dbt-adapters#159

great catch! we've seen this elsewhere before too. it's a good thing to keep in mind, that unique keys can't be null or the merge statement fails the join condition, leading to duplicate inserts. i wonder if this means we need to apply not null tests to each unique key column 🤔

@jeff-dude
Copy link
Member

jeff-dude commented May 24, 2023

i'm trying to figure out why the ci test action is picking up previously merged PR spells still. we've had the successful run in prod that should in theory make sure those don't run anymore. i've raised a ticket internally to investigate

we're getting timeouts due to all of that running at the same time

@jeff-dude
Copy link
Member

seems like we're getting failures where not null tests were applied (assuming the unique keys)

@jeff-dude jeff-dude added WIP work in progress in review Assignee is currently reviewing the PR and removed ready-for-final-review labels May 26, 2023
@andrewhong5297
Copy link
Collaborator Author

seems like we're getting failures where not null tests were applied (assuming the unique keys)

Oh token_mint_address and token_balance can be null, those tests should he removed. Just address and balance should have not null tests.

@jeff-dude
Copy link
Member

what's the move here?

@jeff-dude
Copy link
Member

jeff-dude commented Jun 20, 2023

closing PRs marked as wip state, prior to the freeze announced and tracked here, we can reopen and reformat post-freeze

@jeff-dude jeff-dude closed this Jun 20, 2023
@Hosuke Hosuke deleted the solana_daily_balances_fix branch April 25, 2024 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dune team created by dune team in review Assignee is currently reviewing the PR WIP work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants