Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate existing resources #25

Closed
wants to merge 3 commits into from
Closed

Conversation

BeadW
Copy link
Contributor

@BeadW BeadW commented Dec 6, 2022

It is common to have definitions of resources in yaml files that don't match the opinionated nature of this tool. It will now handle migration. Goes somewhat to #11

…t match the opinionated nature of this tool. It will now handle migration.
@BeadW
Copy link
Contributor Author

BeadW commented Dec 6, 2022

I am pondering how to add some tests to this. Any guidance would be appreciated.

@BeadW
Copy link
Contributor Author

BeadW commented Dec 6, 2022

Something else of note. Currently this breaks if more than one thread is used. I expect this is due to the handling of the existing yaml files when they have more than one definition and multiple threads start operating on them. I need to give some consideration to the best path forward on that.

@robastel
Copy link
Member

robastel commented Dec 8, 2022

@BeadW Thank you for submitting this PR! I appreciate you doing that. I want to let you know that I see it, and that I intend to find the time to review it in the coming days. I will also provide guidance on testing and will give some thought to your comment about multiple threads.

…threw an error. Added a check that the columns key was present before trying to use it
@robastel
Copy link
Member

robastel commented Jan 3, 2023

Hello @BeadW, I apologize for the delay. I have had some time to look through the proposed changes and to think about your comments regarding multiple threads and testing.

To provide some background, which you may have already discovered yourself, the motivation for using multiple threads comes from the _get_columns function in properties.py. This function, which is run every time a property file is created/updated, triggers a dbt run-operation command that queries the data warehouse to retrieve the list of columns for the associated resource.

If we could perform the migration without running _get_columns, I wouldn't care about supporting threads.

What are your thoughts on the following setup?:

  • We create a new dbt-invoke properties.migrate command
    • This command simply moves the relevant chunks of yaml from an existing multiple-resource-property-file to new single-resource-property-files.
    • This command does not do any updating of the new single-resource-property-files.
      • Because of this, there is no need to worry about threads.
      • I also think it would be nice to have the migration logic separate in terms of future ease of maintenance.
  • Once the migration is complete, dbt-invoke properties.update (a.k.a. dbt-invoke properties) can be used as normal to perform any necessary updates to the new single-resource-property-files.

In terms of testing, we could include a multiple-resource-property-file in tests/test_property_files, use it to perform a migration, and check that the resulting post-migration files match our expectations.

I've put together PR #26 which incorporates these ideas. I would love to have your feedback.

@BeadW
Copy link
Contributor Author

BeadW commented Jan 3, 2023

Hi @robastel thanks for putting in all this effort. Really appreciate it.

I think your logic here is pretty solid. I like the idea of having the migrate command be separate and can understand how that removes the bottleneck.

The method of testing is in line with what I was thinking after I gave it some more thought so that is great.

I think the implementation makes a couple of assumptions that might reduce the usefulness and that I believe were addressed by my implementation.

PR #26 works (so far as I can tell) on the assumption that because there are two schools of thought people have adopted one or the other. One thing I found within our own repos (we have a lot of dbt happening) is that sometimes people have just made it up as they go along. Stuff like one yaml file per directory, one yaml file per set of similar models etc.

Personally I think it is worth supporting the idea that "yaml might be in any state but we should be able to migrate to one yaml per model". I believe my implementation did that (I might be missing something though).

I also think this opens up a future state where we can run dbt-invoke properties and have it handle more complex or differing structures of there is value in that.

Do you think there is some hybrid of the two approaches we could achieve where perhaps we aim for the separation like you have achieved and support more complex use cases as I believe I did?

Kind Regards
Brad

@robastel
Copy link
Member

robastel commented Jan 5, 2023

Brad, thank you for the thoughtful feedback!

That is a great point about my assumption that projects would not mix the two schools of thought or even take a totally different approach. I noticed that dbt Labs themselves now fall somewhere in between, using one yaml per folder (https://docs.getdbt.com/guides/best-practices/how-we-structure/5-the-rest-of-the-project#yaml-in-depth). I think you have made an excellent suggestion to try to find a hybrid of our two approaches that would both separate the migration logic and have the ability to handle more complex structures.

My first thought is to take a look into how dbt itself locates and parses property files. Maybe we can import those functions and use them directly. What do you think? I'm open to any other ideas too!

:return: Str or None
"""
node_unique_id = resource_dict['unique_id']
patch_path = _read_manifest(ctx.config['target_path'])['nodes'][node_unique_id]['patch_path']
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robastel this section of the manifest is the result of dbt having discovered the matching yaml for that node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is created when the dbt compile takes place from memory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, great, I forgot you already did that!

Logistically, how would you like to go about working on our hybrid approach?

I'm happy to take an attempt at combining our two approaches. Also happy to review if you feel like you want to take a shot at it. I'm going to sign off for today, but I'll keep in touch!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have finished work for the week now. Happy to take a stab at this Monday if you haven’t already done it. 🙂

@BeadW
Copy link
Contributor Author

BeadW commented Jan 10, 2023

Hi @robastel I have done some work to bring the two approaches together. Testing locally with the sqllite db seemed to work fine. If you create a random set of model resource definitions in different files with strange names it migrates them properly. I am off work the rest of this week but happy to discuss further if required when I return. (one error I encountered was a cyclic dependency error when doing a dbt run for the orders object) That doesn't seem related to this change though it seems like the definition of that view is referring to itself. I put the changes against your branch so it was easier to review.

@robastel
Copy link
Member

Closing this PR -- thanks @BeadW for your work here which you combined with my approach in #27, which was subsequently merged to the main branch in #26 (and has been released as part of dbt-invoke 0.2.1).

@robastel robastel closed this Feb 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants