Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-97] [Bug] dbt cli with an empty command takes more than a second to run #4627

Closed
1 task done
ajmarcus opened this issue Jan 26, 2022 · 3 comments
Closed
1 task done
Assignees
Labels
bug Something isn't working performance

Comments

@ajmarcus
Copy link

ajmarcus commented Jan 26, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When I run the dbt command in my terminal without additional arguments, it takes more than a second to print the help text:

% time dbt
...output...
dbt  1.67s user 0.13s system 98% cpu 1.828 total

Looks like this is a related issue with percieved run performance for specific commands:
#4625

Expected Behavior

Basic operations with the dbt cli would take less than one second to run.

Steps To Reproduce

Install dbt:

pip install dbt-core

Run dbt:

% time dbt
...output...
dbt  1.67s user 0.13s system 98% cpu 1.828 total

Relevant log output

This could deflinitely be an issue with my machine or setup. To see if I could find more information I used py-spy to generate a flamegraph using these commands:

git clone '[email protected]:dbt-labs/dbt-core.git'
cd dbt-core
pip install py-spy
sudo py-spy record -o profile_cli_empty.svg -- python core/scripts/dbt

There is nothing conclusive, but looks like there is a lot of time spent with calls to importlib:

https://gist.githubusercontent.com/ajmarcus/b077f4c8a94582df74c39258325addf5/raw/9af82bb55f5b322fb10bc0a959c20b175e689c3c/profile_cli_empty.svg

Which reminded me that Python eagerly loads imported dependencies. importlib does provide a lazy import loader but it comes with this warning:

https://python.readthedocs.io/en/latest/library/importlib.html#importlib.util.LazyLoader

For projects where startup time is critical, this class allows for potentially minimizing the cost of loading a module if it is never used. For projects where startup time is not essential then use of this class is heavily discouraged due to error messages created during loading being postponed and thus occurring out of context.

Environment

- OS: Mac 12.0.1
- Python: 3.7.8
- dbt: 1.0.1

What database are you using dbt with?

bigquery

Additional Context

profile_cli_empty

@ajmarcus ajmarcus added bug Something isn't working triage labels Jan 26, 2022
@github-actions github-actions bot changed the title [Bug] dbt cli with an empty command takes more than a second to run [CT-97] [Bug] dbt cli with an empty command takes more than a second to run Jan 26, 2022
@ajmarcus
Copy link
Author

Haven't worked with flamegraphs before so took me some time to understand this one. I think a good amount of the runtime is importing the classes from dbt.task that are used to set_defaults for the subparsers:

https://github.com/dbt-labs/dbt-core/blob/v1.0.1/core/dbt/main.py#L18-L33
https://github.com/dbt-labs/dbt-core/blob/v1.0.1/core/dbt/main.py#L358

Since the defaults are only used if a subcommand is being called, was thinking we could improve performance by only importing the relevant dbt.task. For example:

  1. If the user uses the command dbt build we will only import dbt.task.build as build_task
  2. If the user uses the command dbt clean we will only import dbt.task.clean as clean_task
  3. If the user uses the command dbt without a subcommand we will not import any classes from dbt.task

If that sounds like a good approach, I'd be happy to create an example PR. As with any performance optimizations, I'm sure there will be tradeoffs to my approach so no worries if the code is never merged. Curious to learn more about dbt's internals. :)

@nathaniel-may
Copy link
Contributor

Thanks for the detailed analysis @ajmarcus! Anecdotally, I definitely notice this myself when I call dbt --version and I think you're right that we can probably benefit from a quick fix. Thankfully we're planning on making larger changes to the structure of our CLI to make it more reusable which may also solve these performance issues outlined here. Thank you for offering to put up a PR but I think in this case I think we should save the effort since it is likely to be addressed soon anyway. You can follow that work once it starts in earnest on two these tickets:
#4179
#2013

Because of this, I'm going to close this ticket but explicitly mention it in the other tickets as additional context for why this work is important. Thanks so much for providing all the details here.

@ajmarcus
Copy link
Author

Sounds great and makes sense!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working performance
Projects
None yet
Development

No branches or pull requests

3 participants