
Add lru_cache to module_available #8716

Closed

Conversation

@eivindjahren (Contributor) commented Feb 6, 2024

Our application creates many small netcdf3 files: https://github.com/equinor/ert/blob/9c2b60099a54eeb5bb40013acef721e30558a86c/src/ert/storage/local_ensemble.py#L593 .

A significant amount of time in xarray.backends.common.py:AbstractWriteableDataStore.set_variables is spent in common.py:is_dask_collection, because it checks for the presence of the dask module, which takes about 0.3 ms.

This adds up when writing many small files. This PR uses lru_cache to avoid rechecking for the presence of dask, since its availability should not change over the lifetime of the application.
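
For reference, a minimal sketch of the caching approach (the exact name and location of the helper inside xarray may differ between versions):

```python
# Minimal sketch of the cached availability check; the real helper in
# xarray may have a different signature or live in a different module.
from functools import lru_cache
from importlib.util import find_spec


@lru_cache(maxsize=None)
def module_available(module: str) -> bool:
    """Return whether ``module`` can be imported.

    The result is cached because availability is not expected to change
    over the lifetime of the application.
    """
    return find_spec(module) is not None
```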

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst


welcome bot commented Feb 6, 2024

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

@max-sixty added the plan to merge (Final call for comments) label on Feb 6, 2024
@max-sixty (Collaborator)

Seems very reasonable, any objections from anyone? Otherwise will merge

@headtr1ck (Collaborator) left a comment

I would have assumed that module lookup is super fast but never tested it.
Thanks, looks like a good addition!

@mathause (Collaborator) commented Feb 7, 2024

This will change the behavior when a module is installed while python is running (not a blocker, just something to be aware of).

@eivindjahren (Contributor, Author)

> This will change the behavior when a module is installed while python is running (not a blocker, just something to be aware of).

I just checked, and installing a package while Python is running both stops import statements from raising ImportError and changes the return value of find_spec.
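
For illustration, a rough sketch of that check ("some_package" is a hypothetical placeholder):

```python
# Rough sketch of the behaviour described above; "some_package" is a
# hypothetical placeholder, not a real dependency.
from importlib.util import find_spec

# Before installation: no spec is found.
print(find_spec("some_package") is None)  # True

# After `pip install some_package` runs in another shell, a fresh call
# picks the package up, whereas an lru_cache'd wrapper would keep
# returning the cached (stale) result.
print(find_spec("some_package") is None)  # False
```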

@eivindjahren (Contributor, Author)

> I would have assumed that module lookup is super fast but never tested it. Thanks, looks like a good addition!

It isn't unreasonably slow, but in one instance we called api.py:1219(to_netcdf) 13634 times, taking 75 seconds, and 38 of those seconds were spent in find_spec. Let me get some more numbers for you.
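
For context, numbers like these can be gathered with a cProfile sketch along these lines (write_many_files is a hypothetical stand-in for the application code that calls to_netcdf in a loop):

```python
# Hedged profiling sketch; write_many_files() is a hypothetical
# stand-in for application code that calls Dataset.to_netcdf many times.
import cProfile
import pstats

cProfile.run("write_many_files()", "to_netcdf.prof")
stats = pstats.Stats("to_netcdf.prof")
# Restrict the report to the entries of interest (regex filter).
stats.sort_stats("cumulative").print_stats("to_netcdf|find_spec")
```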

@headtr1ck (Collaborator)

I don't think we should care about someone installing packages while python is running.
The same argument could be made for someone uninstalling packages? I would assume lots of code would break in such a case...

@eivindjahren (Contributor, Author) commented Feb 7, 2024

I apologize for the confusion, but I added the cache to the wrong module_available. I have created a new PR that adds the cache in the correct place: #8717 (though the two functions do the same thing).

@eivindjahren (Contributor, Author)

@headtr1ck I added some more statistics about what is going on in the other PR.

@eivindjahren (Contributor, Author)

Apologies for the confusion: I made an error while preparing this PR and did not cache the correct module_available function. Please see #8717 for the correct commit.

Labels
plan to merge Final call for comments
4 participants