Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip slow sampler tests #187

Merged
merged 1 commit into from
Oct 11, 2021
Merged

Skip slow sampler tests #187

merged 1 commit into from
Oct 11, 2021

Conversation

adamjstewart
Copy link
Collaborator

The parallel data loader sampler tests are extremely slow on macOS/Windows. See #102 for details. This PR skips them for normal test usage, but keeps them for longer integration testing. Even without these tests we still have 100% test coverage for the samplers.

Before

$ time pytest tests/samplers
...
real	7m30.759s
user	2m18.260s
sys	0m52.247s

After

$ time pytest tests/samplers
...
real	0m4.402s
user	0m3.223s
sys	0m1.057s

@adamjstewart adamjstewart added testing Continuous integration testing samplers Samplers for indexing datasets labels Oct 9, 2021
@calebrob6
Copy link
Member

If I understand this correctly, our dataloaders are extremely slow on Windows and macOS?

If so, this seems like a really big issue that would make torchgeo practically worthless in these cases. Am I missing something?

@adamjstewart
Copy link
Collaborator Author

See https://discuss.pytorch.org/t/data-loader-multiprocessing-slow-on-macos/131204, this is an issue for all of PyTorch on macOS/Windows, it has nothing to do with TorchGeo.

Although our DataLoader are also useless on macOS/Windows, see #184. That's something we can do something about, I don't think we can do anything about the speed issues. There's a lot of overhead if you have to pickle the entire environment.

@calebrob6
Copy link
Member

https://discuss.pytorch.org/t/data-loader-multiprocessing-slow-on-macos/131204 this has no responses. How do people use DataLoaders at all on Windows/macOS then?

@adamjstewart
Copy link
Collaborator Author

I think it's just a 10-15 sec overhead per epoch, so it's not that bad, we just get hit particularly hard in our unit tests because we run it many times.

@calebrob6
Copy link
Member

Ah okay, that makes a lot more sense, thanks for explaining!

Would parameterizing the tests with num_workers 0 and 1 be sufficient, and, if so, would that speed the tests up to something more acceptable?

@adamjstewart
Copy link
Collaborator Author

1 is almost as bad as 2, anything in parallel incurs the same startup overhead.

@calebrob6
Copy link
Member

Only testing 1 instead of both 1 and 2 would cut times by over 50% (from the PyTorch discussion link it looks like "1" is 9 seconds and "2" is 14 seconds) -- do we need to test both?

Copy link
Member

@calebrob6 calebrob6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't need to test both, then removing "2" makes sense to me. If there's another reason why we should keep it around, or we do need to test both, then this is also fine.

@adamjstewart
Copy link
Collaborator Author

We don't necessarily need to test 2, I think it's functionally very similar to 1. Either way, we'll still want to skip these tests by default and only run them as integration tests. The tests only take a few seconds on Linux, the only issue is on macOS/Windows. I think we're only going to run the integration tests on Linux anyway.

@adamjstewart adamjstewart merged commit 9631a8b into main Oct 11, 2021
@adamjstewart adamjstewart deleted the tests/samplers branch October 11, 2021 15:11
@adamjstewart adamjstewart added this to the 0.1.0 milestone Nov 20, 2021
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
samplers Samplers for indexing datasets testing Continuous integration testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants