When trying to benchmark the data loader, if I run with num_workers > 0, it crashes with the following error message:
Traceback (most recent call last):
File "benchmark.py", line 299, in <module>
main(args)
File "benchmark.py", line 181, in main
for i, batch in enumerate(dataloader):
File "/Users/Adam/.spack/.spack-env/view/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/Users/Adam/.spack/.spack-env/view/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/Users/Adam/.spack/.spack-env/view/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/Users/Adam/.spack/.spack-env/view/lib/python3.8/site-packages/torch/_utils.py", line 425, in reraise
raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/Users/Adam/.spack/.spack-env/view/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/Users/Adam/.spack/.spack-env/view/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/Users/Adam/.spack/.spack-env/view/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/Users/Adam/torchgeo/torchgeo/datasets/geo.py", line 719, in __getitem__
if not query.intersects(self.bounds):
File "/Users/Adam/torchgeo/torchgeo/datasets/geo.py", line 760, in bounds
minx = max([ds.bounds[0] for ds in self.datasets])
File "/Users/Adam/torchgeo/torchgeo/datasets/geo.py", line 760, in <listcomp>
minx = max([ds.bounds[0] for ds in self.datasets])
File "/Users/Adam/torchgeo/torchgeo/datasets/geo.py", line 135, in bounds
return BoundingBox(*self.index.bounds)
File "/Users/Adam/torchgeo/torchgeo/datasets/utils.py", line 213, in __new__
raise ValueError(f"Bounding box is invalid: 'minx={minx}' > 'maxx={maxx}'")
ValueError: Bounding box is invalid: 'minx=1.7976931348623157e+308' > 'maxx=-1.7976931348623157e+308'
Since this doesn't occur in serial or on Linux, I'm guessing this has something to do with the fact that Python's multiprocessing module switched from fork to spawn as the default start method on macOS for Python 3.8+.
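The bounds in the error message equal `sys.float_info.max` and its negation, which look like the sentinel values of an index with nothing in it. That would fit the fork vs. spawn guess: with fork a worker inherits the parent's memory, while with spawn the dataset is pickled and rebuilt in the worker. The sketch below illustrates that mechanism with the standard library only. It is not torchgeo code: `ToyIndex` and `send_to_worker` are hypothetical names, and the assumption that the real index loses its contents when pickled is unverified.

```python
import multiprocessing as mp
import pickle
import sys


class ToyIndex:
    """Hypothetical stand-in for a spatial index that does not survive pickling."""

    def __init__(self):
        self.items = []

    def insert(self, value):
        self.items.append(value)

    def __getstate__(self):
        # Assumption: mimic a wrapper around a native handle whose contents
        # are not serialized, so the pickled copy starts out empty.
        return {"items": []}

    @property
    def bounds(self):
        # An empty index reports inverted sentinel bounds (min > max).
        if not self.items:
            return (sys.float_info.max, -sys.float_info.max)
        return (min(self.items), max(self.items))


def send_to_worker(obj):
    """Mimic how spawn hands an object to a child: pickle, then unpickle."""
    return pickle.loads(pickle.dumps(obj))


if __name__ == "__main__":
    # Show which start method this platform uses by default.
    print("default start method:", mp.get_start_method())
    print("available start methods:", mp.get_all_start_methods())

    index = ToyIndex()
    index.insert(1.0)
    index.insert(5.0)
    print("parent bounds:", index.bounds)
    print("bounds after pickling:", send_to_worker(index).bounds)
```

If this is what is happening, passing `multiprocessing_context="spawn"` to the `DataLoader` on Linux should reproduce the crash there as well.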
Also, the fact that this wasn't caught by our unit tests means we need better integration tests. We do test our samplers in parallel, but not with a real GeoDataset.
This also happens on Windows and is due to the whole fork vs spawn issue.
adamjstewart changed the title from "Parallel data loading doesn't work on macOS with Python 3.8+" to "Parallel data loading doesn't work on macOS/Windows" on Dec 18, 2021