BrokenProcessPool #27

Open
nicolasbolzan opened this issue Nov 26, 2019 · 4 comments

Comments

@nicolasbolzan

Hi!
I have been training a language model on Wikipedia in order to build a text classifier with fastai, using Google Colab. When I run the code below, the process stops after a few minutes with the following error:


get_wiki(path,lang)
dest = split_wiki(path,lang)

bs=64
data = (TextList.from_folder(dest)
        .split_by_rand_pct(0.1, seed=42)
        .label_for_lm()
        .databunch(bs=bs, num_workers=1))
data.save('tmp_lm')

BrokenProcessPool Traceback (most recent call last)
in ()
1 bs=64
2 data = (TextList.from_folder(dest)
----> 3 .split_by_rand_pct(0.1, seed=42)
4 .label_for_lm()
5 .databunch(bs=bs, num_workers=1))

9 frames
/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in _inner(*args, **kwargs)
478 self.valid = fv(*args, from_item_lists=True, **kwargs)
479 self.__class__ = LabelLists
--> 480 self.process()
481 return self
482 return _inner

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in process(self)
532 "Process the inner datasets."
533 xp,yp = self.get_processors()
--> 534 for ds,n in zip(self.lists, ['train','valid','test']): ds.process(xp, yp, name=n)
535 #progress_bar clear the outputs so in some case warnings issued during processing disappear.
536 for ds in self.lists:

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in process(self, xp, yp, name, max_warn_items)
712 p.warns = []
713 self.x,self.y = self.x[~filt],self.y[~filt]
--> 714 self.x.process(xp)
715 return self
716

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in process(self, processor)
82 if processor is not None: self.processor = processor
83 self.processor = listify(self.processor)
---> 84 for p in self.processor: p.process(self)
85 return self
86

/usr/local/lib/python3.6/dist-packages/fastai/text/data.py in process(self, ds)
295 tokens = []
296 for i in progress_bar(range(0,len(ds),self.chunksize), leave=False):
--> 297 tokens += self.tokenizer.process_all(ds.items[i:i+self.chunksize])
298 ds.items = tokens
299

/usr/local/lib/python3.6/dist-packages/fastai/text/transform.py in process_all(self, texts)
118 if self.n_cpus <= 1: return self._process_all_1(texts)
119 with ProcessPoolExecutor(self.n_cpus) as e:
--> 120 return sum(e.map(self._process_all_1, partition_by_cores(texts, self.n_cpus)), [])
121
122 class Vocab():

/usr/lib/python3.6/concurrent/futures/process.py in _chain_from_iterable_of_lists(iterable)
364 careful not to keep references to yielded objects.
365 """
--> 366 for element in iterable:
367 element.reverse()
368 while element:

/usr/lib/python3.6/concurrent/futures/_base.py in result_iterator()
584 # Careful not to keep a reference to the popped future
585 if timeout is None:
--> 586 yield fs.pop().result()
587 else:
588 yield fs.pop().result(end_time - time.monotonic())

/usr/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
430 raise CancelledError()
431 elif self._state == FINISHED:
--> 432 return self.__get_result()
433 else:
434 raise TimeoutError()

/usr/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result

BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.


I tried to solve this by setting bs to 64, 32, and 16, and by changing the value of num_workers, but it still fails.
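
For context, the traceback above fails inside the tokenizer's own process pool (`process_all` in `fastai/text/transform.py`), which is sized by `self.n_cpus`. Neither `bs` nor `num_workers` appears in those frames, which may be why changing them had no effect. The sketch below is a standard-library stand-in for that branch, not fastai code: `tokenize_chunk`, `partition`, and this `process_all` are hypothetical names, and the fallback on `BrokenProcessPool` is an added assumption about what one might want, not something the traceback shows fastai doing.

```python
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def tokenize_chunk(texts):
    """Stand-in for the real tokenizer: split each text on whitespace."""
    return [t.split() for t in texts]


def partition(items, n):
    """Split items into at most n contiguous parts of roughly equal size."""
    size = max(1, -(-len(items) // n))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_all(texts, n_cpus=1):
    """Tokenize texts, using a process pool only when n_cpus > 1."""
    if n_cpus <= 1:
        # Same shape as the check in the traceback: no pool, no workers.
        return tokenize_chunk(texts)
    try:
        with ProcessPoolExecutor(n_cpus) as pool:
            parts = pool.map(tokenize_chunk, partition(texts, n_cpus))
            return [tokens for part in parts for tokens in part]
    except BrokenProcessPool:
        # Assumption: if a worker died, retry on the slower
        # single-process path instead of failing outright.
        return tokenize_chunk(texts)


if __name__ == "__main__":
    print(process_all(["hello world", "broken process pool"], n_cpus=1))
```

With `n_cpus=1` no worker processes are created, so the pool itself cannot break. If memory is the underlying cause, though, the single-process path can still run out of RAM.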

What happens is that RAM usage keeps growing until, at some point, the execution of the script stops.
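
One possible reading of this symptom, which nothing in this thread confirms, is that the operating system kills a worker process once memory runs out, and the pool then reports the loss as `BrokenProcessPool`. The minimal sketch below reproduces only the second half of that chain with the standard library: a worker that exits abruptly (simulated here with `os._exit`) makes the pool raise the same exception. `crash` and `worker_died` are illustrative names.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def crash(_):
    """Exit the worker immediately, as if it had been killed from outside."""
    os._exit(1)


def worker_died():
    """Return True if losing a worker surfaces as BrokenProcessPool."""
    try:
        with ProcessPoolExecutor(max_workers=1) as pool:
            # Consuming the results is what re-raises the pool failure.
            list(pool.map(crash, [0]))
    except BrokenProcessPool:
        return True
    return False


if __name__ == "__main__":
    print(worker_died())
```

If that reading is right, the exception is a symptom of the worker being lost rather than of a bug in the tokenizing function itself.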

Details of the Google Colab Machine:

GPU Machine, RAM: 25.51 GB, Disk: 358.27 GB.

Is there any way to run this in that environment?

Best Regards!

@fumpe

fumpe commented Jan 14, 2020

Same problem here! Did you find a solution?

@nicolasbolzan
Author

Nope, still crashing on Google Colab.

@jcatanza
Contributor

jcatanza commented Jan 15, 2020

I've experienced a similar problem with the fastai text data API, and reducing the batch size did not help either. At first I thought it was a problem with the Windows 10 64-bit operating system, but perhaps it happens on Linux too? Can you please check whether you are running on a Linux or a Windows machine in Colab?

I discovered that this error is stochastic; that is, sometimes it doesn't occur! My brute-force solution (admittedly suboptimal) is to run a try loop that repeatedly executes the line until it completes successfully. Sometimes this takes hours.
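
A bounded version of such a retry loop might look like the sketch below. It is only an illustration of the workaround described above, not fastai functionality: `retry_on_broken_pool` and its parameters are hypothetical names, and `build` stands for any zero-argument callable that wraps the failing line.

```python
import time
from concurrent.futures.process import BrokenProcessPool


def retry_on_broken_pool(build, max_attempts=5, delay_s=1.0):
    """Call build() until it succeeds or max_attempts is reached."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return build()
        except BrokenProcessPool as err:
            last_error = err
            print(f"attempt {attempt} of {max_attempts} failed: {err}")
            if attempt < max_attempts:
                time.sleep(delay_s)  # brief pause before the next try
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

Capping the number of attempts keeps the loop from running for hours unnoticed. Other exception types are deliberately not caught, so unrelated failures still surface immediately.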

This is a problem with the fastai library that ultimately needs to be addressed by the developers. Unfortunately, although there have been a number of fastai forum posts about this problem, AFAIK there has been no response from the developers.

Perhaps it is no longer a problem in fastai v2? I wouldn't know, as I have not migrated to v2 yet.

@yathindrak

yathindrak commented Feb 23, 2021

I am getting the same error on a Paperspace free GPU (Quadro M4000), both in Jupyter notebook mode and in terminal mode.
PS: I am using fastai v1.
@sgugger Any thoughts on this?
