Fails to Train Thru First Epoch #4

Open
talwfu opened this issue Jul 4, 2020 · 0 comments

I'm running the code on a CUDA GPU and have hit this error twice now. I changed the number of epochs, but training still fails partway through the first epoch. Please help!

2020-07-03 11:26:57.978 | INFO | bert_sentiment.data:<module>:9 - Loading the tokenizer
2020-07-03 11:26:58.106 | INFO | bert_sentiment.data:<module>:12 - Loading SST
2020-07-03 11:27:04.056 | INFO | bert_sentiment.data:__init__:54 - Loading SST train set
2020-07-03 11:27:04.056 | INFO | bert_sentiment.data:__init__:57 - Tokenizing
2020-07-03 11:28:06.925 | INFO | bert_sentiment.data:__init__:54 - Loading SST dev set
2020-07-03 11:28:06.926 | INFO | bert_sentiment.data:__init__:57 - Tokenizing
2020-07-03 11:28:14.955 | INFO | bert_sentiment.data:__init__:54 - Loading SST test set
2020-07-03 11:28:14.955 | INFO | bert_sentiment.data:__init__:57 - Tokenizing
22%|██▏ | 2154/9956 [8:51:16<32:04:18, 14.80s/it]

  File "run.py", line 28, in <module>
    main()
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "run.py", line 24, in main
    train(binary=binary, root=root, bert=bert_config, save=save)
  File "/home/peakaa19/bnn_2/bert-sentiment/bert_sentiment/train.py", line 82, in train
    train_loss, train_acc = train_one_epoch(
  File "/home/<path/to>/bert-sentiment/bert_sentiment/train.py", line 22, in train_one_epoch
    for batch, labels in tqdm(generator):
  File "/usr/local/lib/python3.8/site-packages/tqdm/std.py", line 1129, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 79, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/usr/local/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 79, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/usr/local/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [66] at entry 0 and [65] at entry 24
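
Reading the last frame: the default collate step tries to stack every sample in a batch into one tensor, and that only works if all samples have the same length. Here one sample has 66 tokens and another has 65. One possible workaround is to pad each batch to its longest sample before stacking. The snippet below is only a minimal sketch of that padding step in plain Python, not the repository's actual code. The name `pad_batch`, the `(token_ids, label)` item layout, and the padding id `0` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def pad_batch(
    batch: Sequence[Tuple[List[int], int]],
    pad_id: int = 0,  # assumption: 0 is the padding token id of the tokenizer in use
) -> Tuple[List[List[int]], List[int]]:
    """Pad every token-id list in the batch to the longest length in that batch."""
    max_len = max(len(ids) for ids, _ in batch)
    # Append pad_id to the right of each sequence until it reaches max_len.
    padded = [list(ids) + [pad_id] * (max_len - len(ids)) for ids, _ in batch]
    labels = [label for _, label in batch]
    return padded, labels


# Hypothetical batch using the two lengths from the error message (66 and 65).
batch = [([1] * 66, 3), ([1] * 65, 1)]
padded, labels = pad_batch(batch)
print([len(row) for row in padded])  # every row now has the same length
```

In a PyTorch setup, a function like this could convert its outputs to tensors and be passed to `DataLoader` through its `collate_fn` argument, assuming the dataset really does yield `(token_ids, label)` pairs.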
