
Sequence training error #134

Open
kylethieringer opened this issue Mar 1, 2023 · 1 comment

@kylethieringer
When training a new sequence model, I am running into an error where the model checkpoint looks for 'val/f1_class_mean' in the metrics file but cannot find it. If I open the metrics file externally, no f1_class_mean dataset has been saved.

This error aborts the currently running epoch and moves on to the next one rather than finishing it.

Any help with this would be greatly appreciated!
Thanks

/home/kyle/deg/lib/python3.10/site-packages/pytorch_lightning/trainer/data_loading.py:132: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Epoch 0:  61%|███████████████████████████████████████████████████████████▏                                     | 839/1376 [00:48<00:31, 17.14it/s, loss=188, v_num=0]/home/kyle/deg/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:644: UserWarning: ModelCheckpoint(monitor='val/f1_class_mean') not found in the returned metrics: ['train_loss', 'train/loss', 'train/fps', 'train/lr', 'train/data_loss', 'train/reg_loss', 'train/accuracy_overall', 'train/f1_overall', 'train/f1_class_mean', 'train/f1_class_mean_nobg', 'train/auroc_class_mean', 'train/mAP_class_mean', 'train/auroc_overall', 'train/mAP_overall']. HINT: Did you call self.log('val/f1_class_mean', value) in the LightningModule?
  warning_cache.warn(m)
/home/kyle/deg/lib/python3.10/site-packages/pytorch_lightning/trainer/data_loading.py:132: UserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Epoch 1:  13%|████████████▉                                                                                    | 183/1376 [00:10<01:06, 17.95it/s, loss=152, v_num=0]^C/home/kyle/deg/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:688: UserWarning: Detected KeyboardInterrupt, attempting graceful shutdown...
  rank_zero_warn("Detected KeyboardInterrupt, attempting graceful shutdown...")
Epoch 1:  13%|████████████▉  
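For context, the warning's HINT points at how this key normally gets wired up in PyTorch Lightning: ModelCheckpoint(monitor='val/f1_class_mean') can only track a value that the LightningModule logs under that exact name during validation. A minimal sketch of that wiring, with a hypothetical module and a stand-in metric (not DeepEthogram's actual code):

    import pytorch_lightning as pl
    import torch

    class TinySequenceModel(pl.LightningModule):
        """Hypothetical stand-in; not deepethogram's sequence model."""

        def __init__(self):
            super().__init__()
            self.net = torch.nn.Linear(16, 4)

        def forward(self, x):
            return self.net(x)

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = torch.nn.functional.cross_entropy(self(x), y)
            self.log('train/loss', loss)
            return loss

        def validation_step(self, batch, batch_idx):
            x, y = batch
            logits = self(x)
            loss = torch.nn.functional.cross_entropy(logits, y)
            # stand-in for the class-mean F1; the logged key name is what matters here
            metric = (logits.argmax(dim=1) == y).float().mean()
            self.log('val/f1_class_mean', metric)
            return loss

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=1e-3)

    # the checkpoint callback only sees 'val/f1_class_mean' after a
    # validation pass has actually run and logged it
    checkpoint = pl.callbacks.ModelCheckpoint(monitor='val/f1_class_mean', mode='max')
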
@kylethieringer
Author

Upon a fresh reinstall, I encountered an IndexError at line 582 of datasets.py: it tries to load labels using a range spanning the length of the video, but because of Python's zero-based indexing the last label index is out of bounds.
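To illustrate the off-by-one with made-up shapes (not the actual data):

    import numpy as np

    # hypothetical label array: 5 behavior classes x 100 frames
    label = np.zeros((5, 100), dtype=np.int64)

    # a window that runs up to the length of the video includes index 100,
    # but the valid column indices are 0..99
    label_indices = list(range(91, 101))
    try:
        label[:, label_indices]
    except IndexError as e:
        print(e)  # index 100 is out of bounds for axis 1 with size 100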

I added a non-permanent fix that allows the model to train, but it comes with caveats. The biggest is that it shifts the labels for the last chunk of data off by one frame. For the behavior I'm studying this shouldn't matter (1 frame is within the expected labeling noise, and the behavior very rarely occurs at the very end of the video). I suspect the out-of-bounds index comes from some padding applied when loading the labels, but I'm not sure exactly where. Here are the lines of code I added, in case it helps anyone else:

In /deepethogram/data/datasets.py, lines 578-581:

    if not self.reduce:
        # new code start
        # if the final label index runs past the last frame, shift the whole
        # window back by one so indexing stays in bounds (off-by-one workaround)
        if label_indices[-1] >= self.label.shape[1]:
            label_indices = [i - 1 for i in label_indices]
        # new code end
        labels = self.label[:, label_indices].astype(np.int64)
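
An untested alternative would be to clamp only the out-of-range indices with np.clip instead of shifting the whole window, so that only the final frame's label gets duplicated rather than every label in the last chunk being offset by one (same label_indices and self.label as above):

    if not self.reduce:
        # clamp any index past the last frame to the final valid frame,
        # leaving in-range indices untouched
        label_indices = np.clip(label_indices, 0, self.label.shape[1] - 1)
        labels = self.label[:, label_indices].astype(np.int64)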

@kylethieringer changed the title from "Sequence traning error" to "Sequence training error" on Mar 6, 2023