ValueError encountered during retraining on Open Images Dataset #193

Open
Kweon0605 opened this issue Feb 21, 2024 · 2 comments

@Kweon0605

"!python train_ssd.py --dataset_type open_images --datasets ~/data/open_images --net mb1-ssd --pretrained_ssd models/mobilenet-v1-ssd-mp-0_675.pth --scheduler cosine --lr 0.01 --t_max 100 --validation_epochs 5 --num_epochs 100 --base_net_lr 0.001 --batch_size 5"
This is the command I'm using to train the model in Colab, but I'm encountering an error.
The error message is "2024-02-21 01:57:21,177 - root - INFO - Namespace(dataset_type='open_images', datasets=['/root/data/open_images'], validation_dataset=None, balance_data=False, net='mb1-ssd', freeze_base_net=False, freeze_net=False, mb2_width_mult=1.0, lr=0.01, momentum=0.9, weight_decay=0.0005, gamma=0.1, base_net_lr=0.001, extra_layers_lr=None, base_net=None, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resume=None, scheduler='cosine', milestones='80,100', t_max=100.0, batch_size=5, num_epochs=100, num_workers=4, validation_epochs=5, debug_steps=100, use_cuda=True, checkpoint_folder='models/')
2024-02-21 01:57:21,178 - root - INFO - Prepare training datasets.
2024-02-21 01:57:22,170 - root - INFO - Dataset Summary:Number of Images: 961
Minimum Number of Images for a Class: -1
Label Distribution:
Handgun: 727
Shotgun: 580
2024-02-21 01:57:22,172 - root - INFO - Stored labels into file models/open-images-model-labels.txt.
2024-02-21 01:57:22,172 - root - INFO - Train dataset size: 961
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
2024-02-21 01:57:22,175 - root - INFO - Prepare Validation datasets.
2024-02-21 01:57:22,256 - root - INFO - Dataset Summary:Number of Images: 123
Minimum Number of Images for a Class: -1
Label Distribution:
Handgun: 81
Shotgun: 66
2024-02-21 01:57:22,256 - root - INFO - validation dataset size: 123
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
2024-02-21 01:57:22,256 - root - INFO - Build network.
2024-02-21 01:57:22,350 - root - INFO - Init from pretrained ssd models/mobilenet-v1-ssd-mp-0_675.pth
2024-02-21 01:57:25,194 - root - INFO - Took 2.84 seconds to load the model.
2024-02-21 01:57:25,197 - root - INFO - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2024-02-21 01:57:25,197 - root - INFO - Uses CosineAnnealingLR scheduler.
2024-02-21 01:57:25,198 - root - INFO - Start training from epoch 0.
/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
Traceback (most recent call last):
File "/content/drive/MyDrive/try_ssd/pytorch-ssd/train_ssd.py", line 325, in <module>
train(train_loader, net, criterion, optimizer,
File "/content/drive/MyDrive/try_ssd/pytorch-ssd/train_ssd.py", line 116, in train
for i, data in enumerate(loader):
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 630, in __next__
data = self._next_data()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
return self._process_data(data)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
data.reraise()
File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 694, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataset.py", line 302, in __getitem__
return self.datasets[dataset_idx][sample_idx]
File "/content/drive/MyDrive/try_ssd/pytorch-ssd/vision/datasets/open_images.py", line 44, in __getitem__
_, image, boxes, labels = self._getitem(index)
File "/content/drive/MyDrive/try_ssd/pytorch-ssd/vision/datasets/open_images.py", line 38, in _getitem
image, boxes, labels = self.transform(image, boxes, labels)
File "/content/drive/MyDrive/try_ssd/pytorch-ssd/vision/ssd/data_preprocessing.py", line 34, in __call__
return self.augment(img, boxes, labels)
File "/content/drive/MyDrive/try_ssd/pytorch-ssd/vision/transforms/transforms.py", line 55, in __call__
img, boxes, labels = t(img, boxes, labels)
File "/content/drive/MyDrive/try_ssd/pytorch-ssd/vision/transforms/transforms.py", line 247, in __call__
mode = random.choice(self.sample_options)
File "mtrand.pyx", line 936, in numpy.random.mtrand.RandomState.choice
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (6,) + inhomogeneous part. "

How can I fix it?

@zn845639326

Try downgrading NumPy, for example:
pip install numpy==1.22.0
Other versions may also work.
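
To check whether the installed NumPy is one of the affected versions, something like the sketch below may help. As far as I know, NumPy 1.24 and later raise this ValueError when building an array from a ragged sequence without dtype=object, while older releases only warned or silently built an object array. The helper name rejects_ragged_sequences is made up for illustration and is not part of pytorch-ssd.

```python
import warnings

import numpy as np


def rejects_ragged_sequences():
    """Return True if this NumPy refuses to build an array from ragged input.

    Hypothetical helper for illustration only; it is not part of pytorch-ssd.
    """
    # Same kind of input as sample_options: a bare None next to a 2-tuple.
    ragged = [None, (0.1, None)]
    try:
        with warnings.catch_warnings():
            # Older NumPy releases only emit a deprecation warning here.
            warnings.simplefilter("ignore")
            np.array(ragged)
    except ValueError:
        return True
    return False


print("NumPy version:", np.__version__)
print("Rejects ragged sequences:", rejects_ragged_sequences())
```

If the second line prints True, either a downgrade or a code change in the transform should be needed.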

@jaron-cui

jaron-cui commented Oct 29, 2024

In vision/transforms/transforms.py, on line 242 after

self.sample_options = (
    # using entire original input image
    None,
    # sample a patch s.t. MIN jaccard w/ obj in .1,.3,.4,.7,.9
    (0.1, None),
    (0.3, None),
    (0.7, None),
    (0.9, None),
    # randomly sample a patch
    (None, None),
)

insert
self.sample_options = np.array(self.sample_options, dtype=object)
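
A standalone sketch of why this works, assuming the transform picks a mode with numpy.random.choice as the traceback suggests. The tuple is copied from the snippet above; the variable names options and mode are only for illustration. With dtype=object, NumPy keeps each entry as an opaque Python object in a 1-D array instead of trying to build a rectangular numeric array, so choice no longer has to convert the ragged tuple itself.

```python
import numpy as np

# Ragged options as in the snippet above: one bare None plus five 2-tuples.
sample_options = (
    None,
    (0.1, None),
    (0.3, None),
    (0.7, None),
    (0.9, None),
    (None, None),
)

# Explicit object dtype: a 1-D array whose six elements are Python objects.
options = np.array(sample_options, dtype=object)
print(options.shape, options.dtype)

# Passing the tuple directly makes choice() convert it to an array first,
# which is where newer NumPy raises; the object array is used as-is.
mode = np.random.choice(options)
if mode is None:
    print("use the entire original image")
else:
    min_iou, max_iou = mode
    print("sample a patch with min IoU", min_iou, "and max IoU", max_iou)
```

This keeps the installed NumPy version unchanged, so it may be preferable to downgrading when other packages in the environment need a recent NumPy.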
