Replies: 2 comments
-
👋 Hello @Brayan532, thank you for reaching out and for sharing such detailed information about your experiences with YOLO11n 🚀! Your commitment to optimizing your training process is evident, and we're here to help guide you. This is an automated response to assist you quickly 🎉. Rest assured, an Ultralytics engineer will review your query and provide personalized insights soon. In the meantime, here are some steps and resources that may help:
Upgrade
Please ensure you are using the latest version of YOLO, as new updates may address bugs or inconsistencies in earlier versions. You can upgrade with pip install -U ultralytics. Ensure you are operating in an environment with Python>=3.8 and PyTorch>=1.8, as outlined in our repository's project requirements.
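For example, a quick way to confirm the installed version and environment after upgrading (a minimal sketch using the package's built-in checks utility):

```python
# After upgrading:  pip install -U ultralytics
import ultralytics

# Prints the installed Ultralytics version together with Python, PyTorch,
# CUDA and hardware details, so the requirements above can be verified.
ultralytics.checks()
```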
Environments
Explore the following environments for training and validation. These setups come pre-configured with the necessary dependencies:

Community & Support
In addition to our direct support, you can connect with our global Ultralytics user community for real-time chat and discussions:

Status
Lastly, check the status of our CI workflows to verify that all current tests are passing. This ensures stability across all YOLO modes and tasks:

Feel free to provide additional details or any updates over the course of your experiments. Thanks again for contributing to the Ultralytics community 💡!
-
@Brayan532 thank you for reaching out with detailed observations and questions. Here’s guidance for your concerns:
Let us know if you encounter further challenges, and feel free to explore more configuration tips at Model Training Tips.
-
Dear Glenn,
First of all, I want to sincerely thank you for actively addressing the numerous questions from the community. Your expertise and dedication are truly invaluable.
I am currently working with a lightly modified version of YOLO11n, which I have trained on the COCO dataset (approximately 117k images for training and 5k for validation). However, I have encountered a few challenges that I hope you can help me address with your professional insight. Here are the details:
I am using an 8 GB GPU, which limits me to a maximum of about 60 epochs per run (a run of that length already takes roughly 24 hours to complete).
In my first round of training, I slightly modified the default configurations and used the following settings:
lr0=0.001, warmup=5, mixup=0.2, cos_lr=True
The result was a mAP50-95 of 25.6%.
Next, I used transfer learning from last.pt and continued training with updated settings:
lr0=0.0005, warmup=7, mixup=0.1, cos_lr=True
However, I observed a decline in mAP50-95 after the first few epochs, and the decline continued until the end of training.
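For reference, the second round was launched roughly like this (a minimal sketch assuming the standard Ultralytics Python API; the checkpoint path is a placeholder for my actual run directory):

```python
from ultralytics import YOLO

# Continue transfer learning from the checkpoint of the first training round
# (placeholder path for my first run's weights).
model = YOLO("runs/detect/train/weights/last.pt")

# Second-round settings: lower initial LR, longer warmup, less mixup.
model.train(
    data="coco.yaml",
    epochs=60,
    imgsz=640,
    lr0=0.0005,
    warmup_epochs=7.0,
    mixup=0.1,
    cos_lr=True,
)
```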
Given this situation, I am uncertain about the best course of action to take next. I see two possible options:
Option 1: Restart training from scratch with new settings.
Option 2: Use transfer learning again from the first training attempt and apply updated settings to save time.
I would greatly appreciate your advice on how best to proceed to achieve better performance.
During training, I noticed that the validation process only involves 157 items, despite the validation folder containing 5,000 images (as confirmed by the following log):
\val\labels.cache... 4952 images, 48 backgrounds, 0 corrupt: 100% 5000/5000 [00:00<?, ?it/s]
For reference, my batch size is 16, and I have already cleared the cache and rerun the code, but the issue persists. It seems that while the total number of validation images is recognized correctly, only about half of them are being used during validation (157 iterations × batch 16 ≈ 2,500 images).
Could you please help me understand why this is happening and how I can ensure the full validation dataset is utilized?
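To cross-check this, I plan to run a standalone validation pass (a minimal sketch assuming the standard Ultralytics API; the checkpoint path is a placeholder). My guess is that 157 might be the number of validation batches rather than images, e.g. 5,000 / 32 ≈ 157 if validation during training runs at twice the training batch size, but I would appreciate confirmation:

```python
from ultralytics import YOLO

# Load the latest checkpoint (placeholder path for my run directory).
model = YOLO("runs/detect/train/weights/best.pt")

# Standalone validation on the full COCO val split; the summary should report
# all 5,000 images regardless of how many batches are iterated.
metrics = model.val(data="coco.yaml", imgsz=640, batch=16)
print(metrics.box.map)  # mAP50-95
```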
While training with the following command:
model.train(data="coco.yaml", epochs=60, imgsz=640, lr0=0.001, cos_lr=True, warmup_epochs=5.0, mixup=0.2)
I received the following message in the logs:
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.001' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: SGD(lr=0.01, momentum=0.9) with parameter groups 82 weight(decay=0.0), 89 weight(decay=0.0005), 88 bias(decay=0.0)
This seems to indicate that my lr0=0.001 setting was overridden and lr0=0.01 was used instead. However, I noticed that in args.yaml (saved in runs/detect/train), the value of lr0 is still set to 0.001.
Could you clarify whether the learning rate in the logs (0.01) is the actual value being used, or if the value in args.yaml (0.001) is correct? This inconsistency has been confusing for me.
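For my next run I am considering setting the optimizer explicitly (a sketch based on my understanding that an explicit optimizer choice prevents the automatic override of lr0 and momentum):

```python
from ultralytics import YOLO

# Placeholder for my lightly modified YOLO11n model configuration.
model = YOLO("yolo11n.yaml")

# With an explicit optimizer, lr0 and momentum should be used as given
# instead of being replaced by the 'optimizer=auto' heuristics.
model.train(
    data="coco.yaml",
    epochs=60,
    imgsz=640,
    optimizer="SGD",
    lr0=0.001,
    momentum=0.937,
    cos_lr=True,
    warmup_epochs=5.0,
    mixup=0.2,
)
```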
As someone with significantly more experience, I truly value your insights and guidance. Your advice would be incredibly helpful in resolving these issues and improving my training process.
Thank you very much for your time and support.
Best regards.