Replies: 2 comments
-
👋 Hello @Brayan532, thank you for reaching out and for sharing such detailed information about your experiences with YOLO11n 🚀! Your commitment to optimizing your training process is evident, and we're here to help guide you. This is an automated response to assist you quickly 🎉. Rest assured, an Ultralytics engineer will review your query and provide personalized insights soon. In the meantime, here are some steps and resources that may help:
Upgrade
Please ensure you are using the latest version of YOLO, as new updates may address bugs or inconsistencies in earlier versions. You can upgrade with pip install -U ultralytics. Ensure you are operating in an environment with Python>=3.8 and PyTorch>=1.8, as outlined in our repository's project requirements.
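For example, a quick way to confirm the installed version and environment after upgrading (a minimal sketch using the package's built-in checks utility):

```python
# After upgrading:  pip install -U ultralytics
import ultralytics

# Prints the installed Ultralytics version together with Python, PyTorch,
# CUDA and hardware details, so the requirements above can be verified.
ultralytics.checks()
```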
Environments
Explore the following environments for training and validation. These setups come pre-configured with the necessary dependencies:

Community & Support
In addition to our direct support, you can connect with our global Ultralytics user community for real-time chat and discussions:

Status
Lastly, check the status of our CI workflows to verify that all current tests are passing. This ensures stability across all YOLO modes and tasks:

Feel free to provide additional details or any updates over the course of your experiments. Thanks again for contributing to the Ultralytics community 💡!
-
@Brayan532 thank you for reaching out with detailed observations and questions. Here’s guidance for your concerns:
Let us know if you encounter further challenges, and feel free to explore more configuration tips at Model Training Tips.
-
Dear Glenn,
First of all, I want to sincerely thank you for actively addressing the numerous questions from the community. Your expertise and dedication are truly invaluable.
I am currently working with a lightly modified version of YOLO11n, which I have trained on the COCO dataset (approximately 117k images for training and 5k for validation). However, I have encountered a few challenges that I hope you can help me address with your professional insight. Here are the details:
I am using an 8 GB GPU, which limits me to a maximum of about 60 epochs per run (a run of that length already takes roughly 24 hours to complete).
In my first round of training, I slightly modified the default configurations and used the following settings:
lr0=0.001, warmup=5, mixup=0.2, cos_lr=True
The result was a mAP50-95 of 25.6%.
Next, I used transfer learning from last.pt and continued training with updated settings:
lr0=0.0005, warmup=7, mixup=0.1, cos_lr=True
However, I observed a decline in mAP50-95 after the first few epochs, and the decline continued until the end of training.
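For reference, the second round was launched roughly like this (a minimal sketch assuming the standard Ultralytics Python API; the checkpoint path is a placeholder for my actual run directory):

```python
from ultralytics import YOLO

# Continue transfer learning from the checkpoint of the first training round
# (placeholder path for my first run's weights).
model = YOLO("runs/detect/train/weights/last.pt")

# Second-round settings: lower initial LR, longer warmup, less mixup.
model.train(
    data="coco.yaml",
    epochs=60,
    imgsz=640,
    lr0=0.0005,
    warmup_epochs=7.0,
    mixup=0.1,
    cos_lr=True,
)
```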
Given this situation, I am uncertain about the best course of action to take next. I see two possible options:
Option 1: Restart training from scratch with new settings.
Option 2: Use transfer learning again from the first training attempt and apply updated settings to save time.
I would greatly appreciate your advice on how best to proceed to achieve better performance.
During training, I noticed that the validation process only involves 157 items, despite the validation folder containing 5,000 images (as confirmed by the following log):
\val\labels.cache... 4952 images, 48 backgrounds, 0 corrupt: 100% 5000/5000 [00:00<?, ?it/s]
For reference, my batch size is 16, and I have already cleared the cache and rerun the code, but the issue persists. It seems that while the total number of validation images is recognized correctly, only about half of them are being used during validation (157 iterations × batch 16 ≈ 2,500 images).
Could you please help me understand why this is happening and how I can ensure the full validation dataset is utilized?
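To cross-check this, I plan to run a standalone validation pass (a minimal sketch assuming the standard Ultralytics API; the checkpoint path is a placeholder). My guess is that 157 might be the number of validation batches rather than images, e.g. 5,000 / 32 ≈ 157 if validation during training runs at twice the training batch size, but I would appreciate confirmation:

```python
from ultralytics import YOLO

# Load the latest checkpoint (placeholder path for my run directory).
model = YOLO("runs/detect/train/weights/best.pt")

# Standalone validation on the full COCO val split; the summary should report
# all 5,000 images regardless of how many batches are iterated.
metrics = model.val(data="coco.yaml", imgsz=640, batch=16)
print(metrics.box.map)  # mAP50-95
```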
While training with the following command:
model.train(data="coco.yaml", epochs=60, imgsz=640, lr0=0.001, cos_lr=True, warmup_epochs=5.0, mixup=0.2)
I received the following message in the logs:
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.001' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: SGD(lr=0.01, momentum=0.9) with parameter groups 82 weight(decay=0.0), 89 weight(decay=0.0005), 88 bias(decay=0.0)
This seems to indicate that my lr0=0.001 setting was overridden and lr0=0.01 was used instead. However, I noticed that in args.yaml (saved in runs/detect/train), the value of lr0 is still set to 0.001.
Could you clarify whether the learning rate in the logs (0.01) is the actual value being used, or if the value in args.yaml (0.001) is correct? This inconsistency has been confusing for me.
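For my next run I am considering setting the optimizer explicitly (a sketch based on my understanding that an explicit optimizer choice prevents the automatic override of lr0 and momentum):

```python
from ultralytics import YOLO

# Placeholder for my lightly modified YOLO11n model configuration.
model = YOLO("yolo11n.yaml")

# With an explicit optimizer, lr0 and momentum should be used as given
# instead of being replaced by the 'optimizer=auto' heuristics.
model.train(
    data="coco.yaml",
    epochs=60,
    imgsz=640,
    optimizer="SGD",
    lr0=0.001,
    momentum=0.937,
    cos_lr=True,
    warmup_epochs=5.0,
    mixup=0.2,
)
```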
As someone with significantly more experience, I truly value your insights and guidance. Your advice would be incredibly helpful in resolving these issues and improving my training process.
Thank you very much for your time and support.
Best regards.