Welcome to the inmind.ai_amazing_project's repository. This project is an (almost) cutting-edge solution developed to tackle the challenges of object detection within digital images. Leveraging state-of-the-art machine learning techniques and architectures, including YOLOv7 and custom PyTorch models, this system is designed to significantly enhance our capabilities in identifying and categorizing objects across various scenarios and datasets.
This amazing project encompasses the entire pipeline of object detection tasks—from dataset preparation and augmentation to training robust models and deploying them for real-time inference. The solutions developed here demonstrate our commitment to advancing the field of computer vision and lay a solid foundation for future innovations that ultimately combine computer vision with robotics.
This documentation provides a comprehensive guide to the project, including setup instructions, feature highlights, usage examples, and insights into the technologies we've employed. Our goal is to offer a clear overview of the project's capabilities and facilitate its adoption and further development.
Stay tuned as we dive deeper into the details of this exciting venture into the realm of artificial intelligence and computer vision.
Getting started with the Object Detection System project is straightforward. Follow these steps to set up your environment and run the project locally.
Before you begin, ensure you have the following installed:
- Python (version 3.8 or higher recommended)
- Git
- **Clone the Repository**

  First, clone the project repository to your local machine using Git:

  ```bash
  git clone https://github.com/wgtayar/inmind_amazing_project
  cd inmind_amazing_project
  ```

- **Create a Virtual Environment (Optional but Recommended)**

  It's best practice to use a virtual environment for Python projects. This keeps your dependencies organized and avoids conflicts. To create and activate a virtual environment:

  ```bash
  python -m venv venv

  # For Windows
  venv\Scripts\activate

  # For macOS and Linux
  source venv/bin/activate
  ```

- **Install Dependencies**

  With your virtual environment activated, install the project dependencies by running:

  ```bash
  pip install -r requirements.txt
  ```

  This command reads the `requirements.txt` file and installs all the necessary Python packages.
This object detection system is designed with the following capabilities:
- Data Preparation and Augmentation: Utilizes powerful libraries like Albumentations to prepare and augment images, enhancing the model's ability to generalize across different lighting conditions, angles, and backgrounds.
- Advanced Object Detection Models: Incorporates state-of-the-art models such as YOLOv7, alongside custom PyTorch models, ensuring high accuracy and efficiency in object detection tasks.
- Model Training and Evaluation: Offers a streamlined process for training object detection models, complete with evaluation metrics to assess model performance accurately.
- Hyperparameter Optimization: Supports experimenting with different hyperparameters to fine-tune the models for optimal performance.
- Real-time Inference: Capable of deploying trained models for real-time object detection, making it suitable for integration into live systems.
- Visualization Tools: Includes tools like TensorBoard for visualizing model metrics during training, and Netron for viewing model architectures, aiding in the interpretability and analysis of model performance.
- Inference API: Features a scalable API for model inference, providing endpoints for model listing, image-based detection, and returning annotated images with detected objects.
- Export to Inference Models: Enables exporting trained models to the ONNX format, facilitating deployment across different platforms (a minimal export sketch is shown below).
- Dockerization (Optional): Offers the option to dockerize the inference API, simplifying deployment and scaling in production environments.
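For instance, a trained PyTorch model can be exported to ONNX with `torch.onnx.export`. The sketch below is illustrative only: the input resolution, tensor names, and output file name are assumptions, not values fixed by this project.

```python
import torch

# `model` stands for a trained PyTorch detection model; the 640x640 input size is an assumption
model.eval()
dummy_input = torch.randn(1, 3, 640, 640)

torch.onnx.export(
    model,                      # model being exported
    dummy_input,                # example input used to trace the graph
    "model.onnx",               # output file name (illustrative)
    input_names=["images"],
    output_names=["predictions"],
    opset_version=12,
)
```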
This project encompasses several stages, including dataset preparation, model training, evaluation, and applying data augmentation techniques. Follow these steps to utilize the system effectively:
- **Convert Annotations to YOLO Format**: Start by converting your dataset annotations from JSON to YOLO format, facilitating compatibility with YOLOv7 training requirements. Utilize the `convert_annotations_to_yolo_format` function provided in `ModelTraining.ipynb` for this purpose. This function reads annotations from the specified directory and converts them into YOLO format, saving the output in a designated directory.
- **Splitting the Dataset**: To ensure the robustness of your model, split your dataset into training and validation sets. The splitting process is demonstrated in `ModelTraining.ipynb`, leveraging the `train_test_split` method from `sklearn.model_selection` (see the sketch after this list).
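For orientation, here is a minimal sketch of an 80/20 split with `train_test_split`; the directory layout and split ratio are illustrative assumptions rather than the exact code in `ModelTraining.ipynb`.

```python
import os
from sklearn.model_selection import train_test_split

# Collect image file names (directory name is an illustrative placeholder)
image_files = sorted(os.listdir("data/images"))

# 80/20 train/validation split with a fixed seed for reproducibility
train_files, val_files = train_test_split(image_files, test_size=0.2, random_state=42)

print(f"{len(train_files)} training images, {len(val_files)} validation images")
```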
To train your object detection model, follow these steps:
- **Loading the Dataset**: Use the `CustomObjectDetectionDataset` class from `LoadingBMWDataset.py` to load your dataset. This class allows for easy integration of custom transformations.
- **Training**: Refer to the training process outlined in `CustomResNet.ipynb`. This notebook provides a comprehensive guide to setting up and executing the training loop with PyTorch, leveraging a custom ResNet backbone (a minimal sketch of such a loop follows this list).
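The following is a minimal sketch of such a training loop. The dataset constructor arguments, the loss function, and the assumption that the model's outputs and targets share a shape are illustrative; refer to `CustomResNet.ipynb` and `LoadingBMWDataset.py` for the actual implementation.

```python
import torch
from torch.utils.data import DataLoader
from LoadingBMWDataset import CustomObjectDetectionDataset  # project dataset class

# Constructor arguments are illustrative; check LoadingBMWDataset.py for the real signature.
train_dataset = CustomObjectDetectionDataset(img_dir="data/train/images",
                                             annotation_dir="data/train/labels")
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# `model` is assumed to have been built as in CustomResNet.ipynb
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.MSELoss()  # placeholder loss; the notebook may use a different one

for epoch in range(10):
    model.train()
    for images, targets in train_loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), targets)  # assumes outputs and targets are tensors of the same shape
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")
```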
After training, evaluate your model's performance using the evaluation metrics provided in `ModelTraining.ipynb`. The `evaluate_model` function computes precision, recall, and F1 score, offering insight into your model's accuracy and reliability.
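For reference, the three metrics are defined from true-positive, false-positive, and false-negative counts as in the standalone sketch below; how detections are matched to ground truth (e.g. by an IoU threshold) is left to the evaluation code itself.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 score from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
    return precision, recall, f1

# Example: 90 correct detections, 10 spurious detections, 20 missed objects
print(precision_recall_f1(tp=90, fp=10, fn=20))  # (0.9, 0.818..., 0.857...)
```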
Data augmentation is a powerful technique to improve model generalization. Use the `DatasetWithAugmentations` class in `DataAugmentation.py` to apply a series of augmentations to your dataset, as shown:

```python
dataset = DatasetWithAugmentations(img_dir, annotation_dir)
```
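Such augmentations are typically built with an Albumentations pipeline. The composition below is an illustrative sketch; the specific transforms, probabilities, and bounding-box format are assumptions and not necessarily those used in `DataAugmentation.py`.

```python
import albumentations as A

# Illustrative augmentation pipeline for object detection with YOLO-format boxes
transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.3),
        A.Rotate(limit=15, p=0.3),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# image: HxWx3 numpy array; bboxes: [x_center, y_center, w, h] in relative coordinates
augmented = transform(image=image, bboxes=bboxes, class_labels=class_labels)
aug_image, aug_bboxes = augmented["image"], augmented["bboxes"]
```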
If necessary, use the scripts provided in `jsonfixer.ipynb` to correct class IDs within your dataset annotations. This can be crucial for maintaining consistency and accuracy in your model's training data.

For training with YOLOv7 models, ensure your annotations are in the correct format by following the conversion process outlined in `jsonfixer.ipynb`. This adaptation is essential for compatibility with YOLOv7's training requirements.
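As an illustration of this kind of fix, the sketch below remaps class IDs across a directory of JSON annotation files; the directory name, field names, and ID mapping are hypothetical and will differ from the actual annotations handled by `jsonfixer.ipynb`.

```python
import json
from pathlib import Path

# Hypothetical mapping from old class IDs to corrected ones
ID_MAP = {0: 1, 1: 0}

for path in Path("annotations").glob("*.json"):
    data = json.loads(path.read_text())
    for obj in data.get("objects", []):  # "objects" and "class_id" are assumed field names
        obj["class_id"] = ID_MAP.get(obj["class_id"], obj["class_id"])
    path.write_text(json.dumps(data, indent=2))
```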
- **Saving**: Upon completing the training, save your model's state dictionary for future use:

  ```python
  torch.save(model.state_dict(), 'path_to_save_model.pth')
  ```

- **Loading**: To resume training or for evaluation, load the saved model parameters into the model architecture:

  ```python
  model.load_state_dict(torch.load('path_to_saved_model.pth'))
  ```
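When loading for inference, the same architecture must be instantiated first and switched to evaluation mode. A minimal sketch follows; the `CustomResNet(num_classes=...)` constructor is an illustrative assumption about how the project's model is built.

```python
import torch

model = CustomResNet(num_classes=NUM_CLASSES)  # recreate the architecture used during training (illustrative)
model.load_state_dict(torch.load('path_to_saved_model.pth', map_location='cpu'))
model.eval()  # disable dropout and use running batch-norm statistics for inference
```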
Follow these steps to effectively train, evaluate, and enhance your object detection models. For detailed code examples and instructions, refer to the corresponding Jupyter notebooks and Python files provided in this project.
Below are the TensorBoard screenshots demonstrating the training metrics and loss curves for the YOLOv7 model training.
The ONNX models used in this project are available for download from the following OneDrive folder. These models reflect the weights acquired during training for both the YOLOv7 and the CustomResNet models.
Download ONNX Models from OneDrive
The CustomResNet model extends a pre-trained ResNet50 model by integrating custom layers designed to enhance object detection capabilities. This architecture aims to leverage the robust feature extraction capabilities of ResNet50 while tailoring the model's head for specific object detection tasks.
- Base Model: ResNet50, known for its deep residual learning framework, which facilitates training of deeper networks by addressing the vanishing gradient problem.
- Custom Layers: Sequential layers have been added to the model's head, including ReLU-activated fully connected layers, aimed at refining the feature representations for object detection (a minimal sketch follows this list).
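A minimal sketch of this pattern using torchvision is shown below; the hidden size and output dimension are illustrative assumptions and not necessarily those of the project's CustomResNet.

```python
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained ResNet50 backbone
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Replace the classification head with ReLU-activated fully connected layers
num_outputs = 4 + 1  # e.g. 4 box coordinates + 1 class score (illustrative)
backbone.fc = nn.Sequential(
    nn.Linear(backbone.fc.in_features, 512),  # 512 hidden units is an illustrative choice
    nn.ReLU(),
    nn.Linear(512, num_outputs),
)
```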
This approach draws inspiration from the following papers, which explore enhancements to convolutional neural network architectures for improved performance in object detection tasks:
- He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
- Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580-587).
These works demonstrate the effectiveness of deep residual learning and feature hierarchies in object recognition, principles that underpin the design of our CustomResNet model.
YAML files play a crucial role in configuring the YOLOv7 model for training and inference. These files specify model parameters, paths to datasets, and other configuration settings that ensure the model is trained with the correct data and hyperparameters.
- **Configuration**: Edit the YAML file to include the correct paths to your training and validation datasets. Additionally, set any model-specific parameters such as input size, number of classes, and hyperparameters (an example dataset YAML is sketched at the end of this section).
- **Training**: When initiating the training process, pass the YAML file as an argument to specify the configuration to be used. Example command:

  ```bash
  python train.py --cfg path_to_your_yaml_file.yaml
  ```

- **Inference**: Similarly, for inference, ensure the YAML file used for training is referenced to maintain consistency in model behavior and performance.
- Documentation: Clearly document any changes made to the default configuration to facilitate reproducibility.
- Version Control: Keep versions of your YAML configurations to track modifications over time and experiment with different settings.
By carefully managing and utilizing YAML files, you can effectively control the behavior of the YOLOv7 model, optimizing it for your specific object detection tasks.
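As a point of reference, a YOLO-style dataset YAML typically looks like the sketch below; the paths, class count, and class names are placeholders rather than this project's actual configuration.

```yaml
# Dataset configuration (paths and class names are placeholders)
train: ./data/train/images     # directory of training images
val: ./data/val/images         # directory of validation images

nc: 2                          # number of classes
names: ['class_a', 'class_b']  # class names, indexed by class ID
```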
This guide provides instructions on how to run the Dockerized version of the Flask app, which serves an ONNX model for object detection tasks.
Before proceeding, ensure you have Docker installed on your system. If you need to install Docker, follow the official documentation: [Get Docker](https://docs.docker.com/get-docker/).
- **Clone the Repository**

  First, clone the repository containing the Flask app and navigate into the project directory:

  ```bash
  git clone <repository_url>
  cd <project_directory>
  ```

  Replace `<repository_url>` with the URL of your Git repository and `<project_directory>` with the name of the directory into which you cloned the repository.

- **Build the Docker Image**

  From the project directory, build the Docker image using the following command:

  ```bash
  docker build -t myflaskapp .
  ```

  Here, `myflaskapp` is the name given to the Docker image. Feel free to replace it with a name of your choice.

- **Run the Docker Container**

  After the image has been successfully built, run the container using:

  ```bash
  docker run -p 5000:5000 myflaskapp
  ```

  This command runs the container and maps port 5000 of the container to port 5000 on your host machine, allowing you to access the Flask app at `http://localhost:5000`.
Once the app is running, you can interact with it using the following endpoints:
- **List Models**: Access `http://localhost:5000/models` to get a list of available models.
- **Make a Prediction**: Send a POST request to `http://localhost:5000/predict` with an image file to receive predicted bounding boxes and scores.
- **Get an Image with Predictions**: Send a POST request to `http://localhost:5000/predict-image` with an image file to receive the same image with bounding boxes drawn on it.
You can use tools like Postman or cURL to send POST requests. Here's an example cURL command to send an image to the prediction endpoint:

```bash
curl -X POST -F "file=@path_to_your_image.jpg" http://localhost:5000/predict
```

Replace `path_to_your_image.jpg` with the actual path to the image file you wish to analyze.
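Equivalently, the request can be sent from Python; the sketch below assumes the app is running locally on port 5000 and uses the same `file` form field as the cURL example (the exact response schema depends on the app).

```python
import requests

# Send an image to the prediction endpoint
with open("path_to_your_image.jpg", "rb") as f:
    response = requests.post(
        "http://localhost:5000/predict",
        files={"file": f},  # same form field name as in the cURL example
    )

response.raise_for_status()
print(response.json())  # predicted bounding boxes and scores
```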
To stop the running Docker container, you can use the Docker CLI. First, find the container ID using:

```bash
docker ps
```

Then, stop the container with:

```bash
docker stop <container_id>
```

Replace `<container_id>` with the actual ID of your container.
For any feedback or issues, please open an issue in the repository or submit a pull request with improvements.
This project has been informed and inspired by a variety of resources, ranging from technical guides to academic research. Below is a list of references that have contributed to the development and understanding of the technologies and methodologies used in this project:
- Markdown Guide - Hacks: https://www.markdownguide.org/hacks/
- He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778). https://arxiv.org/abs/1512.03385
- How to Train YOLOv7 on Custom Data - Paperspace Blog: https://blog.paperspace.com/train-yolov7-custom-data/
- Fine-tuning YOLOv7 on a Custom Dataset - LearnOpenCV: https://learnopencv.com/fine-tuning-yolov7-on-custom-dataset/
- Understanding Git Push and 'origin' - Warp Dev: https://www.warp.dev/terminus/understanding-git-push-origin
- Online Markdown Editor - TutorialsPoint: https://www.tutorialspoint.com/online_markdown_editor.php
- How to Train a Custom Object Detection Model with YOLOv7 - Analytics Vidhya: https://www.analyticsvidhya.com/blog/2022/08/how-to-train-a-custom-object-detection-model-with-yolov7/
- Git Documentation: https://git-scm.com/docs
- Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580-587).
- Implementing ResNet from Scratch: A comprehensive guide to building a Residual Network model from the ground up. (Placeholder for a real link)
- "Deep Residual Learning for Image Recognition" by Kaiming He et al. - This paper introduces the concept of deep residual learning and presents the ResNet architecture, laying the foundation for many modern deep learning approaches to computer vision. https://arxiv.org/abs/1512.03385