Enhancing Image Classification Performance through Architectural Exploration and Hyperparameter Tuning on CIFAR-10 Dataset

Our project focuses on improving image classification performance with deep learning, using the CIFAR-10 dataset as the basis for evaluation. We experiment with aspects of model architecture such as network depth and skip connections between layers to understand how they affect classification accuracy. We also tune key hyperparameters such as the number of base channels and the depth and width factors (the number of classes is fixed at 10 for CIFAR-10). To assess our progress, we compare our optimized models with a baseline that combines a convolutional neural network (CNN) with transformer layers, which are intended to capture long-range dependencies in images. This comparison gives insight into the strengths and weaknesses of our approach. Overall, our goal is to achieve higher accuracy on CIFAR-10 by refining model architecture and parameter settings.

Dataset Description

CIFAR-10: A Foundational Dataset for Advancing Image Classification

The CIFAR-10 dataset is the basis for our research into improving image classification performance through architectural and hyperparameter optimization. This collection of 60,000 32x32 RGB images is a widely adopted benchmark. It contains 10 classes covering vehicles and animals. Each class holds 6,000 images with substantial visual diversity in perspective, pose, size and type. The categories are airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships and trucks.

Methodology

This research pursues a multi-pronged methodology to improve image classification performance on the CIFAR-10 dataset. Three facets form the core strategy:

Architecture Exploration

We alter design dimensions including depth, width and connectivity to reveal their impact on accuracy. Increasing depth expands model capacity. Balancing width guards against under- and over-parameterization. Skip connections improve gradient flow and feature propagation.

[Figure: Architecture]
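The gradient-flow claim for skip connections can be made concrete with a short derivation. This is a generic sketch of a residual (skip) connection, not the project's exact block: F stands for the block's convolutional layers, x and y for its input and output, and \mathcal{L} for the loss. All of these symbols are illustrative assumptions.

```latex
y = x + F(x)
\qquad\Longrightarrow\qquad
\frac{\partial \mathcal{L}}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial y}
    \left( I + \frac{\partial F}{\partial x} \right)
```

The identity term I passes the upstream gradient to earlier layers unchanged, so the signal reaching them does not vanish even when \partial F / \partial x is small. The same identity path carries features forward, which is the sense in which skip connections help feature propagation.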

Hyperparameter Tuning

We optimize the main capacity drivers (number of classes, base channels, and depth and width factors) to find the settings that maximize representational power. Tuning follows a rigorous regime with statistical evaluation. A four-dimensional grid is built by systematically varying the hyperparameters num_classes, base_channels, depth_factor, and width_factor within reasonable ranges based on model size constraints. For each hyperparameter vector from this grid, a model instance is trained for 5 epochs while tracking evaluation metrics. Thereafter, inferences are drawn from the metric trends to select optimal configurations. The specific grid search hyperparameter values experimented with are:

  • num_classes: fixed at 10 for CIFAR-10
  • base_channels: 32, 64
  • depth_factor: 2, 3
  • width_factor: 2, 3

With the values listed above, the grid contains 1 x 2 x 2 x 2 = 8 unique hyperparameter combinations. Additional dimensions or values can be added to expand the search breadth. The optimal hyperparameter configuration is selected purely on the test accuracy attained after model convergence. Priority is assigned to peak generalization performance, which signals the model's readiness for real-world deployment. Among configurations with similar test accuracy, the one with the lowest complexity is preferred.
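The grid construction and the selection rule described above can be sketched as follows. This is a minimal illustration, not the project's code: `iter_configs`, `complexity` and `select_best` are hypothetical helper names, and the complexity proxy (a simple product of the three capacity hyperparameters) is an assumption standing in for whatever measure, such as parameter count, is actually used.

```python
from itertools import product

# Search space as listed above; num_classes is fixed by the dataset.
GRID = {
    "num_classes": [10],
    "base_channels": [32, 64],
    "depth_factor": [2, 3],
    "width_factor": [2, 3],
}


def iter_configs(grid):
    """Yield one dict per point of the Cartesian product of the grid values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def complexity(config):
    """Hypothetical complexity proxy (an assumption, not the project's metric).

    A larger base width, depth factor or width factor counts as more complex.
    """
    return config["base_channels"] * config["depth_factor"] * config["width_factor"]


def select_best(results, tolerance=0.0):
    """Pick the config with the highest test accuracy.

    `results` is a list of (config, test_accuracy) pairs. Configs whose accuracy
    is within `tolerance` of the best are treated as tied, and the tie is broken
    in favour of the lowest complexity.
    """
    best_accuracy = max(accuracy for _, accuracy in results)
    tied = [cfg for cfg, accuracy in results if best_accuracy - accuracy <= tolerance]
    return min(tied, key=complexity)


configs = list(iter_configs(GRID))
print(len(configs))  # number of grid points
```

In use, each config from `iter_configs` would be passed to a training routine, the resulting test accuracy recorded, and the collected pairs handed to `select_best`. The `tolerance` argument makes "similar test accuracy" explicit; its value is a judgment call that the text above leaves open.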

Comparative Testing

We evaluate against a Convolutional + Transformer archetype to highlight relative pros and cons, with head-to-head benchmarking on accuracy and efficiency and significant testing to ascertain the benefits of optimization.

The CNN-Transformer architecture integrates convolutional and attention-based modules. The Convolutional Neural Network (CNN) module uses convolutional filters to extract hierarchical visual features from images, starting with small 3x3 kernels to capture fundamental patterns. Max-pooling layers compress spatial dimensions for efficiency, while subsequent convolution layers build representational complexity.

The Transformer Encoder module captures global relationships within image representations by treating flattened CNN outputs as token sequences, which enables modeling of long-range inter-token dependencies via multi-headed self-attention. Encoder layers process the feature sequences, with multiple parallel attention heads weighing different aspects of the input. The self-attention block dynamically emphasizes informative elements within sequences. Mean pooling then condenses the transformer output sequence into a single representation, which a fully connected (FC) layer maps to class probabilities.

The CIFAR-10 dataset serves as a standardized benchmark for evaluating model capabilities, and input images are normalized before classification. The model is initialized for 10-class categorization, and cross-entropy loss with the Adam optimizer is used for iterative parameter updates, adhering to learning rate constraints.
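To make the "flattened CNN outputs as token sequences" step concrete, the sketch below traces tensor shapes through the pipeline using only arithmetic. It is an illustration under stated assumptions: the two conv + pool stages, the channel widths (32, 64), padding of 1 for the 3x3 convolutions, and 2x2 pooling are all hypothetical choices, since the text does not specify them.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution (standard output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, kernel=2, stride=2):
    """Spatial size after max-pooling with no padding."""
    return (size - kernel) // stride + 1


def token_sequence_shape(image_size=32, channels=(32, 64)):
    """Trace a square image through conv -> pool stages.

    Returns (num_tokens, embed_dim): each remaining spatial position becomes
    one token, and the final channel count is the token embedding size.
    The two-stage layout and channel widths are assumptions for illustration.
    """
    size = image_size
    for _ in channels:
        size = conv_output_size(size)  # 3x3 conv with padding 1 keeps the size
        size = pool_output_size(size)  # 2x2 max-pool halves it
    return size * size, channels[-1]


def mean_pool(tokens):
    """Average a sequence of equal-length token vectors into one vector."""
    count = len(tokens)
    return [sum(column) / count for column in zip(*tokens)]


num_tokens, embed_dim = token_sequence_shape()
print(num_tokens, embed_dim)                  # 64 64
print(mean_pool([[1.0, 2.0], [3.0, 6.0]]))    # [2.0, 4.0]
```

Under these assumptions a 32x32 image is reduced to an 8x8 feature map, so the transformer encoder sees 64 tokens of dimension 64, and mean pooling collapses them into one 64-dimensional vector for the FC layer.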

Results

The improved CNN model demonstrates promising performance on the CIFAR-10 dataset, achieving a training accuracy of 75.17% over 10 epochs. It also generalizes well, with a test set accuracy of 79.44%, which suggests that the architectural enhancements carry over to unseen data. In terms of computational efficiency, the model operates within modern hardware constraints, requiring 530 billion floating-point operations (FLOPs) for inference.

A comparative analysis with the CNN+Transformer baseline reveals stark differences. The CNN+Transformer reaches a peak training accuracy of 10.72% over 5 epochs on CIFAR-10, with a corresponding test set accuracy of 10.35%, which is at the level of random guessing for ten classes and falls far short of the improved CNN model. This poor performance signals difficulty in processing 2D image data, evidenced by fluctuating loss values and a failure to map inputs to correct outputs. Potential avenues for improvement include expanding model capacity, optimizing self-attention mechanisms for 2D data, and introducing custom data preprocessing strategies. However, adapting language-centric techniques to image tasks may cap the achievable accuracy, because flattening discards crucial spatial relationships.

In contrast, the improved CNN model exhibits a smoothly decreasing loss profile, leveling at 1.71 by the final epoch, which indicates effective convergence and leaves room for further refinement.

[Figure: Confusion Matrix for CNN-Transformer]
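Two reference points help in reading the numbers reported above: the accuracy of uniform random guessing on ten balanced classes, and the cross-entropy a model incurs when it assigns equal probability to every class. The short sketch below computes both. It assumes the reported losses are cross-entropy values in natural-log units, which the text does not state explicitly; the function names are illustrative.

```python
import math

NUM_CLASSES = 10  # CIFAR-10


def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing on balanced classes."""
    return 1.0 / num_classes


def uniform_cross_entropy(num_classes):
    """Cross-entropy (natural log) when every class gets probability 1/num_classes."""
    return math.log(num_classes)


print(f"chance accuracy: {chance_accuracy(NUM_CLASSES):.2%}")                 # 10.00%
print(f"uniform-prediction loss: {uniform_cross_entropy(NUM_CLASSES):.3f}")  # 2.303
```

Against these reference points, the CNN+Transformer's 10.35% test accuracy sits within half a percentage point of guessing, while the improved CNN's final loss of 1.71 is below the uniform-prediction value of about 2.303, consistent with a model that has learned some class structure.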
