This repository applies transfer learning with the Llama 3.2-1B/Gemma model for sentiment classification using the SST-2 dataset. The project involves freezing base model parameters, adding task-specific layers, and training for optimal performance. Results are analyzed and compared between zero-shot and transfer learning outputs.

Task - 1

Project: Transfer Learning and Evaluation of the Llama 3.2-1B/Gemma Model


1. Model Selection and Parameter Calculation

  • Selected Model: Llama 3.2-1B

  • Parameter Calculation:

    • Calculated Parameters: 1,235,814,400
    • Reported Parameters (in paper): 1.23 Billion

    Comparison: The paper reports only an approximate count (1.23 billion) rather than an exact figure, so we compare it against the value computed in code. The two agree closely, validating our implementation and its alignment with the model architecture described in the paper.
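A minimal sketch of the calculation, assuming the Hugging Face transformers library and access to the gated meta-llama/Llama-3.2-1B checkpoint:

```python
from transformers import AutoModelForCausalLM

# Load the base model and sum the element counts of all parameter tensors.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
total = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total:,}")  # expected: 1,235,814,400
```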


2. Transfer Learning Process

  • Datasets:

    • Classification Task: SST-2 (Sentiment Analysis)
  • Train-Test Split:

    • Split ratio: 80% train, 20% test
    • Sampling: Stratified sampling
  • Transfer Learning Process:

    1. Classification Task: SST-2

      • Loaded pre-trained Llama 3.2-1B using AutoModelForSequenceClassification, which attaches a task-specific classification head.
      • Froze all base-model parameters so that only the classification head is trainable.
      • Trained the head on the SST-2 training split and evaluated on the held-out test split, as sketched below.
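A minimal sketch of this setup, assuming the datasets and transformers libraries (the dataset repo id, seed, and padding choices are illustrative, not necessarily the project's exact settings):

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Stratified 80/20 train-test split on the label column.
sst2 = load_dataset("stanfordnlp/sst2", split="train")
splits = sst2.train_test_split(test_size=0.2, stratify_by_column="label", seed=42)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer.pad_token = tokenizer.eos_token  # Llama defines no pad token by default

# AutoModelForSequenceClassification attaches a randomly initialized
# classification head on top of the pre-trained base model.
model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-3.2-1B", num_labels=2
)
model.config.pad_token_id = tokenizer.pad_token_id

# Freeze the base model; only the new head (model.score) remains trainable.
for param in model.model.parameters():
    param.requires_grad = False
```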


3. Evaluation Metrics

Classification (SST-2)
  • Metrics:

    • Accuracy: Measures overall correctness.
    • Precision: Measures the ratio of correctly predicted positive observations to the total predicted positives.
    • Recall: Measures the ratio of correctly predicted positive observations to all positives in the dataset.
    • F1 Score: Harmonic mean of precision and recall.
    (Table: metric scores of the pretrained model under zero-shot evaluation vs. the transfer-learned model.)
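A minimal sketch of how these metrics can be computed with scikit-learn, given arrays of true labels and model predictions (names are illustrative):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(labels, preds):
    # SST-2 is binary (negative/positive), so use the binary average.
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```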

4. Model Parameters After Transfer-Learning

Note: In fine-tuning we add a task-specific head on top of the pre-trained Llama 3.2-1B (base) model. When the model is imported with AutoModelForSequenceClassification, this head is attached as a new, randomly initialized layer on top of the frozen base.

  1. Classification Task: SST-2
  • Pre-trained model parameters: 1,235,814,400
  • Transfer-learned model parameters: 1,235,818,496
  • Conclusion: The totals differ because of the task-specific head added during fine-tuning. The base-model parameters remain the same; the additional parameters belong to the head and are the only ones trained on the task-specific dataset.
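The difference is exactly the size of the classification head: Llama 3.2-1B has a hidden dimension of 2048, and the head added by AutoModelForSequenceClassification is a bias-free linear layer mapping it to the two SST-2 labels:

```python
hidden_size = 2048  # Llama 3.2-1B hidden dimension
num_labels = 2      # SST-2 labels: negative / positive
head_params = hidden_size * num_labels  # 4,096 parameters (no bias term)
assert 1_235_814_400 + head_params == 1_235_818_496
```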

5. Model Upload to Hugging Face

Fine-tuned models are uploaded to the 🤗 Hub:
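A minimal sketch of the upload step (the repo id below is a placeholder, not the project's actual Hub repository):

```python
from huggingface_hub import login

login()  # authenticate with a Hugging Face access token

# Push the fine-tuned model and tokenizer; the repo id is hypothetical.
model.push_to_hub("your-username/llama3.2-1b-sst2")
tokenizer.push_to_hub("your-username/llama3.2-1b-sst2")
```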


6. Analysis of Results

  1. Classification Task: SST-2
    • Higher Scores of Transfer Learned Model:

      • These models score higher than the pre-trained models under zero-shot evaluation because fine-tuning makes them task-specific: they have learned the patterns of the SST-2 dataset.
      • Their task-specific head, trained on SST-2, captures sentiment patterns effectively, so they outperform the zero-shot baseline.
    • Understanding Parameter Behavior:

      • The number of parameters in the fine-tuned model increases due to the addition of the task-specific head, which adds a total of 4,096 parameters.

      • The base-model parameters are frozen and remain unchanged; only the task-specific head is trained on the SST-2 dataset.

    • Zero-Shot vs. Transfer Learned Model Performance:

      • Zero-shot models generalize poorly on specialized tasks such as sentiment analysis or question answering, whereas transfer-learned models are task-specific and perform better on their respective tasks.

