- Selected Model: Llama 3.2-1B
- Parameter Calculation:
- Calculated Parameters: 1,235,814,400
- Reported Parameters (in paper): 1.23 billion
- Comparison: The paper reports only an approximate parameter count (1.23 billion) for Llama 3.2-1B rather than an exact figure, so we compare it against the count computed in code. The two values match closely, validating our implementation and its alignment with the model architecture described in the paper (see the sketch below).
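A minimal sketch of the count, assuming Hub access to the gated meta-llama/Llama-3.2-1B checkpoint (the repo id and the summation approach are our choices, not prescribed by the paper):

```python
# Minimal sketch: count parameters by summing tensor sizes.
# Assumes access to the gated meta-llama/Llama-3.2-1B checkpoint on the Hub.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
total = sum(p.numel() for p in model.parameters())
print(f"Calculated parameters: {total:,}")  # 1,235,814,400
```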
- Datasets:
- Classification Task: SST-2 (Sentiment Analysis)
- Train-Test Split:
- Split ratio: 80% train, 20% test
- Sampling: stratified, so both splits preserve the label distribution (see the sketch below)
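A minimal sketch of the 80/20 stratified split, assuming SST-2 is loaded from the GLUE dataset on the Hub (column names `sentence` and `label`) and using scikit-learn:

```python
# Minimal sketch: 80/20 stratified split of SST-2.
# Assumes the GLUE/SST-2 columns "sentence" and "label".
from datasets import load_dataset
from sklearn.model_selection import train_test_split

sst2 = load_dataset("glue", "sst2", split="train")
train_texts, test_texts, train_labels, test_labels = train_test_split(
    sst2["sentence"],
    sst2["label"],
    test_size=0.2,            # 80% train / 20% test
    stratify=sst2["label"],   # preserve the positive/negative ratio in both splits
    random_state=42,
)
```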
- Transfer Learning Process:
- Classification Task: SST-2
- Loaded the pre-trained Llama 3.2-1B using AutoModelForSequenceClassification (a loading sketch follows).
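A minimal loading sketch; the pad-token handling is an assumption we add because Llama tokenizers ship without a pad token:

```python
# Minimal sketch: load Llama 3.2-1B with a newly initialized 2-class head.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token by default

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id
```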
- Metrics:
- Accuracy: Measures overall correctness.
- Precision: Measures the ratio of correctly predicted positive observations to the total predicted positives.
- Recall: Measures the ratio of correctly predicted positive observations to all actual positives in the dataset.
- F1 Score: Harmonic mean of precision and recall.
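These four metrics can be sketched with scikit-learn (the helper name `compute_metrics` is ours):

```python
# Minimal sketch: the four evaluation metrics on binary SST-2 labels.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def compute_metrics(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),    # overall correctness
        "precision": precision_score(y_true, y_pred),  # TP / (TP + FP)
        "recall": recall_score(y_true, y_pred),        # TP / (TP + FN)
        "f1": f1_score(y_true, y_pred),                # harmonic mean of the two
    }
```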
(Results table: Pretrained (Zero-shot) vs. Transfer-Learned scores on the metrics above.)
Note: In fine-tuning we add a task-specific head to the output of the pre-trained Llama 3.2-1B model (the base model). When importing with AutoModelForSequenceClassification, this head is newly initialized, and its weights are learned during fine-tuning.
- Classification Task: SST-2
- Pre-trained model parameters: 1,235,814,400
- Transfer-learned model parameters: 1,235,818,496
- Conclusion: The pre-trained and transfer-learned models differ in total parameter count because of the task-specific head added during fine-tuning. The base model parameters remain unchanged; the extra 4,096 parameters belong to the head and are trained only on the task-specific dataset (see the sketch below).
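The 4,096-parameter gap is exactly the size of the classification head: in Transformers' Llama implementation the head is a bias-free linear layer, and Llama 3.2-1B's hidden size is 2,048, so with two SST-2 labels the arithmetic works out as follows:

```python
# Sketch: the parameter difference equals the size of the classification head
# (a bias-free nn.Linear(hidden_size, num_labels) in Transformers' Llama).
hidden_size, num_labels = 2048, 2
head_params = hidden_size * num_labels
print(head_params)                   # 4096
print(1_235_814_400 + head_params)   # 1235818496 -> matches the fine-tuned count
```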
Fine-tuned models are uploaded to the 🤗 Hub:
- Classification Task: SST-2
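A minimal upload sketch (the repo id below is a placeholder, not the actual Hub path):

```python
# Minimal sketch: push the fine-tuned model and tokenizer to the Hub.
# "your-username/llama-3.2-1b-sst2" is a placeholder repo id.
model.push_to_hub("your-username/llama-3.2-1b-sst2")
tokenizer.push_to_hub("your-username/llama-3.2-1b-sst2")
```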
- Higher Scores of the Transfer-Learned Model:
- The transfer-learned models score higher than the pre-trained models evaluated zero-shot because fine-tuning teaches them the patterns specific to the SST-2 dataset.
- Their task-specific head, trained on SST-2, captures sentiment patterns effectively.
- This specialization makes them better suited to the SST-2 task than the zero-shot baseline.
- Understanding Parameter Behavior:
- The number of parameters in the fine-tuned model increases because of the task-specific head, which adds 4,096 parameters in total.
- The base model parameters remain unchanged: they are frozen during fine-tuning, and only the task-specific head is trained on the task-specific dataset (a freezing sketch follows).
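A sketch of the freeze, assuming Transformers' LlamaForSequenceClassification attribute names (`model.model` is the backbone, `model.score` is the head):

```python
# Sketch: freeze the backbone so only the task-specific head trains.
# Attribute names follow Transformers' LlamaForSequenceClassification.
for param in model.model.parameters():
    param.requires_grad = False  # backbone stays fixed

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")  # 4,096 -> just the head
```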
- Zero-Shot vs. Transfer-Learned Model Performance: