This repo is for learning purposes.
| Author | Hassan Algoz |
|---|---|
| Acknowledgement | this repo is adapted from: https://cs230.stanford.edu/blog/pytorch/ |
My work on the repo was:
- add code piece by piece to make sure I understand it
- build the transformer following the "Attention Is All You Need" paper (the decoder is yet to be implemented; however, the encoder alone was enough to build a sentiment analysis model)
- use DirectML for Radeon GPU support (see the sketch after this list)
- add `explore.ipynb` for understanding the data
- add `inspect.ipynb` for analyzing the trained model
- simplify the data loader
- use torch 2.0 (thanks to backward compatibility with torch 1.x, I did not face any issues with this)
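For reference, here is a minimal sketch of how a DirectML device can be selected with the `torch-directml` package; the exact device-selection code in this repo may differ:

```python
# Minimal sketch: selecting a DirectML device for a Radeon GPU.
# Assumes the torch-directml package is installed (pip install torch-directml).
import torch
import torch_directml

# Use the DirectML device when available, otherwise fall back to CPU.
device = torch_directml.device() if torch_directml.is_available() else torch.device("cpu")

x = torch.randn(2, 3).to(device)  # tensors (and models) are moved with .to(device)
```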
Given a text, can the model predict its label (e.g., spam/ham, or positive/negative)?
- Visit https://www.kaggle.com/code/matleonard/text-classification/input?select=yelp_ratings.csv and download `yelp_ratings.csv`.
- Place `yelp_ratings.csv` as `data/yelp_ratings.csv`.
- Explore the data using `explore.ipynb`.
- Run the notebook `split_data.ipynb` to split it into `train`, `test`, and `dev` sets (a minimal sketch of such a split follows this list).
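As a rough illustration, a train/dev/test split can be done along these lines; the actual logic lives in `split_data.ipynb`, and the ratios and output filenames here are assumptions:

```python
# Minimal sketch of a train/dev/test split, assuming pandas is available.
import pandas as pd

df = pd.read_csv("data/yelp_ratings.csv")
df = df.sample(frac=1, random_state=42).reset_index(drop=True)  # shuffle rows

n = len(df)
train_end = int(0.8 * n)  # illustrative 80/10/10 split
dev_end = int(0.9 * n)

df.iloc[:train_end].to_csv("data/train.csv", index=False)
df.iloc[train_end:dev_end].to_csv("data/dev.csv", index=False)
df.iloc[dev_end:].to_csv("data/test.csv", index=False)
```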
Here is what `yelp_ratings.csv` looks like:

```csv
"sentiment","text"
1,"Excellent food and staff. I hope the course hasn't undergone any changes like the restaurant atmosphere and food has!"
0,"I really, really wanted to like The Chickery. I imagine this is what prison food is like."
1,"Sabrina is my stylist. She always has great advice about how to style my hair and what products to use."
```
- Note that we have dropped the `stars` column, and have made `sentiment` the first column (a sketch of this preprocessing follows below).
- Small data can be useful when searching for hyperparameters. For actual training, the more data the better.
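If you were preparing such a file yourself, the column handling could look like this minimal pandas sketch; the raw input filename and its column set are assumptions, not files from this repo:

```python
# Minimal sketch: drop the stars column and put sentiment first.
# Assumes a raw CSV with (at least) "stars", "sentiment", and "text" columns.
import pandas as pd

raw = pd.read_csv("data/yelp_ratings_raw.csv")  # hypothetical input filename
clean = raw.drop(columns=["stars"])[["sentiment", "text"]]
clean.to_csv("data/yelp_ratings.csv", index=False)
```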
The purpose of `explore.ipynb` is to gain a better understanding of the data. Things like:
- Check for missing values
- Look at the distribution of the target variable
- Investigate relationships between features
- Visualize the data in various ways
- Check for any outliers or anomalies
- Gain insights that will help inform further analysis and modeling
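As a rough illustration, a few of these checks might look like the following pandas sketch (the actual notebook may differ):

```python
# Minimal sketch of basic data checks, assuming pandas is available.
import pandas as pd

df = pd.read_csv("data/yelp_ratings.csv")

print(df.isna().sum())                 # missing values per column
print(df["sentiment"].value_counts())  # distribution of the target variable

df["length"] = df["text"].str.len()
print(df["length"].describe())         # text-length spread; hints at outliers/anomalies
```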
We created a `base_model` directory for you under the `experiments` directory. It contains a file `params.json` which sets the hyperparameters for the experiment. It looks like:

```json
{
    "learning_rate": 1e-3,
    "batch_size": 5,
    "num_epochs": 2
}
```
For every new experiment, you will need to create a new directory under `experiments` with a `params.json` file.
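For orientation, a `params.json` like the one above can be loaded with a few lines of standard-library code; the repo's `utils.py` may wrap this differently:

```python
# Minimal sketch: reading params.json with the standard library.
import json

with open("experiments/base_model/params.json") as f:
    params = json.load(f)

print(params["learning_rate"], params["batch_size"], params["num_epochs"])
```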
To train your experiment, simply run:

```bash
python train.py --data_dir data/small --model_dir experiments/base_model
```
It will instantiate a model and train it on the training set following the hyperparameters specified in `params.json`. It will also evaluate some metrics on the development set.
We created a new directory `learning_rate` in `experiments` for you. Now, run:

```bash
python search_hyperparams.py --data_dir data/small --parent_dir experiments/learning_rate
```
It will train and evaluate a model with different values of learning rate defined in `search_hyperparams.py` and create a new directory for each experiment under `experiments/learning_rate/`.
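Conceptually, such a search resembles the sketch below; the learning-rate values, directory naming, and subprocess call are illustrative, not the exact contents of `search_hyperparams.py`:

```python
# Minimal sketch of a learning-rate search, assuming the parent directory
# holds a base params.json to copy from.
import json
import os
import subprocess

parent_dir = "experiments/learning_rate"
with open(os.path.join(parent_dir, "params.json")) as f:
    base_params = json.load(f)

for lr in [1e-4, 1e-3, 1e-2]:  # illustrative values, not the repo's exact list
    model_dir = os.path.join(parent_dir, f"learning_rate_{lr}")
    os.makedirs(model_dir, exist_ok=True)
    params = dict(base_params, learning_rate=lr)
    with open(os.path.join(model_dir, "params.json"), "w") as f:
        json.dump(params, f, indent=4)
    # launch one training run per hyperparameter value
    subprocess.run(["python", "train.py",
                    "--data_dir", "data/small",
                    "--model_dir", model_dir], check=True)
```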
Display the results of the hyperparameter search in a nice format:

```bash
python synthesize_results.py --parent_dir experiments/learning_rate
```
Once you've run many experiments and selected your best model and hyperparameters based on the performance on the development set, you can finally evaluate the performance of your model on the test set. Run:

```bash
python evaluate.py --data_dir data/small --model_dir experiments/base_model
```
Note that `evaluate.py` has an `evaluate()` function, which `train.py` uses to evaluate against the validation (`dev`) set, whereas running `python evaluate.py` directly does the evaluation against the `test` set.
The purpose of `inspect.ipynb` is to analyze the model:
- Identify any potential issues:
  - Consistently incorrect labels for certain examples
  - Overfitting or underfitting certain types of examples
- Investigate how the model made predictions by:
  - Reviewing the most important features
  - Examining how changes to the input values affect the predictions
- Form hypotheses for how to improve the model
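One concrete way to start such an analysis is to look at the examples the model gets wrong. A minimal pandas sketch follows; the predictions file and its columns are hypothetical, and `inspect.ipynb` may work differently:

```python
# Minimal sketch: inspecting misclassified examples.
# Assumes predictions were exported to a CSV with hypothetical columns
# "text", "label", and "prediction".
import pandas as pd

preds = pd.read_csv("experiments/base_model/predictions.csv")  # hypothetical file
wrong = preds[preds["label"] != preds["prediction"]]

print(f"{len(wrong)}/{len(preds)} examples misclassified")
print(wrong["text"].str.len().describe())  # are errors concentrated in long/short texts?
print(wrong.head(5)["text"].tolist())      # read a few failures to form hypotheses
```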
```
data/
embeddings/             - input vocabulary as vectors
reader.py               - defines how to read the dataset
evaluate.py
experiments/
    base_model/
        params.json     - hyperparameters for this experiment
model/
    attention_head.py
    encoder.py
    layer_norm.py
    net.py              - neural network, loss function and metrics
requirements.txt
split_data.ipynb        - the notebook that splits the data
train.ipynb             - same as train.py but used in blocks
train.py                - defines how the model is trained
utils.py                - general helpful functions
```