This directory includes several complex conda environments. You should be able to use the various .yml
files to create new conda environments. If you don't already use conda, you need to install conda from here.
conda env create --name <envName> --file <envName>.yml
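For example, to build and activate the main CUDA environment described below (a sketch; swap in whichever environment file you need):
conda env create --name qagHfCuda --file qagHfCuda.yml
conda activate qagHfCuda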
The two main environments in this folder are for the SoC GPUs and for a local (conda) Python installation. The CUDA
version is for the SoC GPUs. Note that you may have to install some CUDA toolkit software before the environment installs correctly.
Here is an explanation of each saved environment:
- qagHfCuda.yml - the main environment to use for this repository and thesis. Use it for training LLaMA 2 with the trainer, data formatter, data processor, and other scripts in the src/ directory.
- qagHf.yml - the non-CUDA, local version of the environment. It should allow for data processing and inference with the final model.
- fastT5.yml - used for generating ONNX models of the Potsawee T5 QAG model. Creating the models seemed to work, but opening them and running inference never worked.
- qagLmqg.yml && qagLmqgCuda.yml - used to test the lmqg python package for model training. This did not work with LLaMA 2.
- qagT5.yml && qagT5Cuda.yml - the main environments used during the attempts to train T5 for QAG. They never fully worked, but showed promise. They also include packages for Optimum ONNX generation, which did work.
Otherwise, start a conda environment from scratch with:
conda create -n aqg
conda activate aqg
conda install python=3.11.5
Install the Python packages that you need.
For Optimum and T5:
pip install datasets evaluate fastt5 huggingface kaggle pandas numpy onnx onnxruntime optimum tokenizers torch transformers nltk
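As an optional sanity check that the ONNX/Optimum stack imported correctly (a minimal sketch, not part of the original workflow):
python -c "import onnx, onnxruntime, optimum, transformers; print('imports ok')"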
For LLaMA 2 training:
pip install datasets evaluate huggingface numpy pandas transformers tokenizers torch
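To confirm the install and check whether PyTorch can see a GPU (a simple, optional check):
python -c "import torch, transformers; print(torch.__version__, transformers.__version__, torch.cuda.is_available())"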
It's up to you to figure out how to install the CUDA toolkit on your machine.
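Once it is installed, these standard commands confirm that the toolkit and driver are visible:
nvcc --version
nvidia-smi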
To fine-tune a new model, set up the proper config for the model type you are training. There are three types:
- AE
- QG
- E2E
Then run trainer.py. The run stats should be available in the pbe_qag team on wandb.ai. Each model type is its own "project."
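If runs do not appear on wandb.ai, you may need to install and authenticate the wandb client first (this assumes the trainer uses the standard wandb integration):
pip install wandb
wandb login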
To run AE and QG training back-to-back, just start with type in qag.ini set to AE and run:
python trainer.py; python trainer.py
Then, once the first training has started, simply change type to QG and save the file. This will run AE training first and, when it completes, will run another training, but this time QG, as that is what qag.ini now specifies.
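If you would rather not edit qag.ini by hand mid-run, a scripted equivalent might look like this (a sketch that assumes qag.ini contains a line of the form type = AE and that GNU sed is available; adjust the pattern to match your file):
python trainer.py                              # AE training, as currently configured
sed -i 's/^type *= *AE/type = QG/' qag.ini     # flip the type for the second run
python trainer.py                              # QG training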
To run the model, specify whether you want pipeline or end-to-end generation in the config file, qag.ini. A value of AE or QG will result in pipeline generation; E2E will enable end-to-end generation. Then just run generator.py.
python generator.py
You will be in an inference loop where you can enter a verse reference for generation or press enter for a random verse.
The configuration for the project is in qag.ini. This file determines the current model source, data source, and inference prompt used. Data and logs are kept in the data folder.
The resources folder is for miscellaneous project resources.