Project Name

Welcome to a Japanese LLM built on the Open-Calm base model and fine-tuned on the Japanese Alpaca dataset. You can use this model for language-related tasks in Japanese.

Introduction

This LLM application was built to create an instruction-based Japanese LLM. After opening the application, you can ask any question or give it any language-related task, and the model will reply in Japanese.

Installation

To get started, you need to set up the Conda environment.

Step 1: Install Conda

If you haven't already, install Conda from the official Anaconda website and follow the installation instructions.

Step 2: Finetune

To fine-tune the model, first edit the 'config.py' file as needed. You can change the base model, the dataset, or any other training setting. The 'model_name' and 'dataset' values should be the names of a Hugging Face model and a Hugging Face dataset.

model_name = 'cyberagent/open-calm-7b'
dataset = 'fujiki/japanese_alpaca_data'
dataset_dir = 'Dataset/JP_Alpaca'
peft_name = 'lora-calm-7b'
output_dir = 'lora-calm-7b-japanese-alpaca_v1'
CUTOFF_LEN = 1024 
VAL_SET_SIZE = 2000
idx = 5

eval_steps = 200
save_steps = 200
logging_steps = 20
EPOCHS = 100
LE = 3e-4

R=8
ALPHA=16
DROPOUT=0.05
target_modules = 'query_key_value'
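
To give a feel for what the LoRA settings above control, here is a minimal sketch that computes the standard LoRA scaling factor (ALPHA / R) and the number of trainable parameters one rank-R adapter adds to a single weight matrix. The matrix shape used here (4096 inputs and 3 x 4096 outputs for a fused 'query_key_value' projection) is an illustrative assumption, not a value read from this repository, and the helper names are hypothetical.

```python
# Values copied from config.py above.
R = 8
ALPHA = 16


def lora_scaling(r: int, alpha: int) -> float:
    """Factor applied to the low-rank update (alpha / r)."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters of one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


# Assumed shape of one fused query_key_value projection (illustrative only).
HIDDEN = 4096
print(lora_scaling(R, ALPHA))
print(lora_param_count(HIDDEN, 3 * HIDDEN, R))
```

Raising R increases the adapter size linearly, while ALPHA only changes how strongly the adapter output is weighted.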

Once 'config.py' is set up, run the 'finetune_lora.py' file.

python finetune_lora.py

You can also view the training logs in TensorBoard with the following command.

tensorboard --logdir lora-calm-7b-japanese-alpaca_v1/runs
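
The 'logging_steps', 'eval_steps' and 'save_steps' values in 'config.py' decide how often new points show up in TensorBoard and how often a checkpoint is written. The sketch below assumes that 'finetune_lora.py' passes these values to a step-based trainer (such as the Hugging Face Trainer) and that checkpoints are named 'checkpoint-<step>'; the function name is hypothetical.

```python
from typing import List

# Values copied from config.py above.
logging_steps = 20
eval_steps = 200
save_steps = 200


def events_at(step: int) -> List[str]:
    """Return the periodic actions that fall on a given global training step."""
    events = []
    if step % logging_steps == 0:
        events.append("log")
    if step % eval_steps == 0:
        events.append("eval")
    if step % save_steps == 0:
        # Assumed checkpoint naming: checkpoint-<global step>.
        events.append(f"save checkpoint-{step}")
    return events


for step in (20, 100, 200):
    print(step, events_at(step))
```

With these settings, every tenth logging point coincides with an evaluation and a saved checkpoint.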

Inference Single

To run a single inference, set the correct PEFT model path in 'config.py' and run the 'inference_lora_single.py' file. Then provide the necessary inputs to get a response from the LLM.

peft_model_path = 'lora-calm-7b-japanese-alpaca_v1/checkpoint-164000'

python inference_lora_single.py
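
If you are unsure which checkpoint directories exist, a small helper like the one below can list the training output folder and pick the one with the highest step number. This is only a sketch: it assumes checkpoints are stored as 'checkpoint-<step>' subdirectories of 'output_dir', and the function name is hypothetical. The most recent checkpoint is not necessarily the best one, so you may still want to choose by validation loss.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the 'checkpoint-<step>' subdirectory with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = re.fullmatch(r"checkpoint-(\d+)", child.name)
        # Compare steps as integers so checkpoint-1000 ranks above checkpoint-200.
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return str(best_path) if best_path is not None else None


# output_dir from config.py; prints None if no checkpoints exist yet.
print(latest_checkpoint("lora-calm-7b-japanese-alpaca_v1"))
```

The returned path can then be copied into 'peft_model_path' in 'config.py'.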

Inference Batch

To run batch inference on the test set of a custom dataset, set BATCH_SIZE and the path where the inference JSON will be saved in 'config.py'

BATCH_SIZE = 16
batch_inference_json_path = 'test_inference_dict_japanese_alpaca_best_train-loss.json'

Then run the 'inference_lora_batch.py' file. After some time, the script creates a JSON file containing the ground truths and predictions, which you can use for evaluation.

python inference_lora_batch.py
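
The general shape of batch inference can be sketched as follows: split the test prompts into chunks of BATCH_SIZE, generate a reply for each chunk, and store the predictions next to the ground truths as JSON. This is not the code of 'inference_lora_batch.py'; the JSON keys, the function names and the 'fake_generate' stand-in for the model call are all hypothetical.

```python
import json
from typing import Callable, Dict, Iterator, List

BATCH_SIZE = 16  # from config.py


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_batch_inference(
    prompts: List[str],
    references: List[str],
    generate: Callable[[List[str]], List[str]],
    size: int = BATCH_SIZE,
) -> Dict[str, List[str]]:
    """Collect predictions batch by batch alongside their ground truths."""
    predictions: List[str] = []
    for batch in batches(prompts, size):
        predictions.extend(generate(batch))
    # Hypothetical JSON layout; the real script may use different keys.
    return {"ground_truths": list(references), "predictions": predictions}


def fake_generate(batch: List[str]) -> List[str]:
    """Stand-in for the model call so the sketch runs without a GPU."""
    return [f"reply to: {prompt}" for prompt in batch]


prompts = [f"question {i}" for i in range(40)]
references = [f"answer {i}" for i in range(40)]
result = run_batch_inference(prompts, references, fake_generate)

# ensure_ascii=False keeps Japanese text readable in the serialized output.
serialized = json.dumps(result, ensure_ascii=False, indent=2)
print(len(result["predictions"]), "predictions serialized")
```

A larger BATCH_SIZE usually speeds up inference but needs more GPU memory, so it is worth lowering the value if you run out of memory.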

Inference

To run the app, just run the following command:

python app.py


Hujaifa-Git/LLM-Finetune-on-Custom-Dataset-Japanese
