# ruGPT3-(Small, Medium, Large, XL)
This repository contains a collection of autoregressive transformer language models trained on a large dataset of the Russian language.

Russian GPT-3 models (ruGPT3XL, ruGPT3Large, ruGPT3Medium, ruGPT3Small) were trained with a sequence length of 2048 using sparse and dense attention blocks. We also provide a Russian GPT-2 large model (ruGPT2Large) trained with a sequence length of 1024.

We suggest using ruGPT2Large or ruGPT3XL, because these models are well tested and achieve the best perplexity.

Usage examples are described in detail [here](examples/).

**Note: if you cannot download a checkpoint, try adding it to your Google Drive following this [guide](https://www.geekrar.com/fix-bypass-google-drive-download-limit-error/).**


## Table of contents
* Setup
* [Setup ruGPT3XL](#Setup-ruGPT3XL)
* [Setup ruGPT3Large](#Setup-ruGPT3Large)
* [Setup ruGPT3Medium](#Setup-ruGPT3Medium)
* [Setup ruGPT3Small](#Setup-ruGPT3Small)
* [Setup ruGPT2Large](#Setup-ruGPT2Large)
* Pretraining
* [Pretraining ruGPT3XL](#Pretraining-ruGPT3XL)
* [Pretraining ruGPT3Large](#Pretraining-ruGPT3Large)
* [Pretraining ruGPT3Medium](#Pretraining-ruGPT3Medium)
* [Pretraining ruGPT3Small](#Pretraining-ruGPT3Small)
* [Pretraining ruGPT2Large](#Pretraining-ruGPT2Large)
* Usage
* [Usage ruGPT3XL](#Usage-ruGPT3XL)
* [Usage ruGPT3Large](#Usage-ruGPT3Large)
* [Usage ruGPT3Medium](#Usage-ruGPT3Medium)
* [Usage ruGPT3Small](#Usage-ruGPT3Small)
* [Usage ruGPT2Large](#Usage-ruGPT2Large)


## Setup
### Setup ruGPT3XL
Details of setting up the XL model are described on a separate page [here](gw/).



### Setup ruGPT3Large
This model reuses code from [Microsoft fork of Megatron-LM](https://github.com/microsoft/DeepSpeedExamples/tree/master/Megatron-LM).
It supports Python 3.6 only.

For sparse attention operations, additionally install torch-blocksparse:

```
pip install torch-blocksparse
```

Torch-Blocksparse depends on CUDA 10.1 and the [Triton](https://github.com/ptillet/triton) language compiler, which requires llvm-9.


### Setup ruGPT3Medium
For this model you can use the Megatron-LM code in our repo or the transformers interface. Accordingly, follow the setup instructions for ruGPT3Large or ruGPT2Large, respectively.


### Setup ruGPT3Small
For this model you can use the code from Microsoft's Megatron-LM in our repo or the transformers interface. Accordingly, follow the setup instructions for ruGPT3Large or ruGPT2Large, respectively.


### Setup ruGPT2Large
This model is smaller and was trained with [transformers==v2.8.0](https://github.com/huggingface/transformers/tree/v2.8.0).
To install it, run:

```
pip install transformers
```

## Pretraining
All pretraining was done on Nvidia Tesla V100-SXM3 32 GB GPUs on the [Christofari](https://sbercloud.ru/ru/christofari) cluster. The details of pretraining for each model are given below.


### Pretraining ruGPT3XL
The model was trained with a sequence length of 512 using [Deepspeed](https://github.com/microsoft/DeepSpeed) and [Megatron](https://github.com/NVIDIA/Megatron-LM) code by the [SberDevices](https://sberdevices.ru/) team, on a dataset of 80B tokens, for 4 epochs. After that, the model was finetuned for 1 epoch with a sequence length of 2048.
*Note: the model uses sparse attention blocks.*

Total training time was around 10 days on 256 GPUs.
Final perplexity on the test set is `12.05`.

See more details [here](gw/).
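
For reference, a test-set perplexity figure of this kind can be estimated with a short script like the sketch below. This is an illustration only, not the authors' evaluation pipeline: it assumes a recent `transformers` release, uses the GPT-2-compatible ruGPT3Large checkpoint (ruGPT3XL itself needs the sparse-attention setup from [gw/](gw/)), and `test.txt` is a placeholder for a held-out text file.

```
# Minimal perplexity sketch (illustrative; not the authors' evaluation pipeline).
# Assumes a recent transformers release and the GPT-2-compatible Large checkpoint.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_name = "sberbank-ai/rugpt3large_based_on_gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

# "test.txt" is a placeholder for a held-out Russian text file.
text = open("test.txt", encoding="utf-8").read()
enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)

with torch.no_grad():
    # Labels equal to input_ids: the model shifts them internally for next-token loss.
    out = model(**enc, labels=enc["input_ids"])

print(f"perplexity: {math.exp(out.loss.item()):.2f}")
```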


### Pretraining ruGPT3Large
The model was trained with a sequence length of 1024 using the transformers library by the [SberDevices](https://sberdevices.ru/) team on 80B tokens for 3 epochs. After that, the model was finetuned for 1 epoch with a sequence length of 2048.
*To load the transformers checkpoint, use `--load-openai`.*

You can obtain this model here [GDrive](https://drive.google.com/file/d/1t4xw-nv

🤗HuggingFace model card [link](https://huggingface.co/sberbank-ai/rugpt3large_based_on_gpt2)


### Pretraining ruGPT3Medium
The model was trained with a sequence length of 1024 using the transformers library by the [SberDevices](https://sberdevices.ru/) team on 80B tokens for 3 epochs. After that, the model was finetuned on a 2048 context.

You can obtain this model here [GDrive](https://drive.google.com/file/d/1Lb9ILKw

🤗HuggingFace model card [link](https://huggingface.co/sberbank-ai/rugpt3medium_based_on_gpt2)


### Pretraining ruGPT3Small
The model was trained with a sequence length of 1024 using transformers by the [SberDevices](https://sberdevices.ru/) team on 80B tokens for around 3 epochs. After that, the model was finetuned on a 2048 context.

You can obtain this model here [GDrive](https://drive.google.com/file/d/19dyhhay

🤗HuggingFace model card [link](https://huggingface.co/sberbank-ai/rugpt3small_based_on_gpt2)


### Pretraining ruGPT2Large
The model was trained with a sequence length of 1024 using transformers by the [SberDevices](https://sberdevices.ru/) team on 170 GB of data on 64 GPUs for 3 weeks.

You can obtain this model here [GDrive](https://drive.google.com/file/d/1r65MwU0arie8NggxpSmc_3Ja5ldRNS70/view?usp=sharing) [Yandex.Disk](https://yadi.sk/d/BRbn4fl9wqKy0w) [GDrive option-2](https://drive.google.com/file/d/17YuV-uuhSVvMD1cnTe7cR-qscb3BtTiG/view?usp=sharing) or use transformers with model name `sberbank-ai/rugpt2large` (see [usage](#Usage-ruGPT2Large) for details).

🤗HuggingFace model card [link](https://huggingface.co/sberbank-ai/rugpt2large)


## Usage
### Usage ruGPT3XL
See all the details [here](gw/) or run the example in [![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sberbank-ai/ru-gpts/blob/master/examples/ruGPT3XL_generation.ipynb)


### Usage ruGPT3Large
We provide two scripts for pretraining and generation with the ruGPT3Large model. Save and load model checkpoints with `--save` and `--load`.


Example of generation in [![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sberbank-ai/ru-gpts/blob/master/examples/ruGPT3_generation_example.ipynb)


### Usage ruGPT3Medium
You can run the megatron script with the option `--load-openai` or use the transformers interface, as in the sketch below:

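A minimal generation sketch with the transformers interface is shown below. It is an illustration rather than the repository's own example: it assumes a recent `transformers` release, and the prompt and sampling settings are arbitrary.

```
# Illustrative generation sketch for ruGPT3Medium via the transformers interface.
# Assumes a recent transformers release; the prompt and sampling settings are arbitrary.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_name = "sberbank-ai/rugpt3medium_based_on_gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

prompt = "Александр Сергеевич Пушкин родился в "
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_length=100,
        do_sample=True,
        top_k=50,
        top_p=0.95,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```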


### Usage ruGPT3Small
You can run the megatron script with the option `--load-openai` or use the transformers interface in the same way as for ruGPT3Medium above, with model name `sberbank-ai/rugpt3small_based_on_gpt2`.


Example of finetuning on essays and generation in [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sberbank-ai/ru-gpts/blob/master/examples/Finetune_ruGPT3Small.ipynb)
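
For orientation, a compact finetuning sketch is given below. It is not the recipe from the Colab linked above: it assumes a recent `transformers` release, and the training file, output directory, and hyperparameters are placeholders.

```
# Illustrative finetuning sketch for ruGPT3Small (not the Colab recipe linked above).
# The training file, output directory, and hyperparameters are placeholders.
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2Tokenizer,
    TextDataset,
    Trainer,
    TrainingArguments,
)

model_name = "sberbank-ai/rugpt3small_based_on_gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Plain-text training file; TextDataset chunks it into fixed-size token blocks.
train_dataset = TextDataset(tokenizer=tokenizer, file_path="essays_train.txt", block_size=512)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="rugpt3small_essays",
    overwrite_output_dir=True,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    save_steps=500,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
)

trainer.train()
trainer.save_model("rugpt3small_essays")
```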


### Usage ruGPT2Large
We provide two scripts that pretrain and generate with ruGPT2Large using the original [transformers](https://github.com/huggingface/transformers/tree/v2.8.0) code.

We can pass raw text files to the model.
##### Running script
`bash ./scripts/pretrain_ruGPT2Large.sh`

This script runs single-GPU ruGPT2Large pretraining. It contains the command for running on [Christofari](https://sbercloud.ru/ru/christofari):

```
python pretrain_transformers.py \
```

The pretrained model can also be loaded directly via the transformers interface:

```
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/rugpt2large")
model = AutoModel.from_pretrained("sberbank-ai/rugpt2large")
```

#### Text Generation
`bash ./scripts/generate_ruGPT2Large.sh`

Starts an interactive terminal session that generates text either conditionally or unconditionally depending on what the user enters into the prompt.
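
For illustration, a minimal interactive loop of this kind could look like the sketch below. It is not the script above: it assumes a recent `transformers` release and uses arbitrary sampling settings.

```
# Illustrative interactive generation loop (not generate_ruGPT2Large.sh itself).
# Assumes a recent transformers release; sampling settings are arbitrary.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_name = "sberbank-ai/rugpt2large"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

while True:
    prompt = input("Prompt (leave empty for unconditional generation, Ctrl-C to exit): ")
    if prompt:
        input_ids = tokenizer.encode(prompt, return_tensors="pt")
    else:
        # Unconditional generation: start from the BOS token (fall back to id 0 if unset).
        bos_id = tokenizer.bos_token_id if tokenizer.bos_token_id is not None else 0
        input_ids = torch.tensor([[bos_id]])
    with torch.no_grad():
        output = model.generate(input_ids, max_length=100, do_sample=True, top_k=50, top_p=0.95)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```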
