Commit 817fe85: Update README.md

Add url for xl finetune and generation

king-menin authored Mar 9, 2021 · 1 parent 3f4ee27

Showing 1 changed file with 6 additions and 0 deletions: README.md
@@ -141,6 +141,8 @@

You can test your deepspeed installation with the following command: `ds_report`.

An example of inference with RuGPT3XL is [here](examples/ruGPT3XL_generation.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sberbank-ai/ru-gpts/blob/master/examples/ruGPT3XL_generation.ipynb).
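
For a quick start, the following is a minimal generation sketch in the spirit of that notebook; the wrapper import path, model name, and generation arguments are taken from the example notebook and may have changed, so treat this as a sketch rather than the definitive API:

```python
# Minimal generation sketch; see examples/ruGPT3XL_generation.ipynb
# for the up-to-date API (the wrapper path below is an assumption).
from src.xl_wrapper import RuGPT3XL

# Load the pretrained XL model (downloads the weights on first use).
gpt = RuGPT3XL.from_pretrained("sberbank-ai/rugpt3xl", seq_len=512)

# Generate a continuation of a Russian prompt.
print(gpt.generate(
    "Кто был президентом США в 2020? ",
    max_length=50,
    no_repeat_ngram_size=3,
    repetition_penalty=2.0,
))
```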

An example of finetuning, loading the finetuned model, and generating text is [here](examples/ruGPT3XL_finetune_example.ipynb).

To use sparse layers in the model, pass ```--sparse-mode <mode>``` and specify the `"sparse_attention"` key in the deepspeed config (RuGPT3XL config [example](src/deepspeed_config/gpt3_xl_sparse_2048.json)). Available modes: `fixed`, `bigbird`, `bslongformer`, `variable`, `dense`.
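
As a sketch, a `"sparse_attention"` section for the `fixed` mode could look like the following; the field names follow the DeepSpeed sparse-attention documentation, but the values here are illustrative rather than copied from the linked config:

```json
{
  "sparse_attention": {
    "mode": "fixed",
    "block": 16,
    "different_layout_per_head": true,
    "num_local_blocks": 4,
    "num_global_blocks": 1,
    "attention": "unidirectional",
    "horizontal_global_attention": false,
    "num_different_global_patterns": 4
  }
}
```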

More information about sparse attention is available [here](https://www.deepspeed.ai/tutorials/sparse-attention/).
@@ -160,8 +162,12 @@

The final perplexity on the test set is `12.05`.
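
For reference, perplexity here is the exponential of the mean per-token cross-entropy loss; the snippet below is a generic illustration of that relationship, not code from this repo:

```python
import math

# Perplexity is exp(mean per-token cross-entropy, measured in nats).
def perplexity(mean_cross_entropy: float) -> float:
    return math.exp(mean_cross_entropy)

# A mean test loss of about 2.489 nats matches the reported value:
print(round(perplexity(2.489), 2))  # 12.05
```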

See more details about generation [here](examples/ruGPT3XL_generation.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sberbank-ai/ru-gpts/blob/master/examples/ruGPT3XL_generation.ipynb).

Our pretraining script is [here](scripts/deepspeed_gpt3_xl.sh).

An example finetuning script is [here](scripts/deepspeed_gpt3_xl_finetune.sh).

### Pretraining ruGPT3Large
The model was trained with a sequence length of 1024 using the transformers lib by the [SberDevices](https://sberdevices.ru/) team on 80B tokens for 3 epochs. After that, the model was finetuned for 1 epoch with a sequence length of 2048.
