This is the official PyTorch implementation of the paper "Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator".
- PyTorch version >= 1.13.0
- Fairseq version >= 0.12.3
git clone https://github.com/LUMIA-Group/FourierTransformer.git
cd FourierTransformer
pip install -e .
For faster training, install NVIDIA's apex library, following the fairseq instructions (a typical installation is sketched below).
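The commands below are paraphrased from the fairseq README and are only a sketch; the exact build flags may vary with your apex and CUDA versions.
# Optional: build apex with fused CUDA kernels for faster training
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./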
# Download files for preprocessing
wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json'
wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe'
wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt'
- Download the Pile dataset from here.
- Preprocess the Pile:
# BPE
for SPLIT in train val test; do \
    python -m examples.roberta.multiprocessing_bpe_encoder \
        --encoder-json encoder.json \
        --vocab-bpe vocab.bpe \
        --inputs pile/${SPLIT}.raw.en \
        --outputs pile/${SPLIT}.bpe.en \
        --keep-empty \
        --workers 120; \
done

# Binarize
fairseq-preprocess \
    --only-source \
    --source-lang "en" \
    --srcdict dict.txt \
    --trainpref pile/train.bpe \
    --validpref pile/val.bpe \
    --testpref pile/test.bpe \
    --destdir pile-bin \
    --workers 60

Finally, rename the files in pile-bin by removing the ".en" suffix (a sketch is given below).
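A minimal renaming sketch (the exact output filenames depend on your fairseq version, so adjust the pattern if needed):
# Strip the ".en" part from the binarized file names, e.g. train.en.bin -> train.bin
cd pile-bin
for f in *.en.*; do mv "$f" "${f/.en./.}"; done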
- Script to further pretrain Fourier-Bart (in our paper, we randomly sliced 10GB of data from the Pile for further pretraining; an illustrative slicing command is sketched below).
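For illustration only (this is not the authors' script, and the file names are hypothetical), a crude way to carve roughly 10GB from the raw Pile text before BPE encoding:
# Take the first ~10GB of raw text; shuffle the file first (e.g. with shuf) if you want a random slice
head -c 10G pile/full.raw.en > pile/train.raw.en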
- Download, Preprocess and Binarize: Follow this script.
- Fine-tuning Fourier Transformer on the CNN-DM summarization task:
cd Summarization
sh submits/cnn-dm.sh
- Evaluate:
For calculating ROUGE, install files2rouge from here (a typical installation is sketched below).
sh submits/eval-cnn-dm.sh
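The files2rouge installation steps below are paraphrased from the files2rouge repository and are only a sketch, not part of this repo:
# Install files2rouge and its ROUGE dependency
pip install -U git+https://github.com/pltrdy/pyrouge
git clone https://github.com/pltrdy/files2rouge.git
cd files2rouge
python setup_rouge.py
python setup.py install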
- Download, Preprocess and Binarize: Follow this script.
- Fine-tuning Fourier Transformer on the ELI5 QA task:
cd Summarization
sh submits/eli5.sh
- Evaluate:
sh submits/eval_eli5.sh
As mentioned in our paper, the LRA code is built on this repository. Please follow the scripts there to prepare the datasets.
To run the LRA experiments:
cd LRA/code
sh run_tasks.sh
Feel free to play with different settings by modifying lra_config.py.