Skip to content

Commit

Permalink
Merge branch 'master' into gcie/metrics
Browse files Browse the repository at this point in the history
Conflicts:
	fairseq/data/handwriting/raw_handwriting_dataset.py
	fairseq/models/wav2vec/wav2vec2_scribblelens.py
  • Loading branch information
janchorowski committed Dec 30, 2020
2 parents dbe041f + 3573075 commit 428606b
Show file tree
Hide file tree
Showing 185 changed files with 8,552 additions and 2,593 deletions.
30 changes: 30 additions & 0 deletions .github/stale.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
# Configuration for probot-stale - https://github.com/probot/stale
# Mostly copied from github.com/facebook/react/blob/master/.github/stale.yml
# Number of days of inactivity before an issue becomes stale
daysUntilStale: 90
# Number of days of inactivity before a stale issue is closed
daysUntilClose: 7
# Issues with these labels will never be considered stale
exemptLabels:
- bug
# Label to use when marking an issue as stale
staleLabel: stale
issues:
# Comment to post when marking an issue as stale.
markComment: >
This issue has been automatically marked as stale.
**If this issue is still affecting you, please leave any comment** (for example, "bump"), and we'll keep it open.
We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!
# Comment to post when closing a stale issue.
closeComment: >
Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!
pulls:
# Comment to post when marking a pull request as stale.
markComment: >
This pull request has been automatically marked as stale.
**If this pull request is still relevant, please leave any comment** (for example, "bump"), and we'll keep it open.
We are sorry that we haven't been able to prioritize reviewing it yet. Your contribution is very much appreciated.
# Comment to post when closing a stale pull request.
closeComment: >
Closing this pull request after a prolonged period of inactivity. If this issue is still present in the latest release, please ask for this pull request to be reopened. Thank you!
18 changes: 14 additions & 4 deletions .github/workflows/build.yml
Original file line number Diff line number Diff line change
Expand Up @@ -19,26 +19,36 @@ jobs:
runs-on: ${{ matrix.platform }}

steps:
- uses: actions/checkout@v1
- uses: actions/checkout@v2

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v1
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}

- name: Conditionally install pytorch
if: matrix.platform == 'windows-latest'
run: pip3 install torch -f https://download.pytorch.org/whl/torch_stable.html

- name: Install locally
run: |
python -m pip install --upgrade pip
git submodule update --init --recursive
python setup.py build_ext --inplace
python -m pip install --editable .
- name: Install optional test requirements
run: |
python -m pip install fairscale iopath transformers
- name: Lint with flake8
run: |
pip install flake8
# stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics --extend-exclude fairseq/model_parallel/megatron
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --extend-exclude fairseq/model_parallel/megatron
- name: Run tests
run: |
python setup.py test
41 changes: 41 additions & 0 deletions .github/workflows/build_wheels.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
name: build_wheels

on:
push:
branches:
- v[0-9]+.[0-9]+.[x0-9]+
tags:
- v*

jobs:
build_wheels:
name: Build wheels on ${{ matrix.os }}
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, macos-latest]

steps:
- uses: actions/checkout@v2

- name: Install Python
uses: actions/setup-python@v2
with:
python-version: '3.7'

- name: Install cibuildwheel
run: |
python -m pip install cibuildwheel
- name: Build wheels for CPython
run: |
python -m cibuildwheel --output-dir dist
env:
CIBW_BUILD: "cp36-*64 cp37-*64 cp38-*64"
CIBW_MANYLINUX_X86_64_IMAGE: manylinux1
CIBW_BEFORE_BUILD: git submodule update --init --recursive && pip install .

- uses: actions/upload-artifact@v2
with:
name: wheels
path: ./dist/*.whl
4 changes: 0 additions & 4 deletions .gitmodules
Original file line number Diff line number Diff line change
@@ -1,7 +1,3 @@
[submodule "fairseq/models/huggingface/transformers"]
path = fairseq/models/huggingface/transformers
url = https://github.com/myleott/transformers.git
branch = fairseq
[submodule "fairseq/model_parallel/megatron"]
path = fairseq/model_parallel/megatron
url = https://github.com/ngoyal2707/Megatron-LM
Expand Down
25 changes: 16 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,8 @@ We provide reference implementations of various sequence modeling papers:
+ [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples/backtranslation/README.md)
+ [Adaptive Input Representations for Neural Language Modeling (Baevski and Auli, 2018)](examples/language_model/README.adaptive_inputs.md)
+ [Lexically constrained decoding with dynamic beam allocation (Post & Vilar, 2018)](examples/constrained_decoding/README.md)
+ [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (Dai et al., 2019)](examples/truncated_bptt/README.md)
+ [Adaptive Attention Span in Transformers (Sukhbaatar et al., 2019)](examples/adaptive_span/README.md)
+ [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples/translation_moe/README.md)
+ [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples/roberta/README.md)
+ [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples/wmt19/README.md)
Expand All @@ -59,8 +61,10 @@ We provide reference implementations of various sequence modeling papers:

### What's New:

* November 2020: Adopted [Hydra](https://github.com/facebookresearch/hydra) as a configuration framework;
[added documentation explaining how to use it for new and existing projects](docs/hydra_integration.md)
* December 2020: [GottBERT model and code released](examples/gottbert/README.md)
* November 2020: Adopted the [Hydra](https://github.com/facebookresearch/hydra) configuration framework
* [see documentation explaining how to use it for new and existing projects](docs/hydra_integration.md)
* November 2020: [fairseq 0.10.0 released](https://github.com/pytorch/fairseq/releases/tag/v0.10.0)
* October 2020: [Added R3F/R4F (Better Fine-Tuning) code](examples/rxf/README.md)
* October 2020: [Deep Transformer with Latent Depth code released](examples/latent_depth/README.md)
* October 2020: [Added CRISS models and code](examples/criss/README.md)
Expand All @@ -69,13 +73,13 @@ We provide reference implementations of various sequence modeling papers:
* August 2020: [Added lexically constrained decoding](examples/constrained_decoding/README.md)
* August 2020: [wav2vec2 models and code released](examples/wav2vec/README.md)
* July 2020: [Unsupervised Quality Estimation code released](examples/unsupervised_quality_estimation/README.md)

<details><summary>Previous updates</summary><p>

* May 2020: [Follow fairseq on Twitter](https://twitter.com/fairseq)
* April 2020: [Monotonic Multihead Attention code released](examples/simultaneous_translation/README.md)
* April 2020: [Quant-Noise code released](examples/quant_noise/README.md)
* April 2020: [Initial model parallel support and 11B parameters unidirectional LM released](examples/megatron_11b/README.md)

<details><summary>Previous updates</summary><p>

* March 2020: [Byte-level BPE code released](examples/byte_level_bpe/README.md)
* February 2020: [mBART model and code released](examples/mbart/README.md)
* February 2020: [Added tutorial for back-translation](https://github.com/pytorch/fairseq/tree/master/examples/backtranslation#training-your-own-model-wmt18-english-german)
Expand All @@ -99,10 +103,10 @@ We provide reference implementations of various sequence modeling papers:
+ beam search
+ Diverse Beam Search ([Vijayakumar et al., 2016](https://arxiv.org/abs/1610.02424))
+ sampling (unconstrained, top-k and top-p/nucleus)
+ lexically constrained decoding ([Post & Vilar, 2018](examples/constrained_decoding/README.md))
* large mini-batch training even on a single GPU via delayed updates
* mixed precision training (trains faster with less GPU memory on [NVIDIA tensor cores](https://developer.nvidia.com/tensor-cores))
* extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers
+ [lexically constrained decoding](examples/constrained_decoding/README.md) (Post & Vilar, 2018)
* [gradient accumulation](https://fairseq.readthedocs.io/en/latest/getting_started.html#large-mini-batch-training-with-delayed-updates) enables training with large mini-batches even on a single GPU
* [mixed precision training](https://fairseq.readthedocs.io/en/latest/getting_started.html#training-with-half-precision-floating-point-fp16) (trains faster with less GPU memory on [NVIDIA tensor cores](https://developer.nvidia.com/tensor-cores))
* [extensible](https://fairseq.readthedocs.io/en/latest/overview.html): easily register new models, criterions, tasks, optimizers and learning rate schedulers
* [flexible configuration](docs/hydra_integration.md) based on [Hydra](https://github.com/facebookresearch/hydra) allowing a combination of code, command-line and file based configuration

We also provide [pre-trained models for translation and language modeling](#pre-trained-models-and-examples)
Expand Down Expand Up @@ -131,6 +135,9 @@ pip install --editable ./

# on MacOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./

# to install the latest stable release (0.10.0)
# pip install fairseq==0.10.0
```

* **For faster training** install NVIDIA's [apex](https://github.com/NVIDIA/apex) library:
Expand Down
5 changes: 3 additions & 2 deletions docs/getting_started.rst
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,7 @@ well for the IWSLT 2014 dataset:
> mkdir -p checkpoints/fconv
> CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/iwslt14.tokenized.de-en \
--lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 \
--optimizer nag --lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 \
--arch fconv_iwslt_de_en --save-dir checkpoints/fconv
By default, :ref:`fairseq-train` will use all available GPUs on your machine. Use the
Expand Down Expand Up @@ -182,9 +182,10 @@ sure to update ``--master_addr`` to the IP address of the first node:
--arch transformer_vaswani_wmt_en_de_big --share-all-embeddings \
--optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
--lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
--lr 0.0005 --min-lr 1e-09 \
--lr 0.0005 \
--dropout 0.3 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
--max-tokens 3584 \
--max-epoch 70 \
--fp16
On SLURM clusters, fairseq will automatically detect the number of nodes and
Expand Down
Loading

0 comments on commit 428606b

Please sign in to comment.