diff --git a/docs/00-intro.adoc b/docs/00-intro.adoc
index 1edd07d..ad12fbd 100644
--- a/docs/00-intro.adoc
+++ b/docs/00-intro.adoc
@@ -1,9 +1,10 @@
 == Overview
-
 https://github.com/isi-nlp/rtg[Reader-Translator-Generator (RTG)^] is a Neural Machine Translation toolkit based on pytorch.
-link:versions.html[_See all versions_^]
+* link:versions.html[_See all versions_^]
+* Demo: 500-Eng multilingual NMT: http://rtg.isi.edu/many-eng/
+
 === Features
 * Reproducible experiments: one `conf.yml` that has everything -- data paths, params, and
@@ -17,21 +18,21 @@ link:versions.html[_See all versions_^]
 *** Lots of varieties of transformer: width varying, skip transformer, etc., configurable from YAML files
 *** https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf[RNN based Encoder-Decoder^] with https://nlp.stanford.edu/pubs/emnlp15_attn.pdf[Attention^]. (No longer using it, but it's available for experimentation)
 * Language Modeling: RNN, Transformer
-* And more ..
+* And more ...
 ** Easy and interpretable code (for those who read code as much as papers)
 ** Object Oriented Design. (Not too many levels of functions and function factories like Tensor2Tensor)
 ** Experiments and reproducibility are the main focus. To control an experiment, you edit a YAML file that is inside the experiment directory.
 ** Wherever possible, prefer https://www.wikiwand.com/en/Convention_over_configuration[convention-over-configuration^]. Have a look at this experiment directory structure (below).
 
 [#colab-example]
-=== Quick Start using Google Colab
+=== Google Colab Example
 Use this Google Colab notebook for learning __how to train your NMT model with RTG__: https://colab.research.google.com/drive/198KbkUcCGXJXnWiM7IyEiO1Mq2hdVq8T?usp=sharing
 
 === Setup
-`rtg` has been published to PyPi at https://pypi.org/project/rtg/
+image:https://badge.fury.io/py/rtg.svg["PyPI version", link="https://badge.fury.io/py/rtg"]
 
 ----
 pip install rtg
diff --git a/docs/10-conf.yml.adoc b/docs/10-conf.yml.adoc
index b07ce28..30932dc 100644
--- a/docs/10-conf.yml.adoc
+++ b/docs/10-conf.yml.adoc
@@ -1,4 +1,4 @@
-[#conf.yml]
+[#conf]
 == RTG *`conf.yml`* File
 
 The key component of the RTG toolkit is the `conf.yml` file. As the name suggests, it is a YAML file containing configuration
@@ -18,7 +18,7 @@ such as BPE/char/words, and vocabulary size.
 ** Suite - a set of source and reference file pairs, for computing BLEU scores
 
 [#conf-minimal]
-=== Minimal Yet Complete Config File:
+=== Config Example:
 .conf.yml
 [source,yaml]
 ----
@@ -92,6 +92,132 @@ updated_at: '2019-03-09T21:15:33.707183' # automatically updated by system
 seed: 12345 # fix the manual seed of pytorch + cuda + numpy + python_stdlib RNGs. Remove/comment this to disable
 ----
+[#config-opts]
+=== Config options
+
+.Summary of component choices
+[%autowidth]
+|===
+|Component | Choices
+
+|model
+|tfmnmt, rnnmt, rnnlm, tfmlm, skptfmnmt, wvtfmnmt, wvskptfmnmt, tfmextembmt, robertamt, mtfmnmt, hybridmt, CBOW, tfmcls
+
+|optimizer
+| adam, sgd, adagrad, adam_w, adadelta, sparse_adam
+
+|schedule
+| noam, inverse_sqrt
+
+|criterion
+|sparse_cross_entropy, kl_divergence, focal_loss, binary_cross_entropy, smooth_kld, triplet_loss, smooth_kld_and_triplet_loss, dice_loss, squared_error
+
+|===
+
+
+[#config-schedule]
+==== `schedule` options
+
+. `noam` with args:
+ * warmup
+ * constant
+ * model_dim
+
+. `inverse_sqrt` with args:
+ * warmup
+ * peak_lr
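+
+For illustration only, the sketch below shows how an optimizer and a schedule could be picked in `conf.yml`.
+It assumes that `schedule` takes the same `name`/`args` layout as the v0.6.0 optimizer block shown in the migration section; the numeric values are placeholders, not recommendations.
+
+[source,yaml]
+----
+optimizer:
+  name: adam           # any name from the optimizer row of the table above
+  args:
+    lr: 0.1            # placeholder; the effective rate is shaped by the schedule
+schedule:
+  name: noam           # or inverse_sqrt
+  args:
+    warmup: 4000       # placeholder warmup steps
+    constant: 2
+    model_dim: 512     # placeholder; typically the model's hidden size
+----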
+
+[#config-criterion]
+==== `criterion` options
+
+* `smooth_kld` (recommended; used since the first version of transformer)
+** `label_smoothing`: float : [0, 1] : optional: default=0.1
+
+.Args to `smooth_kld`
+|===
+|Name |Type| Range/Choices| Required |Default
+|`label_smoothing`
+|`float`
+| `[0.0, 1.0)`
+| Optional
+|0.1
+|===
+
+* `sparse_cross_entropy`
+
+.Args to `sparse_cross_entropy`
+|===
+|Name |Type| Range/Choices| Required |Default | Comment
+
+|`weight`
+|`str`
+| `{inv_freq, inv_sqrt_freq, inv_log_freq}`
+| Optional
+| None => disable weighting
+|
+
+|`weight_calm_time`
+|`int`
+| [0, )
+| Optional
+| 0 => disable calming;
+| Applicable when `weight` is enabled
+
+|===
+
+
+* `kl_divergence` (re-implementation of `smooth_kld` with some extra features)
+
+.Args to `kl_divergence`
+|===
+|Name |Type| Range/Choices| Required |Default
+
+|`label_smoothing`
+|`float`
+| `[0.0, 1.0)`
+| Optional
+| 0.0 => disable label smoothing
+
+|`weight`
+|`str`
+| `{inv_freq, inv_sqrt_freq, inv_log_freq}`
+| Optional
+| None => disable weighting
+
+|`weight_calm_time`
+|`int`
+| [0, )
+| Optional
+| 0 => disable calming => weights applicable from step 0
+
+|===
+
+* `focal_loss`
+
+.Args to `focal_loss`
+|===
+|Name |Type| Range/Choices| Required |Default
+
+|`gamma`
+|`float`
+| `[0.0, )`
+| Optional
+| 0.0 => disable => cross entropy
+
+|`weight_calm_time`
+|`int`
+| [0, )
+| Optional
+| 0 => disable calming => weights applicable from step 0
+
+|===
+
+* _Experimental loss functions:_
+** `dice_loss`
+** `binary_cross_entropy`
+** `triplet_loss`
+** `squared_error`
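+
+As a concrete reading of the tables above, a `criterion` block could be sketched as follows.
+This mirrors the `name`/`args` layout of the v0.6.0 config in the migration section; the values are placeholders only.
+
+[source,yaml]
+----
+criterion:
+  name: kl_divergence       # any name from the criterion row of the summary table
+  args:
+    label_smoothing: 0.1    # in [0.0, 1.0); 0.0 disables label smoothing
+    weight: inv_sqrt_freq   # optional; omit to disable token weighting
+    weight_calm_time: 10000 # placeholder; 0 applies weights from step 0
+----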
+
+
 [#conf-early-stop]
 === Early stop
 Add the below piece of config to `trainer` to enable early stop on convergence.
@@ -243,7 +369,7 @@ prep:
 ----
 
 [#conf-vocab]
-== Vocabulary Preprocessing using Sentencepiece or NLCodec
+== Vocabulary Preprocessing
 
 link:https://github.com/google/sentencepiece[Google's sentencepiece] is an awesome lib for preprocessing the text datasets.
diff --git a/docs/80-migration.adoc b/docs/15-migration.adoc
similarity index 61%
rename from docs/80-migration.adoc
rename to docs/15-migration.adoc
index 440679b..a5ebc03 100644
--- a/docs/80-migration.adoc
+++ b/docs/15-migration.adoc
@@ -1,11 +1,14 @@
+[#migrate]
+== Migration
+
 [#migrate-to-0_6]
-== Migration from v0.5.0 or earlier to v0.6.0
+=== v0.5.0 or earlier to v0.6.0
 
 The optimizer block got a big update in v0.6.0; as a result, it is not backward compatible.
 
 .Old config, prior to v0.6.0:
-[yaml]
+[source,yaml]
 ----
 optim:
   args:
@@ -24,7 +27,7 @@ optim:
   name: ADAM
 ----
 .New config in v0.6.0
-[yaml]
+[source,yaml]
 ----
 optimizer:
   name: adam
@@ -47,22 +50,3 @@ criterion:
   args:
     label_smoothing: 0.1
 ----
-
-
-=== Learning rate schedule
-
-. `noam` with args:
- * warmup
- * constant
- * model_dim
-
-. `inverse_sqrt` with args:
- * warmup
- * peark_lr
-
-=== Criterion
-. `cross_entropy`
- * label smoothing not implemented yet, FIXME: support label smoothing
-. `smooth_kld`
- * `label_smoothing`
-. Other (experimental): `binary_cross_entropy`, `triplet_loss`
\ No newline at end of file
diff --git a/docs/45-scaling.adoc b/docs/45-scaling.adoc
index 0f80a48..d003f29 100644
--- a/docs/45-scaling.adoc
+++ b/docs/45-scaling.adoc
@@ -1,5 +1,5 @@
 [#scaling-big]
-== Scaling to Big Datasets Using PySpark
+== Scaling Big Using PySpark
 
 When dealing with big datasets, the traditional tools such as multiprocessing and SQLite3 simply aren't enough.
 In such scenarios, https://spark.apache.org/[PySpark] is a useful backend to use.
diff --git a/docs/index.adoc b/docs/index.adoc
index 0437e1f..138871f 100644
--- a/docs/index.adoc
+++ b/docs/index.adoc
@@ -12,11 +12,14 @@ USC Information Sciences Institute Natural Language Group
 //injects google analytics to
 :docinfo2:
 :hide-uri-scheme:
+:source-highlighter: rouge
 
 include::00-intro.adoc[]
 
 include::10-conf.yml.adoc[]
 
+include::15-migration.adoc[]
+
 include::20-clitools.adoc[]
 
 include::30-environ.adoc[]
@@ -25,8 +28,9 @@ include::40-train-pro.adoc[]
 
 include::45-scaling.adoc[]
 
+
 include::50-serve.adoc[]
-include::60-develop.adoc[]
-include::80-migration.adoc[]
+
+include::60-develop.adoc[]
diff --git a/docs/v0.6.0/index.html b/docs/v0.6.0/index.html
index 1913696..c2eaed8 100644
--- a/docs/v0.6.0/index.html
+++ b/docs/v0.6.0/index.html
@@ -437,6 +437,221 @@
 #footer-text{color:rgba(0,0,0,.6);font-size:.9em}}
 @media amzn-kf8{#header,#content,#footnotes,#footer{padding:0}}
+