Commit

update docs
thammegowda committed Oct 21, 2021
1 parent fb73aec commit 4adfaf4
Showing 8 changed files with 1,087 additions and 535 deletions.
11 changes: 6 additions & 5 deletions docs/00-intro.adoc
@@ -1,9 +1,10 @@
== Overview


https://github.com/isi-nlp/rtg[Reader-Translator-Generator (RTG)^] is a Neural Machine Translation toolkit based on PyTorch.

link:versions.html[_See all versions_^]
* link:versions.html[_See all versions_^]
* Demo: 500-Eng multilingual NMT: http://rtg.isi.edu/many-eng/
=== Features
* Reproducible experiments: one `conf.yml` that has everything -- data paths, params, and
@@ -17,21 +18,21 @@ link:versions.html[_See all versions_^]
*** Many varieties of transformer: width-varying, skip transformer, etc., configurable from YAML files
*** https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf[RNN based Encoder-Decoder^] with https://nlp.stanford.edu/pubs/emnlp15_attn.pdf[Attention^]. (No longer using it, but it's available for experimentation)
* Language Modeling: RNN, Transformer
* And more ..
* And more ...
** Easy and interpretable code (for those who read code as much as papers)
** Object Oriented Design. (Not too many levels of functions and function factories like Tensor2Tensor)
** Experiments and reproducibility are the main focus. To control an experiment, you edit a YAML file inside the experiment directory.
** Wherever possible, prefer https://www.wikiwand.com/en/Convention_over_configuration[convention-over-configuration^]. Have a look at the experiment directory structure (below).

[#colab-example]
=== Quick Start using Google Colab
=== Google Colab Example

Use this Google Colab notebook for learning __how to train your NMT model with RTG__: https://colab.research.google.com/drive/198KbkUcCGXJXnWiM7IyEiO1Mq2hdVq8T?usp=sharing


=== Setup

`rtg` has been published to PyPI at https://pypi.org/project/rtg/
image:https://badge.fury.io/py/rtg.svg["PyPI version", link="https://badge.fury.io/py/rtg"]

----
pip install rtg
132 changes: 129 additions & 3 deletions docs/10-conf.yml.adoc
@@ -1,4 +1,4 @@
[#conf.yml]
[#conf]
== RTG *`conf.yml`* File

The key component of the RTG toolkit is `conf.yml`. As the name suggests, it is a YAML file containing configuration
@@ -18,7 +18,7 @@ such as BPE/char/words, and vocabulary size.
** Suite - a set of source and reference file pairs, for computing BLEU scores
[#conf-minimal]
=== Minimal Yet Complete Config File:
=== Config Example:

.conf.yml
[source,yaml]
@@ -92,6 +92,132 @@ updated_at: '2019-03-09T21:15:33.707183' # automatically updated by system
seed: 12345 # fix the manual seed of pytorch + cuda + numpy + python_stdlib RNGs. Remove/comment this to disable
----

[#config-opts]
=== Config options

.Summary of component choices
[%autowidth]
|===
|Component | Choices

|model
|tfmnmt, rnnmt, rnnlm, tfmlm, skptfmnmt, wvtfmnmt, wvskptfmnmt, tfmextembmt, robertamt, mtfmnmt, hybridmt, CBOW, tfmcls

|optimizer
| adam, sgd, adagrad, adam_w, adadelta, sparse_adam

|schedule
| noam, inverse_sqrt

|criterion
|sparse_cross_entropy, kl_divergence, focal_loss, binary_cross_entropy, smooth_kld, triplet_loss, smooth_kld_and_triplet_loss, dice_loss, squared_error

|===
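
For illustration, the optimizer-related components from the table could be wired together roughly as in the sketch below. This is a minimal sketch with placeholder values, not an authoritative example: the component names come from the table above, the accepted `args` are described in the following subsections, and the complete layout is shown in the config example earlier in this chapter.

[source,yaml]
----
# sketch only: one choice per component; values are placeholders
optimizer:
  name: adam
schedule:
  name: noam
criterion:
  name: smooth_kld
  args:
    label_smoothing: 0.1
----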


[#config-schedule]
==== `schedule` options

. `noam` with args:
* warmup
* constant
* model_dim

. `inverse_sqrt` with args:
* warmup
* peak_lr
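
As a hedged sketch, the two schedules could be configured as below. The arg names follow the lists above; the values are placeholders, not recommendations.

[source,yaml]
----
# noam schedule (sketch; values are placeholders)
schedule:
  name: noam
  args:
    warmup: 8000
    constant: 2
    model_dim: 512

# inverse_sqrt schedule (alternative; uncomment to use instead)
#schedule:
#  name: inverse_sqrt
#  args:
#    warmup: 4000
#    peak_lr: 0.0005
----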

[#config-criterion]
==== `criterion` options

* `smooth_kld` (recommended; used since the first version of transformer)
** `label_smoothing`: float : [0.0, 1.0) : optional: default=0.1
.Args to `smooth_kld`
|===
|Name |Type| Range/Choices| Required |Default
|`label_smoothing`
|`float`
| `[0.0, 1.0)`
| Optional
|0.1
|===

* `sparse_cross_entropy`
.Args to `sparse_cross_entropy`
|===
|Name |Type| Range/Choices| Required |Default | Comment

|`weight`
|`str`
| `{inv_freq, inv_sqrt_freq, inv_log_freq}`
| Optional
| None => disable weighting
|

|`weight_calm_time`
|`int`
| `[0, ∞)`
| Optional
| 0 => disable calming;
| Applicable when `weight` is enabled

|===


* `kl_divergence` (re-implementation of `smooth_kld` with some extra features)
.Args to `kl_divergence`
|===
|Name |Type| Range/Choices| Required |Default

|`label_smoothing`
|`float`
| `[0.0, 1.0)`
| Optional
| 0.0 => disable label smoothing

|`weight`
|`str`
| `{inv_freq, inv_sqrt_freq, inv_log_freq}`
| Optional
| None => disable weighting

|`weight_calm_time`
|`int`
| `[0, ∞)`
| Optional
| 0 => disable calming => weights applicable from step 0

|===

* `focal_loss`
.Args to `focal_loss`
|===
|Name |Type| Range/Choices| Required |Default
|`gamma`
|`float`
| `[0.0, ∞)`
| Optional
| 0.0 => disable => cross entropy

|`weight_calm_time`
|`int`
| `[0, ∞)`
| Optional
| 0 => disable calming => weights applicable from step 0

|===

* _Experimental loss functions:_
** `dice_loss`
** `binary_cross_entropy`
** `triplet_loss`
** `squared_error`
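
As a rough sketch, a criterion with class weighting enabled could look like the block below; the arg names follow the tables above, while the values are placeholders only.

[source,yaml]
----
# kl_divergence with class weighting (sketch; values are placeholders)
criterion:
  name: kl_divergence
  args:
    label_smoothing: 0.1
    weight: inv_sqrt_freq    # one of inv_freq, inv_sqrt_freq, inv_log_freq
    weight_calm_time: 10000  # 0 disables calming, i.e. weights apply from step 0
----
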
[#conf-early-stop]
=== Early stop
Add the piece of config below to the `trainer` block to enable early stopping on convergence.
@@ -243,7 +369,7 @@ prep:
----

[#conf-vocab]
== Vocabulary Preprocessing using Sentencepiece or NLCodec
== Vocabulary Preprocessing

link:https://github.com/google/sentencepiece[Google's sentencepiece] is an awesome library for
preprocessing text datasets.
28 changes: 6 additions & 22 deletions docs/80-migration.adoc → docs/15-migration.adoc
@@ -1,11 +1,14 @@
[#migrate]
== Migration

[#migrate-to-0_6]
== Migration from v0.5.0 or earlier to v0.6.0
=== v0.5.0 or earlier to v0.6.0

The optimizer block got a big update in v0.6.0; as a result, it is not backward compatible.

.Old config, prior to v0.6.0:

[yaml]
[source,yaml]
----
optim:
args:
@@ -24,7 +27,7 @@ optim:
name: ADAM
----
.New config in v0.6.0
[yaml]
[source,yaml]
----
optimizer:
name: adam
@@ -47,22 +50,3 @@ criterion:
args:
label_smoothing: 0.1
----


=== Learning rate schedule

. `noam` with args:
* warmup
* constant
* model_dim

. `inverse_sqrt` with args:
* warmup
* peak_lr

=== Criterion
. `cross_entropy`
* label smoothing not implemented yet, FIXME: support label smoothing
. `smooth_kld`
* `label_smoothing`
. Other (experimental): `binary_cross_entropy`, `triplet_loss`
2 changes: 1 addition & 1 deletion docs/45-scaling.adoc
@@ -1,5 +1,5 @@
[#scaling-big]
== Scaling to Big Datasets Using PySpark
== Scaling Big Using PySpark

When dealing with big datasets, traditional tools such as multiprocessing and SQLite3 simply aren't enough.
In such scenarios, https://spark.apache.org/[PySpark] is a useful backend.
8 changes: 6 additions & 2 deletions docs/index.adoc
@@ -12,11 +12,14 @@ USC Information Sciences Institute Natural Language Group
//injects google analytics to <head>
:docinfo2:
:hide-uri-scheme:
:source-highlighter: rouge

include::00-intro.adoc[]

include::10-conf.yml.adoc[]

include::15-migration.adoc[]

include::20-clitools.adoc[]

include::30-environ.adoc[]
@@ -25,8 +28,9 @@ include::40-train-pro.adoc[]

include::45-scaling.adoc[]


include::50-serve.adoc[]

include::60-develop.adoc[]

include::80-migration.adoc[]

include::60-develop.adoc[]