
Multi-task learning #7

Open

wants to merge 61 commits into main
Conversation

okyksl
Contributor

@okyksl okyksl commented Feb 21, 2022

Provides an easy-to-use, customizable tree-like multi-task learning model and the related training and evaluation processes.

okyksl added 30 commits December 3, 2021 11:41
* Model artifact for inference or deployment
* Eval artifact logging all the metrics
* Label artifact logging labels in order of the predictions
* Forces use of padding = "max_length"
* `add_special_tokens` and `return_token_type_ids` are set to True
* Use context to load inference hyperparameters
Notes:
* Fix argparse batch_size arguments
* Add sanity check for iterative models
* Log labels and groups separately
* Log inference params altogether inside a json file
* Fix MLflow import-related issues
Also fix the `python_model` parameter of `log_model`
* Do not manually set `add_special_tokens` and `return_token_type_ids`
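The note above about logging inference params altogether inside a JSON file could be sketched as follows. This is a minimal illustration, not the PR's actual code; the parameter names and helper functions are assumptions.

```python
import json
import os
import tempfile

# Hypothetical inference hyperparameters; names are illustrative only.
infer_params = {
    "batch_size": 32,
    "max_length": 128,
    "output_style": "nested",
}

def save_infer_params(params, path):
    """Write all inference params into a single JSON artifact so they can
    be reloaded as one unit at inference/deployment time."""
    with open(path, "w") as f:
        json.dump(params, f, indent=2)

def load_infer_params(path):
    """Load the inference params back from the JSON artifact."""
    with open(path) as f:
        return json.load(f)

# Round-trip through a temporary file.
_path = os.path.join(tempfile.gettempdir(), "infer_params.json")
save_infer_params(infer_params, _path)
reloaded = load_infer_params(_path)
```

Keeping the params in one file (rather than scattered across logged values) lets the inference context load them in a single call.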
okyksl added 30 commits December 3, 2021 11:41
Two types of weighting are applied together:
* Across-class balancing: each sigmoidal unit loss is multiplied by inverse frequency
* In-class balancing: positive loss is weighted by the in-class frequency w.r.t. negatives
Note the model returns pre-sigmoid logits rather than probabilities.
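The two weighting schemes above can be sketched in plain Python. This is an illustrative reconstruction under the stated description, not the PR's implementation: `pos_counts[c]` is assumed to be the number of positive labels for class `c` over `n` examples.

```python
def balance_weights(pos_counts, n):
    """Return (class_weight, pos_weight) per class.

    - class_weight: across-class balancing, the inverse of each class's
      frequency, so each sigmoidal unit loss is scaled up for rare classes.
    - pos_weight: in-class balancing, the negatives-per-positive ratio
      applied to the positive term of each unit's loss.
    """
    class_weight = []
    pos_weight = []
    for p in pos_counts:
        freq = p / n
        class_weight.append(1.0 / freq if freq > 0 else 0.0)
        neg = n - p
        pos_weight.append(neg / p if p > 0 else 0.0)
    return class_weight, pos_weight
```

For example, with 100 samples where class 0 has 50 positives and class 1 has 10, class 1 gets a 5x larger across-class weight and a 9:1 positive weight inside its own loss.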
* self-explanatory outputs
* two different output formats: flatten vs nested
* adjust output style through infer config file
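The two output formats above might look like the following sketch; the task names, labels, and scores are hypothetical, not taken from the PR.

```python
def flatten_outputs(nested):
    """Convert nested per-task predictions into a flat dict keyed by
    "task/label" strings (the flatten style); the input dict is the
    nested style."""
    flat = {}
    for task, preds in nested.items():
        for label, score in preds.items():
            flat[f"{task}/{label}"] = score
    return flat

# Nested style: one sub-dict of label scores per task.
nested = {
    "sentiment": {"pos": 0.9, "neg": 0.1},
    "topic": {"sports": 0.7},
}
flat = flatten_outputs(nested)
```

Selecting between the two would then be a single switch on the infer config's output-style field.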
* text dataset only deals with data source and tokenization
* target dataset only deals with encoding/decoding of targets and groups
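The target-side responsibility described above (encoding/decoding of targets and groups, with no knowledge of text or tokenization) could be sketched as a small codec. All names here are illustrative assumptions, not the PR's actual classes.

```python
class TargetCodec:
    """Minimal sketch: maps raw target labels to integer ids and back,
    optionally tagged with the hierarchy group it belongs to."""

    def __init__(self, labels, group=None):
        self.labels = list(labels)
        self.group = group  # optional group name for this set of targets
        self._index = {label: i for i, label in enumerate(self.labels)}

    def encode(self, label):
        """Map a raw label to its integer id."""
        return self._index[label]

    def decode(self, idx):
        """Map an integer id back to the raw label."""
        return self.labels[idx]
```

Splitting this off from the text dataset keeps tokenization and label encoding independently testable.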
Usage:
* pass a list of targets to denote target columns
* pass a list of lists as group_names to denote hierarchy groups for each target
* pass a list for each group_name instance to denote targets belonging to that group_name
  i.e. a list of lists of lists
* for one-task learning, either provide length-one lists or provide the data without lists
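The convention above could look like this in practice. The target and group names are made up for illustration; only the list-of-lists-of-lists shape follows the description.

```python
# One entry per target column.
targets = ["genre", "mood"]

# One list of group names per target (hierarchy groups).
group_names = [
    ["fiction", "nonfiction"],  # groups under the "genre" target
    ["valence"],                # groups under the "mood" target
]

# One list of member targets per group name: a list of lists of lists.
groups = [
    [["fantasy", "scifi"],        # targets in "fiction"
     ["history", "science"]],     # targets in "nonfiction"
    [["happy", "sad"]],           # targets in "valence"
]

# The three structures must stay aligned at the top level; for one-task
# learning, each of these would collapse to a length-one list (or the
# bare value without a list).
assert len(targets) == len(group_names) == len(groups)
```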
Computes stats on a per-task basis.