Multi-task learning #7

Open
wants to merge 61 commits into
base: main
61 commits
4e62012
feat(mlflow): artifact logging
okyksl Aug 2, 2021
9bf8cd6
feat(data): add maximum tokenizer length as an option
okyksl Aug 3, 2021
97730a6
feat(data): add inference option
okyksl Aug 3, 2021
1a54e3e
feat(modeling): use a dynamically provided threshold for groups
okyksl Aug 3, 2021
b251fa8
feat(mlflow): add mlflow prediction wrapper
okyksl Aug 3, 2021
47adf42
feat(mlflow): add artifact for inference params
okyksl Aug 3, 2021
cceab6a
fix(mlflow): use sigmoid on raw network scores
okyksl Aug 3, 2021
83f31ef
fix(mlflow): rename mlflow.py to infer.py
okyksl Aug 4, 2021
81c0c32
fix(data): prepare targets and groups
okyksl Aug 4, 2021
4bf767c
fix(data): provide targets in non-inference mode
okyksl Aug 4, 2021
29af38c
docs(mlflow): add additional considerations to MLFlow deployment
okyksl Aug 4, 2021
9f339ac
fix(data): add token types only if it is supported
okyksl Aug 4, 2021
dff1e5f
feat(mlflow): add deployment option
okyksl Aug 4, 2021
fa86267
chore(): delete TODO file
okyksl Aug 4, 2021
44fe9d8
refactor(mlflow): use 'predictions' instead of 'logits'
okyksl Aug 4, 2021
71a2c63
fix(mlflow): use artifact uri in logging model
okyksl Aug 4, 2021
11d23ec
fix(mlflow): use proper artifact and code paths
okyksl Aug 4, 2021
287a9be
refactor(mlflow): log label artifacts with the model
okyksl Aug 4, 2021
8187edd
feat(data): preprocessing for offline testing format
okyksl Aug 4, 2021
816d6dc
feat(sagemaker): add inference job using MLFlow 'log_model'
okyksl Aug 4, 2021
85123b0
feat(runner): reload best model in training
okyksl Aug 5, 2021
3372819
feat(mlflow): add tokenizer options as arguments to mlflow
okyksl Aug 5, 2021
f4d6c1e
fix(mlflow): access artifacts in the pyfunc context object
okyksl Aug 5, 2021
bf39d2e
refactor(data): update filtering info message
okyksl Aug 5, 2021
4f6a050
refactor(data): make 'tokenizer_max_len' optional
okyksl Aug 5, 2021
7044ca9
fix(data): use source length to return dataset length
okyksl Aug 5, 2021
fae90ac
fix(data): drop empty rows in preprocessing
okyksl Aug 5, 2021
832492a
fix(mlflow): cast predictions and probabilities to string
okyksl Aug 5, 2021
e1e7d1b
fix(sagemaker): use fixed 'source' field by renaming prior to job exe…
okyksl Aug 5, 2021
c215e38
chore(params): fix 'epoch' hyperparameter
okyksl Aug 5, 2021
7f58e71
fix(data): apply literal_eval for filtering
okyksl Aug 6, 2021
0490d45
feat(modeling): add focal loss-star
okyksl Aug 6, 2021
fc03a59
fix(mlflow): upload full code path
okyksl Aug 6, 2021
cf99e32
fix(mlflow): add deployment dependencies
okyksl Aug 6, 2021
a29e13d
feat(data): compute target statistics
okyksl Aug 9, 2021
76f2ba2
feat(data): support pickle, csv and excel dataframes
okyksl Aug 9, 2021
d5b35f2
feat(modeling): add inverse loss weighting
okyksl Aug 9, 2021
99a3b9b
feat(data): support `sectors` as a 1-head classification task
okyksl Aug 9, 2021
d7286f4
docs(modeling): add documentation to multi task transformer
okyksl Aug 9, 2021
44c9634
fix(modeling): return zero if coarse group is predicted as negative
okyksl Aug 9, 2021
eb36bdc
feat(modeling): add inverse loss weighting
okyksl Aug 9, 2021
51b0bb5
feat(mlflow): update model output format
okyksl Aug 10, 2021
0cbfd3c
refactor(data): modularize dataset into text and target datasets
okyksl Aug 10, 2021
23f0991
feat(data): support multi-task learning
okyksl Aug 10, 2021
dcbf81e
refactor(modeling): refactor and document modeling
okyksl Sep 9, 2021
24f3cd5
fix(eval): test on all samples
okyksl Dec 3, 2021
3e95ce9
fix(modeling): do not explicitly pass num_heads to models
okyksl Dec 3, 2021
6a62c6d
feat(sagemaker): use new data version
okyksl Dec 3, 2021
a1e1426
feat(data): use new virtual analysis framework
okyksl Dec 5, 2021
37c7c67
feat(dataset): have an option to exclude unwanted target labels
okyksl Dec 5, 2021
46354e2
fix(modeling): initialize module list correctly
okyksl Dec 5, 2021
ee3e89b
feat(runner): exclude "NOT_MAPPED" targets
okyksl Dec 5, 2021
0d8c1b1
fix(runner): handle multi-task targets correctly
okyksl Dec 5, 2021
8eb627b
refactor(data): better error messages when groups and group_names are…
okyksl Dec 5, 2021
0b6b1d7
fix(data): compute stats for multi-head dataset
okyksl Dec 15, 2021
90f4369
refactor(modeling): rename MultiTargetHead to MultiTargetTransformer
okyksl Dec 15, 2021
6e9c7f8
feat(data): add iterative option for labels
okyksl Dec 15, 2021
84a8fde
feat(eval): add multihead metrics
okyksl Dec 15, 2021
3f50ca7
fix(modeling): work in iterative settings
okyksl Dec 15, 2021
32a5d7f
feat(trainer): multi-task trainer
okyksl Dec 15, 2021
5c53dd7
fix(runner): fix evaluation and training logic for multi-task learning
okyksl Dec 16, 2021
5 changes: 5 additions & 0 deletions docs/source/modeling/tracking.rst
@@ -34,3 +34,8 @@ You can find an example of deployment in the repo.
- The key to deployment is creating a class that inherits from `mlflow.pyfunc.PythonModel` and implements a `predict()` method.
- That class is pickled and logged as an artifact of the training run. At inference time, it is used to make predictions.

Additionally, consider the following for a more configurable deployment:

- *Dynamic inference parameters*: Store inference hyperparameters (e.g., batch size or thresholds) as a separate artifact in MLflow. Pass the file through the `artifacts` option of `log_model`, then retrieve it through the `context` object that MLflow provides in `load_context` or `predict`.
- *Multiple outputs*: The `predict` function can return a pandas DataFrame. Use this if the model has multiple targets, or to expose logit scores so that thresholds can be adjusted dynamically on the client side.
- *Serving labels*: Log a separate artifact in MLflow so that the client side can map predictions back to human-readable labels.
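
The three considerations above can be sketched together as follows. This is a minimal illustration, not the wrapper added in this PR: the class name `ThresholdedClassifier`, the artifact keys `infer_params` and `labels`, the JSON file layout, and the fixed scores returned by `_score` are all assumptions made for the example. The `demo` helper uses a fake context object so the wrapper can be exercised without a logged model, and MLflow itself is imported optionally for the same reason.

```python
import json
import os
import tempfile
from types import SimpleNamespace

import pandas as pd

try:
    from mlflow.pyfunc import PythonModel
except ImportError:  # lets the sketch run without MLflow installed
    PythonModel = object


class ThresholdedClassifier(PythonModel):
    """Hypothetical wrapper; class name, artifact keys and JSON layout are assumptions."""

    def load_context(self, context):
        # context.artifacts maps the keys passed to log_model(artifacts=...) to local paths
        with open(context.artifacts["infer_params"]) as f:
            self.params = json.load(f)
        with open(context.artifacts["labels"]) as f:
            self.labels = json.load(f)

    def _score(self, texts):
        # Stand-in for the real network: one fixed probability per label
        return [[0.9, 0.2] for _ in texts]

    def predict(self, context, model_input):
        probabilities = self._score(model_input["excerpt"].tolist())
        threshold = self.params["threshold"]
        predictions = [
            [label for label, p in zip(self.labels, row) if p >= threshold]
            for row in probabilities
        ]
        # One column per output, so the client can re-threshold the raw scores
        return pd.DataFrame({"predictions": predictions, "probabilities": probabilities})


def demo():
    """Exercise the wrapper with a fake context instead of a logged model."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = {}
        for name, content in [("infer_params", {"threshold": 0.5}),
                              ("labels", ["Health", "WASH"])]:
            paths[name] = os.path.join(tmp, f"{name}.json")
            with open(paths[name], "w") as f:
                json.dump(content, f)
        context = SimpleNamespace(artifacts=paths)
        model = ThresholdedClassifier()
        model.load_context(context)
        return model.predict(context, pd.DataFrame({"excerpt": ["first", "second"]}))
```

At logging time, the same two files would be handed to `mlflow.pyfunc.log_model` through its `artifacts` argument, for example `artifacts={"infer_params": "infer_params.json", "labels": "labels.json"}` together with `python_model=ThresholdedClassifier()`. The file names here are placeholders.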
48 changes: 48 additions & 0 deletions notebooks/models/oguz/transformer_v0.5_1D.ipynb
@@ -430,6 +430,54 @@
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"source": [],
"outputs": [],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"## Inference Preprocessing (Offline Testing Environment)"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"import pandas as pd\n",
"\n",
"DATA_PATH = 'leads.csv'\n",
"data = pd.read_csv(DATA_PATH)\n",
"data.head()"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"from ast import literal_eval\n",
"\n",
"lengths = {}\n",
"\n",
"for field in ['extracted_text_as_paragraphs', 'extracted_text_as_sentences']:\n",
"    arr = data[field].apply(literal_eval).tolist()\n",
"    lengths[field] = [len(ds) for ds in arr]\n",
"\n",
"    infer_df = pd.DataFrame.from_dict({\n",
"        'excerpt': [d for ds in arr for d in ds]\n",
"    })\n",
"    infer_df = infer_df[~(infer_df['excerpt'].str.len() == 0)]\n",
"    infer_df.to_csv(f'infer_{field}.csv', header=True, index=True)"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
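
The preprocessing cell above records `lengths` per field and writes the DataFrame index to the CSV, but the visible part of the diff does not show how they are used. Assuming the intent is to map flat predictions back to their source documents after empty excerpts are dropped, the round trip could look roughly like this in plain Python. The function names `flatten_with_index` and `regroup` are hypothetical and do not appear in the PR.

```python
from itertools import accumulate


def flatten_with_index(groups):
    """Flatten nested excerpts, keeping the flat position of each non-empty one."""
    flat = [text for group in groups for text in group]
    kept = [(i, text) for i, text in enumerate(flat) if len(text) > 0]
    lengths = [len(group) for group in groups]
    return kept, lengths


def regroup(indexed_predictions, lengths, fill=None):
    """Rebuild per-document lists from (flat_index, prediction) pairs."""
    flat = [fill] * sum(lengths)
    for i, prediction in indexed_predictions:
        flat[i] = prediction
    # Cumulative lengths give the slice boundaries of each document
    bounds = [0, *accumulate(lengths)]
    return [flat[start:end] for start, end in zip(bounds, bounds[1:])]
```

Positions whose excerpt was empty, and therefore never sent for inference, come back as `fill`, so each regrouped list keeps the length of the original document.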
8 changes: 0 additions & 8 deletions scripts/training/oguz/huggingface-multihead/TODO

This file was deleted.

Empty file.
104 changes: 57 additions & 47 deletions scripts/training/oguz/huggingface-multihead/constants.py
@@ -1,102 +1,112 @@
SECTORS = [
"Agriculture",
"Cross",
"Education",
"Food Security",
"Health",
"Livelihoods",
"Logistics",
"Nutrition",
"Protection",
"Shelter",
"WASH",
]

PILLARS_1D = [
"Context",
"Humanitarian Profile",
"Displacement",
"Shock/Event",
"Casualties",
"Displacement",
"Humanitarian Access",
"Information",
"Information And Communication",
"Covid-19",
]

SUBPILLARS_1D = [
[
"Context->Security & Stability",
"Context->Demography",
"Context->Economy",
"Context->Hazard & Threats",
"Context->Politics",
"Context->Overview",
"Context->Key Event",
"Context->Socio Cultural",
"Context->Legal & Policy",
"Context->Environment",
"Context->Stakeholders",
"Context->Response gap",
"Context->Security & Stability",
"Context->Socio Cultural",
"Context->Legal & Policy",
"Context->Politics",
"Context->Technological",
],
[
"Humanitarian Profile->Affected Groups",
"Humanitarian Profile->Casualties",
"Humanitarian Profile->Population Movement",
"Shock/Event->Type And Characteristics",
"Shock/Event->Underlying/Aggravating Factors",
"Shock/Event->Hazard & Threats",
],
["Casualties->Dead", "Casualties->Injured", "Casualties->Missing"],
[
"Displacement->Push/Pull Factors",
"Displacement->Type/Numbers",
"Displacement->Local Integration",
"Displacement->Type/Numbers/Movements",
"Displacement->Push Factors",
"Displacement->Pull Factors",
"Displacement->Intentions",
"Displacement->Displacement",
"Displacement->Local Integration",
],
["Casualties->Dead", "Casualties->Injured", "Casualties->Missing"],
[
"Humanitarian Access->Relief To Population",
"Humanitarian Access->Population To Relief",
"Humanitarian Access->Physical Constraints",
"Humanitarian Access->Humanitarian Access Gaps",
(
"Humanitarian Access->Number Of People Facing Humanitarian Access Constraints"
"/Humanitarian Access Gaps"
),
],
[
"Information->Information Gaps",
"Information->Channels & Means",
"Information->Information Challenges",
"Information And Communication->Information Challenges And Barriers",
"Information And Communication->Communication Means And Preferences",
"Information And Communication->Knowledge And Info Gaps (Pop)",
"Information And Communication->Knowledge And Info Gaps (Hum)",
],
[
"Covid-19->Cases",
"Covid-19->Deaths",
"Covid-19->Testing",
"Covid-19->Contact Tracing",
"Covid-19->Hospitalization & Care",
"Covid-19->Vaccination",
"Covid-19->Restriction Measures",
],
]

SECTORS = [
"Agriculture",
"Cross",
"Education",
"Food Security",
"Health",
"Livelihoods",
"Logistics",
"Nutrition",
"Protection",
"Shelter",
"WASH",
]

PILLARS_2D = [
"Humanitarian Conditions",
"Capacities & Response",
"Impact",
"Priority Interventions",
"People At Risk",
"At Risk",
"Priority Needs",
]

SUBPILLARS_2D = [
[
"Humanitarian Conditions->Coping Mechanisms",
"Humanitarian Conditions->Living Standards",
"Humanitarian Conditions->Number Of People In Need",
"Humanitarian Conditions->Physical And Mental Well Being",
"Humanitarian Conditions->Number Of People In Need",
],
[
"Capacities & Response->International Response",
"Capacities & Response->National Response",
"Capacities & Response->Number Of People Reached",
"Capacities & Response->Response Gaps",
"Capacities & Response->Local Response",
"Capacities & Response->Number Of People Reached/Response Gaps",
],
[
"Impact->Driver/Aggravating Factors",
"Impact->Impact On People",
"Impact->Impact On People Or Impact On Services",
"Impact->Impact On Services",
"Impact->Impact On Systems And Services",
"Impact->Impact On Systems, Services And Networks",
"Impact->Number Of People Affected",
],
[
"Priority Interventions->Expressed By Humanitarian Staff",
"Priority Interventions->Expressed By Population",
],
[
"People At Risk->Number Of People At Risk",
"People At Risk->Risk And Vulnerabilities",
"At Risk->Risk And Vulnerabilities",
"At Risk->Number Of People At Risk",
],
[
"Priority Needs->Expressed By Humanitarian Staff",