Commit

Merge pull request #226 from jrzaurin/multiple_tab_components
Multiple tab components
jrzaurin authored Aug 26, 2024
2 parents deb4f2e + 16922ce commit 220eb3f
Showing 274 changed files with 15,113 additions and 17,716 deletions.
102 changes: 100 additions & 2 deletions README.md
@@ -587,13 +587,111 @@ trainer.fit(
)
```

**7. Tabular with a multi-target loss**
**7. A two-tower model**

This is a popular model in the context of recommendation systems. Let's say we
have a tabular dataset formed by triples (user features, item features,
target). We can create a two-tower model where the user and item features are
passed through two separate models and then "fused" via a dot product.

<p align="center">
<img width="350" src="docs/figures/arch_7.png">
</p>
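To make the "fused via a dot product" step concrete, here is a framework-free NumPy sketch of the idea. It is an illustration under stated assumptions, not the implementation behind `ModelFuser`: each tower is assumed to be a tiny two-layer MLP with random, untrained weights, the feature matrices are hypothetical and already numeric, and both towers must end in the same output size (8 here, mirroring the last entry of `mlp_hidden_dims=[16, 8]` in the example that follows).

```python
import numpy as np

rng = np.random.default_rng(0)


def mlp_tower(x, w1, w2):
    """Tiny two-layer MLP (linear -> ReLU -> linear); hypothetical stand-in for a TabMlp tower."""
    return np.maximum(x @ w1, 0.0) @ w2


n_pairs, d_user, d_item, d_emb = 5, 4, 3, 8

# Hypothetical, already preprocessed features: row i of each matrix is one (user, item) pair
x_user = rng.normal(size=(n_pairs, d_user))
x_item = rng.normal(size=(n_pairs, d_item))

# Separate (untrained) weights per tower; only the output size d_emb is shared
w_user = (rng.normal(size=(d_user, 16)), rng.normal(size=(16, d_emb)))
w_item = (rng.normal(size=(d_item, 16)), rng.normal(size=(16, d_emb)))

u = mlp_tower(x_user, *w_user)  # user embeddings, shape (n_pairs, d_emb)
v = mlp_tower(x_item, *w_item)  # item embeddings, shape (n_pairs, d_emb)

# Dot-product fusion: one score (logit) per row
logits = np.sum(u * v, axis=1)  # shape (n_pairs,)

# Numerically stable sigmoid, sigmoid(z) = 0.5 * (1 + tanh(z / 2)), giving purchase probabilities
probs = 0.5 * (1.0 + np.tanh(0.5 * logits))
```

Because the towers only interact through the final dot product, item embeddings can be computed once and reused, and scoring one user against many items reduces to a single matrix product.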


```python
import numpy as np
import pandas as pd

from pytorch_widedeep import Trainer
from pytorch_widedeep.preprocessing import TabPreprocessor
from pytorch_widedeep.models import TabMlp, WideDeep, ModelFuser

# Let's create the interaction dataset
# user_features dataframe
np.random.seed(42)
user_ids = np.arange(1, 101)
ages = np.random.randint(18, 60, size=100)
genders = np.random.choice(["male", "female"], size=100)
locations = np.random.choice(["city_a", "city_b", "city_c", "city_d"], size=100)
user_features = pd.DataFrame(
{"id": user_ids, "age": ages, "gender": genders, "location": locations}
)

# item_features dataframe
item_ids = np.arange(1, 101)
prices = np.random.uniform(10, 500, size=100).round(2)
colors = np.random.choice(["red", "blue", "green", "black"], size=100)
categories = np.random.choice(["electronics", "clothing", "home", "toys"], size=100)

item_features = pd.DataFrame(
{"id": item_ids, "price": prices, "color": colors, "category": categories}
)

# Interactions dataframe
interaction_user_ids = np.random.choice(user_ids, size=1000)
interaction_item_ids = np.random.choice(item_ids, size=1000)
purchased = np.random.choice([0, 1], size=1000, p=[0.7, 0.3])
interactions = pd.DataFrame(
{
"user_id": interaction_user_ids,
"item_id": interaction_item_ids,
"purchased": purchased,
}
)
user_item_purchased = interactions.merge(
user_features, left_on="user_id", right_on="id"
).merge(item_features, left_on="item_id", right_on="id")

# Users
tab_preprocessor_user = TabPreprocessor(
cat_embed_cols=["gender", "location"],
continuous_cols=["age"],
)
X_user = tab_preprocessor_user.fit_transform(user_item_purchased)
tab_mlp_user = TabMlp(
column_idx=tab_preprocessor_user.column_idx,
cat_embed_input=tab_preprocessor_user.cat_embed_input,
continuous_cols=["age"],
mlp_hidden_dims=[16, 8],
mlp_dropout=[0.2, 0.2],
)

# Items
tab_preprocessor_item = TabPreprocessor(
cat_embed_cols=["color", "category"],
continuous_cols=["price"],
)
X_item = tab_preprocessor_item.fit_transform(user_item_purchased)
tab_mlp_item = TabMlp(
column_idx=tab_preprocessor_item.column_idx,
cat_embed_input=tab_preprocessor_item.cat_embed_input,
continuous_cols=["price"],
mlp_hidden_dims=[16, 8],
mlp_dropout=[0.2, 0.2],
)

two_tower_model = ModelFuser([tab_mlp_user, tab_mlp_item], fusion_method="dot")

model = WideDeep(deeptabular=two_tower_model)

trainer = Trainer(model, objective="binary")

trainer.fit(
X_tab=[X_user, X_item],
target=user_item_purchased.purchased.values,  # same row order as X_user / X_item
n_epochs=1,
batch_size=32,
)
```
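Both tower inputs, `X_user` and `X_item`, are built from the merged frame `user_item_purchased`, so the target passed to `trainer.fit` needs to follow that frame's row order (the row order produced by an inner `merge` has varied across pandas versions). A minimal sketch of that check on hypothetical toy frames, whose column names mirror the example but whose data is made up: since every `user_id` and `item_id` has exactly one match, the inner merges drop no rows, and taking features and target from the same merged frame keeps them aligned whatever order `merge` returns.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the example above
users = pd.DataFrame({"id": [1, 2, 3], "age": [25, 40, 33]})
items = pd.DataFrame({"id": [10, 20], "price": [9.99, 120.0]})
interactions = pd.DataFrame(
    {"user_id": [2, 1, 2, 3], "item_id": [20, 10, 10, 20], "purchased": [1, 0, 0, 1]}
)

# Same merge chain as the example; the clashing "id" columns become id_x / id_y
merged = interactions.merge(users, left_on="user_id", right_on="id").merge(
    items, left_on="item_id", right_on="id"
)

# Each user_id / item_id matches exactly one row, so the inner merges drop nothing
assert len(merged) == len(interactions)

# Features and target come from the same frame, so row i always refers to the same interaction
X_user_toy = merged[["age"]].to_numpy()
X_item_toy = merged[["price"]].to_numpy()
y = merged["purchased"].to_numpy()
```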

**8. Tabular with a multi-target loss**

This one is a "bonus" that illustrates the use of multi-target losses rather
than a genuinely different architecture.

<p align="center">
<img width="200" src="docs/figures/arch_7.png">
<img width="200" src="docs/figures/arch_8.png">
</p>
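In its simplest form, a multi-target loss evaluates one loss per target column and combines the results. The NumPy sketch below assumes two binary targets and a weighted sum of per-target binary cross-entropies with equal default weights; that combination rule and the helper names are assumptions for illustration, not necessarily how pytorch-widedeep's multi-target losses weight their terms.

```python
import numpy as np


def bce_with_logits(logits, y):
    """Mean binary cross-entropy computed from logits in a numerically stable way."""
    # max(z, 0) - z * y + log(1 + exp(-|z|)) equals -[y log s(z) + (1 - y) log(1 - s(z))]
    return np.mean(np.maximum(logits, 0.0) - logits * y + np.log1p(np.exp(-np.abs(logits))))


def multi_target_bce(logits, targets, weights=None):
    """Hypothetical multi-target loss: weighted sum of one BCE per target column.

    logits, targets: arrays of shape (n_samples, n_targets).
    weights: one weight per target; defaults to equal weights summing to 1.
    """
    n_targets = targets.shape[1]
    if weights is None:
        weights = np.full(n_targets, 1.0 / n_targets)
    per_target = np.array(
        [bce_with_logits(logits[:, j], targets[:, j]) for j in range(n_targets)]
    )
    return float(np.dot(weights, per_target)), per_target


# Two binary targets, e.g. the original target plus a second one such as "target2"
logits = np.array([[2.0, -1.0], [-0.5, 0.3], [1.5, 0.0]])
targets = np.array([[1, 0], [0, 1], [1, 1]], dtype=float)
total, per_target = multi_target_bce(logits, targets)
```

Passing explicit `weights` (for example `np.array([1.0, 0.0])`) lets one target dominate, which is a common way to trade off tasks of different importance.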


2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
1.6.2
1.6.3
1 change: 0 additions & 1 deletion docs/examples.rst
@@ -17,5 +17,4 @@ them to address different problems
* `HyperParameter Tuning With RayTune <https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/notebooks/10_hyperParameter_tuning_w_raytune_n_wnb.ipynb>`__
* `Model Uncertainty Prediction <https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/notebooks/13_Model_Uncertainty_prediction.ipynb>`__
* `Bayesian Models <https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/notebooks/14_bayesian_models.ipynb>`__
* `Deep Imbalanced Regression <https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/notebooks/15_DIR-LDS_and_FDS.ipynb>`__

Binary file modified docs/figures/arch_7.png
Binary file added docs/figures/arch_8.png
847 changes: 0 additions & 847 deletions examples/notebooks/15_DIR-LDS_and_FDS.ipynb

This file was deleted.

53 changes: 0 additions & 53 deletions examples/scripts/california_housing_fds_lds.py

This file was deleted.

85 changes: 84 additions & 1 deletion examples/scripts/readme_snippets.py
@@ -407,7 +407,90 @@ def output_dim(self):
)


# 7. Simply Tabular with a multi-target loss
# 7. A two-tower model
np.random.seed(42)

# user_features dataframe
user_ids = np.arange(1, 101)
ages = np.random.randint(18, 60, size=100)
genders = np.random.choice(["male", "female"], size=100)
locations = np.random.choice(["city_a", "city_b", "city_c", "city_d"], size=100)
user_features = pd.DataFrame(
{"id": user_ids, "age": ages, "gender": genders, "location": locations}
)

# item_features dataframe
item_ids = np.arange(1, 101)
prices = np.random.uniform(10, 500, size=100).round(2)
colors = np.random.choice(["red", "blue", "green", "black"], size=100)
categories = np.random.choice(["electronics", "clothing", "home", "toys"], size=100)

item_features = pd.DataFrame(
{"id": item_ids, "price": prices, "color": colors, "category": categories}
)

# Interactions dataframe
interaction_user_ids = np.random.choice(user_ids, size=1000)
interaction_item_ids = np.random.choice(item_ids, size=1000)
purchased = np.random.choice([0, 1], size=1000, p=[0.7, 0.3])
interactions = pd.DataFrame(
{
"user_id": interaction_user_ids,
"item_id": interaction_item_ids,
"purchased": purchased,
}
)
user_item_purchased = interactions.merge(
user_features, left_on="user_id", right_on="id"
).merge(item_features, left_on="item_id", right_on="id")


# Users
tab_preprocessor_user = TabPreprocessor(
cat_embed_cols=["gender", "location"],
continuous_cols=["age"],
)
X_user = tab_preprocessor_user.fit_transform(user_item_purchased)
tab_mlp_user = TabMlp(
column_idx=tab_preprocessor_user.column_idx,
cat_embed_input=tab_preprocessor_user.cat_embed_input,
continuous_cols=["age"],
mlp_hidden_dims=[16, 8],
mlp_dropout=[0.2, 0.2],
)

# Items
tab_preprocessor_item = TabPreprocessor(
cat_embed_cols=["color", "category"],
continuous_cols=["price"],
)
X_item = tab_preprocessor_item.fit_transform(user_item_purchased)
tab_mlp_item = TabMlp(
column_idx=tab_preprocessor_item.column_idx,
cat_embed_input=tab_preprocessor_item.cat_embed_input,
continuous_cols=["price"],
mlp_hidden_dims=[16, 8],
mlp_dropout=[0.2, 0.2],
)

two_tower_model = ModelFuser([tab_mlp_user, tab_mlp_item], fusion_method="dot")

model = WideDeep(deeptabular=two_tower_model)

trainer = Trainer(
model,
objective="binary",
)

trainer.fit(
X_tab=[X_user, X_item],
target=user_item_purchased.purchased.values,  # same row order as X_user / X_item
n_epochs=1,
batch_size=32,
)


# 8. Simply Tabular with a multi-target loss

# let's add a second target to the dataframe
df["target2"] = [random.choice([0, 1]) for _ in range(100)]
17 changes: 8 additions & 9 deletions mkdocs/mkdocs.yml
@@ -52,15 +52,14 @@ nav:
- 12_ZILNLoss_origkeras_vs_pytorch_widedeep: examples/12_ZILNLoss_origkeras_vs_pytorch_widedeep.ipynb
- 13_model_uncertainty_prediction: examples/13_model_uncertainty_prediction.ipynb
- 14_bayesian_models: examples/14_bayesian_models.ipynb
- 15_DIR-LDS_and_FDS: examples/15_DIR-LDS_and_FDS.ipynb
- 16_Self-Supervised Pre-Training pt 1: examples/16_Self_Supervised_Pretraning_pt1.ipynb
- 16_Self-Supervised Pre-Training pt 2: examples/16_Self_Supervised_Pretraning_pt2.ipynb
- 17_Usign-a-custom-hugging-face-model: examples/17_Usign_a_custom_hugging_face_model.ipynb
- 18_feature_importance_via_attention_weights: examples/18_feature_importance_via_attention_weights.ipynb
- 19_wide_and_deep_for_recsys_pt1: examples/19_wide_and_deep_for_recsys_pt1.ipynb
- 19_wide_and_deep_for_recsys_pt2: examples/19_wide_and_deep_for_recsys_pt2.ipynb
- 20_load_from_folder_functionality: examples/20_load_from_folder_functionality.ipynb
- 21-Using-huggingface-within-widedeep: examples/21_Using_huggingface_within_widedeep.ipynb
- 15_Self-Supervised Pre-Training pt 1: examples/16_Self_Supervised_Pretraning_pt1.ipynb
- 15_Self-Supervised Pre-Training pt 2: examples/16_Self_Supervised_Pretraning_pt2.ipynb
- 16_Usign-a-custom-hugging-face-model: examples/17_Usign_a_custom_hugging_face_model.ipynb
- 17_feature_importance_via_attention_weights: examples/18_feature_importance_via_attention_weights.ipynb
- 18_wide_and_deep_for_recsys_pt1: examples/19_wide_and_deep_for_recsys_pt1.ipynb
- 18_wide_and_deep_for_recsys_pt2: examples/19_wide_and_deep_for_recsys_pt2.ipynb
- 19_load_from_folder_functionality: examples/20_load_from_folder_functionality.ipynb
- 20-Using-huggingface-within-widedeep: examples/21_Using_huggingface_within_widedeep.ipynb
- Contributing: contributing.md

theme: