Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic ranking example. #7

Merged
merged 1 commit into from
Oct 30, 2024
Merged

Conversation

hertschuh
Copy link
Collaborator

Example was run with all 3 backends (tf, jax and torch).

@hertschuh hertschuh requested a review from fchollet October 29, 2024 21:56
Copy link
Contributor

@fchollet fchollet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

examples/basic_ranking.py Outdated Show resolved Hide resolved

def __init__(
self,
user_embeddings_count,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

num_users, num_items

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, although I'm keeping the term candidate instead of item everywhere for consistency.

return (
# Inputs are user ids and movie ids
{
"user_id": tf.strings.to_number(x["user_id"], out_type=tf.int32),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these IDs guaranteed to be sequential? If they're just random int-like, we would want to reindex them from 0 to num_users

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cheated because I know the dataset and they are sequential.

The original example was more general because it wasn't making any assumption on the IDs, not even that they were numbers, because it's using a StringLookup layer as part of the model. StringLookup is not multi-backend so we can't do that as part of the model, it would have to be part of preprocessing. But then it gets more complicated to demonstrate inference (in the retrieval example), because the reverse lookup is needed, so we would need:

  • model.predict
  • convert to Tensorflow
  • reverse lookup
  • convert to Python
  • lookup movie title

We have 3 options:

  • make it fully generic and use StringLookup outside of the model
  • assume IDs are ints and use an int to int mapping
  • leave it as is (assume sequential IDs)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave it as is (assume sequential IDs)

That's ok if you clearly document this gotcha

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some detailed language about this shortcut.

Example was run with all 3 backends (tf, jax and torch).
@fchollet fchollet merged commit e4ba45d into keras-team:main Oct 30, 2024
5 checks passed
@hertschuh hertschuh deleted the basic_ranking branch October 30, 2024 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants