I am currently looking into rephrasing a classification problem as a learning-to-rank problem. I found your repo and it looks quite promising; you have clearly put a lot of effort into this.
My main question is how I can import parts of this repo and use them in my own model, which operates on 1D input vectors and outputs a scalar prediction per input example. In total it produces a 1D tensor of roughly 10k predictions between 0 and 1, and I have a 1D tensor of labels between 0 and 1 of identical length to compare them against.
For the losses, taking lambdaLoss as an example:
:param y_pred: predictions from the model, shape [batch_size, slate_length]
:param y_true: ground truth labels, shape [batch_size, slate_length]
How can I apply that to my outputs/labels? Do I have to sample n negatives for each positive to obtain m slates of length n+1 before I can feed them into the loss functions? In that case y_pred and y_true would have the shape [m, n+1]. Is there a (preferably listwise) loss function into which I can directly drop my 1D output/label vectors (just like PyTorch's BCE or MSE losses)?
For the models:
How can I make the architecture of my model more LTR-sensitive? Assume I have a network of fully connected layers which passes its output through a sigmoid layer to predict class probabilities. What would have to change to optimize the architecture for a learning-to-rank task? Which of the ideas and implementations in this repo can I leverage to accomplish this?
Thanks in advance for your time and effort. I would be more than happy to elaborate if some of my questions are confusing or don't make sense.
Cheers,
Florin
Thank you for your kind words and interest in our work.
Listwise loss functions operate on lists (slates / groups) of elements. I don't know the specifics of the classification problem that you'd like to cast as an LTR problem, but you need a notion of a list/group of elements where correctly ranking those elements corresponds to solving your underlying problem. As you said, sampling negatives might be a viable way to create such lists. Perhaps if you shared more specifics on your problem I could be of more help here.
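For illustration, here is a rough sketch of what sampling n negatives per positive to build slates could look like. The tensor names, the choice of n, and the sampling-with-replacement shortcut are all made up for the example, not part of this repo:

```python
import torch

# Hypothetical 1D tensors: ~10k scalar labels and matching predictions in [0, 1].
y_true_flat = torch.randint(0, 2, (10_000,)).float()
y_pred_flat = torch.rand(10_000)

n = 15  # number of sampled negatives per positive (arbitrary choice)

pos_idx = torch.nonzero(y_true_flat > 0.5).squeeze(1)
neg_idx = torch.nonzero(y_true_flat <= 0.5).squeeze(1)

slates = []
for p in pos_idx:
    # sample n negatives (with replacement, for simplicity) and prepend the positive
    sampled = neg_idx[torch.randint(len(neg_idx), (n,))]
    slates.append(torch.cat([p.unsqueeze(0), sampled]))

slate_idx = torch.stack(slates)   # shape [m, n + 1]
y_pred = y_pred_flat[slate_idx]   # shape [m, n + 1]
y_true = y_true_flat[slate_idx]   # shape [m, n + 1]
```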
As for directly plugging your 1D outputs and labels into a listwise loss, I'm afraid this won't work: by definition a listwise loss assumes an extra list dimension corresponding to the maximum length of a list in the batch. BCELoss works with inputs and targets of shape (batch_size,), while listwise losses need an extra dimension to account for the fact that an individual object on which you're making a prediction is a list of items, not a single item.
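A minimal sketch of feeding such [m, n+1]-shaped tensors to a listwise loss. I'm assuming lambdaLoss is importable as below with its default arguments; adjust the import path to wherever the loss lives in your checkout of the repo:

```python
import torch
# Assumed import path -- adapt to the layout of your checkout.
from allrank.models.losses import lambdaLoss

# y_pred / y_true of shape [batch_size, slate_length], e.g. built as in the previous sketch.
y_pred = torch.rand(32, 16, requires_grad=True)
y_true = torch.randint(0, 2, (32, 16)).float()

loss = lambdaLoss(y_pred, y_true)  # per the docstring: [batch_size, slate_length] inputs
loss.backward()
```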
As for the model architecture: it has been standard practice to use fully-connected networks as scoring functions in LTR, so that each item is scored independently of the others in the same list while their possible interactions are accounted for at the loss level (be it listwise or pairwise). Recently there have been a number of works using Transformer-like architectures as scoring functions, so that item interactions are taken into account both during scoring and during loss computation. See the paper "Context-Aware Learning to Rank with Self Attention", of which I'm the first author, for more details on how to use a transformer-style scoring function. It is also supported in this repo.
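A minimal sketch of a per-item MLP scoring function, assuming each item in a slate is described by a feature vector. The layer sizes and names are arbitrary; the main change from a classifier is that the sigmoid output is replaced by an unbounded score, since a listwise loss only cares about the relative ordering of items within a slate:

```python
import torch
import torch.nn as nn

class MLPScorer(nn.Module):
    """Scores every item in a slate independently; item interactions are handled by the loss."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # unbounded score instead of a sigmoid probability
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch_size, slate_length, n_features] -> scores: [batch_size, slate_length]
        return self.net(x).squeeze(-1)

scorer = MLPScorer(n_features=10)
slate_features = torch.rand(32, 16, 10)  # hypothetical batch of 32 slates, 16 items, 10 features each
y_pred = scorer(slate_features)          # [32, 16], ready to pass to a listwise loss
```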
I hope this helps and feel free to ask more questions should anything remain unclear!