MaskedLARK: Additional Types of Data Labels #29
Comments
Right, the proposal is centered entirely around last-click attribution at the moment. From the masking / training side of things there's really no issue here (if I understand correctly). On the label side, there are two labels, e.g. one in {0, 1} and one in [0, 30] -- the first has two values that need to be sent, and the second has a quantized value to be sent (e.g., discretized into 5 buckets). You'll want to optimize with BCE + lambda * MSE or something similar, and we'll need to generate fakes for both, but those are just details. From an implementation standpoint, I don't think it would be hard for the browser, either. I think most of the work is in designing the API here. How can we tell the browser to track this -- perhaps we should expand
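To make the "discretize into 5 buckets" and "BCE + lambda MSE" remarks concrete, here is a minimal sketch in plain Python. The function names, the bucket boundaries (equal-width over [0, 30]), and the default weight `lam` are assumptions for illustration only; the proposal does not specify them.

```python
import math

def quantize_delay(delay, max_delay=30.0, num_buckets=5):
    """Map a delay in [0, max_delay] to an equal-width bucket index.

    Equal-width buckets are an assumption; any monotone bucketing would do.
    """
    clipped = min(max(delay, 0.0), max_delay)
    width = max_delay / num_buckets
    # The upper edge (delay == max_delay) falls into the last bucket.
    return min(int(clipped // width), num_buckets - 1)

def combined_loss(p_convert, is_conversion, delay_pred, delay_target, lam=0.1):
    """Binary cross-entropy on the conversion label plus lam * squared error on the delay."""
    eps = 1e-12  # keeps log() finite when the prediction is exactly 0 or 1
    p = min(max(p_convert, eps), 1.0 - eps)
    bce = -(is_conversion * math.log(p) + (1 - is_conversion) * math.log(1.0 - p))
    mse = (delay_pred - delay_target) ** 2
    return bce + lam * mse
```

Whether the regression target is the raw delay or its bucket index is left open in the thread; the loss above accepts either.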
I think there is more to this than the label space for the delay being e.g. [0, 30]. Only the browser can know the delay, so the browser would somehow need to assign this label itself (rather than just taking the label provided by the conversion pixel).
Right -- it's more about where it makes sense to tell the browser that it needs to track this. Currently, the model passes along:
That is the basic structure, but it revolves around a single label / label space. We could modify it to something more like:
Then the browser would need to handle a few specific types of labels.
I have two questions/requests for the labels in the data the browser produces in the MaskedLARK proposal.
If I understand correctly, when a user clicks on an ad, this would be stored in the browser, and then later, if the user converts, the browser would generate the datapoint (x=features, label=1). Whereas if the user failed to convert within a given time frame, the browser would generate the datapoint (x=features, label=0).
You describe how we could define arbitrary labels in the conversion pixel (e.g. use dollar amounts of the conversion instead of a binary is_conversion) but would it be possible to have the label include the delay between the click and the conversion? This would require help from the browser since the conversion pixel itself couldn't know the delay.
In the case where the user clicks ad 1, then clicks ad 2, and only then converts, would it be possible to emit a data point for the first click in addition to the second (ideally with a negative label)? In the proposal you emphasize how a click that timed out could be treated as a negative data point but I don't think you cover the case where a click is superseded by a more recent click.
That is, ideally (for us) if a user's browsing history looked like click_1 -> click_2 -> conversion -> click_3 -> (no conversion before click_3 times out), we would have the following data points (stored in the browser, to be masked etc later):
click_1: (x=click_1_feats, label=(is_conversion=False, delay=T_click_2 - T_click_1))
click_2: (x=click_2_feats, label=(is_conversion=True, delay=T_conversion - T_click_2))
click_3: (x=click_3_feats, label=(is_conversion=False, delay=Timeout_TTL))
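The labeling rule implied by these three data points can be sketched as follows. This is only an illustration of the requested behavior, not part of the proposal: the `Event` type, the `label_clicks` function, and the rule "a click is resolved by the next event within the TTL" are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str            # "click" or "conversion"
    time: float          # timestamp, in any consistent unit
    feats: object = None # click features; unused for conversions

def label_clicks(events, ttl):
    """Return one (feats, is_conversion, delay) tuple per click.

    A click is resolved by the next event in the timeline:
    - a conversion within ttl  -> positive label, delay to the conversion
    - another click within ttl -> negative label, delay to the superseding click
    - nothing within ttl       -> negative label, delay = ttl (timeout)
    """
    events = sorted(events, key=lambda e: e.time)
    data_points = []
    for i, ev in enumerate(events):
        if ev.kind != "click":
            continue
        nxt = events[i + 1] if i + 1 < len(events) else None
        if nxt is not None and nxt.time - ev.time <= ttl:
            data_points.append((ev.feats, nxt.kind == "conversion", nxt.time - ev.time))
        else:
            data_points.append((ev.feats, False, ttl))
    return data_points

# The history from the example: click_1 -> click_2 -> conversion -> click_3 -> timeout.
history = [
    Event("click", 0.0, "click_1_feats"),
    Event("click", 2.0, "click_2_feats"),
    Event("conversion", 5.0),
    Event("click", 10.0, "click_3_feats"),
]
points = label_clicks(history, ttl=7.0)
```

With these hypothetical timestamps, `points` holds a negative label for click_1 (superseded after 2.0), a positive label for click_2 (converted after 3.0), and a negative label for click_3 with the delay set to the TTL.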