MaskedLARK: Additional Types of Data Labels #29

Open
lcevans opened this issue Aug 5, 2021 · 3 comments

@lcevans

lcevans commented Aug 5, 2021

I have two questions/requests for the labels in the data the browser produces in the MaskedLARK proposal.

If I understand correctly, when a user clicks on an ad, the click is stored in the browser. If the user later converts, the browser generates the data point (x=features, label=1); if the user fails to convert within a given time frame, it generates (x=features, label=0).

  1. You describe how we could define arbitrary labels in the conversion pixel (e.g. use dollar amounts of the conversion instead of a binary is_conversion) but would it be possible to have the label include the delay between the click and the conversion? This would require help from the browser since the conversion pixel itself couldn't know the delay.

  2. In the case where the user clicks ad 1, then clicks ad 2, and only then converts, would it be possible to emit a data point for the first click in addition to the second (ideally with a negative label)? In the proposal you emphasize how a click that timed out could be treated as a negative data point but I don't think you cover the case where a click is superseded by a more recent click.

That is, ideally (for us), if a user's browsing history looked like click_1 -> click_2 -> conversion -> click_3 -> (no conversion before click_3 times out), we would have the following data points (stored in the browser, to be masked etc. later):

click_1: (x=click_1_feats, label=(is_conversion=False, delay=T_click_2 - T_click_1))
click_2: (x=click_2_feats, label=(is_conversion=True, delay=T_conversion - T_click_2))
click_3: (x=click_3_feats, label=(is_conversion=False, delay=Timeout_TTL))
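
The labeling rule above could be sketched roughly as follows, in Python purely for illustration. The event tuples, the `label_clicks` function, and the `ttl` parameter are hypothetical names, not part of the MaskedLARK proposal. The sketch assumes at most one click is pending at a time, and that a pending click is closed by whichever comes first: a newer click, a conversion, or the timeout.

```python
from typing import List, Tuple

# Hypothetical event format: (kind, timestamp, features),
# where kind is "click" or "conversion".
Event = Tuple[str, float, str]


def label_clicks(events: List[Event], ttl: float) -> List[dict]:
    """Emit one data point per click under last-click attribution."""
    points = []
    pending = None  # (timestamp, features) of the most recent unresolved click

    def close(is_conversion: bool, delay: float) -> None:
        # Record the label for the currently pending click.
        points.append(
            {"x": pending[1], "is_conversion": is_conversion, "delay": delay}
        )

    for kind, t, feats in sorted(events, key=lambda e: e[1]):
        # A pending click that is at least ttl old has timed out: negative label.
        if pending is not None and t - pending[0] >= ttl:
            close(False, ttl)
            pending = None
        if kind == "click":
            if pending is not None:
                # Superseded by a newer click: negative label,
                # with the delay measured up to the newer click.
                close(False, t - pending[0])
            pending = (t, feats)
        elif kind == "conversion" and pending is not None:
            # Last-click attribution: only the most recent click gets credit.
            close(True, t - pending[0])
            pending = None

    if pending is not None:
        # Assumes the history is observed past the last click's timeout.
        close(False, ttl)
    return points
```

Called on the history click_1 (t=0), click_2 (t=2), conversion (t=5), click_3 (t=10) with `ttl=30`, this yields a negative for click_1 with delay 2, a positive for click_2 with delay 3, and a negative for click_3 with delay 30, matching the three data points listed above.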

@jpfeiffe

Right, the proposal is centered entirely around last click attribution at the moment.

From the masking / training side of things there's really no issue here (if I understand). On the label side, there are two label spaces, e.g. {0, 1} and [0, 30] -- the first has two values that need to be sent, the second has a quantized number to be sent (e.g., discretized into 5 buckets). You'll want to optimize with BCE + lambda * MSE or something, and we'll need to generate fakes for both, but those are just details. From an implementation standpoint, I don't think it would be hard for the browser, either.
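
As a rough illustration of the two pieces mentioned here, the following Python sketch quantizes a delay in [0, 30] into 5 equal-width buckets and combines binary cross-entropy on the conversion label with a weighted squared error on the delay. The function names, the equal-width bucketing, and the default weight `lam=0.1` are assumptions made for the example, not something the proposal specifies.

```python
import math


def quantize(delay: float, lo: float = 0.0, hi: float = 30.0, buckets: int = 5) -> int:
    """Map a delay in [lo, hi] to a bucket index in 0..buckets-1 (equal widths)."""
    clipped = min(max(delay, lo), hi)
    width = (hi - lo) / buckets
    # The upper bound hi would land in bucket `buckets`, so cap the index.
    return min(int((clipped - lo) / width), buckets - 1)


def bce(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for a predicted probability p and a label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def combined_loss(
    p_conv: float,
    y_conv: int,
    delay_pred: float,
    delay_true: float,
    lam: float = 0.1,
) -> float:
    """BCE on the conversion label plus lam times the squared error on the delay."""
    return bce(p_conv, y_conv) + lam * (delay_pred - delay_true) ** 2
```

Whether the squared error should be taken on the raw delay or on the bucket index is one of the details left open; the sketch uses the raw value.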

I think most of the work is designing the API here. How can we tell the browser to track this -- perhaps we should expand model_label_space to be a JSON-like structure or something? Also, when training we need to tell the helper how to construct this kind of special loss -- that's also going to need a bit of thought.

@lcevans
Author

lcevans commented Aug 13, 2021

I think there is more to this than the label space for the delay being e.g. [0, 30]. Only the browser can know the delay, so the browser would somehow need to assign this label itself (rather than just taking the label provided by the conversion pixel).

@jpfeiffe

Right -- it's more about where it makes sense to tell the browser that it needs to track this. Currently, the model passes along:

{
    "model_tag" : "ConversionModel11005",
    "model_features" : "...", 
    "model_label_type" : <enum>,
    "model_label_space" : {0, 1}
}

That is the basic structure, but it revolves around a single label / space. We could change it to something more like:

{
    "model_tag" : "ConversionModel11005",
    "model_features" : "...", 
    "model_labels" : [
        {"type" : "purchase-occurred", "space" : {0, 1}},
        {"type" : "browser-delay", "space" : [0, 30]}
    ]
}

Then the browser would need to handle a few specific types of labels.
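
To make the idea of browser-handled label types concrete, here is a hedged Python sketch of how a `model_labels` list might be resolved into label values. The `resolve_labels` function and its arguments are hypothetical, and the spaces are written as lists because `{0, 1}` is not valid JSON. The only point illustrated is the split between a label taken from the conversion pixel and a label the browser computes itself.

```python
from typing import Any, Dict, List

# Hypothetical config mirroring the "model_labels" structure above.
MODEL_LABELS = [
    {"type": "purchase-occurred", "space": [0, 1]},
    {"type": "browser-delay", "space": [0, 30]},
]


def resolve_labels(
    model_labels: List[Dict[str, Any]],
    pixel_label: int,
    click_time: float,
    conversion_time: float,
) -> Dict[str, Any]:
    """Build one label value per entry in model_labels."""
    labels: Dict[str, Any] = {}
    for spec in model_labels:
        kind, space = spec["type"], spec["space"]
        if kind == "purchase-occurred":
            # Taken as-is from the conversion pixel, after checking the space.
            if pixel_label not in space:
                raise ValueError(f"label {pixel_label} not in space {space}")
            labels[kind] = pixel_label
        elif kind == "browser-delay":
            # Computed by the browser, which alone knows both timestamps,
            # then clipped to the declared range.
            lo, hi = space
            labels[kind] = min(max(conversion_time - click_time, lo), hi)
        else:
            raise ValueError(f"unsupported label type: {kind}")
    return labels
```

Any label type the browser does not recognize is rejected here; a real design would need to decide how unknown types are treated.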
