My understanding is that the Snorkel/Google paper discusses modeling covariance between features. However, in the current implementation, as far as I can tell, all features are assumed to be independent:
From snorkel/snorkel/labeling/model/label_model.py, line 103 (commit ed77718):

> Currently this class uses a conditionally independent label model, in which the LFs …
Do I misunderstand, or is there a way to handle labeling functions that are very highly correlated? For example, I may have a classifier that is more accurate for short text than for long text. At the moment I can't really create "independent" features for "model" and "model_280", where the latter applies only to longer text, since the correlation skews the fitting of the model. Please let me know if I'm misinterpreting this.
Hi @moscow25, please take a look at this related thread: #1596. In this case, you could manually resolve the dependency in the labeling function itself (e.g., by running the shorter model only if the text field is below some character-length limit), and you could also empirically test (e.g., using a hold-out set) whether adding both models as independent labeling functions actually helps performance.
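To make the first suggestion concrete, here is a minimal sketch of folding the two correlated classifiers into a single labeling function that dispatches on text length. `short_model`, `long_model`, and the 280-character cutoff are hypothetical stand-ins for the "model" / "model_280" setup described above; in Snorkel you would wrap `combined_lf` with the `@labeling_function()` decorator.

```python
# Sketch: fold two correlated classifiers into ONE labeling function so the
# label model's conditional-independence assumption is not violated.
# short_model / long_model are hypothetical placeholders, not real models.

ABSTAIN, NEG, POS = -1, 0, 1
LENGTH_LIMIT = 280  # hypothetical character cutoff between the two regimes


def short_model(text):
    # Placeholder classifier tuned for short text.
    return POS if "good" in text else NEG


def long_model(text):
    # Placeholder classifier tuned for long text.
    return POS if text.count("good") >= 2 else NEG


def combined_lf(text):
    """Run exactly one of the two models, chosen by text length."""
    if not text:
        return ABSTAIN
    if len(text) <= LENGTH_LIMIT:
        return short_model(text)
    return long_model(text)
```

Because only one of the two models fires on any given example, the label model never sees two highly correlated votes for the same data point.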
Thanks @henryre, I appreciate the link to #1596. Empirically this works OK, and it's good to know there's also a formula from the paper one could implement.
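For the second suggestion (empirically testing on a hold-out set), a toy sketch of the kind of check involved: score a combiner with and without a near-duplicate labeling function and compare hold-out accuracy. This uses a plain majority vote and made-up data purely for illustration; a real test would fit snorkel's `LabelModel` on each label matrix instead.

```python
# Hedged sketch: compare hold-out accuracy of a simple majority-vote
# combiner with vs. without a perfectly correlated duplicate LF.
# Toy data throughout; not Snorkel's actual LabelModel.
from collections import Counter

ABSTAIN = -1


def majority_vote(votes):
    """Most common non-abstain vote; abstain on no votes or a tie."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    top = counts.most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return ABSTAIN
    return top[0][0]


def holdout_accuracy(L, y):
    """Accuracy over hold-out points where the vote does not abstain."""
    preds = [majority_vote(row) for row in L]
    scored = [(p, t) for p, t in zip(preds, y) if p != ABSTAIN]
    return sum(p == t for p, t in scored) / len(scored) if scored else 0.0


# Toy hold-out set: gold labels and per-LF vote columns.
y = [1, 1, 0, 0, 1]
lf_a = [1, 1, 0, 1, 1]      # one error (index 3)
lf_b = [0, 1, 0, 0, 0]      # two errors (indices 0 and 4)
lf_a_dup = [1, 1, 0, 1, 1]  # perfectly correlated copy of lf_a

L_indep = list(zip(lf_a, lf_b))
L_dup = list(zip(lf_a, lf_b, lf_a_dup))
```

On this toy data the duplicate column breaks every tie in favor of `lf_a`, including its mistake, so the "with duplicate" score drops below the independent two-LF score, which is the kind of skew the hold-out comparison is meant to surface.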