My understanding is that the Snorkel/Google paper discusses modeling covariance between features. However, in the current implementation, as far as I can tell, all features are assumed to be independent:
From snorkel/snorkel/labeling/model/label_model.py, line 103 (commit ed77718):

> Currently this class uses a conditionally independent label model, in which the LFs …
Do I misunderstand, or is there a way to handle labeling functions that are very highly correlated? For example, I may have a classifier that is more accurate for short text than for long text. At the moment I can't really create "independent" features for "model" and "model_280", where the latter applies only to longer text, since the correlation skews the fitting of the model. Please let me know if I'm misinterpreting this.
Hi @moscow25, please take a look at this related thread: #1596. In this case, you could manually resolve the dependency in the labeling function itself (e.g., by running the shorter model only if the text field is below some character-length limit), and you could also empirically test (e.g., using a hold-out set) whether adding both models as independent labeling functions actually helps performance.
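To make the first suggestion concrete, here is a minimal sketch of folding the two correlated classifiers into a single labeling function that dispatches on text length. `short_model`, `long_model`, and the 280-character cutoff are hypothetical stand-ins for the "model" / "model_280" setup described above; in Snorkel you would wrap `combined_lf` with the `@labeling_function()` decorator.

```python
# Sketch: fold two correlated classifiers into ONE labeling function so the
# label model's conditional-independence assumption is not violated.
# short_model / long_model are hypothetical placeholders, not real models.

ABSTAIN, NEG, POS = -1, 0, 1
LENGTH_LIMIT = 280  # hypothetical character cutoff between the two regimes


def short_model(text):
    # Placeholder classifier tuned for short text.
    return POS if "good" in text else NEG


def long_model(text):
    # Placeholder classifier tuned for long text.
    return POS if text.count("good") >= 2 else NEG


def combined_lf(text):
    """Run exactly one of the two models, chosen by text length."""
    if not text:
        return ABSTAIN
    if len(text) <= LENGTH_LIMIT:
        return short_model(text)
    return long_model(text)
```

Because only one of the two models fires on any given example, the label model never sees two highly correlated votes for the same data point.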
Thanks @henryre, I appreciate the link to #1596. Empirically this works OK, and it's good to know there's also a formula from the paper one could implement.
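For the second suggestion (empirically testing on a hold-out set), a toy sketch of the kind of check involved: score a combiner with and without a near-duplicate labeling function and compare hold-out accuracy. This uses a plain majority vote and made-up data purely for illustration; a real test would fit snorkel's `LabelModel` on each label matrix instead.

```python
# Hedged sketch: compare hold-out accuracy of a simple majority-vote
# combiner with vs. without a perfectly correlated duplicate LF.
# Toy data throughout; not Snorkel's actual LabelModel.
from collections import Counter

ABSTAIN = -1


def majority_vote(votes):
    """Most common non-abstain vote; abstain on no votes or a tie."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    top = counts.most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return ABSTAIN
    return top[0][0]


def holdout_accuracy(L, y):
    """Accuracy over hold-out points where the vote does not abstain."""
    preds = [majority_vote(row) for row in L]
    scored = [(p, t) for p, t in zip(preds, y) if p != ABSTAIN]
    return sum(p == t for p, t in scored) / len(scored) if scored else 0.0


# Toy hold-out set: gold labels and per-LF vote columns.
y = [1, 1, 0, 0, 1]
lf_a = [1, 1, 0, 1, 1]      # one error (index 3)
lf_b = [0, 1, 0, 0, 0]      # two errors (indices 0 and 4)
lf_a_dup = [1, 1, 0, 1, 1]  # perfectly correlated copy of lf_a

L_indep = list(zip(lf_a, lf_b))
L_dup = list(zip(lf_a, lf_b, lf_a_dup))
```

On this toy data the duplicate column breaks every tie in favor of `lf_a`, including its mistake, so the "with duplicate" score drops below the independent two-LF score, which is the kind of skew the hold-out comparison is meant to surface.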