
How to ensure to ppl in test set have been seen in the train set? #265

Open
littlebeanbean7 opened this issue Nov 15, 2024 · 2 comments

@littlebeanbean7

Hello Fani Lab team,

I hope you are well! I wanted to use OpeNTF to run baseline models. Ideally, it would be very handy if your function could take my train/test set's ids (e.g., paper ids in the dblp data) as an input parameter, but I don't find such an option.

I found in main.py that if I don't do a time-based split, the train/test split is done by calling sklearn's train_test_split().

Then, my question is: do you ensure that people (e.g., authors in the dblp data) in the test set have appeared in the train set? If yes, could you please point me to the code where you do this? If not, could you please explain why it isn't needed?
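To make the concern concrete, here is a toy check (my own sketch with made-up teams, not OpeNTF code): a plain random split over team instances, like train_test_split does, can leave some test-set experts completely unseen in training.

```python
# Toy illustration: randomly split team instances and check which test-set
# experts never appear in any training team (a shuffle split like sklearn's
# train_test_split; teams and expert ids below are made up).
import random

# Hypothetical teams: (team_id, set of expert ids).
teams = [
    (0, {"a", "b"}),
    (1, {"b", "c"}),
    (2, {"c", "d"}),
    (3, {"d", "e"}),
    (4, {"a", "e"}),
]

rng = random.Random(0)
shuffled = teams[:]
rng.shuffle(shuffled)

test_size = 2
test, train = shuffled[:test_size], shuffled[test_size:]

train_experts = set().union(*(m for _, m in train))
test_experts = set().union(*(m for _, m in test))

# Experts the model would face "zero-shot" at test time.
unseen = test_experts - train_experts
print(sorted(unseen))
```

Nothing in the split itself forces `unseen` to be empty; whether it is depends on which teams the shuffle happens to put on each side.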

Thank you very much,
Lingling

@littlebeanbean7 littlebeanbean7 changed the title How to ensure to ppl in same test set have been seen in the train set? How to ensure to ppl in test set have been seen in the train set? Nov 15, 2024
@hosseinfani
Member

Hi @littlebeanbean7
There is no guarantee. The split is based on team instances, so there is a chance that an expert, or even all experts of a team, have not been seen during training.

However, there is a filtering step in our pipeline that filters out sparse experts, that is, removes experts who appear in fewer than a given number of teams. This way, you can make sure that each expert has at least some number of teams in the dataset. Hence, when you split, there is a good chance the expert ends up in both train and test through different teams.
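As a rough sketch of that filtering idea (the function and threshold name below are illustrative, not the pipeline's actual code):

```python
# Sketch: drop experts who appear in fewer than min_nteam teams, then drop
# any team left with no experts. min_nteam is a hypothetical parameter name.
from collections import Counter

def filter_sparse_experts(teams, min_nteam=2):
    counts = Counter(e for members in teams for e in members)
    kept = {e for e, c in counts.items() if c >= min_nteam}
    filtered = [members & kept for members in teams]
    return [m for m in filtered if m]  # discard now-empty teams

teams = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"d", "a"}, {"e"}]
# "e" appears in only one team, so it (and its singleton team) is removed.
print(filter_sparse_experts(teams, min_nteam=2))
```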

Also, since the evaluation is n-fold and the final result is the average over n models, each trained on one fold, the chance of a zero-shot expert is even lower.
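A toy illustration of why folds help (not the OpeNTF pipeline; simple round-robin folds stand in for the real splitter): once every expert appears in at least two teams, most folds keep at least one of each test expert's teams on the training side.

```python
# Toy n-fold check: for each fold used as the test set, list the test-set
# experts that never appear in the training folds. Teams are made up, and
# every expert below appears in at least two teams.
teams = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"d", "a"}, {"a", "c"}, {"b", "d"}]

n = 3
folds = [teams[i::n] for i in range(n)]  # simple round-robin folds

unseen_per_fold = []
for i, test in enumerate(folds):
    train = [t for j, f in enumerate(folds) if j != i for t in f]
    unseen = set().union(*test) - set().union(*train)
    unseen_per_fold.append(unseen)
    print(f"fold {i}: unseen test experts = {sorted(unseen)}")
# Here every fold ends up with no unseen test experts.
```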

@littlebeanbean7
Author

Thank you for your kind reply @hosseinfani! I will add a chunk of code to main.py to load fixed train and test sets, to ensure a fair comparison with my experiment.
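Something like the following sketch (the file names and JSON id-list format are my own assumptions, not part of main.py):

```python
# Sketch: build a fixed train/test split from externally provided team ids
# (e.g., dblp paper ids), instead of calling train_test_split.
import json
import os
import tempfile

def split_by_ids(teams_by_id, train_id_file, test_id_file):
    """Index teams by externally provided id lists stored as JSON arrays."""
    with open(train_id_file) as f:
        train_ids = json.load(f)
    with open(test_id_file) as f:
        test_ids = json.load(f)
    return ([teams_by_id[i] for i in train_ids],
            [teams_by_id[i] for i in test_ids])

# Demo with hypothetical paper ids mapped to expert sets.
teams_by_id = {"p1": {"a", "b"}, "p2": {"b", "c"}, "p3": {"a", "c"}}
with tempfile.TemporaryDirectory() as d:
    train_f = os.path.join(d, "train_ids.json")
    test_f = os.path.join(d, "test_ids.json")
    with open(train_f, "w") as f:
        json.dump(["p1", "p2"], f)
    with open(test_f, "w") as f:
        json.dump(["p3"], f)
    train, test = split_by_ids(teams_by_id, train_f, test_f)
print(len(train), len(test))
```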
