Link Invent Dataset inconsistent with the code base and prior model. #39

vincrichard · 2023-05-24T01:14:22Z

Hello, and thank you for the open-source repository.

I was going through LinkInvent and wanted to try training the model in a transfer learning (TL) fashion, using the dataset provided in ReinventCommunity/notebooks/data/linkinvent_prior_training_data together with the prior model. However, I think an error was made when the dataset was created. This was mainly for testing the code, and I am aware there is no particular use in doing this TL.

The code expects the data to have warheads/inputs in the first column and linkers/targets in the second column. This can be seen in the code as well as in the vocabulary of ReinventCommunity/notebooks/models/linkinvent.prior, which has * and | as input tokens and [*] as the target token.
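
The token split described above suggests a quick sanity check on a row's column order. The following is only a minimal sketch, not part of the REINVENT code base: the helper names are hypothetical, and the heuristics (warheads contain `|` and no `[*]`; linkers contain exactly two `[*]` and no `|`) are assumptions drawn from the example row in this issue.

```python
def looks_like_warheads(field: str) -> bool:
    # Assumption: a warhead pair is joined by "|" and uses bare "*" attachment points.
    return "|" in field and "[*]" not in field


def looks_like_linker(field: str) -> bool:
    # Assumption: a linker has two bracketed attachment points and no "|" separator.
    return field.count("[*]") == 2 and "|" not in field


def columns_in_expected_order(row: list) -> bool:
    # Expected order: warheads/inputs first, linker/target second.
    return len(row) >= 2 and looks_like_warheads(row[0]) and looks_like_linker(row[1])


# Example row taken from this issue, in the order found in the dataset.
dataset_row = [
    "[*]C#CC(O)CCCCCCC[*]",
    "*C#CCO|*CCC#CCCCCCCC(C)C",
    "CC(C)CCCCCCC#CCCCCCCCCCC(O)C#CC#CCO",
]
print(columns_in_expected_order(dataset_row))
```

For the row as shipped in the dataset this check reports that the columns are not in the order the code expects.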

The dataset provided, however, has the following layout:
Linkers/targets ---- Warheads/inputs ---- Full SMILES
[*]C#CC(O)CCCCCCC[*] ---- *C#CCO|*CCC#CCCCCCCC(C)C ---- CC(C)CCCCCCC#CCCCCCCCCCC(O)C#CC#CCO

The columns should be reordered to:

Warheads/inputs ---- Linkers/targets ---- Full SMILES
*C#CCO|*CCC#CCCCCCCC(C)C ---- [*]C#CC(O)CCCCCCC[*] ---- CC(C)CCCCCCC#CCCCCCCCCCC(O)C#CC#CCO
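
The reordering can be sketched in a few lines of Python. This is a rough illustration rather than the exact script I used: it assumes the file is tab-separated with one row per line (the `----` above is only for display), and the function names and paths are hypothetical.

```python
def swap_first_two_columns(line: str, delimiter: str = "\t") -> str:
    """Return the row with its first two fields exchanged (no trailing newline)."""
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) < 2:
        # Nothing to swap; return the row unchanged.
        return delimiter.join(fields)
    fields[0], fields[1] = fields[1], fields[0]
    return delimiter.join(fields)


def convert_file(src_path: str, dst_path: str, delimiter: str = "\t") -> None:
    """Write a copy of src_path with the first two columns swapped on every row."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(swap_first_two_columns(line, delimiter) + "\n")


# Example row from this issue: linker, warheads, full SMILES.
original = "[*]C#CC(O)CCCCCCC[*]\t*C#CCO|*CCC#CCCCCCCC(C)C\tCC(C)CCCCCCC#CCCCCCCCCCC(O)C#CC#CCO\n"
print(swap_first_two_columns(original))
```

Writing to a new file rather than editing in place keeps the original dataset available for comparison.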

I tried this on my side, and after making the change it worked fine.
This might not be a big issue, since TL is less important in the case of LinkInvent, and for a new model the vocabulary will be recreated anyway. I still wanted to share this feedback because the dataset does not match the code logic.
