This repository has been archived by the owner on Jul 3, 2023. It is now read-only.
Hello, and thank you for the open-source repository.
I was going through LinkInvent and wanted to try training the model in a transfer-learning (TL) fashion with the dataset provided in ReinventCommunity/notebooks/data/linkinvent_prior_training_data and the prior model. However, I think there was an error in the process of dataset creation. This was mainly for testing the code, and I am aware there is no particular use in doing this TL.
The code expects the data to have warheads/inputs in the first column and linkers/targets in the second column. This can be seen in the code as well as in the ReinventCommunity/notebooks/models/linkinvent.prior vocabulary, which has `*` and `|` as input tokens and `[*]` as the target token.
The dataset provided, however, has the following layout:
Linkers/targets ---- warheads/inputs ---- full SMILES
[*]C#CC(O)CCCCCCC[*] ---- *C#CCO|*CCC#CCCCCCCC(C)C ---- CC(C)CCCCCCC#CCCCCCCCCCC(O)C#CC#CCO
The columns should be reordered to:
Warheads/inputs ---- linkers/targets ---- full SMILES
*C#CCO|*CCC#CCCCCCCC(C)C ---- [*]C#CC(O)CCCCCCC[*] ---- CC(C)CCCCCCC#CCCCCCCCCCC(O)C#CC#CCO
I tried this myself, and after swapping the columns training worked fine.
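For anyone who wants to reproduce the fix, the column swap can be sketched roughly as follows. This is a minimal sketch, not the script I actually ran: the function name `swap_first_two_columns` is hypothetical, and it assumes the data file is tab-separated with no header row, which should be checked against the actual file before use.

```python
import csv
import io


def swap_first_two_columns(src, dst, delimiter="\t"):
    """Copy rows from src to dst with the first two columns exchanged.

    Assumption: one record per line, columns separated by `delimiter`
    (tab here, which may not match the real file), and no header row.
    Returns the number of rows written.
    """
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    written = 0
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        row[0], row[1] = row[1], row[0]
        writer.writerow(row)
        written += 1
    return written


# Demo on the single example row from this issue (linker first, as shipped).
shipped = io.StringIO(
    "[*]C#CC(O)CCCCCCC[*]\t*C#CCO|*CCC#CCCCCCCC(C)C\t"
    "CC(C)CCCCCCC#CCCCCCCCCCC(O)C#CC#CCO\n"
)
fixed = io.StringIO()
n_rows = swap_first_two_columns(shipped, fixed)
print(n_rows, fixed.getvalue(), end="")
```

For a real file, `src` and `dst` would be file objects opened with `newline=""`, as the `csv` module documentation recommends.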
This might not be a big issue, since TL is less important in the case of LinkInvent, and for a new model the vocabulary will be recreated anyway. I still wanted to share this feedback, since the dataset does not match the code logic.
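A quick way to check which layout a given file uses is to look at the attachment-point tokens mentioned above. The snippet below is only a heuristic sketch under that assumption: warhead strings contain `|` and bare `*`, while linker strings contain `[*]`. The helper name `column_order` is made up for illustration and is not part of the LinkInvent code.

```python
def column_order(row):
    """Guess whether a row is stored warheads-first or linker-first.

    Heuristic only (an assumption based on the tokens in this issue):
    warheads use a `|` separator and no `[*]`; linkers use `[*]` and no `|`.
    """
    def is_warheads(smiles):
        return "|" in smiles and "[*]" not in smiles

    def is_linker(smiles):
        return "[*]" in smiles and "|" not in smiles

    first, second = row[0], row[1]
    if is_warheads(first) and is_linker(second):
        return "warheads_first"
    if is_linker(first) and is_warheads(second):
        return "linker_first"
    return "unknown"


# The example row from the shipped dataset, in its original column order.
shipped_row = [
    "[*]C#CC(O)CCCCCCC[*]",
    "*C#CCO|*CCC#CCCCCCCC(C)C",
    "CC(C)CCCCCCC#CCCCCCCCCCC(O)C#CC#CCO",
]
print(column_order(shipped_row))  # prints "linker_first"
```

A result of `linker_first` corresponds to the layout that did not work for me with the prior's vocabulary.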