Problem with torchscript #315
Thanks for the thorough issue!

torchmd-net/torchmdnet/priors/zbl.py, lines 53 to 55 (at 6694816)

Changing the value there to 0.0 should help. I have seen similar things before with TorchScript.
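For context, TorchScript infers literal types strictly: a bare `0` is an int, so later float assignments or float arithmetic can fail to compile, while `0.0` keeps the variable a float throughout. A minimal, hypothetical sketch of that pitfall (not the actual zbl.py code):

```python
import torch

# Hypothetical minimal example of the TorchScript literal-typing pitfall;
# it does not reproduce the actual zbl.py code.
@torch.jit.script
def scaled(x: torch.Tensor, flag: bool) -> torch.Tensor:
    scale = 0.0      # writing `0` here would be inferred as int, and the
    if flag:         # float reassignment below would then fail to script
        scale = 0.5
    return x * scale

print(scaled(torch.ones(3), True))  # tensor([0.5000, 0.5000, 0.5000])
```

With `scale = 0` on the first line instead, `torch.jit.script` rejects the function with a type-mismatch error at compile time.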
Thanks as always for the quick response @RaulPPelaez! Making that change in the ZBL prior did indeed fix the issue, and I was able to generate the TorchScript module and run some dynamics with it. I will continue testing to see if I stumble on any other bugs. As for the old checkpoint-loading problem, I have attached to this issue a zip file with a checkpoint and the YAML file used to run the experiment. This model was trained without ZBL on one A100 GPU. I get the following error if I try to load the model:
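For reference, I am loading the checkpoint along these lines (the path is a placeholder for the attached file, and I am assuming loading goes through load_model):

```python
from torchmdnet.models.model import load_model

# Placeholder path; the actual checkpoint and YAML are in the attached zip.
model = load_model("path/to/checkpoint.ckpt")
```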
Because the most recent commit at 6694816 involved reformatting some keys of the state dictionary, I suspect this is related to that. Thanks for looking into it!
@FranklinHu1, I am able to load your model using #318.
Hello,
I am running into a problem using TorchScript to integrate a trained TensorNet model with OpenMM for dynamics. This is with the newest version of the code as of writing (commit 6694816).
The system
I am running this code on NERSC Perlmutter, which uses A100 GPUs (either 40 GB or 80 GB). My Anaconda environment is as follows; I set it up following the install-from-source instructions at https://torchmd-net.readthedocs.io/en/latest/installation.html:
Setup
I trained a tensornet model with the ZBL prior using the following configuration file. I included the ZBL prior since I am working with systems containing ions. Training was done on a single A100 GPU, and was restarted from the latest checkpoint after 1000 epochs.
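The full YAML is attached, but roughly speaking it combines a tensornet model with the ZBL prior. A hedged sketch of the relevant options is below; the key names follow my recollection of the example configs in the repository and the values are placeholders, so they may not match the attached file exactly:

```yaml
# Illustrative fragment only; see the attached YAML for the actual settings.
model: tensornet
embedding_dimension: 128
num_layers: 2
num_rbf: 32
cutoff_lower: 0.0
cutoff_upper: 5.0
derivative: true
# ZBL prior for short-range nuclear repulsion (format recalled from the
# example configs; it may differ in the attached file)
prior_model:
  ZBL:
    cutoff_distance: 4.0
    max_num_neighbors: 50
```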
I then used the following script to generate the force module. Since I am using periodic boundary conditions, I use the `ForceModulePBC` version, sketched below:
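For context, a rough sketch of the kind of wrapper I mean follows; it is illustrative rather than my exact script. The checkpoint path and atomic numbers are placeholders, I am assuming load_model from torchmdnet.models.model and a box keyword on the model's forward, and unit conversions are only hinted at in comments:

```python
import torch
from torchmdnet.models.model import load_model

class ForceModulePBC(torch.nn.Module):
    """Rough sketch of a PBC-aware wrapper for openmm-torch (not the exact script)."""

    def __init__(self, checkpoint: str, atomic_numbers):
        super().__init__()
        # derivative=False: let OpenMM differentiate the returned energy
        self.model = load_model(checkpoint, derivative=False)
        self.register_buffer("z", torch.tensor(atomic_numbers, dtype=torch.long))

    def forward(self, positions, boxvectors):
        # OpenMM passes nm; the model is assumed here to expect Angstrom.
        # Energy-unit conversion to kJ/mol is omitted for brevity.
        pos = positions.to(torch.float32) * 10.0
        box = boxvectors.to(torch.float32) * 10.0
        energy, _ = self.model(self.z, pos, box=box)
        return energy

# torch.jit.script() is the call that fails with the error described below
module = ForceModulePBC("model.ckpt", atomic_numbers=[8, 1, 1])  # placeholders
scripted = torch.jit.script(module)
scripted.save("force_module.pt")
```

The scripted file would then be handed to openmm-torch's TorchForce to drive the dynamics.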
The error
I ran the following command using this script:
It seems the code to generate this TorchScript module fails on the call to `torch.jit.script()` with the following error:

Other info
Because of the MLP change introduced in commit 6694816, I cannot try loading older models since the state-dictionary keys no longer match. However, I did try downgrading my copy of the repository to commit 74702da, and the code above all worked there (albeit with an older trained model).
As always, thank you so much for your time, and any help would be greatly appreciated!