LM embeddings for complex did not have the right length for the protein #38
Hello! In addition to the screenshot of the code, could you also provide the associated stderr output? That is, if the code prints anything or writes a .out file, please share that as well. From what I've seen working with this package, I don't think this is an endemic issue; it may happen occasionally during the "pseudo-dynamics" timesteps of the model. That said, I believe the issue is more likely due to the structure than to the ESM embeddings. I hope this helps!
Hello, thank you for providing the associated output. For context, do the third and fourth chains also have 559 residues? Also, what arguments are you passing to the top-level command? I hope to continue to provide support!
Hi! Yes, the third and fourth chains also have 559 amino acids each. The hyperparameters are as follows and are consistent with the README:
It may take up to a month for me to provide the specific PDB complex file. Best wishes!
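To confirm the per-chain residue counts independently of any viewer, one option is to count C-alpha atoms per chain directly from the PDB file. The sketch below is a hypothetical helper (`residues_per_chain` is not part of this project) that reads the fixed-column ATOM records using only the standard library. It ignores HETATM records and counts each (chain, residue number, insertion code) once, so alternate locations and repeated models are not double-counted.

```python
from typing import Dict


def residues_per_chain(pdb_text: str) -> Dict[str, int]:
    """Count residues per chain from C-alpha ATOM records (hypothetical helper)."""
    seen = set()
    counts: Dict[str, int] = {}
    for line in pdb_text.splitlines():
        # Only standard-residue ATOM records; HETATM ligands and waters are ignored.
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":  # atom name, PDB columns 13-16
            continue
        chain_id = line[21:22]  # chain identifier, column 22
        res_seq = line[22:26].strip()  # residue sequence number, columns 23-26
        i_code = line[26:27]  # insertion code, column 27
        key = (chain_id, res_seq, i_code)
        if key in seen:  # alternate locations or repeated models of the same residue
            continue
        seen.add(key)
        counts[chain_id] = counts.get(chain_id, 0) + 1
    return counts


# Usage (file name is illustrative):
# with open("complex.pdb") as f:
#     print(residues_per_chain(f.read()))
```

Comparing these counts with the sequences that were fed to ESM shows whether a chain was trimmed or split on one side but not the other.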
Hello, don't worry about providing the PDB files; I have traced the call stack by hand for you:

Top-level call: run_single_protein_inference.py. Within inference.py, this line calls pdbBind, whose __init__ eventually calls inference_preprocessing, which in turn calls extract_receptor_structure. That is where your original error is raised.

To put this in context, we trace the stack to see exactly at which point the lm_embeddings problem arises. From the trace, it seems that the issue comes from the input to the extract_receptor_structure function, which is likely … Given that there is no hts option in your top-level command, I would next look into the code path where the ESM embeddings are generated, as there is different …

If the ESM2 outputs are still available, could you provide the related files in data/esm2_output? Otherwise, I would look into the outputs produced by the command …

If none of the original environment is available, I suspect the '--truncation_seq_length' flag in esm/scripts/extract.py could be your issue, and you should try to reproduce it with a large protein from the PDB, such as …

Best of luck!
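Once per-chain residue counts and per-chain embedding lengths are known, the comparison behind this error message can be repeated outside the pipeline to narrow down which chain is at fault. This is a minimal sketch, not the project's actual check. `check_lm_embedding_lengths` is hypothetical, and it assumes that embeddings are keyed by chain ID (adapt this to your own file layout). It further assumes that the ESM extract script's `--truncation_seq_length` default is 1022 and that a truncated chain yields exactly that many embedding rows. Verify both against your ESM version.

```python
from typing import Dict, List

# Assumption: the ESM extract script's --truncation_seq_length default is 1022;
# check the value used in your own run.
DEFAULT_TRUNCATION = 1022


def check_lm_embedding_lengths(
    residue_counts: Dict[str, int],
    embedding_lengths: Dict[str, int],
    truncation_seq_length: int = DEFAULT_TRUNCATION,
) -> List[str]:
    """Hypothetical helper: list per-chain mismatches between structure and embeddings."""
    problems: List[str] = []
    for chain, n_res in residue_counts.items():
        n_emb = embedding_lengths.get(chain)
        if n_emb is None:
            problems.append(f"chain {chain}: no embedding found")
        elif n_emb != n_res:
            # Assumption: a truncated chain yields exactly truncation_seq_length rows.
            truncated = n_emb == truncation_seq_length and n_res > n_emb
            note = " (looks truncated by --truncation_seq_length)" if truncated else ""
            problems.append(f"chain {chain}: {n_res} residues vs {n_emb} embedding rows{note}")
    # Embeddings for chains that do not exist in the structure also break the match.
    for chain in sorted(embedding_lengths.keys() - residue_counts.keys()):
        problems.append(f"chain {chain}: embedding has no matching chain in the structure")
    return problems


# e.g. check_lm_embedding_lengths({"A": 559, "B": 559}, {"A": 559, "B": 559}) returns []
```

An empty list for every chain points away from the embeddings and back toward the structure input, which is the distinction the thread is trying to make.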
Hi! I've tried to reproduce the issue and to provide the ESM2 outputs, and I now believe that when running many protein-ligand docking tasks, saving all preprocessed data in a single folder can lead to confusion and inconsistencies. I therefore suggest adding a path prefix, data_pre_dir = f'{args.results}/{args.header}', to all data paths used in run_single_protein_inference.py, i.e. f"{data_pre_dir}/data". I apologize for taking up your valuable time. Best wishes!
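The per-run prefix suggested above could be centralized in a small helper so that every preprocessing path shares it. `run_data_dirs` is a hypothetical name. It only assumes that `args` carries the `results` and `header` attributes mentioned in the comment, and the `esm2_output` subfolder is an assumed layout that mirrors the data/esm2_output location mentioned earlier in the thread.

```python
import os
from argparse import Namespace
from typing import Dict


def run_data_dirs(args: Namespace, create: bool = True) -> Dict[str, str]:
    """Return per-run data paths rooted at {args.results}/{args.header} (hypothetical helper)."""
    data_pre_dir = os.path.join(args.results, args.header)  # per-run prefix
    paths = {
        "data": os.path.join(data_pre_dir, "data"),
        # Assumed layout: keep ESM2 outputs under the per-run data folder too.
        "esm2_output": os.path.join(data_pre_dir, "data", "esm2_output"),
    }
    if create:
        for path in paths.values():
            os.makedirs(path, exist_ok=True)
    return paths


# Usage: two docking runs no longer share one preprocessed-data folder.
# paths = run_data_dirs(Namespace(results="results", header="complex_1"))
```

Because each run writes under its own header, stale embeddings from an earlier complex cannot be picked up by a later one.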
Yeah, that makes sense! I wish you the best in your future use. Have a good day!
Hi!
Great work!
I have encountered several issues during the docking process. I am trying to determine whether the problem originates from the protein structure or from the ESM embeddings, and to pinpoint its exact nature.
LM embeddings for complex did not have the right length for the protein
Thank you!
Best wishes!