
LM embeddings for complex did not have the right length for the protein #38

liyue9129 opened this issue Nov 7, 2024 · 7 comments

liyue9129 commented Nov 7, 2024

Hi!
Great work!
I have encountered several issues during the docking process, and I am trying to determine whether the issue originates from the protein structure or from the ESM embeddings used, and to pinpoint its exact nature:

`LM embeddings for complex did not have the right length for the protein`

[screenshot of the code where the error is raised]

Thank you!
Best wishes!

patjiang commented Nov 8, 2024

Hello! Rather than a screenshot of the code itself, could you also provide the associated stderr output? That is, if the code generates a print statement or a .out file, could you please show that as well?

From what I've seen working with this package, this should not be an endemic issue; rather, it can happen occasionally in the "pseudo-dynamics" timesteps of the model. That being said, I believe this issue is more likely due to the structure than to the ESM embeddings. I hope this helps!

liyue9129 (Author) commented

Thank you so much for your help!
The associated stderr output is as follows:
[screenshots of the stderr output]

Due to computer issues in the past few days that resulted in data loss, I apologize for not being able to provide the specific PDB complex file.
However, during my previous debugging I found that the problem originated from using the ESM embeddings to generate the protein features, as shown in the red box in the figure below: the first and second chains are fine, but the third and fourth chains encode only 37 and 32 amino acids, respectively.

[screenshot of the debugging output; the red box marks the per-chain ESM embedding lengths]

Best wishes!

patjiang commented

Hello,

Thank you for providing the associated output; for context, do the third and fourth chains also expect 559 residues?

Also, what args are you passing to the top-level command?

I hope to continue to provide support!

liyue9129 (Author) commented Nov 16, 2024

Hi!

Yes, the 3rd and 4th chains also have 559 amino acids.

The hyperparameters are as follows and are consistent with the README:
```python
import subprocess

# data_name and name are set earlier in my script.
command = [
    "python", "run_single_protein_inference.py",
    f"/public/home/user/complex_preparation/posebusters/{data_name}/{name}/{name}_protein.pdb",
    f"/public/home/user/complex_preparation/posebusters/{data_name}/{name}/{name}_ligand_smiles_to_csv.csv",
    "--savings_per_complex", "40",
    "--inference_steps", "20",
    "--header", f"{name}",
    "--device", "0",
    "--python", "/public/home/user/miniconda3/envs/dynamicbind/bin/python",
    "--relax_python", "/public/home/user/miniconda3/envs/relax/bin/python",
    "--result", f"/public/home/user/DLDock/DynamicBind/test/{data_name}",
]
subprocess.run(command, check=True)
```

It may take up to a month for me to provide the specific PDB complex file.

Best wishes!

patjiang commented

Hello,

Don't worry about providing the PDB files; I have done the stack trace by hand for you:

Top-level call: run_single_protein_inference.py. This runs and functions fine until inference.py is called (here).

Then, within inference.py, this line calls pdbBind, which leads to a call to the init here, which eventually calls inference_preprocessing here, which leads to the call to extract_receptor_structure here.

That call to extract_receptor_structure is what leads to your original problem, here.

To put this in context: the point of the stack trace is to see exactly where the lm_embeddings generation goes wrong. From the trace, it seems that the issue comes from the input to the extract_receptor_structure function, which is likely lm_embeddings_chains_all. For reference, this is generated in these lines here.
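For intuition, here is a minimal sketch of the kind of per-chain length check that raises this error (the variable names `parsed_chains` and `lm_embedding_chains` are assumptions for illustration, not the exact DynamicBind source):

```python
# Hypothetical sketch of the consistency check inside extract_receptor_structure:
# every parsed chain must have exactly one ESM2 embedding vector per residue.
for i, chain_residues in enumerate(parsed_chains):    # residues parsed from the PDB
    n_residues = len(chain_residues)
    n_embedded = lm_embedding_chains[i].shape[0]      # per-residue ESM2 vectors
    if n_residues != n_embedded:
        raise ValueError(
            "LM embeddings for complex did not have the right length for the protein: "
            f"chain {i} has {n_residues} residues but {n_embedded} embedding vectors."
        )
```

In your case, chains 3 and 4 would fail a check like this (559 residues vs. 37 and 32 embeddings).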

Since there is no hts option in your top-level command, I would next look into the path where the ESM embeddings are generated, as lm_embeddings_chains_all behaves differently depending on hts.

If the ESM2 outputs are still available, could you possibly provide the related files within data/esm2_output?

Otherwise, I would look into the outputs produced by the line `for embeddings_path in embeddings_paths: lm_embeddings_chains.append(torch.load(embeddings_path)['representations'][33])`.
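For example, here is a quick way to inspect the per-chain lengths in those saved ESM2 outputs; the `['representations'][33]` layout follows ESM's extract.py output for esm2_t33_650M, while the glob path is an assumption:

```python
import glob

import torch

# Print the per-residue embedding length of each saved ESM2 output file;
# a truncated or mis-parsed chain will show up with the wrong length.
for embeddings_path in sorted(glob.glob("data/esm2_output/*.pt")):
    rep = torch.load(embeddings_path)["representations"][33]
    print(embeddings_path, rep.shape)  # shape is (num_residues, embedding_dim)
```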

If none of the original environment is available, I would suggest that the `--truncation_seq_length` flag in esm/scripts/extract.py could be your issue, and you could try to replicate the issue with a large protein from the PDB, such as this one.
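As a sanity check, you could re-run the embedding extraction with a larger truncation length and see whether the chain lengths change. A sketch in the same subprocess style as your command above; the FASTA/output paths and the 4096 value are illustrative:

```python
import subprocess

# Hypothetical re-run of ESM's extract.py with a larger truncation length,
# to test whether sequence truncation is what shortens chains 3 and 4.
subprocess.run([
    "python", "esm/scripts/extract.py",
    "esm2_t33_650M_UR50D",
    "data/prepared_for_esm.fasta",   # one FASTA record per chain (illustrative path)
    "data/esm2_output",
    "--repr_layers", "33",
    "--include", "per_tok",
    "--truncation_seq_length", "4096",
], check=True)
```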

Best of luck!

liyue9129 (Author) commented

Hi!
It is so nice of you to provide such a clear stack trace!

I've tried to reproduce the issue and provide the ESM2 outputs.
To do so, I modified the path where preprocessed data is saved in run_single_protein_inference.py, and found that the error no longer occurred.

I therefore believe that when running many protein-ligand docking tasks, saving all preprocessed data in a single shared folder can lead to confusion and inconsistencies between jobs.

Thus, I suggest adding a path prefix, `data_pre_dir = f'{args.results}/{args.header}'`, to all data paths used in run_single_protein_inference.py, e.g. `f"{data_pre_dir}/data"`.
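For illustration, a minimal sketch of the suggested change; the exact variable and folder names inside run_single_protein_inference.py may differ:

```python
import os

# Hypothetical sketch: give each job its own preprocessing directory so that
# concurrent docking runs do not overwrite one another's cached data.
data_pre_dir = f"{args.results}/{args.header}"
os.makedirs(f"{data_pre_dir}/data", exist_ok=True)

# ...then point every preprocessed-data path at f"{data_pre_dir}/data"
# instead of the shared "data" folder.
```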

I apologize for taking up your valuable time.

Best wishes!

patjiang commented

Yeah, that makes sense! I wish you the best in your future use; have a good day!
