You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Recently I've been using PoseCheck to evaluate clash, strain energy and key interactions for SBDD models. And I found the following unexpected behaviors:
When loading CrossDocked test set proteins, a ValueError occurs: vdw radii for types: CO. These can be defined manually using the keyword 'vdwradii', which results in protein loading failed. I did a quick search to resolve this bug but cannot find relavant solutions, except that copilot suggests inserting the following code:
vdw_radii = {'CO': 1.7} # Define the vdw radius for 'CO'. The value 1.7 is just an example, replace it with the correct value.
ag.guess_bonds(vdwradii=vdw_radii)
I'm not sure whether 1.7 is correct for CO, and what is ag, how to call guess_bonds?
Since doing PoseCheck on >10K molecules is quite slow, multi-processing is necessary. An error would occur if multiple processes load the same protein at the same time. After some debugging, I found the following code (starting at line 60 in posecheck/utils/loading.py):
tmp_path = pdb_path.split(".pdb")[0] + "_tmp.pdb"
# Call reduce to make tmp PDB with waters
reduce_command = f"{reduce_path} -NOFLIP {pdb_path} -Quiet > {tmp_path}"
subprocess.run(reduce_command, shell=True)
# Load the protein from the temporary PDB file
prot = load_protein_prolif(tmp_path)
os.remove(tmp_path)
I guess when multiple processes accessing the same tmp_path, a process may unexpectedly remove the file before another process load it. Thus I made the following quick fix:
This quick fix works for me. Please check if this is correct and consider update your source code for better multi-process support.
Except for those bugs, I will appreciate your help in the following cases:
In your README tips, you said Reading and processing all the PDB files using reduce can take a while for a large test set. If you are running PoseCheck frequently, it might be worth pre-processing all proteins yourself using prot = posecheck.utils.loading.load_protein_from_pdb(pdb_path) and setting this directly within PoseCheck using pc.protein = prot. I guess you are mentioning the following code which could be quite time-consuming:
# Call reduce to make tmp PDB with waters
reduce_command = f"{reduce_path} -NOFLIP {pdb_path} -Quiet > {tmp_path}"
subprocess.run(reduce_command, shell=True)
How about adding an interface to preprocess all protein files? And I'm curious about how much time can be saved if proteins are preprocessed?
Can you provide some formal code for calculating the exact numbers for key interactions? Since your interface pc.calculate_interactions() returns a complicated dataframe, I'm not sure whether my parsing is correct:
Please verify my code and provide some formal guidelines.
When using PoseCheck, many warning messages will be printed like this:
/opt/conda/lib/python3.9/site-packages/MDAnalysis/topology/guessers.py:146: UserWarning: Failed to guess the mass for the following atom types: CO
warnings.warn("Failed to guess the mass for the following atom types: {}".format(atom_type))
/opt/conda/lib/python3.9/site-packages/MDAnalysis/converters/RDKit.py:473: UserWarning: No `bonds` attribute in this AtomGroup. Guessing bonds based on atoms coordinates
warnings.warn(
WARNING: atom ZN from ZN will be treated as zinc
*WARNING*: Residues GLN 84 and HIS 90 in chain B appear unbonded
and will be treated as a chain break
*WARNING*: Residues GLN 84 and HIS 90 in chain B appear unbonded
and will be treated as a chain break
*WARNING*: Residues LEU 98 and ASP 101 in chain B appear unbonded
and will be treated as a chain break
*WARNING*: Residues LEU 98 and ASP 101 in chain B appear unbonded
and will be treated as a chain break
*WARNING*: Res "ZN" not in HETATM Connection Database. Hydrogens not added.
*WARNING*: Res "K" not in HETATM Connection Database. Hydrogens not added.
*WARNING*: Res "K" not in HETATM Connection Database. Hydrogens not added.
I think this information doesn't change the results, right? Do you know how to suppress these warning message? A verbose flag would be quite useful if this message can be turned off.
I think many of your followers are using CrossDocked dataset. A formal and efficient evaluation script for CrossDocked (many proteins, each protein associated with many molecules) would be beneficial. I would like to discuss with you and contribute to this script.
If you want discussion, drop me an email at kevinqu16 [at] gmail [dot] com.
That's all. Thank you!
The text was updated successfully, but these errors were encountered:
Hi:
Recently I've been using PoseCheck to evaluate clash, strain energy and key interactions for SBDD models. And I found the following unexpected behaviors:
vdw radii for types: CO. These can be defined manually using the keyword 'vdwradii'
, which results in protein loading failed. I did a quick search to resolve this bug but cannot find relavant solutions, except that copilot suggests inserting the following code:I'm not sure whether 1.7 is correct for CO, and what is
ag
, how to callguess_bonds
?I guess when multiple processes accessing the same
tmp_path
, a process may unexpectedly remove the file before another process load it. Thus I made the following quick fix:This quick fix works for me. Please check if this is correct and consider update your source code for better multi-process support.
Except for those bugs, I will appreciate your help in the following cases:
Reading and processing all the PDB files using reduce can take a while for a large test set. If you are running PoseCheck frequently, it might be worth pre-processing all proteins yourself using prot = posecheck.utils.loading.load_protein_from_pdb(pdb_path) and setting this directly within PoseCheck using pc.protein = prot.
I guess you are mentioning the following code which could be quite time-consuming:How about adding an interface to preprocess all protein files? And I'm curious about how much time can be saved if proteins are preprocessed?
pc.calculate_interactions()
returns a complicated dataframe, I'm not sure whether my parsing is correct:Please verify my code and provide some formal guidelines.
I think this information doesn't change the results, right? Do you know how to suppress these warning message? A
verbose
flag would be quite useful if this message can be turned off.If you want discussion, drop me an email at kevinqu16 [at] gmail [dot] com.
That's all. Thank you!
The text was updated successfully, but these errors were encountered: