
Carry through space group information when reading and writing pdb files #689

Open
zanebeckwith opened this issue Oct 31, 2024 · 3 comments


@zanebeckwith

zanebeckwith commented Oct 31, 2024

Thank you for the excellent project!

I noticed that when I read in a PDB-formatted file and then write it back out, the space group information in the CRYST1 record gets lost.

Looking at the code, it seems the space group is hardcoded to always be P1, and the Z value is always written as 1.

Is there any reason the space group and Z value can't be stored when a file is read in and then written out correctly when the structure is saved to file?
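For context, CRYST1 is a fixed-column record: the cell lengths a, b, c occupy columns 7-33, the angles alpha, beta, gamma columns 34-54, the space group symbol columns 56-66, and Z columns 67-70. The sketch below is plain Python, not Biotite code, and shows that all eight fields can be read and written back unchanged instead of being reset to P 1 and 1. The Cryst1 dataclass, parse_cryst1() and format_cryst1() are hypothetical names, and the cell values in the example are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Cryst1:
    """Fields of a PDB CRYST1 record (hypothetical container)."""
    a: float
    b: float
    c: float
    alpha: float
    beta: float
    gamma: float
    space_group: str
    z: int


def parse_cryst1(line: str) -> Cryst1:
    """Parse a CRYST1 record using its fixed column ranges."""
    if not line.startswith("CRYST1"):
        raise ValueError("not a CRYST1 record")
    line = line.ljust(70)  # tolerate stripped trailing whitespace
    z_field = line[66:70].strip()
    return Cryst1(
        a=float(line[6:15]),
        b=float(line[15:24]),
        c=float(line[24:33]),
        alpha=float(line[33:40]),
        beta=float(line[40:47]),
        gamma=float(line[47:54]),
        space_group=line[55:66].strip(),
        # Assumption: treat a blank Z field as 1
        z=int(z_field) if z_field else 1,
    )


def format_cryst1(rec: Cryst1) -> str:
    """Write the record back, keeping the space group and Z as read."""
    return (
        f"CRYST1{rec.a:9.3f}{rec.b:9.3f}{rec.c:9.3f}"
        f"{rec.alpha:7.2f}{rec.beta:7.2f}{rec.gamma:7.2f}"
        f" {rec.space_group:<11}{rec.z:4d}"
    )


line = "CRYST1   52.000   58.600   61.900  90.00  90.00  90.00 P 21 21 21    4"
rec = parse_cryst1(line)
print(rec.space_group, rec.z)  # P 21 21 21 4
assert format_cryst1(rec) == line  # lossless round trip
```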

If there is a way to do this, I'm more than happy to give it a shot and submit a PR.

Thanks!

@padix-key
Member

padix-key commented Nov 1, 2024

AtomArray objects have no notion of space groups: regarding the unit cell, they only store the cell vectors. Thus, pdb.set_structure() has no way of knowing the space group and only writes a placeholder space group.
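To illustrate why the space group cannot be recovered from the cell alone: the three box vectors are fully determined by the six cell parameters, so two files that differ only in their space group produce identical boxes. Biotite provides vectors_from_unitcell() for this conversion (taking angles in radians); the pure-Python vectors_from_cell() below is a hypothetical stand-in using the common convention of a along x and b in the xy-plane, and is not guaranteed to match Biotite's implementation in every detail.

```python
import math


def vectors_from_cell(a, b, c, alpha, beta, gamma):
    """Convert cell lengths and angles (degrees) into three box vectors.

    Convention (assumed here): vector a lies along x, vector b in the xy-plane.
    """
    al, be, ga = (math.radians(angle) for angle in (alpha, beta, gamma))
    vec_a = (a, 0.0, 0.0)
    vec_b = (b * math.cos(ga), b * math.sin(ga), 0.0)
    cx = c * math.cos(be)
    cy = c * (math.cos(al) - math.cos(be) * math.cos(ga)) / math.sin(ga)
    cz = math.sqrt(c**2 - cx**2 - cy**2)  # ensures |vec_c| == c
    return (vec_a, vec_b, (cx, cy, cz))


# Identical cell parameters give an identical box, whether the CRYST1 record
# said "P 1" or "P 21 21 21": the box carries no space group information.
box = vectors_from_cell(52.0, 58.6, 61.9, 90.0, 90.0, 90.0)
for vec in box:
    print(tuple(round(x, 3) for x in vec))
```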

In addition, Biotite's support for the PDB format is relatively rudimentary because the format itself is deprecated; in particular, there are no low-level file editing capabilities. For PDBx (CIF/BinaryCIF), Biotite does have such capabilities, so you could set the space group, for example, like this:

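```python
from biotite.structure.io.pdbx import CIFCategory
# cif_file: an existing CIFFile, e.g. loaded via CIFFile.read()
symmetry_category = CIFCategory({"space_group_name_H-M": "P 21 21 21"})
cif_file.block["symmetry"] = symmetry_category
```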

If you would like to implement reading/writing the space group for PDBFile, I would suggest creating a pair of methods, get_space_group() and set_space_group(). This way, the AtomArray does not need to store this extra information.
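Following that suggestion, the core of such a get_space_group()/set_space_group() pair could simply read and rewrite the tail of the CRYST1 line. The sketch below is hypothetical: it uses free functions over a plain list of text lines rather than actual PDBFile methods, and it leaves the record name and cell parameters in columns 1-55 untouched while replacing the space group (columns 56-66) and Z (columns 67-70).

```python
def get_space_group(lines):
    """Return (space_group, z) from the first CRYST1 record, or None if absent.

    Hypothetical helper: works on a plain list of PDB text lines.
    """
    for line in lines:
        if line.startswith("CRYST1"):
            padded = line.ljust(70)  # trailing blanks may have been stripped
            z_field = padded[66:70].strip()
            z = int(z_field) if z_field else None
            return padded[55:66].strip(), z
    return None


def set_space_group(lines, space_group, z):
    """Rewrite columns 56-70 (space group and Z) of the CRYST1 record in place."""
    if len(space_group) > 11:
        raise ValueError("space group symbol must fit into 11 columns")
    for i, line in enumerate(lines):
        if line.startswith("CRYST1"):
            # Columns 1-55 hold the record name and cell parameters; keep them
            lines[i] = line.ljust(70)[:55] + f"{space_group:<11}{z:4d}"
            return
    raise ValueError("no CRYST1 record found")


lines = [
    "CRYST1   52.000   58.600   61.900  90.00  90.00  90.00 P 1           1",
    "END",
]
print(get_space_group(lines))  # ('P 1', 1)
set_space_group(lines, "P 21 21 21", 4)
print(get_space_group(lines))  # ('P 21 21 21', 4)
```

Because these accessors only touch the header text, set_structure() can keep writing coordinates and cell vectors as before, and the AtomArray never needs to carry the space group.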

@zanebeckwith
Author

Hi @padix-key, thank you for your help!

Sorry for the long delay in my response.

I think a setup like the one you described would work fine for our purposes. I can put up a PR soon to implement it.

I have a question, though (apologies for my ignorance on crystallography):

Don't the AtomArray objects need to know the space group (and the Z value), in addition to the cell vectors?

If the PDB file contains only the asymmetric unit, the space group and Z value will be needed to construct the full unit cell, right? Otherwise, with just the cell vectors, the AtomArray would have an incomplete unit cell.

Again, apologies if either my understanding of crystallography is faulty here or if Biotite already handles this gracefully.

@padix-key
Member

padix-key commented Nov 17, 2024

get_structure() returns only the asymmetric unit of the structure. This means that if the full unit cell contains multiple copies of the structure, the AtomArray would indeed be missing atoms if the user expects the entire unit cell. To obtain the entire unit cell, a function get_symmetry_mates() is planned (#660). However, this function would also only use the space group information from the PDBx file to construct the AtomArray representing the unit cell; the AtomArray itself would still not store the space group.
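For illustration, this is roughly what generating symmetry mates involves: each operator of the space group is a rotation plus a fractional translation, and applying all of them to the asymmetric unit fills the unit cell. The sketch below hard-codes the four general-position operators of P 21 21 21 (space group No. 19) and works on fractional coordinates only; converting Cartesian AtomArray coordinates to fractional ones via the box is omitted. P212121_OPS, apply_op() and expand_to_unit_cell() are hypothetical names and say nothing about how get_symmetry_mates() will actually be implemented.

```python
# General-position operators of P 21 21 21 (No. 19) as (rotation, translation)
# pairs acting on fractional coordinates:
#   (x, y, z), (-x+1/2, -y, z+1/2), (-x, y+1/2, -z+1/2), (x+1/2, -y+1/2, -z)
P212121_OPS = [
    (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0.0, 0.0, 0.0)),
    (((-1, 0, 0), (0, -1, 0), (0, 0, 1)), (0.5, 0.0, 0.5)),
    (((-1, 0, 0), (0, 1, 0), (0, 0, -1)), (0.0, 0.5, 0.5)),
    (((1, 0, 0), (0, -1, 0), (0, 0, -1)), (0.5, 0.5, 0.0)),
]


def apply_op(op, xyz):
    """Apply R @ xyz + t and wrap the result back into the cell [0, 1)."""
    rot, trans = op
    return tuple(
        (sum(rot[i][j] * xyz[j] for j in range(3)) + trans[i]) % 1.0
        for i in range(3)
    )


def expand_to_unit_cell(asym_unit, operators):
    """Apply every operator to every atom; yields len(operators) copies."""
    return [apply_op(op, xyz) for op in operators for xyz in asym_unit]


asym_unit = [(0.1, 0.2, 0.3)]  # one atom, fractional coordinates (illustrative)
unit_cell = expand_to_unit_cell(asym_unit, P212121_OPS)
print(len(unit_cell))  # 4
```

With one chain in the asymmetric unit, the four operators yield four copies, consistent with a CRYST1 Z value of 4 for such an entry.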
