Support extraction of secondary structure elements from PDBx files #710

Open
wants to merge 6 commits into
base: main
Changes from 3 commits
54 changes: 54 additions & 0 deletions src/biotite/structure/io/pdbx/convert.py
@@ -13,6 +13,7 @@
    "set_component",
    "list_assemblies",
    "get_assembly",
    "get_sse"
]

import itertools
@@ -1616,3 +1617,56 @@ def _convert_string_to_sequence(string, stype):
        raise InvalidFileError(
            "mmCIF _entity_poly.type unsupported" " type: " + stype
        )

def get_sse(pdbx_file, data_block=None):
    """
    Gets secondary structure from pdbx file

    Parameters
    ----------
    pdbx_file : CIFFile or CIFBlock or BinaryCIFFile or BinaryCIFBlock
        The file object.

    Returns
    ----------
    sec_struct_dic: keys are the different chains from the pdbx file
Member

Similar to the description of a parameter Numpydoc uses the form

some_name : some_type
    Some description.
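
To illustrate the form described above, here is a minimal sketch of how a Numpydoc `Returns` section could look for a function of this kind. The function name `get_sse_documented`, the return name `sse_dict` and the type wording are placeholders chosen for this example, not names taken from the pull request.

```python
def get_sse_documented(pdbx_file, data_block=None):
    """
    Get the secondary structure elements annotated in a PDBx file.

    Parameters
    ----------
    pdbx_file : CIFFile or CIFBlock or BinaryCIFFile or BinaryCIFBlock
        The file object.
    data_block : str, optional
        The name of the data block.

    Returns
    -------
    sse_dict : dict of (str -> ndarray, dtype=str)
        Maps each chain ID to an array with one letter per residue:
        'a' for alpha-helix, 'b' for beta-strand/sheet, 'c' for coil.
    """
    # Placeholder body: this sketch only demonstrates the docstring layout.
    return {}
```

Each returned value gets a `name : type` line followed by an indented description, and the underline matches the length of the section title.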

    and values are a letter representing the secondary structure
    'a' means alpha-helix, 'b' means beta-strand/sheet, 'c' means coil.
    '' indicates that a residue is not an amino acid or it comprises
    no CA atom for each atom in the atom array

    """
    sec_struct_dic = {}
    block = _get_block(pdbx_file, data_block)
    cif_feats = list(block.keys())

    # Init all chains with "c" for coil
    for idx, chain in enumerate(block["struct_ref_seq"]["pdbx_strand_id"].as_array(str)):
        ref_id = block["struct_ref_seq"]["ref_id"].as_array(int)[idx]
        chain_idxs = np.where(block['entity_poly_seq']['entity_id'].as_array(int) == ref_id)[0]
        sec_struct_dic[chain] = np.repeat('c', len(chain_idxs))

    # Get alpha helices
    if "struct_conf" in cif_feats:
        alpha = block["struct_conf"]
        pdb_chain = alpha['beg_label_asym_id'].as_array(str)
        start_pos = alpha['beg_label_seq_id'].as_array(int)
        end_pos = alpha['end_label_seq_id'].as_array(int)

        # set alpha helix positions
        for idx in range(len(pdb_chain)):
            sec_struct_dic[pdb_chain[idx]][start_pos[idx]:(end_pos[idx]+1)] = 'a'

    # Get beta sheets
    if "struct_sheet" in cif_feats:
        beta = block["struct_sheet_range"]
        pdb_chain = beta['beg_label_asym_id'].as_array(str)
        start_pos = beta['beg_label_seq_id'].as_array(int)
        end_pos = beta['end_label_seq_id'].as_array(int)

        # set beta strand positions
        for idx in range(len(pdb_chain)):
            sec_struct_dic[pdb_chain[idx]][start_pos[idx]:(end_pos[idx]+1)] = 'b'
Comment on lines +1661 to +1670
Member

This is almost a duplication of the above code, only the category and assigned letter is different. Maybe it would be clearer to refactor it as a function that takes sec_struct_dic, the category and the letter to fill in?



    return sec_struct_dic