Dna vs rna ribbons (#1023)
* Version 1.2.0 of ribbons makes everything RNA style unless DNA and RNA are both present in a model, in which case it switches the DNA orientation.

* Fixing spelling error

* removing the RNA residue names from the list of DNA residues to make DNA-only list

* Use the secondary structure information in the PDB file if present, otherwise run ksdssp

* fixed link_residues test

* cootbx: make file finding more portable

- Works for both development builds and installer builds

[skip ci]

* bootstrap: add AlphaFold repository

[skip ci]

* conda: clean up devenv file for cctbx dependencies

[skip ci]

* CI: switch mamba to conda

* libtbx: remove entry point functions

* libtbx: remove most distutils imports

- Fixes #1018 for more general Python 3.12 support
- Three distutils imports remain, but only in deprecated code

* conda: add Boost packages and bump minimum Python version

[skip ci]

* CI: update mirror in syntax check

[skip ci]

* boost: add temporary flag for Boost Timer deprecation

* CI: add checkout step

[skip ci]

* conda: add qt-webengine to devenv file

[skip ci]

* Catch None in arrays supplied to the minimizer

* Update CHANGELOG.rst for 2024.9 release

[skip ci]

* reversed unneeded change

* better cyclic testing

* Better handle whitespaces and other symbols in the filename

* passing cif_object to model

* CI: ignore conda environment removal failure

[skip ci]

* corrected missing angles

* quote chain ID in selections

* CI: use conda-forge scons for syntax check

* libtbx: remove fastentrypoints.py

* CI: add back channel for Python 3.13 release candidates

* small adjustment

* libtbx: updates for Python 3.13

- unittest.makeSuite -> unittest.defaultTestLoader.loadTestsFromTestCase
- __firstlineno__ and __static_attributes__ are new default class attributes
- leading whitespace in docstring is removed

* Make sure to apply remove_selection

* Allow const_shrink_donor_acceptor to be set in holton_geometry_validation

* another try to apply remove_selection to water selection

* fix indentation

* Moved test from phenix_regression/mmtbx/tst_pdb_interpretation.csh

* Allow specification of model filename in get_fmodel (in case more than one is present in data manager)

* Add assert_lines_not_in_file(), assert_lines_not_in_text() functions.

* Move parts of phenix_regression/mmtbx/tst_pdb_interpretation.csh here

* Clean clutter

* revamped geo writing

* update

* small adjustment

* Migrating the functionality from geo-parsing branch into master.

* Updating test.

* Attempt to pass py2 syntax checks... remove nonlocal and f-strings

* Incorporate new 'origin_ids.get_origin_label_and_internal' function, no longer need to parse any header info. Simplify configuration dictionary. Check number of restraints parsed against header value.

* Update documentation, remove commented out code

* Update documentation.

* reorganize imports

* Clean clutter. py2 syntax.

* Minor refactoring of fetch_pdb

* Py2 compatibility

* Add docstrings, remove commented code

* doc string

* new pdbtools feature to average alt confs with test

* Bug fix: this really has to be protein only!

* added alt conf option to reference coordinate option and added writing to .geo

* pinch_limit

* Allow ignoring waters

* Cleanup, typos

* parallelity restraints for alt confs

* Add stub for fetch_emdb

* fetch_emdb: now it downloads model and map(s)

* fix for elems bug after reading in json from pickle, added additional tests to mmtbx list

* don't include H in reference coords

* Move useful function into a central location. Create a test.

* small changes

* Mark unused, untested functions

* Refactoring fetch_pdb for readability, adding map functionality, without breaking anything yet.

* Unused import

* Tidy-up Phenix: remove unused functionality in mmtbx/building

* Remove unused imports

* Cleanup extra functionality. No searching for 3 different undocumented environmental variables.

* Cleanup

* Skip if not on Py3

* Fix test

* starting...

* cofsky data

* start of more fine-grain residue classes

* python3 syntax

* Allow selection of altloc and water

* clean clutter

* CI: use official Python 3.13 for syntax check

[skip ci]

* Check origin labels not origin ids during parsing tests. Remove checks for number of expected entries. Refactor atom_labels field only contains non-i_seq labels.

* Clean clutter

* Don't take PDB id in core pdb_input and cif_input functions.

* slow progress

* automatic lookup of parent and child

* removed verbose

* Maintain cctbx.xfel.merge cosym behaviour (#1022)

Update to the cosym worker to avoid additional filtering introduced by dials/dials#2741

* Remove extra definition of random_selection(). Move random_selection() to C++.

* Switch random_bool to random_selection in hierarchy remove_atoms() to have consistent number of atoms removed every time. Fix tests.

* Moving ksdssp into the main build and also configuring it

* Fixing prints for ribbons.py to go to the logger and adding a regression test for it.

* Removing unused import

* Bumping tweak version number to relaunch the pull-request tests, which have been running for two days.

---------

Co-authored-by: Nigel W. Moriarty <[email protected]>
Co-authored-by: Billy K. Poon <[email protected]>
Co-authored-by: Pavel <[email protected]>
Co-authored-by: Oleg Sobolev <[email protected]>
Co-authored-by: terwill <[email protected]>
Co-authored-by: cschlick <[email protected]>
Co-authored-by: dcliebschner <[email protected]>
Co-authored-by: Vincent Chen <[email protected]>
Co-authored-by: James Beilsten-Edmands <[email protected]>
10 people authored Oct 21, 2024
1 parent 0d879fd commit 6ba48be
Showing 5 changed files with 137 additions and 19 deletions.
3 changes: 2 additions & 1 deletion libtbx/auto_build/bootstrap.py
@@ -1993,6 +1993,7 @@ class CCIBuilder(Builder):
     'clipper',
     'eigen',
     'reduce',
+    'ksdssp',
   ]
   CODEBASES_EXTRA = []
   # Copy these sources from cci.lbl.gov
@@ -2014,6 +2015,7 @@ class CCIBuilder(Builder):
     'smtbx',
     'gltbx',
     'wxtbx',
+    'ksdssp',
   ]
   LIBTBX_EXTRA = []

@@ -2380,7 +2382,6 @@ class PhenixBuilder(CCIBuilder):
     'elbow',
     'amber_adaptbx',
     'amber_library',
-    'ksdssp',
     'pulchra',
     'solve_resolve',
     'reel',
26 changes: 24 additions & 2 deletions mmtbx/kinemage/ribbons.py
@@ -6,8 +6,10 @@
 def _IsStandardResidue(resname):
   return resname.strip().upper() in _amino_acid_resnames

-_nucleic_acid_resnames = set(nucleic_acid_codes.rna_one_letter_code_dict.keys()).union(
-  set(nucleic_acid_codes.dna_one_letter_code_dict.keys()))
+# Find the RNA and DNA residue sets. Remove the RNA names from the DNA set to get only definitely DNA names.
+_rna_resnames = set(nucleic_acid_codes.rna_one_letter_code_dict.keys())
+_dna_resnames = set(nucleic_acid_codes.dna_one_letter_code_dict.keys()) - _rna_resnames
+_nucleic_acid_resnames = _dna_resnames.union(_rna_resnames)
 def _IsNucleicAcidResidue(resname):
   return resname.strip().upper() in _nucleic_acid_resnames

@@ -81,6 +83,26 @@ def _FindContiguousResiduesByAtomDistances(chain, type_function, desired_atoms,

 # ------------------------------------------------------------------------------

+def chain_has_DNA(chain):
+  '''Return True if the chain contains any DNA residues.
+  :param chain: PDB chain to be searched for DNA residues.
+  '''
+  for residue_group in chain.residue_groups():
+    if residue_group.unique_resnames()[0].strip().upper() in _dna_resnames:
+      return True
+  return False
+
+def chain_has_RNA(chain):
+  '''Return True if the chain contains any RNA residues.
+  :param chain: PDB chain to be searched for RNA residues.
+  '''
+  for residue_group in chain.residue_groups():
+    if residue_group.unique_resnames()[0].strip().upper() in _rna_resnames:
+      return True
+  return False
+
+# ------------------------------------------------------------------------------
+
 def find_contiguous_protein_residues(chain, distance_threshold=5.0):
   '''Return a list of contiguous protein residues in the chain based on the distance between CA atoms.
   :param chain: PDB chain to be searched for contiguous residues.
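The set subtraction in this file is what makes the DNA list "DNA-only": any residue name that appears in both dictionaries is treated as RNA. A minimal standalone sketch of that idea follows. The dictionary contents, the helper has_residue_from and the example chains are hypothetical stand-ins, not the real nucleic_acid_codes tables or the cctbx hierarchy API.

```python
# Hypothetical stand-ins for the residue-name dictionaries; the real ones
# live in nucleic_acid_codes and their keys may differ.
rna_one_letter_code_dict = {"A": "A", "C": "C", "G": "G", "U": "U"}
dna_one_letter_code_dict = {"DA": "A", "DC": "C", "DG": "G", "DT": "T", "A": "A"}

rna_resnames = set(rna_one_letter_code_dict)
# Subtract the RNA names so that a name present in both tables never counts as DNA.
dna_resnames = set(dna_one_letter_code_dict) - rna_resnames


def has_residue_from(resnames, allowed):
    """Return True if any residue name, stripped and upper-cased, is in `allowed`."""
    return any(name.strip().upper() in allowed for name in resnames)


# Chains are modelled here as plain lists of residue names (an assumption).
mixed_chain = [" DA", " DT", "  G"]
rna_chain = ["  A", "  U"]

print(has_residue_from(mixed_chain, dna_resnames))  # chain contains DNA names
print(has_residue_from(rna_chain, dna_resnames))    # chain contains no DNA names
```

With this convention an ambiguous name can only ever trigger the RNA check, which is the behavior the commit message describes.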
55 changes: 39 additions & 16 deletions mmtbx/programs/ribbons.py
@@ -8,9 +8,10 @@
 from mmtbx.kinemage.ribbons import find_contiguous_protein_residues, find_contiguous_nucleic_acid_residues
 from mmtbx.kinemage.ribbons import make_protein_guidepoints, make_nucleic_acid_guidepoints
 from mmtbx.kinemage.ribbons import untwist_ribbon, swap_edge_and_face, _FindNamedAtomInResidue, _IsNucleicAcidResidue
+from mmtbx.kinemage.ribbons import chain_has_DNA, chain_has_RNA
 from mmtbx.kinemage.nrubs import Triple, NRUBS

-version = "1.1.0"
+version = "1.2.1"

 master_phil_str = '''
 do_protein = True
@@ -65,7 +66,7 @@ class Program(ProgramTemplate):
 mmtbx.ribbons model.pdb
 Output:
-If neither output.file_name nor output.filename is specified, it will write
+If output.filename is not specified, it will write
 to a file with the same name as the input model file name but with the
 extension replaced with with '.kin'.
@@ -85,7 +86,7 @@ def validate(self):
       inName = self.data_manager.get_default_model_name()
       p = Path(inName)
       self.params.output.filename = str(p.with_suffix(suffix))
-      print('Setting output.filename Phil parameter to',self.params.output.filename)
+      print('Setting output.filename Phil parameter to',self.params.output.filename, file=self.logger)

# ------------------------------------------------------------------------------

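The validate() change routes the message to the program's logger instead of stdout, which is what lets the new regression test capture the output. A minimal sketch of the two pieces involved, using only the standard library: the function name default_kin_name is hypothetical, and an io.StringIO stands in for self.logger.

```python
import io
from pathlib import Path


def default_kin_name(model_name, suffix=".kin"):
    """Derive the output name by replacing the model file's extension (hypothetical helper)."""
    return str(Path(model_name).with_suffix(suffix))


# Any file-like object can play the role of the logger; StringIO is an assumption here.
logger = io.StringIO()
out_name = default_kin_name("model.pdb")
print("Setting output.filename Phil parameter to", out_name, file=logger)

# The message is now available to the caller rather than written to stdout.
captured = logger.getvalue()
```

Because print() accepts any object with a write() method through its file argument, the same call works whether the logger is a real log stream or a test buffer.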
@@ -537,11 +538,21 @@ def run(self):
       selection = hierarchy.atom_selection_cache().selection(selection_string)
       hierarchy = hierarchy.select(selection)

+    # See if the model file has secondary structure records.
+    # This should return None if there are no secondary structure records in the model.
+    sec_str_from_pdb_file = self.model.get_ss_annotation()
+
     # Analyze the secondary structure and make a dictionary that maps from residue sequence number to secondary structure type
     # by filling in 'COIL' as a default value for each and then parsing all of the secondary structure records in the
     # model and filling in the relevant values for them.
-    print('Finding secondary structure:')
-    ss_manager = mmtbx.secondary_structure.manager(hierarchy)
+    print('Finding secondary structure:', file=self.logger)
+    params = mmtbx.secondary_structure.manager.get_default_ss_params()
+    params.secondary_structure.protein.search_method="ksdssp"
+    params = params.secondary_structure
+    ss_manager = mmtbx.secondary_structure.manager(hierarchy,
+      params=params,
+      sec_str_from_pdb_file=sec_str_from_pdb_file,
+      log=self.logger)
     self.secondaryStructure = {}
     for model in hierarchy.models():
       for chain in model.chains():
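The commit message describes this hunk as "use the secondary structure information in the PDB file if present, otherwise run ksdssp". The sketch below shows only that fallback pattern in isolation. The function choose_secondary_structure and its arguments are hypothetical, and it is not a description of how mmtbx.secondary_structure.manager is implemented internally.

```python
def choose_secondary_structure(annotation_from_file, run_search):
    """Prefer the annotation read from the file; otherwise call the search.

    annotation_from_file: records from the model file, or None if absent.
    run_search: zero-argument callable standing in for a ksdssp run (assumption).
    Returns a (annotation, source) pair.
    """
    if annotation_from_file is not None:
        return annotation_from_file, "file"
    return run_search(), "search"


calls = []

def fake_search():
    # Records that the fallback search was actually invoked.
    calls.append("ran")
    return "computed annotation"


from_file = choose_secondary_structure("HELIX/SHEET records", fake_search)
from_search = choose_secondary_structure(None, fake_search)
```

Passing the search as a callable means it only runs when the file carries no annotation, which avoids the cost of the search for models that already have secondary structure records.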
@@ -597,7 +608,7 @@ def run(self):
       modelID = model.id
       if modelID == "":
         modelID = "_"
-      print('Processing model', modelID, 'with', len(model.chains()), 'chains')
+      print('Processing model', modelID, 'with', len(model.chains()), 'chains', file=self.logger)
       if groupByModel:
         outString += "@group {{{} {}}} animate dominant master= {{all models}}\n".format(self.idCode, str(modelID).strip())

@@ -619,15 +630,24 @@ def run(self):
         chainColors[name] = c
         chainCount += 1

+      # Determine whether DNA, RNA, or both are present in the model
+      hasDNA = False
+      hasRNA = False
+      for chain in model.chains():
+        if chain_has_DNA(chain):
+          hasDNA = True
+        if chain_has_RNA(chain):
+          hasRNA = True
+
       # Cycle over all chains in the model and make a group or subgroup for each chain
       # depending on whether we are grouping by model or not.
       for chain in model.chains():
-        print('Processing chain',chain.id)
+        print('Processing chain',chain.id, file=self.logger)

         if self.params.do_protein:
           # Find the contiguous protein residues by CA distance
           contiguous_residue_lists = find_contiguous_protein_residues(chain)
-          print('Found {} contiguous protein residue lists'.format(len(contiguous_residue_lists)))
+          print('Found {} contiguous protein residue lists'.format(len(contiguous_residue_lists)), file=self.logger)

           if len(contiguous_residue_lists) > 0:
             if groupByModel:
@@ -653,9 +673,9 @@ def run(self):

             for contig in contiguous_residue_lists:
               guidepoints = make_protein_guidepoints(contig)
-              print(' Made {} protein guidepoints for {} residues'.format(len(guidepoints),len(contig)))
+              print(' Made {} protein guidepoints for {} residues'.format(len(guidepoints),len(contig)), file=self.logger)
               if self.params.untwist_ribbons:
-                print(' Untwisted ribbon')
+                print(' Untwisted ribbon', file=self.logger)
                 untwist_ribbon(guidepoints)
               # There is always secondary structure looked up for protein residues, so we skip the case from the Java code
               # where it can be missing.
@@ -676,7 +696,7 @@ def run(self):
         if self.params.do_nucleic_acid:
           # Find the contiguous nucleic acid residues by CA distance
           contiguous_residue_lists = find_contiguous_nucleic_acid_residues(chain)
-          print('Found {} contiguous nucleic acid residue lists'.format(len(contiguous_residue_lists)))
+          print('Found {} contiguous nucleic acid residue lists'.format(len(contiguous_residue_lists)), file=self.logger)

           if len(contiguous_residue_lists) > 0:
             if groupByModel:
@@ -695,15 +715,18 @@ def run(self):

             for contig in contiguous_residue_lists:
               guidepoints = make_nucleic_acid_guidepoints(contig)
-              print(' Made {} NA guidepoints for {} residues'.format(len(guidepoints),len(contig)))
+              print(' Made {} NA guidepoints for {} residues'.format(len(guidepoints),len(contig)), file=self.logger)
               if self.params.untwist_ribbons:
-                print(' Untwisted ribbon')
+                print(' Untwisted ribbon', file=self.logger)
                 untwist_ribbon(guidepoints)
-              if self.params.DNA_style:
-                print(' Swapped edge and face (DNA style)')
+              # If the model has both DNA and RNA, and if this chain is DNA, swap the edge and face so that
+              # we can distinguish between them in the same model. Also, if the DNA_style parameter has been
+              # set, then always make this style.
+              if self.params.DNA_style or (hasDNA and hasRNA and chain_has_DNA(chain)):
+                print(' Swapped edge and face (DNA style)', file=self.logger)
                 swap_edge_and_face(guidepoints)
               else:
-                print(' Using RNA style ribbons')
+                print(' Using RNA style ribbons', file=self.logger)

               outString += self.printFancyRibbon(guidepoints, 3.0, 3.0,
                 "color= {nucl"+chain.id+"} master= {nucleic acid} master= {ribbon} master= {RNA helix?}",
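The new condition in this hunk can be read as a small truth table: DNA-style ribbons are drawn when the DNA_style parameter forces them, or when the model mixes DNA and RNA and the current chain is DNA. The sketch below isolates that rule; the function name use_dna_style and its plain boolean arguments are hypothetical and replace the params object and chain lookups used in the real code.

```python
def use_dna_style(dna_style_param, model_has_dna, model_has_rna, chain_is_dna):
    """Return True if the edge and face should be swapped (DNA-style ribbon)."""
    return bool(dna_style_param or (model_has_dna and model_has_rna and chain_is_dna))


# A DNA-only model keeps RNA-style ribbons unless the parameter is set.
dna_only = use_dna_style(False, True, False, True)
# In a mixed model, only the DNA chains switch style.
mixed_dna_chain = use_dna_style(False, True, True, True)
mixed_rna_chain = use_dna_style(False, True, True, False)
# The parameter overrides everything else.
forced = use_dna_style(True, False, True, False)
```

The practical consequence is that a DNA-only model renders RNA-style by default, as the commit message states for version 1.2.0.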
71 changes: 71 additions & 0 deletions mmtbx/regression/tst_ribbons.py
@@ -0,0 +1,71 @@
+##################################################################################
+# This is a test program to validate that mmtbx.ribbons worked.
+#
+
+# Copyright 2024 Richardson Lab at Duke University
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import, division, print_function
+from libtbx.utils import format_cpu_times
+import os, subprocess, tempfile
+import mmtbx
+from mmtbx.programs import ribbons
+import libtbx.load_env
+from iotbx.cli_parser import run_program
+from six.moves import cStringIO as StringIO
+import re
+
+def RunRibbonTests():
+
+  #========================================================================
+  # Regression test a against a snippet of a file, comparing the output
+  # to the output generated by a previous version of the program. If there are
+  # differences, report that this is the case and recommend verifying that the
+  # differences are intentional and replacing the stored output.
+  data_dir = libtbx.env.under_dist(
+    module_name = "mmtbx",
+    path = os.path.join("regression","pdbs"),
+    test = os.path.isdir)
+  model_file = os.path.join(data_dir,'Fe_1brf_snip_reduced.pdb')
+  temp_file = os.path.join(tempfile._get_default_tempdir(),
+    next(tempfile._get_candidate_names())+".kin" )
+
+  out = StringIO()
+  try:
+    # Run the program
+    args = [model_file, "output.overwrite=True", 'output.filename='+temp_file]
+    results = run_program(program_class=ribbons.Program, logger=out, args=args)
+
+  except Exception as e:
+    raise Exception("Could not call subprocess to do regression test: "+str(e))
+  instructions = ("Use KiNG or another program to see what changed and then determine if the "+
+    "differences are expected. If so, modify the expected numbers of sheets and helices tested for.")
+
+  # Count the number of helices and sheets in the output
+  pattern = r"(\d+) helices and (\d+) sheets defined"
+  match = re.search(pattern, out.getvalue())
+  if match:
+    N = int(match.group(1))
+    S = int(match.group(2))
+  else:
+    raise Exception("Helix/sheet summary not found (printed by secondary structure manager)")
+
+  if int(N) != 3 or int(S) != 1:
+    raise Exception("Different number of helices ("+str(N)+" vs. 3) or sheets ("+str(S)+" vs. 1): "+instructions)
+
+if __name__ == '__main__':
+
+  RunRibbonTests()
+  print(format_cpu_times())
+  print('OK')
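The regression test works by scanning the captured log for the helix/sheet summary line and comparing the counts. That parsing step can be exercised on its own. In the sketch below the regular expression is the one from the test, while the function name count_helices_and_sheets and the sample log text are hypothetical.

```python
import re

# Same pattern as in tst_ribbons.py.
SUMMARY_PATTERN = r"(\d+) helices and (\d+) sheets defined"


def count_helices_and_sheets(log_text):
    """Return (helices, sheets) parsed from the log, or raise if the summary is missing."""
    match = re.search(SUMMARY_PATTERN, log_text)
    if match is None:
        raise ValueError("Helix/sheet summary not found in log")
    return int(match.group(1)), int(match.group(2))


# Hypothetical log excerpt; the real wording around the summary may differ.
sample_log = "Finding secondary structure:\n  3 helices and 1 sheets defined\n"
helices, sheets = count_helices_and_sheets(sample_log)
```

Checking counts rather than the full .kin output keeps the test robust to cosmetic changes in the ribbon geometry, at the cost of not detecting them.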
1 change: 1 addition & 0 deletions mmtbx/run_tests.py
@@ -274,6 +274,7 @@
   # validation/molprobity
   "$D/regression/tst_probe.py",
   "$D/regression/tst_reduce.py",
+  "$D/regression/tst_ribbons.py",
   "$D/validation/regression/tst_molprobity_arguments.py",
   "$D/validation/regression/tst_chiral_validation.py",
   "$D/validation/regression/tst_waters.py",