DNA vs RNA ribbons (#1023)

* Version 1.2.0 of ribbons makes everything RNA style unless DNA and RNA are both present in a model, in which case it switches the DNA orientation.
* Fixing spelling error
* removing the RNA residue names from the list of DNA residues to make DNA-only list
* Use the secondary structure information in the PDB file if present, otherwise run ksdssp
* fixed link_residues test
* cootbx: make file finding more portable
  - Works for both development builds and installer builds [skip ci]
* bootstrap: add AlphaFold repository [skip ci]
* conda: clean up devenv file for cctbx dependencies [skip ci]
* CI: switch mamba to conda
* libtbx: remove entry point functions
* libtbx: remove most distutils imports
  - Fixes #1018 for more general Python 3.12 support
  - Three distutils imports remain, but in deprecated code
* conda: add Boost packages and bump minimum Python version [skip ci]
* CI: update mirror in syntax check [skip ci]
* boost: add temporary flag for Boost Timer deprecation
* CI: add checkout step [skip ci]
* conda: add qt-webengine to devenv file [skip ci]
* Catch None in arrays supplied to the minimizer
* Update CHANGELOG.rst for 2024.9 release [skip ci]
* reversed unneeded change
* better cyclic testing
* Better handle whitespaces and other symbols in the filename
* passing cif_object to model
* CI: ignore conda environment removal failure [skip ci]
* corrected missing angles
* quote chain ID in selections
* CI: use conda-forge scons for syntax check
* libtbx: remove fastentrypoints.py
* CI: add back channel for Python 3.13 release candidates
* small adjustment
* libtbx: updates for Python 3.13
  - unittest.makeSuite -> unittest.defaultTestLoader.loadTestsFromTestCase
  - __firstlineno__ and __static_attributes__ are new default class attributes
  - leading whitespace in docstring is removed
* Make sure to apply remove_selection
* Allow const_shrink_donor_acceptor to be set in holton_geometry_validation
* another try to apply remove_selection to water selection
* fix indentation
* Moved test from phenix_regression/mmtbx/tst_pdb_interpretation.csh
* Allow specification of model filename in get_fmodel (in case more than one is present in data manager)
* Add assert_lines_not_in_file(), assert_lines_not_in_text() functions.
* Move parts of phenix_regression/mmtbx/tst_pdb_interpretation.csh here
* Clean clutter
* revamped geo writing
* update
* small adjustment
* Migrating the functionality from geo-parsing branch into master.
* Updating test.
* Attempt to pass py2 syntax checks... remove nonlocal and f-strings
* Incorporate new 'origin_ids.get_origin_label_and_internal' function, no longer need to parse any header info. Simplify configuration dictionary. Check number of restraints parsed against header value.
* Update documentation, remove commented out code
* Update documentation.
* reorganize imports
* Clean clutter. py2 syntax.
* Minor refactoring of fetch_pdb
* Py2 compatibility
* Add docstrings, remove commented code
* doc string
* new pdbtools feature to average alt confs with test
* Bug fix: this really has to be protein only!
* added alt conf option to reference coordinate option and added writing to .geo
* pinch_limit
* Allow ignoring waters
* Cleanup, typos
* parallelity restraints for alt confs
* Add stub for fetch_emdb
* fetch_emdb: now it downloads model and map(s)
* fix for elems bug after reading in json from pickle, added additional tests to mmtbx list
* don't include H in reference coords
* Move useful function into a central location. Create a test.
* small changes
* Mark unused, untested functions
* Refactoring fetch_pdb for readability, adding map functionality, without breaking anything yet.
* Unused import
* Tidy-up Phenix: remove unused functionality in mmtbx/building
* Remove unused imports
* Cleanup extra functionality. No searching for 3 different undocumented environmental variables.
* Cleanup
* Skip if not on Py3
* Fix test
* starting...
* cofsky data
* start of more fine-grain residue classes
* python3 syntax
* Allow selection of altloc and water
* clean clutter
* CI: use official Python 3.13 for syntax check [skip ci]
* Check origin labels not origin ids during parsing tests. Remove checks for number of expected entries. Refactor atom_labels field only contains non-i_seq labels.
* Clean clutter
* Don't take PDB id in core pdb_input and cif_input functions.
* slow progress
* automatic lookup of parent and child
* removed verbose
* Maintain cctbx.xfel.merge cosym behaviour (#1022)
  Update to the cosym worker to avoid additional filtering introduced by dials/dials#2741
* Remove extra definition of random_selection(). Move random_selection() to C++.
* Switch random_bool to random_selection in hierarchy remove_atoms() to have consistent number of atoms removed every time. Fix tests.
* Moving ksdssp into the main build and also configuring it
* Fixing prints for ribbons.py to go to the logger and adding a regression test for it.
* Removing unused import
* Version 1.2.0 of ribbons makes everything RNA style unless DNA and RNA are both present in a model, in which case it switches the DNA orientation.
* Fixing spelling error
* removing the RNA residue names from the list of DNA residues to make DNA-only list
* Use the secondary structure information in the PDB file if present, otherwise run ksdssp
* Moving ksdssp into the main build and also configuring it
* Fixing prints for ribbons.py to go to the logger and adding a regression test for it.
* Removing unused import
* Bumping tweak version number to relaunch the pull-request tests, which have been running for two days.

---------

Co-authored-by: Nigel W. Moriarty <[email protected]>
Co-authored-by: Billy K. Poon <[email protected]>
Co-authored-by: Pavel <[email protected]>
Co-authored-by: Oleg Sobolev <[email protected]>
Co-authored-by: terwill <[email protected]>
Co-authored-by: cschlick <[email protected]>
Co-authored-by: dcliebschner <[email protected]>
Co-authored-by: Vincent Chen <[email protected]>
Co-authored-by: James Beilsten-Edmands <[email protected]>
cctbx · Oct 21, 2024 · 6ba48be
1 parent 0d879fd
commit 6ba48be
Showing 5 changed files with 137 additions and 19 deletions.
diff --git a/libtbx/auto_build/bootstrap.py b/libtbx/auto_build/bootstrap.py
@@ -1993,6 +1993,7 @@ class CCIBuilder(Builder):
     'clipper',
     'eigen',
     'reduce',
+    'ksdssp',
   ]
   CODEBASES_EXTRA = []
   # Copy these sources from cci.lbl.gov
@@ -2014,6 +2015,7 @@ class CCIBuilder(Builder):
     'smtbx',
     'gltbx',
     'wxtbx',
+    'ksdssp',
   ]
   LIBTBX_EXTRA = []
 
@@ -2380,7 +2382,6 @@ class PhenixBuilder(CCIBuilder):
     'elbow',
     'amber_adaptbx',
     'amber_library',
-    'ksdssp',
     'pulchra',
     'solve_resolve',
     'reel',

diff --git a/mmtbx/kinemage/ribbons.py b/mmtbx/kinemage/ribbons.py
@@ -6,8 +6,10 @@
 def _IsStandardResidue(resname):
   return resname.strip().upper() in _amino_acid_resnames
 
-_nucleic_acid_resnames = set(nucleic_acid_codes.rna_one_letter_code_dict.keys()).union(
-  set(nucleic_acid_codes.dna_one_letter_code_dict.keys()))
+# Find the RNA and DNA residue sets. Remove the RNA names from the DNA set to get only definitely DNA names.
+_rna_resnames = set(nucleic_acid_codes.rna_one_letter_code_dict.keys())
+_dna_resnames = set(nucleic_acid_codes.dna_one_letter_code_dict.keys()) - _rna_resnames
+_nucleic_acid_resnames = _dna_resnames.union(_rna_resnames)
 def _IsNucleicAcidResidue(resname):
   return resname.strip().upper() in _nucleic_acid_resnames
 
@@ -81,6 +83,26 @@ def _FindContiguousResiduesByAtomDistances(chain, type_function, desired_atoms,
 
 # ------------------------------------------------------------------------------
 
+def chain_has_DNA(chain):
+  '''Return True if the chain contains any DNA residues.
+  :param chain: PDB chain to be searched for DNA residues.
+  '''
+  for residue_group in chain.residue_groups():
+    if residue_group.unique_resnames()[0].strip().upper() in _dna_resnames:
+      return True
+  return False
+
+def chain_has_RNA(chain):
+  '''Return True if the chain contains any RNA residues.
+  :param chain: PDB chain to be searched for RNA residues.
+  '''
+  for residue_group in chain.residue_groups():
+    if residue_group.unique_resnames()[0].strip().upper() in _rna_resnames:
+      return True
+  return False
+
+# ------------------------------------------------------------------------------
+
 def find_contiguous_protein_residues(chain, distance_threshold=5.0):
   '''Return a list of contiguous protein residues in the chain based on the distance between CA atoms.
   :param chain: PDB chain to be searched for contiguous residues.

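The set arithmetic in the `mmtbx/kinemage/ribbons.py` hunk above can be tried in isolation. The sketch below is illustrative only: `rna_codes` and `dna_codes` are small hypothetical stand-ins for `nucleic_acid_codes.rna_one_letter_code_dict` and `nucleic_acid_codes.dna_one_letter_code_dict` (the real dictionaries are larger, and the overlapping name here is invented to show the effect), and the helpers take plain lists of residue names instead of a chain object.

```python
# Hypothetical stand-ins for the cctbx dictionaries; contents are illustrative only.
rna_codes = {"A": "A", "C": "C", "G": "G", "U": "U"}
# "A" is deliberately present in both dictionaries to show what the subtraction does.
dna_codes = {"DA": "A", "DC": "C", "DG": "G", "DT": "T", "A": "A"}

rna_resnames = set(rna_codes.keys())
# Remove every name that is also an RNA name, leaving only unambiguous DNA names.
dna_resnames = set(dna_codes.keys()) - rna_resnames
nucleic_acid_resnames = dna_resnames.union(rna_resnames)

def names_have_dna(resnames):
    """Return True if any residue name is in the DNA-only set."""
    return any(name.strip().upper() in dna_resnames for name in resnames)

def names_have_rna(resnames):
    """Return True if any residue name is in the RNA set."""
    return any(name.strip().upper() in rna_resnames for name in resnames)

print(sorted(dna_resnames))
print(names_have_dna([" DA", "  A"]), names_have_rna([" DA", "  A"]))
```

With this split, a name listed in both dictionaries counts as RNA only, so a chain is reported as DNA only when it contains at least one name that cannot be RNA.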
diff --git a/mmtbx/programs/ribbons.py b/mmtbx/programs/ribbons.py
@@ -8,9 +8,10 @@
 from mmtbx.kinemage.ribbons import find_contiguous_protein_residues, find_contiguous_nucleic_acid_residues
 from mmtbx.kinemage.ribbons import make_protein_guidepoints, make_nucleic_acid_guidepoints
 from mmtbx.kinemage.ribbons import untwist_ribbon, swap_edge_and_face, _FindNamedAtomInResidue, _IsNucleicAcidResidue
+from mmtbx.kinemage.ribbons import chain_has_DNA, chain_has_RNA
 from mmtbx.kinemage.nrubs import Triple, NRUBS
 
-version = "1.1.0"
+version = "1.2.1"
 
 master_phil_str = '''
 do_protein = True
@@ -65,7 +66,7 @@ class Program(ProgramTemplate):
   mmtbx.ribbons model.pdb
 
 Output:
-  If neither output.file_name nor output.filename is specified, it will write
+  If output.filename is not specified, it will write
   to a file with the same name as the input model file name but with the
   extension replaced with '.kin'.
 
@@ -85,7 +86,7 @@ def validate(self):
       inName = self.data_manager.get_default_model_name()
       p = Path(inName)
       self.params.output.filename = str(p.with_suffix(suffix))
-      print('Setting output.filename Phil parameter to',self.params.output.filename)
+      print('Setting output.filename Phil parameter to',self.params.output.filename, file=self.logger)
 
 # ------------------------------------------------------------------------------
 
@@ -537,11 +538,21 @@ def run(self):
     selection = hierarchy.atom_selection_cache().selection(selection_string)
     hierarchy = hierarchy.select(selection)
 
+    # See if the model file has secondary structure records.
+    # This should return None if there are no secondary structure records in the model.
+    sec_str_from_pdb_file = self.model.get_ss_annotation()
+
     # Analyze the secondary structure and make a dictionary that maps from residue sequence number to secondary structure type
     # by filling in 'COIL' as a default value for each and then parsing all of the secondary structure records in the
     # model and filling in the relevant values for them.
-    print('Finding secondary structure:')
-    ss_manager = mmtbx.secondary_structure.manager(hierarchy)
+    print('Finding secondary structure:', file=self.logger)
+    params = mmtbx.secondary_structure.manager.get_default_ss_params()
+    params.secondary_structure.protein.search_method="ksdssp"
+    params = params.secondary_structure
+    ss_manager = mmtbx.secondary_structure.manager(hierarchy,
+                                                   params=params,
+                                                   sec_str_from_pdb_file=sec_str_from_pdb_file,
+                                                   log=self.logger)
     self.secondaryStructure = {}
     for model in hierarchy.models():
       for chain in model.chains():
@@ -597,7 +608,7 @@ def run(self):
       modelID = model.id
       if modelID == "":
         modelID = "_"
-      print('Processing model', modelID, 'with', len(model.chains()), 'chains')
+      print('Processing model', modelID, 'with', len(model.chains()), 'chains', file=self.logger)
       if groupByModel:
         outString += "@group {{{} {}}} animate dominant master= {{all models}}\n".format(self.idCode, str(modelID).strip())
 
@@ -619,15 +630,24 @@ def run(self):
         chainColors[name] = c
         chainCount += 1
 
+      # Determine whether DNA, RNA, or both are present in the model
+      hasDNA = False
+      hasRNA = False
+      for chain in model.chains():
+        if chain_has_DNA(chain):
+          hasDNA = True
+        if chain_has_RNA(chain):
+          hasRNA = True
+
       # Cycle over all chains in the model and make a group or subgroup for each chain
       # depending on whether we are grouping by model or not.
       for chain in model.chains():
-        print('Processing chain',chain.id)
+        print('Processing chain',chain.id, file=self.logger)
 
         if self.params.do_protein:
           # Find the contiguous protein residues by CA distance
           contiguous_residue_lists = find_contiguous_protein_residues(chain)
-          print('Found {} contiguous protein residue lists'.format(len(contiguous_residue_lists)))
+          print('Found {} contiguous protein residue lists'.format(len(contiguous_residue_lists)), file=self.logger)
 
           if len(contiguous_residue_lists) > 0:
             if groupByModel:
@@ -653,9 +673,9 @@ def run(self):
 
             for contig in contiguous_residue_lists:
               guidepoints = make_protein_guidepoints(contig)
-              print(' Made {} protein guidepoints for {} residues'.format(len(guidepoints),len(contig)))
+              print(' Made {} protein guidepoints for {} residues'.format(len(guidepoints),len(contig)), file=self.logger)
               if self.params.untwist_ribbons:
-                print('  Untwisted ribbon')
+                print('  Untwisted ribbon', file=self.logger)
                 untwist_ribbon(guidepoints)
               # There is always secondary structure looked up for protein residues, so we skip the case from the Java code
               # where it can be missing.
@@ -676,7 +696,7 @@ def run(self):
         if self.params.do_nucleic_acid:
           # Find the contiguous nucleic acid residues by CA distance
           contiguous_residue_lists = find_contiguous_nucleic_acid_residues(chain)
-          print('Found {} contiguous nucleic acid residue lists'.format(len(contiguous_residue_lists)))
+          print('Found {} contiguous nucleic acid residue lists'.format(len(contiguous_residue_lists)), file=self.logger)
 
           if len(contiguous_residue_lists) > 0:
             if groupByModel:
@@ -695,15 +715,18 @@ def run(self):
 
             for contig in contiguous_residue_lists:
               guidepoints = make_nucleic_acid_guidepoints(contig)
-              print(' Made {} NA guidepoints for {} residues'.format(len(guidepoints),len(contig)))
+              print(' Made {} NA guidepoints for {} residues'.format(len(guidepoints),len(contig)), file=self.logger)
               if self.params.untwist_ribbons:
-                print('  Untwisted ribbon')
+                print('  Untwisted ribbon', file=self.logger)
                 untwist_ribbon(guidepoints)
-              if self.params.DNA_style:
-                print('  Swapped edge and face (DNA style)')
+              # If the model has both DNA and RNA, and if this chain is DNA, swap the edge and face so that
+              # we can distinguish between them in the same model.  Also, if the DNA_style parameter has been
+              # set, then always make this style.
+              if self.params.DNA_style or (hasDNA and hasRNA and chain_has_DNA(chain)):
+                print('  Swapped edge and face (DNA style)', file=self.logger)
                 swap_edge_and_face(guidepoints)
               else:
-                print('  Using RNA style ribbons')
+                print('  Using RNA style ribbons', file=self.logger)
 
               outString += self.printFancyRibbon(guidepoints, 3.0, 3.0,
                     "color= {nucl"+chain.id+"} master= {nucleic acid} master= {ribbon} master= {RNA helix?}",

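The style decision in the last hunk reduces to one boolean expression. The sketch below restates it as a standalone function so its cases can be listed; `use_dna_style` and its argument names are hypothetical and do not appear in the commit.

```python
from itertools import product

def use_dna_style(dna_style_param, model_has_dna, model_has_rna, chain_is_dna):
    """Mirror of the condition in run(): swap edge and face when the DNA_style
    parameter is set, or when the model mixes DNA and RNA and this chain is DNA."""
    return bool(dna_style_param or (model_has_dna and model_has_rna and chain_is_dna))

# List every combination with the DNA_style parameter left off.
for has_dna, has_rna, chain_dna in product([False, True], repeat=3):
    style = "DNA" if use_dna_style(False, has_dna, has_rna, chain_dna) else "RNA"
    print("hasDNA={!s:5} hasRNA={!s:5} chain_has_DNA={!s:5} -> {} style".format(
        has_dna, has_rna, chain_dna, style))
```

Two consequences follow from the expression itself. A DNA-only model is drawn RNA style unless `DNA_style` is set, which matches the commit description. Because `hasDNA` is set whenever any chain contains DNA, `chain_has_DNA(chain)` being true already implies `hasDNA`, so that term is redundant in practice, though it does make the intent explicit.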
diff --git a/mmtbx/regression/tst_ribbons.py b/mmtbx/regression/tst_ribbons.py
@@ -0,0 +1,71 @@
+##################################################################################
+# This is a test program to validate that mmtbx.ribbons worked.
+#
+
+#                Copyright 2024  Richardson Lab at Duke University
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import, division, print_function
+from libtbx.utils import format_cpu_times
+import os, subprocess, tempfile
+import mmtbx
+from mmtbx.programs import ribbons
+import libtbx.load_env
+from iotbx.cli_parser import run_program
+from six.moves import cStringIO as StringIO
+import re
+
+def RunRibbonTests():
+
+  #========================================================================
+  # Regression test against a snippet of a file, comparing the output
+  # to the output generated by a previous version of the program.  If there are
+  # differences, report that this is the case and recommend verifying that the
+  # differences are intentional and replacing the stored output.
+  data_dir = libtbx.env.under_dist(
+    module_name = "mmtbx",
+    path = os.path.join("regression","pdbs"),
+    test = os.path.isdir)
+  model_file = os.path.join(data_dir,'Fe_1brf_snip_reduced.pdb')
+  temp_file = os.path.join(tempfile._get_default_tempdir(),
+    next(tempfile._get_candidate_names())+".kin" )
+
+  out = StringIO()
+  try:
+    # Run the program
+    args = [model_file, "output.overwrite=True", 'output.filename='+temp_file]
+    results = run_program(program_class=ribbons.Program, logger=out, args=args)
+
+  except Exception as e:
+    raise Exception("Could not call subprocess to do regression test: "+str(e))
+  instructions = ("Use KiNG or another program to see what changed and then determine if the "+
+      "differences are expected.  If so, modify the expected numbers of sheets and helices tested for.")
+
+  # Count the number of helices and sheets in the output
+  pattern = r"(\d+) helices and (\d+) sheets defined"
+  match = re.search(pattern, out.getvalue())
+  if match:
+       N = int(match.group(1))
+       S = int(match.group(2))
+  else:
+       raise Exception("Helix/sheet summary not found (printed by secondary structure manager)")
+
+  if int(N) != 3 or int(S) != 1:
+    raise Exception("Different number of helices ("+str(N)+" vs. 3) or sheets ("+str(S)+" vs. 1): "+instructions)
+
+if __name__ == '__main__':
+
+  RunRibbonTests()
+  print(format_cpu_times())
+  print('OK')
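The new test depends on the program printing to the logger it is given, then searches the captured text for a summary line. The sketch below shows that capture-and-parse pattern on its own, without cctbx. It makes several assumptions: it targets Python 3 only (the committed test also supports Python 2 via `six`), `count_helices_and_sheets` is a hypothetical helper, and the sample log line is invented to match the regular expression used in the test. It also obtains a temporary `.kin` path through the public `tempfile.mkstemp` call, as one possible alternative to the private `tempfile._get_default_tempdir()` and `tempfile._get_candidate_names()` helpers.

```python
import io
import os
import re
import tempfile

def count_helices_and_sheets(log_text):
    """Parse the 'N helices and S sheets defined' summary out of captured log text."""
    match = re.search(r"(\d+) helices and (\d+) sheets defined", log_text)
    if match is None:
        raise ValueError("Helix/sheet summary not found in log text")
    return int(match.group(1)), int(match.group(2))

# Capture output the same way the test does: pass a file-like object as the logger.
log = io.StringIO()
print("Finding secondary structure:", file=log)
print("  3 helices and 1 sheets defined", file=log)  # invented sample line

helices, sheets = count_helices_and_sheets(log.getvalue())
print(helices, sheets)

# Public-API way to reserve a unique temporary output path with a .kin suffix.
fd, temp_path = tempfile.mkstemp(suffix=".kin")
os.close(fd)          # mkstemp returns an open descriptor; close it before reuse
os.remove(temp_path)  # clean up; a real test would remove it after the run
```

Since `mkstemp` creates the file, a run against that path would still need `output.overwrite=True`, which the committed test already passes.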
diff --git a/mmtbx/run_tests.py b/mmtbx/run_tests.py
@@ -274,6 +274,7 @@
   # validation/molprobity
   "$D/regression/tst_probe.py",
   "$D/regression/tst_reduce.py",
+  "$D/regression/tst_ribbons.py",
   "$D/validation/regression/tst_molprobity_arguments.py",
   "$D/validation/regression/tst_chiral_validation.py",
   "$D/validation/regression/tst_waters.py",