-
Notifications
You must be signed in to change notification settings - Fork 0
/
README
206 lines (139 loc) · 7.31 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
ptgraph2.py
-----------
ptgraph2 parses the output of STRIDE
(Frishman and Argos 1995 "Knowledge-Based Protein Secondary Structure
Assignment" PROTEINS 23-566-579) or DSSP (Kabsch and Sander 1983)
to find helices and strands and builds
a graph structure from those using structure and H-bond information
from STRIDE or DSSP, and uses BioPython Bio.PDB to parse ATOM cards from
PDB files to get geometric information for Carbon-alpha atoms used in some
calculations.
It is written in Python (2.5.1) and depends on some Python libraries:
* BioPython 1.43 (http://www.biopython.org).
+ mxTextTools (eGenix mx-Extension base 2.0.6) (http://www.egenix.com/files/python/egenix-mx-base-2.0.6.tar.gz)
+ PyDot 0.9.10 (http://dkbza.org/data/pydot-0.9.10.tar.gz)
- PyParsing 1.4.6 (http://sourceforge.net/project/showfiles.php?group_id=97203&package_id=104017&release_id=500496)
By default, DDOMAIN is used for domain identification. DDOMAIN is described in:
Zhou, Xue, Zhou 2007 'DDOMAIN: Dividing structures into domains using a
normalized domain-domain interaction profile' Protein Science 16:947-955.
DDDOMAIN is available as a 64-bit linux executable and FORTRAN-77 source code
from http://sparks.informatics.iupui.edu/Resource_files/DDOMAIN.tar.gz
GraphViz itself is also required (http://www.research.att.com/sw/tools/graphviz)
Note that a modified version of STRIDE is required. This in the stride/
directory. The license allows this for academic use (see stride/stride.doc).
Instead of using GraphViz (dot/neato), the default output
is now SVG for use with the Dunnart constraint-based
diagram editor:
http://www.csse.monash.edu.au/~mwybrow/dunnart/
Dunnart (0.14+SVN) has been modified to work with this by including
strand and helix shapes, amongst other things (by Michael Wybrow).
(Currently not generally available).
Developed on Linux 2.6.9 (x86_64) with Python 2.5.1
and BioPython 1.43 with Numeric 24.2
Note that the Bio.PDB library needs to be patched with a modification
I made to fix some things in PDBIO.py. This patch is in
Bio.PDB.PDBIO.diff
There is another diff to Bio.SCOP in order to correctly get all genetic
domains in sequence nonredundant subsets, this pach is in
Bio.SCOP.__init__.diff
Example usage:
ptgraph2.py 1QLP.pdb
This will produce the 1QLP.svg file which is opened with dunnart which will
automatically reformat it and can be edited there if required. Then save
it and it can be re-opened with other SVG viewers or editors.
pytableaucreate.py
-------------------
pytableaucreate.py is a program for generating protein tableaux in either
numeric or standard tableau format (Kamat and Lesk (2007); Konagurthu et al
(2008)). It uses routines (pttableau.py) also used internally by ptgraph2.
domainfc.py, evaldd.py, traindomfc.py
-------------------------------------
These programs are for domain decomposition. They have ended up in the
same directiory as ptgraph as the share some code (domainfc.py builds a
protein SSE graph like ptgraph2, and they all use ptdomain.py).
domainfc.py performs domain decomposition using clustering on a protein graph,
and also allows evaluation against a benchmark.
evaldd.py allows evaluation of domain decomposition methods against each
other or a benchmark.
traindomfc.py varies parameter and runs each parameter against a benchmark
to find best value, outputting a table for use in the R
statistics package.
fastcom.py is a wrapper for FastCommunity used by domainfc.py
buildtableauxdb.py, tabsearchqpml.py, tabmatchpbool.py, tabsearchdp.py
----------------------------------------------------------------------
Programs for building tableaux database and querying it.
tabsearchqpml.py depends on MATLAB and the mlabwrap library, and numpy:
. mlablwrap
http://mlabwrap.sourceforge.net/
v1.0 was used.
. numpy
http://numpy.scipy.org/
v1.01 was used
MATLAB Version 7.2.0.283 (R2006a) was used.
tsevalfn.py
-----------
tsevalfn.py is used to evaluate the output of tabsearchqpml.py against
SCOP (using Bio.SCOP) as ground truth, generating tables that can
be read into R and plotted as ROC curves with plotsearchroc.r
getnrpdblist.py
---------------
getnrpdblist.py is a program for generating a list of nonredundant PDB
identifiers from the output of CD-HIT, available from
http://bioinformatics.ljcrf.edu/cd-hi
convdbnumeric2ascii.py
----------------------
convert Numeric tableaux db built by buildtableauxdb -n to fixed width
text file suitable for parsing by FORTRAN (or other) external programs
such as tsrchn.
convdbpacked2ascii.py
---------------------
convert PTTableauPacked tableaux db build by buildtableauxdb to text
file suitable for parsing by FORTRAN (or other) external programs
such as tsrchd.
convdb2.py
----------
convert two database files, the PTTableauPacked tableaux db and a Numeric
distance matrix db both build by buildtableauxdb to text file containing
tableau and distance matrix for each entry suitable for parsing by
FORTRAN (or other) external programs such as tsrchd.
Directories
-----------
The webserver directory contains HTML and CGI code for the protein
topology graph webserver. Note that the mapApp subdirectory there
contains the mapApp library from http://www.carto.net/papers/svg/gui/mapApp/
as decribed in the README and license file therein.
The doc directory contains documentation about protein topology cartoons.
The slides directory contains seminar slides about protein topology cartoons.
The TableauCreate and stride directories contain modified versions of
TableauCreator and STRIDE, respectively.
(NOTE: TableauCreate is now obsolete, an internal implementation
of protein tableaux is now used rather than running TableauCreator).
The domain_doc directory contains docuemntation about domain decomposition
(domainfc.py etc.)
The domain_output directory contains output from traindom.py and domainfcy.py,
used for determining parameters and performance and building graphs using R
(from the Makefile in the domain_doc_directory).
ADS
Tue Jan 1 14:40:35 EST 2008
$Id: README 3421 2010-03-10 03:39:37Z alexs $
ptgraph.py (OBSOLETE)
----------
ptgraph.py is a program to draw two-dimensional graph representations of
protein topology. It reads information from a PDB file and writes a graph
rpresentation of the topological structure.
It is written in Python and depends on some Python libraries:
. pymmLib: python macromolecular library (1.0.0) http://pymmlib.sourceforge.net
Painter and Merritt 2004 "mmLib Python toolkit for manipulating annotated
structural models of biological macromolecules" J. Appl. Cryst. 37:174-178.
. pydot: python interface to GraphViz (0.9.10) http://dkbza.org/pydot.html
(note these libraries themselves depend on other libraries, e.g. numpy,
these transitive dependencies are not listed here).
GraphViz itself is also required (http://www.research.att.com/sw/tools/graphviz)
Need to use DSSP or STRIDE to get secondary structure then other algorithms
to get supersecondary structure (group strands into sheets etc.). But for
now we assume there are HELIX and SHEET records in the PDB and use that
to define the structure.
Due to mmLib (and another half-dozen libraries I looked at) basically having
far too many problems (mmLib seems to fail to build data structures from
SHEET records on most PDB files for instance) ptgraph.py reprsents a
now abandoned approach.