Version 4.10.0 changes:
--------------------------------------------
Some of these changes may be present in 4.09 and were undocumented, if
they were caught during launch. They are officially present in 4.10.0.
Major:
* TRF now has an exit status of 0 on success, and non-zero otherwise.
Previously, TRF would have a non-zero exit status, which was to be
interpreted by the user as the number of TRs processed. However, this is
impractical since the maximum value an exit status can take is 255.
Also, the UNIX convention is to exit with a status of 0 on success.
Non-zero statuses are to be interpreted as errors, with certain values
being reserved for specific meanings.
* TRF will throw an error if a value of over 2000 is given for the
MaxPeriod parameter. Our documentation states that this value must be
between 1 and 2000, inclusive. Minor changes to the code have been made
in anticipation of raising this limit. However, very large TRs might be
considered outside the scope of TRF, as the dynamics involved in their
duplication may differ from the assumptions the algorithm was designed
around.
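To illustrate why a TR count could not be reported reliably through the
exit status: on POSIX systems a parent process normally sees only the low
8 bits of the value passed to exit(). The helper below is a hypothetical
sketch of that truncation, not part of TRF.

```c
#include <assert.h>

/* Hypothetical helper, not part of TRF: the status a POSIX parent would
   typically observe (via wait()/WEXITSTATUS) if a process passed
   `count` to exit(). Only the low 8 bits are reported. */
int observed_exit_status(int count)
{
    assert(count >= 0); /* a count of processed TRs is never negative */
    return count & 0xFF;
}
```

Under this sketch a run that processed 300 TRs would be indistinguishable
from one that processed 44, and a run that processed 256 would look like a
status of 0.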
Minor:
* We are beginning to adopt Semantic Versioning (https://semver.org/).
We may not be fully compliant yet, and compliance will rely on future
developers adhering to the convention. TRF 4.10.0 will be the first
version released with a semver-compliant version number.
Fixes:
* TRF will no longer crash if the argument list is empty (regression in
4.09)
* TRF will no longer crash when given a MaxPeriod value of over 2004. As
above, TRF will simply not allow any value of MaxPeriod over 2000.
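A minimal sketch of the MaxPeriod range check described above. The
function name and the message text are assumptions made for illustration;
only the 1 to 2000 range and the exit status of 1 for a bad parameter
come from these notes.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_PERIOD_LIMIT 2000 /* upper bound stated in these notes */

/* Hypothetical validator, not TRF's actual code. Returns 0 if
   max_period is in [1, 2000]; otherwise prints a message to stderr and
   returns 1, the exit status these notes give for a bad parameter. */
int check_max_period(int max_period)
{
    if (max_period < 1 || max_period > MAX_PERIOD_LIMIT) {
        fprintf(stderr,
                "Error: MaxPeriod must be between 1 and %d, got %d\n",
                MAX_PERIOD_LIMIT, max_period);
        return 1;
    }
    return 0;
}
```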
Internal changes:
trf.c:
* Fix: Defer checking for the presence of the -v flag to avoid a crash
when the argument list is empty.
* Check contents of `outputcount` and `endstatus` members of `paramset`
struct to determine success after running TRFControlRoutine. Print
informative messages on exit.
* Parameter validation. Will now print an error on most bad parameters,
along with a helpful message. Exits with status 1 in case of a bad
parameter.
trfrun.h:
* TRFControlRoutine now has a void return type. We no longer rely on a
return value from this function.
* Set paramset.endstatus (now a `char *`) to some informative message,
rather than returning some integer value defined by macros, when
exiting.
* More error checking. Don't exit immediately when allocating memory for
AlignPair struct. Instead, set error message and propagate back to main.
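The error-propagation pattern described for trfrun.h can be sketched as
follows. The struct and function names here are hypothetical stand-ins
(the real TRFPARAMSET has many more members), and the message text is
invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relevant part of TRFPARAMSET. The notes
   say endstatus is a `char *`; const is added here only because the
   sketch stores string literals. */
struct params_sketch {
    unsigned int outputcount; /* number of results produced */
    const char *endstatus;    /* NULL on success, else a message */
};

/* Hypothetical control routine: returns void and reports failure only
   through p->endstatus instead of calling exit() itself. */
void control_routine_sketch(struct params_sketch *p, size_t nbytes)
{
    void *buf = malloc(nbytes);

    p->outputcount = 0;
    p->endstatus = NULL;
    if (buf == NULL) {
        p->endstatus = "Unable to allocate memory";
        return; /* let the caller decide how to exit */
    }
    /* ... real work would go here ... */
    p->outputcount = 1;
    free(buf);
}

/* Maps the recorded status to a process exit code, as main() might. */
int exit_code_for(const struct params_sketch *p)
{
    return p->endstatus == NULL ? 0 : 1;
}
```

The point of the design is that only main decides when to exit, so the
exact failure message is still available at the point where it is printed.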
tr30dat.h:
* Reformatted with astyle.
* Some `char *` variables changed to `unsigned char *` if they were ever
used as an index to an array. Avoids compiler warnings, safer.
* Some `int` members in TRFPARAMSET definition changed to `unsigned int`
for safety/semantics.
* Note added on changes needed to EC array for support of pattern sizes
over 2000. See below, and major changes above.
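A short sketch of why `unsigned char *` is safer when a character is used
as an array index. The function below is illustrative only and does not
appear in TRF.

```c
#include <assert.h>

/* Illustrative only. Counts byte values in a NUL-terminated string.
   Reading through `unsigned char *` keeps every index in 0..255; with
   plain `char`, bytes >= 0x80 can become negative indices on platforms
   where char is signed. */
void count_bytes(const char *text, unsigned int counts[256])
{
    const unsigned char *p = (const unsigned char *)text;
    int i;

    for (i = 0; i < 256; i++)
        counts[i] = 0;
    for (; *p != '\0'; p++)
        counts[*p]++;
}
```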
tr30dat.c:
* Some `char *` variables changed to `unsigned char *` if they were ever
used as an index to an array. Avoids compiler warnings, safer.
* Set paramset.endstatus message in function-generating macro
new1Darrayfunc. Avoid exiting right away so that the exact error is
visible to callers/main.
* Also in new1Darrayfunc, use type of pointer, rather than type name.
Less important in a macro written in this way, but more consistent.
* Use `MAXPATTERNSIZECONSTANT` to walk through Consensus struct's
members, since that is the value used to initialize them. Added note on
changes needed to implementation should the limit on pattern size ever
be raised.
* Similarly when setting the `waiting_time_criteria` and
`k_run_sums_criteria` members in the structs in Distance array. Code in
place to "round down" pattern sizes above 2000, but too many other code
changes need to be made to actually support this, and I have no time to
test them.
Version 4.09 fixes: 02/19/2016
--------------------------------------------
Major changes:
* Allowed longer maximum length for a TR by a change to
MAXWRAPLENGTHCONST (see below). Now set to 2 million bases by default and
allowed to be increased by user using -l/-L option below. (Changed, because
original size was 500,000 and this was too small for HG38 which has a TR of
length 5.3 million bases.)
* Changed function GetTopPeriods. This function records distances between
identical two-letter words in the TR array in order to determine the most
commonly encountered periods. It is used to exclude periods that are not
common, yet detect a TR. The original function allowed a two-letter word to
record its distance to a match back to the beginning of the TR array,
essentially an O(n^2) operation. Now, each two-letter word can only record a
distance of up to MAXDISTANCECONSTANT*3 = 6,000 bases. This makes the
code much faster for long TRs. Before the change, the program seemed to
hang, but was just in the quadratic time loop.
* New option, -l/-L, allows user to define expected length of longest TR,
in millions of bases. Default is 2 (= 2 million).
* Using getopt for option handling.
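The bounded lookback in GetTopPeriods can be illustrated with the sketch
below. It is not the real function: it scans backwards directly rather
than using whatever bookkeeping TRF uses, and the function and parameter
names are assumptions. It only shows how capping the distance turns the
work per position into a constant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not the real GetTopPeriods. For every pair of
   identical two-letter words whose start positions are `dist` apart,
   with 0 < dist < max_lookback, increments hist[dist]. The caller must
   provide hist with at least max_lookback zeroed entries. */
void count_word_distances(const char *seq, size_t len,
                          size_t max_lookback, unsigned int *hist)
{
    size_t i, dist;

    /* i is the start of the current two-letter word seq[i], seq[i+1]. */
    for (i = 1; i + 1 < len; i++) {
        /* Bounded lookback: at most max_lookback - 1 positions back,
           instead of all the way to the start of the sequence. */
        for (dist = 1; dist <= i && dist < max_lookback; dist++) {
            size_t j = i - dist;
            if (seq[j] == seq[i] && seq[j + 1] == seq[i + 1])
                hist[dist]++;
        }
    }
}
```

With max_lookback fixed (6000 in the notes above), the total work grows
linearly with sequence length rather than quadratically.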
Makefile:
* New file
* Simple Makefile to produce binaries for Windows (dos), Linux, and
Mac OS X.
tr30dat.h:
* Platform defining macros are now given on the command line when
compiling (see Makefile).
* If one of the platform macros is not given, a message is displayed
and UNIXCONSOLE is chosen.
* MAXWRAPLENGTHCONST is no longer used. Now replaced by an option
(-l/-L) supplied by the user.
* Changed all instances of MAXWRAPLENGTH to paramset.maxwraplength
* Bandcenter is now dynamically allocated because this array depended
on the macro MAXWRAPLENGTHCONST being defined. It could not be
initialized using maxwraplength or paramset.maxwraplength since these
are undefined during compile time, and it may exceed the limits of the
stack if maxwraplength is large.
* maxwraplength is an unsigned int, to avoid improper casting and to
allow a larger range.
* Added member maxwraplength to TRFPARAMSET, which stores the options
given to TRF. It is an unsigned int, like the maxwraplength global
above.
* FASTASEQUENCE struct member length changed to unsigned int.
tr30dat.c:
* Changed function GetTopPeriods so that running time is closer to linear
instead of quadratic. Changed the maximum lookback distance test for a two-
letter word to "dist<(MAXDISTANCECONSTANT*3)" which is 6000 bases.
MAXDISTANCECONSTANT is the longest period reported, which is 2000 bases.
* Bandcenter is allocated at the start of newtupbo and free'd at the
end of that function.
* Changed all instances of MAXWRAPLENGTH to paramset.maxwraplength
* Some ints changed to unsigned ints to try and prevent overflow
issues. Specifically, the counting variable "r".
trfrun.h:
* Including windows .h files when WINDOWSGUI macro is defined. Meant
to eventually replace redundant copy under win32-gui directory.
* More informative error messages are produced when the program is
unable to allocate memory for stemp variable.
* maxwraplength now using the minimum of paramset.maxwraplength
(provided by user or with default) and pseq->length. The former is due
to the changes mentioned earlier.
* stemp now initialized using calloc instead of a malloc followed by
memset.
* stemp allocation uses a cast to size_t to avoid integer overflow
since the result of the multiplication may exceed limits of unsigned
ints.
* Added an fclose to quiet cppcheck complaint of a resource leak from
not closing a file handle on an error.
* Added a check for word size when printing memory allocation error
message, since the return type of sizeof varies by platform word
length.
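The stemp allocation changes above (calloc instead of malloc plus memset,
and a cast to size_t before multiplying) follow a common pattern, sketched
here. The function name and the element type are assumptions, not TRF's
actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not TRF's code: allocates a zero-filled table of
   rows * cols ints. The cast makes the multiplication happen in size_t
   rather than unsigned int, so it does not wrap at UINT_MAX on
   platforms where size_t is wider. calloc zero-fills the block, which
   replaces a separate malloc + memset pair. Returns NULL on failure. */
int *alloc_zeroed_table(unsigned int rows, unsigned int cols)
{
    size_t cells = (size_t)rows * cols;
    return calloc(cells, sizeof(int));
}
```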
trf.c:
* Now using getopt for ease of new option addition and for options
with arguments.
* Updated documentation for new option
* Usage string is now a global const char*.
Version 4.08 fixes:
--------------------------------------------
* added a check for fgets in trfclean.h because it might enter infinite loop
(we had our tandem drive filled up with junk data, this was prob the cause)
Version 4.07b fixes: 10/01/12
--------------------------------------------
* fixed alignment files "file K of N", K was off by 1 before
* seems bestlist structure was not cleared or deallocated properly, fixed
now, was making multiple sequence files run slower and slower
Version 4.07 fixes: 8/14/12
--------------------------------------------
changes from trf4.06
in tr30dat.h
/*changes by Gary Benson on 8/14/12 */
#define versionstring "4.07"
/*changes by Gary Benson on 8/14/12 */
#define MIXED 2
in tr30dat.c
in newtupbo
/*changes by Gary Benson on 8/14/12 */
/* Allows traceback to go all the way back to row zero and doesn't stop at first zero score (like GLOBAL) but allows column c to wraparound (like LOCAL) */
/* Second call, further below, is still with LOCAL, but occurs after consensus has been found, and still stops traceback at first score of zero */
get_narrowband_pair_alignment_with_copynumber( d, min(2*max(MINBANDRADIUS,d_range(d)), (d/3) ), MIXED);
in get_narrowband_pair_alignment_with_copynumber
/*changes by Gary Benson on 8/14/12 */
/* macros for get_narrowband_pair_alignment_with_copynumber now include ((option==LOCAL)||(option==MIXED)) which replaces (option==LOCAL) */
/*changes by Gary Benson on 8/14/12 */
/* Allows traceback to go all the way back to row zero and doesn't stop at first zero score (like GLOBAL) but allows column c to wraparound (like LOCAL) */
/* ((option==GLOBAL)||(option==MIXED)) now replaces (option==GLOBAL) */
/* (c==-1) removed */
if (((option==LOCAL)&&(S[r][i]<=0))||(((option==GLOBAL)||(option==MIXED))&&(r==0)))
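The stop condition quoted above can be isolated as a small predicate to
make the three modes easier to compare. The values of LOCAL and GLOBAL
are assumptions (only MIXED = 2 is given in these notes), and the function
name is hypothetical.

```c
#include <assert.h>

/* LOCAL and GLOBAL values are assumed for this sketch; only MIXED == 2
   is quoted in the notes above. */
enum { LOCAL = 0, GLOBAL = 1, MIXED = 2 };

/* Mirrors the quoted condition: returns nonzero when traceback should
   stop at row r, given the score of the current cell (S[r][i]). */
int traceback_should_stop(int option, int score, int r)
{
    return ((option == LOCAL) && (score <= 0)) ||
           (((option == GLOBAL) || (option == MIXED)) && (r == 0));
}
```

In this form, MIXED behaves like GLOBAL for the stop test: a zero score in
the middle of the matrix does not end the traceback, only reaching row
zero does.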
Version 4.06 fixes:
1) Changed memory allocations in newdistancelist to be smaller. (Y.Gelfand)
2) Doing multiple sequences now does not write to disc (Y.Gelfand)
3) Added -ngs switch to output less verbose .dat output for multiple sequences (Y.Gelfand)
4) Added check to make sure all required parameters are entered from commandline (Y.Gelfand)
Version 4.05 fixes:
1) Changed memory allocation of S array to be smaller when fasta sequences are small (supposedly callocs were taking too much time)
Version 4.04 fixes:
1) Martin Frith finds another problem and Dr. Benson tracks it down to radius of narrowband being too small and alignment never getting back.
Version 4.03 fixes:
1) Martin Frith finds a problem with vanishing tandem repeat and Dr. Benson tracks it down as a line in the redundancy algorithm being in the wrong loop.
Version 4.02 fixes:
1) Yevgeniy removes all references to long
Version 4.01 fixes:
1) Yevgeniy finds and fixes a bug where N characters that are part of a pattern cause problems
Version 4.00 fixes on 12/13/2004:
1) Yevgeniy makes needed changes for UNIX_GUI implementation using gtk+.
2) Dr. Benson replaces new_meet_criteria() with new_meet_criteria_3().
3) Alfredo and Dr. Benson replace multiples_criteria() and find_smallest_multiple()
with multiples_criteria_4() and GetTopPeriods().
4) Yevgeniy makes change to LoadSequenceFromFile adding a check for EOF in one of
the while loops.
5) -h for suppressing HTML output was added. This includes the new routine trf_message().
6) Corrected output of tuple distances to html file. Was printing 'MAXDISTANCE'
instead of value of MAXDISTANCE variable.
7) Modified add_to_bestperiodlist(int d) and adjust_bestperiod_entry(int d). Added d to parameter list so could check if distance was 1. Due to change in multiples_criteria_4 which doesn't compute periods if the distance is 1.
8) Modified narrowbandwrap. After reverse alignment, find bandcenter of row with maximum score. This is made bandcenter of first row of forward alignment, rather than mincol. Zero is set at mincolumn rather than bandcenter (when they are different). This allows the alignment to start up the same way it ended. Changed because a repeat in the set of 39 was lost this way. It may cause other repeats to be lost in future.
9) Substitute distanceseenarray for distanceseenlist. List maintains one record at most for each pattern size aligned either with or without consensus. The most recent alignment with the pattern size is stored. The array and associated functions are installed and distanceseenlist and functions are commented out.
10) Change NUMBER_OF_PERIODS definition so that top 5 are copied into Sortmultiples for compatibility with best_period data structure, although I don't think we still need this. It has not been removed.
11) Changed alignment routines so that *longest* best scoring alignment is found. This is done by changing a test from > to >=. (2/17/05)(3/14/05)
12) Changed alignment routines so that zeroth row has gap penalties for not starting in zeroth column, and -1000 penalties for starting out of bounds. (3/14/05)
13) Modifications to distanceseenlist routines which are now superseded. (3/14/05)
14) In searching for best periods, search only up to the MAXDISTANCE so larger periods don't kill good ones in range.
15) Reduced minimum copy number for larger patterns. 1.9 for patternsize<=50, 1.8 for patternsize>100, ramp from 1.9 down to 1.8 for patternsize >50 and <=100. (2/17/05)
16) Bug fix in trfclean.h to create files even if no repeats are returned.
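The minimum copy number rule in item 15 can be written out as below. The
notes give only the endpoints (1.9 up to a pattern size of 50, 1.8 above
100) and say there is a ramp in between; the linear shape of the ramp and
the function name are assumptions made for this sketch.

```c
#include <assert.h>

/* Hypothetical sketch of item 15. The ramp between pattern sizes 50 and
   100 is ASSUMED to be linear; the notes do not state its shape. */
double min_copy_number(int patternsize)
{
    if (patternsize <= 50)
        return 1.9;
    if (patternsize > 100)
        return 1.8;
    /* 50 < patternsize <= 100: falls from 1.9 toward 1.8 */
    return 1.9 - 0.1 * (patternsize - 50) / 50.0;
}
```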
Version 3.30 fixes:
A bug that caused buffer overrun and crashed the program was
corrected. Files affected: tr30dat.h, tr30dat.c, and trfrun.h.
Explanation:
Variable ConsClasslength was updated by get_consesus() and then that
value was used to save to an array. The array was incorrectly sized
Consensus_count[MAXPATTERNSIZE+1], one for each "possible" pattern
size. The problem was that sometimes the updated ConsClasslength was
greater than the expected MAXPATTERNSIZE so the statement:
Consensus_count[Classlength]++;
updated memory outside of the originally allocated block. This would
manifest itself as the program crashing as it attempted to free the
Consensus_count array. Presumably this process corrupted the memory
manager's "free list".
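One way to guard against this kind of overrun is to check the index before
incrementing. The notes do not say how the fix was actually implemented,
so the sketch below is only an illustration; the function name and the
limit of 2000 are assumptions.

```c
#include <assert.h>

#define MAXPATTERNSIZE_SKETCH 2000 /* assumed value for illustration */

/* Hypothetical guard, not the actual 3.30 fix. counts must have
   MAXPATTERNSIZE_SKETCH + 1 entries. Returns 0 after incrementing the
   counter, or -1 without touching memory if classlength is out of
   range. */
int count_consensus(unsigned int *counts, int classlength)
{
    if (classlength < 0 || classlength > MAXPATTERNSIZE_SKETCH)
        return -1;
    counts[classlength]++;
    return 0;
}
```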
These are the changes made while working on a new gtk version
on Aug 27, 2003
1) changed the "type *functionname(length)" to "type *functionname(int length)"
in dtr30dat.c. This error was not caught before because some compilers
default the undeclared type to int automatically.
These are the changes made for bug-fix release v3.21 on Jan 16, 2003
1) Fix bug concerning end of line character on the sequence name. This
change was on file trfrun.h
These are the changes made for new release v3.20 on Jan 9, 2003.
1) Changed trfini.h:
There was a string declared as static (i.e. char *string=" ";) that
was being modified by the code.
Changed to char string[] = {' ',' ','\0'}; so changing it no longer
is a problem.
2) Changed trfrun.h:
Allocated more space for Statistics_Distance array. Number of units
allocated before was MAXDISTANCE+d_range(MAXDISTANCE)+1. This was
causing problems in the get_statistics routine where, due to alignment
insertions, more space was required.
Now we just allocate MAXDISTANCE*4.
3) Changed trfclean.h:
Made changes to the redundancy criteria as to how it uses the overlap
to remove redundant repeats. The old way was breaking out of the inner
loop too early. This used to leave some redundant repeats in the file
which should have been removed.
4) Changed tr30dat.c:
Made additional check in get_statistics to ensure that even with the
change #2 we did not write past the allocated memory.
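The trfini.h change in item 1 of the v3.20 list reflects a general C rule:
writing through a pointer to a string literal is undefined behavior, while
an array initialized from characters is ordinary writable storage. A
minimal sketch, with hypothetical names:

```c
#include <assert.h>

/* Writable: an array holding two spaces and a terminator, in the same
   form as the declaration quoted for trfini.h. */
static char padding[] = { ' ', ' ', '\0' };

/* By contrast, `char *p = "  ";` would point at a string literal, and
   writing through p is undefined behavior in C. */
void set_padding(char c)
{
    padding[0] = c;
    padding[1] = c;
}

/* Read-only view of the current padding string. */
const char *get_padding(void)
{
    return padding;
}
```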