-
Notifications
You must be signed in to change notification settings - Fork 8
/
CHANGES.txt
71 lines (71 loc) · 5.79 KB
/
CHANGES.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
21122021: _collinear_genes.pl tested with -s
27122021: get_pangenes.pl integrates _cut_sequences.pl & _collinear_genes.pl
04012022: checks regex matches chr names
04012022: tested with -s, made test_rice
10012022: wfmash only needed with -w
25012022: gene BED files checked as they might be interrupted
28012022: added -H, trying out diff MINMASKLEN values in barley
03022022: while parsing GFF files, chr names and extracted sequences are checked
11022022: genomic segments are added as segment_collinear features in _collinear_genes.pl
15022022: genomic segments are used to produce .gdna.fna clusters in get_pangenes.pl
02032022: added & tested _collinear_genes.pl -sg
04032022: added & tested get_pangenes.pl -s
04032022: get_pangenes.pl -s prints ANI matrix from GSAlign estimates
15032022: collinearity TSV files sorted
15032022: pangenome matrices are chr-sorted with -s
16032022: _cluster_analysis.pl unclusters non-neighbors more than $MAXDISTNEIGHBORS away
16032022: BED-like pangenome matrices produced with -s
17032022: updated documentation in README
24032022: added get_pangenes.pl -N (and _cluster_analysis.pl -m)
04042022: tested wfmash v0.8.1-25-g1344b9e on rice chr1 testset (-p 80 -s 1000)
20042022: added sub split_genome_sequences_per_chr_bedtools to _collinear_genes (uses faster bedtools)
20042022: adopted minimap2-2.24 and updated Makefile
22042022: save compressed copy of $merged_tsv_file as evidence of clusters
22042022: check_evidence.pl prints basic stats for a cluster
05052022: check_evidence.pl -f prints GFF fixes for long gene models
06052022: check_evidence.pl -f prints sorted- non-overlapping GFF fixes for long gene models
11052022: check_evidence.pl -f tested for split and missing gene models, premature stop codons tracked
11052022: GFF patches, maybe created with check_evidence.pl, can now be used with get_pangenes.pl -p, existing WGAs reused
20052022: prints WGA stats summary
27052022: check_evidence.pl -s -r oryza_sativa_RAPDB,oryza_sativa_MSU appends to file mode isoform sequence, preferring refs
02052022: check_evidence.pl locates internal stop codons in CDS sequences, those cannot be used to fix gene models
07062022: check_evidence.pl skip long segments with candidate long/split genes, uses $MAXSEGMENTSIZE
09062022: check_evidence.pl sub liftover_gmap calls validates lifted CDS sequences with sub no_premature_stops
21062022: check_evidence.pl split models only fixed if mapped gene overlaps >= MINFIXOVERLAP=0.75
21062022: check_evidence.pl long models only fixed if mapped gene pair overlaps >= MINFIXOVERLAP=0.75
29062022: added _dotplot.pl to produce dotplot figures with R package pafr
14072022: added _chunk_chr.pl to break long chromosomes in chunks separated by geneless stretches (testing only)
30082022: tested get_pangenes.pl with rice, barley and wheat
12092022: check_evidence.pl -f -v prints out GMAP alignments, -p allows for partial CDS lift-over
13092022: check_evidence.pl -f does not use long/short isoforms for lift-over
09022023: fixed segments with flipped species while producing PAF in _dotplot.pl
20022023: get_pangenes.pl stops if 0 genes parsed from GFF
07032023: _collinear_genes.pl prints out all unmapped genes and the underlying cause
09032023: _collinear_genes.pl now maps genes in WGAs in both strands, added optional -n
17032023: _cluster_analysis.pl now merges disjoint clusters (diff species, 75% supporting edges) caused often by split gene models
30032023: check_evidence.pl -P prints python code to plot genomic context of pangene cluster, requires pyGenomeViz
03042023: discard partially mapped genes at the ends of chrs (query2ref) in _collinear_genes.pl
26042023: updated documentation and added Example 2: pangene and Presence-Absence Variation (PAV) analysis
04052023: added check_quality.pl and updated documentation
30082023: completed target test_pangenes in Makefile
30082023: Makefile target install_pangenes now system-installs pangenes/cpanfile; these are core modules, only DB_file seems to be lacking in Travis
25092023: added match_cluster.pl and updated documentation; uses .cdna.fna clusters by default, -C seems slower
24102023: added sub select_GFF_valid_genes to leave out non-coding genes; see $GFFACCEPTEDFEATS & $GFFVALIDGENEFEAT
24102023: get_pangenes.pl now prints number of valid & non-valid genes in each input GFF file (details saved in .gff.log files)
25102023: identical gene ids are now supported accross annotations, thanks for reporting the bug Pimmy!
10112023: improved calculation of overlap coordinates from WGA segments in different strands
14112023: genes lacking cDNA are skipped in _cluster_analysis.pl
15112023: removed bug from 25102023 that impaired removal of non-local genes, this duplicated a few clusters
23112023: match_cluster.pl now takes a complete pangene results folder ie Oryza_nivara_v1chr1_alltaxa_5neigh_algMmap_
24112023: added rename_pangenes.pl to assing pangene IDs to previously computed clusters
10012024: fixed bug in handling - strand coords in sub query2ref_coords
11012024: sub _parseCIGARfeature handles correctly 1bp CS-type SNPs when computing overlap with optional query coord
28022024: removed bug from 25102023 that misordered non-reference clusters in matrices
28022024: added -n to avoid intervining non-reference pangenes in _cluster_analysis.pl
28022024: added -f to get_pangenes.pl, which calls _cluster_analysis -v to make blocks of ref genes
24052024: check_quality.pl -h prints header
01082024: added -S to rename_pangenes.pl
25092024: added section 'Example 6: estimation of haplotype diversity'
03102024: get_pangenes.pl expects min 95% sequence identity for WGA-based gene alignments, as in GET_HOMOLOGUES-EST, to help avoid diverged tandem copies
04102024: get_pangenes.pl now set MAXDISTNEIGHBORS=2, neighbor genes in a cluster cannot be more than 2 genes away
09102024: rename_pangenes.pl -r creates all expected outfiles, tested with rice data