Releases: Ensembl/plant-scripts
04102024
This release ships with get_pangenes.pl version 04102024.
Main changes are:
25092024: added section 'Example 6: estimation of haplotype diversity'
03102024: get_pangenes.pl now requires at least 95% sequence identity for WGA-based gene alignments, as in GET_HOMOLOGUES-EST, to help avoid diverged tandem copies
04102024: get_pangenes.pl now sets MAXDISTNEIGHBORS=2, so neighbor genes in a cluster cannot be more than 2 genes away
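The two thresholds above can be pictured with a minimal sketch. It is not the Perl code of get_pangenes.pl: the function names are hypothetical, and reading "genes away" as the difference in gene order along a chromosome is an assumption. Only the values 95 and 2 come from these notes.

```python
# Illustrative sketch only; NOT the get_pangenes.pl implementation.
# The two constants are the values named in the release notes;
# the function names and the gene-order model are assumptions.
MIN_IDENTITY = 95.0      # minimum % identity for WGA-based gene alignments
MAXDISTNEIGHBORS = 2     # maximum distance, in genes, between cluster neighbors


def passes_identity(identity_pct, min_identity=MIN_IDENTITY):
    """Return True if an alignment reaches the minimum % sequence identity."""
    return identity_pct >= min_identity


def within_neighbor_distance(index_a, index_b, max_dist=MAXDISTNEIGHBORS):
    """Return True if two genes, given as positions in gene order along
    the same chromosome, are at most max_dist genes apart (assumed reading)."""
    return abs(index_a - index_b) <= max_dist


if __name__ == "__main__":
    # hypothetical alignment at 93.5% identity: rejected
    print(passes_identity(93.5))
    # hypothetical genes at positions 10 and 13 in gene order: too far apart
    print(within_neighbor_distance(10, 13))
```

Under these assumptions, a diverged tandem copy aligned at less than 95% identity, or a gene three or more positions away, would not join the cluster.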
11012024
This release ships with updates to GET_PANGENES: code changes since the publication of the manuscript, involving:
- fixed bug in handling of minus (-) strand coordinates in sub query2ref_coords
- sub _parseCIGARfeature now correctly handles 1bp CS-type SNPs when computing overlap with optional query coordinates
- tested rename_pangenes.pl with the MAGIC16 rice dataset (see the AgBioData nomenclature rules at https://github.com/Ensembl/plant-scripts/blob/df9cfdef5e49e6f463a08e7ed8ec8a04556735ff/pangenes/rename_pangenes.pl#L5C48-L5C57); code to update a previous cluster set is not yet in place
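The two coordinate fixes listed above touch on common pitfalls: orientation on the minus strand, and off-by-one errors with 1bp features. The sketch below shows the general arithmetic under stated assumptions (1-based inclusive coordinates and an ungapped aligned segment). The function names are hypothetical and this is not the code of query2ref_coords or _parseCIGARfeature.

```python
# Illustrative sketch only; NOT the code of query2ref_coords or
# _parseCIGARfeature. Assumes 1-based, inclusive coordinates and an
# ungapped aligned segment; function names are hypothetical.

def inclusive_overlap(start_a, end_a, start_b, end_b):
    """Overlap length in bp of two 1-based inclusive intervals (0 if none).
    The +1 is what makes a 1bp feature (start == end) count as 1bp."""
    return max(0, min(end_a, end_b) - max(start_a, start_b) + 1)


def offset_to_ref(ref_start, ref_end, offset, strand):
    """Map a 0-based offset along an ungapped aligned segment to a
    1-based reference position, honoring the alignment strand."""
    if strand == "+":
        return ref_start + offset
    if strand == "-":
        # on the minus strand the segment is traversed from its end
        return ref_end - offset
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    # a 1bp SNP at position 100 inside a feature spanning 50-150
    print(inclusive_overlap(100, 100, 50, 150))
    # first aligned base of a minus-strand segment spanning 1000-1099
    print(offset_to_ref(1000, 1099, 0, "-"))
```

Real alignments contain indels, so a full implementation would walk the CIGAR or cs string instead of assuming a constant offset.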
15112023
This release ships with updates to:
- GET_PANGENES: code and documentation changes since the publication of the manuscript, involving improved handling of input GFF files and calculation of overlap coordinates from WGA segments in different strands.
- REST-based recipes.
pangenes_benchmark
Pangene sets of Arabidopsis (ACK), rice, wheat and barley datasets produced while benchmarking get_pangenes as described at https://doi.org/10.1186/s13059-023-03071-z and https://www.biorxiv.org/content/10.1101/2023.01.03.520531v2
The HOWTO* files contain the actual commands required to produce these results with the input FASTA & GFF files (32GB), which should first be downloaded from
test_rice
Toy dataset to test the scripts for pan-gene analysis.
nrTEplants
Release 0.3 (Jun2020) of the nrTEplants library of plant transposable elements, which minimizes overlap with sequences containing protein domains known to be part of NLR genes. This sequence set was computed by combining TREP, SINEbase, REdat, RepetDB, EDTArice, EDTAmaize, SoyBaseTE, TAIR10TE, SunflowerTE, MelonTE, RosaTE and SUNREP and obtaining a non-redundant collection with GET_HOMOLOGUES-EST.
Check the code and documentation at https://github.com/Ensembl/plant_tools/tree/master/bench/repeat_libs
Citation: Contreras-Moreira,B., Filippi,C.V., Naamati,G., Girón,C.G., Allen,J.E. and Flicek,P. (2021) Efficient masking of plant genomes by combining kmer counting and curated repeats Genomics. Plant Genome https://doi.org/10.1002/tpg2.20143
23102020
This release was created to obtain a DOI from Zenodo