EDTA v2.0.0 - faster, better, and nicer!
Performance improvements
- Use the original LTRharvest and LTR_FINDER when `--threads 1` is set. This is much faster for highly fragmented genomes (> 5,000 sequences) because it reduces the number of files created (#225). Users may run `EDTA_raw.pl` for each TE type with `--threads 1`, then run `EDTA.pl` with multiple threads and `--overwrite 0`.
- Improve the filtering scheme for TE flanking sequences that are highly repetitive. If both flanking sequences are repetitive, filter out those with copy number > 50k on either side (based on feedback from Zhigui Bao @baozg). This avoids the program hanging on the long stretches of tandem repeats that exist in high-quality genomes.
- Improve and polish the filtering scheme suggested by Sergei Ryazansky @DrHogart (#136).
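
The staged run described in the first item above can be sketched as a small helper that only builds the command lines and does not execute anything. This is an illustration under assumptions, not an official recipe: the function name `staged_edta_commands` and the file name `genome.fa` are hypothetical, and the `--genome` and `--type` options (with the values `ltr`, `tir`, `helitron`) are taken from EDTA's general usage rather than from these notes, so check them against the help output of your installed version.

```python
import shlex


def staged_edta_commands(genome, threads, te_types=("ltr", "tir", "helitron")):
    """Build (but do not run) the command lines for a staged EDTA run.

    Assumptions: --genome and --type are the option names of your EDTA
    version; only --threads and --overwrite appear in the release notes.
    """
    commands = []
    # Step 1: one single-threaded EDTA_raw.pl run per TE type.
    for te_type in te_types:
        commands.append(
            ["EDTA_raw.pl", "--genome", genome, "--type", te_type, "--threads", "1"]
        )
    # Step 2: the full pipeline with multiple threads, keeping existing results.
    commands.append(
        ["EDTA.pl", "--genome", genome, "--threads", str(threads), "--overwrite", "0"]
    )
    return commands


if __name__ == "__main__":
    for command in staged_edta_commands("genome.fa", threads=16):
        print(shlex.join(command))
```

Printing the commands instead of running them makes it easy to review them first, or to paste them into a job script.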
New features
- change the longest sequence ID limit from 15 to 13 characters to allow sequences > 100 Mb (#239).
- support renaming LTR sequences that RepeatModeler reports via `--sensitive 1` (#184).
- support renaming TEsorter libraries (#184).
- `cleanup_nested.pl`: added the `-clean` option to allow for cleaning or not cleaning nested sequences.
- `get_consistent_TE.pl`: a new script that helps find TEs that are consistently annotated in a genome.
- add more specific guides for EDTA usage installed via conda (#208).
- rename and save the existing `.EDTA.intact.fa.out` file when using the parameter `--overwrite 0`.
- Updated `EDTA_processI.pl` and `TE_purifier.pl`: redirect RepeatMasker error messages to STDERR, as suggested by Nathalie de Vries.
- `make_panTElib.pl`: a matured script that helps to create a pan-genome TE library for pan-genome TE annotations. A detailed, documented usage example can be found here: https://github.com/HuffordLab/NAM-genomes/tree/master/te-annotation
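
Because the sequence ID limit listed above is now 13 characters, it can help to check a genome's headers before starting a run. The sketch below is a hypothetical pre-flight check, not part of EDTA: the function name `long_sequence_ids` is made up for illustration, and it assumes the ID is the first whitespace-delimited token after `>`, which may differ from how EDTA itself measures the ID.

```python
MAX_ID_LEN = 13  # limit stated in the release notes


def long_sequence_ids(fasta_lines, max_len=MAX_ID_LEN):
    """Return the sequence IDs in a FASTA stream that exceed max_len characters.

    Assumption: the ID is the text after '>' up to the first whitespace.
    """
    too_long = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        seq_id = fields[0] if fields else ""
        if len(seq_id) > max_len:
            too_long.append(seq_id)
    return too_long
```

It accepts any iterable of lines, so an open file handle can be passed directly, for example `long_sequence_ids(open("genome.fa"))`, where `genome.fa` is a placeholder file name.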
Issues fixed
- Resolve classification inconsistency when `--curatedlib` is provided.
- Resolve `singularity` warnings by adding "LC_ALL=C" and author info to the Dockerfile (#122).
- Fix #150 when flanking sequence is empty.
- Fixed typos in EDTA.pl and EDTA_processI.pl reported by Nathalie de Vries.
Note
If your run was successful with version 1.9.4+ and you didn't notice any particular errors, you may not need to rerun it with 2.0.0. The core filtering algorithms are not very different between these versions.