Release EDTA v2.0.0 - faster, better, and nicer! · oushujun/EDTA

Performance improvements

EDTA now uses the original LTRharvest and LTR_FINDER when --threads 1 is set. This is much faster for highly fragmented genomes (> 5,000 sequences) because it reduces the number of files created (#225). Users may run EDTA_raw.pl for each TE type with --threads 1, then run EDTA.pl with multiple threads and --overwrite 0.
Improve the filtering scheme for TE flanking sequences that are highly repetitive. If both flanking sequences are repetitive, filter out those with copy number > 50k on either side (based on feedback from Zhigui Bao @baozg). This prevents the program from stalling on the long stretches of tandem repeats that exist in high-quality genomes.
Improve and polish the filtering scheme suggested by Sergei Ryazansky @DrHogart (#136).
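
The two-stage run described in the first item above (single-threaded EDTA_raw.pl per TE type, then a multi-threaded EDTA.pl with --overwrite 0) could be scripted roughly as follows. This is only a sketch: the genome file name, the thread count, the edta_dir location, and the helper name build_edta_commands are hypothetical, and the TE type names (ltr, tir, helitron) are assumed to match what EDTA_raw.pl accepts for --type, so check them against the script's help output. The code only assembles and prints the command lines; it does not run EDTA.

```python
import shlex


def build_edta_commands(genome, threads, edta_dir="."):
    """Return command lines for a two-stage EDTA run (illustrative helper)."""
    commands = []
    # Stage 1: one single-threaded EDTA_raw.pl run per TE type.
    # The type names below are assumed values for --type.
    for te_type in ("ltr", "tir", "helitron"):
        commands.append([
            "perl", f"{edta_dir}/EDTA_raw.pl",
            "--genome", genome,
            "--type", te_type,
            "--threads", "1",
        ])
    # Stage 2: multi-threaded EDTA.pl that reuses the existing raw results.
    commands.append([
        "perl", f"{edta_dir}/EDTA.pl",
        "--genome", genome,
        "--threads", str(threads),
        "--overwrite", "0",
    ])
    return commands


if __name__ == "__main__":
    # genome.fa and 16 threads are placeholders.
    for cmd in build_edta_commands("genome.fa", threads=16):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before pasting them into a job script or passing each list to subprocess.run.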

New features

change the maximum sequence ID length from 15 to 13 characters to allow sequences > 100 Mb (#239).
support renaming LTR sequences that RepeatModeler reports via --sensitive 1 (#184).
support renaming TEsorter libraries (#184).
cleanup_nested.pl: added the -clean option to allow for cleaning or not cleaning nested sequences.
get_consistent_TE.pl: a new script that helps find TEs that are consistently annotated in a genome.
add more specific guides for EDTA usage installed via conda (#208).
rename and save the existing .EDTA.intact.fa.out file when using the parameter --overwrite 0.
Updated EDTA_processI.pl and TE_purifier.pl: redirect RepeatMasker error messages to STDERR, as suggested by Nathalie de Vries.
make_panTElib.pl: a mature script that helps create a pan-genome TE library for pan-genome TE annotations. A detailed, documented usage example can be found here: https://github.com/HuffordLab/NAM-genomes/tree/master/te-annotation
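
Because the sequence ID limit is now 13 characters (first item in this list), it may help to check a genome for over-long IDs before starting a run. The sketch below is not part of EDTA. It assumes that the ID is the first whitespace-delimited token after ">" in each FASTA header and that IDs of exactly 13 characters are still accepted; the function name find_long_ids and the example records are made up for illustration.

```python
import io

# Limit stated in these release notes; treated here as inclusive (an assumption).
MAX_ID_LENGTH = 13


def find_long_ids(fasta_handle, max_length=MAX_ID_LENGTH):
    """Return sequence IDs longer than max_length from a FASTA file handle."""
    long_ids = []
    for line in fasta_handle:
        if line.startswith(">"):
            # Assumed ID definition: first token of the header line.
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            if len(seq_id) > max_length:
                long_ids.append(seq_id)
    return long_ids


if __name__ == "__main__":
    # In-memory example; for a real file, pass open("genome.fa") instead.
    example = io.StringIO(">chr1 assembled\nACGT\n>scaffold_0000123\nACGT\n")
    print(find_long_ids(example))
```

Any IDs the function reports would need to be shortened before annotation.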

Issues fixed

Resolve classification inconsistency when --curatedlib is provided:
1. Added new entries and aliases to the TE SO database (#219).
2. Format sequence IDs for library files provided via --curatedlib to use the TE SO system (#220).
3. Check for TIR classification discrepancies between the candidate sequence and the library sequence using TE_SO name conversion.
Resolve singularity warnings by adding "LC_ALL=C" and author info to the Dockerfile (#122).
Fix #150 when flanking sequence is empty.
Fixed typos in EDTA.pl and EDTA_processI.pl reported by Nathalie de Vries.

Note

If your run was successful with version 1.9.4+ and you didn't notice any particular errors, you may not need to rerun it with 2.0.0. The core filtering algorithms are not very different between these versions.
