Releases: oushujun/EDTA
v1.8.2
v1.7.8
Fix small bugs for the conda release
- Added the --convert_seq_name option to EDTA_raw.pl so that it can work with EDTA.pl or independently.
- Fixed a small bug in the input genome check.
Get ready for the conda version
This version has a couple of updates that can potentially bring EDTA into the conda world. All changes are at the coding level, while results should be unchanged.
- Both LTR_retriever and LTR_FINDER were replaced by their respective conda installations, thanks to @Juke34's contribution. The LTR_FINDER_parallel wrapper was updated to v1.1, which works with the conda-installed LTR_FINDER.
- EDTA_raw.pl was updated to utilize conda dependencies and is able to convert overly long genome sequence names.
Resolving conda conflicts
A couple of useful updates
- Changed to use the ENV default perl instead of /usr/bin/perl (#47).
- Replaced the precompiled GenomeTools and GenericRepeatFinder binaries with conda recipes (contributed by @Juke34).
- Check sequence names for the input genome. Annotations (everything after the first space in the sequence ID line) are removed and sequence IDs are shortened to <= 15 characters. The original genome file is untouched; the modified file is named $genome.mod and is used in the EDTA analysis (#35, #40, #44).
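The sequence-name check described above can be illustrated with a minimal Python sketch. This is not EDTA's own code (EDTA is written in Perl): the function name simplify_fasta_ids, plain truncation as the shortening strategy, and the error on colliding IDs are all assumptions made for illustration. Only the two rules themselves (drop everything after the first space, keep IDs at <= 15 characters) come from the release note.

```python
import io

MAX_ID_LEN = 15  # limit stated in the release note


def simplify_fasta_ids(src, dst, max_len=MAX_ID_LEN):
    """Copy FASTA text from src to dst with simplified sequence IDs.

    Hypothetical helper, not part of EDTA. Returns a dict that maps
    each new ID to the original header text.
    """
    seen = {}
    for line in src:
        if line.startswith(">"):
            header = line[1:].rstrip("\r\n")
            parts = header.split(None, 1)  # split at the first whitespace
            if not parts:
                raise ValueError("empty sequence header")
            # Assumption: shorten by simple truncation.
            new_id = parts[0][:max_len]
            if new_id in seen:
                # Assumption: refuse to continue if IDs are no longer unique.
                raise ValueError(f"ID collision after shortening: {new_id}")
            seen[new_id] = header
            dst.write(">" + new_id + "\n")
        else:
            dst.write(line)  # sequence lines are copied unchanged
    return seen


if __name__ == "__main__":
    original = io.StringIO(">chr1_very_long_sequence_name some annotation\nACGT\n")
    modified = io.StringIO()  # stands in for a $genome.mod style output file
    simplify_fasta_ids(original, modified)
    print(modified.getvalue(), end="")
```

Writing to a separate output stream mirrors the note's point that the original genome file stays untouched and only the modified copy is used downstream.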
Some important updates
This release has a number of important updates:
- Further clean the input CDS file based on repetitiveness. If a sequence in the CDS file occurs >= 10 times in the raw TE library, it is likely a repeat sequence and is removed from the CDS file.
- Purge gene-contained sequences in intact TE elements.
- Use the conda-based Python3 TEsorter to replace the Python2 TEsorter. Thanks to @Juke34's work! (#39)
- Use Getopt to read program parameters. Option names now have the long format, e.g., the previous -genome changed to --genome. Contributed by @Juke34 (#46)
- The docker/singularity version of EDTA has been tested and is available!
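The CDS cleaning rule in the first item above (drop a CDS sequence that occurs >= 10 times in the raw TE library) can be sketched in Python as follows. How EDTA actually counts occurrences is not stated in these notes, so the input here is an assumption: a list of (cds_id, te_id) pairs, for example taken from a similarity search of the CDS file against the raw TE library. The function name filter_repetitive_cds is hypothetical.

```python
from collections import Counter

REPEAT_THRESHOLD = 10  # cutoff stated in the release note


def filter_repetitive_cds(cds_records, hits, threshold=REPEAT_THRESHOLD):
    """Split CDS records into kept and removed sets by hit count.

    cds_records: dict mapping CDS ID to sequence.
    hits: iterable of (cds_id, te_id) pairs (assumed input format).
    Returns (kept, removed): a dict of retained records and a sorted
    list of IDs that reached the threshold.
    """
    counts = Counter(cds_id for cds_id, _ in hits)
    kept = {
        cds_id: seq
        for cds_id, seq in cds_records.items()
        if counts[cds_id] < threshold
    }
    removed = sorted(cds_id for cds_id in cds_records if counts[cds_id] >= threshold)
    return kept, removed


if __name__ == "__main__":
    cds = {"geneA": "ATGAAATAA", "geneB": "ATGCCCTGA"}
    # geneA matches ten library entries, geneB only one.
    hits = [("geneA", f"TE{i}") for i in range(10)] + [("geneB", "TE0")]
    kept, removed = filter_repetitive_cds(cds, hits)
    print("kept:", sorted(kept))
    print("removed:", removed)
```

Returning the removed IDs alongside the kept records makes it easy to inspect which putative genes were treated as repeats.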
Other updates
Official release of EDTA
After a couple months' public testing and solving dozens of issues, EDTA has matured to a point that is worth an official release. Thank you all for your bug reports and feature requests.
Summary of functions and features
- Identify high-quality structurally intact TEs including LTR retrotransposons, TIR transposons, and Helitrons.
- Reduce false classifications and nested insertions between intact TEs to create a homogenized TE library.
- Accept user input TE library to identify novel TEs.
- Accept user input CDS to remove gene sequences in the TE library.
- Exclude user-specified genomic regions (i.e., gene regions) from TE masking.
- Perform whole-genome TE annotation and produce a gff file with both structurally intact and homology-based TE annotations.
- Produce self-evaluation results for users to check annotation consistency.
- Automatic checkpointing, so that EDTA can automatically start from where it was interrupted.
- Multithreading-enabled. Analyzing a maize genome (2.3 Gb, >85% TE) takes less than a week (-threads 36).
- Include a companion benchmarking pipeline for developers and researchers to test the annotation quality of custom TE libraries.
Citation
Please cite our paper if you find EDTA useful:
Ou S., Su W., Liao Y., Chougule K., Ware D., Peterson T., Jiang N.✉, Hirsch C. N.✉ and Hufford M. B.✉ (2019). Benchmarking Transposable Element Annotation Methods for Creation of a Streamlined, Comprehensive Pipeline. Genome Biol. 20(1): 275.