Formatting standard GFF3 output and more.
Major updates
- Format the GFF3 output following the standard specifications.
1.1. Add common TEs to the Sequence Ontology database.
1.2. Create an alias file to convert different TE naming systems to the Sequence Ontology names.
- Improve the TE summary (*.mod.EDTA.TEanno.sum) by splitting overlapping TEs and forcing each bp to be annotated only once. Splitting rule (retaining preference): 1. Structural > homology; 2. Longer > shorter; 3. Nested inner > outer (see #98).
The split GFF3 file is located here if you want to replace the default one: *mod.EDTA.anno/*.mod.EDTA.TEanno.split.gff3
- Add a script (make_panTElib.pl) to construct a pan-genome TE library from a list of TE libraries. This is a beta function.
Usage: perl make_panTElib.pl -liblist TElib.list [options]
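To make the splitting rule for overlapping TEs more concrete, here is a minimal Python sketch. It is an illustration only, not EDTA's own code. The names `TE`, `prefer`, and `split_overlaps` are hypothetical, and the way the three preferences interact (annotation method first, then nesting, then length) is an assumed reading of the rule. Coordinates are 1-based and inclusive, as in GFF3.

```python
from functools import reduce
from typing import NamedTuple


class TE(NamedTuple):
    name: str    # assumed unique per feature in this sketch
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    method: str  # "structural" or "homology"


def length(te):
    return te.end - te.start + 1


def nested_in(inner, outer):
    """True if `inner` lies entirely within `outer` and is strictly shorter."""
    return (outer.start <= inner.start and inner.end <= outer.end
            and length(inner) < length(outer))


def prefer(a, b):
    """Pick which of two TEs keeps the bases they share (assumed ordering)."""
    if a.method != b.method:               # structural > homology
        return a if a.method == "structural" else b
    if nested_in(a, b):                    # nested inner > outer
        return a
    if nested_in(b, a):
        return b
    return a if length(a) >= length(b) else b  # longer > shorter


def split_overlaps(tes):
    """Return (name, start, end) pieces so that each bp is annotated once."""
    # Cut the sequence at every feature boundary (half-open cut points).
    cuts = sorted({t.start for t in tes} | {t.end + 1 for t in tes})
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        covering = [t for t in tes if t.start <= lo and hi - 1 <= t.end]
        if not covering:
            continue  # gap between features
        winner = reduce(prefer, covering)
        # Merge with the previous piece when the same TE continues.
        if pieces and pieces[-1][0] == winner.name and pieces[-1][2] == lo - 1:
            pieces[-1] = (winner.name, pieces[-1][1], hi - 1)
        else:
            pieces.append((winner.name, lo, hi - 1))
    return pieces
```

Under these assumptions, a homology-based TE at 1-100 with another homology-based TE nested at 40-60 is split into three pieces (1-39, 40-60, 61-100), while a structural TE at 10-50 that partially overlaps a homology-based TE at 40-120 keeps all of its bases and the homology-based TE is trimmed to 51-120.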
Minor updates
- Detect SSRs in flanking sequences and label such candidates as false. This can significantly accelerate TIR and Helitron identification when the genome is rich in SSRs (see #93, #96).
- Recover structurally intact Helitrons from the negative strand.
- Allow users to provide the path to dependencies.
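As a rough illustration of the SSR check on flanking sequences mentioned above, the following Python sketch flags a flank as SSR-rich when simple sequence repeats cover most of it. This is not EDTA's implementation. The function names, the motif size (1 to 6 bp), the minimum of 5 repeat units, and the 0.5 coverage cutoff are all assumptions chosen for the example.

```python
import re

# A motif of 1-6 bp followed by at least 4 more copies of itself
# (5 units in total). These thresholds are illustrative assumptions.
SSR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{4,}")


def ssr_fraction(seq):
    """Fraction of bases in `seq` covered by simple sequence repeats."""
    seq = seq.upper()
    if not seq:
        return 0.0
    covered = sum(m.end() - m.start() for m in SSR_PATTERN.finditer(seq))
    return covered / len(seq)


def is_ssr_rich(flank, min_fraction=0.5):
    """Hypothetical filter: True if SSRs cover at least `min_fraction` of the flank."""
    return ssr_fraction(flank) >= min_fraction
```

A candidate whose flank passes `is_ssr_rich` could then be labeled as false before any more expensive structural checks are run, which is where the speed-up would come from in an SSR-rich genome.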
How to
How to update old annotations to the current version?
- Back up old results, because the update will overwrite existing results (.gff3, .sum, etc.).
- Navigate to the root of the working directory that contains EDTA working folders (i.e., .raw, combine, final, anno).
- Execute the patch script by providing the genome name (e.g., genome.fa):
perl ..../EDTA/util/patch_1.8.3_to_1.9.0.pl genome.fa [threads]
- Check out the updated gff3 and summary results in the working directory.
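The backup in the first step can be done in any way you like. As one possibility, here is a small Python sketch that copies result files into a separate folder before the patch script is run. The function name `backup_results`, the backup folder, and the default file patterns are assumptions for illustration; adjust the patterns to cover every file you want to keep.

```python
import shutil
from pathlib import Path


def backup_results(workdir, backup_dir, patterns=("*.gff3", "*.sum")):
    """Copy files matching `patterns` under `workdir` into `backup_dir`.

    The relative directory layout is preserved. Files already inside
    `backup_dir` are skipped, so the backup folder may live in `workdir`.
    """
    workdir = Path(workdir).resolve()
    backup_dir = Path(backup_dir).resolve()
    # Collect everything first so files copied below are never re-scanned.
    to_copy = [
        p
        for pattern in patterns
        for p in workdir.rglob(pattern)
        if p.is_file() and backup_dir not in p.parents
    ]
    copied = []
    for src in sorted(set(to_copy)):
        dest = backup_dir / src.relative_to(workdir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also keeps timestamps
        copied.append(dest)
    return copied
```

For example, calling `backup_results(".", "backup_before_patch")` from the root of the working directory would copy the existing .gff3 and .sum files into `backup_before_patch` (a hypothetical folder name), after which the patch script can overwrite the originals.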