Calculate LAI from EDTA GFF3 files

Shujun Ou edited this page Dec 5, 2022 · 2 revisions

If you no longer have the LTR_retriever .pass.list file or the RepeatMasker .out file, you can still calculate the LTR Assembly Index (LAI) from the EDTA .mod.EDTA.TEanno.gff3 file. The steps below assume you have multiple genomes, each with its own EDTA GFF3 file.

1. gff to bed

for genome in genome1.fasta genome2.fasta; do perl ~/bin/EDTA/util/gff2bed.pl $genome.mod.EDTA.TEanno.gff3 structural > $genome.mod.EDTA.TEanno.struc.bed & done
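The loop above starts one background job per genome with `&`, so the prompt returns before the BED files are complete. If the steps are pasted into a script, a `wait` after the loop is one way to hold the next step until every job has finished. A minimal sketch of that pattern, where `convert` is a hypothetical stand-in for the real per-genome command and the output goes to a throwaway directory:

```shell
# Sketch only: `convert` is a hypothetical stand-in for the real per-genome command.
workdir=$(mktemp -d)

convert() {
    # Pretend to do some work, then write an output file for the given genome.
    sleep 1
    echo "done" > "$workdir/$1.struc.bed"
}

for genome in genome1.fasta genome2.fasta; do
    convert "$genome" &      # one background job per genome
done
wait                         # block until every background job has exited

ls "$workdir"
```

Note that a bare `wait` waits for all background jobs of the current shell, including any started earlier in the same session.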

2. get pass.list

for genome in genome1.fasta genome2.fasta; do grep LTR $genome.mod.EDTA.TEanno.struc.bed|grep struc|awk '{print $1":"$2".."$3"\t"$7}' > $genome.mod.EDTA.TEanno.LTR.pass.list & done
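To see what this filter and reformat step does, here is a sketch on two made-up lines. The column contents are hypothetical (real EDTA BED files are not reproduced here); only the positions matter. The two `grep` calls keep lines that contain both `LTR` and `struc`, and `awk` then builds a `chr:start..end` identifier followed by a tab and column 7.

```shell
toydir=$(mktemp -d)

# Two hypothetical tab-separated lines. Where "LTR" and "struc" sit is an assumption
# made for illustration, not a statement about the real file layout.
printf 'Chr1\t1000\t6000\tLTR_toy\tcol5\tcol6\tcol7\tstruc\n'    >  "$toydir/toy.struc.bed"
printf 'Chr1\t7000\t7500\tDNA_toy\tcol5\tcol6\tcol7\thomology\n' >> "$toydir/toy.struc.bed"

# Same filter and awk program as in step 2.
grep LTR "$toydir/toy.struc.bed" | grep struc \
    | awk '{print $1":"$2".."$3"\t"$7}' > "$toydir/toy.pass.list"

cat "$toydir/toy.pass.list"
# prints one line: Chr1:1000..6000<TAB>col7
```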

3. bed to rmout

for genome in genome1.fasta genome2.fasta; do perl -nle 'my ($chr, $s, $e, $anno, $dir, $supfam)=(split)[0,1,2,3,8,12]; print "10000 0.001 0.001 0.001 $chr $s $e NA $dir $anno $supfam"' $genome.mod.EDTA.TEanno.struc.bed > $genome.out.EDTA.TEanno.out & done
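The Perl one-liner above picks six whitespace-separated fields by zero-based index (0, 1, 2, 3, 8, 12) and prints them in the column order of a RepeatMasker .out record (score, divergence, deletion, insertion, query, begin, end, left, strand, repeat, class/family), filling the columns that the BED file cannot supply with constants. As a cross-check of which columns are used, here is an awk rendering of the same selection. awk counts fields from 1, so the indices become `$1`, `$2`, `$3`, `$4`, `$9`, `$13`. This is a sketch on a made-up 13-field line, not a replacement for the command above.

```shell
rmdir_toy=$(mktemp -d)

# Made-up line with 13 fields named f1..f13 so the selected columns are easy to spot.
echo "f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13" > "$rmdir_toy/toy.bed"

# Perl indices [0,1,2,3,8,12] correspond to awk fields $1,$2,$3,$4,$9,$13.
awk '{print "10000 0.001 0.001 0.001", $1, $2, $3, "NA", $9, $4, $13}' \
    "$rmdir_toy/toy.bed" > "$rmdir_toy/toy.out"

cat "$rmdir_toy/toy.out"
# prints: 10000 0.001 0.001 0.001 f1 f2 f3 NA f9 f4 f13
```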

4. run LAI

for genome in genome1.fasta genome2.fasta; do nohup perl ~/bin/LTR_retriever/LAI -genome $genome -intact $genome.mod.EDTA.TEanno.LTR.pass.list -all $genome.out.EDTA.TEanno.out -q -t 7 & done
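Because the earlier steps run in the background, it can be worth confirming that both input files exist and are non-empty before launching LAI. A small sketch of such a check follows; `check_lai_inputs` is a hypothetical helper, and the file names simply follow the patterns used in the steps above.

```shell
# Hypothetical helper: succeed only if both LAI inputs for a genome are non-empty.
check_lai_inputs() {
    prefix=$1
    for f in "$prefix.mod.EDTA.TEanno.LTR.pass.list" "$prefix.out.EDTA.TEanno.out"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
    return 0
}

# Usage: report which genomes are ready for the LAI step.
for genome in genome1.fasta genome2.fasta; do
    if check_lai_inputs "$genome"; then
        echo "$genome: inputs ready"
    else
        echo "$genome: inputs not ready"
    fi
done
```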

There may be slight differences between LAIs calculated the original way and this alternative way. Differences of less than 1 are negligible.