Calculate LAI from EDTA GFF3 files
If you no longer have the LTR_retriever .pass.list file or the RepeatMasker .out file, you can still calculate the LTR Assembly Index (LAI) from the EDTA .mod.EDTA.TEanno.gff3 file. Here I assume you have multiple genomes, each with its own EDTA GFF3 file.
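Because every step below expects each genome to have both its FASTA and its EDTA GFF3 file, a quick pre-flight check can save a failed run. This is a minimal sketch: `check_edta_inputs` is a hypothetical helper, not part of EDTA or LTR_retriever, and it assumes the `<genome>.mod.EDTA.TEanno.gff3` naming used on this page.

```bash
# Hypothetical helper (not part of EDTA or LTR_retriever): report any genome whose
# FASTA or EDTA GFF3 file is missing or empty, using the file names from this page.
# Returns 0 if everything is present, 1 otherwise.
check_edta_inputs() {
  local genome f status=0
  for genome in "$@"; do
    for f in "$genome" "$genome.mod.EDTA.TEanno.gff3"; do
      if [ ! -s "$f" ]; then
        echo "missing or empty: $f" >&2
        status=1
      fi
    done
  done
  return "$status"
}
```

For example, `check_edta_inputs genome1.fasta genome2.fasta` returns a non-zero status and lists the missing files on stderr if any input is absent.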
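```bash
# Step 1: convert each EDTA GFF3 to BED with EDTA's gff2bed.pl; wait for all jobs before the next step
for genome in genome1.fasta genome2.fasta; do perl ~/bin/EDTA/util/gff2bed.pl $genome.mod.EDTA.TEanno.gff3 structural > $genome.mod.EDTA.TEanno.struc.bed & done; wait
# Step 2: keep records mentioning both "LTR" and "struc" and write them as a pass list (chr:start..end, then column 7)
for genome in genome1.fasta genome2.fasta; do grep LTR $genome.mod.EDTA.TEanno.struc.bed | grep struc | awk '{print $1":"$2".."$3"\t"$7}' > $genome.mod.EDTA.TEanno.LTR.pass.list & done; wait
# Step 3: rewrite each BED record as a RepeatMasker-style .out line with placeholder score/divergence/indel values
for genome in genome1.fasta genome2.fasta; do perl -nle 'my ($chr, $s, $e, $anno, $dir, $supfam)=(split)[0,1,2,3,8,12]; print "10000 0.001 0.001 0.001 $chr $s $e NA $dir $anno $supfam"' $genome.mod.EDTA.TEanno.struc.bed > $genome.out.EDTA.TEanno.out & done; wait
# Step 4: run LAI on each genome in the background (-t 7 sets the number of threads)
for genome in genome1.fasta genome2.fasta; do nohup perl ~/bin/LTR_retriever/LAI -genome $genome -intact $genome.mod.EDTA.TEanno.LTR.pass.list -all $genome.out.EDTA.TEanno.out -q -t 7 & done
```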
There may be slight differences between LAI values calculated the original way and with this alternative approach. Differences of less than 1 are negligible.
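A small sketch for checking whether the LAI from the two approaches differs by less than the tolerance of 1 mentioned above. `lai_close_enough` is a hypothetical helper; the LAI values are passed in by hand, since parsing LAI's output file is not covered here.

```bash
# Hypothetical helper: exit status 0 if |a - b| < 1, i.e. the difference
# between two LAI values is small enough to ignore; 1 otherwise.
lai_close_enough() {
  awk -v a="$1" -v b="$2" 'BEGIN { d = a - b; if (d < 0) d = -d; exit (d < 1 ? 0 : 1) }'
}
```

For example, with made-up values, `lai_close_enough 15.32 14.87 && echo negligible` prints `negligible`, while `lai_close_enough 15.0 16.5` returns a non-zero status.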