Do sequence heads have to match? #3

vkeggers · 2024-07-09T22:43:58Z

I'm comparing two different nematode genomes. Both are a little fragmented (the main 6 chromosomes plus a few extra contigs), and each has different sequence names and a different number of contigs. Do the sequence names have to match across all the files?

Traceback (most recent call last):
  File "/home/veggers/.conda/envs/transposition_detector_detect/share/TranspositionEventDetector_deTEct/TranspositionDetector.py", line 81, in <module>
    parseSniffles_SVs(seqHeadFile, svFile, ouFile1)
  File "/home/veggers/.conda/envs/transposition_detector_detect/share/TranspositionEventDetector_deTEct/ParserSniffles.py", line 46, in parseSniffles_SVs
    fW.write(sequenceDictB[chrom]+"\t"+"SVIM"+"\t"+"insertion"+"\t"+start+"\t"+end+"\t"+"."+"\t"+"+"+"\t"+"."+"\t"+info)
             ~~~~~~~~~~~~~^^^^^^^
KeyError: 'CM021144.1_356'

I guess I could just extract the main chromosomes from all the files and standardize the names if this is the case.
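Before renaming anything, it may help to see which names are actually out of sync. The traceback shows that the name 'CM021144.1_356' from the SV file is not a key in `sequenceDictB`. A rough sketch of a check, assuming the SV file is a standard VCF (contig name in the first tab-separated column) and that the lookup is ultimately built from the FASTA headers; the function names and toy data below are hypothetical and not part of deTEct:

```python
import io


def fasta_names(fasta_lines):
    """Collect the first whitespace-delimited token of every FASTA header."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    }


def vcf_chroms(vcf_lines):
    """Collect the CHROM column (first tab-separated field) of every VCF record."""
    chroms = set()
    for line in vcf_lines:
        # Skip meta lines ("##..."), the column header ("#CHROM...") and blanks.
        if line.startswith("#") or not line.strip():
            continue
        chroms.add(line.split("\t", 1)[0])
    return chroms


def missing_names(fasta_lines, vcf_lines):
    """Names that occur in the VCF but not among the FASTA headers."""
    return sorted(vcf_chroms(vcf_lines) - fasta_names(fasta_lines))


# Toy data for illustration only (not taken from the real assemblies).
toy_fasta = io.StringIO(">Sequence_1\nACGT\n>Sequence_2\nGGCC\n")
toy_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "Sequence_1\t100\t.\tN\t<INS>\t.\tPASS\tSVTYPE=INS\n"
    "contig_9\t5\t.\tN\t<INS>\t.\tPASS\tSVTYPE=INS\n"
)
missing = missing_names(toy_fasta, toy_vcf)
print(missing)
```

Any name this reports would be expected to trigger the same kind of KeyError, so an empty list is what you want before running the detector. With real files, pass open file handles instead of the `io.StringIO` objects.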

DerKevinRiehl · 2024-07-10T07:59:12Z

Dear Viktoria,
thanks for your interest in our work.

Yes, you are right, the names should be standardized.
The best would be something like "Sequence_1". Other software such as reasonaTE does the same when processing FASTA files: it standardizes and renames the sequences.
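As a minimal sketch of such a renaming step (a hypothetical helper, not code from deTEct or reasonaTE), the following rewrites every FASTA header to "Sequence_1", "Sequence_2", ... in file order and returns the old-to-new mapping. The prefix and the toy input are assumptions chosen for illustration:

```python
import io


def standardize_fasta_headers(fasta_in, fasta_out, prefix="Sequence_"):
    """Rename FASTA headers to prefix + running index; return {old_name: new_name}."""
    mapping = {}
    count = 0
    for line in fasta_in:
        if line.startswith(">"):
            count += 1
            # The original name is the first whitespace-delimited token of the header.
            fields = line[1:].split()
            old_name = fields[0] if fields else ""
            new_name = f"{prefix}{count}"
            mapping[old_name] = new_name
            fasta_out.write(f">{new_name}\n")
        else:
            # Sequence lines are copied through unchanged.
            fasta_out.write(line)
    return mapping


# Toy input for illustration only (not taken from the real assemblies).
demo_in = io.StringIO(">chrI_example some description\nACGT\n>contig_7\nGGCC\n")
demo_out = io.StringIO()
name_map = standardize_fasta_headers(demo_in, demo_out)
print(name_map)
print(demo_out.getvalue(), end="")
```

The returned mapping matters as much as the renamed FASTA: the same old-to-new names would have to be applied to the contig column of the SV file and of any annotation file, otherwise the lookup fails again.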

Please let me know if this worked for you,
Best, Kevin

vkeggers · 2024-07-25T02:41:31Z

Right, my problem is just that one of the assemblies isn't chromosome scale. I know in my first post it was just 2 genomes, but that was to keep the question simple. I actually have like 10ish remanei/latens/briggsae species. Most of these are chr scale but one is pretty fragmented.

I turned it around: I annotated TEs in the reference and got a VCF from the reference and query alignment, so all the chr names match automatically because the reference was used for both. Previously I was annotating TEs in the query, whose sequence names won't line up with the VCF if the query isn't chr scale.

Anyways, I'm not sure if one way is particularly right, but I got a similar pattern using either method. The only difference was that more events were found when using TEs annotated from the reference. But like I said, the pattern is the same and similar to the one in your paper.

Thanks Kevin

DerKevinRiehl · 2024-07-28T10:07:08Z

Great to hear :-)
If the problem is solved, can we close this issue?
