Do sequence heads have to match? #3

Open
vkeggers opened this issue Jul 9, 2024 · 3 comments

Comments

@vkeggers
vkeggers commented Jul 9, 2024

I'm comparing two different nematode genomes. Both are a little fragmented (the main 6 chromosomes + a few extra contigs), and each has different sequence names and a different number of contigs. Do the sequence names in all the input files have to match? I'm getting this error:

Traceback (most recent call last):
  File "/home/veggers/.conda/envs/transposition_detector_detect/share/TranspositionEventDetector_deTEct/TranspositionDetector.py", line 81, in <module>
    parseSniffles_SVs(seqHeadFile, svFile, ouFile1)
  File "/home/veggers/.conda/envs/transposition_detector_detect/share/TranspositionEventDetector_deTEct/ParserSniffles.py", line 46, in parseSniffles_SVs
    fW.write(sequenceDictB[chrom]+"\t"+"SVIM"+"\t"+"insertion"+"\t"+start+"\t"+end+"\t"+"."+"\t"+"+"+"\t"+"."+"\t"+info)
             ~~~~~~~~~~~~~^^^^^^^
KeyError: 'CM021144.1_356'

I guess I could just extract the main chromosomes from all the files and standardize the names if this is the case.
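
Before renaming anything, it can help to see which names actually disagree. The sketch below is not part of deTEct; it only assumes that the `KeyError` above comes from a chromosome name in the SV file that has no counterpart among the FASTA headers. The function names (`fasta_names`, `vcf_chroms`, `missing_chroms`) and the toy data are hypothetical.

```python
import io

def fasta_names(handle):
    """Return the first whitespace-delimited token of every FASTA header."""
    names = []
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names

def vcf_chroms(handle):
    """Return the set of CHROM values in a VCF, skipping header and blank lines."""
    chroms = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        chroms.add(line.rstrip("\n").split("\t", 1)[0])
    return chroms

def missing_chroms(fasta_handle, vcf_handle):
    """List VCF chromosome names that do not appear among the FASTA headers."""
    known = set(fasta_names(fasta_handle))
    return sorted(vcf_chroms(vcf_handle) - known)

# Toy data (made up for illustration, not taken from the real assemblies).
demo_fasta = io.StringIO(">CM021144.1 chromosome I\nACGT\n>contig_7\nGGCC\n")
demo_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\n"
    "CM021144.1\t100\tsv1\n"
    "CM021150.1\t5\tsv2\n"
)
print(missing_chroms(demo_fasta, demo_vcf))  # ['CM021150.1']
```

An empty list means every name in the SV file has a matching FASTA header; anything listed is a candidate for the kind of lookup failure shown in the traceback.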

@DerKevinRiehl
Owner

Dear Viktoria,
thanks for your interest in our work.

Yes, you are right: the sequence names should be standardized so that they match across all input files.
The best would be something like "Sequence_1". Other software, such as reasonaTE, does the same when processing files: it standardizes and renames the sequences of FASTA files.

Please let me know if this worked for you,
Best, Kevin
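
A minimal sketch of the renaming suggested above, assuming plain-text FASTA and VCF input. It is not the renaming code used by deTEct or reasonaTE; the `Sequence_` prefix follows the example in the reply, and `standardize_fasta` and `rename_vcf_chroms` are hypothetical helper names. The returned mapping is what keeps the FASTA and the SV file consistent with each other.

```python
import io

def standardize_fasta(src, dst, prefix="Sequence_"):
    """Copy FASTA records from src to dst, renaming headers to prefix + counter.

    Returns a dict mapping each original name (first header token) to its new name.
    """
    mapping = {}
    count = 0
    for line in src:
        if line.startswith(">"):
            fields = line[1:].split()
            old = fields[0] if fields else ""
            if old in mapping:
                raise ValueError(f"duplicate sequence name: {old!r}")
            count += 1
            mapping[old] = f"{prefix}{count}"
            dst.write(f">{mapping[old]}\n")
        else:
            dst.write(line)
    return mapping

def rename_vcf_chroms(src, dst, mapping):
    """Rewrite the CHROM column of a VCF using the mapping from standardize_fasta."""
    for line in src:
        if line.startswith("#") or not line.strip():
            dst.write(line)  # header lines (including ##contig) are copied unchanged
            continue
        chrom, sep, rest = line.partition("\t")
        # Raises KeyError for a name with no FASTA counterpart, which is intentional.
        dst.write(mapping[chrom] + sep + rest)

# Toy data (made up for illustration).
fasta_out = io.StringIO()
mapping = standardize_fasta(
    io.StringIO(">CM021144.1 chromosome I\nACGT\n>contig_7\nGGCC\n"), fasta_out
)
vcf_out = io.StringIO()
rename_vcf_chroms(io.StringIO("#CHROM\tPOS\ncontig_7\t12\n"), vcf_out, mapping)
print(mapping)             # {'CM021144.1': 'Sequence_1', 'contig_7': 'Sequence_2'}
print(vcf_out.getvalue())  # the data line now starts with Sequence_2
```

Note that `##contig` header lines in the VCF are left untouched here; whether they also need updating depends on the downstream tool.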

@vkeggers
Author

Right, my problem is just that one of the assemblies isn't chromosome scale. I know in my first post it was just 2 genomes, but that was to keep the question simple. I actually have like 10ish remanei/latens/briggsae species. Most of these are chr scale but one is pretty fragmented.

I turned it around: I annotated TEs in the reference and got a vcf from the reference and query alignment, so all the chr names match automatically bc the reference was used for both. Previously I was annotating TEs in the query, and if the query isn't chr scale, its sequence names won't line up with the vcf.

Anyways, I'm not sure if one way is particularly right, but I got a similar pattern using either method. The only difference was that more events were found when using TEs annotated from the reference. But like I said, the pattern is the same and similar to the one in your paper.

Thanks Kevin

@DerKevinRiehl
Owner

Great to hear :-)
If the problem is solved, can we close this issue?
