MEI Cluster #8

Open
johnemajor opened this issue Dec 4, 2019 · 3 comments

@johnemajor

johnemajor commented Dec 4, 2019

Hello! First off, thank you for building this tool. It looks extremely promising, and I am including it in a benchmark study aimed at defining the characteristics of what a clinical whole-genome assay might be. I am running the latest Docker image, and I've broken my processing up by chromosome. Chromosomes 3-Y all process in ~1 hr, but chrm1 and chrm2 have been spinning for days, appearing to be stuck on the "Preparing MEI clusters.." step.

I have a few questions:
(a) Can the MEI work be skipped somehow? I am mostly interested in classic CNVs right now (ins, del, dup)... but will eventually be back for these other SV types.
(b) Is there any way to give TARDIS more resources? I am running on a very large box and it appears to be using only one core per job.

Oh, and lastly: I was excited to see that you used the CHM1 and CHM13 datasets in validating your approach. This was a major factor in my inclusion of the tool (we sequenced CHM1, CHM13, and both combined as syndip for our validation work).

Best-
John Major

@calkan
Member

calkan commented Dec 4, 2019

Hi John

Thanks for your comments & interest in TARDIS. Is there any way we could look into chrm1 of your BAM file to check whether there is a bug? We sometimes have problems in chrm16 because of its segmental duplication content.

But here are some more practical answers:

(a) There is no --skip-mei parameter (@asylvz: please add it to your todo list, not a bad idea), but you can do a quick workaround without changing the code. By default TARDIS looks at the "Alu:L1:SVA" names for MEIs, which you can change with the --mei parameter. If you replace it with something like
--mei "tardisisawesome" then it will have 0-size clusters for MEI, since there is no ME named tardisisawesome as far as I know :-)

Additionally, you can speed it up by decreasing the --read-cluster value (default 20), or by skipping split read mapping (--no-soft-clip) at the cost of reduced sensitivity.
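To make the workaround in (a) concrete, here is a minimal sketch that only prints a TARDIS command line instead of running it. The function name `tardis_cmd_without_mei` and the placeholder `no_such_mei` are made up for illustration; the only flags used are ones mentioned in this thread (-i, -o, --mei, --no-soft-clip). Any other parameters your run needs are passed through unchanged.

```shell
# Print (do not run) a TARDIS command that uses a placeholder MEI name.
# Usage: tardis_cmd_without_mei BAM OUT_PREFIX [other TARDIS parameters...]
tardis_cmd_without_mei() {
  bam=$1
  out=$2
  shift 2
  # "no_such_mei" is an arbitrary placeholder (an assumption for this sketch);
  # per the comment above, a name that matches no mobile element should
  # leave the MEI clusters empty.
  printf 'tardis -i %s -o %s --mei "%s"' "$bam" "$out" "no_such_mei"
  # Append any remaining arguments verbatim.
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}

# Example: also skip split read mapping, at the cost of reduced sensitivity.
tardis_cmd_without_mei my.bam out-nomei --no-soft-clip
```

Printing the command first makes it easy to inspect before piping it to a shell or a job scheduler.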

(b) I would recommend splitting up the chromosomes as you did (btw, no need to split up the BAMs, in case you did that) and running the jobs through GNU parallel. Something like:

for i in $(seq 0 23);
do
echo tardis -i my.bam [other parameters] --first-chr $i --last-chr $i -o out-$i
done | parallel -j 16

should do the job and run up to 16 jobs at once. The chromosome index starts at 0, in the same order as the BAM file header.
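Since --first-chr and --last-chr take 0-based indices in BAM header order, it can help to list which index corresponds to which chromosome before launching jobs. The sketch below assumes the header is supplied as SAM text (for example from `samtools view -H`, assuming samtools is installed); the helper names `list_chr_indices` and `demo_header` are hypothetical, and the inline header is only there so the example runs without a BAM file.

```shell
# Print "index<TAB>name" for every @SQ line of a SAM header read from stdin.
# Indices are 0-based and follow header order.
list_chr_indices() {
  awk -F'\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SN:/) printf "%d\t%s\n", n++, substr($i, 4)
  }'
}

# With a real file (assumes samtools is available):
#   samtools view -H my.bam | list_chr_indices

# Self-contained demo using a small made-up header:
demo_header() {
  printf '@HD\tVN:1.6\tSO:coordinate\n'
  printf '@SQ\tSN:chr1\tLN:248956422\n'
  printf '@SQ\tSN:chr2\tLN:242193529\n'
  printf '@SQ\tSN:chr3\tLN:198295559\n'
}
demo_header | list_chr_indices
```

In this demo chr2 is listed with index 1, so a chr2-only run would use --first-chr 1 --last-chr 1, provided the real header has the same order.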

@johnemajor
Author

Chrm1 actually completes in a few hours as well. It's just CHRM2 which gets stuck (after several attempts).

I'm not sure what I'd be doing or looking for to investigate a possible bug in my BAMs. These BAMs were used by all of the other TARDIS processes with no problem (and variant callers have run from them), so I don't expect an issue, but I am happy to take a peek if you show me how.

In the meantime, I'll use your MEI hack.

@johnemajor
Author

If this solves my problem, I'll next be turning my attention to our CHM1 and CHM13 data (I'd be happy to chat/share data if you'd like to take a peek at some of my results). But first, another question: where can I find the canonical SV calls for each cell line independently?
