MEI Cluster #8

johnemajor · 2019-12-04T00:06:59Z

Hello! First off, thank you for building this tool. It looks extremely promising, and I am including it in a benchmark study aimed at defining the characteristics of what a clinical whole-genome assay might be. I am running the latest Docker image, and I've broken my processing up by chromosome. Chromosomes 3-Y all process in ~1 hr, but chr1 and chr2 have been spinning for days, appearing to be stuck on the "Preparing MEI clusters.." step.

I have a few questions:
(a) Can the MEI work be skipped somehow? I am mostly interested in classic CNVs right now (ins, del, dup)... but will eventually be back for these other SV types.
(b) Is there any way to give TARDIS more resources? I am running on a very large box and it appears to be using only one core per job.

Oh, and lastly: I was excited to see you used the CHM1 and CHM13 datasets in validating your approach. This was a major factor in my inclusion of the tool (we sequenced CHM1, CHM13, and both combined as syndip for our validation work).

Best-
John Major

calkan · 2019-12-04T09:11:45Z

Hi John

Thanks for your comments & interest in TARDIS. Is there any way to look into chr1 of your BAM file to see whether there is a bug? We sometimes have problems in chr16 because of its segmental duplication content.

But here are some more practical answers:

(a) There is no --skip-mei parameter (@asylvz: please add it to your todo list, not a bad idea), but you can do a quick workaround without changing the code. By default TARDIS looks for the "Alu:L1:SVA" names for MEIs, which you can change with the --mei parameter. If you replace it with something like
--mei "tardisisawesome" then it will have 0-size clusters for MEI, since there is no ME named tardisisawesome as far as I know :-)

Additionally, you can speed it up by decreasing the --read-cluster value (default 20), or by skipping split-read mapping (--no-soft-clip) at the cost of reduced sensitivity.

(b) I would recommend splitting the work up by chromosome as you did (btw, no need to split up the BAMs, in case you did that), and running the jobs through GNU parallel. Something like:

for i in $(seq 0 23);
do
echo tardis -i my.bam [other parameters] --first-chr $i --last-chr $i -o out-$i
done | parallel -j 16

should do the job and keep 16 cores busy (one per job). The chromosome index starts at 0, in the same order as the BAM file header.
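As a rough sketch of the per-chromosome approach above, the two helpers below print the index-to-name mapping from a BAM header and generate one tardis command per chromosome index. The function names `list_chr_indices` and `gen_tardis_cmds` are hypothetical, the only tardis options used are the ones mentioned in this thread, and whatever other parameters your run needs still have to be passed in through the last argument.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read a SAM/BAM header on stdin and print
# "index<TAB>name" for every @SQ line. Indices start at 0 and follow
# header order, which is how --first-chr/--last-chr are described above.
list_chr_indices() {
    awk -F'\t' '$1 == "@SQ" {
        for (f = 2; f <= NF; f++)
            if ($f ~ /^SN:/) printf "%d\t%s\n", n++, substr($f, 4)
    }'
}

# Hypothetical helper: print (not run) one tardis command per index.
#   $1 = BAM file, $2 = first index, $3 = last index,
#   $4 = any other parameters, passed through verbatim (optional).
gen_tardis_cmds() {
    local bam="$1" first="$2" last="$3" extra="${4:-}"
    local i
    for i in $(seq "$first" "$last"); do
        printf 'tardis -i %s %s--first-chr %d --last-chr %d -o out-%d\n' \
            "$bam" "${extra:+$extra }" "$i" "$i" "$i"
    done
}

# Usage (commented out so that sourcing this file has no side effects):
#   samtools view -H my.bam | list_chr_indices
#   gen_tardis_cmds my.bam 0 23 '--mei tardisisawesome' | parallel -j 16
```

Checking the index list first helps confirm which index corresponds to the chromosome that gets stuck, since the numbering depends on the header order of your particular BAM. Printing the commands before piping them to parallel also makes it easy to inspect them as a dry run.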

johnemajor · 2019-12-04T18:34:24Z

Chr1 actually completes in a few hours as well. It's just chr2 that gets stuck (after several attempts).

I'm not sure what I'd be doing or looking for to investigate a possible bug in my BAMs. These BAMs were used by all of the other TARDIS processes with no problem (and variant callers have run on them), so I don't expect an issue, but I am happy to take a peek if you show me how.

In the meantime, I'll use your MEI hack.

johnemajor · 2019-12-04T18:35:55Z

If this solves my problem, I'll next be turning my attention to our CHM1 and CHM13 data (I'd be happy to chat/share data if you'd like to take a peek at some of my results). But first, another question: where can I find the canonical SV calls for each cell line independently?
