Germline job never finish #47
Comments
Could you list the files in the generated working folder "germline"? Were any *.phased.vcf.gz files generated? Also, are you working on single-cell data? The Beagle step should not take this long, since the data is quite sparse.
Hi, these are the files I have in the germline dir:
I am working on cell-free RNA-seq, so no, not single-cell RNA-seq data. But I am using this tool because I expect the quality and sparsity of the data to be similar to scRNA.
That explains the long running time of Monopogen, since cell-free RNA-seq may cover more genome regions than single-cell data. Could you check whether all regions are finished? For example
cat *phased.log | grep finished | wc -l
Same number of regions. So I don't think it is because it is still running.
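A count of "finished" lines only gives a total; listing which regions lack a log or phased VCF shows exactly which job failed. The following is a minimal sketch, not taken from Monopogen's code. It assumes each region in the region list yields `<label>.phased.log` and `<label>.phased.vcf.gz` in the germline folder, and that a region line `chr1,1,5000` maps to the label `chr1:1-5000`. The function names and this naming scheme are hypothetical, so adjust them to the file names you actually see on disk.

```python
from pathlib import Path


def region_label(region):
    # Assumption: "chr1,1,5000" -> "chr1:1-5000"; a bare "chr1" stays "chr1".
    parts = region.strip().split(",")
    if len(parts) == 3:
        return f"{parts[0]}:{parts[1]}-{parts[2]}"
    return parts[0]


def missing_outputs(germline_dir, regions,
                    suffixes=(".phased.log", ".phased.vcf.gz")):
    """Map each region label to the expected output files that are absent."""
    germline_dir = Path(germline_dir)
    missing = {}
    for region in regions:
        if not region.strip():
            continue  # skip blank lines in the region list
        label = region_label(region)
        absent = [s for s in suffixes
                  if not (germline_dir / f"{label}{s}").exists()]
        if absent:
            missing[label] = absent
    return missing
```

Calling `missing_outputs("germline", open("GRCh38.region.lst"))` would then return an empty dict when every region has both files, or name the regions that need a closer look.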
And I was running it on 11 samples at the same time, and not one finished. With bulk variant callers I expect around 10k variants to be found in the BAM files, so I doubt it is a problem with data size.
Thanks for checking. Do you mean you have only 10K variants across all 22 chromosomes, or in each region? I may debug this a bit.
I ran DeepVariant on the data, and after filtering (DP<10, might be too stringent...) I had 11k variants.
Sure. I will get back to you after I examine the job collection procedure. Given that you do not have too many markers, it is better to run imputation on one whole chromosome. You can achieve this through the input region list, which will reduce the job collection complexity. Alternatively, you can remove the job collection module and merge the phased VCF files outside of Monopogen: just run all sub jobs listed in joblst (Line 123-127) https://github.com/KChen-lab/Monopogen/blob/main/src/Monopogen.py
My region list is the one from your resources:
Do you suggest I just run it as chr1 chr2 chr3, without the subdivision into chrlocation? Thank you for looking into it!
Yes, you can run imputation/phasing on one whole chromosome, and it will further increase genotyping accuracy since the marker panel is very sparse.
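To turn a subdivided region list into one entry per chromosome, something like the sketch below could be used. It assumes the region list has comma-separated lines such as `chr1,1,5000` and that a bare chromosome name on its own line is read as the whole chromosome. Both the file format and the function names are assumptions for illustration, so check them against the actual `GRCh38.region.lst`.

```python
def whole_chromosome_regions(region_lines):
    """Collapse 'chrom,start,end' lines to unique chromosome names, in order."""
    seen = set()
    chroms = []
    for line in region_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        chrom = line.split(",")[0]
        if chrom not in seen:
            seen.add(chrom)
            chroms.append(chrom)
    return chroms


def write_region_list(chroms, out_path):
    """Write one chromosome name per line."""
    with open(out_path, "w") as handle:
        handle.write("\n".join(chroms) + "\n")
```

For example, `write_region_list(whole_chromosome_regions(open("GRCh38.region.lst")), "whole_chr.region.lst")` would produce a list to pass as the region file (the output file name here is hypothetical).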
Thanks, I will try that.
@jinzhuangdou Hi again, I tried to run it again on different samples. I found out that I am in fact missing a .phased.log file (and phased.vcf file), and in the couple of samples I checked it is always the one for chr16. Could it be due to a flaw in the imputation panel file? From a quick look, it has about the same size as the imputation panel file for chr17.
Hi, I am running PreProcess and then Germline in a process in Nextflow. But for some reason, the job never finishes, even though all the output files are written to the output directory. I get "beagle finished" but no "Monopogen.py Success! See instructions above." printed for germline (only for preProcess).
The process:
The input files:
GRCh38.primary_assembly.genome.fa (indexed in same directory)
Directory with imputation panel files from ftp link you have provided.
Region list Monopogen/resource/GRCh38.region.lst
Htop indicates that beagle is still running (more than 12h after the last 'beagle finished' in the log).
Can you maybe help me figure out why the job won't finish?
Best regards,
Mette