diff --git a/Makefile.am b/Makefile.am
index 7386da5d..f675becd 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -87,7 +87,7 @@ clean-local: clean-local-check
 clean-local-check:
 	-rm -rf src/*.gc* tests/unittest/*.gc* src/*.gcov
 
-utilities: utilities/dataExplore.r utilities/interpretDEploid.r
+utilities_SOURCES = utilities/dataExplore.r utilities/interpretDEploid.r
+utilities:
 	sed -i'.bak' -e '/#!\/usr\/bin\/env Rscript/d' -e '/rm(list=ls())/d' utilities/dataExplore.r ; echo "#!/usr/bin/env Rscript" > tmpTxt; echo "rm(list=ls()); dEploidRootDir=\"$(PWD)\"" >> tmpTxt ; cat utilities/dataExplore.r >> tmpTxt ; mv tmpTxt utilities/dataExplore.r; chmod a+x utilities/dataExplore.r;
 	sed -i'.bak' -e '/#!\/usr\/bin\/env Rscript/d' -e '/rm(list=ls())/d' utilities/interpretDEploid.r; echo "#!/usr/bin/env Rscript" > tmpTxt; echo "rm(list=ls()); dEploidRootDir=\"$(PWD)\"" >> tmpTxt ; cat utilities/interpretDEploid.r >> tmpTxt ; mv tmpTxt utilities/interpretDEploid.r; chmod a+x utilities/interpretDEploid.r
-
diff --git a/configure.ac b/configure.ac
index 43ad45b8..d87fdcd8 100644
--- a/configure.ac
+++ b/configure.ac
@@ -16,6 +16,7 @@ AM_SILENT_RULES([yes])
 AC_PROG_INSTALL
 AC_PREREQ
 AC_CANONICAL_HOST
+AC_PROG_F77
 
 # Checks for programs.
 AC_PROG_RANLIB
diff --git a/docs/Output.md b/docs/Output.md
index 7e68a4cf..ebcaa15f 100644
--- a/docs/Output.md
+++ b/docs/Output.md
@@ -31,8 +31,17 @@ When flag ``-vcfOut`` is turned on, haplotypes are saved at the final iteration
 
 When flag ``-exportPostProb`` is turned on, posterior probabilities of the final iteration of strain [i].
 
+### DEploid-IBD
 
-Example of output interpretion
+When flag ``-ibd`` is used, ``dEploid`` first learns the number of strains and their proportions with an identity-by-descent model ('DEploid-IBD'). It then fixes the number of strains and their proportions, and trains the haplotypes using the original DEploid algorithm ('DEploid-classic'). The outputs of the two stages are labelled ".ibd" and ".classic" respectively; these labels are appended to the prefix.
+
+
+### DEploid-BEST
+
+When flag ``-best`` is used, 'DEploid-BEST' runs the deconvolution algorithms in an optimised sequence to give the best estimates of the number of strains, their proportions and the haplotypes. First, the program ('DEploid-Lasso') learns the number of strains with an optimised reference panel; ".chooseK" is appended to the prefix for these outputs (NOTE: the likelihood is not tracked at this stage). Next, the program ('DEploid-IBD') fixes the number of strains and tunes the strain proportions with an identity-by-descent model; ".ibd" is appended to the prefix for these outputs. Finally, the program ('DEploid-Lasso') fixes the number of strains and their proportions, and uses the optimised reference panel again to train and report the haplotypes; ".final" is appended to the prefix for these outputs. When ``-vcfOut`` or ``-exportPostProb`` is used, those outputs refer only to the final haplotypes.
+
+
+Example of output interpretation
 ------------------------------
 
 ### Example 1. Standard deconvolution output
diff --git a/docs/_build/man/dEploid.1 b/docs/_build/man/dEploid.1
index 44d5aa7a..de0f017d 100644
--- a/docs/_build/man/dEploid.1
+++ b/docs/_build/man/dEploid.1
@@ -1,6 +1,6 @@
 .\" Man page generated from reStructuredText.
 .
-.TH "DEPLOID" "1" "Nov 03, 2019" "v0.6-beta" "DEploid"
+.TH "DEPLOID" "1" "Nov 16, 2019" "v0.6-beta" "DEploid"
 .SH NAME
 dEploid \-
 .
@@ -559,48 +559,62 @@ Figure on the right show allele frequency within sample, compare against the pop
 .SS Output files
 .sp
 \fBdEploid\fP outputs text files with user\-specified prefix with flag \fB\-o\fP\&.
-.INDENT 0.0
-.TP
-.B \fIprefix\fP\&.log
+.sp
+\fB\fIprefix\fP\fP\fB\&.log\fP
+.sp
 Log file records \fBdEploid\fP version, input file paths, parameter used and proportion estimates at the final iteration.
-.TP
-.B \fIprefix\fP\&.llk
+.sp
+\fB\fIprefix\fP\fP\fB\&.llk\fP
+.sp
 Log likelihood of the MCMC chain.
-.TP
-.B \fIprefix\fP\&.prop
+.sp
+\fB\fIprefix\fP\fP\fB\&.prop\fP
+.sp
 MCMC updates of the proportion estimates.
-.TP
-.B \fIprefix\fP\&.hap
+.sp
+\fB\fIprefix\fP\fP\fB\&.hap\fP
+.sp
 Haplotypes at the final iteration in plain text file.
-.TP
-.B \fIprefix\fP\&.vcf
+.sp
+\fB\fIprefix\fP\fP\fB\&.vcf\fP
+.sp
 When flag \fB\-vcfOut\fP is turned on, haplotypes are saved at the final iteration in VCF format.
-.TP
-.B \fIprefix\fP\&.single[i]
-When flag \fB\-exportPostProb\fP is turned on, posterior probabilities of the final iteration of strain [i].
-.UNINDENT
-.SS Example of output interpretion
+.sp
+\fB\fIprefix\fP\fP\fB\&.single[i]\fP
+.sp
+When flag \fB\-exportPostProb\fP is turned on, posterior probabilities of the final iteration of strain [i]\&.
+.SS DEploid\-IBD
+.sp
+When flag \fB\-ibd\fP is used, \fBdEploid\fP first learns the number of strains and their proportions with an identity\-by\-descent model (\(aqDEploid\-IBD\(aq). It then fixes the number of strains and their proportions, and trains the haplotypes using the original DEploid algorithm (\(aqDEploid\-classic\(aq). The outputs of the two stages are labelled ".ibd" and ".classic" respectively; these labels are appended to the prefix.
+.SS DEploid\-BEST
+.sp
+When flag \fB\-best\fP is used, \(aqDEploid\-BEST\(aq runs the deconvolution algorithms in an optimised sequence to give the best estimates of the number of strains, their proportions and the haplotypes. First, the program (\(aqDEploid\-Lasso\(aq) learns the number of strains with an optimised reference panel; ".chooseK" is appended to the prefix for these outputs (NOTE: the likelihood is not tracked at this stage). Next, the program (\(aqDEploid\-IBD\(aq) fixes the number of strains and tunes the strain proportions with an identity\-by\-descent model; ".ibd" is appended to the prefix for these outputs. Finally, the program (\(aqDEploid\-Lasso\(aq) fixes the number of strains and their proportions, and uses the optimised reference panel again to train and report the haplotypes; ".final" is appended to the prefix for these outputs. When \fB\-vcfOut\fP or \fB\-exportPostProb\fP is used, those outputs refer only to the final haplotypes.
+.SS Example of output interpretation
 .SS Example 1. Standard deconvolution output
 .INDENT 0.0
 .INDENT 3.5
 .sp
 .nf
 .ft C
-$ ./dEploid \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
-\-plaf data/exampleData/labStrains.eg.PLAF.txt \e
-\-noPanel \-o PG0390\-CNopanel \-seed 1
-$ utilities/interpretDEploid.r \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
-\-plaf data/exampleData/labStrains.eg.PLAF.txt \e
-\-dEprefix PG0390\-CNopanel \e
-\-o PG0390\-CNopanel \-ring
+    $ ./dEploid \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
+    \-plaf data/exampleData/labStrains.eg.PLAF.txt \e
+    \-noPanel \-o PG0390\-CNopanel \-seed 1
+    $ utilities/interpretDEploid.r \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
+    \-plaf data/exampleData/labStrains.eg.PLAF.txt \e
+    \-dEprefix PG0390\-CNopanel \e
+    \-o PG0390\-CNopanel \-ring
+
+
 .ft P
 .fi
 .UNINDENT
 .UNINDENT
+.sp
 [image: interpretDEploidFigure.1]
 [image]
+
 .sp
-The top three figures are the same as figures show in data example, with a small addition of inferred WSAF marked in blue, in the top right figure.
+The top three figures are the same as the figures shown in the data example, with the inferred WSAF additionally marked in blue in the top right figure.
 .INDENT 0.0
 .IP \(bu 2
 The bottom left figure show the relative proportion change history of the MCMC chain.
@@ -609,8 +623,10 @@ The middle figure show the correlation between the expected and observed allele
 .IP \(bu 2
 The right figure shows changes in MCMC likelihood .
 .UNINDENT
+.sp
 [image: interpretDEploidFigure.2]
 [image]
+
 .sp
 This panel figure shows all allele frequencies within sample across all 14 chromosomes. Expected and observed WSAF are marked in blue and red respectively.
 .SS Example 2. Haplotype painting from a given panel
@@ -621,31 +637,37 @@ This panel figure shows all allele frequencies within sample across all 14 chrom
 .sp
 .nf
 .ft C
-$ ./dEploid \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
-\-plaf data/exampleData/labStrains.eg.PLAF.txt \e
-\-panel data/exampleData/labStrains.eg.panel.txt \e
-\-o PG0390\-CPanel \-seed 1 \-k 3
-$ ./dEploid \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
-\-plaf data/exampleData/labStrains.eg.PLAF.txt \e
-\-panel data/exampleData/labStrains.eg.panel.txt \e
-\-o PG0390\-CPanel \e
-\-painting PG0390\-CPanel.hap \e
-\-initialP 0.8 0 0.2 \-k 3
-$ utilities/interpretDEploid.r \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
-\-plaf data/exampleData/labStrains.eg.PLAF.txt \e
-\-dEprefix PG0390\-CPanel \e
-\-o PG0390\-CPanel \-ring
+    $ ./dEploid \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
+    \-plaf data/exampleData/labStrains.eg.PLAF.txt \e
+    \-panel data/exampleData/labStrains.eg.panel.txt \e
+    \-o PG0390\-CPanel \-seed 1 \-k 3
+    $ ./dEploid \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
+    \-plaf data/exampleData/labStrains.eg.PLAF.txt \e
+    \-panel data/exampleData/labStrains.eg.panel.txt \e
+    \-o PG0390\-CPanel \e
+    \-painting PG0390\-CPanel.hap \e
+    \-initialP 0.8 0 0.2 \-k 3
+    $ utilities/interpretDEploid.r \-vcf data/exampleData/PG0390\-C.eg.vcf.gz \e
+    \-plaf data/exampleData/labStrains.eg.PLAF.txt \e
+    \-dEprefix PG0390\-CPanel \e
+    \-o PG0390\-CPanel \-ring
+
+
 .ft P
 .fi
 .UNINDENT
 .UNINDENT
+.sp
 [image: PG0390fwdBwdRing]
 [image]
+
 .SS Example 3. Deconvolution followed by IBD painting
 .sp
 In addition to lab mixed samples, here we show example of \fBdEploid\fP deconvolute field sample PD0577\-C.
+.sp
 [image: PD0577inbreeding]
 [image]
+
 .SH PF3K WORKFLOW
 .sp
 Our main work flow consist with three steps:
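The `utilities` recipe in the Makefile.am hunk rewrites each R script in place. First, `sed` deletes every line containing `#!/usr/bin/env Rscript` or `rm(list=ls())`. Then the two `echo` calls prepend a shebang and an `rm(list=ls()); dEploidRootDir="<build dir>"` line, taking the directory from Make's `$(PWD)`. The prepended line itself contains `rm(list=ls())`, so a second run should delete the old header before writing a new one rather than stacking headers.

The Python sketch below models only that text transformation. It does no file I/O, `.bak` backup or `chmod`. The name `stamp_r_script` and the example paths are hypothetical. Plain substring tests stand in for the two `sed` addresses, which match literally here because they contain no active regex metacharacters.

```python
SHEBANG = "#!/usr/bin/env Rscript"
CLEAR_ENV = "rm(list=ls())"


def stamp_r_script(text: str, root_dir: str) -> str:
    """Hypothetical model of the Makefile.am `utilities` recipe for one script.

    Mirrors the two `sed -e '/.../d'` expressions (drop any line containing
    either marker), then the two `echo` calls that write a fresh header.
    """
    # sed step: keep only lines that contain neither marker.
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if SHEBANG not in line and CLEAR_ENV not in line
    ]
    # echo step: shebang, then the environment reset plus the root directory.
    header = f'{SHEBANG}\n{CLEAR_ENV}; dEploidRootDir="{root_dir}"\n'
    # cat step: the stripped script follows the new header.
    return header + "".join(kept)


if __name__ == "__main__":
    script = "#!/usr/bin/env Rscript\nrm(list=ls())\nlibrary(stats)\n"
    once = stamp_r_script(script, "/opt/DEploid")  # hypothetical build directory
    print(once, end="")
    # Stamping an already-stamped script leaves it unchanged.
    assert stamp_r_script(once, "/opt/DEploid") == once
```

Running the function twice on the same input returns the same text. This suggests why rerunning the target on already-processed scripts should be harmless.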