Merge pull request #370 from DEploid-dev/issue349_update_utilities
Issue349 update utilities
shajoezhu authored Dec 15, 2024
2 parents 165be91 + a89687f commit c0cf33e
Showing 3 changed files with 48 additions and 0 deletions.
44 changes: 44 additions & 0 deletions docs/FAQ.md
@@ -1,6 +1,48 @@
Frequently asked questions
==========================

Where do I get the PLAF? or What PLAF should I use?
---------------------------------------------------

This is probably one of the most frequent questions I am asked. To answer it, let me first explain the role of a PLAF file.

The PLAF provides a loose genetic structure at the population level.
In the following example, the first two plots show the reference strain allele count on the x-axis for Africa and Asia respectively, and the Africa PLAF and Asia PLAF on the y-axis. You can see the points form clouds along the diagonal. However, because the parasite strains from Africa and Asia are vastly different, the third plot shows that PLAF pairs (at the same genome position) are scattered everywhere.

![asia_africa_plaf](_static/asia_africa_plaf_example.png "Asia Africa PLAF")

This is an extreme case that demonstrates how much PLAFs can vary between geographical regions. The PLAF provides prior information for our model to learn the exact structure of the mixed genome, so it is important to provide an appropriate one.

#### How to manually compute the PLAF?

In Pf3k studies, we use metadata to identify the geographical regions of our samples, and divide the samples into the following groups to extract the PLAF information.

1. Malawi, Congo.
2. Ghana (Navrongo).
3. Nigeria, Senegal, Mali.
4. The Gambia, Guinea, Ghana (Kintampo).
5. Cambodia (Pursat), Cambodia (Pailin), Thailand (Sisakhet).
6. Vietnam, Laos, Cambodia (Ratanakiri), Cambodia (Preah Vihear).
7. Bangladesh, Myanmar, Thailand (Mae Sot), Thailand (Ranong).

Since the release of the Pf6 data, I recommend computing the PLAF for each country.

To compute the PLAF at a site, we simply take the ratio of *the sum of alternative allele counts of all samples* over *the sum of reference and alternative allele counts of all samples*. Then collect these values across all sites.
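
The ratio described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the Pf3k analysis: the function name `compute_plaf`, the nested-list input layout (one list of per-site counts per sample), and the choice to report 0.0 at sites with no coverage are all assumptions made for this example.

```python
def compute_plaf(ref_counts, alt_counts):
    """Return one PLAF value per site.

    ref_counts[s][i] and alt_counts[s][i] are the reference and alternative
    allele counts of sample s at site i (hypothetical input layout).
    """
    n_sites = len(ref_counts[0])
    plaf = []
    for i in range(n_sites):
        # Sum the counts over all samples at this site.
        alt_sum = sum(sample[i] for sample in alt_counts)
        ref_sum = sum(sample[i] for sample in ref_counts)
        total = ref_sum + alt_sum
        # Assumption: a site with zero coverage gets a PLAF of 0.0.
        plaf.append(alt_sum / total if total > 0 else 0.0)
    return plaf


# Two samples, three sites.
ref = [[10, 0, 5], [20, 4, 5]]
alt = [[0, 10, 5], [0, 6, 5]]
print(compute_plaf(ref, alt))  # [0.0, 0.8, 0.5]
```

To get one PLAF per region or per country, the same function would be applied to the samples of each group separately.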

#### Extract PLAF from the VCF file directly

Since DEploid-BestPractice, DEploid can extract the PLAF from the VCF file directly, enabled by the flag `-plafFromVcf`. It extracts the Allele Frequency (AF) attribute from the INFO field.
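
To make clear which value is being read, here is a minimal Python sketch that pulls the AF attribute out of the INFO column of a single VCF data line. It is not DEploid's own implementation: the function name `af_from_vcf_line`, the example record, and the decision to keep only the first AF value at multi-allelic sites are assumptions for illustration.

```python
def af_from_vcf_line(line):
    """Return (chrom, pos, af) for one tab-separated VCF data line.

    af is None when the INFO column has no AF entry.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, info = fields[0], int(fields[1]), fields[7]  # INFO is column 8
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "AF":
            # Assumption: keep only the first value at multi-allelic sites.
            return chrom, pos, float(value.split(",")[0])
    return chrom, pos, None


# Made-up record, for illustration only.
record = "Pf3D7_01_v3\t93157\t.\tT\tC\t.\tPASS\tAC=2;AF=0.02;AN=100"
print(af_from_vcf_line(record))  # ('Pf3D7_01_v3', 93157, 0.02)
```

Running a check like this over a VCF file is a quick way to confirm that the AF attribute is present before relying on `-plafFromVcf`.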

NOTE: To achieve a good deconvolution result, prior knowledge of an appropriate PLAF or reference haplotypes is important. I would still recommend going through the manual computation process when possible.


What reference panel should I use?
----------------------------------

1. Definitely use the clonal strains from your own study. Identifying the clonal strains and inferring their haplotypes would be step one.
2. Consider using the clonal strains from the Pf3k or Pf6 dataset as well.


Data filtering
--------------
Data filtering is an important step for deconvolution.
@@ -123,6 +165,8 @@ utilities/interpretDEploid.r -vcf data/exampleData/PG0400-C.eg.vcf.gz \
![#PG0400_sigma10](_static/PG0400-Csigma10.ring.png "Correct PG0400-C deconvolution")




Benchmark
---------

Binary file added docs/_static/asia_africa_plaf_example.png
4 changes: 4 additions & 0 deletions docs/doxygen/Doxyfile.in
@@ -761,14 +761,18 @@ WARN_LOGFILE =
INPUT = @top_srcdir@/src/variantIndex.hpp @top_srcdir@/src/variantIndex.cpp \
@top_srcdir@/src/txtReader.hpp @top_srcdir@/src/txtReader.cpp \
@top_srcdir@/src/mcmc.hpp @top_srcdir@/src/mcmc.cpp \
@top_srcdir@/src/ibd.hpp @top_srcdir@/src/ibd.cpp \
@top_srcdir@/src/chooseK.hpp @top_srcdir@/src/chooseK.cpp \
@top_srcdir@/src/panel.hpp @top_srcdir@/src/panel.cpp \
@top_srcdir@/src/dEploid.cpp @top_srcdir@/src/exceptions.hpp \
@top_srcdir@/src/global.hpp @top_srcdir@/src/param.hpp \
@top_srcdir@/src/random/fastfunc.cpp @top_srcdir@/src/random/fastfunc.hpp \
@top_srcdir@/src/random/mersenne_twister.hpp @top_srcdir@/src/random/mersenne_twister.cpp \
@top_srcdir@/src/random/random_generator.hpp @top_srcdir@/src/random/random_generator.cpp \
@top_srcdir@/src/fastfunc.hpp @top_srcdir@/src/fastfunc.cpp \
@top_srcdir@/src/mersenne_twister.hpp @top_srcdir@/src/mersenne_twister.cpp \
@top_srcdir@/src/dEploidIO.hpp @top_srcdir@/src/dEploidIO.cpp \
@top_srcdir@/src/dEploidIO-operation.cpp @top_srcdir@/src/dEploidIO-workflow.cpp \
@top_srcdir@/src/updateHap.hpp @top_srcdir@/src/updateHap.cpp \
@top_srcdir@/src/utility.hpp @top_srcdir@/src/utility.cpp \
@top_srcdir@/src/export/dEploidIOExport.cpp @top_srcdir@/src/export/dEploidIOExportPosteriorProb.cpp \