slurm manage

SwissPedHealth-PipelineDev · Nov 20, 2024 · 88f2bf5
2 parents a05b813 + 6aec6a2
commit 88f2bf5

Showing 5 changed files with 327 additions and 57 deletions.
diff --git a/assets/images/Bayes_ex2_plot-1.png b/assets/images/Bayes_ex2_plot-1.png
diff --git a/pages/bayesian_example1.md b/pages/bayesian_example1.md
@@ -1,5 +1,5 @@
 ---
-title: "Bayesian probability in placenta previa"
+title: "Bayesian 1 probability in placenta previa"
 output:
   md_document:
     variant: gfm
@@ -9,11 +9,11 @@ nav_order: 5
 math: mathjax
 ---
 
-# Bayesian - Probability of a girl birth given placenta previa
+# Bayesian: Part 1 Probability of a girl birth given placenta previa
 
 Last update:
 
-    ## [1] "2024-11-13"
+    ## [1] "2024-11-17"
 
 This doc was built with:
 `rmarkdown::render("Bayesian_example1.Rmd", output_file = "../pages/bayesian_example1.md")`
@@ -57,18 +57,35 @@ prior on $$\theta$$.
 
 In Bayesian analysis, especially when dealing with proportions like the
 probability of a girl birth in this scenario, understanding the entire
-distribution is crucial. The Beta distribution, being the conjugate
-prior for binomial likelihoods, is particularly sensitive to the shape
-parameters ($$\alpha$$ and $$\beta$$), which in this case are derived
-from the observed data (437 girls, 543 boys) plus one for each due to
-the uniform prior assumption ($$\alpha = X + 1$$,
+distribution is crucial. The Beta distribution, being the **conjugate
+prior** for **binomial likelihoods**, is fully determined by its two
+shape parameters ($$\alpha$$ and $$\beta$$). Here they are the observed
+counts (437 girls, 543 boys) plus one each, because the uniform prior
+is Beta(1, 1) ($$\alpha = X + 1$$,
+$$\beta = n - X + 1$$):
 
 - **Alpha (438)**: Represents the number of successes (female births)
   plus one.
 - **Beta (544)**: Represents the number of failures (male births) plus
   one.
 
+{: .note }
+
+**Binomial likelihoods**: A binomial likelihood applies when the data
+are a count of successes in a fixed number of independent trials, each
+with two possible outcomes (success or failure) and the same success
+probability. For example, flipping a coin multiple times and counting
+how many times it lands heads uses a binomial likelihood because each
+flip has two possible outcomes (heads or tails).
+
+{: .note }
+
+**Conjugate prior**: A conjugate prior is a special type of prior that,
+when used with a particular likelihood function (like the binomial
+likelihood), results in a posterior distribution that is the same type
+of distribution as the prior. This is useful because it simplifies the
+mathematical calculations involved in updating beliefs with new data.
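+
+To make the conjugacy concrete for this example, here is a short
+derivation sketch. It assumes the uniform prior is written as
+Beta(1, 1) and uses $$X$$ for the number of girls and $$n$$ for the
+total number of births, as in the formulas above:
+
+$$ p(\theta \mid X) \propto \theta^{X}(1-\theta)^{n-X} \times \theta^{1-1}(1-\theta)^{1-1} = \theta^{(X+1)-1}(1-\theta)^{(n-X+1)-1} $$
+
+The right-hand side is the kernel of a Beta($$X + 1$$, $$n - X + 1$$)
+density, which for $$X = 437$$ and $$n = 980$$ is Beta(438, 544).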
+
 ``` r
 library(ggplot2)
 theme_set(theme_bw())
@@ -141,34 +158,34 @@ ggplot(mapping = aes(theta, p)) +
 
 #### Calculation Methods
 
-- **Analytical Approach:** Utilising properties of the beta
-  distribution, the posterior mean is 0.446 and the posterior standard
-  deviation is 0.016.
-- **Simulation Approach:** Drawing 1000 samples from the Beta(438, 544)
+- **Analytical approach:** Using properties of the beta distribution,
+  the posterior mean is 0.446 and the posterior standard deviation is
+  0.016.
+- **Simulation approach:** Drawing 1000 samples from the Beta(438, 544)
   posterior, the sample mean and standard deviation closely match the
   analytical results.
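+
+As a rough sketch of both approaches (the variable names and the seed
+are illustrative assumptions, not taken from the original Rmd):
+
+``` r
+a_post <- 438 # posterior shape alpha: girls + 1
+b_post <- 544 # posterior shape beta: boys + 1
+
+# Analytical: mean and standard deviation of a Beta(a_post, b_post)
+post_mean <- a_post / (a_post + b_post)
+post_sd <- sqrt(a_post * b_post /
+                  ((a_post + b_post)^2 * (a_post + b_post + 1)))
+
+# Simulation: 1000 draws from the posterior (the seed is arbitrary)
+set.seed(1)
+draws <- rbeta(1000, a_post, b_post)
+sim_mean <- mean(draws)
+sim_sd <- sd(draws)
+
+round(c(analytical_mean = post_mean, analytical_sd = post_sd), 3)
+round(c(simulated_mean = sim_mean, simulated_sd = sim_sd), 3)
+```
+
+The simulated values change slightly with the seed, which is why they
+only approximately match the analytical ones.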
 
 #### Confidence Intervals
 
-- **Beta Quantiles:** The 95% confidence interval for $\theta$ from beta
-  properties is \[0.415, 0.477\].
-- **Simulation-based Estimate:** Using ordered draws, the 95% interval
+- **Beta quantiles:** The 95% posterior (credible) interval for
+  $$\theta$$ from the quantiles of the Beta(438, 544) posterior is
+  \[0.415, 0.477\].
+- **Simulation-based estimate:** Using ordered draws, the 95% interval
   is similarly \[0.415, 0.476\].
-- **Normal Approximation:** For practical ease, a normal approximation
+- **Normal approximation:** For practical ease, a normal approximation
   gives \[0.414, 0.476\], indicating robustness of the estimate.
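+
+The three intervals above could be computed along these lines. This is
+a minimal sketch: the variable names and seed are assumptions, and the
+normal approximation here uses the analytical mean and standard
+deviation, which may differ from how the original Rmd computed it.
+
+``` r
+a_post <- 438
+b_post <- 544
+
+# Exact 2.5% and 97.5% quantiles of the Beta posterior
+int_beta <- qbeta(c(0.025, 0.975), a_post, b_post)
+
+# Simulation: empirical quantiles of posterior draws (seed is arbitrary)
+set.seed(1)
+draws <- rbeta(1000, a_post, b_post)
+int_sim <- quantile(draws, c(0.025, 0.975))
+
+# Normal approximation: mean +/- 1.96 standard deviations
+post_mean <- a_post / (a_post + b_post)
+post_sd <- sqrt(a_post * b_post /
+                  ((a_post + b_post)^2 * (a_post + b_post + 1)))
+int_norm <- post_mean + c(-1.96, 1.96) * post_sd
+
+round(rbind(beta = int_beta, simulation = int_sim, normal = int_norm), 3)
+```
+
+The simulation-based interval varies in the third decimal from run to
+run, which accounts for small differences between the reported values.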
 
-### Enhanced Precision with Logit Transformation
+### Enhanced precision with logit transformation
 
-Transforming $\theta$ to the logit scale:
+Transforming $$\theta$$ to the logit scale:
 
 $$ \text{logit}(\theta) = \log\left(\frac{\theta}{1-\theta}\right) $$
 
 This transformation stabilises variance, especially beneficial for
-values of $\theta$ near boundaries. The logit-transformed values follow
-a normal distribution, allowing us to back-calculate the confidence
-interval for $\theta$ effectively.
+values of $$\theta$$ near boundaries. The logit-transformed values are
+approximately normally distributed, allowing us to compute an interval
+on the logit scale and back-transform it to an interval for
+$$\theta$$.
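+
+A minimal sketch of this step, assuming posterior draws from
+Beta(438, 544) and using base R's `qlogis()` (logit) and `plogis()`
+(inverse logit); the seed and variable names are illustrative:
+
+``` r
+set.seed(1)
+draws <- rbeta(1000, 438, 544) # posterior draws of theta
+
+# Logit scale: qlogis(p) = log(p / (1 - p))
+logit_draws <- qlogis(draws)
+
+# Normal-approximation 95% interval on the logit scale
+logit_int <- mean(logit_draws) + c(-1.96, 1.96) * sd(logit_draws)
+
+# Back-transform with the inverse logit: plogis(x) = 1 / (1 + exp(-x))
+theta_int <- plogis(logit_int)
+round(theta_int, 3)
+```
+
+Because the back-transformed limits always lie in (0, 1), this
+construction cannot produce an interval that crosses the boundaries.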
 
-## Considerations on Prior Sensitivity
+## Considerations on prior sensitivity
 
 Exploring different **conjugate priors** with varying strengths of
 belief around the general population proportion (0.485), the results
@@ -179,28 +196,28 @@ intervals across various priors.
 ## Plotting decision
 
 The choice of the values for the sequence `seq(0.375, 0.525, 0.001)` in
-`df1` is designed to provide a detailed and continuous visualization of
-the posterior probability density function (pdf) of $$\theta$$ (the
-probability of a girl birth given placenta previa) over a relevant range
-of $$\theta$$ values.
+`df1` is designed to provide a visualization of the posterior
+probability density function (pdf) of $$\theta$$ (the probability of a
+girl birth given placenta previa) over a relevant range of $$\theta$$
+values.
 
-- **Start (0.375) and End (0.525)**: These values define the range over
+- **Start (0.375) and end (0.525)**: These values define the range over
   which the posterior distribution will be evaluated and plotted. The
   range is chosen to be slightly broader than the central 95% posterior
   interval calculated from the Beta distribution (Beta(438, 544)), which
   is \[0.415, 0.477\]. This broader range allows the plot to display the
   tails of the distribution, providing a complete view of how the
   density behaves towards the edges, which is informative for
   understanding the distribution’s shape and spread.
-- **Relevance to the Data**: The range centers around the expected
+- **Relevance to the data**: The range centers around the expected
   posterior mean ($$0.446$$) and includes the entire 95% confidence
   interval, thereby capturing the most statistically significant values
   of $$\theta$$ under the given model and data.
 
 ## Conclusion
 
-The Bayesian analysis, robust across several approaches, suggests that
-the probability of a female birth given placenta previa is less than the
-general population’s proportion. The findings are consistent despite
-different computational methods and prior assumptions, illustrating the
-power of Bayesian inference in real-world data interpretation.
+Based on the data, the probability of a female birth given placenta
+previa is less than the general population’s proportion: the 95%
+posterior interval \[0.415, 0.477\] lies below 0.485. The findings
+are consistent across different computational methods and prior
+assumptions, illustrating the power of Bayesian inference in real-world
+data interpretation.
diff --git a/pages/bayesian_example2.md b/pages/bayesian_example2.md
@@ -0,0 +1,142 @@
+---
+title: "Bayesian 2 probability in placenta previa"
+output:
+  md_document:
+    variant: gfm
+    preserve_yaml: true
+layout: default
+nav_order: 5
+math: mathjax
+---
+
+# Bayesian: Part 2 Probability of a girl birth given placenta previa
+
+Last update:
+
+    ## [1] "2024-11-17"
+
+This doc was built with:
+`rmarkdown::render("Bayesian_example2.Rmd", output_file = "../pages/bayesian_example2.md")`
+
+## Introduction to part 2
+
+The code is based on a version by Aki Vehtari.
+
+In the continuation of the analysis on placenta previa, we look at
+different priors to understand their impact on the posterior
+distribution. This part not only revisits the basic steps introduced in
+Part 1 but also expands on them by exploring alternative prior settings
+to illustrate the sensitivity of our posterior estimates to the choice
+of prior.
+
+Following the initial setup where we determined the posterior
+distribution using a uniform prior, Part 2 investigates how different
+priors influence the results. This is crucial for assessing the
+robustness of our conclusions against the assumptions we make in our
+Bayesian framework.
+
+- $$X = 437$$ - number of female births in placenta previa
+- $$Y = 543$$ - number of male births in placenta previa
+- $$n = 980$$ - total births in placenta previa
+- $$0.485$$ - the proportion of female births in the general population
+- In Part 1, the uniform prior gave the posterior Beta(438, 544)
+
+``` r
+a <- 437 # girls
+b <- 543 # boys
+```
+
+## Exploring the effect of different priors
+
+**Density evaluation**: We calculate the density of the posterior
+distribution over a range of theta values using a uniform prior for
+simplicity.
+
+``` r
+# Evaluate densities at evenly spaced points between 0.375 and 0.525
+df1 <- data.frame(theta = seq(0.375, 0.525, 0.001))
+
+# Posterior with Beta(1,1), i.e. uniform prior
+df1$pu <- dbeta(df1$theta, a+1, b+1)
+```
+
+**Further prior variations**: We set up priors with varying strengths by
+modifying the prior counts and success ratio, reflecting different
+degrees of confidence in the prior information.
+
+``` r
+# 3 different choices for priors
+# Beta(0.485*2,(1-0.485)*2)
+# Beta(0.485*20,(1-0.485)*20)
+# Beta(0.485*200,(1-0.485)*200)
+n <- c(2, 20, 200) # prior counts
+apr <- 0.485 # prior ratio of success
+```
+
+- **Beta distribution parameters**: Each prior is
+  Beta($$0.485 n$$, $$0.515 n$$) for a prior count $$n$$ of 2, 20 or
+  200. All three priors have mean 0.485; the sum $$\alpha + \beta = n$$
+  sets how strongly the prior pulls the posterior towards that mean
+  during the Bayesian update.
+- **Prior counts (`n`)**: Specifies the strength of the prior. Lower
+  counts like 2 suggest minimal prior influence, while higher counts
+  like 200 indicate a strong prior belief based on substantial evidence
+  or confidence.
+- **Prior probability of success (`apr`)**: Represents an assumed rate
+  of female births, serving as the basis for setting the Beta
+  distribution parameters, thereby impacting the shape of the posterior
+  distribution.
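+
+To see what these settings imply numerically, the sketch below lists
+the prior parameters and the resulting posterior means. It relies on
+the standard conjugate update (add the observed counts to the prior
+parameters); the data frame and its column names are illustrative.
+
+``` r
+a <- 437           # girls
+b <- 543           # boys
+n <- c(2, 20, 200) # prior counts
+apr <- 0.485       # prior ratio of success
+
+prior_alpha <- n * apr
+prior_beta <- n * (1 - apr)
+
+# Conjugate update: add observed counts to the prior parameters
+post_alpha <- prior_alpha + a
+post_beta <- prior_beta + b
+
+prior_summary <- data.frame(
+  n = n,
+  prior_alpha = prior_alpha,
+  prior_beta = prior_beta,
+  post_mean = round(post_alpha / (post_alpha + post_beta), 3)
+)
+prior_summary
+```
+
+The posterior mean moves only from about 0.446 to about 0.453 as the
+prior count grows from 2 to 200, because the 980 observed births
+outweigh even the strongest of these priors.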
+
+This setup allows for the examination of how prior beliefs, quantified
+by `n` and `apr`, impact the posterior estimates. By varying these
+priors, we see the Bayesian framework’s sensitivity to initial
+assumptions, highlighting the need for careful consideration of prior
+information in Bayesian analysis.
+
+The following helper function and `lapply` call build the dataset:
+`helperf` computes prior and posterior densities over the grid of theta
+values for a given prior strength. `lapply` applies it to each prior
+count, and the results are row-bound and reshaped to long format for
+plotting.
+
+``` r
+library(dplyr) # %>%, mutate
+library(tidyr) # pivot_longer
+
+# helperf takes the number of prior observations n, the prior ratio of
+# successes apr, the observed successes a and failures b, and a data
+# frame df with a theta column. It returns df with added columns for the
+# prior (pr) and posterior (po) densities at each theta, plus n.
+helperf <- function(n, apr, a, b, df)
+  cbind(df,
+        pr = dbeta(df$theta, n*apr, n*(1-apr)),
+        po = dbeta(df$theta, n*apr + a, n*(1-apr) + b),
+        n = n)
+
+# lapply helperf over the prior counts n, then pivot to key-value pairs.
+# Factor levels sort as po, pr, pu, which fixes the label order below.
+df2 <- lapply(n, helperf, apr, a, b, df1) %>%
+  do.call(rbind, args = .) %>%
+  pivot_longer(!c(theta, n), names_to = "density_group", values_to = "p") %>%
+  mutate(density_group = factor(density_group, labels=c('Posterior','Prior','Posterior with unif prior')))
+
+# add facet titles for plotting
+df2$title <- factor(paste0('alpha/(alpha+beta)=0.485, alpha+beta=',df2$n))
+```
+
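+``` r
+library(ggplot2)
+
+# pop_freq and lab_pop are not defined on this page; they are assumed to
+# be set earlier in the Rmd (pop_freq presumably 0.485, the population
+# proportion of girl births, and lab_pop its text label).
+# Plot distributions
+ggplot(data = df2) +
+  geom_line(aes(theta, p, color = density_group)) +
+  # proportion of girl babies in general population
+  geom_vline(xintercept = pop_freq, linetype='dotted') +
+  annotate(geom = "label", label = lab_pop, x = pop_freq, y = 20, hjust = 0, fill = "white",  alpha = 0.5, size =2) +
+  facet_wrap(~title, ncol = 1)
+```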
+
+![](../assets/images/Bayes_ex2_plot-1.png)<!-- -->
+
+The resulting plots show how the choice of prior affects the posterior.
+The dotted vertical line marks the general-population proportion
+(0.485), which is also the mean of every prior, and each facet
+corresponds to one prior strength.
+
+## Conclusion
+
+The exploration in part 2 demonstrates the Bayesian framework’s
+flexibility and the role of prior selection. With 980 observations,
+the weak priors (prior counts of 2 and 20) leave the posterior close to
+the uniform-prior result, while the strong prior (prior count of 200)
+pulls it further towards 0.485. How much a prior affects the
+conclusions therefore depends on its strength relative to the data.