slurm manage
DylanLawless committed Nov 20, 2024
2 parents a05b813 + 6aec6a2 commit 88f2bf5
Showing 5 changed files with 327 additions and 57 deletions.
Binary file added assets/images/Bayes_ex2_plot-1.png
83 changes: 50 additions & 33 deletions pages/bayesian_example1.md
@@ -1,5 +1,5 @@
---
title: "Bayesian probability in placenta previa"
title: "Bayesian 1 probability in placenta previa"
output:
md_document:
variant: gfm
@@ -9,11 +9,11 @@ nav_order: 5
math: mathjax
---

# Bayesian - Probability of a girl birth given placenta previa
# Bayesian: Part 1 Probability of a girl birth given placenta previa

Last update:

## [1] "2024-11-13"
## [1] "2024-11-17"

This doc was built with:
`rmarkdown::render("Bayesian_example1.Rmd", output_file = "../pages/bayesian_example1.md")`
@@ -57,18 +57,35 @@ prior on $$\theta$$.

In Bayesian analysis, especially when dealing with proportions like the
probability of a girl birth in this scenario, understanding the entire
distribution is crucial. The Beta distribution, being the conjugate
prior for binomial likelihoods, is particularly sensitive to the shape
parameters ($$\alpha$$ and $$\beta$$), which in this case are derived
from the observed data (437 girls, 543 boys) plus one for each due to
the uniform prior assumption ($$\alpha = X + 1$$,
distribution is crucial. The Beta distribution, being the **conjugate
prior** for **binomial likelihoods**, is particularly sensitive to the
shape parameters ($$\alpha$$ and $$\beta$$), which in this case are
derived from the observed data (437 girls, 543 boys) plus one for each
due to the uniform prior assumption ($$\alpha = X + 1$$,
$$\beta = n - X + 1$$):

- **Alpha (438)**: Represents the number of successes (female births)
plus one.
- **Beta (544)**: Represents the number of failures (male births) plus
one.

{: .note }

**Binomial likelihoods**: This relates to scenarios where you have
binary data that result from a series of trials with two possible
outcomes (like success and failure). For example, flipping a coin
multiple times and counting how many times it lands heads is a situation
that would use a binomial likelihood because each flip has two possible
outcomes (heads or tails).

{: .note }

**Conjugate prior**: A conjugate prior is a special type of prior that,
when used with a particular likelihood function (like the binomial
likelihood), results in a posterior distribution that is the same type
of distribution as the prior. This is useful because it simplifies the
mathematical calculations involved in updating beliefs with new data.
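
The conjugate update can be written out explicitly. The following is a short sketch in which $$\alpha_0$$ and $$\beta_0$$ denote the prior shape parameters (notation introduced here for illustration only):

$$ p(\theta \mid X) \propto \underbrace{\theta^{X}(1-\theta)^{n-X}}_{\text{binomial likelihood}} \times \underbrace{\theta^{\alpha_0-1}(1-\theta)^{\beta_0-1}}_{\text{Beta prior}} = \theta^{\alpha_0+X-1}(1-\theta)^{\beta_0+n-X-1} $$

The right-hand side is the kernel of a Beta($$\alpha_0 + X$$, $$\beta_0 + n - X$$) distribution. With the uniform prior Beta(1, 1), $$X = 437$$ and $$n - X = 543$$, this gives Beta(438, 544).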

``` r
library(ggplot2)
theme_set(theme_bw())
@@ -141,34 +158,34 @@ ggplot(mapping = aes(theta, p)) +

#### Calculation Methods

- **Analytical Approach:** Utilising properties of the beta
distribution, the posterior mean is 0.446 and the posterior standard
deviation is 0.016.
- **Simulation Approach:** Drawing 1000 samples from the Beta(438, 544)
- **Analytical approach:** Using properties of the beta distribution,
the posterior mean is 0.446 and the posterior standard deviation is
0.016.
- **Simulation approach:** Drawing 1000 samples from the Beta(438, 544)
posterior, the sample mean and standard deviation closely match the
analytical results.
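
Both calculations can be reproduced in a few lines of base R. This is a minimal sketch rather than the original code: the variable names and the seed are chosen here for illustration.

``` r
# Posterior shape parameters from the text: Beta(438, 544)
a_post <- 438
b_post <- 544

# Analytical mean and standard deviation of a Beta(a_post, b_post) distribution
post_mean <- a_post / (a_post + b_post)
post_sd   <- sqrt(a_post * b_post / ((a_post + b_post)^2 * (a_post + b_post + 1)))

# Simulation: 1000 draws from the posterior (the seed is arbitrary)
set.seed(1)
draws <- rbeta(1000, a_post, b_post)

# Compare the analytical and simulated summaries side by side
round(c(analytical_mean = post_mean, analytical_sd = post_sd,
        simulated_mean = mean(draws), simulated_sd = sd(draws)), 3)
```

The simulated values change slightly with the seed and the number of draws, which is why they only approximately match the analytical ones.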

#### Confidence Intervals

- **Beta Quantiles:** The 95% confidence interval for $\theta$ from beta
properties is \[0.415, 0.477\].
- **Simulation-based Estimate:** Using ordered draws, the 95% interval
- **Beta quantiles:** The 95% posterior (credible) interval for $$\theta$$
from the Beta quantiles is \[0.415, 0.477\].
- **Simulation-based estimate:** Using ordered draws, the 95% interval
is similarly \[0.415, 0.476\].
- **Normal Approximation:** For practical ease, a normal approximation
- **Normal approximation:** For practical ease, a normal approximation
gives \[0.414, 0.476\], indicating robustness of the estimate.
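
The three interval estimates can be sketched as follows, assuming base R only. The variable names and seed are illustrative, and the exact digits may differ slightly from those quoted above depending on how the draws and the approximation were obtained.

``` r
a_post <- 438
b_post <- 544

# Central 95% interval from the Beta quantile function
beta_interval <- qbeta(c(0.025, 0.975), a_post, b_post)

# Normal approximation: posterior mean +/- 1.96 posterior standard deviations
post_mean <- a_post / (a_post + b_post)
post_sd   <- sqrt(a_post * b_post / ((a_post + b_post)^2 * (a_post + b_post + 1)))
normal_interval <- post_mean + c(-1, 1) * qnorm(0.975) * post_sd

# Simulation-based interval from empirical quantiles of posterior draws
set.seed(1)
sim_interval <- quantile(rbeta(1000, a_post, b_post), c(0.025, 0.975))

round(rbind(beta = beta_interval, normal = normal_interval,
            simulation = sim_interval), 3)
```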

### Enhanced Precision with Logit Transformation
### Enhanced precision with logit transformation

Transforming $\theta$ to the logit scale:
Transforming $$\theta$$ to the logit scale:

$$ \text{logit}(\theta) = \log\left(\frac{\theta}{1-\theta}\right) $$

This transformation stabilises variance, especially beneficial for
values of $\theta$ near boundaries. The logit-transformed values follow
a normal distribution, allowing us to back-calculate the confidence
interval for $\theta$ effectively.
values of $$\theta$$ near boundaries. The logit-transformed values are
approximately normally distributed, allowing us to compute an interval on
the logit scale and back-transform it to an interval for $$\theta$$.
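
A minimal sketch of the logit approach, assuming base R only (`qlogis()` is the logit function and `plogis()` its inverse). The variable names and seed are chosen here for illustration.

``` r
a_post <- 438
b_post <- 544

# Draws from the Beta(438, 544) posterior (the seed is arbitrary)
set.seed(1)
theta_draws <- rbeta(1000, a_post, b_post)

# Transform the draws to the logit scale: log(theta / (1 - theta))
logit_draws <- qlogis(theta_draws)

# Normal approximation on the logit scale
logit_interval <- mean(logit_draws) + c(-1, 1) * qnorm(0.975) * sd(logit_draws)

# Back-transform the interval end points to the probability scale
theta_interval <- plogis(logit_interval)
round(theta_interval, 3)
```

Because the back-transformed end points always lie in (0, 1), this approach cannot produce an interval that extends outside the valid range of $$\theta$$.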

## Considerations on Prior Sensitivity
## Considerations on prior sensitivity

Exploring different **conjugate priors** with varying strengths of
belief around the general population proportion (0.485), the results
@@ -179,28 +196,28 @@ intervals across various priors.
## Plotting decision

The choice of the values for the sequence `seq(0.375, 0.525, 0.001)` in
`df1` is designed to provide a detailed and continuous visualization of
the posterior probability density function (pdf) of $$\theta$$ (the
probability of a girl birth given placenta previa) over a relevant range
of $$\theta$$ values.
`df1` is designed to provide a visualization of the posterior
probability density function (pdf) of $$\theta$$ (the probability of a
girl birth given placenta previa) over a relevant range of $$\theta$$
values.

- **Start (0.375) and End (0.525)**: These values define the range over
- **Start (0.375) and end (0.525)**: These values define the range over
which the posterior distribution will be evaluated and plotted. The
range is chosen to be slightly broader than the central 95% posterior
interval calculated from the Beta distribution (Beta(438, 544)), which
is \[0.415, 0.477\]. This broader range allows the plot to display the
tails of the distribution, providing a complete view of how the
density behaves towards the edges, which is informative for
understanding the distribution’s shape and spread.
- **Relevance to the Data**: The range centers around the expected
- **Relevance to the data**: The range centers around the posterior
mean ($$0.446$$) and includes the entire 95% posterior interval,
thereby covering the values of $$\theta$$ that carry most of the
posterior probability under the given model and data.

## Conclusion

The Bayesian analysis, robust across several approaches, suggests that
the probability of a female birth given placenta previa is less than the
general population’s proportion. The findings are consistent despite
different computational methods and prior assumptions, illustrating the
power of Bayesian inference in real-world data interpretation.
Based on the data, the probability of a female birth given placenta
previa is less than the general population’s proportion. The findings
are consistent despite different computational methods and prior
assumptions, illustrating the power of Bayesian inference in real-world
data interpretation.
142 changes: 142 additions & 0 deletions pages/bayesian_example2.md
@@ -0,0 +1,142 @@
---
title: "Bayesian 2 probability in placenta previa"
output:
md_document:
variant: gfm
preserve_yaml: true
layout: default
nav_order: 5
math: mathjax
---

# Bayesian: Part 2 Probability of a girl birth given placenta previa

Last update:

## [1] "2024-11-17"

This doc was built with:
`rmarkdown::render("Bayesian_example2.Rmd", output_file = "../pages/bayesian_example2.md")`

## Introduction to part 2

The code is based on a version by Aki Vehtari.

Continuing the placenta previa analysis, we look at different priors to
understand their impact on the posterior distribution. Part 1 determined
the posterior using a uniform prior; Part 2 repeats those basic steps
with alternative priors to show how sensitive the posterior estimates
are to that choice. This matters for assessing how robust our
conclusions are to the assumptions built into the Bayesian model.

- $$X = 437$$ - number of female births in placenta previa
- $$Y = 543$$ - number of male births in placenta previa
- $$n = 980$$ - total births in placenta previa
- $$0.485$$ - the proportion of female births in the general population
- In part 1, the uniform prior gave the posterior Beta(438, 544)

``` r
a <- 437 # girls
b <- 543 # boys
```

## Exploring the effect of different priors

**Density evaluation**: We calculate the density of the posterior
distribution over a range of theta values using a uniform prior for
simplicity.

``` r
# Evaluate densities at evenly spaced points between 0.375 and 0.525
df1 <- data.frame(theta = seq(0.375, 0.525, 0.001))

# Posterior with Beta(1,1), i.e. uniform prior
df1$pu <- dbeta(df1$theta, a+1, b+1)
```

**Further prior variations**: We set up priors with varying strengths by
modifying the prior counts and success ratio, reflecting different
degrees of confidence in the prior information.

``` r
# 3 different choices for priors
# Beta(0.485*2,(1-0.485)*2)
# Beta(0.485*20,(1-0.485)*20)
# Beta(0.485*200,(1-0.485)*200)
n <- c(2, 20, 200) # prior counts
apr <- 0.485 # prior ratio of success
```

- **Beta distribution parameters**: Each prior is
Beta(`n * apr`, `n * (1 - apr)`), so its mean is fixed at 0.485 while
`n` (2, 20, 200) sets the total prior count $$\alpha + \beta$$. A
larger `n` concentrates the prior around 0.485 and gives it more
weight in the Bayesian updating process.
- **Prior counts (`n`)**: Specifies the strength of the prior. Lower
counts like 2 suggest minimal prior influence, while higher counts
like 200 indicate a strong prior belief based on substantial evidence
or confidence.
- **Prior probability of success (`apr`)**: Represents an assumed rate
of female births, serving as the basis for setting the Beta
distribution parameters, thereby impacting the shape of the posterior
distribution.

This setup allows for the examination of how prior beliefs, quantified
by `n` and `apr`, impact the posterior estimates. By varying these
priors, we see the Bayesian framework’s sensitivity to initial
assumptions, highlighting the need for careful consideration of prior
information in Bayesian analysis.
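
The effect of `n` and `apr` on the posterior can be made concrete with a small calculation. The following is a sketch that reuses the values defined above; `post_alpha`, `post_beta`, `post_mean` and `prior_weight` are names introduced here for illustration.

``` r
a <- 437             # girls
b <- 543             # boys
n <- c(2, 20, 200)   # prior counts
apr <- 0.485         # prior ratio of success

# Prior Beta(n*apr, n*(1-apr)) combined with the data gives the
# posterior Beta(n*apr + a, n*(1-apr) + b)
post_alpha <- n * apr + a
post_beta  <- n * (1 - apr) + b

# Posterior mean, and the weight the prior mean receives in it
post_mean    <- post_alpha / (post_alpha + post_beta)
prior_weight <- n / (n + a + b)

data.frame(n, post_alpha, post_beta,
           post_mean = round(post_mean, 3),
           prior_weight = round(prior_weight, 3))
```

The posterior mean is a weighted average of the prior mean (`apr`) and the observed proportion `a / (a + b)`, with the prior receiving weight `n / (n + a + b)`, so increasing `n` pulls the posterior mean toward 0.485.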

The following helper function and `lapply` construct compile the dataset
as described: `helperf` computes prior and posterior densities over a
range of theta values for a given prior strength. It is applied with
`lapply` across the different prior settings, and the results are
combined and reshaped for plotting.

``` r
# Packages needed for the pipeline below (harmless if already attached)
library(dplyr)  # %>% and mutate()
library(tidyr)  # pivot_longer()

# helperf takes the number of prior observations (n), the prior ratio of
# successes (apr), the observed successes (a) and failures (b), and a data
# frame with values of theta. It returns a new data frame with the prior (pr)
# and posterior (po) densities evaluated at those theta values.
helperf <- function(n, apr, a, b, df)
  cbind(df, pr = dbeta(df$theta, n*apr, n*(1-apr)), po = dbeta(df$theta, n*apr + a, n*(1-apr) + b), n = n)

# lapply function over prior counts n and pivot results into key-value pairs.
# Factor levels sort alphabetically (po, pr, pu), which fixes the label order.
df2 <- lapply(n, helperf, apr, a, b, df1) %>%
  do.call(rbind, args = .) %>%
  pivot_longer(!c(theta, n), names_to = "density_group", values_to = "p") %>%
  mutate(density_group = factor(density_group, labels=c('Posterior','Prior','Posterior with unif prior')))

# add correct labels for plotting
df2$title <- factor(paste0('alpha/(alpha+beta)=0.485, alpha+beta=',df2$n))
```

``` r
library(ggplot2)

# pop_freq and lab_pop are not defined in this file; they are assumed to be
# set in an earlier chunk (population proportion of girls and its label).
# Plot distributions
ggplot(data = df2) +
  geom_line(aes(theta, p, color = density_group)) +
  # proportion of girl babies in general population
  geom_vline(xintercept = pop_freq, linetype='dotted') +
  annotate(geom = "label", label = lab_pop, x = pop_freq, y = 20, hjust = 0, fill = "white", alpha = 0.5, size =2) +
  facet_wrap(~title, ncol = 1)
```

![](../assets/images/Bayes_ex2_plot-1.png)<!-- -->

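The resulting plots highlight how the choice of prior affects the
posterior. Each panel corresponds to one prior strength, and the dotted
vertical line marks the general-population proportion, which is also the
prior mean. As the prior count grows, the posterior is pulled toward
0.485 and away from the posterior obtained with the uniform prior.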
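
As a numerical complement to the plot, the central 95% posterior interval can be computed for each prior strength. This is a sketch under the same settings as above; `intervals`, `summary_tab` and `covers_pop` are hypothetical names introduced for illustration.

``` r
a <- 437             # girls
b <- 543             # boys
n <- c(2, 20, 200)   # prior counts
apr <- 0.485         # prior ratio of success

# Central 95% posterior interval for each prior strength
intervals <- t(sapply(n, function(ni)
  qbeta(c(0.025, 0.975), ni * apr + a, ni * (1 - apr) + b)))
colnames(intervals) <- c("lower", "upper")

# Flag whether each interval contains the population proportion
summary_tab <- data.frame(
  prior_count = n,
  round(intervals, 3),
  covers_pop = intervals[, "lower"] <= apr & apr <= intervals[, "upper"]
)
summary_tab
```

Comparing the `upper` column with 0.485 shows how far each prior moves the posterior toward the general-population proportion.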

## Conclusion

The exploration in part 2 demonstrates the Bayesian framework’s
flexibility and the critical role of prior selection. By comparing
different priors, we see how prior beliefs can substantially influence
posterior outcomes, thereby affecting conclusions drawn from Bayesian
analysis.