diff --git a/assets/images/altman_bland_analysis_of_methods.png b/assets/images/altman_bland_analysis_of_methods.png new file mode 100644 index 0000000..92d239c Binary files /dev/null and b/assets/images/altman_bland_analysis_of_methods.png differ diff --git a/assets/images/altman_bland_repeated_observations.png b/assets/images/altman_bland_repeated_observations.png new file mode 100644 index 0000000..d20fab2 Binary files /dev/null and b/assets/images/altman_bland_repeated_observations.png differ diff --git a/assets/images/receiver_operating_characteristic_curve.png b/assets/images/receiver_operating_characteristic_curve.png new file mode 100644 index 0000000..2c542a5 Binary files /dev/null and b/assets/images/receiver_operating_characteristic_curve.png differ diff --git a/assets/images/receiver_operating_characteristic_plots.png b/assets/images/receiver_operating_characteristic_plots.png new file mode 100644 index 0000000..895d9bd Binary files /dev/null and b/assets/images/receiver_operating_characteristic_plots.png differ diff --git a/pages/altman_bland_analysis_of_methods.md b/pages/altman_bland_analysis_of_methods.md new file mode 100644 index 0000000..12c6769 --- /dev/null +++ b/pages/altman_bland_analysis_of_methods.md @@ -0,0 +1,213 @@ +--- +nav_order: 5 +math: mathjax +layout: default +title: Stats Analysis of methods +created: 16 Sept 2023 +--- +{{ page.title }} +================ + +* TOC +{:toc} + +Altman, D. G., and J. M. Bland. "Measurement in Medicine: The Analysis of Method Comparison Studies." Journal of the Royal Statistical Society. Series D (The Statistician), vol. 32, no. 3, 1983, pp. 307–17. JSTOR, . + +The paper is a pivotal guide discussing the analysis of method comparison studies, particularly in the field of medicine. +It proposes a pragmatic approach to analyze such studies, stressing the importance of simplicity especially when the results need to be explained to non-statisticians. 
+
+I work through it here to compare whether a new method is as good as or better than an existing one, using clinical sepsis scores with example data.
+
+For more similar papers see the series of BMJ statistical notes by Altman & Bland
+([ lit-altman_bland.md ]( https://github.com/DylanLawless/notes/blob/main/202106291417-lit-altman_bland.md )).
+
+## The final comparison results
+### Scores per sample and Bland-Altman plot
+
+
+**Figure 1.** Scores per sample and Bland-Altman plot as produced by the provided R code.
+
+### Correlation test result and repeatability coefficients
+Results of the analysis as produced by the provided R code:
+
+```R
+$Repeatability_Score1
+[1] 4.674812
+
+$Repeatability_Score2
+[1] 4.267645
+
+$Correlation_Test
+
+	Pearson's product-moment correlation
+
+data:  df$score1 and df$score2
+t = 21.693, df = 197, p-value < 2.2e-16
+alternative hypothesis: true correlation is not equal to 0
+95 percent confidence interval:
+ 0.7931229 0.8763442
+sample estimates:
+      cor
+0.8395928
+```
+
+**Repeatability Scores:**
+
+- **Repeatability Score 1:** 4.674812
+- **Repeatability Score 2:** 4.267645
+
+The repeatability scores are measures of how consistent each method is. In this case, both methods have fairly similar repeatability scores, indicating a similar level of consistency or reliability within each method. Without context or a benchmark to compare against, it is hard to say definitively whether these scores are good, but the similarity suggests comparable repeatability.
+
+**Correlation Test:**
+
+- **Pearson's Correlation Coefficient (r):** 0.8395928
+- **95% Confidence Interval for r:** [0.7931229, 0.8763442]
+- **P-value:** $$< 2.2e-16$$
+
+The Pearson correlation coefficient is quite high, indicating a strong positive linear relationship between the scores from the two methods. 
The nearly zero p-value (less than 2.2e-16) strongly suggests that this correlation is statistically significant: it is highly unlikely that the observed correlation occurred by chance.
+
+**Conclusion:**
+
+Considering the high correlation coefficient and the comparable repeatability scores, it seems that the new method is quite similar to the old one in terms of both reliability (as indicated by the repeatability scores) and agreement (as indicated by the correlation coefficient).
+
+## Code
+
+```R
+# Here is a really cool set of notes in BMJ about all kinds of clinical data analysis.
+# https://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm
+
+# For example, if you ever want to go further with the score you are working on, this paper is famous for showing how to compare two such clinical score methods to test whether the new one performs better.
+
+# https://sci-hub.hkvisa.net/10.2307/2987937
+# (Measurement in Medicine: The Analysis of Method Comparison Studies Author(s): D. G. Altman and J. M. Bland 1983)
+
+# Example ----
+# psofa.score - Total number of organ failures according to 2017 pSOFA definitions (Matics et al. (2017) (PMID 28783810)). The classification of organ failures is based on the worst vital signs and the worst lab values during the first 7 days from blood culture sampling.
+
+# pelod.score - Total number of organ failures according to PELOD-2 definitions (Leteurtre et al. (2013) (PMID 23685639)). The classification of organ failures is based on the worst vital signs and the worst lab values during the first 7 days from blood culture sampling. 
+ +library(ggplot2) +library(dplyr) +library(tidyr) +library(patchwork) + +# Read data +df <- read.csv(file = "../data/example_sepsis_scores.tsv", sep = " ") + +# Take 200 rows as an example +df <- df[1:200, ] + +# Renaming columns and adding index column +df$score1 <- df$psofa.score +df$score2 <- df$pelod.score +df$index <- rownames(df) + +# Adding a small amount of random noise to each value +# NOTE: THIS WOULD BE USED FOR ADDING SOME NOISE TO MAKE THE DATA MORE ANONYMOUS FOR PUBLISHING AN EXAMPLE FIGURE +set.seed(123) # Setting a seed to ensure reproducibility +df$score1 <- df$score1 + rnorm(nrow(df), mean = 0, sd = 1) +df$score2 <- df$score2 + rnorm(nrow(df), mean = 0, sd = 1) + +# Calculate necessary statistics: the average and the difference of score1 and score2. +df$means <- (df$score1 + df$score2) / 2 +df$diffs <- df$score1 - df$score2 + +# Compute repeatability coefficients for each method separately +repeatability_score1 <- sd(df$score1, na.rm = TRUE) +repeatability_score2 <- sd(df$score2, na.rm = TRUE) + +# Compute correlation to check independence of repeatability from the size of the measurements +cor_test <- cor.test(df$score1, df$score2, method = "pearson") + +# Average difference (aka the bias) +bias <- mean(df$diffs, na.rm = TRUE) + +# Sample standard deviation +sd <- sd(df$diffs, na.rm = TRUE) + +# Limits of agreement +upper_loa <- bias + 1.96 * sd +lower_loa <- bias - 1.96 * sd + +# Additional statistics for confidence intervals +n <- nrow(df) +conf_int <- 0.95 +t1 <- qt((1 - conf_int) / 2, df = n - 1) +t2 <- qt((conf_int + 1) / 2, df = n - 1) +var <- sd^2 +se_bias <- sqrt(var / n) +se_loas <- sqrt(3 * var / n) +upper_loa_ci_lower <- upper_loa + t1 * se_loas +upper_loa_ci_upper <- upper_loa + t2 * se_loas +bias_ci_lower <- bias + t1 * se_bias +bias_ci_upper <- bias + t2 * se_bias +lower_loa_ci_lower <- lower_loa + t1 * se_loas +lower_loa_ci_upper <- lower_loa + t2 * se_loas + +# Create Bland-Altman plot +p1 <- df |> + ggplot(aes(x = means, y = 
diffs)) +
+  geom_point(color = "red", alpha = .5) +
+  ggtitle("Bland-Altman plot for PELOD and pSOFA scores") +
+  ylab("Difference between\ntwo scores") +
+  xlab("Average of two scores") +
+  theme_bw() +
+  geom_hline(yintercept = c(lower_loa, bias, upper_loa), linetype = "solid") +
+  geom_hline(yintercept = c(lower_loa_ci_lower, lower_loa_ci_upper, upper_loa_ci_lower, upper_loa_ci_upper), linetype = "dotted") +
+  geom_hline(yintercept = c(bias_ci_lower, bias_ci_upper), linetype = "dotted")
+
+# Create scatter plot with a line of equality
+p2 <- df |>
+  ggplot(aes(x = score1, y = score2)) +
+  geom_point(color = "red", alpha = .5) +
+  geom_abline(intercept = 0, slope = 1) +
+  theme_bw() +
+  ggtitle("PELOD and pSOFA scores per sample") +
+  ylab("pelod.score") +
+  xlab("psofa.score")
+
+# Combine plots
+patch1 <- (p2 / p1) + plot_annotation(tag_levels = 'A')
+patch1
+
+# Output the correlation test result and repeatability coefficients
+list(
+  Repeatability_Score1 = repeatability_score1,
+  Repeatability_Score2 = repeatability_score2,
+  Correlation_Test = cor_test
+)
+```
+
+## Summary of the analysis of method comparison studies
+
+In their landmark paper, D.G. Altman and J.M. Bland outline a structured approach to evaluating whether a new method of medical measurement is as good as or better than an existing one.
+The approach encapsulates several critical components, emphasizing not only statistical analyses but also the importance of effective communication, especially to non-expert audiences.
+Key points from the paper:
+
+1. **Bland-Altman Plot**
+- Introduced as a graphical method to assess the agreement between two different measurement techniques. This method involves plotting the difference between two methods against their mean, which assists in identifying any biases and analyzing the limits of agreement between the two methods.
+
+2. 
**Bias and Limits of Agreement**
+- The authors recommend calculating the bias (the mean difference between two methods) and limits of agreement (bias ± 1.96 times the standard deviation of the differences) to quantify the agreement between the two methods. A smaller bias and narrower limits of agreement generally indicate that a new method might be comparable or superior to the existing one.
+
+3. **Investigating Relationship with Measurement Magnitude**
+- Encourages the investigation of whether the differences between the methods relate to the measurement's magnitude. Depending on the observed association, transformations or regression approaches might be needed to correct for it.
+
+4. **Repeatability**
+- **Assessment**: It's crucial to assess repeatability for each method separately using replicated measurements on a sample of subjects. This analysis derives from the within-subject standard deviation of the replicates.
+- **Graphical Methods and Correlation Tests**: Apart from Bland-Altman plots, graphical methods (like plotting standard deviation against the mean) and correlation coefficient tests are suggested for examining the independence of repeatability from the size of measurements.
+- **Potential Influences**: It highlights the possible influences on measurements, such as observer variability, time of day, or the position of the subject, and differentiates between repeatability and reproducibility (agreement of results under different conditions).
+
+5. **Comparison of Methods**
+- The core emphasis is on directly comparing results obtained by different methods to determine if one can replace another without compromising accuracy for the intended purpose of the measurement. Initial data plotting is encouraged, ideally plotting the difference between methods against the average of the methods, providing insight into disagreement, outliers, and potential trends. 
Testing for independence between method differences and the size of measurements is necessary, as it can influence the analysis and interpretation of bias and error. + +6. **Addressing Alternative Analyses** +- The paper discusses alternative approaches like least squares regression, principal component analysis, and regression models with errors in both variables, but finds them to generally add complexity without necessarily improving the simple comparison intended. + +7. **Effective Communication** +- The authors emphasize the importance of communicating results effectively to non-experts, such as clinicians, to facilitate practical application of the findings. + +8. **Challenges in Method Comparison Studies** +- The paper highlights the challenges faced in method comparison studies, primarily due to the lack of professional statistical expertise and reliance on incorrect methods replicated from existing literature. It calls for improved awareness among statisticians about this issue and encourages journals to foster the use of appropriate techniques through peer review. + +Thus, we can perform an objective evaluation of whether a new measurement method is as good or potentially better than an existing one by assessing agreement, bias, and repeatability, among other factors. + diff --git a/pages/altman_bland_ci_from_p.md b/pages/altman_bland_ci_from_p.md new file mode 100644 index 0000000..b538f24 --- /dev/null +++ b/pages/altman_bland_ci_from_p.md @@ -0,0 +1,65 @@ + +nav_order: 5 +math: mathjax + + + + + + + + + + + + + +Confidence intervals (CIs) are widely used in reporting statistical analyses of research data, and are usually considered to be more informative than P values from significance tests.1 2 Some published articles, however, report estimated effects and P values, but do not give CIs (a practice BMJ now strongly discourages). Here we show how to obtain the confidence interval when only the observed effect and the P value were reported. 
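The whole recipe is short enough to script. Here is a minimal R sketch of the steps given in the box below (the function name and argument names are mine); the two worked examples that follow are reproduced in the comments:

```R
# Recover a 95% CI from an estimate (Est) and a two-sided P value.
# For ratio measures (risk ratio, odds ratio, hazard ratio), work on the log scale.
ci_from_p <- function(est, p, ratio = FALSE) {
  z  <- -0.862 + sqrt(0.743 - 2.404 * log(p))  # test statistic z from P
  e  <- if (ratio) log(est) else est           # log scale for ratio measures
  se <- abs(e / z)                             # standard error (ignoring signs)
  ci <- c(e - 1.96 * se, e + 1.96 * se)
  if (ratio) exp(ci) else ci                   # back-transform ratios
}

ci_from_p(17, 0.032)                  # zinc trial example: about 1.4 to 32.6
ci_from_p(0.30, 0.034, ratio = TRUE)  # relative risk example: about 0.10 to 0.91
```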
+ +The method is outlined in the box below in which we have distinguished two cases. + +Steps to obtain the confidence interval (CI) for an estimate of effect from the P value and the estimate (Est) +(a) CI for a difference + 1 calculate the test statistic for a normal distribution test, z, from P3: z = −0.862 + √[0.743 − 2.404×log(P)] + 2 calculate the standard error: SE = Est/z (ignoring minus signs) + 3 calculate the 95% CI: Est –1.96×SE to Est + 1.96×SE. +(b) CI for a ratio + For a ratio measure, such as a risk ratio, the above formulas should be used with the estimate Est on the log scale (eg, the log risk ratio). Step 3 gives a CI on the log scale; to derive the CI on the natural scale we need to exponentiate (antilog) Est and its CI.4 +Notes + +All P values are two sided. + +All logarithms are natural (ie, to base e). 4 + +For a 90% CI, we replace 1.96 by 1.65; for a 99% CI we use 2.57. + +(a) Calculating the confidence interval for a difference +We consider first the analysis comparing two proportions or two means, such as in a randomised trial with a binary outcome or a measurement such as blood pressure. + +For example, the abstract of a report of a randomised trial included the statement that “more patients in the zinc group than in the control group recovered by two days (49% v 32%, P=0.032).”5 The difference in proportions was Est = 17 percentage points, but what is the 95% confidence interval (CI)? + +Following the steps in the box we calculate the CI as follows: + + z = –0.862+ √[0.743 – 2.404×log(0.032)] = 2.141; + SE = 17/2.141 = 7.940, so that 1.96×SE = 15.56 percentage points; + 95% CI is 17.0 – 15.56 to 17.0 + 15.56, or 1.4 to 32.6 percentage points. +(b) Calculating the confidence interval for a ratio (log transformation needed) +The calculation is trickier for ratio measures, such as risk ratio, odds ratio, and hazard ratio. 
We need to log transform the estimate and then reverse the procedure, as described in a previous Statistics Note.6 + +For example, the abstract of a report of a cohort study includes the statement that “In those with a [diastolic blood pressure] reading of 95-99 mm Hg the relative risk was 0.30 (P=0.034).”7 What is the confidence interval around 0.30? + +Following the steps in the box we calculate the CI as follows: + + z = –0.862+ √[0.743 – 2.404×log(0.034)] = 2.117; + Est = log (0.30) = −1.204; + SE = −1.204/2.117 = −0.569 but we ignore the minus sign, so SE = 0.569, and 1.96×SE = 1.115; + 95% CI on log scale = −1.204 − 1.115 to −1.204 + 1.115 = −2.319 to −0.089; + 95% CI on natural scale = exp (−2.319) = 0.10 to exp (−0.089) = 0.91. + Hence the relative risk is estimated to be 0.30 with 95% CI 0.10 to 0.91. +Limitations of the method +The methods described can be applied in a wide range of settings, including the results from meta-analysis and regression analyses. The main context where they are not correct is in small samples where the outcome is continuous and the analysis has been done by a t test or analysis of variance, or the outcome is dichotomous and an exact method has been used for the confidence interval. However, even here the methods will be approximately correct in larger studies with, say, 60 patients or more. + +P values presented as inequalities +Sometimes P values are very small and so are presented as P<0.0001 or something similar. The above method can be applied for small P values, setting P equal to the value it is less than, but the z statistic will be too small, hence the standard error will be too large and the resulting CI will be too wide. This is not a problem so long as we remember that the estimate is better than the interval suggests. + +When we are told that P>0.05 or the difference is not significant, things are more difficult. If we apply the method described here, using P=0.05, the confidence interval will be too narrow. 
We must remember that the estimate is even poorer than the confidence interval calculated would suggest.
diff --git a/pages/altman_bland_correlation.md b/pages/altman_bland_correlation.md
new file mode 100644
index 0000000..311ad3c
--- /dev/null
+++ b/pages/altman_bland_correlation.md
@@ -0,0 +1,259 @@
+---
+nav_order: 5
+math: mathjax
+layout: default
+title: Stats Correlation, regression and repeated data
+created: 29 June 2021
+---
+{{ page.title }}
+================
+
+* TOC
+{:toc}
+
+## Introduction
+
+This topic is introduced as the first paper
+bland1994correlation
+in a series of BMJ statistical notes by Altman & Bland
+([ lit-altman_bland.md ]( https://github.com/DylanLawless/notes/blob/main/202106291417-lit-altman_bland.md )):
+1\. Bland JM, Altman DG. (1994) Correlation, regression and repeated data. 308, 896.
+[1](#f1)
+
+It concerns the analysis of paired data where there is more than one observation per subject.
+They point out that it could be highly misleading to analyse such data by combining repeated observations from several subjects and then calculating the correlation coefficient as if the data were a simple sample.
+
+Many researchers would assume that it is acceptable to gather repeated measurements for individuals and put all the data together.
+
+They use simulated data showing five pairs of measurements of two uncorrelated variables X and Y for subjects 1, 2, 3, 4, and 5.
+Using each subject's mean values, they show a correlation coefficient of r=-0.67, df=3, P=0.22.
+However, when they put all 25 observations together they get r=-0.47, df=23, P=0.02.
+When the calculation is performed as if there were 25 subjects, the number of degrees of freedom for the significance test is increased incorrectly and a spurious significant difference is produced.
+This demonstrates that one should not mix observations from different subjects indiscriminately, whether using correlation or the closely related regression analysis. 
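This pitfall is easy to reproduce in R with made-up data (my own simulation, not the authors' simulated values): five subjects, five repeated measurements each, with X and Y generated independently.

```R
# Made-up illustration: 5 subjects, 5 repeated measurements each, X and Y unrelated.
set.seed(42)
subject <- rep(1:5, each = 5)
x <- rnorm(5)[subject] + rnorm(25, sd = 0.2)  # subject mean + within-subject noise
y <- rnorm(5)[subject] + rnorm(25, sd = 0.2)

# Correct analysis: one pair of subject means per subject, so df = n - 2 = 3
xm <- tapply(x, subject, mean)
ym <- tapply(y, subject, mean)
cor.test(xm, ym)$parameter   # df = 3

# Misleading analysis: pooling all 25 observations inflates the df to 23,
# which can manufacture a spuriously significant correlation
cor.test(x, y)$parameter     # df = 23
```

Whatever r values the noise happens to produce, the degrees of freedom show the problem: the pooled test behaves as if there were 25 independent subjects.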
+
+## Correlation within subjects, part 1
+
+The methods to use in these circumstances are later discussed in another note for the BMJ series
+bland1995statistics,
+number 11 on the list [ lit-altman_bland.md ]( https://github.com/DylanLawless/notes/blob/main/202106291417-lit-altman_bland.md ):
+11\. Bland JM, Altman DG. (1995) Calculating correlation coefficients with repeated observations: Part 1, correlation within subjects. 310, 446.
+
+_Notes: I am replacing their terms for my notes:_
+* _X = Paco2_
+* _Y = pHi_
+
+In this note they show an example table using 8 subjects with _3-9 observations_ for X and Y (**table I**):
+
+| Subject | Y | X |
+|:--------|:---:|:---:|
+| 1 | 6.68 | 3.97 |
+| 1 | 6.53 | 4.12 |
+| ... | ... | ... |
+
+* If a subject's average Y is related to the subject's average X
+  - We can use the correlation between the subject means, which they describe in a subsequent note.
+* If an increase in Y within the individual was associated with an increase in X
+  - We want to remove the differences between subjects and look only at changes within.
+
+To look at variation within the subject we can use multiple regression.
+
+* Make one variable, X or Y, the outcome variable, and use the other variable plus the subject as the predictor variables.
+* The subject is treated as a categorical factor using dummy variables and so has seven degrees of freedom.
+
+* Using an analysis of variance table for the regression (table II) shows how the variability in Y can be partitioned into components due to different sources.
+  - Also known as analysis of covariance
+  - Equivalent to fitting parallel lines through each subject's data (**Figure 1.**)
+
+ +| Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F) | Probability | +|:--------------------|-------------------:|---------------:|------------:|-------------------:|------------:| +| Subjects | 7 | 2.9661 | 0.4237 | 48.3 | $$<$$0.0001 | +| X | 1 | 0.1153 | 0.1153 | 13.1 | 0.0008 | +| Residual | 38 | 0.3337 | 0.0088 | | | +| Total | 46 | 3.3139 | 0.0720 | | | + +
+
+Table II. Analysis of variance for the data in table I (as shown in Altman & Bland). At the end of this page, this table is reproduced based on the original data and new R code, as shown. (Note that there is a slight variation between the published version and my replicated version of table II and Figure 1 below; probably due to a minor data entry error by the publisher or authors).
+
+* The residual sum of squares in **table II** represents the variation about the regression lines.
+* This removes the variation due to subjects (and any other nuisance variables which might be present) and expresses the variation in Y due to X as a proportion of what's left:
+  - (Sum of squares for X)/(Sum of squares for X + residual sum of squares).
+* The magnitude of the correlation coefficient within subjects is the square root of this proportion.
+  - For **table II** this is: $$\sqrt{ \frac{0.1153}{0.1153+0.3337} } = 0.51 $$
+  - The sign of the correlation coefficient is given by the sign of the regression coefficient for X.
+* The regression slope is -0.108,
+* so the correlation coefficient within subjects is -0.51.
+* The P value is found either from:
+  - the F test in the associated analysis of variance table, or
+  - the t test for the regression slope.
+  - It doesn't matter which variable we regress on which;
+  - we get the same correlation coefficient and P value either way.
+
+Incorrectly calculating the correlation coefficient
+by ignoring the fact that we have 47 observations on only 8 subjects would produce r=-0.07, P=0.7.
+
+
+**Figure 1.** Recreation of "(Y) pH against (X) PaCO2 for eight subjects, with parallel lines fitted for each subject" as used in
+bland1995statistics.
+Interestingly, replotting this data shows that their figure was not fully accurate (forgivable before the days of RStudio in 1995, and not important for this example). 
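The arithmetic above can be checked directly from the Table II components (a small sketch):

```R
# Within-subject correlation from the analysis of variance components (Table II).
ss_x   <- 0.1153  # sum of squares for X
ss_res <- 0.3337  # residual sum of squares
slope  <- -0.108  # regression coefficient for X, which supplies the sign
r_within <- sign(slope) * sqrt(ss_x / (ss_x + ss_res))
round(r_within, 2)  # -0.51
```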
+
+## Correlation within subjects, part 2
+
+The second part shows how to find the correlation between the subject means
+bland1995calculating,
+number 12 on the list [ lit-altman_bland.md ]( https://github.com/DylanLawless/notes/blob/main/202106291417-lit-altman_bland.md ):
+12\. Bland JM, Altman DG. (1995) Calculating correlation coefficients with repeated observations: Part 2, correlation between subjects. 310, 633.
+
+In this note they show the example table using the same 8 subjects with _one mean observation_ for X and Y:
+
+|Subject | Y | X | Number |
+|:------:|:---:|:----:|:------:|
+| 1 | 6.49 | 4.04 | 4 |
+| 2 | 7.05 | 5.37 | 4 |
+| 3 | 7.36 | 4.83 | 9 |
+| 4 | 7.33 | 5.31 | 5 |
+| 5 | 7.31 | 4.40 | 8 |
+| 6 | 7.32 | 4.92 | 6 |
+| 7 | 6.91 | 6.60 | 3 |
+| 8 | 7.12 | 4.78 | 8 |
+
+* If a subject's average Y is related to the subject's average X
+  - We can use the correlation between the subject means.
+
+They calculate the usual correlation coefficient for the mean Y and mean X; r=0.09, P=0.8.
+This does not take into account the different numbers of measurements on each subject.
+
+* Does this matter? It depends on:
+  - how different the numbers of observations are,
+  - whether the measurements within subjects vary much compared with the means between subjects.
+
+We can calculate a weighted correlation coefficient using the number of observations as weights.
+Many computer programs will calculate this, but it is not difficult to do by hand.
+
+* They denote the mean X and Y for subject i by $$\bar{x}_i$$ and $$\bar{y}_i$$,
+* the number of observations for subject i by $$m_i$$,
+* and the number of subjects by $$n$$.
+* The weighted mean of the $$\bar{x}_i$$ is
+$$\frac{ \sum{ m_i \bar{x}_i } }{ \sum{ m_i } }$$
+
+
+In the usual case, where there is one observation per subject, the $$m_i$$
+are all one and this formula gives the usual mean
+$$\frac{ \sum{\bar{x}_i} }{n} $$. 
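Using the table's means and counts, both the weighted mean and the weighted correlation can be computed in a couple of lines of R (a sketch; `rep()` implements the expansion trick that the note describes next):

```R
# Subject means and numbers of observations from the table above.
ybar <- c(6.49, 7.05, 7.36, 7.33, 7.31, 7.32, 6.91, 7.12)
xbar <- c(4.04, 5.37, 4.83, 5.31, 4.40, 4.92, 6.60, 4.78)
m    <- c(4, 4, 9, 5, 8, 6, 3, 8)

weighted.mean(xbar, m)  # sum(m_i * xbar_i) / sum(m_i)

# Weighted correlation: repeat each subject mean m_i times, then use cor().
r_w <- cor(rep(xbar, m), rep(ybar, m))
round(r_w, 2)  # 0.08 -- but assess P with n = 8 subjects, not 47 observations
```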
+
+
+An easy way to calculate the weighted correlation coefficient is to replace each individual observation by its subject mean.
+Thus the table would yield 47 pairs of observations,
+the first four of which would each be Y (pHi) = 6.49 and X (Paco2) = 4.04, and so on.
+
+If we use the usual formula for the correlation coefficient on the expanded data we will get the weighted correlation coefficient.
+However, we must be careful when it comes to the P value.
+We have only 8 observations (n in general), not 47.
+We should ignore any P value printed by our computer program, and use a statistical table instead.
+
+The formula for a weighted correlation coefficient is:
+
+$$
+\frac{
+\sum{m_i \bar{x}_i \bar{y}_i}
+-
+\sum{m_i \bar{x}_i} \sum{m_i \bar{y}_i}
+\mathbin{/}
+\sum{m_i}
+}{
+\sqrt{
+  (
+  \sum{m_i \bar{x}_i^2}
+  -
+  (\sum{m_i \bar{x}_i})^2
+  \mathbin{/}
+  \sum{m_i}
+  )
+  (
+  \sum{m_i \bar{y}_i^2}
+  -
+  (\sum{m_i \bar{y}_i})^2
+  \mathbin{/}
+  \sum{m_i}
+  )}
+ }
+ $$
+
+
+where all summations are from $$i=1$$ to $$n$$.
+When all the $$m_i$$ are equal they cancel out,
+giving the usual formula for a correlation coefficient.
+
+For the data in the table the weighted correlation coefficient is r=0.08, P=0.9.
+There is no evidence that subjects with a high Y also have a high X.
+However, as they have already shown in part 1,
+within the subject a rise in Y was associated with a fall in X.
+
+
+``` R
+## Code and raw data for Table I, Analysis of variance table II, and Figure 1
+df <- data.frame (
+  Subject = c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3", "3", "3", "3", "3", "3", "4", "4", "4", "4", "4", "5", "5", "5", "5", "5", "5", "5", "5", "6", "6", "6", "6", "6", "6", "7", "7", "7", "8", "8", "8", "8", "8", "8", "8", "8"),
+
+  Y = c(6.68, 6.53, 6.43, 6.33, 6.85, 7.06, 7.13, 7.17, 7.4, 7.42, 7.41, 7.37, 7.34, 7.35, 7.28, 7.3, 7.34, 7.36, 7.33, 7.29, 7.3, 7.35, 7.35, 7.3, 7.3, 7.37, 7.27, 7.28, 7.32, 7.32, 7.38, 7.3, 7.29, 7.33, 7.31, 7.33, 6.86, 6.94, 6.92, 7.19, 7.29, 7.21, 7.25, 7.2, 7.19, 6.77, 6.82),
+
+  X = c(3.97, 4.12, 4.09, 3.97, 5.27, 5.37, 5.41, 5.44, 5.67, 3.64, 4.32, 4.73, 4.96, 5.04, 5.22, 4.82, 5.07, 5.67, 5.1, 5.53, 4.75, 5.51, 4.28, 4.44, 4.32, 3.23, 4.46, 4.72, 4.75, 4.99, 4.78, 4.73, 5.12, 4.93, 5.03, 4.93, 6.85, 6.44, 6.52, 5.28, 4.56, 4.34, 4.32, 4.41, 3.69, 6.09, 5.58)
+)
+
+# Run the analysis of variance with multiple predictor variables
+fit <- aov(Y ~ Subject + X, data = df) # fits the ANOVA
+ls(fit)       # lists the items stored by the fit
+summary(fit)  # gives the basic ANOVA output
+
+# Output the column totals to match the Altman & Bland table.
+# Use a new name here: overwriting df would break the Figure 1 code below.
+aov_sums <- as.data.frame(unlist( summary(fit) ))
+
+sum(aov_sums[1:3,]) # Total Degrees of freedom
+sum(aov_sums[4:6,]) # Total Sum Sq
+sum(aov_sums[7:9,]) # Total Mean Sq
+
+```
+
+
+| | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F) | Probability |
+|------------|-----|-----------|------------|-----------|---------|
+| Subjects | 7 | 2.8648 | 0.4093 | 46.60 | < 2e-16 |
+| X | 1 | 0.1153 | 0.1153 | 13.13 | 0.000847 |
+| Residuals | 38 | 0.3337 | 0.0088 | | |
+
+
+
+Replicated version of Table II. Analysis of variance for the data in table I.
+Default R output headings modified:
+Degrees of freedom (Df),
+Sum of squares (Sum Sq),
+Mean square (Mean Sq),
+Variance ratio F (F value),
+Probability (Pr(>F)).
+(Repeated note: there is a slight variation between the published version and my replicated version of table II and Figure 1; probably due to a minor data entry error by the publisher or authors).
+
+``` R
+# Code used to produce Figure 1.
+
+require(ggplot2)
+ggplot(df, aes(x = X, y = Y, group = Subject, color = Subject) ) +
+  geom_point() +
+  geom_smooth(aes(color = Subject), method = "lm", formula = y ~ x, se = FALSE)
+
+# The dataset is cited by Bland & Altman 1995 as: "Boyd O, Mackay CJ, Lamb G, Bland JM, Grounds RM, Bennett ED. Comparison of clinical information gained from routine blood-gas analysis and from gastric tonometry for intramural pH. Lancet 1993;341:142-6."
+
+```
+
+## References
+
+
+**Footnote**
+1 This article is almost identical to the original version in acknowledgment to Altman and Bland. It is adapted here as part of a set of curated, consistent, and minimal examples of statistics required for human genomic analysis.
+[↩](#a1)
diff --git a/pages/altman_bland_cp.sh b/pages/altman_bland_cp.sh
new file mode 100644
index 0000000..a0242ed
--- /dev/null
+++ b/pages/altman_bland_cp.sh
@@ -0,0 +1,17 @@
+#!/bin/bash
+# nav_order: 5
+# math: mathjax
+
+# cp ~/web/notes/202*-perm-altman_bland*md ./
+
+name=test
+file=202107191052-lit-sensitivity_specificity.md
+
+cat ./.altman_bland_head > altman_bland_$name.md
+
+printf "[1](#f1)" >> ./altman_bland_$name.md
+
+cat ~/web/notes/$file >> ./altman_bland_$name.md
+
+
+printf "**Footnote**\n1 This article is almost identical to the original version in acknowledgment to Altman and Bland. 
It is adapted here as part of a set of curated, consistent, and minimal examples of statistics required for human genomic analysis.\n[↩](#a1)" >> ./altman_bland_$name.md diff --git a/pages/altman_bland_odds_ratios.md b/pages/altman_bland_odds_ratios.md new file mode 100644 index 0000000..2bba7de --- /dev/null +++ b/pages/altman_bland_odds_ratios.md @@ -0,0 +1,145 @@ +--- +nav_order: 5 +math: mathjax +layout: default +title: Stats Odds ratios, SE & CI +created: 4 July 2021 +--- +{{ page.title }} +================ + +* TOC +{:toc} + +## Introduction + +Altman & Bland review the use of odds ratio (OR), standard error (SE), and confidence interval (CI) with some examples in +bland2000odds; +the 42nd paper on the list of statistical notes in their BMJ series, +([ lit-altman_bland.md ]( https://github.com/DylanLawless/notes/blob/main/202106291417-lit-altman_bland.md )): +42\. Bland JM, Altman DG. (2000) The odds ratio. 320, 1468. +[1](#f1) + +In reproducing their examples I use $$X$$ and $$Y$$; +* _X = Eczema_ +* _Y = Hay fever_ + +As an example dataset, they cite the following table; Association between hay fever (Y) and eczema (X) in 11 year old children. + +| | Y Yes | Y No | Y Total | +|:---|---:|---:|---:| +| X Yes | 141 | 420 | 561 | +| X No | 928 | 13 525 | 14 453 | +| X Total | 1069 | 13 945 | 15 522 | + +## Odds ratio + +### Perspective 1 +The _probability_ that a child with X will also have Y +is estimated by the proportion $$\dfrac{141}{561}$$ (25.1%) +and _odds_ is estimated by $$\dfrac{141}{420}$$. + +Similarly, for children without X the _probability_ of having Y is estimated by $$\dfrac{928}{14 453}$$ (6.4%) +and the odds is $$\dfrac{928}{13 525}$$. + +They compare the groups in several ways: +* By the difference between the proportions +$$\dfrac{141}{561} - \dfrac{928}{14 453} = 0.187$$ (or 18.7 percentage points). 
+
+* The ratio of the proportions (also called the relative risk)
+$$\dfrac{ \left(\dfrac{141}{561}\right) }{ \left(\dfrac{928}{14 453}\right) } = 3.91 $$.
+* The OR
+$$\dfrac{ \left(\dfrac{141}{420}\right) }{ \left(\dfrac{928}{13 525}\right) } = 4.89$$.
+
+### Perspective 2
+Looking at the table the other way round,
+what is the _probability_ that a child with Y will also have X?
+
+The _proportion_ is $$\dfrac{141}{1069}$$ (13.2%)
+and the _odds_ is $$\dfrac{141}{928}$$.
+
+For a child without Y, the _proportion_ with X is
+$$\dfrac{420}{13 945}$$ (3.0%)
+and the _odds_ is $$\dfrac{420}{13 525}$$.
+
+Comparing the proportions this way,
+* The difference is
+$$\dfrac{141}{1069} - \dfrac{420}{13 945} = 0.102$$ (or 10.2 percentage points);
+
+* The ratio (relative risk)
+$$\dfrac{ \left(\dfrac{141}{1069}\right) }{ \left(\dfrac{420}{13 945}\right) } = 4.38$$;
+* The OR
+$$\dfrac{ \left(\dfrac{141}{928}\right) }{ \left(\dfrac{420}{13 525}\right) } = 4.89$$.
+
+The OR is the same whichever way round we look at the table,
+but the difference and ratio of proportions are not.
+This is because the two ORs are
+
+$$\dfrac{ 141 / \textbf{420} }{ \textbf{928} / 13 525 }$$ and
+$$\dfrac{ 141 / \textbf{928} }{ \textbf{420} / 13 525 }$$
+which can both be rearranged to give
+$$\dfrac{ 141 \times 13 525 }{ 928 \times 420 }$$.
+
+Swapping the order of both rows and columns produces the same OR.
+
+Swapping the order of either only the rows or only the columns produces the reciprocal of the OR, $$1/4.89 = 0.204$$.
+
+Thus, the OR can indicate the strength of the relationship.
+The OR cannot be negative but is not limited in the positive direction, producing a skewed distribution.
+Reversing the order of categories for one variable simply reverses the sign of the log OR:
+
+$$log(4.89) = 1.59$$,
+
+$$log(0.204) = - 1.59$$.
+
+## Standard error
+The standard error (SE) can be calculated for the log OR and hence a confidence interval (CI). 
+ +The SE of log OR is simply estimated by the square root of the sum of the reciprocals of the four frequencies. +For the example, + +
+
+$$\text{SE}(\log \text{OR}) =
+\sqrt{
+\frac{1}{141} +
+\frac{1}{420} +
+\frac{1}{928} +
+\frac{1}{13 525}}
+=
+0.103
+$$
+
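The worked arithmetic above can be reproduced directly; a minimal R sketch using the counts from the two by two table:

```R
# Counts from the two by two table: eczema (X) by hay fever (Y)
a <- 141    # X yes, Y yes
b <- 420    # X yes, Y no
c <- 928    # X no,  Y yes
d <- 13525  # X no,  Y no

or <- (a / b) / (c / d)                   # odds ratio, ~4.89
se_log_or <- sqrt(1/a + 1/b + 1/c + 1/d)  # SE of log OR, ~0.103
```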
+
+## Confidence interval
+
+A 95% confidence interval (CI) for the log OR is obtained as 1.96 standard errors on either side of the estimate.
+
+For the example,
+the log OR is
+$$log_{e} (4.89) = 1.588$$
+and the confidence interval is
+$$ 1.588 \pm 1.96 \times 0.103 $$,
+which gives $$1.386$$ to $$1.790$$.
+
+Taking antilogs of these limits gives a 95% CI for the OR itself:
+$$ exp(1.386) = 4.00 $$ to
+$$ exp(1.790) = 5.99 $$.
+
+The observed OR, 4.89,
+is not in the centre of the confidence interval because of the asymmetrical nature of the OR scale.
+For this reason, ORs are often plotted in graphs using a logarithmic scale.
+The OR is 1 when there is no relationship.
+We can test the null hypothesis that the OR is 1 by the usual
+$${\chi}^2$$ test for a two by two table.
+
+Despite their usefulness, ORs can cause difficulties in interpretation.
+Altman & Bland review this debate and also discuss ORs in logistic regression and case-control studies in future Statistics Notes.
+
+## References
+
+
+**Footnote**
+1 This article is almost identical to the original version in acknowledgment to Altman and Bland. It is adapted here as part of a set of curated, consistent, and minimal examples of statistics required for human genomic analysis.
+[↩](#a1)
diff --git a/pages/altman_bland_roc_curve.md b/pages/altman_bland_roc_curve.md
new file mode 100644
index 0000000..ec96c9c
--- /dev/null
+++ b/pages/altman_bland_roc_curve.md
@@ -0,0 +1,135 @@
+---
+nav_order: 5
+math: mathjax
+layout: default
+title: Stats Receiver operating characteristic plots
+created: 2021-07-16
+---
+{{ page.title }}
+================
+
+* TOC
+{:toc}
+
+## Receiver operating characteristic plots
+This article covers the fifth paper in the series of statistics notes altman1994diagnostic
+([ lit-altman_bland.md ]( https://github.com/DylanLawless/notes/blob/main/202106291417-lit-altman_bland.md )): 5. Altman DG, Bland JM. (1994) Diagnostic tests 3: receiver operating characteristic plots.
309, 188,
+and concerns _quantitative diagnostic tests_.
+[1](#f1)
+Diagnoses based on yes or no answers are covered in another note by Bland and Altman.
+The same statistical methods for quantifying yes or no answers can be applied here when there is a cut-off threshold for defining _normal_ and _abnormal_ test results.
+For simplicity, I will call someone who is diagnosed by a clinical test a "_case_" and someone who is not diagnosed by a test (healthy/normal) a "_control_".
+These terms are imprecise but much simpler to read repeatedly than "people who are diagnosed by a test".
+
+The receiver operating characteristic (ROC) plot can be used to measure how test results compare between cases and controls.
+Altman and Bland mention that this method was developed in the 1950s for evaluating radar signal detection.
+An aside for history buffs, from [wikipedia](https://en.wikipedia.org/wiki/Receiver_operating_characteristic):
+> The ROC curve was first used during World War II for the analysis of radar signals before it was employed in signal detection theory.[56] Following the attack on Pearl Harbor in 1941, the United States army began new research to increase the prediction of correctly detected Japanese aircraft from their radar signals. For these purposes they measured the ability of a radar receiver operator to make these important distinctions, which was called the Receiver Operating Characteristic.[57]
+
+
+
+The example shown in Figure 1 uses graft versus host disease, with an index measurement whose definition is not important.
+In our terminology, _Yes_ indicates _cases_ and _No_ indicates _controls_.
+The usefulness of the test for predicting graft versus host disease will clearly relate to the degree of non-overlap between the two distributions.
+A ROC plot is obtained by calculating the
+* sensitivity and
+* specificity
+of every observed data value and plotting, as in Figure 2,
+* Y axis = sensitivity,
+* X axis = 1 - specificity.
+
+A test that perfectly separates cases and controls would have a curve that follows the Y axis and then the top of the plot.
+A test that does not work would produce a straight line along the diagonal.
+In practice, overlaps always occur such that the curve usually lies somewhere in between, as shown in Figure 2.
+
+The performance of the test (diagnostic accuracy) is reported as the _area under the ROC curve_.
+The area is equal to the probability that a random case has a higher measurement than that of a control.
+This probability is 0.5 for a test that does not work (e.g. a coin toss; straight-line curve).
+This assessment of discriminatory power is important if a clinical test is to be sufficient to discriminate cases and controls.
+
+At this stage we have the global assessment of discriminatory power showing that a test can divide cases and controls.
+A cut-off for clinical use also requires a local assessment.
+As per Altman and Bland, the simple approach of minimising "errors" (equivalent to maximising the sum of the sensitivity and specificity) is not necessarily best.
+We must consider the costs of
+* false negatives,
+* false positives,
+* and the prevalence of disease in the test cohort.
+
+In their example:
+* cancer in the general population
+ - most cases should be detected (high sensitivity)
+ - many false positives (low specificity), who could then be eliminated by a further test.
+
+The ROC plot is useful for comparing two or more measures.
+A curve that lies wholly above another clearly indicates the better test.
+Altman and Bland cite a review of methods for comparing the areas under two curves for both paired and unpaired data.
+
+In my (recommended) pocket-sized copy of the
+_Oxford handbook of medical statistics_
+peacock2011oxford,
+a clinical example uses a chosen cut-off with sensitivity $$>81\%$$ and specificity $$28\%$$.
+The area under the ROC curve was 0.65, thus a moderately high predictive power.
+The accuracy (the proportion of all subjects correctly classified) was
+$$\frac{ 30 + 42 }{ 185 } = 39\%$$
+
+ +$$\frac{\text{No. cases above cutoff} + \text{No. controls below cutoff }}{ \text{cohort total} }$$ + +
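Applying this formula to the handbook's counts (a quick check, assuming 30 cases above and 42 controls below the cut-off in a cohort of 185):

```R
# Accuracy: correctly identified cases and controls over the cohort total
accuracy <- (30 + 42) / 185
round(100 * accuracy)  # ~39 percent
```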
+
+## Example ROC curve
+To implement this method, I include here an example in R code.
+
+
+
+```R
+# Modified example from https://stackoverflow.com/questions/31138751/roc-curve-from-training-data-in-caret
+
+library(caret)
+library(mlbench)
+
+# Dataset
+data(Sonar)
+# An example dataset for classification of sonar signals using a neural network. The task is to train a network to discriminate between sonar signals bounced off a metal cylinder and those bounced off a roughly cylindrical rock. Each pattern is a set of 60 numbers in the range 0.0 to 1.0. Each number represents the energy within a particular frequency band, integrated over a certain period of time. Labels: "R" if the object is a rock and "M" if it is a mine (metal cylinder).
+
+# Cross-validation, saving class probabilities and predictions for ROC analysis
+ctrl <- trainControl(method = "cv",
+                     summaryFunction = twoClassSummary,
+                     classProbs = TRUE,
+                     savePredictions = TRUE)
+
+rfFit <- train(Class ~ ., data = Sonar,
+               method = "rf", preProc = c("center", "scale"),
+               trControl = ctrl)
+
+library(pROC)
+# Select a parameter setting
+selectedIndices <- rfFit$pred$mtry == 2
+
+# Plot:
+plot.roc(rfFit$pred$obs[selectedIndices],
+         rfFit$pred$M[selectedIndices],
+         print.auc = TRUE,
+         print.thres = TRUE, # threshold .557 shown
+         legacy.axes = TRUE)
+
+# With ggplot; build the plot object first so calc_auc() can use it
+library(ggplot2)
+library(plotROC)
+g <- ggplot(rfFit$pred[selectedIndices, ],
+            aes(d = ifelse(obs == "M", 1, 0),
+                m = M)) +
+  geom_roc(hjust = -0.4, vjust = 1.5) +
+  coord_equal() +
+  style_roc()
+
+g + annotate("text", x = 0.75, y = 0.25,
+             label = paste("AUC =", round(calc_auc(g)$AUC, 4)))
+```
+
+## References
+
+
+**Footnote**
+1 This article is almost identical to the original version in acknowledgment to Altman and Bland. It is adapted here as part of a set of curated, consistent, and minimal examples of statistics required for human genomic analysis.
+[↩](#a1) diff --git a/pages/altman_bland_sensitivity_specificity.md b/pages/altman_bland_sensitivity_specificity.md new file mode 100644 index 0000000..148f009 --- /dev/null +++ b/pages/altman_bland_sensitivity_specificity.md @@ -0,0 +1,80 @@ +--- +nav_order: 5 +math: mathjax +layout: default +title: Stats Sensitivity and specificity +created: 19 July 2021 +--- +{{ page.title }} +================ + +* TOC +{:toc} + +## Sensitivity and specificity +The third paper on the list of BMJ statistics notes by Altman and Bland, +([ lit-altman_bland.md ]( https://github.com/DylanLawless/notes/blob/main/202106291417-lit-altman_bland.md )), +altman1994diagnostic1 3. Altman DG, Bland JM. (1994) +Diagnostic tests 1: sensitivity and specificity. 308, 1552. +[1](#f1) + +The simple diagnostic test such as an x-ray is used to classify patients into two groups: +* Presence of a symptom or sign + - Yes + - No + +Altman and Bland use the following cited example; +The results of a scan (**test**) compared to the correct diagnosis (**true positive**) based on either necropsy, biopsy, or surgical inspection. +i.e. How good is the scan for correct diagnosis? + +**Table 1.** _Relation between results of liver scan and correct diagnosis._ + +
+
+| Liver scan | Pathology abnormal (+) | Pathology normal (-) | Total |
+|:---|:---:|:---:|:---:|
+| **Abnormal (+)** | _231_ | 32 | 263 |
+| **Normal (-)** | 27 | _54_ | 81 |
+| **Total** | 258 | 86 | 344 |
+
+_Italicised counts are patients correctly classified by the scan._
+
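Sensitivity and specificity, as derived in the text below, can be computed directly from Table 1; a minimal R sketch:

```R
# Table 1 counts: rows = liver scan result, columns = pathology
tab <- matrix(c(231, 32,
                 27, 54),
              nrow = 2, byrow = TRUE,
              dimnames = list(scan = c("abnormal", "normal"),
                              pathology = c("abnormal", "normal")))

sensitivity <- tab["abnormal", "abnormal"] / sum(tab[, "abnormal"])  # 231/258
specificity <- tab["normal", "normal"] / sum(tab[, "normal"])        # 54/86

round(sensitivity, 2)  # 0.90
round(specificity, 2)  # 0.63
```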
+
+Patients grouped by the correct (pathological) diagnosis are:
+* Abnormal pathology (disease present)
+ - 258 patients
+* Normal pathology (disease absent)
+ - 86 patients
+
+The proportions of these two groups that were correctly diagnosed by the scan were $$231/258=0.90$$ and $$54/86=0.63$$, respectively.
+
+* Sensitivity
+ - Proportion of patients **with the disease** who are **correctly identified** by the test (the true positive rate).
+
+* Specificity
+ - Proportion of patients **without the disease** who are **correctly identified** by the test (the true negative rate).
+
+Based on Altman and Bland's example sample,
+we expect 90% of _patients with abnormal pathology to have abnormal (positive) liver scans_,
+and 63% of _those with normal pathology to have normal (negative) liver scans_.
+
+## Confidence intervals
+The sensitivity and specificity are proportions, so confidence intervals can be calculated.
+This uses standard methods for proportions
+gardner1989calculating.
+
+## Quantifying the diagnostic ability
+Sensitivity and specificity are one approach to quantifying the diagnostic ability of the test.
+In this case, we already have the final results of **tests** and **diagnosis** from the sample set.
+For an individual patient we only have the **test** result.
+We want to quantify how well the test can predict true positives.
+
+This is answered in the next statistical note on _predictive values_.
+It defines _positive_ and _negative predictive values_ and requires the use of _sensitivity,_ _specificity,_ and _prevalence_.
+
+## References
+
+
+**Footnote**
+1 This article is almost identical to the original version in acknowledgment to Altman and Bland. It is adapted here as part of a set of curated, consistent, and minimal examples of statistics required for human genomic analysis.
+[↩](#a1) diff --git a/pages/lawlessgenomics_to_docs.sh b/pages/lawlessgenomics_to_docs.sh new file mode 100644 index 0000000..00164cc --- /dev/null +++ b/pages/lawlessgenomics_to_docs.sh @@ -0,0 +1,29 @@ +#!/bin/bash + +output=~/web/swisspedhealth_pipedev/docs/pages +input=~/web/DylanLawless.github.io/_topic/statistics + + +for file in $input/altman_bland_* +do + # Define the destination file path + dest_file="$output/$(basename $file)" + +# Use sed to transform the file content as required and save it to the new location +sed -e 's/layout: topic/layout: default/' \ +-e '1a\ +nav_order: 5' \ +-e '1a\ +math: mathjax' \ +-e '/tags:/d' \ +-e '/status:/d' \ +-e 's/title:/title: Stats/g' \ +-e '/subject:/d' \ +-e '/{% bibliography --cited %}/d' \ +-e 's/link images\//link assets\/images\//g' \ +-e 's/{% cite \([^%]*\) %}/\1/g' \ +$file > $dest_file +done + +cp ~/web/DylanLawless.github.io/images/altman* ~/web/swisspedhealth_pipedev/docs/assets/images/ +cp ~/web/DylanLawless.github.io/images/receiver_operating_* ~/web/swisspedhealth_pipedev/docs/assets/images/ diff --git a/src/bayesian_example1.Rmd b/src/bayesian_example1.Rmd index 8c3756d..477aecf 100644 --- a/src/bayesian_example1.Rmd +++ b/src/bayesian_example1.Rmd @@ -1,5 +1,5 @@ --- -title: "Bayesian 1 probability in placenta previa" +title: "Bayes 1 probability in placenta previa" output: md_document: variant: gfm @@ -170,4 +170,4 @@ The choice of the values for the sequence `seq(0.375, 0.525, 0.001)` in `df1` i ## Conclusion -Based on the data, the probability of a female birth given placenta previa is less than the general population's proportion. The findings are consistent despite different computational methods and prior assumptions, illustrating the power of Bayesian inference in real-world data interpretation. \ No newline at end of file +Based on the data, the probability of a female birth given placenta previa is less than the general population's proportion. 
The findings are consistent despite different computational methods and prior assumptions, illustrating the power of Bayesian inference in real-world data interpretation. diff --git a/src/bayesian_example2.Rmd b/src/bayesian_example2.Rmd index 097dd8b..249ccf5 100644 --- a/src/bayesian_example2.Rmd +++ b/src/bayesian_example2.Rmd @@ -1,5 +1,5 @@ --- -title: "Bayesian 2 probability in placenta previa" +title: "Bayes 2 probability in placenta previa" output: md_document: variant: gfm diff --git a/src/bayesian_genetic_carrier.Rmd b/src/bayesian_genetic_carrier.Rmd index eeb5b03..00ebf4d 100644 --- a/src/bayesian_genetic_carrier.Rmd +++ b/src/bayesian_genetic_carrier.Rmd @@ -1,5 +1,5 @@ --- -title: "Bayesian discrete probability example in genetics" +title: "Bayes discrete probability example in genetics" output: md_document: variant: gfm @@ -192,4 +192,4 @@ plot_bayesian_updates <- function(data) { # Plotting the combined data plot_bayesian_updates(all_data) -``` \ No newline at end of file +```