01_White_replication.Rmd

---
title: "Slowdown hypothesis"
output:
  html_document:
    df_print: paged
---

# Introduction 

The aim of this notebook is to extend and update the analyses presented in [White (2002)](https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1728-4457.2002.00059.x). 

The figures/analyses are as follows:

* Table 1: How well does a linear time trend in 21 high-income countries explain change in life expectancy at birth between 1955 and 1991? 
* Figure 1: Life expectancy at birth over time, unweighted averages of 21 high-income countries, 1955-96
* Table 2: Change in age-specific death rates between 1955 and 1991, unweighted averages for 21 high-income countries
* Figure 2: Annual change in life expectancy at birth, average of 21 high-income countries, 1955-96
* Figure 3: Relationship between life expectancy at birth in 1995 and annual change in life expectancy in 21 high-income countries, 1955-96
* Table 3: Life expectancy at birth and difference from average $e_{0}$ in 21 high-income countries, 1955 to 1995
* Figure 4: Variance between life expectancies for 21 high-income countries, 1955-96
* Figure 5: Relationship between early (1955-75) and late (1975-95) gains in life expectancy for 21 high-income countries 

All but one of the above can be replicated with the period life expectancy data alone.

## Prereqs

```{r}
pacman::p_load(
  tidyverse, HMDHFDplus, here, 
  ggrepel, kableExtra, openxlsx
)

dta_e0 <- read_rds(here("tidy_data","e0_period.rds"))

```

## First step: define countries 

The high income countries were: 

* Australia (AUS)
* Austria (AUT)
* Belgium (BEL)
* Canada (CAN)
* Denmark (DNK)
* Finland (FIN)
* France (FRATNP)
* Germany (West) (DEUTW)
* Greece (GRC)
* Ireland (IRL)
* Italy (ITA)
* Japan (JPN)
* Netherlands (NLD)
* New Zealand (NZL_NP)
* Norway (NOR)
* Portugal (PRT)
* Spain (ESP)
* Sweden (SWE)
* Switzerland (CHE)
* United Kingdom (GBR_NP)
* United States (USA)


```{r}
source(here("scripts", "country_definitions.R"))

```
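
The contents of `country_definitions.R` are not shown in this notebook. As a rough sketch, assuming the script simply stores the HMD codes listed above in a character vector, it might contain something like the following. The name `high_income_countries_sketch` is made up here to avoid clashing with the real `high_income_countries` object, and `all_distinct_countries` (used later) is left out because its contents are not given.

```r
# Sketch only: an assumed reconstruction of the vector defined in
# scripts/country_definitions.R, using the HMD codes listed above
high_income_countries_sketch <- c(
  "AUS", "AUT", "BEL", "CAN", "DNK", "FIN", "FRATNP",
  "DEUTW", "GRC", "IRL", "ITA", "JPN", "NLD", "NZL_NP",
  "NOR", "PRT", "ESP", "SWE", "CHE", "GBR_NP", "USA"
)

# White (2002) used 21 countries, so there should be 21 unique codes
stopifnot(length(unique(high_income_countries_sketch)) == 21)
```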

# Replications 

## Table 1

To do

## Figure 1

Life expectancy at birth over time, unweighted averages of 21 high-income countries, 1955-96

```{r}
fig_01_dta <- dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 1996)) %>% 
  group_by(year) %>% 
  summarise(mean_e0 = mean(e0, na.rm = T)) %>% 
  ungroup() 

fig_01_dta %>% 
  ggplot(aes(x = year, y = mean_e0)) +
  geom_point() +
  scale_x_continuous(breaks = seq(1955, 2015, by = 5)) + 
  scale_y_continuous(breaks = seq(65, 85, by = 1)) + 
  stat_smooth(method = "lm", se = F, colour = "black", linetype = "dashed") +
  labs(
    x = "Year", 
    y = "Life expectancy in years", 
    title = "Mean life expectancy of 21 high income countries from 1955 to 1996",
    subtitle = "Dashed line is linear regression trend line",
    caption = "Unweighted average of 21 countries used in White (2002), replication of figure 1"
  )

ggsave(here("figures", "pt1_white_orig.png"), height = 12, width = 20, units = "cm", dpi = 300)

fig_01_dta %>% 
  ggplot(aes(x = year, y = mean_e0)) +
  geom_point() +
  scale_x_continuous(breaks = seq(1955, 2015, by = 5)) + 
  scale_y_continuous(breaks = seq(65, 85, by = 1)) + 
  stat_smooth(method = "lm", se = F, colour = "black", linetype = "dashed") +
  labs(
    x = "Year", 
    y = "Life expectancy in years" 
  )

ggsave(here("figures", "pt1_white_orig_plain.png"), height = 12, width = 20, units = "cm", dpi = 300)


```

And what's the average improvement per year?

```{r}
fig_01_dta %>% 
  lm(mean_e0 ~ year, data = .) %>% 
  summary()

```

The slope is around 0.211 years/year, against the 0.208 given in the paper. This is close enough, given revisions to the data and to HMD protocols since the paper was written.

```{r}
fig_01_dta %>% 
  lm(mean_e0 ~ I(year-1950), data = .) %>% 
  summary()

```
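
The second model above re-centres the year variable at 1950. A small sketch with made-up numbers (not HMD data) suggests why this is harmless: shifting the predictor leaves the slope unchanged and only alters the intercept, which becomes the fitted value in 1950 rather than an extrapolation to year 0.

```r
# Toy data: a perfectly linear series rising 0.2 years per year (illustrative only)
toy <- data.frame(year = 1955:1996)
toy$mean_e0 <- 69 + 0.2 * (toy$year - 1955)

fit_raw     <- lm(mean_e0 ~ year, data = toy)
fit_centred <- lm(mean_e0 ~ I(year - 1950), data = toy)

# Slopes are identical up to floating point error
slope_raw     <- unname(coef(fit_raw)[2])
slope_centred <- unname(coef(fit_centred)[2])
all.equal(slope_raw, slope_centred)

# The centred intercept is the fitted value in 1950: 69 - 0.2 * 5, i.e. about 68
intercept_centred <- unname(coef(fit_centred)[1])
intercept_centred
```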


Now how has this continued?

```{r}
tmp <- dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>% 
  group_by(year) %>% 
  summarise(mean_e0 = mean(e0, na.rm = T)) %>% 
  ungroup() 

tmp %>% 
  ggplot(aes(x = year, y = mean_e0)) +
  geom_point() +
  scale_x_continuous(breaks = seq(1955, 2015, by = 5)) + 
  scale_y_continuous(breaks = seq(65, 85, by = 1)) + 
  stat_smooth(method = "lm", se = F, colour = "black", linetype = "dashed") +
  labs(
    x = "Year", 
    y = "Life expectancy in years", 
    title = "Mean life expectancy of 21 high income countries from 1955 to 2016",
    subtitle = "Dashed line is linear regression trend line",
    caption = "Unweighted average of 21 countries used in White (2002), replication and updating of figure 1"
  )

ggsave(here("figures", "pt1_white_update.png"), height = 12, width = 20, units = "cm", dpi = 300)
ggsave(here("figures", "e0_limits", "fig_1_a_all_highincome.png"), height = 12, width = 20, units = "cm", dpi = 300)
  
tmp %>% 
  ggplot(aes(x = year, y = mean_e0)) +
  geom_point() +
  scale_x_continuous(breaks = seq(1955, 2015, by = 5)) + 
  scale_y_continuous(breaks = seq(65, 85, by = 1)) + 
  stat_smooth(method = "lm", se = F, colour = "black", linetype = "dashed") +
  labs(
    x = "Year", 
    y = "Life expectancy in years" 
  )

ggsave(here("figures", "pt1_white_update_plain.png"), height = 12, width = 20, units = "cm", dpi = 300)
ggsave(here("figures", "e0_limits", "fig_1_a_all_highincome_plain.png"), height = 12, width = 20, units = "cm", dpi = 300)

```

The near-perfect linearity has continued beyond 1996. 


What does this look like for the earlier and later period? 

```{r}

tmp %>% 
  lm(mean_e0 ~ year, .) %>% 
  summary()

tmp %>% 
  filter(year < 1993) %>% 
  lm(mean_e0 ~ year, .) %>% 
  summary()

tmp %>% 
  filter(year >= 1993) %>% 
  lm(mean_e0 ~ year, .) %>% 
  summary()


```

The R-squared value is over 99% for all years, for the earlier period, and for the later period.

The rate of annual increase actually increased rather than fell. 

Let's do this for each decade

```{r}
dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>%
  mutate(decade = cut(year, seq(1955, 2015, by = 10), include.lowest = TRUE)) %>% 
  group_by(year, decade) %>% 
  summarise(mean_e0 = mean(e0, na.rm = T)) %>% 
  group_by(decade) %>% 
  nest() %>% 
  filter(!is.na(decade)) %>% 
  mutate(mod = map(data, ~lm(mean_e0 ~ I(year - min(year)), data = .))) %>% 
  mutate(tdy_mod = map(mod, broom::tidy)) %>% 
  select(decade, tdy_mod) %>% 
  unnest(tdy_mod)

```

So, there is no evidence of a slowdown in the trend for the mean of high income countries. Note that `cut()` closes each interval on the right, so the first period runs from 1955 to 1965 and the later periods start in years ending in 6.

* From 1955-1965 the average improvement was 0.204 years/year. 
* From 1966-1975 the average improvement was 0.188 years/year.
* From 1976-1985 the average improvement was 0.248 years/year. 
* From 1986-1995 the average improvement was 0.196 years/year.
* From 1996-2005 the average improvement was 0.245 years/year.
* From 2006-2015 the average improvement was 0.189 years/year. 

Because the minimum year in each period is subtracted from the year variable, each intercept is the fitted life expectancy at the start of that period, which makes it more meaningful than an extrapolation to year 0. This increased from about 69 at the start of the first period to about 80 at the start of the last. 
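
It is worth checking which years each bin actually contains. With the default `right = TRUE`, `cut()` builds intervals that are closed on the right, and `include.lowest = TRUE` only adds the lowest break to the first bin. A quick base R check of the same call (with `dig.lab = 4` added purely so the labels print as full years):

```r
years  <- 1955:2016
# Same breaks as above; dig.lab = 4 only affects how the labels are printed
decade <- cut(years, seq(1955, 2015, by = 10), include.lowest = TRUE, dig.lab = 4)

# First and last year falling in each bin
bin_ranges <- tapply(years, decade, range)
bin_ranges

# Number of years per bin, and any years that fall outside every bin
table(decade, useNA = "ifany")
dropped_years <- years[is.na(decade)]
dropped_years
```

The first bin covers 1955 to 1965 (11 years), the later bins run 1966 to 1975 and so on, and 2016 falls outside the breaks and becomes `NA`. Passing `right = FALSE` instead would give bins starting in 1955, 1965 and so on, with the last covering 2005 to 2015.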

Table of the above 

```{r}
show_2dp <- function(x){format(round(x, 2), nsmall = 2)}


make_wide_table <- function(x = dta_e0, 
                            country_selection, 
                            sex_selection = c("total", "male", "female"),
                            period_selection = c(1955, 2016),
                            cut_selection = seq(1955, 2015, by = 10)
                            ){
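  # With no explicit choice, match.arg() selects the first option ("total");
  # comparing sex against the full length-3 default would recycle the vector
  sex_selection <- match.arg(sex_selection)
  x %>% 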
    filter(code %in% country_selection) %>% 
    filter(sex == sex_selection) %>% 
    filter(between(year, period_selection[1], period_selection[2])) %>%
    mutate(decade = cut(year, cut_selection, include.lowest = TRUE)) %>% 
    group_by(year, decade) %>% 
    summarise(mean_e0 = mean(e0, na.rm = T)) %>% 
    group_by(decade) %>% 
    nest() %>% 
    filter(!is.na(decade)) %>% 
    mutate(mod = map(data, ~lm(mean_e0 ~ I(year - min(year)), data = .))) %>% 
    mutate(tdy_mod = map(mod, broom::tidy)) %>% 
    select(decade, tdy_mod) %>% 
    unnest(tdy_mod) %>% 
    # Compute the bounds from the unrounded estimate; show_2dp() does the rounding
    mutate(lower = estimate - 2 * std.error, upper = estimate + 2 * std.error) %>% 
    mutate(text = glue::glue(
      "{show_2dp(estimate)} [{show_2dp(lower)} to {show_2dp(upper)}]"
     )
    ) %>% 
    select(decade, term, text) %>% 
    spread(term, text) %>% 
    rename(
      `Decade` = `decade`,
      `Life expectancy at start of decade (+/- 2 SEs)` = `(Intercept)`,
      `Mean gain in life expectancy per year (+/- 2 SEs)` = `I(year - min(year))`
    )
}

save_format_dt <- function(wb, sheet_name, dta, cols = 1:3){
  wb %>% 
    addWorksheet(sheetName = sheet_name)
  wb %>% writeDataTable(sheet = sheet_name, x = dta)
  wb %>% setColWidths(sheet = sheet_name, cols = cols, widths = "auto")
}

wb <- createWorkbook()

# wb %>% addWorksheet(sheetName = "Decadal gains, high income")
# wb %>% openxlsx::writeDataTable(sheet = "Decadal gains, high income", 
#     x = make_wide_table(country_selection = high_income_countries)
#   )

# wb %>% addWorksheet(sheetName = "Decadal gains, all")
# wb %>% openxlsx::writeDataTable(sheet = "Decadal gains, all", 
#     x = make_wide_table(country_selection = all_distinct_countries)
#   )
# wb %>% setColWidths(sheet = "Decadal gains, all", cols = 1:3, widths = "auto")
save_format_dt(wb = wb,
  sheet_name = "Decadal gains, high income", 
  dta = make_wide_table(country_selection = high_income_countries)
)

save_format_dt(wb = wb,
  sheet_name = "Decadal gains, high income, fem", 
  dta = make_wide_table(country_selection = high_income_countries, sex_selection = "female" )
)


save_format_dt(wb = wb,
  sheet_name = "Decadal gains, high income, mal", 
  dta = make_wide_table(country_selection = high_income_countries, sex_selection = "male")
)

save_format_dt(wb = wb,
  sheet_name = "Decadal gains, all", 
  dta = make_wide_table(country_selection = all_distinct_countries)
)

save_format_dt(wb = wb,
  sheet_name = "Decadal gains, all, fem", 
  dta = make_wide_table(country_selection = all_distinct_countries, sex_selection = "female" )
)


save_format_dt(wb = wb,
  sheet_name = "Decadal gains, all, mal", 
  dta = make_wide_table(country_selection = all_distinct_countries, sex_selection = "male")
)


saveWorkbook(wb, file = here("tables", "decadal_summaries.xlsx"), overwrite = TRUE)

```
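
The table cells are built from `show_2dp()` and `glue::glue()`. As a rough base R illustration of the same formatting, with `sprintf()` standing in for glue and invented numbers for the estimate and standard error:

```r
show_2dp <- function(x){format(round(x, 2), nsmall = 2)}

# Hypothetical slope and standard error, for illustration only
estimate  <- 0.2
std_error <- 0.013

# Bounds are taken from the unrounded estimate, then formatted to 2 d.p.
cell <- sprintf(
  "%s [%s to %s]",
  show_2dp(estimate),
  show_2dp(estimate - 2 * std_error),
  show_2dp(estimate + 2 * std_error)
)
cell
```

The `nsmall = 2` argument keeps trailing zeros, so 0.2 is shown as `0.20` rather than `0.2`.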

As above, all countries

```{r}
tmp <- dta_e0 %>% 
  filter(code %in% all_distinct_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>% 
  group_by(year) %>% 
  summarise(mean_e0 = mean(e0, na.rm = T)) %>% 
  ungroup() 

tmp %>% 
  ggplot(aes(x = year, y = mean_e0)) +
  geom_point() +
  scale_x_continuous(breaks = seq(1955, 2015, by = 5)) + 
  scale_y_continuous(breaks = seq(65, 85, by = 1)) + 
  stat_smooth(method = "lm", se = F, colour = "black", linetype = "dashed") +
  labs(
    x = "Year", 
    y = "Life expectancy in years", 
    title = "Mean life expectancy of all HMD countries from 1955 to 2016",
    subtitle = "Dashed line is linear regression trend line",
    caption = "Unweighted average of all available countries, replication and updating of figure 1"
  )

ggsave(here("figures", "e0_limits", "appendix", "fig_1_a_all_hmd.png"), height = 12, width = 20, units = "cm", dpi = 300)
  
tmp %>% 
  ggplot(aes(x = year, y = mean_e0)) +
  geom_point() +
  scale_x_continuous(breaks = seq(1955, 2015, by = 5)) + 
  scale_y_continuous(breaks = seq(65, 85, by = 1)) + 
  stat_smooth(method = "lm", se = F, colour = "black", linetype = "dashed") +
  labs(
    x = "Year", 
    y = "Life expectancy in years"
  )

ggsave(here("figures", "e0_limits", "appendix", "fig_1_a_all_hmd_plain.png"), height = 12, width = 20, units = "cm", dpi = 300)
  

```

The fit is clearly less linear here. This is likely due to the changing composition of the set of countries over time, perhaps because many Eastern European countries enter the data after about 1990, pulling the average down on entry and then improving quickly thereafter. 
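
A toy example (invented numbers, not HMD data) of how a change in composition alone can bend the unweighted mean: two countries improve at a steady rate throughout, and a third with lower life expectancy is only observed from 1990.

```r
years <- 1985:1995

# Two countries present throughout, each gaining 0.2 years per year
a <- 75 + 0.2 * (years - 1985)
b <- 76 + 0.2 * (years - 1985)

# A third country, also gaining 0.2 years per year, but only observed from 1990
c_late <- ifelse(years >= 1990, 68 + 0.2 * (years - 1985), NA)

# Unweighted mean of whichever countries are available in each year
mean_e0 <- rowMeans(cbind(a, b, c_late), na.rm = TRUE)
data.frame(year = years, mean_e0 = round(mean_e0, 2))

# Year-on-year change in the mean; the entry year 1990 stands out
change <- diff(mean_e0)
names(change) <- years[-1]
change
```

Every country in the sketch improves at the same rate, yet the mean drops in the year the third country joins.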

Again, what's the $R^2$ overall and by decade?

```{r}
tmp %>% 
  lm(mean_e0 ~ year, data = .) %>% 
  summary()
```

Now the $R^2$ is 'only' around 0.98. 

What does this look like for the earlier and later period? 

```{r}

tmp %>% 
  lm(mean_e0 ~ year, .) %>% 
  summary()

tmp %>% 
  filter(year < 1993) %>% 
  lm(mean_e0 ~ year, .) %>% 
  summary()

tmp %>% 
  filter(year >= 1993) %>% 
  lm(mean_e0 ~ year, .) %>% 
  summary()


```

Since 1993, the improvement for all available countries has been almost perfectly linear ($R^2$ > 0.99), with an average improvement of around 0.255 years/year. This is a general phenomenon, not one confined to historically rich nations.

And again, by decade

```{r}
dta_e0 %>% 
  filter(code %in% all_distinct_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>%
  mutate(decade = cut(year, seq(1955, 2015, by = 10), include.lowest = TRUE)) %>% 
  group_by(year, decade) %>% 
  summarise(mean_e0 = mean(e0, na.rm = T)) %>% 
  group_by(decade) %>% 
  nest() %>% 
  filter(!is.na(decade)) %>% 
  mutate(mod = map(data, ~lm(mean_e0 ~ I(year - min(year)), data = .))) %>% 
  mutate(tdy_mod = map(mod, broom::tidy)) %>% 
  select(decade, tdy_mod) %>% 
  unnest(tdy_mod)

```

So, there is no evidence of a slowdown in the trend for the mean of all distinct countries. As before, `cut()` closes each interval on the right, so the first period runs from 1955 to 1965.

* From 1955-1965 the average improvement was 0.220 years/year. 
* From 1966-1975 the average improvement was 0.103 years/year.
* From 1976-1985 the average improvement was 0.164 years/year. 
* From 1986-1995 the average improvement was 0.067 years/year.
* From 1996-2005 the average improvement was 0.228 years/year.
* From 2006-2015 the average improvement was 0.287 years/year. 

The expansion of the range of countries in the HMD appears to have led to a slowdown from 1986-1995, but this is almost certainly compositional rather than genuine for any particular country. From 2006-2015 the average rate of improvement was faster than in the previous decadal period.

And as a more nicely formatted table

```{r}
dta_e0 %>% 
  filter(code %in% all_distinct_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>%
  mutate(decade = cut(year, seq(1955, 2015, by = 10), include.lowest = TRUE)) %>% 
  group_by(year, decade) %>% 
  summarise(mean_e0 = mean(e0, na.rm = T)) %>% 
  group_by(decade) %>% 
  nest() %>% 
  filter(!is.na(decade)) %>% 
  mutate(mod = map(data, ~lm(mean_e0 ~ I(year - min(year)), data = .))) %>% 
  mutate(tdy_mod = map(mod, broom::tidy)) %>% 
  select(decade, tdy_mod) %>% 
  unnest(tdy_mod) %>% 
  mutate(text = glue::glue(
    "{show_2dp(estimate)} [{show_2dp(estimate - 2 * std.error)} to {show_2dp(estimate + 2 * std.error)}]"
   )
  ) %>% 
  select(decade, term, text) %>% 
  spread(term, text) %>% 
  rename(
    `Decade` = `decade`,
    `Life expectancy at start of decade (+/- 2 SEs)` = `(Intercept)`,
    `Mean gain in life expectancy per year (+/- 2 SEs)` = `I(year - min(year))`
  )

```


Let's look at how the UK compares with this. 

```{r}
tmp <- dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>% 
  group_by(year) %>% 
  summarise(mean_e0 = mean(e0, na.rm = T)) %>% 
  ungroup() 

tmp %>% 
  ggplot(aes(x = year, y = mean_e0)) +
  geom_point() +
  stat_smooth(method = "lm", se = F, colour = "black", linetype = "dashed") +
  scale_x_continuous(breaks = seq(1955, 2015, by = 5)) + 
  scale_y_continuous(breaks = seq(65, 85, by = 1)) + 
  geom_point(
    aes(x = year, y = e0),
    inherit.aes = F, 
    data = dta_e0 %>% 
      filter(sex == "total") %>% 
      filter(code == "GBR_NP") %>% 
      filter(between(year, 1955, 2016)) ,
    colour = "darkred", shape = 1
  ) + 
  labs(
    x = "Year", 
    y = "Life expectancy",
    title = "Life expectancy over time for UK and high income average",
    subtitle = "Red unfilled: UK; black filled: mean of 21 high income nations",
    caption = "Black dashed line: linear regression of mean of 21 high income countries. Source: HMD"
    
  )
ggsave(here("figures", "pt1_white_update_uk.png"), height = 12, width = 20, units = "cm", dpi = 300)
ggsave(here("figures", "e0_limits", "fig_01_b_highincomeuk.png"), height = 12, width = 20, units = "cm", dpi = 300)

tmp %>% 
  ggplot(aes(x = year, y = mean_e0)) +
  geom_point() +
  stat_smooth(method = "lm", se = F, colour = "black", linetype = "dashed") +
  scale_x_continuous(breaks = seq(1955, 2015, by = 5)) + 
  scale_y_continuous(breaks = seq(65, 85, by = 1)) + 
  geom_point(
    aes(x = year, y = e0),
    inherit.aes = F, 
    data = dta_e0 %>% 
      filter(sex == "total") %>% 
      filter(code == "GBR_NP") %>% 
      filter(between(year, 1955, 2016)) ,
    colour = "darkred", shape = 1
  ) + 
  labs(
    x = "Year", 
    y = "Life expectancy"
  )
ggsave(here("figures", "pt1_white_update_uk_plain.png"), height = 12, width = 20, units = "cm", dpi = 300)
ggsave(here("figures", "e0_limits", "fig_01_b_highincomeuk_plain.png"), height = 12, width = 20, units = "cm", dpi = 300)


```

So the UK fell below the rich country average in the late 1960s, and hasn't returned since. 

It looked like the UK was starting to close the gap in the late 2000s, but it then fell back in recent years. 


And the UK compared with the average of all available HMD countries.

```{r}
tmp <- dta_e0 %>% 
  filter(code %in% all_distinct_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>% 
  group_by(year) %>% 
  summarise(mean_e0 = mean(e0, na.rm = T)) %>% 
  ungroup() 

tmp %>% 
  ggplot(aes(x = year, y = mean_e0)) +
  geom_point() +
  stat_smooth(method = "lm", se = F, colour = "black", linetype = "dashed") +
  geom_point(
    aes(x = year, y = e0),
    inherit.aes = F, 
    data = dta_e0 %>% 
      filter(sex == "total") %>% 
      filter(code == "GBR_NP") %>% 
      filter(between(year, 1955, 2016)) ,
    colour = "darkred", shape = 17
  ) + 
  labs(
    x = "Year", 
    y = "Life expectancy",
    title = "Life expectancy over time for average of all available countries (black) and the UK (red)",
    
    caption = "Black line is unweighted average of all countries available in HMD for each year."
    
  )

ggsave(here("figures", "e0_limits", "appendix", "fig_01_b_allhmd.png"), height = 12, width = 20, units = "cm", dpi = 300)


tmp %>% 
  ggplot(aes(x = year, y = mean_e0)) +
  geom_point() +
  stat_smooth(method = "lm", se = F, colour = "black", linetype = "dashed") +
  geom_point(
    aes(x = year, y = e0),
    inherit.aes = F, 
    data = dta_e0 %>% 
      filter(sex == "total") %>% 
      filter(code == "GBR_NP") %>% 
      filter(between(year, 1955, 2016)) ,
    colour = "darkred", shape = 17
  ) + 
  labs(
    x = "Year", 
    y = "Life expectancy"
  )

ggsave(here("figures", "e0_limits", "appendix", "fig_01_b_allhmd_plain.png"), height = 12, width = 20, units = "cm", dpi = 300)

```


And what was the rate of improvement in the UK only?

```{r}
dta_e0 %>% 
  filter(code== "GBR_NP") %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>%
  mutate(decade = cut(year, seq(1955, 2015, by = 10), include.lowest = TRUE)) %>% 
  group_by(decade) %>% 
  nest() %>% 
  filter(!is.na(decade)) %>% 
  mutate(mod = map(data, ~lm(e0 ~ I(year - min(year)), data = .))) %>% 
  mutate(tdy_mod = map(mod, broom::tidy)) %>% 
  select(decade, tdy_mod) %>% 
  unnest(tdy_mod)


```

And nicely formatted

```{r}
dta_e0 %>% 
  filter(code== "GBR_NP") %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>%
  mutate(decade = cut(year, seq(1955, 2015, by = 10), include.lowest = TRUE)) %>% 
  group_by(decade) %>% 
  nest() %>% 
  filter(!is.na(decade)) %>% 
  mutate(mod = map(data, ~lm(e0 ~ I(year - min(year)), data = .))) %>% 
  mutate(tdy_mod = map(mod, broom::tidy)) %>% 
  select(decade, tdy_mod) %>% 
  unnest(tdy_mod) %>% 
  mutate(text = glue::glue(
    "{show_2dp(estimate)} [{show_2dp(estimate - 2 * std.error)} to {show_2dp(estimate + 2 * std.error)}]"
   )
  ) %>% 
  select(decade, term, text) %>% 
  spread(term, text) %>% 
  rename(
    `Decade` = `decade`,
    `Life expectancy at start of decade (+/- 2 SEs)` = `(Intercept)`,
    `Mean gain in life expectancy per year (+/- 2 SEs)` = `I(year - min(year))`
  )

```

### Life expectancy for best country (labelled by this country)


A similar approach to White (2002) is presented in [Christensen et al. (2009)](https://linkinghub.elsevier.com/retrieve/pii/S0140673609614604). Their approach is to look at life expectancy in the top-performing country each year, rather than the average across countries. Let's now replicate this.


```{r}
tmp <- dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>% 
  group_by(year) %>% 
  summarise(
    max_e0 = max(e0, na.rm = T),
    which_code = code[which.max(e0)]  # first maximum, ignoring NAs
    ) %>% 
  ungroup() 

tmp %>% 
  ggplot(aes(x = year, y = max_e0)) +
  geom_point() +
  geom_text(
    aes(x = year, y = max_e0, label = which_code), 
    inherit.aes = F,
    data = tmp %>% filter(year %in% seq(1955, 2015, by = 5)), 
    nudge_y = 0.5
    
  ) + 
  stat_smooth(method = "lm", se = F, colour = "black", linetype = "dashed") +
  geom_vline(xintercept = 1996)

```

Japan has remained the high income country with the highest life expectancy since the mid 1980s. 
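
Picking out the leading country in each year can be sketched in base R with `which.max()`, which returns the position of the first maximum and skips missing values. The data below are invented for illustration and are not HMD values.

```r
toy <- data.frame(
  year = rep(c(1975, 1985), each = 3),
  code = rep(c("JPN", "SWE", "USA"), times = 2),
  e0   = c(74.3, 75.0, 72.6, 77.8, 76.8, NA),
  stringsAsFactors = FALSE
)

# For each year, keep the row with the highest life expectancy
best <- do.call(rbind, lapply(split(toy, toy$year), function(d) {
  d[which.max(d$e0), c("year", "code", "e0")]
}))
best
```

Unlike `code[e0 == max(e0)]`, this always returns exactly one code per year, even when there are ties or missing values.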


What's the linear rate of improvement for the best performing line compared with the average improvement line? 

```{r}
tmp %>% 
  filter(between(year, 1955, 2016)) %>% 
  lm(max_e0 ~ I(year - min(year)), .) %>% 
  summary()

```

So, the rate of improvement has been about 0.198 years/year for the best-performing line, compared with 0.191 years/year for the average of the 21 countries. 

Let's now plot the mean, upper and lower together. 

```{r}
tmp <- dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>% 
  group_by(year) %>% 
  summarise(
    max_e0 = max(e0, na.rm = T),
    which_code_max = code[which.max(e0)],
    min_e0 = min(e0, na.rm = T),
    which_code_min = code[which.min(e0)],
    mean_e0 = mean(e0, na.rm = T),
    uk_e0 = e0[code == "GBR_NP"]
    ) %>% 
  ungroup() 

tmp %>% 
  ggplot(aes(x = year)) +
  geom_point(aes(y = max_e0), colour = "darkgreen") +
  geom_text(
    aes(x = year, y = max_e0, label = which_code_max), 
    inherit.aes = F,
    colour = "darkgreen",
    data = tmp %>% filter(year %in% seq(1955, 2015, by = 5)), 
    nudge_y = 0.5
  ) + 
  geom_point(aes(y = mean_e0)) +
  geom_point(aes(y = min_e0), colour = "red") +
  geom_point(aes(y = uk_e0), colour = "darkred", shape = 17) +
  geom_text(
    aes(x = year, y = min_e0, label = which_code_min), 
    inherit.aes = F,
    colour = "red",
    data = tmp %>% filter(year %in% seq(1955, 2015, by = 5)), 
    nudge_y = - 0.5
  ) + 
  stat_smooth(aes(y = max_e0), method = "lm", se = F, colour = "darkgreen", linetype = "dashed") +
  stat_smooth(aes(y = min_e0), method = "lm", se = F, colour = "red", linetype = "dashed") +
  stat_smooth(aes(y = mean_e0), method = "lm", se = F, colour = "black", linetype = "dashed") +
  stat_smooth(aes(y = uk_e0), method = "lm", se = F, colour = "darkred", linetype = "dashed") +
  labs(
    x = "Year", y = "Life expectancy in years",
    title = "Life expectancy trends for best performing, worst performing, average high income, and UK, 1955-2016",
    subtitle = "Best performing: Green; Worst performing: Red; Average: Black; UK: Dark Red",
    caption = "Unweighted average of 21 high income countries in White 2002. Linear trends as dashed lines"
  )

ggsave(here("figures", "pt1_sandwich.png"), height = 12, width = 20, units = "cm", dpi = 300)
ggsave(here("figures", "e0_limits", "fig_1_c_sandwich.png"), height = 12, width = 20, units = "cm", dpi = 300)

tmp %>% 
  ggplot(aes(x = year)) +
  geom_point(aes(y = max_e0), colour = "darkgreen") +
  geom_text(
    aes(x = year, y = max_e0, label = which_code_max), 
    inherit.aes = F,
    colour = "darkgreen",
    data = tmp %>% filter(year %in% seq(1955, 2015, by = 5)), 
    nudge_y = 0.5
  ) + 
  geom_point(aes(y = mean_e0)) +
  geom_point(aes(y = min_e0), colour = "red") +
  geom_point(aes(y = uk_e0), colour = "darkred", shape = 17) +
  geom_text(
    aes(x = year, y = min_e0, label = which_code_min), 
    inherit.aes = F,
    colour = "red",
    data = tmp %>% filter(year %in% seq(1955, 2015, by = 5)), 
    nudge_y = - 0.5
  ) + 
  stat_smooth(aes(y = max_e0), method = "lm", se = F, colour = "darkgreen", linetype = "dashed") +
  stat_smooth(aes(y = min_e0), method = "lm", se = F, colour = "red", linetype = "dashed") +
  stat_smooth(aes(y = mean_e0), method = "lm", se = F, colour = "black", linetype = "dashed") +
  stat_smooth(aes(y = uk_e0), method = "lm", se = F, colour = "darkred", linetype = "dashed") +
  labs(
    x = "Year", y = "Life expectancy in years"
  )

ggsave(here("figures", "pt1_sandwich_plain.png"), height = 12, width = 20, units = "cm", dpi = 300)
ggsave(here("figures", "e0_limits", "fig_1_c_sandwich_plain.png"), height = 12, width = 20, units = "cm", dpi = 300)


```

In the figure above, the red dots show the worst-performing of the 21 high income countries in each year, and the green dots the best-performing. The black dots show the average (mean) for these countries, and the dark red triangles show life expectancy for the UK. Every five years, the best- and worst-performing countries are labelled. 

Table summarising above 

* Intercept: $e_0$ in 1955
* Slope: Average $e_0$ gain per year since
* Fit: $R^2$

For worst, best, mean

```{r}
summary_best_worst_mean_hi <- 
  dta_e0 %>% 
    filter(code %in% high_income_countries) %>% 
    filter(sex == "total") %>% 
    filter(between(year, 1955, 2016)) %>% 
    group_by(year) %>% 
    summarise(
      max_e0 = max(e0, na.rm = T),
      which_code_max = code[which.max(e0)],
      min_e0 = min(e0, na.rm = T),
      which_code_min = code[which.min(e0)],
      mean_e0 = mean(e0, na.rm = T),
      uk_e0 = e0[code == "GBR_NP"]
      ) %>% 
    ungroup() %>% 
    select(year, min_e0, max_e0, mean_e0) %>% 
    gather(-year, key = "popn", value = "e0") %>% 
    group_by(popn) %>% 
    nest() %>% 
    mutate(model = map(data, ~lm(e0 ~ year, data = .))) %>% 
    mutate(mdl_tidy = map(model, broom::tidy)) %>% 
    mutate(mdl_diag = map(model, broom::glance))

```

```{r, results='asis'}
pt1 <- 
  summary_best_worst_mean_hi %>% 
    select(popn, mdl_tidy) %>% 
    unnest(mdl_tidy) %>% 
    filter(term == "year") 

pt2 <- 
  summary_best_worst_mean_hi %>% 
      select(popn, mdl_diag) %>% 
      unnest(mdl_diag) %>% 
    select(popn, r.squared, adj.r.squared)

pt3 <- 
  dta_e0 %>% 
    filter(code %in% high_income_countries) %>% 
    filter(sex == "total") %>% 
    filter(between(year, 1955, 2016)) %>% 
    group_by(year) %>% 
    summarise(
      max_e0 = max(e0, na.rm = T),
      which_code_max = code[which.max(e0)],
      min_e0 = min(e0, na.rm = T),
      which_code_min = code[which.min(e0)],
      mean_e0 = mean(e0, na.rm = T),
      uk_e0 = e0[code == "GBR_NP"]
      ) %>% 
    ungroup() %>% 
    select(year, min_e0, max_e0, mean_e0) %>% 
    filter(year == 1955) %>% 
    gather(-year, key = "popn", value = "e0_1955") %>% 
    select(-year)

pt1 %>% left_join(pt2) %>% left_join(pt3) %>% 
  select(-term) %>%
  ungroup() %>% 
  mutate(popn = case_when(
    popn == "min_e0"  ~ "Lowest", 
    popn == "max_e0"  ~ "Highest", 
    popn == "mean_e0"  ~ "Average", 
    TRUE ~ NA_character_
  )) %>% 
  rename(slope = estimate) %>% 
  select(popn, e0_1955, slope, everything(), -p.value) %>% 
  kable(digits = 3,
        col.names = c("Population", "e0 in 1955", "Average annual gain", "SE", "t value", "R squared", "Adj. R Squared")) %>% 
  kable_styling() 
```

Life expectancy for all countries 

```{r}
summary_best_worst_mean_hi <- 
  dta_e0 %>% 
    filter(code %in% all_distinct_countries) %>% 
    filter(sex == "total") %>% 
    filter(between(year, 1955, 2016)) %>% 
    group_by(year) %>% 
    summarise(
      max_e0 = max(e0, na.rm = T),
      which_code_max = code[which.max(e0)],
      min_e0 = min(e0, na.rm = T),
      which_code_min = code[which.min(e0)],
      mean_e0 = mean(e0, na.rm = T),
      uk_e0 = e0[code == "GBR_NP"]
      ) %>% 
    ungroup() %>% 
    select(year, min_e0, max_e0, mean_e0) %>% 
    gather(-year, key = "popn", value = "e0") %>% 
    group_by(popn) %>% 
    nest() %>% 
    mutate(model = map(data, ~lm(e0 ~ year, data = .))) %>% 
    mutate(mdl_tidy = map(model, broom::tidy)) %>% 
    mutate(mdl_diag = map(model, broom::glance))

```

```{r, results = 'asis'}
pt1 <- 
  summary_best_worst_mean_hi %>% 
    select(popn, mdl_tidy) %>% 
    unnest(mdl_tidy) %>% 
    filter(term == "year") 

pt2 <- 
  summary_best_worst_mean_hi %>% 
      select(popn, mdl_diag) %>% 
      unnest(mdl_diag) %>% 
    select(popn, r.squared, adj.r.squared)

pt3 <- 
  dta_e0 %>% 
    filter(code %in% all_distinct_countries) %>% 
    filter(sex == "total") %>% 
    filter(between(year, 1955, 2016)) %>% 
    group_by(year) %>% 
    summarise(
      max_e0 = max(e0, na.rm = T),
      which_code_max = code[which.max(e0)],
      min_e0 = min(e0, na.rm = T),
      which_code_min = code[which.min(e0)],
      mean_e0 = mean(e0, na.rm = T),
      uk_e0 = e0[code == "GBR_NP"]
      ) %>% 
    ungroup() %>% 
    select(year, min_e0, max_e0, mean_e0) %>% 
    filter(year == 1955) %>% 
    gather(-year, key = "popn", value = "e0_1955") %>% 
    select(-year)

pt1 %>% left_join(pt2) %>% left_join(pt3) %>% 
  select(-term) %>%
  ungroup() %>% 
  mutate(popn = case_when(
    popn == "min_e0"  ~ "Lowest", 
    popn == "max_e0"  ~ "Highest", 
    popn == "mean_e0"  ~ "Average", 
    TRUE ~ NA_character_
  )) %>% 
  rename(slope = estimate) %>% 
  select(popn, e0_1955, slope, everything(), -p.value) %>% 
  kable(digits = 3,
        col.names = c("Population", "e0 in 1955", "Average annual gain", "SE", "t value", "R squared", "Adj. R Squared")) %>% 
  kable_styling() 
```

Let's now fit a linear trend separately for each country and sex, across all available countries. 

```{r}

all_countries_summary <- 
  dta_e0 %>% 
    filter(code %in% all_distinct_countries) %>% 
    mutate(high_income = ifelse(code %in% high_income_countries, "High Income", "Other")) %>% 
    filter(between(year, 1955, 2016)) %>%
    mutate(start_year = year - min(year)) %>% 
    group_by(code, high_income, sex) %>% 
    nest() %>% 
    mutate(linmod = map(data, ~lm(e0 ~ start_year, data = .))) %>% 
    mutate(
      tidied  = map(linmod, broom::tidy), 
      glanced = map(linmod, broom::glance)
    ) %>% 
    select(code, sex, high_income, tidied, glanced) %>% 
    unnest(cols = c(tidied, glanced), names_sep = "_") %>% 
    select(code, high_income, sex, 
           term = tidied_term, estimate = tidied_estimate, SE = tidied_std.error, t_value = tidied_statistic,
           r_squared = glanced_r.squared, adj_r_squared = glanced_adj.r.squared
    ) %>% 
    arrange(high_income, code, sex)


wb <- createWorkbook()
addWorksheet(wb, sheetName = "All Countries Summaries")
writeData(wb = wb, sheet = "All Countries Summaries", x = all_countries_summary)
openxlsx::saveWorkbook(wb, file = here("tables", "all_summaries.xlsx"), overwrite = TRUE)

all_countries_summary
```


```{r}
tmp %>% 
  lm(max_e0 ~ I(year - min(year)), .) %>% 
  summary()

tmp %>% 
  lm(min_e0 ~ I(year - min(year)), .) %>% 
  summary()

```

The average annual improvement for the best-performing country is 0.198 years/year, with an R-squared of 0.979.

For the worst-performing country, the average annual improvement is 0.299 years/year, indicating a tendency towards convergence. The R-squared is somewhat lower, at 0.966. 
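
As a back-of-envelope check on what these two slopes imply, the fitted gap between the best and worst trend lines narrows by the difference in slopes each year. The sketch below just redoes that arithmetic with the figures quoted above. It describes the fitted lines only, not any single country, since the best and worst performers change over time.

```r
slope_best  <- 0.198  # years/year, best-performing trend (from the text above)
slope_worst <- 0.299  # years/year, worst-performing trend (from the text above)

# Implied narrowing of the fitted gap, per year and over 1955 to 2016
gap_closure_per_year <- slope_worst - slope_best
gap_closure_total    <- gap_closure_per_year * (2016 - 1955)

round(gap_closure_per_year, 3)
round(gap_closure_total, 2)
```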

The USA has been the worst performing high income country since the early 2000s, and its life expectancy has fallen steadily below the long-term trend for worst-performing countries. 

In 2015 the UK's life expectancy matched the value projected by the linear trend for the worst-performing country, and in 2016 it fell below that projection. 

Let's now repeat the exercise for all available hmd countries. This is likely to change the worst performing but not the best. 

```{r}
tmp <- dta_e0 %>% 
  filter(code %in% all_distinct_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>% 
  group_by(year) %>% 
  summarise(
    max_e0 = max(e0, na.rm = T),
    # which.max()/which.min() skip NAs and return a single code even when values tie
    which_code_max = code[which.max(e0)],
    min_e0 = min(e0, na.rm = T),
    which_code_min = code[which.min(e0)],
    mean_e0 = mean(e0, na.rm = T),
    uk_e0 = e0[code == "GBR_NP"]
    ) %>% 
  ungroup() 

tmp %>% 
  ggplot(aes(x = year)) +
  geom_point(aes(y = max_e0), colour = "darkgreen") +
  geom_text(
    aes(x = year, y = max_e0, label = which_code_max), 
    inherit.aes = F,
    colour = "darkgreen",
    data = tmp %>% filter(year %in% seq(1955, 2015, by = 5)), 
    nudge_y = 0.5
  ) + 
  geom_point(aes(y = mean_e0)) +
  geom_point(aes(y = min_e0), colour = "red") +
  geom_point(aes(y = uk_e0), colour = "darkred", shape = 17) +
  geom_text(
    aes(x = year, y = min_e0, label = which_code_min), 
    inherit.aes = F,
    colour = "red",
    data = tmp %>% filter(year %in% seq(1955, 2015, by = 5)), 
    nudge_y = - 0.5
  ) + 
  stat_smooth(aes(y = max_e0), method = "lm", se = F, colour = "darkgreen", linetype = "dashed") +
  stat_smooth(aes(y = min_e0), method = "lm", se = F, colour = "red", linetype = "dashed") +
  stat_smooth(aes(y = mean_e0), method = "lm", se = F, colour = "black", linetype = "dashed") +
  stat_smooth(aes(y = uk_e0), method = "lm", se = F, colour = "darkred", linetype = "dashed") +
  labs(
    x = "Year", y = "Life expectancy in years",
    title = "Life expectancy trends for best performing, worst performing, average HMD country, and UK, 1955-2016",
    subtitle = "Best performing: Green; Worst performing: Red; Average: Black; UK: Dark Red",
    caption = "Unweighted average of all countries available for a year from HMD. Linear trends as dashed lines"
  )

ggsave("figures/e0_limits/appendix/fig_1_c_sandwich_all_hmd.png", height = 12, width = 20, units = "cm", dpi = 300)

tmp %>% 
  ggplot(aes(x = year)) +
  geom_point(aes(y = max_e0), colour = "darkgreen") +
  geom_text(
    aes(x = year, y = max_e0, label = which_code_max), 
    inherit.aes = F,
    colour = "darkgreen",
    data = tmp %>% filter(year %in% seq(1955, 2015, by = 5)), 
    nudge_y = 0.5
  ) + 
  geom_point(aes(y = mean_e0)) +
  geom_point(aes(y = min_e0), colour = "red") +
  geom_point(aes(y = uk_e0), colour = "darkred", shape = 17) +
  geom_text(
    aes(x = year, y = min_e0, label = which_code_min), 
    inherit.aes = F,
    colour = "red",
    data = tmp %>% filter(year %in% seq(1955, 2015, by = 5)), 
    nudge_y = - 0.5
  ) + 
  stat_smooth(aes(y = max_e0), method = "lm", se = F, colour = "darkgreen", linetype = "dashed") +
  stat_smooth(aes(y = min_e0), method = "lm", se = F, colour = "red", linetype = "dashed") +
  stat_smooth(aes(y = mean_e0), method = "lm", se = F, colour = "black", linetype = "dashed") +
  stat_smooth(aes(y = uk_e0), method = "lm", se = F, colour = "darkred", linetype = "dashed") +
  labs(
    x = "Year", y = "Life expectancy in years"
  )

ggsave("figures/e0_limits/appendix/fig_1_c_sandwich_all_hmd_plain.png", height = 12, width = 20, units = "cm", dpi = 300)


```
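
When labelling the best and worst performer in a year, missing values and ties need some care. The sketch below uses an invented single year of data (hypothetical codes and values) to compare equality indexing with `which.max()` / `which.min()`.

```r
# One hypothetical year of data, with a missing value and a tie for the maximum
toy_year <- data.frame(
  code = c("AAA", "BBB", "CCC", "DDD"),
  e0   = c(71.2, NA, 74.5, 74.5)
)

# Equality indexing: the NA propagates and the tie returns more than one code
by_equality <- toy_year$code[toy_year$e0 == max(toy_year$e0, na.rm = TRUE)]
by_equality

# which.max() / which.min() ignore NA and return the first extreme value only
best_code  <- toy_year$code[which.max(toy_year$e0)]
worst_code <- toy_year$code[which.min(toy_year$e0)]
c(best = best_code, worst = worst_code)
```

Taking the first of any tied countries is an arbitrary choice made for this sketch; it guarantees exactly one label per year.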

Russia has now been the worst-performing country since the early 1970s. Belarus (?) appears to take its place for the last couple of years, but this may be because records for Russia are not available for those years, as the discontinuity suggests. 

## Figure 2


```{r}
fig_02_dta <- dta_e0 %>% 
  filter(sex == "total") %>%
  filter(code %in% high_income_countries) %>% 
  filter(between(year, 1954, 1996)) %>% 
  group_by(year) %>% 
  summarise(mean_e0 = mean(e0, na.rm = T)) %>% 
  ungroup() %>% 
  arrange(year) %>% 
  mutate(ch_e0 = mean_e0 - lag(mean_e0)) 
  
fig_02_dta %>% 
  ggplot(aes(x = year, y = ch_e0)) + 
  geom_point() + geom_line() +
  geom_hline(yintercept = 0) + 
  labs(x = "Year", y = "Annual change in life expectancy", 
       title = "Average annual change in life expectancy over time, 1955-1996",
       caption = "Unweighted average (mean) of 21 high income countries used in White 2002")


```
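
The chunk above filters from 1954 rather than 1955 because the first value of a lagged difference is always missing. A minimal sketch with invented averages (not HMD values) shows the effect:

```r
# Hypothetical yearly averages starting one year before the period of interest
toy_e0 <- data.frame(
  year    = 1954:1958,
  mean_e0 = c(68.0, 68.3, 68.2, 68.6, 68.9)
)

# Base R equivalent of mean_e0 - lag(mean_e0): the first year has no predecessor
toy_e0$ch_e0 <- c(NA, diff(toy_e0$mean_e0))
toy_e0
```

Including 1954 in the data means that the change for 1955 (here 1954 to 1955) is available, and only the 1954 row is left without a value.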

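And how about after 1996?

First without, then with, shaded bands at 0.5, 1, 1.5 and 2 standard deviations either side of the mean annual change (dashed lines mark the mean and the mean ± 1.96 standard deviations).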

```{r}
tmp <- dta_e0 %>% 
  filter(sex == "total") %>%
  filter(code %in% high_income_countries) %>% 
  filter(between(year, 1954, 2016)) %>% 
  group_by(year) %>% 
  summarise(mean_e0 = mean(e0, na.rm = T)) %>% 
  ungroup() %>% 
  arrange(year) %>% 
  mutate(ch_e0 = mean_e0 - lag(mean_e0)) 
  
fig_ch <- tmp %>% 
  ggplot(aes(x = year, y = ch_e0)) + 
  geom_point() + geom_line() +
  geom_hline(yintercept = 0) + 
  labs(x = "Year", y = "Annual change in life expectancy", 
       title = "Average annual change in life expectancy over time, 1955-2016",
       caption = "Unweighted average (mean) of 21 high income countries used in White 2002")

print(fig_ch)

ggsave("figures/supp_avg_ch_e0.png", height = 12, width = 29, units = "cm", dpi = 300)

fig_ch +
    geom_hline(aes(yintercept = mean(ch_e0, na.rm = TRUE)), size = 1.5, linetype = "dashed") + 
  annotate(geom = "rect",
      xmin = -Inf, xmax = Inf,           
      ymin = mean(tmp$ch_e0, na.rm = TRUE) - 0.5 * sd(tmp$ch_e0, na.rm = TRUE),
      ymax = mean(tmp$ch_e0, na.rm = TRUE) + 0.5 * sd(tmp$ch_e0, na.rm = TRUE),
    alpha = 0.1, fill = "blue"
  ) + 
  annotate(geom = "rect",
      xmin = -Inf, xmax = Inf,           
      ymin = mean(tmp$ch_e0, na.rm = TRUE) - 1.0 * sd(tmp$ch_e0, na.rm = TRUE),
      ymax = mean(tmp$ch_e0, na.rm = TRUE) + 1.0 * sd(tmp$ch_e0, na.rm = TRUE),
    alpha = 0.1, fill = "blue"
  ) + 
  annotate(geom = "rect",
      xmin = -Inf, xmax = Inf,           
      ymin = mean(tmp$ch_e0, na.rm = TRUE) - 1.5 * sd(tmp$ch_e0, na.rm = TRUE),
      ymax = mean(tmp$ch_e0, na.rm = TRUE) + 1.5 * sd(tmp$ch_e0, na.rm = TRUE),
    alpha = 0.1, fill = "blue"
  ) + 
  annotate(geom = "rect",
      xmin = -Inf, xmax = Inf,           
      ymin = mean(tmp$ch_e0, na.rm = TRUE) - 2.0 * sd(tmp$ch_e0, na.rm = TRUE),
      ymax = mean(tmp$ch_e0, na.rm = TRUE) + 2.0 * sd(tmp$ch_e0, na.rm = TRUE),
    alpha = 0.1, fill = "blue"
  ) + 
  geom_hline(aes(yintercept = mean(ch_e0, na.rm = TRUE) - 1.96 * sd(ch_e0, na.rm = TRUE)), linetype = "dashed") + 
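  # No trailing `+` on the last layer, otherwise ggsave() below is parsed as part of the plot expression
  geom_hline(aes(yintercept = mean(ch_e0, na.rm = TRUE) + 1.96 * sd(ch_e0, na.rm = TRUE)), linetype = "dashed")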


ggsave("figures/supp_avg_ch_e0_bars.png", height = 12, width = 29, units = "cm", dpi = 300)

```
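
The four shaded rectangles are each the mean plus or minus a multiple of the standard deviation of the annual changes. As a sketch, with invented annual changes (`toy_ch` is hypothetical, not the series plotted above), the limits can be computed once in a small data frame:

```r
# Hypothetical annual changes in life expectancy (years)
toy_ch <- c(0.30, 0.10, 0.25, -0.05, 0.20, 0.40)

# Multiples of the standard deviation used for the bands
k <- c(0.5, 1.0, 1.5, 2.0)

bands <- data.frame(
  k    = k,
  ymin = mean(toy_ch, na.rm = TRUE) - k * sd(toy_ch, na.rm = TRUE),
  ymax = mean(toy_ch, na.rm = TRUE) + k * sd(toy_ch, na.rm = TRUE)
)
bands
```

Because the translucent rectangles are nested, the region closest to the mean is covered by all four and appears darkest. A data frame like `bands` could in principle be passed to a single `geom_rect()` layer in place of the repeated `annotate()` calls, though that is a suggestion rather than what the chunk above does.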

It seems clear that life expectancy improvements accelerated after the early 1990s in the 21 high-income countries. However, the 2014-15 fall was also the first year of falling life expectancy (on average) since 1992-3. 

Let's see how the UK compares with this 


```{r}
tmp <- dta_e0 %>% 
  filter(sex == "total") %>% 
  filter(code %in% high_income_countries) %>% 
  filter(between(year, 1954, 2016)) %>% 
  group_by(year) %>% 
  summarise(mean_e0 = mean(e0, na.rm = T)) %>% 
  ungroup() %>% 
  arrange(year) %>% 
  mutate(ch_e0 = mean_e0 - lag(mean_e0))
  
tmp2 <- dta_e0 %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1954, 2016)) %>% 
  filter(code == "GBR_NP") %>% 
  arrange(year) %>%  # lag() relies on rows being in year order
  mutate(ch_e0 = e0 - lag(e0))

tmp %>%
  ggplot(aes(x = year, y = ch_e0)) +
  geom_point() + geom_line() +
  geom_point(
    aes(x = year, y = ch_e0),
    colour = "red", shape = 2, 
    data = tmp2,
    inherit.aes = F
  ) + 
  geom_line(
    aes(x = year, y = ch_e0),
    colour = "red", linetype = "dashed", 
    data = tmp2,
    inherit.aes = F
  ) + 
  geom_hline(yintercept = 0) + 
  labs(
    title = "Average annual change in life expectancy for high income countries, and the UK, 1955-2016",
    subtitle = "Average of 21 countries: Black circles; UK: Red triangles",
    x = "Year", y = "Annual change in life expectancy", 
    caption = "Unweighted average (mean) of 21 countries from White 2002"
  )

ggsave("figures/supp_avg_ch_e0_cf_uk.png", height = 12, width = 29, units = "cm", dpi = 300)

```

The UK alone, without the bands

```{r}

tmp <- dta_e0 %>% 
  filter(sex == "total") %>%
  filter(code %in% "GBR_NP") %>% 
  filter(between(year, 1954, 2016)) %>% 
  group_by(year) %>% 
  summarise(mean_e0 = mean(e0, na.rm = T)) %>% 
  ungroup() %>% 
  arrange(year) %>% 
  mutate(ch_e0 = mean_e0 - lag(mean_e0)) 
  
tmp %>% 
  ggplot(aes(x = year, y = ch_e0)) + 
  geom_point() + geom_line() +
  geom_hline(yintercept = 0) + 
  labs(x = "Year", y = "Annual change in life expectancy", 
       title = "Annual change in life expectancy over time, UK, 1955-2016")

ggsave("figures/supp_uk_ch_e0.png", height = 12, width = 29, units = "cm", dpi = 300)


```


And with the bands, for the UK

```{r}


tmp <- dta_e0 %>% 
  filter(sex == "total") %>%
  filter(code %in% "GBR_NP") %>% 
  filter(between(year, 1954, 2016)) %>% 
  group_by(year) %>% 
  summarise(mean_e0 = mean(e0, na.rm = T)) %>% 
  ungroup() %>% 
  arrange(year) %>% 
  mutate(ch_e0 = mean_e0 - lag(mean_e0)) 
  
tmp %>% 
  ggplot(aes(x = year, y = ch_e0)) + 
  geom_point() + geom_line() +
  geom_hline(aes(yintercept = mean(ch_e0, na.rm = TRUE)), size = 1.5, linetype = "dashed") + 
  annotate(geom = "rect",
      xmin = -Inf, xmax = Inf,           
      ymin = mean(tmp$ch_e0, na.rm = TRUE) - 0.5 * sd(tmp$ch_e0, na.rm = TRUE),
      ymax = mean(tmp$ch_e0, na.rm = TRUE) + 0.5 * sd(tmp$ch_e0, na.rm = TRUE),
    alpha = 0.1, fill = "blue"
  ) + 
  annotate(geom = "rect",
      xmin = -Inf, xmax = Inf,           
      ymin = mean(tmp$ch_e0, na.rm = TRUE) - 1.0 * sd(tmp$ch_e0, na.rm = TRUE),
      ymax = mean(tmp$ch_e0, na.rm = TRUE) + 1.0 * sd(tmp$ch_e0, na.rm = TRUE),
    alpha = 0.1, fill = "blue"
  ) + 
  annotate(geom = "rect",
      xmin = -Inf, xmax = Inf,           
      ymin = mean(tmp$ch_e0, na.rm = TRUE) - 1.5 * sd(tmp$ch_e0, na.rm = TRUE),
      ymax = mean(tmp$ch_e0, na.rm = TRUE) + 1.5 * sd(tmp$ch_e0, na.rm = TRUE),
    alpha = 0.1, fill = "blue"
  ) + 
  annotate(geom = "rect",
      xmin = -Inf, xmax = Inf,           
      ymin = mean(tmp$ch_e0, na.rm = TRUE) - 2.0 * sd(tmp$ch_e0, na.rm = TRUE),
      ymax = mean(tmp$ch_e0, na.rm = TRUE) + 2.0 * sd(tmp$ch_e0, na.rm = TRUE),
    alpha = 0.1, fill = "blue"
  ) + 

  geom_hline(aes(yintercept = mean(ch_e0, na.rm = TRUE) - 1.96 * sd(ch_e0, na.rm = TRUE)), linetype = "dashed") + 
  geom_hline(aes(yintercept = mean(ch_e0, na.rm = TRUE) + 1.96 * sd(ch_e0, na.rm = TRUE)), linetype = "dashed") + 

  geom_hline(yintercept = 0) + 
  labs(x = "Year", y = "Annual change in life expectancy", 
       title = "Annual change in life expectancy over time, UK, 1955-2016")

ggsave("figures/supp_uk_ch_e0_bars.png", height = 12, width = 29, units = "cm", dpi = 300)

```


And the USA


```{r}


tmp <- dta_e0 %>% 
  filter(sex == "total") %>%
  filter(code %in% "USA") %>% 
  filter(between(year, 1954, 2016)) %>% 
  group_by(year) %>% 
  summarise(mean_e0 = mean(e0, na.rm = T)) %>% 
  ungroup() %>% 
  arrange(year) %>% 
  mutate(ch_e0 = mean_e0 - lag(mean_e0)) 
  
tmp %>% 
  ggplot(aes(x = year, y = ch_e0)) + 
  geom_point() + geom_line() +
  geom_hline(aes(yintercept = mean(ch_e0, na.rm = TRUE)), size = 1.5, linetype = "dashed") + 
  annotate(geom = "rect",
      xmin = -Inf, xmax = Inf,           
      ymin = mean(tmp$ch_e0, na.rm = TRUE) - 0.5 * sd(tmp$ch_e0, na.rm = TRUE),
      ymax = mean(tmp$ch_e0, na.rm = TRUE) + 0.5 * sd(tmp$ch_e0, na.rm = TRUE),
    alpha = 0.1, fill = "blue"
  ) + 
  annotate(geom = "rect",
      xmin = -Inf, xmax = Inf,           
      ymin = mean(tmp$ch_e0, na.rm = TRUE) - 1.0 * sd(tmp$ch_e0, na.rm = TRUE),
      ymax = mean(tmp$ch_e0, na.rm = TRUE) + 1.0 * sd(tmp$ch_e0, na.rm = TRUE),
    alpha = 0.1, fill = "blue"
  ) + 
  annotate(geom = "rect",
      xmin = -Inf, xmax = Inf,           
      ymin = mean(tmp$ch_e0, na.rm = TRUE) - 1.5 * sd(tmp$ch_e0, na.rm = TRUE),
      ymax = mean(tmp$ch_e0, na.rm = TRUE) + 1.5 * sd(tmp$ch_e0, na.rm = TRUE),
    alpha = 0.1, fill = "blue"
  ) + 
  annotate(geom = "rect",
      xmin = -Inf, xmax = Inf,           
      ymin = mean(tmp$ch_e0, na.rm = TRUE) - 2.0 * sd(tmp$ch_e0, na.rm = TRUE),
      ymax = mean(tmp$ch_e0, na.rm = TRUE) + 2.0 * sd(tmp$ch_e0, na.rm = TRUE),
    alpha = 0.1, fill = "blue"
  ) + 

  geom_hline(aes(yintercept = mean(ch_e0, na.rm = TRUE) - 1.96 * sd(ch_e0, na.rm = TRUE)), linetype = "dashed") + 
  geom_hline(aes(yintercept = mean(ch_e0, na.rm = TRUE) + 1.96 * sd(ch_e0, na.rm = TRUE)), linetype = "dashed") + 

  geom_hline(yintercept = 0) + 
  labs(x = "Year", y = "Annual change in life expectancy", 
       title = "Annual change in life expectancy over time, USA, 1955-2016")

ggsave("figures/supp_usa_ch_e0_bars.png", height = 12, width = 29, units = "cm", dpi = 300)

```


And now what about Scotland?

```{r}


tmp <- dta_e0 %>% 
  filter(sex == "total") %>% 
  filter(code %in% high_income_countries) %>% 
  filter(between(year, 1954, 2016)) %>% 
  group_by(year) %>% 
  summarise(mean_e0 = mean(e0, na.rm = T)) %>% 
  ungroup() %>% 
  arrange(year) %>% 
  mutate(ch_e0 = mean_e0 - lag(mean_e0))
  
tmp2 <- dta_e0 %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1954, 2016)) %>% 
  filter(code == "GBR_SCO") %>% 
  arrange(year) %>%  # lag() relies on rows being in year order
  mutate(ch_e0 = e0 - lag(e0))

tmp %>%
  ggplot(aes(x = year, y = ch_e0)) +
  geom_point() + geom_line() +
  geom_point(
    aes(x = year, y = ch_e0),
    colour = "blue", shape = 2, 
    data = tmp2,
    inherit.aes = F
  ) + 
  geom_line(
    aes(x = year, y = ch_e0),
    colour = "blue", linetype = "dashed", 
    data = tmp2,
    inherit.aes = F
  ) + 
  geom_hline(yintercept = 0) +
    labs(
    title = "Average annual change in life expectancy for high income countries, and Scotland, 1955-2016",
    subtitle = "Average of 21 countries: Black circles; Scotland: Blue triangles",
    x = "Year", y = "Annual change in life expectancy", 
    caption = "Unweighted average (mean) of 21 countries from White 2002"
  )

ggsave(here("figures", "supp_avg_ch_e0_cf_scot.png"), height = 12, width = 29, units = "cm", dpi = 300)


```

How about Japan and the USA?

```{r}


tmp <- dta_e0 %>% 
  filter(sex == "total") %>%
  filter(code %in% high_income_countries) %>% 
  filter(between(year, 1954, 2016)) %>% 
  group_by(year) %>% 
  summarise(mean_e0 = mean(e0, na.rm = T)) %>% 
  ungroup() %>% 
  arrange(year) %>% 
  mutate(Average = mean_e0 - lag(mean_e0))
  
tmp2 <- dta_e0 %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1954, 2016)) %>% 
  filter(code == "JPN") %>% 
  mutate(Japan = e0 - lag(e0))

tmp3 <- dta_e0 %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1954, 2016)) %>% 
  filter(code == "USA") %>% 
  mutate(USA = e0 - lag(e0))

tmp4 <- tmp %>% select(year, Average) %>%  
  left_join(tmp2 %>% select(year, Japan)) %>% 
  left_join(tmp3 %>% select(year, USA)) %>% 
  gather(key = "population", value = "ch_e0", Average:USA)

tmp4 %>%
  ggplot(aes(x = year, y = ch_e0, group = population, colour = population, linetype = population, shape = population)) +
  geom_point() + geom_line() +
  geom_hline(yintercept = 0) +
  labs(
    x = "Year", y = "Annual change in life expectancy", 
    title  = "Annual changes in life expectancy, Average, USA, and Japan, 1955-2016", 
    subtitle = "Average: Black; USA: Red; Japan: Green",
    caption = "Unweighted average (mean) of 21 high-income countries used in White (2002)"
  ) +
  scale_colour_manual(values = c("black", "darkgreen", "red"))


```

The departure of the USA from the average is very clear in this figure, with the annual changes in the USA declining after 2008, and not improving at the average rate in the early 2000s either. Since the 1980s the USA appears to slow down more than the average in the downturns, and then not recover as quickly as the average in the upturns. 


And the following does this for each of the countries 


```{r}
tmp <- dta_e0 %>% 
  filter(sex == "total") %>% 
  filter(code %in% high_income_countries) %>%  # restrict the average to the 21 high-income countries
  filter(between(year, 1954, 2016)) %>% 
  group_by(year) %>% 
  summarise(mean_e0 = mean(e0, na.rm = T)) %>% 
  ungroup() %>% 
  arrange(year) %>% 
  mutate(mean_ch_e0 = mean_e0 - lag(mean_e0)) %>% 
  select(year, mean_ch_e0)

tmp2 <- dta_e0 %>% 
  filter(sex == "total") %>%
  filter(code %in% high_income_countries) %>% 
  filter(between(year, 1954, 2016)) %>% 
  group_by(code) %>% 
  arrange(year) %>% 
  mutate(ch_e0 = e0 - lag(e0)) %>%
  ungroup() %>% 
  select(year, code, ch_e0)

tmp3 <- left_join(tmp2, tmp, by = c("year" = "year"))

tmp3

tmp3 %>% 
  ggplot(aes(x = year)) + 
  geom_point(aes(y = mean_ch_e0), colour = "grey", alpha = 0.5) + 
  geom_line(aes(y = mean_ch_e0), colour = "grey", alpha = 0.5) + 
  geom_point(aes(y = ch_e0)) + geom_line(aes(y = ch_e0)) + 
  facet_wrap(~code) +
  geom_hline(yintercept = 0) +
  labs(
    x = "Year", 
    y = "Annual change in life expectancy", 
    title = "Annual change in life expectancy for 21 high income countries compared with average", 
    subtitle = "Black: Country labelled in facet. Grey: Average of 21 countries",
    caption = "Unweighted average of 21 countries used in White (2002)"
  )

```


## Figure 3

```{r}
dta_for_fig03 <- dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 1996)) %>% 
  group_by(code) %>% 
  arrange(year) %>% 
  mutate(val_in_1955 = ifelse(1955 %in% year, e0[year == 1955], NA)) %>% 
  mutate(ch_e0 = e0 - lag(e0)) %>% 
  mutate(mean_ch_e0 = mean(ch_e0, na.rm = T)) %>% 
  filter(year == min(year)) %>% 
  ungroup() %>% 
  select(code, val_in_1955, mean_ch_e0) 

fig03 <- dta_for_fig03 %>% 
  ggplot(aes(x = val_in_1955, y = mean_ch_e0)) +
  geom_point() + 
  scale_y_continuous(limits = c(0, 0.4)) + 
  ggrepel::geom_text_repel(aes(label = code)) +
  labs(
    title = "Life expectancy against change in life expectancy from 1955 to 1996",
    x = "Life expectancy in 1955 (years)", 
    y = "Average annual change in life expectancy",
    caption = "Replication of figure 3 from White (2002)"
    )  

print(fig03)

ggsave("figures/pt2_white_orig.png", height = 12, width = 20, units = "cm", dpi = 300)

ggsave("figures/e0_limits/fig_2_a_change_start_1996.png", height =12, width =20, units = "cm", dpi = 300)

fig03 + 
  stat_smooth(method = "lm", se = F, colour = "red", linetype = "dashed",
              data = dta_for_fig03 %>% filter(code != "PRT")) + 
  geom_smooth(se = F, colour = "blue",
              data = dta_for_fig03 %>% filter(code != "PRT"))

ggsave(here("figures", "pt2_white_orig_smoother.png"), height = 12, width = 20, units = "cm", dpi = 300)

ggsave(here("figures", "e0_limits", "appendix", "fig_2_a_change_start_1996_smoother.png"), height =12, width =20, units = "cm", dpi = 300)


```
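
The y-axis in this figure is the mean of the annual changes for each country. For a series with no missing years the intermediate values cancel, so this mean depends only on the first and last values. A quick check with an invented series (`toy_series` is hypothetical):

```r
# Hypothetical life expectancy series for one country, one value per year
toy_series <- c(70.0, 70.4, 70.3, 70.9, 71.2)

# Mean of the year-on-year changes, as computed in the chunk above
mean_annual_change <- mean(diff(toy_series))

# Total gain divided by the number of yearly intervals
n <- length(toy_series)
endpoint_change <- (toy_series[n] - toy_series[1]) / (n - 1)

all.equal(mean_annual_change, endpoint_change)
```

This equivalence only holds when there are no gaps in the series. With missing years, `mean(..., na.rm = TRUE)` averages only the changes that can be computed.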

What about for all HMD countries?


```{r}
fig <- dta_e0 %>% 
  filter(code %in% all_distinct_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 1996)) %>% 
  group_by(code) %>% 
  arrange(year) %>% 
  mutate(val_in_1955 = ifelse(1955 %in% year, e0[year == 1955], NA)) %>% 
  mutate(ch_e0 = e0 - lag(e0)) %>% 
  mutate(mean_ch_e0 = mean(ch_e0, na.rm = T)) %>% 
  filter(year == min(year)) %>% 
  ungroup() %>% 
  select(code, val_in_1955, mean_ch_e0) %>% 
  ggplot(aes(x = val_in_1955, y = mean_ch_e0)) +
  geom_point() + 
  scale_y_continuous(limits = c(0, 0.4)) + 
  ggrepel::geom_text_repel(aes(label = code)) +
  labs(
    title = "Life expectancy in 1955 against change in life expectancy (up to 1996)", 
    subtitle = "All countries in HMD with life expectancies recorded in 1955", 
    x = "Life expectancy in 1955 (years)", 
    y = "Average annual change in life expectancy",
    caption = "Replication of figure 3 from White (2002) but with all available countries"
    )  

print(fig)


ggsave(here("figures", "e0_limits", "appendix", "fig_2_a_change_start_1996_allhmd.png"), height =12, width =20, units = "cm", dpi = 300)


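fig + 
  # Fit the smoothers to the data plotted in this figure (all HMD countries, excluding PRT)
  # rather than to the high-income subset in dta_for_fig03
  stat_smooth(method = "lm", se = F, colour = "red", linetype = "dashed",
              data = function(d) filter(d, code != "PRT")) + 
  geom_smooth(se = F, colour = "blue",
              data = function(d) filter(d, code != "PRT"))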

ggsave(here("figures", "e0_limits", "appendix", "fig_2_a_change_start_1996_allhmd_smoother.png"), height =12, width =20, units = "cm", dpi = 300)


```

And what about more recently?

```{r}
tmp <- dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>% 
  group_by(code) %>% 
  arrange(year) %>% 
  mutate(val_in_1955 = ifelse(1955 %in% year, e0[year == 1955], NA)) %>% 
  mutate(ch_e0 = e0 - lag(e0)) %>% 
  mutate(mean_ch_e0 = mean(ch_e0, na.rm = T)) %>% 
  filter(year == min(year)) %>% 
  ungroup() %>% 
  select(code, val_in_1955, mean_ch_e0) 

fig03_update <- tmp %>% 
  ggplot(aes(x = val_in_1955, y = mean_ch_e0)) +
  geom_point() + 
  scale_y_continuous(limits = c(0, 0.4)) + 
  ggrepel::geom_text_repel(aes(label = code)) +
  labs(
    title = "Life expectancy in 1955 against average change in life expectancy up to 2016", 
    subtitle = "21 high income countries",
    x = "Life expectancy in 1955 (years)", y = "Average annual change in life expectancy",
    caption = "Updating of figure 3 from White (2002) to include data up to 2016"
    )  

print(fig03_update)

ggsave(here("figures", "pt2_white_update.png"), height = 12, width = 20, units = "cm", dpi = 300)
ggsave(here("figures","e0_limits","fig2b_start_change_2016.png"), height = 12, width = 20, units = "cm", dpi = 300)

fig03_update + 
  stat_smooth(method = "lm", se = F, colour = "red", linetype = "dashed", data = tmp %>% filter(code != "PRT")) + 
  geom_smooth(se = F, colour = "blue", data = tmp %>% filter(code != "PRT"))

ggsave(here("figures","pt2_white_update_smoother.png"), height = 12, width = 20, units = "cm", dpi = 300)
ggsave(here("figures","e0_limits","appendix","fig2b_start_change_2016.png"), height = 12, width = 20, units = "cm", dpi = 300)


```

The important feature to note here is that, though the negative association is still apparent, it appears to have a 'floor' value of around 0.15 years per year, i.e. there are still continued improvements even in countries that had higher life expectancies in 1955 (such as Sweden, Denmark, the Netherlands and Norway).

How about since 2010 only?

```{r}
tmp <- dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 2010, 2016)) %>% 
  group_by(code) %>% 
  arrange(year) %>% 
  mutate(val_in_2010 = ifelse(2010 %in% year, e0[year == 2010], NA)) %>% 
  mutate(ch_e0 = e0 - lag(e0)) %>% 
  mutate(mean_ch_e0 = mean(ch_e0, na.rm = T)) %>% 
  filter(year == min(year)) %>% 
  ungroup() %>% 
  select(code, val_in_2010, mean_ch_e0) 

tmp %>% 
  ggplot(aes(x = val_in_2010, y = mean_ch_e0)) +
  geom_point() + 
  scale_y_continuous(limits = c(0, 0.4)) + 
  ggrepel::geom_text_repel(aes(label = code)) +
  labs(
    title = "Life expectancy in 2010 compared with subsequent annual change",
    subtitle = "21 high income countries",
    caption = "Based on figure 3 of White (2002)",
    x = "Life expectancy in 2010 (years)", y = "Average annual change in life expectancy"
    )  

ggsave(here("figures","pt1_supp_white_2010plus.png"), height = 12, width = 20, units = "cm", dpi = 300)

ggsave(here("figures","e0_limits","appendix","ch_since_2010.png"), height = 12, width = 20, units = "cm", dpi = 300)

```

This is an important 'dog that didn't bark': there's no apparent association between the average annual change since 2010 and life expectancy in 2010. Though the long-term tendency may be for improvements to slow as life expectancy rises, it's not something that explains what happened this time. 

Requested addition: change from 1997 to 2016 

```{r}
tmp <- dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1997, 2016)) %>% 
  group_by(code) %>% 
  arrange(year) %>% 
  mutate(val_in_1997 = ifelse(1997 %in% year, e0[year == 1997], NA)) %>% 
  mutate(ch_e0 = e0 - lag(e0)) %>% 
  mutate(mean_ch_e0 = mean(ch_e0, na.rm = T)) %>% 
  filter(year == min(year)) %>% 
  ungroup() %>% 
  select(code, val_in_1997, mean_ch_e0) 

fig03_update_b <- tmp %>% 
  ggplot(aes(x = val_in_1997, y = mean_ch_e0)) +
  geom_point() + 
  scale_y_continuous(limits = c(0, 0.4)) + 
  ggrepel::geom_text_repel(aes(label = code)) +
  labs(
    title = "Life expectancy in 1997 against average change in life expectancy up to 2016", 
    subtitle = "21 high income countries",
    x = "Life expectancy in 1997 (years)", y = "Average annual change in life expectancy"
    )  

print(fig03_update_b)

ggsave(here("figures","pt2_white_update_b.png"), height = 12, width = 20, units = "cm", dpi = 300)
ggsave(here("figures","e0_limits","fig2b_start_change_2016_b.png"), height = 12, width = 20, units = "cm", dpi = 300)


fig03_update_b_plain <- tmp %>% 
  ggplot(aes(x = val_in_1997, y = mean_ch_e0)) +
  geom_point() + 
  scale_y_continuous(limits = c(0, 0.4)) + 
  ggrepel::geom_text_repel(aes(label = code)) +
  labs(
    x = "Life expectancy in 1997 (years)", y = "Average annual change in life expectancy"
    )  

print(fig03_update_b_plain)

ggsave(here("figures","pt2_white_update_b_plain.png"), height = 12, width = 20, units = "cm", dpi = 300)
ggsave(here("figures","e0_limits","fig2b_start_change_2016_b_plain.png"), height = 12, width = 20, units = "cm", dpi = 300)


fig03_update_b + 
  stat_smooth(method = "lm", se = F, colour = "red", linetype = "dashed", data = tmp %>% filter(code != "PRT")) + 
  geom_smooth(se = F, colour = "blue", data = tmp %>% filter(code != "PRT"))

ggsave(here("figures","pt2_white_update_smoother_b.png"), height = 12, width = 20, units = "cm", dpi = 300)
ggsave(here("figures","e0_limits","appendix","fig2b_start_change_2016_b.png"), height = 12, width = 20, units = "cm", dpi = 300)


```


Now all countries, up to 2016


```{r}
tmp <- dta_e0 %>% 
  filter(code %in% all_distinct_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>% 
  group_by(code) %>% 
  arrange(year) %>% 
  mutate(val_in_1955 = ifelse(1955 %in% year, e0[year == 1955], NA)) %>% 
  mutate(ch_e0 = e0 - lag(e0)) %>% 
  mutate(mean_ch_e0 = mean(ch_e0, na.rm = T)) %>% 
  filter(year == min(year)) %>% 
  ungroup() %>% 
  select(code, val_in_1955, mean_ch_e0) 

fig03_update <- tmp %>% 
  ggplot(aes(x = val_in_1955, y = mean_ch_e0)) +
  geom_point() + 
  scale_y_continuous(limits = c(0, 0.4)) + 
  ggrepel::geom_text_repel(aes(label = code)) +
  labs(
    title = "Life expectancy in 1955 against average change in life expectancy up to 2016", 
    subtitle = "All countries with life expectancy in 1955",
    x = "Life expectancy in 1955 (years)", y = "Average annual change in life expectancy",
    caption = "Updating of figure 3 from White (2002) to include data up to 2016"
    )  

print(fig03_update)


ggsave(here("figures","e0_limits","appendix","fig2b_start_change_2016_allhmd.png"), height = 12, width = 20, units = "cm", dpi = 300)

fig03_update + 
  stat_smooth(method = "lm", se = F, colour = "red", linetype = "dashed", data = tmp %>% filter(code != "PRT")) + 
  geom_smooth(se = F, colour = "blue", data = tmp %>% filter(code != "PRT"))

ggsave(here("figures", "e0_limits", "appendix","fig2b_start_change_2016_allhmd_smoother.png"), height = 12, width = 20, units = "cm", dpi = 300)


```

Let's see whether this was the case for other ten-year periods

```{r}
tmp <- dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2015)) %>% 
  mutate(Decade = cut(year, seq(1955, 2015, by = 10), include.lowest = T)) %>% 
  group_by(code, Decade) %>% 
  arrange(year) %>% 
  mutate(ch_e0 = e0 - lag(e0)) %>% 
  mutate(mean_ch_e0 = mean(ch_e0, na.rm = T)) %>% 
  filter(year == min(year))

tmp2 <- tmp %>% filter(code == "GBR_NP")

tmp %>% 
  ggplot(
    aes(x = e0, y = mean_ch_e0, group = Decade, colour = Decade, shape = Decade)
    ) + 
  geom_point() + 
  geom_point(
    aes(x = e0, y = mean_ch_e0, group = Decade, colour = Decade, shape = Decade),
    inherit.aes = FALSE, 
    data = tmp2,
    shape = 19, size = 5, alpha = 0.25
  ) +
  stat_smooth(method = "lm", se = F) +
  labs(
    title = "Life expectancy against average annual change in life expectancy over 6 decades", 
    subtitle = "21 high income countries. UK is highlighted for each decade",
    x = "Life expectancy at start of decade", 
    y = "Average annual change in life expectancy over decade",
    caption = "Based on figure 3 of White (2002)"
  )

ggsave(here("figures", "pt1_white_sliced.png"), height = 12, width = 20, units = "cm", dpi = 300)
ggsave(here("figures","e0_limits","sliced_startchange.png"), height = 12, width = 20, units = "cm", dpi = 300)

tmp %>% 
  ggplot(
    aes(x = e0, y = mean_ch_e0, group = Decade, colour = Decade, shape = Decade)
    ) + 
  geom_point() + 
  geom_point(
    aes(x = e0, y = mean_ch_e0, group = Decade, colour = Decade, shape = Decade),
    inherit.aes = FALSE, 
    data = tmp2,
    shape = 19, size = 5, alpha = 0.25
  ) +
  stat_smooth(method = "lm", se = F) +
  labs(
    x = "Life expectancy at start of decade", 
    y = "Average annual change in life expectancy over decade"
  )

ggsave(here("figures", "pt1_white_sliced_plain.png"), height = 12, width = 20, units = "cm", dpi = 300)
ggsave(here("figures","e0_limits","sliced_startchange_plain.png"), height = 12, width = 20, units = "cm", dpi = 300)


tmp %>% 
  ggplot(
    aes(x = e0, y = mean_ch_e0, group = Decade, shape = Decade)
    ) + 
  geom_point() + 
  geom_point(
    aes(x = e0, y = mean_ch_e0, group = Decade, shape = Decade),
    inherit.aes = FALSE, 
    data = tmp2,
    shape = 19, size = 5, alpha = 0.25
  ) +
  stat_smooth(method = "lm", se = F, colour = "black") +
  labs(
    x = "Life expectancy at start of decade", 
    y = "Average annual change in life expectancy over decade"
  ) 
  

ggsave(here("figures", "pt1_white_sliced_plain_blackwhite.png"), height = 12, width = 20, units = "cm", dpi = 300)
ggsave(here("figures","e0_limits","sliced_startchange_plain_blackwhite.png"), height = 12, width = 20, units = "cm", dpi = 300)

```
This figure shows that the degree to which the limits-to-growth argument holds in high income countries is falling, rather than increasing, again suggesting it is not an important factor in explaining the recent slowdown in the UK (highlighted with a translucent symbol). 
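
One detail worth checking when reading these figures is where each 'decade' begins. With its defaults, `cut()` builds right-closed intervals, so with breaks at 1955, 1965, ... every group after the first starts in a year ending in 6. The sketch below compares this with left-closed intervals (`right = FALSE`), shown only as an alternative for comparison, not as what the chunk above uses.

```r
years  <- 1955:2015
breaks <- seq(1955, 2015, by = 10)

# Default (right-closed) intervals, as in the chunk above
decade_default <- cut(years, breaks, include.lowest = TRUE)

# Left-closed alternative, for comparison
decade_left <- cut(years, breaks, include.lowest = TRUE, right = FALSE)

# First year falling in each group under the two conventions
first_year_default <- unname(tapply(years, decade_default, min))
first_year_left    <- unname(tapply(years, decade_left, min))

rbind(default = first_year_default, left_closed = first_year_left)
```

Under the default, the first group also spans eleven years (1955 to 1965) while the others span ten.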

And now, all countries 

```{r}
tmp <- dta_e0 %>% 
  filter(code %in% all_distinct_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2015)) %>% 
  mutate(Decade = cut(year, seq(1955, 2015, by = 10), include.lowest = T)) %>% 
  group_by(code, Decade) %>% 
  arrange(year) %>% 
  mutate(ch_e0 = e0 - lag(e0)) %>% 
  mutate(mean_ch_e0 = mean(ch_e0, na.rm = T)) %>% 
  filter(year == min(year))

tmp2 <- tmp %>% filter(code == "GBR_NP")

tmp %>% 
  ggplot(
    aes(x = e0, y = mean_ch_e0, group = Decade, colour = Decade, shape = Decade)
    ) + 
  geom_point() + 
  geom_point(
    aes(x = e0, y = mean_ch_e0, group = Decade, colour = Decade, shape = Decade),
    inherit.aes = FALSE, 
    data = tmp2,
    shape = 19, size = 5, alpha = 0.25
  ) +
  stat_smooth(method = "lm", se = F) +
  labs(
    title = "Life expectancy against average annual change in life expectancy over 6 decades", 
    subtitle = "All countries with records at start of each mid-decade. UK is highlighted for each decade",
    x = "Life expectancy at start of decade", 
    y = "Average annual change in life expectancy over decade",
    caption = "Based on figure 3 of White (2002)"
  )

ggsave(here("figures", "e0_limits","appendix","sliced_startchange_allhmd.png"), height = 12, width = 20, units = "cm", dpi = 300)

tmp %>% 
  ggplot(
    aes(x = e0, y = mean_ch_e0, group = Decade, colour = Decade, shape = Decade)
    ) + 
  geom_point() + 
  geom_point(
    aes(x = e0, y = mean_ch_e0, group = Decade, colour = Decade, shape = Decade),
    inherit.aes = FALSE, 
    data = tmp2,
    shape = 19, size = 5, alpha = 0.25
  ) +
  stat_smooth(method = "lm", se = F) +
  labs(
    x = "Life expectancy at start of decade", 
    y = "Average annual change in life expectancy over decade"
  )

ggsave(here("figures", "e0_limits","appendix","sliced_startchange_allhmd_plain.png"), height = 12, width = 20, units = "cm", dpi = 300)


```

The changing composition of the countries means there's no longer a clear pattern. 
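
One detail of the decade binning above is worth checking in isolation. With `include.lowest = T` and the default right-closed intervals, only the first bin contains its lower break, so `min(year)` in the later bins should be 1966, 1976 and so on rather than 1965, 1975. A minimal sketch on the bare year sequence (`dig.lab = 4` is added here only to keep the labels readable; the `toy_*` names are illustrative):

```{r}
# Bin the years exactly as in the chunk above
toy_years  <- 1955:2015
toy_decade <- cut(toy_years, seq(1955, 2015, by = 10),
                  include.lowest = TRUE, dig.lab = 4)

# First year and number of years falling in each bin
first_year <- tapply(toy_years, toy_decade, min)
n_years    <- tapply(toy_years, toy_decade, length)

print(data.frame(first_year, n_years))
```

Because `lag()` is applied within each country and decade group, the first year of every bin has no change value. Assuming a complete series, each decade mean is therefore taken over 10 annual changes in the first bin and 9 in the others.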


# Changing availability of countries over time 

The following figure shows whether each country is present or absent in each year.


```{r}
dta_e0 %>% 
  filter(code %in% all_distinct_countries) %>% 
  filter(sex == "total") %>% 
  select(-sex) %>% 
  mutate(data_present = is.finite(e0)) %>% 
  select(code, year, data_present) %>% 
  right_join(tibble(year = 1955:2016)) %>% 
  ggplot(aes(y = code, x = year, fill = data_present)) +
  geom_tile() + 
  scale_fill_manual(values = c(`TRUE` = 'blue', `FALSE` = 'green')) + 
  theme_bw() + theme(legend.position = 'none') +
  labs(title = "Data availability by year",
       subtitle = "Availability in HMD. For years between 1955 and 2016 inclusive", x = "Year", y = "HMD Country code",
       caption = "HMD: Human Mortality Database") +
  scale_x_continuous(minor_breaks = 1955:2016)

ggsave("figures/e0_limits/appendix/data_availability_map.png", height = 15, width = 15, units = "cm", dpi = 300)
```
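
If a country-year is missing from `dta_e0` as a row, rather than stored with a missing `e0`, it will not get a `FALSE` tile in the figure above and will instead show as a blank cell. A minimal sketch, on made-up values with hypothetical country codes, of how `tidyr::complete()` could make such absences explicit:

```{r}
library(dplyr)
library(tidyr)

# Made-up data: BBB has no row at all for 2000 and an NA value for 2002
toy_e0 <- tibble(
  code = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  year = c(2000L, 2001L, 2002L, 2001L, 2002L),
  e0   = c(70.1, 70.3, 70.2, 75.0, NA)
)

toy_availability <- toy_e0 %>% 
  mutate(data_present = is.finite(e0)) %>% 
  select(code, year, data_present) %>% 
  # Add every code-by-year combination; rows created here count as absent
  complete(code, year = 2000:2002, fill = list(data_present = FALSE))

toy_availability
```

Both kinds of gap (no row, and a row with `NA`) end up as `data_present == FALSE`, so both would be drawn in the same colour.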


## Tadpole charts and derivatives 

To start, let's just show e0 against change in e0 for the 21 high-income countries, just as points.


```{r}
dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>% 
  group_by(code) %>% 
  arrange(year) %>% 
  mutate(ch_e0 = e0 - lag(e0)) %>%
  ungroup() %>% 
  ggplot(aes(x = e0, y = ch_e0)) + 
  geom_point() + 
  geom_hline(yintercept = 0) +
  stat_smooth(method = "lm", se = F) +
  labs(
    title = "Life expectancy against change in life expectancy for 21 countries, 1955-2016",
    subtitle = "Countries used in White (2002)", 
    x = "Life expectancy in years", 
    y = "Change in life expectancy from previous year"
  )
```

Here there seems to be no relationship whatsoever. 

But what if there's a Simpson's Paradox style issue here? What do the relationships look like within each country?
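
As a reminder of what such an issue would look like, here is a toy example with made-up numbers (the codes `A`, `B`, `C` and all values are illustrative, not HMD data) in which every group has a negative slope but the pooled slope is positive:

```{r}
# Three made-up groups: within each, `change` falls as `level` rises,
# but groups with higher levels also have higher changes overall
toy_simpson <- data.frame(
  code   = rep(c("A", "B", "C"), each = 5),
  level  = c(10:14, 20:24, 30:34),
  change = c(20:16, 40:36, 60:56)
)

# Slope from a single regression over all points
pooled_slope <- unname(coef(lm(change ~ level, data = toy_simpson))["level"])

# Slope from a separate regression within each group
within_slopes <- sapply(
  split(toy_simpson, toy_simpson$code),
  function(d) unname(coef(lm(change ~ level, data = d))["level"])
)

pooled_slope
within_slopes
```

This is why the next chunk fits one regression line per country rather than a single pooled line.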


```{r}
tmp <- dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2015)) %>% 
  group_by(code) %>% 
  arrange(year) %>% 
  mutate(ch_e0 = e0 - lag(e0)) %>%
  ungroup()

tmp %>% 
  ggplot(aes(x = e0, y = ch_e0, group = code)) + 
  geom_hline(yintercept = 0) +
  geom_line(
    stat = "smooth", method = "lm", se = F, 
    colour = "black", alpha = 0.5) + 
  stat_smooth(
    method = "lm", se = F, 
    data = tmp %>% filter(code == "GBR_NP")
  ) +
  labs(
    title = "Long-term relationship between life expectancy and annual change in life expectancy",
    subtitle = "21 high income countries. UK highlighted in blue", 
    x = "Life expectancy in years",
    y = "Annual change in life expectancy"
  )

```

Once again, there's no obvious trend indicating a limit to life expectancy, and the UK is one of a number of countries where the long-term correlation between life expectancy and change in life expectancy is positive rather than negative, meaning higher life expectancies are (weakly) associated with slightly faster rates of improvement. 

Perhaps this issue is confounded by the negative autocorrelation found between annual lags. To investigate this we could try the average of two years rather than a single year. 
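
A small sketch of the two-year averaging, using a made-up series for a single hypothetical country. It also shows that the averaged change equals half the change over two years, which is why noise in consecutive years partly cancels:

```{r}
library(dplyr)

# Made-up life expectancy series, not HMD data
toy_series <- tibble(
  year = 2000:2004,
  e0   = c(78.0, 78.4, 78.3, 78.9, 79.0)
)

toy_smoothed <- toy_series %>% 
  arrange(year) %>% 
  mutate(
    ch_e0           = e0 - lag(e0),               # annual change
    avg_e0          = (e0 + lag(e0)) / 2,         # mean level over two years
    avg_ch_e0       = (ch_e0 + lag(ch_e0)) / 2,   # mean of two annual changes
    two_year_change = (e0 - lag(e0, 2)) / 2       # the same quantity, written directly
  )

toy_smoothed
```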

```{r}
dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2015)) %>% 
  group_by(code) %>% 
  arrange(year) %>% 
  mutate(ch_e0 = e0 - lag(e0)) %>%
  mutate(avg_e0 = (e0 + lag(e0)) / 2) %>% 
  mutate(lag_ch_e0 = lag(ch_e0)) %>% 
  mutate(avg_ch_e0 = (ch_e0 + lag_ch_e0) / 2) %>% 
  ungroup() %>% 
  ggplot(aes(x = avg_e0, y = avg_ch_e0)) + 
  geom_point() + 
  geom_hline(yintercept = 0) +
  stat_smooth(method = "lm", se = F)


```
Again, this makes no appreciable difference.

```{r}

tmp <- dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2015)) %>% 
  group_by(code) %>% 
  arrange(year) %>% 
  mutate(ch_e0 = e0 - lag(e0)) %>%
  mutate(lag_ch_e0 = lag(ch_e0)) %>% 
  mutate(avg_e0 = (e0 + lag(e0)) / 2) %>% 
  mutate(avg_ch_e0 = (ch_e0 + lag_ch_e0) / 2) %>% 
  ungroup()

tmp %>% 
  ggplot(aes(x = avg_e0, y = avg_ch_e0, group = code)) + 
  geom_hline(yintercept = 0) +
  geom_line(
    stat = "smooth", method = "lm", se = F, 
    colour = "black", alpha = 0.5) + 
  stat_smooth(
    method = "lm", se = F, 
    data = tmp %>% filter(code == "GBR_NP")
  )

```

Again, this does not seem to make any difference. 


## Table 3

To do or not bother. I think we have what we need for a short paper. 

## Figure 4

Variance between life expectancies for 21 high-income countries, 1955-96

```{r}
dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  mutate(group = ifelse(!(code %in% c("GRC", "IRL", "PRT", "ESP")), "inc", "exc")) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 1996)) %>% 
  group_by(group, year) %>% 
  mutate(mean_e0 = mean(e0, na.rm = T)) %>% 
  mutate(diff_from_mean = e0 - mean_e0) %>% 
  summarise(var = var(diff_from_mean)) %>% 
  ggplot(aes(x = year, y = var, group = group, shape = group)) + 
  geom_point() + geom_line() +
  scale_y_continuous(limits = c(0, 12))

```
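
The quantity plotted above is the between-country sample variance of e0 in each year. Subtracting the yearly mean first does not change it, since variance is unaffected by shifting every value by a constant. A minimal sketch on made-up values (the country codes and numbers are illustrative only):

```{r}
library(dplyr)

# Three hypothetical countries observed in two years
toy_var <- tibble(
  code = rep(c("AAA", "BBB", "CCC"), times = 2),
  year = rep(c(1955L, 1956L), each = 3),
  e0   = c(68, 70, 72, 69, 70, 74)
)

toy_var_by_year <- toy_var %>% 
  group_by(year) %>% 
  mutate(diff_from_mean = e0 - mean(e0, na.rm = TRUE)) %>% 
  summarise(
    var_diff = var(diff_from_mean),  # as computed in the chunk above
    var_raw  = var(e0),              # variance of the raw values
    .groups  = "drop"
  )

toy_var_by_year
```

Note that `var()` returns `NA` for any year in which one of the countries has a missing e0, unless `na.rm = TRUE` is passed.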

The first question is: what does this look like now?

```{r}
dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  mutate(group = ifelse(!(code %in% c("GRC", "IRL", "PRT", "ESP")), "inc", "exc")) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2015)) %>% 
  group_by(group, year) %>% 
  mutate(mean_e0 = mean(e0, na.rm = T)) %>% 
  mutate(diff_from_mean = e0 - mean_e0) %>% 
  summarise(var = var(diff_from_mean)) %>% 
  ggplot(aes(x = year, y = var, group = group, shape = group)) + 
  geom_point() + geom_line() +
  scale_y_continuous(limits = c(0, 12))

```

So, there's been a steady increase in the variance between countries after 2010. *This is an important finding*. 

Secondly, what does this look like for all countries?

```{r}
dta_e0 %>% 
  filter(code %in% all_distinct_countries) %>% 
  mutate(group = ifelse(!(code %in% c("GRC", "IRL", "PRT", "ESP")), "inc", "exc")) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2015)) %>% 
  group_by(group, year) %>% 
  mutate(mean_e0 = mean(e0, na.rm = T)) %>% 
  mutate(diff_from_mean = e0 - mean_e0) %>% 
  summarise(var = var(diff_from_mean)) %>% 
  ggplot(aes(x = year, y = var, group = group, shape = group)) + 
  geom_point() + geom_line() +
  scale_y_continuous(limits = c(0, 20))

```

This looks to be undermined by the changing composition of countries in the data. 


For the high-income countries in the original selection, let's plot the difference for each individual country.

```{r}
dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>% 
  group_by(year) %>% 
  mutate(mean_e0 = mean(e0, na.rm = T)) %>% 
  mutate(diff_from_mean = e0 - mean_e0) %>%
  ungroup() %>% 
  ggplot(aes(x = year, y = diff_from_mean)) + 
  facet_wrap(~code) + 
  geom_line() +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 1996)

```

Now let's present this all at once

```{r}
tmp <- dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>% 
  group_by(year) %>% 
  mutate(mean_e0 = mean(e0, na.rm = T)) %>% 
  mutate(diff_from_mean = e0 - mean_e0) %>%
  ungroup()

tmp %>% 
  ggplot(aes(x = year, y = diff_from_mean, group = code)) + 
  geom_line(alpha = 0.6) +
  geom_line(aes(x = year, y = diff_from_mean, group = Country, colour = Country), 
            inherit.aes = F, size = 1.5, 
            data = tmp %>% 
              filter(code %in% c("USA", "GBR_NP")) %>% 
              mutate(Country = case_when(
                code == "USA" ~ "USA",
                code == "GBR_NP" ~ "UK"
              ))
            ) +
  geom_hline(yintercept = 0) +
  labs(
    x = "Year", y = "Difference in life expectancy from high-income average",
    title = "Difference in life expectancy from high-income average over time", 
    subtitle = "UK (red) and USA (blue) highlighted", 
    caption = "Difference from unweighted average of 21 high income countries used in White (2002)."
  )


```

Both the UK and USA are distinct, but in different ways. 

The USA had been declining in relative terms for many years, with a post 2010 acceleration.

The UK, by contrast, had been stagnating in relative terms for many years, then fell more rapidly after about 2012. 

Let's just replicate the above for only 1997 onwards

```{r}
tmp <- dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1997, 2016)) %>% 
  group_by(year) %>% 
  mutate(mean_e0 = mean(e0, na.rm = T)) %>% 
  mutate(diff_from_mean = e0 - mean_e0) %>%
  ungroup()

tmp %>% 
  ggplot(aes(x = year, y = diff_from_mean, group = code)) + 
  geom_line(alpha = 0.6) +
  geom_line(aes(x = year, y = diff_from_mean, group = code, colour = code), 
            inherit.aes = F, size = 1.5, 
            data = tmp %>% 
              filter(code %in% c("USA", "GBR_NP"))
            ) +
  geom_hline(yintercept = 0) 


```
This also seems to be an important finding, as it shows how both the UK and the USA have been falling away from the rich-country mean more rapidly after 2012, though faster and from a lower starting point (of longer-term relative decline) for the USA. 


## Figure 5


# New figures requested by Rebecca 

## High income countries, side by side, original and updated period, linear regression line 

```{r}
dta_for_fighi_upto1996 <- dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 1996)) %>% 
  group_by(code) %>% 
  arrange(year) %>% 
  mutate(val_in_1955 = ifelse(1955 %in% year, e0[year == 1955], NA)) %>% 
  mutate(ch_e0 = e0 - lag(e0)) %>% 
  mutate(mean_ch_e0 = mean(ch_e0, na.rm = T)) %>% 
  filter(year == min(year)) %>% 
  ungroup() %>% 
  select(code, val_in_1955, mean_ch_e0) 

fig_hi_upto1996 <- dta_for_fighi_upto1996 %>% 
  ggplot(aes(x = val_in_1955, y = mean_ch_e0)) +
  geom_point() + 
  scale_y_continuous(limits = c(0, 0.4)) + 
  ggrepel::geom_text_repel(aes(label = code)) +
  labs(
    title = "Life expectancy against change in life expectancy from 1955 to 1996",
    subtitle = "Linear regression line on all countries except Portugal (PRT)",
    x = "Life expectancy in 1955 (years)", 
    y = "Average annual change in life expectancy",
    caption = "Replication of figure 3 from White (2002)"
    ) + 
  stat_smooth(method = "lm", se = F, colour = "red", linetype = "dashed",
              data = dta_for_fighi_upto1996 %>% filter(code != "PRT")) 

print(fig_hi_upto1996)

dta_for_fighi_upto2016 <- dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>% 
  group_by(code) %>% 
  arrange(year) %>% 
  mutate(val_in_1955 = ifelse(1955 %in% year, e0[year == 1955], NA)) %>% 
  mutate(ch_e0 = e0 - lag(e0)) %>% 
  mutate(mean_ch_e0 = mean(ch_e0, na.rm = T)) %>% 
  filter(year == min(year)) %>% 
  ungroup() %>% 
  select(code, val_in_1955, mean_ch_e0) 

fig_hi_upto2016 <- dta_for_fighi_upto2016 %>% 
  ggplot(aes(x = val_in_1955, y = mean_ch_e0)) +
  geom_point() + 
  scale_y_continuous(limits = c(0, 0.4)) + 
  ggrepel::geom_text_repel(aes(label = code)) +
  labs(
    title = "Life expectancy against change in life expectancy from 1955 to 2016",
    subtitle = "Linear regression line on all countries except Portugal (PRT)",
    x = "Life expectancy in 1955 (years)", 
    y = "Average annual change in life expectancy",
    caption = "Replication of figure 3 from White (2002)"
    ) + 
  stat_smooth(method = "lm", se = F, colour = "red", linetype = "dashed",
              data = dta_for_fighi_upto2016 %>% filter(code != "PRT")) 

print(fig_hi_upto2016)


png(filename = "figures/2x1_hi_e0_che0.png", width = 35, height = 15, res = 300, units = "cm") 
gridExtra::grid.arrange(fig_hi_upto1996, fig_hi_upto2016, nrow = 1)
dev.off()

```
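
The y-axis quantity in these panels is the mean of the year-on-year changes. When a series has no gaps, that mean telescopes to the total gain divided by the number of intervals, so it depends only on the first and last values. A minimal sketch on made-up values for two hypothetical countries:

```{r}
library(dplyr)

# Made-up series, not HMD data
toy_gain <- tibble(
  code = rep(c("AAA", "BBB"), each = 4),
  year = rep(1955:1958, times = 2),
  e0   = c(68.0, 68.5, 68.4, 69.2, 71.0, 71.1, 71.6, 71.9)
)

toy_gain_summary <- toy_gain %>% 
  group_by(code) %>% 
  arrange(year, .by_group = TRUE) %>% 
  summarise(
    val_in_1955   = e0[year == 1955],
    mean_ch_e0    = mean(e0 - lag(e0), na.rm = TRUE),     # as in the chunk above
    endpoint_rate = (last(e0) - first(e0)) / (n() - 1)    # total gain / intervals
  )

toy_gain_summary
```

One consequence is that extending the end year from 1996 to 2016 changes each point's height only through the later endpoint, while its x position (life expectancy in 1955) stays fixed.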


## All HMD countries, side by side, original and updated period, linear regression line 

```{r}
dta_for_figall_upto1996 <- dta_e0 %>% 
  filter(code %in% all_distinct_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 1996)) %>% 
  group_by(code) %>% 
  arrange(year) %>% 
  mutate(val_in_1955 = ifelse(1955 %in% year, e0[year == 1955], NA)) %>% 
  mutate(ch_e0 = e0 - lag(e0)) %>% 
  mutate(mean_ch_e0 = mean(ch_e0, na.rm = T)) %>% 
  filter(year == min(year)) %>% 
  ungroup() %>% 
  select(code, val_in_1955, mean_ch_e0) 

fig_all_upto1996 <- dta_for_figall_upto1996 %>% 
  ggplot(aes(x = val_in_1955, y = mean_ch_e0)) +
  geom_point() + 
  scale_y_continuous(limits = c(0, 0.4)) + 
  ggrepel::geom_text_repel(aes(label = code)) +
  labs(
    title = "Life expectancy against change in life expectancy from 1955 to 1996",
    subtitle = "Linear regression line on all countries except Portugal (PRT)",
    x = "Life expectancy in 1955 (years)", 
    y = "Average annual change in life expectancy",
    caption = "Replication of figure 3 from White (2002) with all HMD countries"
    ) + 
  stat_smooth(method = "lm", se = F, colour = "red", linetype = "dashed",
              data = dta_for_figall_upto1996 %>% filter(code != "PRT")) 

print(fig_all_upto1996)

dta_for_figall_upto2016 <- dta_e0 %>% 
  filter(code %in% all_distinct_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016)) %>% 
  group_by(code) %>% 
  arrange(year) %>% 
  mutate(val_in_1955 = ifelse(1955 %in% year, e0[year == 1955], NA)) %>% 
  mutate(ch_e0 = e0 - lag(e0)) %>% 
  mutate(mean_ch_e0 = mean(ch_e0, na.rm = T)) %>% 
  filter(year == min(year)) %>% 
  ungroup() %>% 
  select(code, val_in_1955, mean_ch_e0) 

fig_all_upto2016 <- dta_for_figall_upto2016 %>% 
  ggplot(aes(x = val_in_1955, y = mean_ch_e0)) +
  geom_point() + 
  scale_y_continuous(limits = c(0, 0.4)) + 
  ggrepel::geom_text_repel(aes(label = code)) +
  labs(
    title = "Life expectancy against change in life expectancy from 1955 to 2016",
    subtitle = "Linear regression line on all countries except Portugal (PRT)",
    x = "Life expectancy in 1955 (years)", 
    y = "Average annual change in life expectancy",
    caption = "Replication of figure 3 from White (2002) with all HMD countries"
    ) + 
  stat_smooth(method = "lm", se = F, colour = "red", linetype = "dashed",
              data = dta_for_figall_upto2016 %>% filter(code != "PRT")) 

print(fig_all_upto2016)


png(filename = here("figures","2x1_all_e0_che0.png"), width = 35, height = 15, res = 300, units = "cm") 
gridExtra::grid.arrange(fig_all_upto1996, fig_all_upto2016, nrow = 1)
dev.off()

```

## Table of High income countries, ranking by decadal period

```{r}
tmp <- dta_e0 %>% 
  filter(code %in% high_income_countries) %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2015)) %>% 
  mutate(Decade = cut(year, seq(1955, 2015, by = 10), include.lowest = T)) %>% 
  group_by(code, Decade) %>% 
  arrange(year) %>% 
  mutate(ch_e0 = e0 - lag(e0)) %>% 
  mutate(mean_ch_e0 = mean(ch_e0, na.rm = T)) %>% 
  filter(year == min(year))

lookup_table <- read_csv("https://raw.githubusercontent.com/JonMinton/hmd_explorer/master/hmd_explorer/data/country_codes_lookup.csv")

tmp %>% 
  left_join(lookup_table, by = c("code"= "code")) %>% 
  select(country = name, Decade, e0) %>% 
  group_by(Decade) %>% 
  mutate(Rank = rank(desc(e0))) %>% 
  write_csv("tables/e0_tables.csv")

```
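
A short sketch of how the within-decade ranking behaves, on made-up values with hypothetical country names and decade labels. It shows that `rank(desc(e0))` gives rank 1 to the highest life expectancy and assigns tied countries the average of their ranks:

```{r}
library(dplyr)

# Made-up values; A and B are tied in decade "d2"
toy_rank <- tibble(
  country = c("A", "B", "C", "A", "B", "C"),
  Decade  = rep(c("d1", "d2"), each = 3),
  e0      = c(70, 72, 71, 75, 75, 74)
)

toy_ranked <- toy_rank %>% 
  group_by(Decade) %>% 
  mutate(Rank = rank(desc(e0))) %>%   # 1 = highest e0 within the decade
  ungroup()

toy_ranked
```

If whole-number ranks are preferred in the table, `dplyr::min_rank(desc(e0))` would give both tied countries rank 1 instead of 1.5.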