02_Mx_calc.Rmd

---
title: "Age Specific Mortality Rate Trends "
output: html_notebook
---

# Intro

This doc will describe some of the average trends at specific ages from 21 high income countries.

It is inspired by [Christensen 2010](https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(09)61460-4/fulltext), which showed

* Probability of dying in next 12 months
* At age 80 and age 90
* Males and females 
* Selected countries 
    * England & Wales
    * France
    * East Germany
    * West Germany
    * Japan
    * Sweden
    * USA
  
  
However, as per the figures in [White 2002](https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1728-4457.2002.00059.x)), the average of 21 high income countries will be used instead, as well as the best-performing country at these different age groups. 

Additionally, the 12 month mortality probabilities at the following ages will also be calculated:

* 0-1 years
* 40 years

The aim will be to determine how strongly the (log of these) trends are correlated. 

# Pre reqs


```{r}
pacman::p_load(
  tidyverse, HMDHFDplus,
  ggrepel, plotly
)

dta_Mx <- read_rds("tidy_data/Mx_data.rds")

```

 Define countries 
```{r}
source("scripts/country_definitions.R")


```

# Replication of Fig 2 of Christenson et al 2010

```{r}

dta_Mx %>% 
  filter(sex != "total") %>% 
  filter(age %in% c(80, 90)) %>% 
  filter(
    code %in% c("GBRTENW", "FRATNP", "DEUTE", "DEUTW", "JPN", "SWE", "USA")
  ) %>%      
  filter(between(year, 1950, 2003)) %>% 
  ggplot(
    aes(x = year, y = Mx, colour = code, group = code)
  ) + 
  geom_line() +
  facet_grid(age ~ sex)

```

And how has this developed since?

```{r}

dta_Mx %>% 
  filter(sex != "total") %>% 
  filter(age %in% c(80, 90)) %>% 
  filter(
    code %in% c("GBRTENW", "FRATNP", "DEUTE", "DEUTW", "JPN", "SWE", "USA")
  ) %>%      
  filter(between(year, 1950, 2017)) %>% 
  ggplot(
    aes(x = year, y = Mx, colour = code, group = code)
  ) + 
  geom_line() +
  facet_grid(age ~ sex)

```

It's interesting that life expectancies have at age 80 have converged, for males especially. 
And that they've levelled off in Japan over the 2000s, while continued to improve in the USA. 

In England/Wales they Mx at age 90 seem to have increased for females especially in recent years. 

Let's look at this for the average of the 21 countries 

Now let's do this for the average of the 21 high income countries 

```{r}

dta_Mx %>% 
  filter(sex != "total") %>% 
  filter(age %in% c(80, 90)) %>% 
  filter(code %in% high_income_countries) %>% 
  filter(between(year, 1950, 2016)) %>% 
  group_by(year, age, sex) %>% 
  summarise(mean_Mx = mean(Mx, na.rm = T)) %>% 
  ungroup() %>% 
  ggplot(aes(x = year, y = mean_Mx, colour = sex)) + 
  facet_wrap(~age) +
  geom_point()


```

Let's now use log rather than linear 

```{r}
dta_Mx %>% 
  filter(sex != "total") %>% 
  filter(age %in% c(80, 90)) %>% 
  filter(code %in% high_income_countries) %>% 
  filter(between(year, 1950, 2016)) %>% 
  group_by(year, age, sex) %>% 
  summarise(mean_Mx = mean(Mx, na.rm = T)) %>% 
  ungroup() %>% 
  ggplot(aes(x = year, y = mean_Mx, colour = sex)) + 
  facet_wrap(~age) +
  geom_point() +
  scale_y_log10()


```

So, the trends on % reductions have been more continuous for age 90 than age 80, and more continuous for females than males. 

Let's now do this for a couple of additional age groups 

```{r}
dta_Mx %>% 
  filter(sex != "total") %>% 
  filter(age %in% c(0, 30, 80, 90)) %>% 
  filter(code %in% high_income_countries) %>% 
  filter(between(year, 1950, 2016)) %>% 
  group_by(year, age, sex) %>% 
  summarise(mean_Mx = mean(Mx, na.rm = T)) %>% 
  ungroup() %>% 
  ggplot(aes(x = year, y = mean_Mx, colour = sex)) + 
  facet_wrap(~age) +
  geom_point() +
  scale_y_log10()


```

Now let's estimate the correlation between these trends

```{r}
dta_trnd <- dta_Mx %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016))  %>% 
  filter(age %in% c(0, 30, 80, 90)) %>% 
  group_by(year, age) %>% 
  summarise(mean_Mx = mean(Mx, na.rm = T)) %>% 
  ungroup() %>% 
  mutate(log_mean_Mx = log(mean_Mx, 10)) 

dta_trnd %>% 
  select(-mean_Mx) %>% 
  mutate(age = paste0("age_", age)) %>% 
  spread(age, log_mean_Mx) %>%
  select(-year) %>% 
  cor()

```

So, as expected, % improvements in infancy are more strongly correlated with those at age 80 and 90 than at age 30, and % changes at age 90 are less strongly correlated than with those at other ages. 

Let's do this as a heatmap for all ages 

```{r}
dta_trnd <- dta_Mx %>% 
  filter(sex == "total") %>% 
  filter(between(year, 1955, 2016))  %>% 
  filter(age <= 109) %>% 
  group_by(year, age) %>% 
  summarise(mean_Mx = mean(Mx, na.rm = T)) %>% 
  ungroup() %>% 
  mutate(log_mean_Mx = log(mean_Mx, 10)) 

tmp <- dta_trnd %>% 
  select(-mean_Mx) %>% 
  spread(age, log_mean_Mx) %>%
  select(-year) %>% 
  cor() 

cor_df <- tmp %>% 
  as_tibble() %>% 
  mutate(from_age = rownames(tmp)) %>% 
  gather(key="to_age", value = "value", -from_age) %>% 
  mutate(from_age = as.numeric(from_age), to_age = as.numeric(to_age))

cor_df %>% 
  filter(from_age <= 100, to_age <= 100) %>% 
  ggplot(aes(x = from_age, y = to_age, fill = value)) + 
  geom_tile() +
  scale_fill_viridis_c() +
  scale_x_continuous(breaks = seq(0, 100, by = 10)) +
  scale_y_continuous(breaks = seq(0, 100, by = 10)) +
  coord_equal()

```


Definitely a 'wow' figure! 

Interesting that the trends that apply at older ages don't apply at the oldest ages (from around age 96 onwards)

Let's see how different it looks by gender.

```{r}
dta_trnd <- dta_Mx %>% 
  filter(sex != "total") %>% 
  filter(between(year, 1955, 2016))  %>% 
  filter(age <= 109) %>% 
  group_by(sex, year, age) %>% 
  summarise(mean_Mx = mean(Mx, na.rm = T)) %>% 
  ungroup() %>% 
  mutate(log_mean_Mx = log(mean_Mx, 10)) %>% 
  group_by(sex) %>% 
  nest()


cors_df <- dta_trnd %>% 
  mutate(cors = map(
    data, 
    function(X) {
      X %>% 
        select(-mean_Mx) %>% 
        spread(age, log_mean_Mx) %>% 
        select(-year) %>% 
        cor()
      }
    )
  ) %>% 
  mutate(cor_df = map(
    cors,
    function(X){
      X %>% 
        as_tibble() %>% 
        mutate(from_age = rownames(X)) %>% 
        gather(key = "to_age", value = "value", -from_age) %>% 
        mutate(from_age = as.numeric(from_age), to_age = as.numeric(to_age))
      }
    )
  ) %>% 
  select(sex, cor_df) %>% 
  unnest()


cors_df %>% 
  filter(from_age <= 100, to_age <= 100) %>% 
  ggplot(aes(x = from_age, y = to_age, fill = value)) + 
  geom_tile() +
  scale_fill_viridis_c() +
  scale_x_continuous(breaks = seq(0, 100, by = 10)) +
  scale_y_continuous(breaks = seq(0, 100, by = 10)) +
  coord_equal() + 
  facet_wrap(~sex)

```

Again, beautiful, staggering, and awesome. This is like seeing the genome of population health improvement being mapped. The gender differences in the differences in correlation are very apparent. 

Let's extend this to a few few select countries: 

* France
* Sweden?
* UK
* USA 
* Spain (Identified as most compatible with Lee-Carter modelling approach)
* Japan


```{r}
dta_trnd <- dta_Mx %>% 
  filter(sex != "total") %>%
  filter(code %in% c("FRATNP", "SWE", "GBR_NP", "USA", "ESP", "JPN")) %>% 
  filter(between(year, 1955, 2016))  %>% 
  filter(age <= 109) %>% 
  group_by(code, sex, year, age) %>% 
  summarise(mean_Mx = mean(Mx, na.rm = T)) %>% 
  ungroup() %>% 
  mutate(log_mean_Mx = log(mean_Mx + 0.00001, 10)) %>% # Correction for Sweden
  group_by(sex, code) %>% 
  nest()


cors_df <- dta_trnd %>% 
  mutate(cors = map(
    data, 
    function(X) {
      X %>% 
        select(-mean_Mx) %>% 
        spread(age, log_mean_Mx) %>% 
        select(-year) %>% 
        cor()
      }
    )
  ) %>% 
  mutate(cor_df = map(
    cors,
    function(X){
      X %>% 
        as_tibble() %>% 
        mutate(from_age = rownames(X)) %>% 
        gather(key = "to_age", value = "value", -from_age) %>% 
        mutate(from_age = as.numeric(from_age), to_age = as.numeric(to_age))
      }
    )
  ) %>% 
  select(sex, code, cor_df) %>% 
  unnest()


cors_df %>% 
  filter(from_age <= 100, to_age <= 100) %>% 
  ggplot(aes(x = from_age, y = to_age, fill = value)) + 
  geom_tile() +
  scale_fill_viridis_c(limits = c(-1,1)) +
  scale_x_continuous(breaks = seq(0, 100, by = 10)) +
  scale_y_continuous(breaks = seq(0, 100, by = 10)) +
  coord_equal() + 
  facet_grid(sex ~ code)

```


Let's do this just for the UK nations 

```{r}
dta_trnd <- dta_Mx %>% 
  filter(sex != "total") %>%
  filter(code %in% c("GBRTENW", "GBR_SCO", "GBR_NIR")) %>% 
  filter(between(year, 1955, 2016))  %>% 
  filter(age <= 109) %>% 
  group_by(code, sex, year, age) %>% 
  summarise(mean_Mx = mean(Mx, na.rm = T)) %>% 
  ungroup() %>% 
  mutate(log_mean_Mx = log(mean_Mx + 0.00001, 10)) %>% # Correction for Sweden
  group_by(sex, code) %>% 
  nest()


cors_df <- dta_trnd %>% 
  mutate(cors = map(
    data, 
    function(X) {
      X %>% 
        select(-mean_Mx) %>% 
        spread(age, log_mean_Mx) %>% 
        select(-year) %>% 
        cor()
      }
    )
  ) %>% 
  mutate(cor_df = map(
    cors,
    function(X){
      X %>% 
        as_tibble() %>% 
        mutate(from_age = rownames(X)) %>% 
        gather(key = "to_age", value = "value", -from_age) %>% 
        mutate(from_age = as.numeric(from_age), to_age = as.numeric(to_age))
      }
    )
  ) %>% 
  select(sex, code, cor_df) %>% 
  unnest()


cors_df %>% 
  filter(from_age <= 100, to_age <= 100) %>% 
  ggplot(aes(x = from_age, y = to_age, fill = value)) + 
  geom_tile() +
  scale_fill_viridis_c(limits = c(-1,1)) +
  scale_x_continuous(breaks = seq(0, 100, by = 10)) +
  scale_y_continuous(breaks = seq(0, 100, by = 10)) +
  coord_equal() + 
  facet_grid(sex ~code) + 
  labs(x = "Age", y = "Age", title = "Correlations between rates of change in log mortality risk age different ages",
       subtitle = "Trends from 1955-2016. England & Wales, Scotland, Northern Ireland",
       caption = "Source: Human Mortality Database. Available from https://github.com/JonMinton/Life_expectancy_limits/")

ggsave("figures/trend_correlation_heatmap.png", height = 20, width = 25, units = "cm", dpi = 300)


```


Although plotly doesn't convert faceting too easily, can I use ggplotly for hovering over values for Scotland males alone?

```{r}
p <- cors_df %>% 
  filter(from_age <= 100, to_age <= 100) %>% 
  filter(sex == "male") %>% 
  filter(code == "GBR_SCO") %>% 
  ggplot(aes(x = from_age, y = to_age, fill = value)) + 
  geom_tile() + 
  scale_fill_viridis_c() + 
  scale_x_continuous(breaks = seq(0, 100, by = 10)) +
  scale_y_continuous(breaks = seq(0, 100, by = 10)) + 
  coord_equal() 

p <- cors_df %>% 
  filter(from_age <= 100, to_age <= 100) %>% 
  ggplot(aes(x = from_age, y = to_age, fill = value)) + 
  geom_tile() +
  scale_fill_viridis_c() +
  scale_x_continuous(breaks = seq(0, 100, by = 10)) +
  scale_y_continuous(breaks = seq(0, 100, by = 10)) +
  coord_equal() + 
  facet_grid(sex ~code)

plotly::ggplotly(p)

```

ggplotly now works with faceted plots! (Though the number of observations looks like it causes things to struggle!)

For Scotland and England & Wales (independently), how have the correlations changed over time? 

England/Wales first: 

```{r}
dta_trnd <- dta_Mx %>% 
  filter(sex != "total") %>%
  filter(code %in% c("GBRTENW")) %>% 
  filter(between(year, 1950, 2010))  %>%
  filter(age <= 109) %>% 
  mutate(
    decade = cut(year, breaks = seq(1950, 2010, by = 10), labels = c("1950s", "1960s", "1970s", "1980s", "1990s", "2000s"), include.lowest = TRUE)
  ) %>% 
  group_by(sex, decade, year, age) %>% 
  summarise(mean_Mx = mean(Mx, na.rm = T)) %>% 
  ungroup() %>% 
  mutate(log_mean_Mx = log(mean_Mx, 10)) %>% # Correction for Sweden
  group_by(sex, decade) %>% 
  nest()


cors_df <- dta_trnd %>% 
  mutate(cors = map(
    data, 
    function(X) {
      X %>% 
        select(-mean_Mx) %>% 
        spread(age, log_mean_Mx) %>% 
        select(-year) %>% 
        cor()
      }
    )
  ) %>% 
  mutate(cor_df = map(
    cors,
    function(X){
      X %>% 
        as_tibble() %>% 
        mutate(from_age = rownames(X)) %>% 
        gather(key = "to_age", value = "value", -from_age) %>% 
        mutate(from_age = as.numeric(from_age), to_age = as.numeric(to_age))
      }
    )
  ) %>% 
  select(sex, decade, cor_df) %>% 
  unnest()


cors_df %>% 
  filter(from_age <= 100, to_age <= 100) %>% 
  ggplot(aes(x = from_age, y = to_age, fill = value)) + 
  geom_tile() +
  scale_fill_viridis_c() +
  scale_x_continuous(breaks = seq(0, 100, by = 10)) +
  scale_y_continuous(breaks = seq(0, 100, by = 10)) +
  coord_equal() + 
  facet_grid(sex ~ decade)

```


Now for Scotland. 

```{r}
dta_trnd <- dta_Mx %>% 
  filter(sex != "total") %>%
  filter(code %in% c("GBR_SCO")) %>% 
  filter(between(year, 1950, 2010))  %>%
  filter(age <= 109) %>% 
  mutate(
    decade = cut(year, breaks = seq(1950, 2010, by = 10), labels = c("1950s", "1960s", "1970s", "1980s", "1990s", "2000s"), include.lowest = TRUE)
  ) %>% 
  group_by(sex, decade, year, age) %>% 
  summarise(mean_Mx = mean(Mx, na.rm = T)) %>% 
  ungroup() %>% 
  mutate(log_mean_Mx = log(mean_Mx + 0.00001, 10)) %>% # Correction for Sweden
  group_by(sex, decade) %>% 
  nest()


cors_df <- dta_trnd %>% 
  mutate(cors = map(
    data, 
    function(X) {
      X %>% 
        select(-mean_Mx) %>% 
        spread(age, log_mean_Mx) %>% 
        select(-year) %>% 
        cor()
      }
    )
  ) %>% 
  mutate(cor_df = map(
    cors,
    function(X){
      X %>% 
        as_tibble() %>% 
        mutate(from_age = rownames(X)) %>% 
        gather(key = "to_age", value = "value", -from_age) %>% 
        mutate(from_age = as.numeric(from_age), to_age = as.numeric(to_age))
      }
    )
  ) %>% 
  select(sex, decade, cor_df) %>% 
  unnest()


cors_df %>% 
  filter(from_age <= 100, to_age <= 100) %>% 
  ggplot(aes(x = from_age, y = to_age, fill = value)) + 
  geom_tile() +
  scale_fill_viridis_c() +
  scale_x_continuous(breaks = seq(0, 100, by = 10)) +
  scale_y_continuous(breaks = seq(0, 100, by = 10)) +
  coord_equal() + 
  facet_grid(sex ~ decade)

```
Northern Ireland


```{r}
dta_trnd <- dta_Mx %>% 
  filter(sex != "total") %>%
  filter(code %in% c("GBR_NIR")) %>% 
  filter(between(year, 1950, 2010))  %>%
  filter(age <= 109) %>% 
  mutate(
    decade = cut(year, breaks = seq(1950, 2010, by = 10), labels = c("1950s", "1960s", "1970s", "1980s", "1990s", "2000s"), include.lowest = TRUE)
  ) %>% 
  group_by(sex, decade, year, age) %>% 
  summarise(mean_Mx = mean(Mx, na.rm = T)) %>% 
  ungroup() %>% 
  mutate(log_mean_Mx = log(mean_Mx + 0.00001, 10)) %>% # Correction for Sweden
  group_by(sex, decade) %>% 
  nest()


cors_df <- dta_trnd %>% 
  mutate(cors = map(
    data, 
    function(X) {
      X %>% 
        select(-mean_Mx) %>% 
        spread(age, log_mean_Mx) %>% 
        select(-year) %>% 
        cor()
      }
    )
  ) %>% 
  mutate(cor_df = map(
    cors,
    function(X){
      X %>% 
        as_tibble() %>% 
        mutate(from_age = rownames(X)) %>% 
        gather(key = "to_age", value = "value", -from_age) %>% 
        mutate(from_age = as.numeric(from_age), to_age = as.numeric(to_age))
      }
    )
  ) %>% 
  select(sex, decade, cor_df) %>% 
  unnest()


cors_df %>% 
  filter(from_age <= 100, to_age <= 100) %>% 
  ggplot(aes(x = from_age, y = to_age, fill = value)) + 
  geom_tile() +
  scale_fill_viridis_c() +
  scale_x_continuous(breaks = seq(0, 100, by = 10)) +
  scale_y_continuous(breaks = seq(0, 100, by = 10)) +
  coord_equal() + 
  facet_grid(sex ~ decade)

```

```{r}
dta_trnd <- dta_Mx %>% 
  filter(sex != "total") %>%
  filter(code %in% c("GBRTENW")) %>% 
  filter(between(year, 1955, 2015))  %>%
  filter(age <= 109) %>% 
  mutate(
    decade = cut(year, breaks = seq(1955, 2015, by = 10), include.lowest = TRUE)
  ) %>% 
  group_by(sex, decade, year, age) %>% 
  summarise(mean_Mx = mean(Mx, na.rm = T)) %>% 
  ungroup() %>% 
  mutate(log_mean_Mx = log(mean_Mx, 10)) %>% # Correction for Sweden
  group_by(sex, decade) %>% 
  nest()


cors_df <- dta_trnd %>% 
  mutate(cors = map(
    data, 
    function(X) {
      X %>% 
        select(-mean_Mx) %>% 
        spread(age, log_mean_Mx) %>% 
        select(-year) %>% 
        cor()
      }
    )
  ) %>% 
  mutate(cor_df = map(
    cors,
    function(X){
      X %>% 
        as_tibble() %>% 
        mutate(from_age = rownames(X)) %>% 
        gather(key = "to_age", value = "value", -from_age) %>% 
        mutate(from_age = as.numeric(from_age), to_age = as.numeric(to_age))
      }
    )
  ) %>% 
  select(sex, decade, cor_df) %>% 
  unnest()


cors_df %>% 
  filter(from_age <= 100, to_age <= 100) %>% 
  ggplot(aes(x = from_age, y = to_age, fill = value)) + 
  geom_tile() +
  scale_fill_viridis_c() +
  scale_x_continuous(breaks = seq(0, 100, by = 10)) +
  scale_y_continuous(breaks = seq(0, 100, by = 10)) +
  coord_equal() + 
  facet_grid(sex ~ decade) +
  labs(
    x = "Age", y = "Age",
    title = "Correlation between trends at indivdual ages, England & Wales, by decade"
  )

```


Now for Scotland. 

```{r}
dta_trnd <- dta_Mx %>% 
  filter(sex != "total") %>%
  filter(code %in% c("GBR_SCO")) %>% 
  filter(between(year, 1955, 2015))  %>%
  filter(age <= 109) %>% 
  mutate(
    decade = cut(year, breaks = seq(1955, 2015, by = 10), include.lowest = TRUE)
  ) %>% 
  group_by(sex, decade, year, age) %>% 
  summarise(mean_Mx = mean(Mx, na.rm = T)) %>% 
  ungroup() %>% 
  mutate(log_mean_Mx = log(mean_Mx + 0.00001, 10)) %>% # Correction for Sweden
  group_by(sex, decade) %>% 
  nest()


cors_df <- dta_trnd %>% 
  mutate(cors = map(
    data, 
    function(X) {
      X %>% 
        select(-mean_Mx) %>% 
        spread(age, log_mean_Mx) %>% 
        select(-year) %>% 
        cor()
      }
    )
  ) %>% 
  mutate(cor_df = map(
    cors,
    function(X){
      X %>% 
        as_tibble() %>% 
        mutate(from_age = rownames(X)) %>% 
        gather(key = "to_age", value = "value", -from_age) %>% 
        mutate(from_age = as.numeric(from_age), to_age = as.numeric(to_age))
      }
    )
  ) %>% 
  select(sex, decade, cor_df) %>% 
  unnest()


cors_df %>% 
  filter(from_age <= 100, to_age <= 100) %>% 
  ggplot(aes(x = from_age, y = to_age, fill = value)) + 
  geom_tile() +
  scale_fill_viridis_c() +
  scale_x_continuous(breaks = seq(0, 100, by = 10)) +
  scale_y_continuous(breaks = seq(0, 100, by = 10)) +
  coord_equal() + 
  facet_grid(sex ~ decade)

```

## Analysis/results for aqmen conference presentation

Populations:

* France, 
* UK
* USA
* Japan

Time period:

* 1970 to last available year

Analyses:

* Individual plots
* Correlations between broad age groups 
* Dendrogram of age correlations 


```{r}

dta_Mx %>% 
  filter(code %in% c("FRATNP", "GBR_NP", "USA", "JPN")) %>% 
  filter(year >= 1970) %>% 
  filter(age <= 109) %>% 
  filter(sex != "total") %>% 
  group_by(code, sex) %>% 
  nest() 


```


Let's look at hclust examples 

It seems from this that the datastructure required should have the following:

* rownames: the individual ages
* columns: rate in different years 

Let's start with a single population: USA males

```{r}
dta_Mx %>% 
  filter(code == "USA") %>% 
  filter(sex == "male") %>% 
  filter(year >= 1970) %>%
  filter(age <= 109) %>% 
  mutate(lnMx = log(Mx + 0.00001, 10)) %>% 
  select(-code, -sex, -Mx) %>% 
  spread(year, lnMx) -> tmp

nms <- tmp$age
tmp$age <- NULL
rownames(tmp) <- nms
tmp

hc <- hclust(dist(tmp), "ave")

ggdendro::ggdendrogram(hc, rotate = TRUE, leaf_labels = TRUE) + coord_flip()


```


This has been a bit too much of a drain on my cognitive resources for now. I'm going to stop and reevaluate when I've got more focus. It does seem to identify two broad clusters, and for these clusters to be as expected, split between working age and retiremnt age, with age 0 with post retirement age. 

Let's quickly try PCA on the same

```{r}
pc <- prcomp(tmp, scale = TRUE, center = TRUE)

pc_df <- tibble(age = rownames(pc$x), pc1 = pc$x[,1], pc2 = pc$x[,2], pc3 = pc$x[,3])

ggplot(pc_df, aes(x = pc1, y = pc2, label = age)) + 
  geom_text()

ggplot(pc_df, aes(x = pc2, y = pc3, label = age)) + 
  geom_text()


```

```{r}

dta_Mx %>% 
  filter(code == "USA") %>% 
  filter(sex == "female") %>% 
  filter(year >= 1970) %>%
  filter(age <= 109) %>% 
  mutate(lnMx = log(Mx + 0.00001, 10)) %>% 
  select(-code, -sex, -Mx) %>% 
  spread(year, lnMx) -> tmp2

nms <- tmp2$age
tmp2$age <- NULL
rownames(tmp2) <- nms
tmp2

hc2 <- hclust(dist(tmp2), "ave")

ggdendro::ggdendrogram(hc2, rotate = TRUE, leaf_labels = TRUE) + coord_flip()


pc2 <- prcomp(tmp2, scale = TRUE, center = TRUE)

pc2_df <- pc2$x[,1:3] %>% as.tibble() %>% mutate(age = rownames(pc2$x))

ggplot(pc2_df, aes(x = PC1, y = PC2, label = age)) + 
  geom_text()

ggplot(pc2_df, aes(x = PC2, y = PC3, label = age)) + 
  geom_text()


```