-
Notifications
You must be signed in to change notification settings - Fork 1
/
CFR.Rmd
280 lines (227 loc) · 13 KB
/
CFR.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
---
title: "Case Fatality Rates (CFR)"
date: "`r format(Sys.Date(), '%A %b %d, %Y')`"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, warning=FALSE)
```
```{r, message=FALSE, warning=FALSE, include=FALSE}
library(CoV19)
library(ggplot2)
library(gridExtra)
```
The case fatality rate (CFR) is the fraction of deaths relative to the positive cases (reported positive by health officials). Once the epidemic is over, the CFR is (cumulative number of death)/(cumulative number of cases). But during the epidemic, just dividing the cumulative deaths by cumulative cases will greatly under estimate the CFR because there is a delay between the positive test and death.
Instead while the epidemic is spreading, we want to compare deaths today against the cases some days prior. But what lag (days prior) to use? We can get a handle on this by looking at Lombardia.
See this page for another discussion of calculating CFR during an epidemic. This also suggests that lag 7 is the 'right' lag. But we'll try to estimate the right lag by looking at the lagged CFRs directly.
https://www.worldometers.info/coronavirus/coronavirus-death-rate/#correct
```{r}
datL <- c()
reg <- "Lombardia"
x<-subset(italy, region==reg)
x$new.death <- c(NA, diff(x$death))
x$new.positive <- c(NA, diff(x$positive))
x$cum.death <- stats::filter(x$new.death, rep(1, 7), sides=1)
x$cum.positive <- stats::filter(x$new.positive, rep(1, 7), sides=1)
for(lag in 0:15){
tmp <- data.frame(date=x$date[(1+lag):nrow(x)],
x=x$death[(1+lag):nrow(x)],
val= x$death[(1+lag):nrow(x)]/x$positive[1:(nrow(x)-lag)],
cum.x=x$cum.death[(1+lag):nrow(x)],
cum.val= x$cum.death[(1+lag):nrow(x)]/x$cum.positive[1:(nrow(x)-lag)],
lag=lag, region=reg)
datL <- rbind(datL, tmp)
}
datL$lag <- as.factor(datL$lag)
datL$log.val <- log(datL$val)
datL$log.cum.val <- log(datL$cum.val)
```
The plot shows the cumulative deaths in Lombardia up to June 1, 2020 against the lagged CFR. If the lag is right, the line should be flat. Lag 0 is just cumulative deaths today divided by cumulative cases. We can see that goes up with time. The cases were slowing and the deaths were catching up so to speak. So this illustrates the problem with using lag 0. Lag 10 (ten days prior) is too long. The ratio of cumulative deaths to cumulative cases 10 days prior goes down as the case curve bends (flattens). Somewhere between lag 5 to 7 is about right. The lagged CFR plot flattens somewhere in that lag.
So that suggests that the CFR in Lombardia was about 20% (the level of the lag 5 line) to 23% (the level of the lag 7 line) up to June 1, 2020. Note this is not the infection fatality rate. Like many places that are overwhelmed, testing is focused on those who show up at the hospital in distress. The hospitalization rate is about 40% in Lombardia, suggesting that they really focus the testing on the worst cases. Also the Italy has an older population.
```{r}
reg <- "Lombardia"
p <- ggplot(subset(datL, region==reg & lag%in%c(0,3,5,7,10,15) & date < as.Date("2020-06-01")), aes(x=x,y=log.val,color=lag)) + geom_line() + geom_point() +
ggtitle(paste(reg, "cumulative CFR")) +
xlab("Cumulative Deaths") +
scale_y_continuous(name="lagged CFR", breaks=-4:0,
labels=round(exp(-4:0),digits=2), limits=c(-4,0))
p
```
This next plot shows how the CFR has changed week to week. Here a week of deaths is compared to a week of positives 7 days prior. Here we see that the 7-day lag CFR had and was steadily decreasing even while the hospitals became overwhelmed. Doctors (presumably) got better at treating CoVid-19. I say "presumably" since it is known that treatment did improve (less reliance on ventilators and more on high levels of oxygen) and there is no evidence (reported in the scientific literature) that the virus has evolved to be less lethal. We also see that since sometime in August 2020 the CFR has stabilized around 1-2% of reported positives.
```{r}
reg <- "Lombardia"
p <- ggplot(subset(datL, region==reg & lag%in%c(7,15)), aes(x=date,y=log.cum.val, color=lag)) + geom_line() +
ggtitle(paste(reg, "7-day CFR")) +
xlab("Cumulative Deaths") +
scale_y_continuous(name="lagged CFR", breaks=-6:0,
labels=round(exp(-6:0),digits=2), limits=c(-6,0))
p
```
## Other countries
Let's look at other countries. Remember the CFR depends a lot on how the country does testing. Test even people who aren't that sick (or who are even asymptomatic) and the CFR is lower. Test only those who need to be hospitalized, and the CFR will be higher. This calculation requires a fair number of deaths to compute. I've set the minumum at 100. The estimate of the CFR at lag 5 and 7 for the last 4 values is shown in the title.
The CFRs (lag 5 to 7 days) in Europe are ca 15-21% across all the countries with over 1000 fatalities.
```{r}
dat <- c()
for(reg in c("Italy","France","Canada", "US","Spain","United Kingdom","Austria","Germany", "Switzerland", "Greece", "Portugal", "Sweden", "Denmark", "Belgium", "Netherlands", "Iran", "China", "Czechia", "Poland")){
x<-subset(world, region==reg)
x$new.death <- c(NA, diff(x$death))
x$new.positive <- c(NA, diff(x$positive))
x$cum.death <- stats::filter(x$new.death, rep(1, 7), sides=1)
x$cum.positive <- stats::filter(x$new.positive, rep(1, 7), sides=1)
for(lag in 0:15){
tmp <- data.frame(date=x$date[(1+lag):nrow(x)],
x=x$death[(1+lag):nrow(x)],
val= x$death[(1+lag):nrow(x)]/x$positive[1:(nrow(x)-lag)],
cum.x=x$cum.death[(1+lag):nrow(x)],
cum.val= x$cum.death[(1+lag):nrow(x)]/x$cum.positive[1:(nrow(x)-lag)],
lag=lag, region=reg)
dat <- rbind(dat, tmp)
}
}
dat$lag <- as.factor(dat$lag)
dat$log.val <- log(dat$val)
dat$log.cum.val <- log(dat$cum.val)
```
```{r}
plotfun <- function(dat, reg, ylims, lags=c(0,3,5,7,10,15), xlims=c(100,5000)){
dat <- subset(dat, date <= as.Date("2020-06-01"))
val1 <- subset(dat, region==reg & lag==5)$val
val1 <- 100*round(mean(val1[(length(val1)-3):length(val1)]),digits=3)
val2 <- subset(dat, region==reg & lag==7)$val
val2 <- 100*round(mean(val2[(length(val2)-3):length(val2)]),digits=3)
p <- ggplot(subset(dat, region==reg & lag%in%lags), aes(x=x,y=log.val,color=lag)) + geom_line() + geom_point() +
ggtitle(paste0(reg," ",val1," to ", val2,"%")) +
xlab("Cumulative Deaths up to June 1") +
scale_y_continuous(name="lagged CFR", breaks=ylims[1]:ylims[2],
labels=round(exp(ylims[1]:ylims[2]),digits=2), limits=ylims)
p
}
plotcumfun <- function(dat, reg, ylims, lags=7){
p <- ggplot(subset(dat, region==reg & lag%in%lags), aes(x=date,y=log.cum.val,color=lag)) + geom_line() +
ggtitle(reg) +
xlab("") +
scale_y_continuous(name="lagged CFR", breaks=ylims[1]:ylims[2],
labels=round(exp(ylims[1]:ylims[2]),digits=2), limits=ylims)
p
}
```
In Italy, the flat line is 3- to 5-day lag, which would suggest that people were not coming in to get tested until they were very sick and close to dying.
```{r}
reg <- "Italy"
p1 <- plotfun(dat, reg, c(-4,0))
reg <- "Italy"
p1b <- plotcumfun(dat, reg, c(-6,-1), lags=c(7,15))
gridExtra::grid.arrange(p1,p1b, nrow=1)
```
France shows the 7-dag lag being the flat line and CFR dropping a bit lower than in Lombardia. It's been below 1% in late summer and lately has been 1%. Lombardia is older and this may reflect that in France it is the positives are younger on average.
```{r}
reg <- "France"
p1 <- plotfun(dat, reg, c(-4,0))
p1b <- plotcumfun(dat, reg, c(-6,-1), lags=c(7,15))
gridExtra::grid.arrange(p1,p1b, nrow=1)
```
Belgium.
```{r}
reg <- "Belgium"
p1 <- plotfun(dat, reg, c(-4,0))
p1b <- plotcumfun(dat, reg, c(-6,-1), lags=c(7,15))
gridExtra::grid.arrange(p1,p1b, nrow=1)
```
Spain is an interesting case with a big drop in the CFR in June. In October as cases have skyrocketed back up in Spain, the CFR has risen to ca 1.5%.
```{r}
reg <- "Spain"
p1 <- plotfun(dat, reg, c(-4,0))
p1b <- plotcumfun(dat, reg, c(-6,-1), lags=c(7,15))
gridExtra::grid.arrange(p1,p1b, nrow=1)
```
United Kingdom, same pattern. CFR dropped steadily through late summer and is now steady at near 1%.
```{r}
reg <- "United Kingdom"
p1 <- plotfun(dat, reg, c(-4,0))
p1b <- plotcumfun(dat, reg, c(-6,-1), lags=c(7,15))
gridExtra::grid.arrange(p1,p1b, nrow=1)
```
Germany, and actually all of Central Europe, is an exception in that a 10- or longer lag is the flatline. This suggests either some difference in the age of positives or a difference in how deaths are reported, i.e. perhaps there is more of a lag in reporting. The 7- versus 10-day lag does not make a huge difference in the reported weekly CFR (right panel). Like the rest of western Europe, the CFR for the fall wave is about 1%.
```{r}
reg <- "Germany"
p1 <- plotfun(dat, reg, c(-6,0))
p1b <- plotcumfun(dat, reg, c(-6,-1), lags=c(7,15))
gridExtra::grid.arrange(p1,p1b, nrow=1)
```
The Czech Republic and Poland are cases where the country avoided the spring waves but are now experiencing the highest per capita infection rates in Europe. Mortality was also quite low for their small spring waves. The fall CFR for the Czech Republic is currently similar to the rest of Europe at about 1.5% while that in Poland is above 2%. I used a 7-day lag though a 10-day lag looks a bit better. With a 10-day lag, the CFR is a little higher; closer to 2% for the Czech Republic and 3% for Poland.
```{r}
reg <- "Czechia"
p1 <- plotfun(dat, reg, c(-6,0))
p1b <- plotcumfun(dat, reg, c(-6,-1), lags=c(7,15))
gridExtra::grid.arrange(p1,p1b, nrow=1)
```
```{r}
reg <- "Poland"
p1 <- plotfun(dat, reg, c(-6,0))
p1b <- plotcumfun(dat, reg, c(-6,-1), lags=c(7,15))
gridExtra::grid.arrange(p1,p1b, nrow=1)
```
## US CFR
The flattening lags for the US is about the same, 5 to 7 days prior. But look at the lagged CFR (lag 5 to 7). The CFR was much lower in the US versus Europe in the spring wave, closer 5% versus the 7-10+% that was seen in western European countries. We see the same pattern of a declining weekly CFR over summer, again presumably as doctors figured out how to treat patients better. Now as we enter the big fall/winter wave, the CFR has leveled out at about 1.5-2%.
```{r}
reg <- "US"
p1 <- plotfun(dat, reg, c(-6,0))
p1b <- plotcumfun(dat, reg, c(-6,-1), lags=c(7,15))
gridExtra::grid.arrange(p1,p1b, nrow=1)
```
The US numbers are the sum across very diverse states, each of which is handling reporting at different levels and which have different testing levels. However the general patterns are quite similar across states. The fall-early winter CFR is running about 1-2% across states, with most near 2%.
```{r}
datS <- c()
for(reg in c("OK","NV","NY","NJ","WA","CA","LA","MI","ND","FL","GA","TX","WI","AL","MT","AZ","TN","AL","MA","MD","IL","KS","IA","SD","UT","ID")){
x<-subset(states, region==reg)
x$new.death <- c(NA, diff(x$death))
x$new.positive <- c(NA, diff(x$positive))
x$cum.death <- stats::filter(x$new.death, rep(1, 7), sides=1)
x$cum.positive <- stats::filter(x$new.positive, rep(1, 7), sides=1)
for(lag in 0:15){
tmp <- data.frame(date=x$date[(1+lag):nrow(x)],
x=x$death[(1+lag):nrow(x)],
val= x$death[(1+lag):nrow(x)]/x$positive[1:(nrow(x)-lag)],
cum.x=x$cum.death[(1+lag):nrow(x)],
cum.val= x$cum.death[(1+lag):nrow(x)]/x$cum.positive[1:(nrow(x)-lag)],
lag=lag, region=reg)
datS <- rbind(datS, tmp)
}
}
datS$lag <- as.factor(datS$lag)
datS$region <- as.factor(datS$region)
datS$log.val <- log(datS$val)
datS$log.cum.val <- log(datS$cum.val)
```
```{r}
lag <- c(7,15)
pl <- list()
for (reg in c("NY", "FL", "IA", "AZ", "MI", "WI", "ND", "CA","WA"))
pl[[reg]] <-plotcumfun(datS, reg, c(-6,0), lags=lag)+ theme(legend.position = "none")
gridExtra::grid.arrange(grobs=pl, ncol=3)
```
## Comparision across hotspot and non-hotspot states
The 7-day and 15-day lag CFR does not seem higher in states with very high cases per million and where there are reports of full ICUs.
States where ICUs are full are almost full.
```{r}
lag <- c(7,15)
pl <- list()
for (reg in c("IA", "ND", "SD", "ID"))
pl[[reg]] <-plotcumfun(datS, reg, c(-6,0), lags=lag) + geom_hline(yintercept = log(0.02))+ theme(legend.position = "none")
gridExtra::grid.arrange(grobs=pl, ncol=2)
```
States with plenty of capacity.
```{r}
lag <- c(7,15)
pl <- list()
for (reg in c("CA", "WA", "TX", "NY"))
pl[[reg]] <-plotcumfun(datS, reg, c(-6,0), lags=lag) + geom_hline(yintercept = log(0.02)) + theme(legend.position = "none")
gridExtra::grid.arrange(grobs=pl, ncol=2)
```
FL and AZ are interesting cases. The summer case peak was in early- to mid-July and CFR steadily increased after that for a month and a half.
```{r}
lag <- c(7,15)
pl <- list()
for (reg in c("FL", "AZ"))
pl[[reg]] <-plotcumfun(datS, reg, c(-6,0), lags=lag) + geom_hline(yintercept = log(0.02)) + theme(legend.position = "none") + xlim(c(as.Date("2020-05-01"), Sys.Date()))
gridExtra::grid.arrange(grobs=pl, ncol=2)
```