Abstract
Using Google mobility data as a measure for the compliance with stay-at-home orders during the first wave of Coronavirus from March to August 2020, Savaris et al. (2021) find no significant relationship between increases in stay-at-home mobility and fatalities due to Covid19. This comment revisits the findings for a selected subsample of 15 European countries, i.e. 105 potential pairs and finds that the results in Savaris et al. (2021) are due the implicit assumption that changes in mobility have an immediate effect on Covid19 fatalaties. Accounting for the common observation that deaths are a lagged measure for the dynamic of the pandemic, the results change drastically. The methodology of Savaris et al. (2021) yields only two related pairs between mobility and deaths for any of the countries under considerations. Including up to 6 weekly lags into the regressions and replacing the simple \(t\)-test by a Granger causality test leaves us with 73 significant relations (69.52%), without even accounting for various other issues of the study. The conclusions in the original study are premature.if (file.exists("data_raw/mobility_raw.rds")) {
mobility_raw <- readRDS("data_raw/mobility_raw.rds")
} else {
mobility_raw <- download_google_cmr_data()
saveRDS(mobility_raw, "data_raw/mobility_raw.rds")
}
owid_raw <- read_csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv")
mobility_clean <- mobility_raw %>%
filter(iso3c %in% sample_countries) %>%
dplyr::select(iso3c, date, residential) %>%
filter(date >= "2020-02-08" & date <= "2020-08-21")
owid_deaths_clean <- owid_raw %>%
rename(iso3c = iso_code) %>%
filter(iso3c %in% sample_countries) %>%
filter(date >= "2020-02-08" & date <= "2020-08-21") %>%
dplyr::select(iso3c, date, new_deaths_per_million) %>%
mutate(new_deaths_per_million = replace_na(new_deaths_per_million, 0))
data_for_regression <- left_join(mobility_clean, owid_deaths_clean, by = c("date", "iso3c")) %>%
mutate(epiweek = epiweek(date)) %>%
group_by(iso3c, epiweek) %>%
summarise(deaths_per_million = sum(new_deaths_per_million, na.rm = TRUE),
residential = mean(residential),
date = max(date), .groups = "drop") %>%
group_by(iso3c) %>%
mutate(delta_deaths = deaths_per_million - lag(deaths_per_million,1),
delta_residential = residential - lag(residential)) %>%
ungroup()
Mobility data is from Google, Savaris et al. (2021) use the category “residential” which measures the time spent at home relatively to a reference value. Data is downloaded and cleaned via the tidycovid19 R package by Joachim Gassen, available via Github here. New deaths per million are from ourworldindata.com, available from Github here.
In both cases, the results in this comment rely on the raw data. It is not clear from the paper which time series is used for the deaths per million, since the auxiliary codes of the authors available here seem to use smoothed time series, but the paper makes no such statement.
The sample of countries is smaller compared to Savaris et al. (2021) to ensure comparability between the countries, which is not completely given in the original data set. The sample countries are: Sweden (SWE), Norway (NOR), Netherlands (NLD), Luxembourg (LUX), Italy (ITA), Ireland (IRL), Great Britain (GBR), France (FRA), Finland (FIN), Spain (ESP), Denmark (DNK), Germany (DEU), Switzerland (CHE), Belgium (BEL), and Austria (AUT).
lineplot <- data_for_regression %>% filter(iso3c %in% c("DEU", "GBR", "IRL", "SWE", "NOR", "FRA", "AUT")) %>%
ggplot(data = ., aes(x = date, y = residential, color =iso3c, linetype = iso3c )) +
geom_line(size = 1) +
theme_ibf() +
ylab("Weekly residential mobility") +
xlab("Date") +
# scale_colour_manual(values = c(colors[2], colors[1])) +
scale_color_brewer(palette = "Paired") +
scale_x_date(expand = c(0, 0)) +
theme(legend.title = element_blank())
ggplotly(lineplot)
Google residential mobility as the average over epidemilogical weeks for the sample countries under consideration. The sample period is 15 February 2020 to 21 August 2020.
The figure above illustrates the mobility data for a selected sample of 7 countries, namely Austria, France, Ireland, Sweden, Germany, Great Britain and Norway. Focusing on Sweden as the main counterfactual with respect to the implementation of stay-at-home orders, it is clearly visible that the other selected countries exhibit higher increases in stay-at-home mobility. France has the highest increase, well in line with the stricter shutdowns compared to , e.g., Germany. However, it is not clear how the increases are related to actual lockdowns, since people anticipated stay-at-home orders in several cases and stayed home more before those became effective.
Savaris et al. (2021) start with taking first differences of the two time series deaths (\(Y_t\)) and and stay-at-home mobility (\(X_t\)) to guarantee stationarity. Then, they estimate the following linear regression
\[\begin{equation} \Delta (Y_t^A - Y_t^B) = \beta_0 + \beta_1 \Delta (X_t^A - X_t^B) + \epsilon_t \end{equation}\]
where the coefficient \(\beta_1\) is the main quantity of interest. More precisely, the authors test \(H_0: \beta_1 = 0\) against the two sided alternative hypothesis that \(\beta_1 \neq 0\). Intuitively, a significant coefficient \(\beta_1\) indicates that the differences in the weekly changes in deaths between two countries is linearly related to their difference in the weekly changes of stay-at-home mobility. For each pair of their sample regions/countries, Savaris et al. (2021) compute \(p\)-value for the null hypothesis above and adopt a 5% hurdle for the level of significance. Savaris et al. (2021) use OLS standard errors or do not report it otherwise.
# idea: fix a, loop over b, repeat for all a \in countries
A <- sample_countries
replication_results <- foreach(i = 1:length(A), .combine = 'rbind', .packages = c("tidyverse", "lubridate")) %dopar% {
pvalues_for_heatmap_tmp <- tibble(A = character(N-1),
B = character(N-1),
pval = numeric(N-1))
A_tmp <- data_for_regression %>%
filter(iso3c == A[i]) %>%
dplyr::select(date, delta_deaths, delta_residential) %>%
rename(Y_A = delta_deaths,
X_A = delta_residential)
pvalues_for_heatmap_tmp$A <- A[i]
B <- sample_countries[sample_countries!=A[i]]
for (j in 1:length(B)){
B_tmp <- data_for_regression %>%
filter(iso3c == B[j]) %>%
dplyr::select(date, delta_deaths, delta_residential) %>%
rename(Y_B = delta_deaths,
X_B = delta_residential)
pvalues_for_heatmap_tmp$B[j] <- B[j]
diff_data <- left_join(A_tmp, B_tmp, by = "date") %>%
mutate(Delta_X = X_A - X_B,
Delta_Y = Y_A - Y_B) %>%
dplyr::select(Delta_X, Delta_Y) %>% drop_na()
model_tmp <- lm(Delta_Y ~ Delta_X, data = diff_data) %>% summary()
pvalues_for_heatmap_tmp$pval[j] <- model_tmp$coefficients[2,4]
}
return(pvalues_for_heatmap_tmp)
}
# diff_data <- left_join(germany, sweden) %>%
# mutate(Delta_X = X_DEU - X_SWE,
# Delta_Y = Y_DEU - Y_SWE) %>%
# dplyr::select(Delta_X, Delta_Y) %>% drop_na()
#
#
# lm(Delta_Y ~Delta_X, data = diff_data) %>% summary()
#
# lm(Delta_Y ~lag(Delta_X,1) +lag(Delta_X,2) + lag(Delta_X,3) +lag(Delta_X,4), data = diff_data) %>% summary()
replication_results$pval[replication_results$pval > 0.05] <- NA
plot <- ggplot(replication_results, aes(A, B)) +
geom_tile(aes(fill = pval, colour = "")) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
scale_fill_gradient(limits = c(0, 0.05), breaks = c(0.01, 0.05), na.value="gray") +
ylab(element_blank()) + xlab(element_blank()) +
scale_colour_manual(values=NA) +
guides(colour=guide_legend("not significant", override.aes=list(fill="gray"))) +
theme_ibf()
plot
Heatmap of the \(p\)-values with OLS standard errors from the regression in Equation.
The heatmab above presents the results for our 15 European countries under considerations. The results are well in line with the findings in Savaris et al. (2021), there seems to be little relation between the two variables in the OLS equation. Only 2 out of 105 pairs exhibit a significant relation, i.e. a little less than 2%. Interestingly, those two pairs are insignificant in the original study, maybe due to differences in the data preparation, as outlined in the previous Section.
As outlined above, a contemporaneous relationship between mobility and deaths is hard to reconcile with the incubation time after contracting the virus, the onset of symptoms and the eventual death from Covid19. A simple approach to account for this delay is including lags of the right-hand side of OLS Equation and replace the \(t\)-test above with an \(F\) test or Granger causality test. Formally, the regression looks as follows:
\[\begin{equation} \Delta (Y_t^A - Y_t^B) = \beta_0 + \sum_{l=1}^{L} \beta_l \Delta (X_{t-l}^A - X_{t-l}^B) + \sum_{j=1}^{L} \gamma_j \Delta (Y_{t-j}^A - Y_{t-j}^B) + \epsilon_t \end{equation}\]
The \(L\) is chosen according to the Akaike Information Criterion for each individual pair of data with a maximum lag of 6. The Granger causality test has the null hypothesis that \(\beta_1 = \beta_2 = ... = \beta_L = 0\) against the alternative that at last one \(\beta_l \neq 0 \forall l\).
# idea: fix a, loop over b, repeat for all a \in countries
A <- sample_countries
replication_results <- foreach(i = 1:length(A), .combine = 'rbind', .packages = c("tidyverse", "lubridate", "vars")) %dopar% {
pvalues_for_heatmap_tmp <- tibble(A = character(N-1),
B = character(N-1),
pval = numeric(N-1))
A_tmp <- data_for_regression %>%
filter(iso3c == A[i]) %>%
dplyr::select(date, delta_deaths, delta_residential) %>%
rename(Y_A = delta_deaths,
X_A = delta_residential)
pvalues_for_heatmap_tmp$A <- A[i]
B <- sample_countries[sample_countries!=A[i]]
for (j in 1:length(B)){
B_tmp <- data_for_regression %>%
filter(iso3c == B[j]) %>%
dplyr::select(date, delta_deaths, delta_residential) %>%
rename(Y_B = delta_deaths,
X_B = delta_residential)
pvalues_for_heatmap_tmp$B[j] <- B[j]
diff_data <- left_join(A_tmp, B_tmp, by = "date") %>%
mutate(Delta_X = X_A - X_B,
Delta_Y = Y_A - Y_B) %>%
dplyr::select(Delta_X, Delta_Y) %>% drop_na()
p<- VARselect(diff_data, lag.max = 6,
type = "const")$selection[1]
varmodel <- VAR(diff_data, lag.max = p, type = "const")
summary(varmodel)
granger_tmp <- causality(varmodel, cause="Delta_X")
granger_tmp
granger_tmp$Granger$p.value
pvalues_for_heatmap_tmp$pval[j] <- granger_tmp$Granger$p.value
}
return(pvalues_for_heatmap_tmp)
}
# diff_data <- left_join(germany, sweden) %>%
# mutate(Delta_X = X_DEU - X_SWE,
# Delta_Y = Y_DEU - Y_SWE) %>%
# dplyr::select(Delta_X, Delta_Y) %>% drop_na()
#
#
# lm(Delta_Y ~Delta_X, data = diff_data) %>% summary()
#
# lm(Delta_Y ~lag(Delta_X,1) +lag(Delta_X,2) + lag(Delta_X,3) +lag(Delta_X,4), data = diff_data) %>% summary()
significant <- length(replication_results$pval[replication_results$pval < 0.05])/2
replication_results$pval[replication_results$pval > 0.05] <- NA
plot <- ggplot(replication_results, aes(A, B)) +
geom_tile(aes(fill = pval, colour = "")) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
scale_fill_gradient(limits = c(0, 0.05), breaks = c(0.01, 0.05), na.value="gray") +
# ylab(element_blank()) + xlab(element_blank()) +
scale_colour_manual(values=NA) +
guides(colour=guide_legend("not significant", override.aes=list(fill="gray"))) +
theme_ibf()
plot
Heatmap of the \(p\)-values from a Granger causality test with 4 lags. The null hypothesis is that \(\Delta X\) does not Granger cause \(\Delta Y\) and a \(p\)-value below 0.05 indicates a rejection of this null hypothesis at the 5% level. Rejecting the null hypothesis implies Granger causality between \(\Delta X\) and \(\Delta Y\).
The heatmap above presents the \(p\)-values of the null hypothesis above. A value below 0.05 is tantamount to rejection of the null at the 5% level and is the minimum threshold. In contrast to the analysis of Savaris et al. (2021), the number of significantly related pairs increases drastically to 73 (69.52%). Therefore, unaccounted lags in the relation between \(X\) and \(Y\) are a likely explanation for their negative results.
Leibniz University Hannover, Institute of Banking and Finance. E-Mail:sebastian.schroen@finance.uni-hannover.de↩︎