if (file.exists("data_raw/mobility_raw.rds")) {
  mobility_raw <- readRDS("data_raw/mobility_raw.rds")
} else {
  # download via the tidycovid19 package and cache locally
  mobility_raw <- download_google_cmr_data()
  saveRDS(mobility_raw, "data_raw/mobility_raw.rds")
}



owid_raw <- read_csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv")
mobility_clean <- mobility_raw %>% 
  filter(iso3c %in% sample_countries) %>% 
  dplyr::select(iso3c, date, residential) %>% 
  filter(date  >= "2020-02-08" & date <= "2020-08-21") 



owid_deaths_clean <- owid_raw %>% 
  rename(iso3c = iso_code) %>% 
  filter(iso3c %in% sample_countries) %>% 
  filter(date  >= "2020-02-08" & date <= "2020-08-21") %>% 
  dplyr::select(iso3c, date, new_deaths_per_million) %>% 
  mutate(new_deaths_per_million = replace_na(new_deaths_per_million, 0))

  

data_for_regression <- left_join(mobility_clean, owid_deaths_clean, by = c("date", "iso3c")) %>% 
  mutate(epiweek = epiweek(date)) %>% 
  group_by(iso3c, epiweek) %>% 
  summarise(deaths_per_million = sum(new_deaths_per_million, na.rm = TRUE),
            residential = mean(residential),
            date = max(date), .groups = "drop") %>% 
  group_by(iso3c) %>% 
  mutate(delta_deaths = deaths_per_million - lag(deaths_per_million,1),
         delta_residential = residential - lag(residential)) %>% 
  ungroup()

1 Data collection and summary statistics

1.1 Sources and preparation

Mobility data come from Google; Savaris et al. (2021) use the category “residential”, which measures the time spent at home relative to a reference value. The data are downloaded and cleaned via the tidycovid19 R package by Joachim Gassen, available on GitHub here. New deaths per million are from ourworldindata.org, available on GitHub here.

In both cases, the results in this comment rely on the raw data. It is not clear from the paper which time series is used for deaths per million: the authors' auxiliary code, available here, seems to use smoothed time series, but the paper makes no such statement.

The sample of countries is smaller than in Savaris et al. (2021) to ensure comparability between the countries, which is not fully given in the original data set. The sample countries are: Sweden (SWE), Norway (NOR), the Netherlands (NLD), Luxembourg (LUX), Italy (ITA), Ireland (IRL), Great Britain (GBR), France (FRA), Finland (FIN), Spain (ESP), Denmark (DNK), Germany (DEU), Switzerland (CHE), Belgium (BEL), and Austria (AUT).
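The code chunks in this comment reference a vector `sample_countries` that is not defined in the displayed code; based on the list above, it can be reconstructed as follows (the variable name is taken from the code, the definition itself is an assumption):

```r
# ISO3 codes of the 15 sample countries listed above
sample_countries <- c("SWE", "NOR", "NLD", "LUX", "ITA", "IRL", "GBR", "FRA",
                      "FIN", "ESP", "DNK", "DEU", "CHE", "BEL", "AUT")
```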

1.2 Summary statistics

lineplot <- data_for_regression %>%
  filter(iso3c %in% c("DEU", "GBR", "IRL", "SWE", "NOR", "FRA", "AUT")) %>%
  ggplot(aes(x = date, y = residential, color = iso3c, linetype = iso3c)) +
  geom_line(size = 1) +
  theme_ibf() +
  ylab("Weekly residential mobility") +
  xlab("Date") +
  scale_color_brewer(palette = "Paired") +
  scale_x_date(expand = c(0, 0)) +
  theme(legend.title = element_blank())


ggplotly(lineplot)

Google residential mobility as the average over epidemiological weeks for the sample countries under consideration. The sample period is 15 February 2020 to 21 August 2020.

The figure above illustrates the mobility data for a selected sample of 7 countries: Austria, France, Ireland, Sweden, Germany, Great Britain, and Norway. Focusing on Sweden as the main counterfactual with respect to the implementation of stay-at-home orders, it is clearly visible that the other selected countries exhibit larger increases in stay-at-home mobility. France has the highest increase, well in line with its stricter shutdowns compared to, e.g., Germany. However, it is not clear how these increases relate to actual lockdowns, since in several cases people anticipated stay-at-home orders and stayed home more before they became effective.

2 Revisiting the results for 15 Western European countries

2.1 Replication of main results

Savaris et al. (2021) begin by taking first differences of the two time series, deaths (\(Y_t\)) and stay-at-home mobility (\(X_t\)), to guarantee stationarity. Then, they estimate the following linear regression

\[\begin{equation} \Delta (Y_t^A - Y_t^B) = \beta_0 + \beta_1 \Delta (X_t^A - X_t^B) + \epsilon_t \end{equation}\]

where the coefficient \(\beta_1\) is the main quantity of interest. More precisely, the authors test \(H_0: \beta_1 = 0\) against the two-sided alternative that \(\beta_1 \neq 0\). Intuitively, a significant coefficient \(\beta_1\) indicates that the difference in the weekly changes in deaths between two countries is linearly related to the difference in their weekly changes of stay-at-home mobility. For each pair of their sample regions/countries, Savaris et al. (2021) compute the \(p\)-value for the null hypothesis above and adopt a 5% hurdle for significance. Savaris et al. (2021) appear to use OLS standard errors, or at least do not report otherwise.
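Before the full pairwise loop below, the test can be illustrated for a single country pair. The data here are simulated purely for the sketch; only the structure (regress the differenced gap in deaths on the differenced gap in mobility, read off the \(t\)-test \(p\)-value) mirrors the actual analysis:

```r
set.seed(42)

# simulated weekly first differences for a hypothetical pair of countries A and B
weeks   <- 27
Delta_X <- rnorm(weeks)                  # differenced mobility gap between A and B
Delta_Y <- 0.5 * Delta_X + rnorm(weeks)  # differenced deaths gap, linearly related by construction

model <- summary(lm(Delta_Y ~ Delta_X))

# p-value of the two-sided t-test for H0: beta_1 = 0 (row "Delta_X", column "Pr(>|t|)")
pval <- model$coefficients["Delta_X", 4]
pval < 0.05
```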

# idea: fix A, loop over B, repeat for all A in sample_countries
# note: %dopar% requires a registered parallel backend, e.g. doParallel::registerDoParallel()

A <- sample_countries
N <- length(A)  # number of sample countries

replication_results <- foreach(i = 1:length(A), .combine = 'rbind',
                               .packages = c("tidyverse", "lubridate")) %dopar% {

  pvalues_for_heatmap_tmp <- tibble(A = character(N - 1),
                                    B = character(N - 1),
                                    pval = numeric(N - 1))

  A_tmp <- data_for_regression %>%
    filter(iso3c == A[i]) %>%
    dplyr::select(date, delta_deaths, delta_residential) %>%
    rename(Y_A = delta_deaths,
           X_A = delta_residential)

  pvalues_for_heatmap_tmp$A <- A[i]

  B <- sample_countries[sample_countries != A[i]]

  for (j in 1:length(B)) {

    B_tmp <- data_for_regression %>%
      filter(iso3c == B[j]) %>%
      dplyr::select(date, delta_deaths, delta_residential) %>%
      rename(Y_B = delta_deaths,
             X_B = delta_residential)

    pvalues_for_heatmap_tmp$B[j] <- B[j]

    diff_data <- left_join(A_tmp, B_tmp, by = "date") %>%
      mutate(Delta_X = X_A - X_B,
             Delta_Y = Y_A - Y_B) %>%
      dplyr::select(Delta_X, Delta_Y) %>%
      drop_na()

    model_tmp <- lm(Delta_Y ~ Delta_X, data = diff_data) %>% summary()
    pvalues_for_heatmap_tmp$pval[j] <- model_tmp$coefficients[2, 4]
  }

  return(pvalues_for_heatmap_tmp)
}





# mask p-values above the 5% threshold so they appear gray in the heatmap
replication_results$pval[replication_results$pval > 0.05] <- NA

plot <- ggplot(replication_results, aes(A, B)) + 
  geom_tile(aes(fill = pval, colour = "")) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  scale_fill_gradient(limits = c(0, 0.05), breaks = c(0.01, 0.05), na.value="gray") +
  ylab(element_blank()) + xlab(element_blank()) +
 scale_colour_manual(values=NA) +
  guides(colour=guide_legend("not significant", override.aes=list(fill="gray"))) +
  theme_ibf()


plot

Heatmap of the \(p\)-values with OLS standard errors from the regression in Equation.

The heatmap above presents the results for our 15 European countries under consideration. The results are well in line with the findings in Savaris et al. (2021): there seems to be little relation between the two variables in the OLS equation. Only 2 out of 105 pairs exhibit a significant relation, i.e. a little less than 2%. Interestingly, those two pairs are insignificant in the original study, possibly due to differences in the data preparation, as outlined in the previous section.

2.2 Including weekly lags

As outlined above, a contemporaneous relationship between mobility and deaths is hard to reconcile with the incubation time after contracting the virus, the onset of symptoms, and the eventual death from Covid-19. A simple approach to account for this delay is to include lags on the right-hand side of the OLS equation and to replace the \(t\)-test above with an \(F\)-test or Granger causality test. Formally, the regression looks as follows:

\[\begin{equation} \Delta (Y_t^A - Y_t^B) = \beta_0 + \sum_{l=1}^{L} \beta_l \Delta (X_{t-l}^A - X_{t-l}^B) + \sum_{j=1}^{L} \gamma_j \Delta (Y_{t-j}^A - Y_{t-j}^B) + \epsilon_t \end{equation}\]

The lag length \(L\) is chosen according to the Akaike Information Criterion for each individual pair of data, with a maximum lag of 6. The Granger causality test has the null hypothesis \(\beta_1 = \beta_2 = \dots = \beta_L = 0\) against the alternative that \(\beta_l \neq 0\) for at least one \(l\).
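The lag selection and Granger test for a single pair can be sketched with the vars package functions used in the chunk below (`VARselect`, `VAR`, and `causality`); the bivariate data here are simulated, with \(\Delta X\) leading \(\Delta Y\) by one week by construction:

```r
library(vars)

set.seed(1)
# simulated pair where Delta_X leads Delta_Y by one week
n <- 100
Delta_X <- rnorm(n)
Delta_Y <- 0.8 * c(NA, head(Delta_X, -1)) + rnorm(n)
diff_data <- na.omit(data.frame(Delta_X, Delta_Y))

# AIC-based lag selection (maximum of 6 lags), then estimate the VAR
p <- VARselect(diff_data, lag.max = 6, type = "const")$selection[1]
varmodel <- VAR(diff_data, p = p, type = "const")

# H0: Delta_X does not Granger cause Delta_Y
causality(varmodel, cause = "Delta_X")$Granger$p.value
```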

# idea: fix A, loop over B, repeat for all A in sample_countries

A <- sample_countries
N <- length(A)  # number of sample countries

replication_results <- foreach(i = 1:length(A), .combine = 'rbind',
                               .packages = c("tidyverse", "lubridate", "vars")) %dopar% {

  pvalues_for_heatmap_tmp <- tibble(A = character(N - 1),
                                    B = character(N - 1),
                                    pval = numeric(N - 1))

  A_tmp <- data_for_regression %>%
    filter(iso3c == A[i]) %>%
    dplyr::select(date, delta_deaths, delta_residential) %>%
    rename(Y_A = delta_deaths,
           X_A = delta_residential)

  pvalues_for_heatmap_tmp$A <- A[i]

  B <- sample_countries[sample_countries != A[i]]

  for (j in 1:length(B)) {

    B_tmp <- data_for_regression %>%
      filter(iso3c == B[j]) %>%
      dplyr::select(date, delta_deaths, delta_residential) %>%
      rename(Y_B = delta_deaths,
             X_B = delta_residential)

    pvalues_for_heatmap_tmp$B[j] <- B[j]

    diff_data <- left_join(A_tmp, B_tmp, by = "date") %>%
      mutate(Delta_X = X_A - X_B,
             Delta_Y = Y_A - Y_B) %>%
      dplyr::select(Delta_X, Delta_Y) %>%
      drop_na()

    # choose the lag length by AIC, with a maximum of 6 lags
    p <- VARselect(diff_data, lag.max = 6, type = "const")$selection[1]

    varmodel <- VAR(diff_data, p = p, type = "const")

    granger_tmp <- causality(varmodel, cause = "Delta_X")
    pvalues_for_heatmap_tmp$pval[j] <- granger_tmp$Granger$p.value
  }

  return(pvalues_for_heatmap_tmp)
}





# each unordered pair (A, B) appears twice in the results, hence the division by 2
significant <- length(replication_results$pval[replication_results$pval < 0.05]) / 2


# mask p-values above the 5% threshold so they appear gray in the heatmap
replication_results$pval[replication_results$pval > 0.05] <- NA


plot <- ggplot(replication_results, aes(A, B)) + 
  geom_tile(aes(fill = pval, colour = "")) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  scale_fill_gradient(limits = c(0, 0.05), breaks = c(0.01, 0.05), na.value="gray") +
  ylab(element_blank()) + xlab(element_blank()) +
 scale_colour_manual(values=NA) +
  guides(colour=guide_legend("not significant", override.aes=list(fill="gray"))) +
  theme_ibf()


plot

Heatmap of the \(p\)-values from a Granger causality test with an AIC-selected lag length (maximum of 6 lags). The null hypothesis is that \(\Delta X\) does not Granger cause \(\Delta Y\), and a \(p\)-value below 0.05 indicates a rejection of this null hypothesis at the 5% level. Rejecting the null hypothesis implies Granger causality between \(\Delta X\) and \(\Delta Y\).

The heatmap above presents the \(p\)-values for the null hypothesis above; a value below 0.05 is tantamount to a rejection of the null at the 5% level. In contrast to the analysis of Savaris et al. (2021), the number of significantly related pairs increases drastically to 73 (69.52%). Therefore, unaccounted lags in the relation between \(X\) and \(Y\) are a likely explanation for their negative results.

References

Savaris, R. F., G. Pumi, J. Dalzochio, and R. Kunst. 2021. “Stay-at-Home Policy Is a Case of Exception Fallacy: An Internet-Based Ecological Study.” Scientific Reports 11 (1): 5313. http://www.nature.com/articles/s41598-021-84092-1.

  1. Leibniz University Hannover, Institute of Banking and Finance. E-Mail: