AirPollutionCOVID19

Links between air pollution and COVID-19 in England


Marco Travaglio, Yizhou Yu, Rebeka Popovic, Liza Selley, Nuno Santos Leal, and Luis Miguel Martins
MRC Toxicology Unit, University of Cambridge

Yizhou Yu, MRC Toxicology Unit, University of Cambridge, (yzy21 [at] mrc-tox.cam.ac.uk) September 2020

General information

This pipeline relates to the data included in Figures 3 to 5 of our study titled Links between air pollution and COVID-19 in England, as well as to the supplementary tables in the supplementary materials.

The input files for this analysis pipeline are on the master branch of this GitHub page:
https://github.com/M1gus/AirPollutionCOVID19

This study was first deposited in a preprint server (medRxiv) in April 2020: https://www.medrxiv.org/content/10.1101/2020.04.16.20067405v5

After peer-review, the study was published in the journal “Environmental Pollution” in October 2020: https://www.sciencedirect.com/science/article/pii/S0269749120365489?via%3Dihub#abs0020

The main aims of the workflow presented here were as follows:
1. Determine a relationship between air pollutants and COVID-19-associated deaths/cases in England
2. Investigate whether any relationship between air pollution and COVID-19 remains significant in the presence of confounding factors at the regional and subregional level
3. Determine the effect of air pollutants on infectivity at the individual level.

Only the UK Biobank data are not available in this repository, as they require a separate application; please visit ukbiobank.ac.uk for more information. A detailed list of the variables used in the UK Biobank analysis is given in Supplementary Table 1 below:

Supplementary Table 1. Variables from the UK Biobank.

library(stargazer)
stargazer(read.csv("data/suppl_table_1.csv"), summary=FALSE, rownames=FALSE, type = "html")
| Variable name | UK Biobank ID | Description of the variable |
|---|---|---|
| x_coord | 20074 | Home location at assessment - east co-ordinate |
| y_coord | 20075 | Home location at assessment - north co-ordinate |
| townsend | 189 | Townsend deprivation index at recruitment |
| Age | 34 | 2020 minus year of birth |
| whr | 48, 49 | Waist circumference divided by hip circumference |
| highBP | 4079, 4080 | High blood pressure* |
| WP_dusty | 22609 | Workplace very dusty* |
| WP_chemicals | 22610 | Workplace full of chemical or other fumes* |
| WP_cig | 22611 | Workplace had a lot of cigarette smoke from other people smoking* |
| WP_diesel | 22615 | Workplace had a lot of diesel exhaust* |
| breathing | 22616 | Breathing problems during period of job* |
| whistling | 2316 | Wheeze or whistling in the chest in last year |
| sex | 31 | Sex |
| fev | 20153 | Forced expiratory volume in 1-second (FEV1), predicted* |
| copd | 22130 | Doctor diagnosed COPD (chronic obstructive pulmonary disease)* |
| diabetes | 2443 | Diabetes diagnosed by doctor |
| n_cancers | 134 | Number of self-reported cancers |
| smoking | 20116 | Smoking status* |


*These variables contained too few data points and were excluded from the analysis.
The detailed information for every variable can be retrieved by searching for the UK Biobank ID at ukbiobank.ac.uk.
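
As a side note, the derived variables in Supplementary Table 1 (for example Age and whr) are computed from the raw UK Biobank fields once access has been granted. A minimal sketch, using hypothetical column names in place of fields 34, 48 and 49:

# Hypothetical stand-in rows for the restricted UK Biobank extract;
# the column names are illustrative, not the actual field names in the release.
ukb <- data.frame(year_of_birth = c(1950, 1968),  # field 34
                  waist = c(90, 80),              # field 48
                  hip = c(100, 95))               # field 49
ukb$Age <- 2020 - ukb$year_of_birth  # 2020 minus year of birth
ukb$whr <- ukb$waist / ukb$hip       # waist-to-hip ratio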

Aim 1: Preliminary analysis

Deaths/cases ~ air pollutants + population, followed by comparisons between the OLS, Poisson and negative binomial models.

Load packages

library(MASS)
library(stargazer)
library(ggplot2)
library(dplyr)
library(tidyverse)
library(purrr)
library(AER)
library(reshape)
library(rgdal) # for spTransform, requires sudo apt install libgdal-dev
require(stringr)
library(sp)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
library(rgeos)

Please note: the following system libraries may also need to be installed (with apt-get on Ubuntu): liblapack-dev, liblapack3, libopenblas-base, libopenblas-dev, texlive-fonts-recommended and lmodern.
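
The R packages themselves can be installed from CRAN; a minimal sketch (note that rgdal and rgeos have since been archived on CRAN, so older versions or sf-based replacements may be needed):

install.packages(c("MASS", "stargazer", "ggplot2", "dplyr", "tidyverse", "purrr",
                   "AER", "reshape", "rgdal", "stringr", "sp", "sf",
                   "rnaturalearth", "rnaturalearthdata", "rgeos", "car"))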

View the data

Read and verify the data:

preL_dt = read.csv("data/26-4-2020_yyAIR_COVID_PRE_LD_dt.csv")[,-1]
head(preL_dt)
##                     Region cases_preL deaths_preL Date_cases
## 1          East Of England       5356         746         NA
## 2                   London      16913        2120         NA
## 3                 Midlands      10501        1491         NA
## 4 North East And Yorkshire       8004         893         NA
## 5               North West       9394         847         NA
## 6               South East       8547         783         NA
##   Population_size_2018 Average_Pop_density_personkm2 Cases Deaths NO.levels
## 1              6201214                         324.0  6499   1448  9.502135
## 2              8908081                        5666.0 19511   3522 25.193133
## 3             10704906                         380.5 14844   2684 14.529437
## 4              8137524                         333.0 10633   1641 16.501209
## 5              7292093                         517.0 12093   1801  8.581661
## 6              9133625                         479.0 10919   1430 10.646127
##   NO2.levels O3.levels
## 1   19.55372  54.36748
## 2   38.51252  36.91391
## 3   24.11985  47.69889
## 4   25.15139  45.29534
## 5   20.06274  48.95081
## 6   20.47013  52.30382

Quickly visualise the distribution of each variable:

require (reshape)
#normalise per column & reshape the data for plotting
preL_dt_scale = as.data.frame(sapply(preL_dt[,-c(1,4)], scale))
preL_dt_scale$Region = preL_dt$Region
preL_dt_long = melt(preL_dt_scale, id.vars = "Region")

ggplot(preL_dt_long, aes (value)) +
    geom_density() +
  geom_histogram() +
    facet_wrap(~variable) + 
  theme_bw() + 
  theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))

Models

Analyse the death data:

summary(full_lm <- lm(data = preL_dt, deaths_preL ~ Average_Pop_density_personkm2 + NO.levels + NO2.levels + O3.levels))
## 
## Call:
## lm(formula = deaths_preL ~ Average_Pop_density_personkm2 + NO.levels + 
##     NO2.levels + O3.levels, data = preL_dt)
## 
## Residuals:
##        1        2        3        4        5        6        7 
## -101.701   -1.428  192.949 -187.616  -13.232   44.475   66.554 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                   -1.235e+04  4.844e+03  -2.549   0.1256  
## Average_Pop_density_personkm2 -6.310e-01  2.544e-01  -2.480   0.1313  
## NO.levels                     -3.326e+02  1.124e+02  -2.958   0.0978 .
## NO2.levels                     6.015e+02  1.782e+02   3.375   0.0777 .
## O3.levels                      8.824e+01  5.409e+01   1.631   0.2444  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 211.4 on 2 degrees of freedom
## Multiple R-squared:  0.956,  Adjusted R-squared:  0.8681 
## F-statistic: 10.87 on 4 and 2 DF,  p-value: 0.08599

The outcome variables are overdispersed counts, so we also fit negative binomial and Poisson regressions, which are better suited to count data. To prepare the data for these models, the pollutant and population-density values are scaled and rounded to integers below; this rescaling does not affect the OLS results.

preL_dt$NO.levels = round(preL_dt$NO.levels * 1000)
preL_dt$NO2.levels = round(preL_dt$NO2.levels * 1000)
preL_dt$O3.levels = round(preL_dt$O3.levels * 1000)
preL_dt$Average_Pop_density_personkm2 = round(preL_dt$Average_Pop_density_personkm2)

It is important to ensure that they are integers:

preL_dt_int = as.data.frame(sapply(preL_dt[,2:ncol(preL_dt)], as.integer))
preL_dt_int$Region = preL_dt$Region
preL_dt_int
##   cases_preL deaths_preL Date_cases Population_size_2018
## 1       5356         746         NA              6201214
## 2      16913        2120         NA              8908081
## 3      10501        1491         NA             10704906
## 4       8004         893         NA              8137524
## 5       9394         847         NA              7292093
## 6       8547         783         NA              9133625
## 7       2898         368         NA              5599735
##   Average_Pop_density_personkm2 Cases Deaths NO.levels NO2.levels O3.levels
## 1                           324  6499   1448      9502      19554     54367
## 2                          5666 19511   3522     25193      38513     36914
## 3                           380 14844   2684     14529      24120     47699
## 4                           333 10633   1641     16501      25151     45295
## 5                           517 12093   1801      8582      20063     48951
## 6                           479 10919   1430     10646      20470     52304
## 7                           235  3913    608     14047      21991     48054
##                     Region
## 1          East Of England
## 2                   London
## 3                 Midlands
## 4 North East And Yorkshire
## 5               North West
## 6               South East
## 7               South West
### Number of deaths
summary(full_lm.nb <- glm.nb(data = preL_dt_int, deaths_preL ~ Average_Pop_density_personkm2 + NO.levels + NO2.levels + O3.levels))
## 
## Call:
## glm.nb(formula = deaths_preL ~ Average_Pop_density_personkm2 + 
##     NO.levels + NO2.levels + O3.levels, data = preL_dt_int, init.theta = 196.4212515, 
##     link = log)
## 
## Deviance Residuals: 
##        1         2         3         4         5         6         7  
## -1.50927  -0.04204   1.13015  -1.12365  -0.18404   1.44286   0.13514  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   -1.090e+01  1.890e+00  -5.767 8.07e-09 ***
## Average_Pop_density_personkm2 -9.331e-04  9.829e-05  -9.494  < 2e-16 ***
## NO.levels                     -4.728e-04  4.487e-05 -10.539  < 2e-16 ***
## NO2.levels                     8.135e-04  7.086e-05  11.482  < 2e-16 ***
## O3.levels                      1.200e-04  2.068e-05   5.802 6.56e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(196.4213) family taken to be 1)
## 
##     Null deviance: 294.5029  on 6  degrees of freedom
## Residual deviance:   6.9535  on 2  degrees of freedom
## AIC: 91.745
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  196 
##           Std. Err.:  129 
## 
##  2 x log-likelihood:  -79.745
summary(full_lm.p <- glm(data = preL_dt_int, deaths_preL ~ Average_Pop_density_personkm2 + NO.levels + NO2.levels + O3.levels, family = "poisson"))
## 
## Call:
## glm(formula = deaths_preL ~ Average_Pop_density_personkm2 + NO.levels + 
##     NO2.levels + O3.levels, family = "poisson", data = preL_dt_int)
## 
## Deviance Residuals: 
##       1        2        3        4        5        6        7  
## -3.4193  -0.0597   2.5243  -2.8870  -0.4216   3.3526   0.8474  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   -1.159e+01  9.091e-01  -12.75   <2e-16 ***
## Average_Pop_density_personkm2 -9.771e-04  4.570e-05  -21.38   <2e-16 ***
## NO.levels                     -4.916e-04  2.301e-05  -21.36   <2e-16 ***
## NO2.levels                     8.457e-04  3.571e-05   23.68   <2e-16 ***
## O3.levels                      1.247e-04  9.462e-06   13.18   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1833.017  on 6  degrees of freedom
## Residual deviance:   38.539  on 2  degrees of freedom
## AIC: 109.09
## 
## Number of Fisher Scoring iterations: 4
#pchisq(2 * (logLik(full_lm.nb) - logLik(full_lm.p)), df = 1, lower.tail = FALSE)
#(est <- cbind(Estimate = coef(full_lm.nb), confint(full_lm.nb)))
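
The commented lines above sketch the model comparison. Run explicitly, a likelihood-ratio test of the negative binomial fit against the Poisson fit, together with an overdispersion test on the Poisson model (AER is loaded above), could look like this:

# LR test: the negative binomial adds one parameter (theta) over the Poisson
pchisq(2 * (logLik(full_lm.nb) - logLik(full_lm.p)), df = 1, lower.tail = FALSE)
# Test for overdispersion in the Poisson fit
AER::dispersiontest(full_lm.p)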

Analyse the Cases data:

summary(cases_lm <- lm(data = preL_dt, cases_preL ~ Average_Pop_density_personkm2 + NO.levels + NO2.levels + O3.levels))
## 
## Call:
## lm(formula = cases_preL ~ Average_Pop_density_personkm2 + NO.levels + 
##     NO2.levels + O3.levels, data = preL_dt)
## 
## Residuals:
##        1        2        3        4        5        6        7 
## -1610.04   -62.57   495.92  -375.69  -200.13  2067.35  -314.85 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)                   -5.887e+04  4.405e+04  -1.337    0.313
## Average_Pop_density_personkm2 -3.805e+00  2.313e+00  -1.645    0.242
## NO.levels                     -2.778e+00  1.022e+00  -2.717    0.113
## NO2.levels                     4.118e+00  1.620e+00   2.541    0.126
## O3.levels                      2.380e-01  4.921e-01   0.484    0.676
## 
## Residual standard error: 1923 on 2 degrees of freedom
## Multiple R-squared:  0.9365, Adjusted R-squared:  0.8095 
## F-statistic: 7.373 on 4 and 2 DF,  p-value: 0.123
summary(cases_lm.nb <- glm.nb(data = preL_dt_int, cases_preL ~ Average_Pop_density_personkm2 + NO.levels + NO2.levels + O3.levels))
## 
## Call:
## glm.nb(formula = cases_preL ~ Average_Pop_density_personkm2 + 
##     NO.levels + NO2.levels + O3.levels, data = preL_dt_int, init.theta = 33.53196858, 
##     link = log)
## 
## Deviance Residuals: 
##        1         2         3         4         5         6         7  
## -1.62262  -0.06666  -0.12959   0.24412  -0.18307   1.97477  -0.62489  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   -4.451e+00  3.968e+00  -1.122 0.261994    
## Average_Pop_density_personkm2 -7.942e-04  2.084e-04  -3.811 0.000138 ***
## NO.levels                     -4.916e-04  9.215e-05  -5.334 9.60e-08 ***
## NO2.levels                     7.404e-04  1.461e-04   5.069 4.01e-07 ***
## O3.levels                      6.958e-05  4.430e-05   1.570 0.116300    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(33.532) family taken to be 1)
## 
##     Null deviance: 55.8888  on 6  degrees of freedom
## Residual deviance:  7.0375  on 2  degrees of freedom
## AIC: 132.87
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  33.5 
##           Std. Err.:  17.9 
## 
##  2 x log-likelihood:  -120.868
summary(cases_lm.p <- glm(data = preL_dt_int, cases_preL ~ Average_Pop_density_personkm2 + NO.levels + NO2.levels + O3.levels, family = "poisson"))
## 
## Call:
## glm(formula = cases_preL ~ Average_Pop_density_personkm2 + NO.levels + 
##     NO2.levels + O3.levels, family = "poisson", data = preL_dt_int)
## 
## Deviance Residuals: 
##        1         2         3         4         5         6         7  
## -20.3835   -0.5511   -0.2285    1.7246   -2.0609   28.5914  -10.4853  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   -2.934e+00  3.006e-01  -9.761   <2e-16 ***
## Average_Pop_density_personkm2 -7.301e-04  1.543e-05 -47.324   <2e-16 ***
## NO.levels                     -4.524e-04  7.507e-06 -60.273   <2e-16 ***
## NO2.levels                     6.787e-04  1.186e-05  57.231   <2e-16 ***
## O3.levels                      5.611e-05  3.104e-06  18.080   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 13239.2  on 6  degrees of freedom
## Residual deviance:  1350.5  on 2  degrees of freedom
## AIC: 1436.1
## 
## Number of Fisher Scoring iterations: 4
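
As with the deaths models, the Poisson and negative binomial fits for the cases can be compared with a likelihood-ratio test; a sketch mirroring the comparison above:

pchisq(2 * (logLik(cases_lm.nb) - logLik(cases_lm.p)), df = 1, lower.tail = FALSE)
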
Supplementary Table 2. Effect of air pollutants on COVID-19 cases in England at the regional level.

Cases data: table

stargazer(cases_lm.p, cases_lm.nb,type="html",
          dep.var.labels="Number of cases until 8 April 2020",
          ci=TRUE, ci.level=0.95, single.row=TRUE)
Dependent variable: Number of cases until 8 April 2020

| | (1) Poisson | (2) Negative binomial |
|---|---|---|
| Average_Pop_density_personkm2 | -0.001*** (-0.001, -0.001) | -0.001*** (-0.001, -0.0004) |
| NO.levels | -0.0005*** (-0.0005, -0.0004) | -0.0005*** (-0.001, -0.0003) |
| NO2.levels | 0.001*** (0.001, 0.001) | 0.001*** (0.0005, 0.001) |
| O3.levels | 0.0001*** (0.0001, 0.0001) | 0.0001 (-0.00002, 0.0002) |
| Constant | -2.934*** (-3.523, -2.345) | -4.451 (-12.229, 3.327) |
| Observations | 7 | 7 |
| Log Likelihood | -713.041 | -61.434 |
| theta | | 33.532* (17.929) |
| Akaike Inf. Crit. | 1,436.081 | 132.868 |

Note: *p<0.1; **p<0.05; ***p<0.01


Each column of the table corresponds to a different type of regression model as indicated at the top. The raw estimate values of each model are listed with their 95% confidence intervals in parentheses. The p-values are indicated using the number of asterisks beside the estimates. OLS, ordinary least square; Average_Pop_densitykm2, average population density per square kilometer; NO.levels, nitrogen oxide levels; NO2.levels, nitrogen dioxide levels; O3.levels, ozone levels; Akaike Inf. Crit., Akaike’s Information Criteria; Residual Std. Error, Residual standard error.

Supplementary Table 3. Effect of air pollutants on COVID-19 deaths in England at the regional level.
stargazer(full_lm.p, full_lm.nb,type="html",
          dep.var.labels="Number of deaths until 8 April 2020",
          ci=TRUE, ci.level=0.95, single.row=TRUE)
Dependent variable: Number of deaths until 8 April 2020

| | (1) Poisson | (2) Negative binomial |
|---|---|---|
| Average_Pop_density_personkm2 | -0.001*** (-0.001, -0.001) | -0.001*** (-0.001, -0.001) |
| NO.levels | -0.0005*** (-0.001, -0.0004) | -0.0005*** (-0.001, -0.0004) |
| NO2.levels | 0.001*** (0.001, 0.001) | 0.001*** (0.001, 0.001) |
| O3.levels | 0.0001*** (0.0001, 0.0001) | 0.0001*** (0.0001, 0.0002) |
| Constant | -11.590*** (-13.372, -9.809) | -10.900*** (-14.604, -7.195) |
| Observations | 7 | 7 |
| Log Likelihood | -49.547 | -40.873 |
| theta | | 196.421 (128.791) |
| Akaike Inf. Crit. | 109.094 | 91.745 |

Note: *p<0.1; **p<0.05; ***p<0.01


Each column of the table corresponds to a different type of regression model as indicated at the top. The raw estimate values of each model are listed with their 95% confidence intervals in parentheses. The p-values are indicated using the number of asterisks beside the estimates. OLS, ordinary least square; Average_Pop_densitykm2, average population density per square kilometer; NO.levels, nitrogen oxide levels; NO2.levels, nitrogen dioxide levels; O3.levels, ozone levels; Akaike Inf. Crit., Akaike’s Information Criteria; Residual Std. Error, Residual standard error.

Aim 2: subregional level analysis

Aim: analyse the effect of air pollutants on COVID-19 cases and deaths at the subregional (local authority) level

Deaths data

Data curation

pop_dens = read.csv("data/2018_official_popDensity.csv")[c("Code","X2018.people.per.sq..km")]
earnings =read.csv("data/ann_earning_2018_perLA.csv")[c("Code","Mean_ann_earnings")]
age = read.csv("data/processed_median_age_of_population_perLA.csv")[c("Code","median_age_2018","Name")]
covid_deaths = read.csv("data/covid_deaths_until10April_byAreaCode.csv")

covid_deaths$total_deaths = covid_deaths$Home + covid_deaths$Hospital + covid_deaths$Care.home + covid_deaths$Hospice + covid_deaths$Other.communal.establishment + covid_deaths$Elsewhere

covid_deaths = covid_deaths[c("Area.code","total_deaths")]
colnames(covid_deaths) <- c("Code","deaths")

nrow(pop_dens)
## [1] 435
nrow(earnings)
## [1] 422
nrow(age)
## [1] 433
nrow(covid_deaths)
## [1] 346
#merge data
merged_covid_dt_LA = merge(covid_deaths, pop_dens, by = "Code")
merged_covid_dt_LA = merge(merged_covid_dt_LA, earnings, by = "Code")
merged_covid_dt_LA = merge(merged_covid_dt_LA, age, by = "Code")
nrow(merged_covid_dt_LA)
## [1] 334
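
Because merge() defaults to an inner join, local authorities missing from any of the covariate files are dropped (346 death records reduce to 334 matched rows). A quick sketch to list the dropped area codes:

setdiff(covid_deaths$Code, merged_covid_dt_LA$Code)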

Now, we will use a Python script (in the form of a Jupyter notebook) to match the air pollution data to local authorities.

Analysis

Data visualisation for model selection

Load the data and set the correct formats:

# load data 
colnames(merged_covid_dt_LA)
## [1] "Code"                    "deaths"                 
## [3] "X2018.people.per.sq..km" "Mean_ann_earnings"      
## [5] "median_age_2018"         "Name"
merged_covid_dt_LA$X2018.people.per.sq..km = as.numeric(gsub(",","",merged_covid_dt_LA$X2018.people.per.sq..km))
merged_covid_dt_LA$Mean_ann_earnings = as.numeric(gsub(",","",merged_covid_dt_LA$Mean_ann_earnings))

write.csv(merged_covid_dt_LA, "data_output/merged_covid_cov_dt_LA.csv")

Re-read the data after the air pollution values have been matched to local authorities:

covid_air_dt = read.csv("data_output/merged_covidAir_cov_dt_LA.csv", na.strings = "x")
covid_air_dt$X2018.people.per.sq..km = as.numeric(gsub(",","",covid_air_dt$X2018.people.per.sq..km))
covid_air_dt$Mean_ann_earnings = as.numeric(gsub(",","",covid_air_dt$Mean_ann_earnings))
covid_air_dt$X2018.people.per.sq..km = as.numeric(covid_air_dt$X2018.people.per.sq..km)
covid_air_dt$Mean_ann_earnings = as.numeric(covid_air_dt$Mean_ann_earnings)

Make sure that only England data is included

covid_air_dt = covid_air_dt[startsWith(as.character(covid_air_dt$Code), 'E'),]
nrow(covid_air_dt)
## [1] 312

Visualise all numeric data

covid_air_dt_vis = subset(covid_air_dt, select=c(deaths, X2018.people.per.sq..km, Mean_ann_earnings,median_age_2018,pm25_val,no2_val,o3_val,
                                                 pm10_val,so2_val, nox_val))
covid_air_dt_vis = as.data.frame(sapply(covid_air_dt_vis, function(x) as.numeric(x) ) )
#normalise per column & reshape the data for plotting
covid_air_dt_vis = as.data.frame(sapply(covid_air_dt_vis, scale))

covid_air_dt_vis$Code = covid_air_dt$Code
covid_air_dt_vis_long = melt(covid_air_dt_vis, id.vars = "Code")
ggplot(covid_air_dt_vis_long, aes (value)) +
  geom_histogram() +
  geom_density() +
  facet_wrap(~variable) + 
  theme_bw() + 
  theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Models

Negative binomial regression model

pm25_deaths.nb <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km + 
                                   Mean_ann_earnings + median_age_2018 + pm25_val)
car::vif(pm25_deaths.nb)
## X2018.people.per.sq..km       Mean_ann_earnings         median_age_2018 
##                2.080989                1.441089                2.135625 
##                pm25_val 
##                1.950479
pm10_deaths.nb <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km + 
                                   Mean_ann_earnings + median_age_2018 + pm10_val)
car::vif(pm10_deaths.nb)
## X2018.people.per.sq..km       Mean_ann_earnings         median_age_2018 
##                2.099725                1.385040                1.950317 
##                pm10_val 
##                1.667899
nox_deaths.nb <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km + 
                                  Mean_ann_earnings + median_age_2018 + nox_val)
car::vif(nox_deaths.nb)
## X2018.people.per.sq..km       Mean_ann_earnings         median_age_2018 
##                2.529754                1.277034                2.128073 
##                 nox_val 
##                2.650417
no2_deaths.nb <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km + 
                                  Mean_ann_earnings + median_age_2018 + no2_val)
car::vif(no2_deaths.nb)
## X2018.people.per.sq..km       Mean_ann_earnings         median_age_2018 
##                2.417227                1.263175                2.314129 
##                 no2_val 
##                2.736928
o3_deaths.nb <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km + 
                                 Mean_ann_earnings + median_age_2018 + o3_val)
car::vif(o3_deaths.nb)
## X2018.people.per.sq..km       Mean_ann_earnings         median_age_2018 
##                2.094533                1.278499                1.827214 
##                  o3_val 
##                1.148228
#summary(so2_deaths.nb <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km + #
#                                  Mean_ann_earnings + median_age_2018 + so2_val))
#car::vif(so2_deaths.nb)

We note that mean annual earnings are not significant in any of these models. We therefore remove this variable.

# glm -earnings, since not significant 
summary(pm25_deaths.nb_red <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km + 
                                    median_age_2018 + pm25_val))
car::vif(pm25_deaths.nb_red)
summary(pm10_deaths.nb_red <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km + 
                                    median_age_2018 + pm10_val))
car::vif(pm10_deaths.nb_red)
summary(nox_deaths.nb_red <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km + 
                                   median_age_2018 + nox_val))
car::vif(nox_deaths.nb_red)
summary(no2_deaths.nb_red <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km + 
                                   median_age_2018 + no2_val))
car::vif(no2_deaths.nb_red)
summary(o3_deaths.nb_red <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km + 
                                  median_age_2018 + o3_val))
car::vif(o3_deaths.nb_red)
summary(so2_deaths.nb_red <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km + 
                                   median_age_2018 + so2_val))
car::vif(so2_deaths.nb_red)


Calculate mortality rate ratios (exponentiated coefficients with 95% confidence intervals)
pm25_deaths.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(pm25_deaths.nb), confint(pm25_deaths.nb))), p_value = summary(pm25_deaths.nb)$coefficients[,4]))
pm10_deaths.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(pm10_deaths.nb), confint(pm10_deaths.nb))), p_value = summary(pm10_deaths.nb)$coefficients[,4]))
nox_deaths.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(nox_deaths.nb), confint(nox_deaths.nb))), p_value = summary(nox_deaths.nb)$coefficients[,4]))
no2_deaths.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(no2_deaths.nb), confint(no2_deaths.nb))), p_value = summary(no2_deaths.nb)$coefficients[,4]))
o3_deaths.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(o3_deaths.nb), confint(o3_deaths.nb))), p_value = summary(o3_deaths.nb)$coefficients[,4]))
#so2_deaths.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(so2_deaths.nb), confint(so2_deaths.nb))), p_value = summary(so2_deaths.nb)$coefficients[,4]))
#make a df just with the pollutants 

LA_covid_onlyPoll = data.frame(rbind(pm25_deaths.nb_mrr[nrow(pm25_deaths.nb_mrr),],
                                      pm10_deaths.nb_mrr[nrow(pm10_deaths.nb_mrr),],
                                      nox_deaths.nb_mrr[nrow(nox_deaths.nb_mrr),],
                                      no2_deaths.nb_mrr[nrow(no2_deaths.nb_mrr),],
                                      o3_deaths.nb_mrr[nrow(o3_deaths.nb_mrr),]))
Plot mortality rate ratios

Label significance and save the data

LA_covid_onlyPoll$names = row.names(LA_covid_onlyPoll)
LA_covid_onlyPoll$significance = "p-value > 0.05"
LA_covid_onlyPoll$significance[LA_covid_onlyPoll$p_value < 0.05] <- "p-value < 0.05"
write.csv(LA_covid_onlyPoll, "data_output/LA_covid_onlyPoll_preL_AP2018.csv", row.names = F)

plot

ggplot(LA_covid_onlyPoll, aes(x=reorder(names, OR), y=OR, color=significance)) + 
    geom_point(fill="white", shape=21, size = 2) +
    geom_errorbar(aes(ymin=X2.5.., ymax=X97.5..),
                  width=.2,                    # Width of the error bars
                  position=position_dodge(.9)) +
  theme_classic() + 
  geom_hline(yintercept = 1, linetype="dotted") +
  coord_flip()+ylab("Mortality rate ratios") + 
  xlab("Pollutants")+
  ylim(0.75, 1.6)

ggsave('fig_out/LA_deaths_MRR.pdf')

Cases in local authorities

Data curation

cases_LA_raw = read.csv("data/coronavirus-cases_latest-18_5_2020.csv")
cases_LA_dt = subset(cases_LA_raw, Specimen.date == "2020-04-10", select = c(Area.code, Cumulative.lab.confirmed.cases))
cases_LA_dt.agg <-aggregate(cases_LA_dt, by=list(cases_LA_dt$Area.code), 
  FUN=mean, na.rm=TRUE)
nrow(cases_LA_dt.agg)
## [1] 342

Merge cases data to main data

deaths_LA_dt = read.csv("data_output/merged_covidAir_cov_dt_LA.csv", na.strings = "x")
deaths_LA_dt$X2018.people.per.sq..km = gsub(",", "", deaths_LA_dt$X2018.people.per.sq..km)
deaths_LA_dt$X2018.people.per.sq..km = as.numeric(deaths_LA_dt$X2018.people.per.sq..km)
deaths_LA_dt$Mean_ann_earnings = gsub(",", "", deaths_LA_dt$Mean_ann_earnings)
deaths_LA_dt$Mean_ann_earnings = as.numeric(deaths_LA_dt$Mean_ann_earnings)

cases_deaths_LA_dt = merge(deaths_LA_dt, cases_LA_dt.agg, by.x = "Code",by.y = "Group.1")
nrow(cases_deaths_LA_dt)
## [1] 301

Only 301 observations were matched.

Analysis of the cases data

Visualisation

ggplot(data = cases_deaths_LA_dt, aes(x = Cumulative.lab.confirmed.cases))+
  geom_histogram(stat="count")

cases_deaths_LA_dt$X2018.people.per.sq..km = as.numeric(cases_deaths_LA_dt$X2018.people.per.sq..km)
cases_deaths_LA_dt$Mean_ann_earnings = as.numeric(cases_deaths_LA_dt$Mean_ann_earnings)

Model

summary(pm25_cases.nb <- glm.nb(data = cases_deaths_LA_dt, Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km + 
                                   Mean_ann_earnings + median_age_2018 + pm25_val))
## 
## Call:
## glm.nb(formula = Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km + 
##     Mean_ann_earnings + median_age_2018 + pm25_val, data = cases_deaths_LA_dt, 
##     init.theta = 2.724975873, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.8555  -0.8682  -0.3414   0.2772   2.8464  
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              9.593e+00  5.567e-01  17.232  < 2e-16 ***
## X2018.people.per.sq..km  9.626e-05  1.953e-05   4.930 8.21e-07 ***
## Mean_ann_earnings        6.492e-06  5.788e-06   1.122 0.261996    
## median_age_2018         -9.122e-02  1.037e-02  -8.800  < 2e-16 ***
## pm25_val                -9.629e-02  2.770e-02  -3.476 0.000509 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(2.725) family taken to be 1)
## 
##     Null deviance: 562.55  on 294  degrees of freedom
## Residual deviance: 311.82  on 290  degrees of freedom
##   (6 observations deleted due to missingness)
## AIC: 3583.7
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  2.725 
##           Std. Err.:  0.216 
## 
##  2 x log-likelihood:  -3571.745
car::vif(pm25_cases.nb)
## X2018.people.per.sq..km       Mean_ann_earnings         median_age_2018 
##                2.085624                1.427671                2.125130 
##                pm25_val 
##                1.933232
summary(pm10_cases.nb <- glm.nb(data = cases_deaths_LA_dt, Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km + 
                                   Mean_ann_earnings + median_age_2018 + pm10_val))
## 
## Call:
## glm.nb(formula = Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km + 
##     Mean_ann_earnings + median_age_2018 + pm10_val, data = cases_deaths_LA_dt, 
##     init.theta = 2.745559885, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.8501  -0.8823  -0.3287   0.2934   2.9348  
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              9.513e+00  5.216e-01  18.238  < 2e-16 ***
## X2018.people.per.sq..km  1.002e-04  1.956e-05   5.122 3.03e-07 ***
## Mean_ann_earnings        5.806e-06  5.650e-06   1.028 0.304083    
## median_age_2018         -8.743e-02  9.861e-03  -8.866  < 2e-16 ***
## pm10_val                -6.748e-02  1.762e-02  -3.831 0.000128 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(2.7456) family taken to be 1)
## 
##     Null deviance: 566.73  on 294  degrees of freedom
## Residual deviance: 311.67  on 290  degrees of freedom
##   (6 observations deleted due to missingness)
## AIC: 3581.3
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  2.746 
##           Std. Err.:  0.218 
## 
##  2 x log-likelihood:  -3569.297
car::vif(pm10_cases.nb)
## X2018.people.per.sq..km       Mean_ann_earnings         median_age_2018 
##                2.109349                1.370273                1.937009 
##                pm10_val 
##                1.650101
summary(nox_cases.nb <- glm.nb(data = cases_deaths_LA_dt, Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km + 
                                  Mean_ann_earnings + median_age_2018 + nox_val))
## 
## Call:
## glm.nb(formula = Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km + 
##     Mean_ann_earnings + median_age_2018 + nox_val, data = cases_deaths_LA_dt, 
##     init.theta = 2.721556683, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7665  -0.8356  -0.3858   0.3034   3.4862  
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              7.409e+00  4.954e-01  14.956  < 2e-16 ***
## X2018.people.per.sq..km  5.246e-05  2.220e-05   2.363  0.01813 *  
## Mean_ann_earnings       -4.812e-06  5.385e-06  -0.894  0.37145    
## median_age_2018         -5.803e-02  1.051e-02  -5.522 3.35e-08 ***
## nox_val                  1.740e-02  5.344e-03   3.255  0.00113 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(2.7216) family taken to be 1)
## 
##     Null deviance: 561.85  on 294  degrees of freedom
## Residual deviance: 312.05  on 290  degrees of freedom
##   (6 observations deleted due to missingness)
## AIC: 3584.4
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  2.722 
##           Std. Err.:  0.216 
## 
##  2 x log-likelihood:  -3572.364
car::vif(nox_cases.nb)
## X2018.people.per.sq..km       Mean_ann_earnings         median_age_2018 
##                2.692080                1.233985                2.182823 
##                 nox_val 
##                2.905970
summary(no2_cases.nb <- glm.nb(data = cases_deaths_LA_dt, Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km + 
                                  Mean_ann_earnings + median_age_2018 + no2_val))
## 
## Call:
## glm.nb(formula = Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km + 
##     Mean_ann_earnings + median_age_2018 + no2_val, data = cases_deaths_LA_dt, 
##     init.theta = 2.729294735, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7594  -0.8476  -0.3940   0.3071   3.5818  
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              7.190e+00  5.306e-01  13.550  < 2e-16 ***
## X2018.people.per.sq..km  5.698e-05  2.142e-05   2.661  0.00780 ** 
## Mean_ann_earnings       -4.853e-06  5.365e-06  -0.904  0.36574    
## median_age_2018         -5.476e-02  1.094e-02  -5.007 5.52e-07 ***
## no2_val                  3.005e-02  9.141e-03   3.287  0.00101 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(2.7293) family taken to be 1)
## 
##     Null deviance: 563.42  on 294  degrees of freedom
## Residual deviance: 312.03  on 290  degrees of freedom
##   (6 observations deleted due to missingness)
## AIC: 3583.5
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  2.729 
##           Std. Err.:  0.216 
## 
##  2 x log-likelihood:  -3571.468
car::vif(no2_cases.nb)
## X2018.people.per.sq..km       Mean_ann_earnings         median_age_2018 
##                2.512648                1.228516                2.370271 
##                 no2_val 
##                2.910421
summary(o3_cases.nb <- glm.nb(data = cases_deaths_LA_dt, Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km + 
                                 Mean_ann_earnings + median_age_2018 + o3_val))
## 
## Call:
## glm.nb(formula = Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km + 
##     Mean_ann_earnings + median_age_2018 + o3_val, data = cases_deaths_LA_dt, 
##     init.theta = 2.808347379, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9822  -0.8344  -0.3153   0.3226   2.8354  
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              8.206e+00  4.265e-01  19.241  < 2e-16 ***
## X2018.people.per.sq..km  6.994e-05  1.934e-05   3.616 0.000299 ***
## Mean_ann_earnings        7.689e-06  5.480e-06   1.403 0.160572    
## median_age_2018         -6.928e-02  9.456e-03  -7.327 2.36e-13 ***
## o3_val                  -5.259e-02  1.111e-02  -4.733 2.21e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(2.8083) family taken to be 1)
## 
##     Null deviance: 579.47  on 294  degrees of freedom
## Residual deviance: 311.40  on 290  degrees of freedom
##   (6 observations deleted due to missingness)
## AIC: 3574.1
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  2.808 
##           Std. Err.:  0.223 
## 
##  2 x log-likelihood:  -3562.114
car::vif(o3_cases.nb)
## X2018.people.per.sq..km       Mean_ann_earnings         median_age_2018 
##                2.108263                1.318626                1.820923 
##                  o3_val 
##                1.177405
summary(so2_cases.nb <- glm.nb(data = cases_deaths_LA_dt, Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km + 
                                  Mean_ann_earnings + median_age_2018 + so2_val))
## 
## Call:
## glm.nb(formula = Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km + 
##     Mean_ann_earnings + median_age_2018 + so2_val, data = cases_deaths_LA_dt, 
##     init.theta = 2.717269061, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7230  -0.8505  -0.3570   0.3183   3.4351  
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              7.239e+00  5.311e-01  13.629  < 2e-16 ***
## X2018.people.per.sq..km  7.709e-05  1.964e-05   3.925 8.66e-05 ***
## Mean_ann_earnings        2.288e-06  5.451e-06   0.420  0.67462    
## median_age_2018         -6.018e-02  1.036e-02  -5.811 6.20e-09 ***
## so2_val                  2.370e-01  7.467e-02   3.174  0.00151 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(2.7173) family taken to be 1)
## 
##     Null deviance: 560.98  on 294  degrees of freedom
## Residual deviance: 312.07  on 290  degrees of freedom
##   (6 observations deleted due to missingness)
## AIC: 3584.9
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  2.717 
##           Std. Err.:  0.215 
## 
##  2 x log-likelihood:  -3572.864
car::vif(so2_cases.nb)
## X2018.people.per.sq..km       Mean_ann_earnings         median_age_2018 
##                2.104075                1.262575                2.115436 
##                 so2_val 
##                1.525447

Generate the table

stargazer(pm25_cases.nb, pm10_cases.nb, nox_cases.nb, no2_cases.nb,
          o3_cases.nb, so2_cases.nb,type="html",out = "fig_out/LA_covid_allPoll_CASES_nb.html",
          dep.var.labels="Number of COVID-19-related cases",
          single.row=TRUE)
Dependent variable: Number of COVID-19-related cases

| | (1) PM2.5 | (2) PM10 | (3) NOx | (4) NO2 | (5) O3 | (6) SO2 |
|---|---|---|---|---|---|---|
| X2018.people.per.sq..km | 0.0001*** (0.00002) | 0.0001*** (0.00002) | 0.0001** (0.00002) | 0.0001*** (0.00002) | 0.0001*** (0.00002) | 0.0001*** (0.00002) |
| Mean_ann_earnings | 0.00001 (0.00001) | 0.00001 (0.00001) | -0.00000 (0.00001) | -0.00000 (0.00001) | 0.00001 (0.00001) | 0.00000 (0.00001) |
| median_age_2018 | -0.091*** (0.010) | -0.087*** (0.010) | -0.058*** (0.011) | -0.055*** (0.011) | -0.069*** (0.009) | -0.060*** (0.010) |
| pm25_val | -0.096*** (0.028) | | | | | |
| pm10_val | | -0.067*** (0.018) | | | | |
| nox_val | | | 0.017*** (0.005) | | | |
| no2_val | | | | 0.030*** (0.009) | | |
| o3_val | | | | | -0.053*** (0.011) | |
| so2_val | | | | | | 0.237*** (0.075) |
| Constant | 9.593*** (0.557) | 9.513*** (0.522) | 7.409*** (0.495) | 7.190*** (0.531) | 8.206*** (0.426) | 7.239*** (0.531) |
| Observations | 295 | 295 | 295 | 295 | 295 | 295 |
| Log Likelihood | -1,786.872 | -1,785.649 | -1,787.182 | -1,786.734 | -1,782.057 | -1,787.432 |
| theta | 2.725*** (0.216) | 2.746*** (0.218) | 2.722*** (0.216) | 2.729*** (0.216) | 2.808*** (0.223) | 2.717*** (0.215) |
| Akaike Inf. Crit. | 3,583.745 | 3,581.297 | 3,584.364 | 3,583.468 | 3,574.114 | 3,584.864 |

Note: *p<0.1; **p<0.05; ***p<0.01
Calculate infectivity rate ratios (exponentiated coefficients with 95% confidence intervals)
pm25_cases.nb_irr = data.frame(cbind(exp(cbind(OR = coef(pm25_cases.nb), confint(pm25_cases.nb))), p_value = summary(pm25_cases.nb)$coefficients[,4]))
pm10_cases.nb_irr = data.frame(cbind(exp(cbind(OR = coef(pm10_cases.nb), confint(pm10_cases.nb))), p_value = summary(pm10_cases.nb)$coefficients[,4]))
nox_cases.nb_irr = data.frame(cbind(exp(cbind(OR = coef(nox_cases.nb), confint(nox_cases.nb))), p_value = summary(nox_cases.nb)$coefficients[,4]))
no2_cases.nb_irr = data.frame(cbind(exp(cbind(OR = coef(no2_cases.nb), confint(no2_cases.nb))), p_value = summary(no2_cases.nb)$coefficients[,4]))
o3_cases.nb_irr = data.frame(cbind(exp(cbind(OR = coef(o3_cases.nb), confint(o3_cases.nb))), p_value = summary(o3_cases.nb)$coefficients[,4]))
so2_cases.nb_irr = data.frame(cbind(exp(cbind(OR = coef(so2_cases.nb), confint(so2_cases.nb))), p_value = summary(so2_cases.nb)$coefficients[,4]))
#make a df just with the pollutants 

LA_covid_CASES_onlyPoll = data.frame(rbind(pm25_cases.nb_irr[nrow(pm25_cases.nb_irr),],
                                      pm10_cases.nb_irr[nrow(pm10_cases.nb_irr),],
                                      nox_cases.nb_irr[nrow(nox_cases.nb_irr),],
                                      no2_cases.nb_irr[nrow(no2_cases.nb_irr),],
                                      o3_cases.nb_irr[nrow(o3_cases.nb_irr),],
                                      so2_cases.nb_irr[nrow(so2_cases.nb_irr),]))
Plot infectivity rate ratios

Label significance and save the data

LA_covid_CASES_onlyPoll$names = row.names(LA_covid_CASES_onlyPoll)
LA_covid_CASES_onlyPoll$significance = "p-value > 0.05"
LA_covid_CASES_onlyPoll$significance[LA_covid_CASES_onlyPoll$p_value < 0.05] <- "p-value < 0.05"

write.csv(LA_covid_CASES_onlyPoll, "data_output/LA_covid_CASES_onlyPoll_preL_2018AP.csv", row.names = F)

plot

ggplot(LA_covid_CASES_onlyPoll, aes(x=reorder(names, OR), y=OR, color=significance)) + 
    geom_point(fill="white", shape=21, size = 2) +
    geom_errorbar(aes(ymin=X2.5.., ymax=X97.5..),
                  width=.2,                    # Width of the error bars
                  position=position_dodge(.9)) +
  theme_classic() + 
  geom_hline(yintercept = 1, linetype="dotted") +
  coord_flip()+ylab("Infectivity rate ratios") + 
  xlab("Pollutants")+
  ylim(0.75, 1.6)

ggsave('fig_out/LA_CASES_odds_IRR.pdf')

Save the numbers

library(stargazer)
stargazer(LA_covid_CASES_onlyPoll, summary=FALSE ,type="html",out 
          ="fig_out/LA_covid_infectivity_IRR.html")
| | OR | 2.5% | 97.5% | p_value | significance |
|---|---|---|---|---|---|
| pm25_val | 0.908 | 0.860 | 0.958 | 0.001 | p-value < 0.05 |
| pm10_val | 0.935 | 0.903 | 0.967 | 0.0001 | p-value < 0.05 |
| nox_val | 1.018 | 1.008 | 1.028 | 0.001 | p-value < 0.05 |
| no2_val | 1.031 | 1.014 | 1.047 | 0.001 | p-value < 0.05 |
| o3_val | 0.949 | 0.928 | 0.970 | 0.00000 | p-value < 0.05 |
| so2_val | 1.267 | 1.103 | 1.459 | 0.002 | p-value < 0.05 |

Summary tables of models in Aim 2

Supplementary Table 4. Effect of air pollutants on the number of COVID-19-related cases at the subregional level

stargazer(pm25_cases.nb, pm10_cases.nb, nox_cases.nb, no2_cases.nb,
          o3_cases.nb, so2_cases.nb,type="html",out = "fig_out/LA_covid_allPoll_CASES_nb.html",
          dep.var.labels="Number of COVID-19-related cases",
          single.row=TRUE)
Dependent variable: Number of COVID-19-related cases

| | (1) PM2.5 | (2) PM10 | (3) NOx | (4) NO2 | (5) O3 | (6) SO2 |
|---|---|---|---|---|---|---|
| X2018.people.per.sq..km | 0.0001*** (0.00002) | 0.0001*** (0.00002) | 0.0001** (0.00002) | 0.0001*** (0.00002) | 0.0001*** (0.00002) | 0.0001*** (0.00002) |
| Mean_ann_earnings | 0.00001 (0.00001) | 0.00001 (0.00001) | -0.00000 (0.00001) | -0.00000 (0.00001) | 0.00001 (0.00001) | 0.00000 (0.00001) |
| median_age_2018 | -0.091*** (0.010) | -0.087*** (0.010) | -0.058*** (0.011) | -0.055*** (0.011) | -0.069*** (0.009) | -0.060*** (0.010) |
| pm25_val | -0.096*** (0.028) | | | | | |
| pm10_val | | -0.067*** (0.018) | | | | |
| nox_val | | | 0.017*** (0.005) | | | |
| no2_val | | | | 0.030*** (0.009) | | |
| o3_val | | | | | -0.053*** (0.011) | |
| so2_val | | | | | | 0.237*** (0.075) |
| Constant | 9.593*** (0.557) | 9.513*** (0.522) | 7.409*** (0.495) | 7.190*** (0.531) | 8.206*** (0.426) | 7.239*** (0.531) |
| Observations | 295 | 295 | 295 | 295 | 295 | 295 |
| Log Likelihood | -1,786.872 | -1,785.649 | -1,787.182 | -1,786.734 | -1,782.057 | -1,787.432 |
| theta | 2.725*** (0.216) | 2.746*** (0.218) | 2.722*** (0.216) | 2.729*** (0.216) | 2.808*** (0.223) | 2.717*** (0.215) |
| Akaike Inf. Crit. | 3,583.745 | 3,581.297 | 3,584.364 | 3,583.468 | 3,574.114 | 3,584.864 |

Note: *p<0.1; **p<0.05; ***p<0.01


The values in parentheses represent standard errors.
Summary of the effects of individual air pollutant species on lab-confirmed COVID-19-related cases up to 10 April 2020. The three asterisks next to theta indicate that each model fits significantly better than the null model.

Effect of air pollutants on the number of COVID-19-related deaths at the subregional level

stargazer(pm25_deaths.nb, pm10_deaths.nb, nox_deaths.nb, no2_deaths.nb,
          o3_deaths.nb,type="html",out = "fig_out/LA_covid_allPoll_nb.html",
          dep.var.labels="Number of COVID-19-related deaths",
          single.row=TRUE)
Dependent variable: Number of COVID-19-related deaths

| | (1) PM2.5 | (2) PM10 | (3) NOx | (4) NO2 | (5) O3 |
|---|---|---|---|---|---|
| X2018.people.per.sq..km | 0.0001*** (0.00002) | 0.0001*** (0.00002) | 0.0001*** (0.00002) | 0.0001*** (0.00002) | 0.0001*** (0.00002) |
| Mean_ann_earnings | 0.00000 (0.00001) | 0.00001 (0.00001) | -0.00000 (0.00001) | -0.00000 (0.00001) | 0.00001 (0.00001) |
| median_age_2018 | -0.079*** (0.012) | -0.079*** (0.011) | -0.055*** (0.012) | -0.050*** (0.012) | -0.067*** (0.011) |
| pm25_val | -0.039 (0.031) | | | | |
| pm10_val | | -0.036* (0.020) | | | |
| nox_val | | | 0.016*** (0.006) | | |
| no2_val | | | | 0.030*** (0.010) | |
| o3_val | | | | | -0.037*** (0.013) |
| Constant | 6.988*** (0.635) | 7.103*** (0.597) | 5.541*** (0.542) | 5.268*** (0.580) | 6.323*** (0.489) |
| Observations | 306 | 306 | 306 | 306 | 306 |
| Log Likelihood | -1,395.565 | -1,394.788 | -1,392.433 | -1,391.427 | -1,392.320 |
| theta | 2.099*** (0.171) | 2.109*** (0.171) | 2.143*** (0.175) | 2.157*** (0.176) | 2.142*** (0.175) |
| Akaike Inf. Crit. | 2,801.130 | 2,799.577 | 2,794.865 | 2,792.853 | 2,794.640 |

Note: *p<0.1; **p<0.05; ***p<0.01


Summary of the effects of individual air pollutant species on lab-confirmed COVID-19-related deaths up to 10 April 2020. The three asterisks next to theta indicate that each model fits significantly better than the null model. The mortality rate ratio (MRR) of O3 is notable: higher O3 is associated with roughly 3% fewer deaths. NOx, NO2 and SO2 are all significant predictors, associated with increases in the number of deaths of 1.3%, 2.4% and 17.2%, respectively, per 1 µg/m3.
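
The percentage changes quoted above come from exponentiating the negative binomial coefficients (rate ratios). A sketch of the computation for two of the models fitted in this section (the exact percentages quoted refer to the study's final models and can differ slightly):

# Percent change in deaths per 1 µg/m3 increase of the pollutant
100 * (exp(coef(no2_deaths.nb)[["no2_val"]]) - 1)
100 * (exp(coef(o3_deaths.nb)[["o3_val"]]) - 1)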

Analysis using the 5-year average of air pollution values

Average PCM data for all pollutants

NO2

no2_2014 = read.csv("data/raw_mapno22014.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'no22014')]
no2_2015 = read.csv("data/raw_mapno22015.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'no22015')]
no2_2016 = read.csv("data/raw_mapno22016.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'no22016')]
no2_2017 = read.csv("data/raw_mapno22017.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'no22017')]
no2_2018 = read.csv("data/raw_mapno22018.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'no22018')]
no2_5y = Reduce(merge, list(no2_2014,no2_2015,no2_2016,no2_2017,no2_2018))
no2_5y$no2_5yAvg = (no2_5y$no22014+no2_5y$no22015+no2_5y$no22016+no2_5y$no22017+no2_5y$no22018)/5
no2_5y = no2_5y[ ,c('ukgridcode', 'no22018','no2_5yAvg')]

NOX

nox_2014 = read.csv("data/raw_mapnox2014.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'nox2014')]
nox_2015 = read.csv("data/raw_mapnox2015.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'nox2015')]
nox_2016 = read.csv("data/raw_mapnox2016.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'nox2016')]
nox_2017 = read.csv("data/raw_mapnox2017.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'nox2017')]
nox_2018 = read.csv("data/raw_mapnox2018.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'nox2018')]
nox_5y = Reduce(merge, list(nox_2014,nox_2015,nox_2016,nox_2017,nox_2018))
nox_5y$nox_5yAvg = (nox_5y$nox2014+nox_5y$nox2015+nox_5y$nox2016+nox_5y$nox2017+nox_5y$nox2018)/5
nox_5y = nox_5y[ ,c('ukgridcode', 'nox2018','nox_5yAvg')]

PM25

pm25_2014 = read.csv("data/raw_mappm252014.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm252014g')]
pm25_2015 = read.csv("data/raw_mappm252015.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm252015g')]
pm25_2016 = read.csv("data/raw_mappm252016.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm252016g')]
pm25_2017 = read.csv("data/raw_mappm252017.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm252017g')]
pm25_2018 = read.csv("data/raw_mappm252018.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm252018g')]
pm25_5y = Reduce(merge, list(pm25_2014,pm25_2015,pm25_2016,pm25_2017,pm25_2018))
pm25_5y$pm25_5yAvg = (pm25_5y$pm252014g+pm25_5y$pm252015g+pm25_5y$pm252016g+pm25_5y$pm252017g+pm25_5y$pm252018g)/5
pm25_5y = pm25_5y[ ,c('ukgridcode', 'pm252018g','pm25_5yAvg')]

PM10

pm10_2014 = read.csv("data/raw_mappm102014.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm102014g')]
pm10_2015 = read.csv("data/raw_mappm102015.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm102015g')]
pm10_2016 = read.csv("data/raw_mappm102016.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm102016g')]
pm10_2017 = read.csv("data/raw_mappm102017.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm102017g')]
pm10_2018 = read.csv("data/raw_mappm102018.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm102018g')]
pm10_5y = Reduce(merge, list(pm10_2014,pm10_2015,pm10_2016,pm10_2017,pm10_2018))
pm10_5y$pm10_5yAvg = (pm10_5y$pm102014g+pm10_5y$pm102015g+pm10_5y$pm102016g+pm10_5y$pm102017g+pm10_5y$pm102018g)/5
pm10_5y = pm10_5y[ ,c('ukgridcode', 'pm102018g','pm10_5yAvg')]

O3

o3_2014 = read.csv("data/raw_mapo312014.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'dgt12014')]
o3_2015 = read.csv("data/raw_mapo312015.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'dgt12015')]
o3_2016 = read.csv("data/raw_mapo312016.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'dgt12016')]
o3_2017 = read.csv("data/raw_mapo32017.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'dgt12017')]
o3_2018 = read.csv("data/raw_mapo312018.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'dgt12018')]
o3_5y = Reduce(merge, list(o3_2014,o3_2015,o3_2016,o3_2017,o3_2018))
o3_5y$o3_5yAvg = (o3_5y$dgt12014+o3_5y$dgt12015+o3_5y$dgt12016+o3_5y$dgt12017+o3_5y$dgt12018)/5
o3_5y = o3_5y[ ,c('ukgridcode', 'dgt12018','o3_5yAvg')]

Create a merged data frame with the 2018 values and the 5-year averages

ap_dt = Reduce(merge, list(no2_5y,nox_5y,pm25_5y,pm10_5y,o3_5y))
write.csv(ap_dt, "data_output/5Yaverage_AP_PCMdata.csv", row.names = F)
write.csv(na.omit(ap_dt), "data_output/5Yaverage_AP_PCMdata_na.csv", row.names = F)
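
The per-pollutant blocks above repeat the same read-and-average pattern, so a small helper could be used instead. A sketch, assuming the "raw_map<pollutant><year>.csv" file naming and "<pollutant><year>" value columns used for NO2 and NOx (the PM and O3 files use slightly different column names, so the suffixes would need adjusting):

# Read one pollutant for several years from the PCM csv files and add a 5-year average
read_pcm_5y <- function(pollutant, years = 2014:2018) {
  yearly <- lapply(years, function(yr) {
    value_col <- paste0(pollutant, yr)
    read.csv(paste0("data/raw_map", pollutant, yr, ".csv"),
             skip = 5, header = TRUE, na.strings = "MISSING")[, c("ukgridcode", value_col)]
  })
  merged <- Reduce(merge, yearly)
  merged[[paste0(pollutant, "_5yAvg")]] <- rowMeans(merged[, paste0(pollutant, years)])
  merged
}
# e.g. no2_5y <- read_pcm_5y("no2")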

Convert the XY coordinates to lon-lat

ap_dt = read.csv("data_output/5Yaverage_AP_PCMdata_na.csv")

# Add X Y coordinates from any of the data 

o3_2014 = read.csv("data/raw_mapo312014.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'x','y')]
ap_dt = merge(ap_dt,o3_2014, by = "ukgridcode")
ap_dt_out = ap_dt[ , -which(names(ap_dt) %in% c("x","y"))]
### convert to lon lat ----

# transform: easting/northing -> long/lat
### shortcuts (from https://stephendavidgregory.github.io/useful/UKgrid_to_LatLon)
ukgrid <- "+init=epsg:27700"
latlong <- "+init=epsg:4326"

### Create coordinates variable
ap_coords <- cbind(Easting = as.numeric(as.character(ap_dt$x)),
                Northing = as.numeric(as.character(ap_dt$y)))

### Create the SpatialPointsDataFrame
ap_SP <- SpatialPointsDataFrame(ap_coords,
                                 data = ap_dt,
                                 proj4string = CRS("+init=epsg:27700"))

### Convert
ap_LL <- spTransform(ap_SP, CRS(latlong))

ap_ll_df = data.frame('lon' = coordinates(ap_LL)[, 1], 'lat' = coordinates(ap_LL)[, 2], 'ukgridcode' = ap_LL$ukgridcode)

ap_dt_ll = merge(ap_dt_out,ap_ll_df, by = "ukgridcode")

write.csv(ap_dt_ll,"data_output/processed_allAP_lonlat.csv", row.names = F)
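
Since rgdal has been retired, the same easting/northing to longitude/latitude conversion can also be done with sf (loaded above); a sketch, assuming the same ap_dt columns x and y:

# British National Grid (EPSG:27700) -> WGS84 longitude/latitude (EPSG:4326) with sf
ap_sf <- sf::st_as_sf(ap_dt, coords = c("x", "y"), crs = 27700)
ap_sf_ll <- sf::st_transform(ap_sf, crs = 4326)
ll <- sf::st_coordinates(ap_sf_ll)
ap_ll_df_sf <- data.frame(lon = ll[, "X"], lat = ll[, "Y"], ukgridcode = ap_dt$ukgridcode)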

Model of mortality based on the 5-year average (5YA) air pollution values, pre-lockdown

Data curation

avgAP_dt = read.csv("data_output/merged_covidAir_cov_LA_5YA.csv", na.strings = "x")

avgAP_dt$X2018.people.per.sq..km = as.numeric(gsub(",","",avgAP_dt$X2018.people.per.sq..km))
avgAP_dt$Mean_ann_earnings = as.numeric(gsub(",","",avgAP_dt$Mean_ann_earnings))
avgAP_dt$X2018.people.per.sq..km = as.numeric(avgAP_dt$X2018.people.per.sq..km)
avgAP_dt$Mean_ann_earnings = as.numeric(avgAP_dt$Mean_ann_earnings)

Additional variables for 5YA

avgEarnings_2014 = na.omit(read.csv("data_output/avgEarnings_2014.csv", na.strings = "x")[,c("Code","avgEarn_2014")])
avgEarnings_2015 = na.omit(read.csv("data_output/avgEarnings_2015.csv", na.strings = "x")[,c("Code","avgEarn_2015")])
avgEarnings_2016 = na.omit(read.csv("data_output/avgEarnings_2016.csv", na.strings = "x")[,c("Code","avgEarn_2016")])
avgEarnings_2017 = na.omit(read.csv("data_output/avgEarnings_2017.csv", na.strings = "x")[,c("Code","avgEarn_2017")])
avgEarnings_2018 = na.omit(read.csv("data_output/avgEarnings_2018.csv", na.strings = "x")[,c("Code","avgEarn_2018")])

avgEarnings_5ya = Reduce(merge, list(avgEarnings_2014,avgEarnings_2015,avgEarnings_2016,avgEarnings_2017,avgEarnings_2018))

avgEarnings_5ya$avgEarn_2014 = as.numeric(gsub(",","",avgEarnings_5ya$avgEarn_2014))
avgEarnings_5ya$avgEarn_2015 = as.numeric(gsub(",","",avgEarnings_5ya$avgEarn_2015))
avgEarnings_5ya$avgEarn_2016 = as.numeric(gsub(",","",avgEarnings_5ya$avgEarn_2016))
avgEarnings_5ya$avgEarn_2017 = as.numeric(gsub(",","",avgEarnings_5ya$avgEarn_2017))
avgEarnings_5ya$avgEarn_2018 = as.numeric(gsub(",","",avgEarnings_5ya$avgEarn_2018))

avgEarnings_5ya$earnings_5ya = (avgEarnings_5ya$avgEarn_2014 + avgEarnings_5ya$avgEarn_2015 + avgEarnings_5ya$avgEarn_2016 + avgEarnings_5ya$avgEarn_2017 + avgEarnings_5ya$avgEarn_2018)/5

avgEarnings_5ya = avgEarnings_5ya[,c("Code","earnings_5ya")]

Pop_density_5ya = read.csv("data/Pop_density_full_dt.csv")[,c("Code","pop_dens_5ya")]

age_5ya = read.csv("data_output/age_5ya.csv")[,c("Code","age_5ya")]

covariates_5ya = Reduce(merge, list(avgEarnings_5ya,Pop_density_5ya,age_5ya))
#avgAP_dt = merge(avgAP_dt, covariates_5ya, by = "Code")
avgAP_dt = merge(avgAP_dt, covariates_5ya[,-3], by = "Code")
avgAP_dt$pop_dens_5ya = as.numeric(gsub(",","",avgAP_dt$pop_dens_5ya))
write.csv(avgAP_dt,"data_output/merged_covidAir_cov_LA_5YA.csv", row.names = F)
summary(pm25_deaths_5YA.nb <- glm.nb(data = avgAP_dt, deaths ~ pop_dens_5ya +
                                   earnings_5ya + age_5ya + pm25_5yAvg))
## 
## Call:
## glm.nb(formula = deaths ~ pop_dens_5ya + earnings_5ya + age_5ya + 
##     pm25_5yAvg, data = avgAP_dt, init.theta = 2.228067951, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.5294  -0.8876  -0.3636   0.2695   3.3767  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   6.390e+00  5.943e-01  10.752  < 2e-16 ***
## pop_dens_5ya  9.994e-05  2.273e-05   4.397 1.10e-05 ***
## earnings_5ya -2.086e-07  6.848e-06  -0.030    0.976    
## age_5ya      -7.057e-02  1.204e-02  -5.861 4.59e-09 ***
## pm25_5yAvg    7.720e-03  1.938e-02   0.398    0.690    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(2.2281) family taken to be 1)
## 
##     Null deviance: 487.47  on 282  degrees of freedom
## Residual deviance: 300.11  on 278  degrees of freedom
## AIC: 2612.1
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  2.228 
##           Std. Err.:  0.188 
## 
##  2 x log-likelihood:  -2600.111
car::vif(pm25_deaths_5YA.nb)
## pop_dens_5ya earnings_5ya      age_5ya   pm25_5yAvg 
##     2.142889     1.189121     1.982406     1.134312
summary(pm10_deaths_5YA.nb <- glm.nb(data = avgAP_dt, deaths ~ pop_dens_5ya +
                                   earnings_5ya + age_5ya + pm10_5yAvg))
## 
## Call:
## glm.nb(formula = deaths ~ pop_dens_5ya + earnings_5ya + age_5ya + 
##     pm10_5yAvg, data = avgAP_dt, init.theta = 2.227020703, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.5260  -0.8855  -0.3567   0.2658   3.3505  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   6.456e+00  5.880e-01  10.980  < 2e-16 ***
## pop_dens_5ya  9.992e-05  2.276e-05   4.390 1.13e-05 ***
## earnings_5ya  2.476e-08  6.845e-06   0.004    0.997    
## age_5ya      -7.137e-02  1.193e-02  -5.983 2.20e-09 ***
## pm10_5yAvg    2.242e-03  1.366e-02   0.164    0.870    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(2.227) family taken to be 1)
## 
##     Null deviance: 487.25  on 282  degrees of freedom
## Residual deviance: 300.11  on 278  degrees of freedom
## AIC: 2612.2
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  2.227 
##           Std. Err.:  0.187 
## 
##  2 x log-likelihood:  -2600.239
car::vif(pm10_deaths_5YA.nb)
## pop_dens_5ya earnings_5ya      age_5ya   pm10_5yAvg 
##     2.148181     1.187764     1.945466     1.101626
summary(nox_deaths_5YA.nb <- glm.nb(data = avgAP_dt, deaths ~ pop_dens_5ya +
                                   earnings_5ya + age_5ya + nox_5yAvg))
## 
## Call:
## glm.nb(formula = deaths ~ pop_dens_5ya + earnings_5ya + age_5ya + 
##     nox_5yAvg, data = avgAP_dt, init.theta = 2.330563559, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.4959  -0.8776  -0.3217   0.2737   3.8186  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   5.264e+00  5.935e-01   8.870  < 2e-16 ***
## pop_dens_5ya  8.440e-05  2.331e-05   3.621 0.000293 ***
## earnings_5ya  2.352e-06  6.711e-06   0.350 0.726007    
## age_5ya      -5.045e-02  1.244e-02  -4.055    5e-05 ***
## nox_5yAvg     1.469e-02  4.179e-03   3.515 0.000440 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(2.3306) family taken to be 1)
## 
##     Null deviance: 508.44  on 282  degrees of freedom
## Residual deviance: 299.44  on 278  degrees of freedom
## AIC: 2598.9
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  2.331 
##           Std. Err.:  0.198 
## 
##  2 x log-likelihood:  -2586.947
car::vif(nox_deaths_5YA.nb)
## pop_dens_5ya earnings_5ya      age_5ya    nox_5yAvg 
##     2.352705     1.191368     2.207728     1.827694
summary(no2_deaths_5YA.nb <- glm.nb(data = avgAP_dt, deaths ~ pop_dens_5ya +
                                   earnings_5ya + age_5ya + no2_5yAvg))
## 
## Call:
## glm.nb(formula = deaths ~ pop_dens_5ya + earnings_5ya + age_5ya + 
##     no2_5yAvg, data = avgAP_dt, init.theta = 2.327684382, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.5071  -0.8829  -0.3292   0.2712   3.8454  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   5.232e+00  6.097e-01   8.582  < 2e-16 ***
## pop_dens_5ya  8.883e-05  2.288e-05   3.882 0.000103 ***
## earnings_5ya  2.010e-06  6.712e-06   0.299 0.764600    
## age_5ya      -5.038e-02  1.261e-02  -3.995 6.47e-05 ***
## no2_5yAvg     2.384e-02  7.044e-03   3.385 0.000713 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(2.3277) family taken to be 1)
## 
##     Null deviance: 507.85  on 282  degrees of freedom
## Residual deviance: 299.49  on 278  degrees of freedom
## AIC: 2599.3
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  2.328 
##           Std. Err.:  0.197 
## 
##  2 x log-likelihood:  -2587.340
car::vif(no2_deaths_5YA.nb)
## pop_dens_5ya earnings_5ya      age_5ya    no2_5yAvg 
##     2.263950     1.190001     2.265576     1.747459
summary(o3_deaths_5YA.nb <- glm.nb(data = avgAP_dt, deaths ~ pop_dens_5ya +
                                   earnings_5ya + age_5ya + o3_5yAvg))
## 
## Call:
## glm.nb(formula = deaths ~ pop_dens_5ya + earnings_5ya + age_5ya + 
##     o3_5yAvg, data = avgAP_dt, init.theta = 2.353153533, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.3590  -0.8889  -0.3458   0.3287   3.3772  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   6.250e+00  5.195e-01  12.031  < 2e-16 ***
## pop_dens_5ya  7.587e-05  2.285e-05   3.320 0.000899 ***
## earnings_5ya  9.901e-06  6.856e-06   1.444 0.148705    
## age_5ya      -5.912e-02  1.171e-02  -5.047 4.49e-07 ***
## o3_5yAvg     -1.793e-01  4.201e-02  -4.268 1.97e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(2.3532) family taken to be 1)
## 
##     Null deviance: 513.04  on 282  degrees of freedom
## Residual deviance: 299.22  on 278  degrees of freedom
## AIC: 2596.1
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  2.353 
##           Std. Err.:  0.200 
## 
##  2 x log-likelihood:  -2584.060
car::vif(o3_deaths_5YA.nb)
## pop_dens_5ya earnings_5ya      age_5ya     o3_5yAvg 
##     2.282899     1.257509     1.969455     1.311695
stargazer(pm25_deaths_5YA.nb, pm10_deaths_5YA.nb, nox_deaths_5YA.nb, no2_deaths_5YA.nb,o3_deaths_5YA.nb,type="html",out = "fig_out/LA_covid_allPoll_DEATHS_5YA_nb.html",
          dep.var.labels="Number of COVID-19-related deaths, 5-year average",
          single.row=TRUE)
Dependent variable: Number of COVID-19-related deaths, 5-year average

|                   | (1)                 | (2)                 | (3)                 | (4)                 | (5)                 |
|-------------------|---------------------|---------------------|---------------------|---------------------|---------------------|
| pop_dens_5ya      | 0.0001*** (0.00002) | 0.0001*** (0.00002) | 0.0001*** (0.00002) | 0.0001*** (0.00002) | 0.0001*** (0.00002) |
| earnings_5ya      | -0.00000 (0.00001)  | 0.00000 (0.00001)   | 0.00000 (0.00001)   | 0.00000 (0.00001)   | 0.00001 (0.00001)   |
| age_5ya           | -0.071*** (0.012)   | -0.071*** (0.012)   | -0.050*** (0.012)   | -0.050*** (0.013)   | -0.059*** (0.012)   |
| pm25_5yAvg        | 0.008 (0.019)       |                     |                     |                     |                     |
| pm10_5yAvg        |                     | 0.002 (0.014)       |                     |                     |                     |
| nox_5yAvg         |                     |                     | 0.015*** (0.004)    |                     |                     |
| no2_5yAvg         |                     |                     |                     | 0.024*** (0.007)    |                     |
| o3_5yAvg          |                     |                     |                     |                     | -0.179*** (0.042)   |
| Constant          | 6.390*** (0.594)    | 6.456*** (0.588)    | 5.264*** (0.594)    | 5.232*** (0.610)    | 6.250*** (0.519)    |
| Observations      | 283                 | 283                 | 283                 | 283                 | 283                 |
| Log Likelihood    | -1,301.056          | -1,301.119          | -1,294.474          | -1,294.670          | -1,293.030          |
| theta             | 2.228*** (0.188)    | 2.227*** (0.187)    | 2.331*** (0.198)    | 2.328*** (0.197)    | 2.353*** (0.200)    |
| Akaike Inf. Crit. | 2,612.111           | 2,612.239           | 2,598.947           | 2,599.340           | 2,596.060           |

Note: *p<0.1; **p<0.05; ***p<0.01
Plot odds ratios
pm25_deaths_5YA.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(pm25_deaths_5YA.nb), confint(pm25_deaths_5YA.nb))), p_value = summary(pm25_deaths_5YA.nb)$coefficients[,4]))
pm10_deaths_5YA.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(pm10_deaths_5YA.nb), confint(pm10_deaths_5YA.nb))), p_value = summary(pm10_deaths_5YA.nb)$coefficients[,4]))
nox_deaths_5YA.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(nox_deaths_5YA.nb), confint(nox_deaths_5YA.nb))), p_value = summary(nox_deaths_5YA.nb)$coefficients[,4]))
no2_deaths_5YA.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(no2_deaths_5YA.nb), confint(no2_deaths_5YA.nb))), p_value = summary(no2_deaths_5YA.nb)$coefficients[,4]))
o3_deaths_5YA.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(o3_deaths_5YA.nb), confint(o3_deaths_5YA.nb))), p_value = summary(o3_deaths_5YA.nb)$coefficients[,4]))
#make a df just with the pollutants 

LA_covid_DEATHS_onlyPoll_5YA = data.frame(rbind(pm25_deaths_5YA.nb_mrr[nrow(pm25_deaths_5YA.nb_mrr),],
                pm10_deaths_5YA.nb_mrr[nrow(pm10_deaths_5YA.nb_mrr),],
                nox_deaths_5YA.nb_mrr[nrow(nox_deaths_5YA.nb_mrr),],
                no2_deaths_5YA.nb_mrr[nrow(no2_deaths_5YA.nb_mrr),],
                o3_deaths_5YA.nb_mrr[nrow(o3_deaths_5YA.nb_mrr),]))

sort data

LA_covid_DEATHS_onlyPoll_5YA$names = row.names(LA_covid_DEATHS_onlyPoll_5YA)
LA_covid_DEATHS_onlyPoll_5YA$significance = "p-value > 0.05"
LA_covid_DEATHS_onlyPoll_5YA$significance[LA_covid_DEATHS_onlyPoll_5YA$p_value < 0.05] <- "p-value < 0.05"
write.csv(LA_covid_DEATHS_onlyPoll_5YA,"data_output/LA_covid_DEATHS_onlyPoll_5YA.csv", row.names = F)

plot

ggplot(LA_covid_DEATHS_onlyPoll_5YA, aes(x=reorder(names, OR), y=OR, color=significance)) + 
    geom_point(fill="white", shape=21, size = 2) +
    geom_errorbar(aes(ymin=X2.5.., ymax=X97.5..),
                  width=.2,                    # Width of the error bars
                  position=position_dodge(.9)) +
  geom_hline(yintercept = 1, linetype="dotted") +
  coord_flip()+ylab("Mortality rate ratios, pre-lockdown") + 
  xlab("Pollutants, 5 year average (2014-2018)") + theme_classic()

ggsave('fig_out/LA_CASES_odds_MRR_5YA.pdf')

Infectivity analysis with 5YA air pollution data, pre-lockdown

cases_LA_raw = read.csv("data/coronavirus-cases_latest-18_5_2020.csv")
cases_LA_dt = subset(cases_LA_raw, Specimen.date == "2020-04-10", select = c(Area.code, Cumulative.lab.confirmed.cases))
library(dplyr)
cases_LA_dt = cases_LA_dt %>% 
group_by(Area.code) %>% 
summarise(cases = mean(Cumulative.lab.confirmed.cases))
## `summarise()` ungrouping output (override with `.groups` argument)
nrow(cases_LA_dt)
## [1] 342
Model
avgAP_dt = read.csv("data_output/merged_covidAir_cov_LA_5YA.csv", na.strings = "x")
avgAP_dt$pop_dens_5ya = as.numeric(gsub(",","",avgAP_dt$pop_dens_5ya))
avgAP_dt$X2018.people.per.sq..km = as.numeric(gsub(",","",avgAP_dt$X2018.people.per.sq..km))
avgAP_dt$Mean_ann_earnings = as.numeric(gsub(",","",avgAP_dt$Mean_ann_earnings))
#avgAP_dt = merge(avgAP_dt,cases_LA_dt, by.x = "Code",by.y = "Area.code")
#write.csv(avgAP_dt,"data_output/merged_covidAir_cov_LA_5YA.csv", row.names = F)
library(MASS)
pm25_cases_5YA.nb <- glm.nb(data = avgAP_dt, cases ~ pop_dens_5ya +
                                   earnings_5ya + age_5ya + pm25_5yAvg)
car::vif(pm25_cases_5YA.nb)
## pop_dens_5ya earnings_5ya      age_5ya   pm25_5yAvg 
##     2.127469     1.183202     1.976028     1.133826
pm10_cases_5YA.nb <- glm.nb(data = avgAP_dt, cases ~ pop_dens_5ya +
                                   earnings_5ya + age_5ya + pm10_5yAvg)
car::vif(pm10_cases_5YA.nb)
## pop_dens_5ya earnings_5ya      age_5ya   pm10_5yAvg 
##     2.132595     1.181741     1.939209     1.100561
nox_cases_5YA.nb <- glm.nb(data = avgAP_dt, cases ~ pop_dens_5ya +
                                   earnings_5ya + age_5ya + nox_5yAvg)
car::vif(nox_cases_5YA.nb)
## pop_dens_5ya earnings_5ya      age_5ya    nox_5yAvg 
##     2.334119     1.184576     2.204639     1.828731
no2_cases_5YA.nb <- glm.nb(data = avgAP_dt, cases ~ pop_dens_5ya +
                                   earnings_5ya + age_5ya + no2_5yAvg)
car::vif(no2_cases_5YA.nb)
## pop_dens_5ya earnings_5ya      age_5ya    no2_5yAvg 
##     2.246629     1.183251     2.261649     1.748043
o3_cases_5YA.nb <- glm.nb(data = avgAP_dt, cases ~ pop_dens_5ya +
                                   earnings_5ya + age_5ya + o3_5yAvg)
car::vif(o3_cases_5YA.nb)
## pop_dens_5ya earnings_5ya      age_5ya     o3_5yAvg 
##     2.261263     1.250229     1.966950     1.314072
stargazer(pm25_cases_5YA.nb, pm10_cases_5YA.nb, nox_cases_5YA.nb, no2_cases_5YA.nb,o3_cases_5YA.nb,type="html",out = "fig_out/LA_covid_allPoll_CASES_5YA_nb.html",
          dep.var.labels="Number of COVID-19-related cases, pre-lockdown, 5-year averages",
          single.row=TRUE)
Dependent variable: Number of COVID-19-related cases, pre-lockdown, 5-year averages

|                   | (1)                 | (2)                 | (3)                 | (4)                 | (5)                 |
|-------------------|---------------------|---------------------|---------------------|---------------------|---------------------|
| pop_dens_5ya      | 0.0001*** (0.00002) | 0.0001*** (0.00002) | 0.0001*** (0.00002) | 0.0001*** (0.00002) | 0.00005** (0.00002) |
| earnings_5ya      | -0.00000 (0.00001)  | -0.00000 (0.00001)  | -0.00000 (0.00001)  | -0.00000 (0.00001)  | 0.00001 (0.00001)   |
| age_5ya           | -0.082*** (0.011)   | -0.082*** (0.011)   | -0.062*** (0.011)   | -0.062*** (0.012)   | -0.063*** (0.010)   |
| pm25_5yAvg        | -0.028 (0.018)      |                     |                     |                     |                     |
| pm10_5yAvg        |                     | -0.023* (0.012)     |                     |                     |                     |
| nox_5yAvg         |                     |                     | 0.012*** (0.004)    |                     |                     |
| no2_5yAvg         |                     |                     |                     | 0.019*** (0.006)    |                     |
| o3_5yAvg          |                     |                     |                     |                     | -0.228*** (0.037)   |
| Constant          | 8.855*** (0.540)    | 8.900*** (0.533)    | 7.538*** (0.544)    | 7.527*** (0.559)    | 8.194*** (0.460)    |
| Observations      | 283                 | 283                 | 283                 | 283                 | 283                 |
| Log Likelihood    | -1,729.358          | -1,728.939          | -1,725.318          | -1,725.733          | -1,713.342          |
| theta             | 2.598*** (0.209)    | 2.605*** (0.210)    | 2.667*** (0.215)    | 2.660*** (0.215)    | 2.882*** (0.234)    |
| Akaike Inf. Crit. | 3,468.715           | 3,467.878           | 3,460.637           | 3,461.467           | 3,436.684           |

Note: *p<0.1; **p<0.05; ***p<0.01
Plot odds ratios
pm25_cases_5YA.nb_irr = data.frame(cbind(exp(cbind(OR = coef(pm25_cases_5YA.nb), confint(pm25_cases_5YA.nb))), p_value = summary(pm25_cases_5YA.nb)$coefficients[,4]))
pm10_cases_5YA.nb_irr = data.frame(cbind(exp(cbind(OR = coef(pm10_cases_5YA.nb), confint(pm10_cases_5YA.nb))), p_value = summary(pm10_cases_5YA.nb)$coefficients[,4]))
nox_cases_5YA.nb_irr = data.frame(cbind(exp(cbind(OR = coef(nox_cases_5YA.nb), confint(nox_cases_5YA.nb))), p_value = summary(nox_cases_5YA.nb)$coefficients[,4]))
no2_cases_5YA.nb_irr = data.frame(cbind(exp(cbind(OR = coef(no2_cases_5YA.nb), confint(no2_cases_5YA.nb))), p_value = summary(no2_cases_5YA.nb)$coefficients[,4]))
o3_cases_5YA.nb_irr = data.frame(cbind(exp(cbind(OR = coef(o3_cases_5YA.nb), confint(o3_cases_5YA.nb))), p_value = summary(o3_cases_5YA.nb)$coefficients[,4]))
#make a df just with the pollutants 

LA_covid_cases_onlyPoll_5YA = data.frame(rbind(pm25_cases_5YA.nb_irr[nrow(pm25_cases_5YA.nb_irr),],
                pm10_cases_5YA.nb_irr[nrow(pm10_cases_5YA.nb_irr),],
                nox_cases_5YA.nb_irr[nrow(nox_cases_5YA.nb_irr),],
                no2_cases_5YA.nb_irr[nrow(no2_cases_5YA.nb_irr),],
                o3_cases_5YA.nb_irr[nrow(o3_cases_5YA.nb_irr),]))

sort data

LA_covid_cases_onlyPoll_5YA$names = row.names(LA_covid_cases_onlyPoll_5YA)
LA_covid_cases_onlyPoll_5YA$significance = "p-value > 0.05"
LA_covid_cases_onlyPoll_5YA$significance[LA_covid_cases_onlyPoll_5YA$p_value < 0.05] <- "p-value < 0.05"
write.csv(LA_covid_cases_onlyPoll_5YA,"data_output/LA_covid_cases_onlyPoll_5YA.csv", row.names = F)

plot

ggplot(LA_covid_cases_onlyPoll_5YA, aes(x=reorder(names, OR), y=OR, color=significance)) + 
    geom_point(fill="white", shape=21, size = 2) +
    geom_errorbar(aes(ymin=X2.5.., ymax=X97.5..),
                  width=.2,                    # Width of the error bars
                  position=position_dodge(.9)) +
  geom_hline(yintercept = 1, linetype="dotted") +
  coord_flip()+ylab("Infectivity rate ratios, pre-lockdown") + 
  xlab("Pollutants, 5 year average (2014-2018)") + theme_classic()

ggsave('fig_out/LA_CASES_odds_IRR_5YA.pdf')

Re-analysis with the case and death numbers up to the end of April

Prepare data: cases and deaths up to the end of April

april_deaths = read.csv("data_output/LA_deaths_April2020.csv")[,c("Code","april_deaths")]

april_cases = read.csv("data_output/COVID_LA_cases.csv")[,c("lad19_cd","april_26_cases")]
colnames(april_cases) <- c("Code","april_cases")

april_cases= april_cases %>% 
  group_by(Code) %>% 
  summarise(april_cases = sum(april_cases))
## `summarise()` ungrouping output (override with `.groups` argument)
resub_dt = merge(april_cases,april_deaths, by = "Code")
resub_dt = merge(avgAP_dt,resub_dt, by = "Code", all.x=T)

#ADD 2018 data
resub_dt = merge(resub_dt,
                 read.csv("data_output/merged_covidAir_cov_dt_LA.csv")[,c("Code",
                                                                         "pm25_val",
                                                                         "no2_val",
                                                                         "o3_val",
                                                                         "pm10_val",
                                                                         "nox_val")], by = "Code", all.x=T)

write.csv(resub_dt, "data_output/LA_dt_resub_full.csv", row.names = F)

April cases models

LA_covid_pm25 <-glm.nb(data = resub_dt, april_cases ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
                          pm25_val)

LA_covid_no2 <-glm.nb(data = resub_dt, april_cases ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
                          no2_val)

LA_covid_nox <-glm.nb(data = resub_dt, april_cases ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
                          nox_val)

LA_covid_pm10 <-glm.nb(data = resub_dt, april_cases ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
                          pm10_val)

LA_covid_o3 <-glm.nb(data = resub_dt, april_cases ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
                          o3_val)


LA_covid_pm25_5yAvg <-glm.nb(data = resub_dt, april_cases ~ pop_dens_5ya + earnings_5ya + age_5ya +
                          pm25_5yAvg)

LA_covid_no2_5yAvg <-glm.nb(data = resub_dt, april_cases ~ pop_dens_5ya + earnings_5ya + age_5ya +
                          no2_5yAvg)

LA_covid_nox_5yAvg <-glm.nb(data = resub_dt, april_cases ~ pop_dens_5ya + earnings_5ya + age_5ya +
                          nox_5yAvg)

LA_covid_pm10_5yAvg <-glm.nb(data = resub_dt, april_cases ~ pop_dens_5ya + earnings_5ya + age_5ya +
                          pm10_5yAvg)

LA_covid_o3_5yAvg <-glm.nb(data = resub_dt, april_cases ~ pop_dens_5ya + earnings_5ya + age_5ya +
                          o3_5yAvg)

stargazer::stargazer(LA_covid_pm25,LA_covid_pm10,LA_covid_no2,LA_covid_nox,LA_covid_o3,
                     LA_covid_pm25_5yAvg,LA_covid_pm10_5yAvg,LA_covid_no2_5yAvg,
                         LA_covid_nox_5yAvg,LA_covid_o3_5yAvg,
                     type="html",out = "fig_out/LA_covid_NB_model_resub_cases.html",
          dep.var.labels="COVID positive or not",
          single.row=TRUE)
Dependent variable: COVID positive or not

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) |
|---|---|---|---|---|---|---|---|---|---|---|
| X2018.people.per.sq..km | 0.0001*** (0.00002) | 0.0001*** (0.00002) | 0.00004 (0.00002) | 0.00003 (0.00003) | 0.00004** (0.00002) | | | | | |
| Mean_ann_earnings | -0.00000 (0.00001) | -0.00000 (0.00001) | -0.00002** (0.00001) | -0.00002** (0.00001) | 0.00000 (0.00001) | | | | | |
| median_age_2018 | -0.099*** (0.012) | -0.095*** (0.011) | -0.055*** (0.013) | -0.059*** (0.012) | -0.072*** (0.011) | | | | | |
| pm25_val | -0.120*** (0.031) | | | | | | | | | |
| pm10_val | | -0.087*** (0.020) | | | | | | | | |
| no2_val | | | 0.033*** (0.010) | | | | | | | |
| nox_val | | | | 0.019*** (0.006) | | | | | | |
| o3_val | | | | | -0.065*** (0.012) | | | | | |
| pop_dens_5ya | | | | | | 0.0001*** (0.00002) | 0.0001*** (0.00002) | 0.00005** (0.00002) | 0.00005* (0.00002) | 0.00002 (0.00002) |
| earnings_5ya | | | | | | -0.00001** (0.00001) | -0.00001** (0.00001) | -0.00001** (0.00001) | -0.00001* (0.00001) | -0.00000 (0.00001) |
| age_5ya | | | | | | -0.089*** (0.012) | -0.089*** (0.012) | -0.065*** (0.013) | -0.066*** (0.013) | -0.067*** (0.011) |
| pm25_5yAvg | | | | | | -0.038** (0.019) | | | | |
| pm10_5yAvg | | | | | | | -0.032** (0.014) | | | |
| no2_5yAvg | | | | | | | | 0.020*** (0.007) | | |
| nox_5yAvg | | | | | | | | | 0.012*** (0.004) | |
| o3_5yAvg | | | | | | | | | | -0.256*** (0.041) |
| Constant | 10.872*** (0.629) | 10.801*** (0.586) | 7.974*** (0.605) | 8.233*** (0.564) | 9.129*** (0.480) | 10.074*** (0.594) | 10.149*** (0.586) | 8.463*** (0.617) | 8.493*** (0.601) | 9.243*** (0.505) |
| Observations | 283 | 283 | 283 | 283 | 283 | 283 | 283 | 283 | 283 | 283 |
| Log Likelihood | -1,867.267 | -1,865.012 | -1,869.058 | -1,869.567 | -1,861.414 | -1,873.401 | -1,872.673 | -1,870.542 | -1,870.358 | -1,856.444 |
| theta | 2.217*** (0.176) | 2.248*** (0.179) | 2.193*** (0.174) | 2.186*** (0.174) | 2.300*** (0.184) | 2.134*** (0.169) | 2.143*** (0.170) | 2.172*** (0.173) | 2.175*** (0.173) | 2.375*** (0.190) |
| Akaike Inf. Crit. | 3,744.535 | 3,740.024 | 3,748.115 | 3,749.134 | 3,732.829 | 3,756.802 | 3,755.346 | 3,751.084 | 3,750.716 | 3,722.888 |

Note: *p<0.1; **p<0.05; ***p<0.01
Plot odds ratios
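# extract the last coefficient row of each model, i.e. the pollutant term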
list_df_full = do.call("rbind", list(
                summary(LA_covid_pm25)$coefficients[nrow(summary(LA_covid_pm25)$coefficients),1:4],
                 summary(LA_covid_pm10)$coefficients[nrow(summary(LA_covid_pm10)$coefficients),1:4],
                 summary(LA_covid_no2)$coefficients[nrow(summary(LA_covid_no2)$coefficients),1:4],
                 summary(LA_covid_nox)$coefficients[nrow(summary(LA_covid_nox)$coefficients),1:4],
                 summary(LA_covid_o3)$coefficients[nrow(summary(LA_covid_o3)$coefficients),1:4],
                 summary(LA_covid_pm25_5yAvg)$coefficients[nrow(summary(LA_covid_pm25_5yAvg)$coefficients),1:4],
                 summary(LA_covid_pm10_5yAvg)$coefficients[nrow(summary(LA_covid_pm10_5yAvg)$coefficients),1:4],
                 summary(LA_covid_no2_5yAvg)$coefficients[nrow(summary(LA_covid_no2_5yAvg)$coefficients),1:4],
                 summary(LA_covid_nox_5yAvg)$coefficients[nrow(summary(LA_covid_nox_5yAvg)$coefficients),1:4],
                 summary(LA_covid_o3_5yAvg)$coefficients[nrow(summary(LA_covid_o3_5yAvg)$coefficients),1:4]))
#add labels 

pollutants = c("pm25","pm10","no2","nox","o3",
         "pm25_5yAvg","pm10_5yAvg","no2_5yAvg","nox_5yAvg","o3_5yAvg")
pollutants = data.frame(pollutants)

list_df_full = cbind(pollutants, list_df_full)

colnames(list_df_full) <- c("pollutants", "Estimate","Std", "z","p")

list_df_full$OR = exp(list_df_full$Estimate)
list_df_full$upper = exp(list_df_full$Estimate + list_df_full$Std)
list_df_full$lower = exp(list_df_full$Estimate - list_df_full$Std)

list_df_full$significance = "p-value<0.05"
list_df_full$significance[list_df_full$p > 0.05] <- "p-value>0.05"

write.csv(list_df_full, "data_output/LA_covidInfection_RRs_fullData_resub.csv", row.names=F)
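# Note: the upper/lower bounds above span exp(estimate ± one standard error), not a 95% CI.
# A hedged sketch of Wald-style 95% limits, if wanted (not part of the original pipeline):
list_df_full$lower95 <- exp(list_df_full$Estimate - 1.96 * list_df_full$Std)
list_df_full$upper95 <- exp(list_df_full$Estimate + 1.96 * list_df_full$Std)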

## rate ratios and error bars (exp(estimate ± SE))

library(ggplot2)
ggplot(list_df_full, 
       aes(x=reorder(pollutants, OR), y=OR, color = significance)) + 
    geom_point(shape=21, size = 2) +
    geom_errorbar(aes(ymin=lower, ymax=upper),
                  width=.2,                    # Width of the error bars
                  position=position_dodge(.9)) +
  theme_classic() + 
  geom_hline(yintercept = 1, linetype="dotted") +
  coord_flip()+ylab("Infectivity rate ratios, resub") + 
  xlab("pollutants")

ggsave('fig_out/LA_covidInfection_RRs_fullData_resub.pdf', width = 6, height = 4)

April deaths model

LA_covid_pm25 <-glm.nb(data = resub_dt, april_deaths ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
                          pm25_val)

LA_covid_no2 <-glm.nb(data = resub_dt, april_deaths ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
                          no2_val)

LA_covid_nox <-glm.nb(data = resub_dt, april_deaths ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
                          nox_val)

LA_covid_pm10 <-glm.nb(data = resub_dt, april_deaths ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
                          pm10_val)

LA_covid_o3 <-glm.nb(data = resub_dt, april_deaths ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
                          o3_val)


LA_covid_pm25_5yAvg <-glm.nb(data = resub_dt, april_deaths ~ pop_dens_5ya + earnings_5ya + age_5ya +
                          pm25_5yAvg)

LA_covid_no2_5yAvg <-glm.nb(data = resub_dt, april_deaths ~ pop_dens_5ya + earnings_5ya + age_5ya +
                          no2_5yAvg)

LA_covid_nox_5yAvg <-glm.nb(data = resub_dt, april_deaths ~ pop_dens_5ya + earnings_5ya + age_5ya +
                          nox_5yAvg)

LA_covid_pm10_5yAvg <-glm.nb(data = resub_dt, april_deaths ~ pop_dens_5ya + earnings_5ya + age_5ya +
                          pm10_5yAvg)

LA_covid_o3_5yAvg <-glm.nb(data = resub_dt, april_deaths ~ pop_dens_5ya + earnings_5ya + age_5ya +
                          o3_5yAvg)

stargazer::stargazer(LA_covid_pm25,LA_covid_pm10,LA_covid_no2,LA_covid_nox,LA_covid_o3,
                     LA_covid_pm25_5yAvg,LA_covid_pm10_5yAvg,LA_covid_no2_5yAvg,
                         LA_covid_nox_5yAvg,LA_covid_o3_5yAvg,
                     type="html",out = "fig_out/LA_covid_NB_model_resub_deaths.html",
          dep.var.labels="COVID-19 death",
          single.row=TRUE)
Dependent variable: COVID-19 death

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) |
|---|---|---|---|---|---|---|---|---|---|---|
| X2018.people.per.sq..km | 0.0001*** (0.00002) | 0.0001*** (0.00002) | 0.00003 (0.00002) | 0.00003 (0.00002) | 0.00005** (0.00002) | | | | | |
| Mean_ann_earnings | 0.00000 (0.00001) | 0.00000 (0.00001) | -0.00001 (0.00001) | -0.00001 (0.00001) | 0.00000 (0.00001) | | | | | |
| median_age_2018 | -0.076*** (0.011) | -0.074*** (0.010) | -0.043*** (0.011) | -0.046*** (0.011) | -0.059*** (0.010) | | | | | |
| pm25_val | -0.060** (0.029) | | | | | | | | | |
| pm10_val | | -0.047** (0.018) | | | | | | | | |
| no2_val | | | 0.031*** (0.009) | | | | | | | |
| nox_val | | | | 0.018*** (0.005) | | | | | | |
| o3_val | | | | | -0.042*** (0.011) | | | | | |
| pop_dens_5ya | | | | | | 0.0001*** (0.00002) | 0.0001*** (0.00002) | 0.00004** (0.00002) | 0.00004* (0.00002) | 0.00003 (0.00002) |
| earnings_5ya | | | | | | -0.00001 (0.00001) | -0.00000 (0.00001) | -0.00000 (0.00001) | -0.00000 (0.00001) | 0.00000 (0.00001) |
| age_5ya | | | | | | -0.069*** (0.011) | -0.069*** (0.011) | -0.046*** (0.011) | -0.046*** (0.011) | -0.056*** (0.010) |
| pm25_5yAvg | | | | | | 0.005 (0.018) | | | | |
| pm10_5yAvg | | | | | | | 0.0004 (0.012) | | | |
| no2_5yAvg | | | | | | | | 0.025*** (0.006) | | |
| nox_5yAvg | | | | | | | | | 0.016*** (0.004) | |
| o3_5yAvg | | | | | | | | | | -0.183*** (0.038) |
| Constant | 8.145*** (0.577) | 8.170*** (0.540) | 6.087*** (0.544) | 6.310*** (0.508) | 7.170*** (0.442) | 7.466*** (0.538) | 7.536*** (0.533) | 6.150*** (0.548) | 6.196*** (0.534) | 7.254*** (0.467) |
| Observations | 283 | 283 | 283 | 283 | 283 | 283 | 283 | 283 | 283 | 283 |
| Log Likelihood | -1,535.037 | -1,534.045 | -1,530.920 | -1,531.456 | -1,530.728 | -1,537.505 | -1,537.548 | -1,528.701 | -1,528.580 | -1,526.697 |
| theta | 2.683*** (0.220) | 2.700*** (0.221) | 2.759*** (0.227) | 2.749*** (0.226) | 2.761*** (0.227) | 2.641*** (0.216) | 2.641*** (0.216) | 2.799*** (0.230) | 2.801*** (0.230) | 2.837*** (0.234) |
| Akaike Inf. Crit. | 3,080.074 | 3,078.090 | 3,071.840 | 3,072.912 | 3,071.456 | 3,085.011 | 3,085.096 | 3,067.402 | 3,067.160 | 3,063.394 |

Note: *p<0.1; **p<0.05; ***p<0.01
Plot odds ratios
list_df_full = do.call("rbind", list(
                summary(LA_covid_pm25)$coefficients[nrow(summary(LA_covid_pm25)$coefficients),1:4],
                 summary(LA_covid_pm10)$coefficients[nrow(summary(LA_covid_pm10)$coefficients),1:4],
                 summary(LA_covid_no2)$coefficients[nrow(summary(LA_covid_no2)$coefficients),1:4],
                 summary(LA_covid_nox)$coefficients[nrow(summary(LA_covid_nox)$coefficients),1:4],
                 summary(LA_covid_o3)$coefficients[nrow(summary(LA_covid_o3)$coefficients),1:4],
                 summary(LA_covid_pm25_5yAvg)$coefficients[nrow(summary(LA_covid_pm25_5yAvg)$coefficients),1:4],
                 summary(LA_covid_pm10_5yAvg)$coefficients[nrow(summary(LA_covid_pm10_5yAvg)$coefficients),1:4],
                 summary(LA_covid_no2_5yAvg)$coefficients[nrow(summary(LA_covid_no2_5yAvg)$coefficients),1:4],
                 summary(LA_covid_nox_5yAvg)$coefficients[nrow(summary(LA_covid_nox_5yAvg)$coefficients),1:4],
                 summary(LA_covid_o3_5yAvg)$coefficients[nrow(summary(LA_covid_o3_5yAvg)$coefficients),1:4]))
#add labels 

pollutants = c("pm25","pm10","no2","nox","o3",
         "pm25_5yAvg","pm10_5yAvg","no2_5yAvg","nox_5yAvg","o3_5yAvg")
pollutants = data.frame(pollutants)

list_df_full = cbind(pollutants, list_df_full)

colnames(list_df_full) <- c("pollutants", "Estimate","Std", "z","p")

list_df_full$OR = exp(list_df_full$Estimate)
list_df_full$upper = exp(list_df_full$Estimate + list_df_full$Std)
list_df_full$lower = exp(list_df_full$Estimate - list_df_full$Std)

list_df_full$significance = "p-value<0.05"
list_df_full$significance[list_df_full$p > 0.05] <- "p-value>0.05"

write.csv(list_df_full, "data_output/LA_covidDEATH_RRs_fullData_resub.csv", row.names=F)

## rate ratios and error bars (exp(estimate ± SE))

library(ggplot2)
ggplot(list_df_full, 
       aes(x=reorder(pollutants, OR), y=OR, color = significance)) + 
    geom_point(shape=21, size = 2) +
    geom_errorbar(aes(ymin=lower, ymax=upper),
                  width=.2,                    # Width of the error bars
                  position=position_dodge(.9)) +
  theme_classic() + 
  geom_hline(yintercept = 1, linetype="dotted") +
  coord_flip()+ylab("Mortality rate ratios, resub") + 
  xlab("pollutants")

ggsave('fig_out/LA_covidDEATH_RRs_fullData_resub.pdf', width = 6, height = 4)

Aim 3: Individual level analysis

Data curation

covid data curation

covid_df = read.csv("data/covid19_result.txt", sep = '\t')
covid_id = covid_df[,c(1,5,6)]
colnames(covid_id)
## [1] "eid"    "origin" "result"
covid_id_unique = aggregate(covid_id, by=list(covid_id$eid), 
                            FUN=max)

ggplot(data = covid_id_unique, aes(x = result))+
  geom_bar()

ggplot(data = covid_id_unique, aes(x = origin))+
  geom_bar()

Look at time

covid_df_time = covid_df
covid_df_time$specdate = as.Date(covid_df_time$specdate, format="%d/%m/%Y")
covid_df_time<-covid_df_time %>% 
group_by(specdate) %>%
mutate(culm_cases=cumsum(result))

ggplot(covid_df_time, aes(x=specdate, y=culm_cases)) +
  geom_bar(stat="identity") + 
  scale_x_date(date_labels = "%d/%m/%Y")

load non-covid UKB phenotype data

Data curation

### load non-covid UKB phenotype data ----
ukb_covid = read.csv("data/29_4_2020_ukb41646_covid19_subset.csv")[,-1]
#check distribution

# here, I will aggregate the columns by keeping the last recorded value for each participant and field
# reshape
ukb_covid_t = melt(ukb_covid, id='eid')
ukb_covid_t <- ukb_covid_t[order(ukb_covid_t$variable),] 
#delete everything after period 
ukb_covid_t$variable = gsub("\\..*","",ukb_covid_t$variable)

#aggregate by last 
ukb_covid_t_na = na.omit(ukb_covid_t)
ukb_covid_t_na = aggregate(ukb_covid_t_na, by=list(ukb_covid_t_na$eid,ukb_covid_t_na$variable), FUN=last)

# put it back straight 
ukb_covid_t_na$variable = as.factor(ukb_covid_t_na$variable)
ukb_covid_t_na = ukb_covid_t_na[,-c(1,2)]
ukb_covid_cur = cast(ukb_covid_t_na, eid~variable)

### label UKB data ----
#get levels and labels 
lvl.0493 <- c(-141,-131,-121,0)
lbl.0493 <- c("Often","Sometimes","Do not know","Rarely/never")

lvl.0007 <- c(0,1)
lbl.0007 <- c("No","Yes")

lvl.100349 <- c(-3,-1,0,1)
lbl.100349 <- c("Prefer not to answer","Do not know","No","Yes")

lvl.0090 <- c(-3,0,1,2)
lbl.0090 <- c("Prefer not to answer","Never","Previous","Current")

lvl.0009 <- c(0,1)
lbl.0009 <- c("Female","Male")

### eid replacement, saved here ----
yy_replace <- function(string, patterns, replacements) {
  for (i in seq_along(patterns))
    string <- gsub(patterns[i], replacements[i], string, perl=TRUE)
  string
}
#label = replacement, lvl = to replace

ukb_covid_cur$X22609 <- yy_replace(ukb_covid_cur$X22609, lvl.0493, lbl.0493)
ukb_covid_cur$X22610 <- yy_replace(ukb_covid_cur$X22610, lvl.0493, lbl.0493)
ukb_covid_cur$X22611 <- yy_replace(ukb_covid_cur$X22611, lvl.0493, lbl.0493)
ukb_covid_cur$X22615 <- yy_replace(ukb_covid_cur$X22615, lvl.0493, lbl.0493)
ukb_covid_cur$X22616 <- yy_replace(ukb_covid_cur$X22616, lvl.0007, lbl.0007)
ukb_covid_cur$X2316 <- yy_replace(ukb_covid_cur$X2316, lvl.100349, lbl.100349)
ukb_covid_cur$X31 <- yy_replace(ukb_covid_cur$X31, lvl.0009, lbl.0009)
ukb_covid_cur$X22130 <- yy_replace(ukb_covid_cur$X22130, lvl.0007, lbl.0007)
ukb_covid_cur$X2443 <- yy_replace(ukb_covid_cur$X2443, lvl.100349, lbl.100349)
ukb_covid_cur$X20116 <- yy_replace(ukb_covid_cur$X20116, lvl.0090, lbl.0090)


### put custom column names----
#get the ID that I designed
UID_names = read.csv("data/UKB_list_columns_AIRcovid.csv")[c("FieldID","my_colname")]
UID_names$FieldID = paste("X",UID_names$FieldID, sep = "")
UID_names$my_colname = lapply(UID_names$my_colname,toString)
UID_names[nrow(UID_names) + 1,] = c("eid","eid")

# sort it according to the dataset

UID_names_match = colnames(ukb_covid_cur)

UID_sort_df = data.frame(UID_names_match)
colnames(UID_sort_df) <- "FieldID"

#merge to sort
UID_sort_df = merge(UID_sort_df, UID_names, by = "FieldID")
# this is now sorted according to the correct df

#replace column names
ukb_covid_df = ukb_covid_cur

colnames(ukb_covid_df) <- UID_sort_df$my_colname

### merge both ukb datasets together ----
ukb_covid_merge = merge(covid_id_unique[,-1], ukb_covid_df, by = "eid")
nrow(ukb_covid_merge)
## [1] 1474

further processing…

# do 2020 since COVID occurred in this year
ukb_covid_merge$age = 2020 - ukb_covid_merge$birthYear
ukb_covid_merge$whr = ukb_covid_merge$waist / ukb_covid_merge$hip

#definition from the NHS: Diastolic >= 90 OR Systolic >=140
ukb_covid_merge$highBP[ukb_covid_merge$diaBP>=90 | ukb_covid_merge$sysBP>=140] <- 1
ukb_covid_merge$highBP[ukb_covid_merge$diaBP<90 & ukb_covid_merge$sysBP<140] <- 0

# add a column: COVID-positive (result == 1) and inpatient (origin == 1)
ukb_covid_merge$inpatient_covid <- 0
ukb_covid_merge$inpatient_covid[ukb_covid_merge$origin == 1 & ukb_covid_merge$result == 1] <- 1

write.csv(ukb_covid_merge, "data_output/processed_29_4_2020_ukb41646_covid19.csv")

Preliminary analysis with PM2.5

I will load the air pollutant maps and match each COVID-19-tested participant to their nearest Pollution Climate Mapping grid location.

### load pm2.5 data ----
#note the pop weighted data have the area code, but not the x y coords

pm25 = read.csv("data/processed_PM25_uk-air_annual_mean_mappm252018g.csv", na.strings = "MISSING")[c("x","y","pm252018g")]
nrow(pm25)
## [1] 281802
pm25_na = na.omit(pm25)
nrow(pm25_na)
## [1] 254877
### convert to lon lat - pm2.5 ----

# transform: easting/northing -> long/lat
### shortcuts (from https://stephendavidgregory.github.io/useful/UKgrid_to_LatLon)
ukgrid <- "+init=epsg:27700"
latlong <- "+init=epsg:4326"

### Create coordinates variable
pm25_coords <- cbind(Easting = as.numeric(as.character(pm25_na$x)),
                Northing = as.numeric(as.character(pm25_na$y)))

### Create the SpatialPointsDataFrame
pm25_SP <- SpatialPointsDataFrame(pm25_coords,
                                 data = pm25_na,
                                 proj4string = CRS("+init=epsg:27700"))

### Convert
pm25_LL <- spTransform(pm25_SP, CRS(latlong))

pm25_ll_df = data.frame('pm25_lon' = coordinates(pm25_LL)[, 1], 'pm25_lat' = coordinates(pm25_LL)[, 2], 'pm25_val' = pm25_LL$pm252018g)


write.csv(pm25_ll_df,"data_output/processed_pm25_lonlat.csv")

convert UKB data to long lat

# convert ukb data eastings northing to lon lat 

ukb_covid_coord_na = na.omit(ukb_covid_merge[c("eid","x_coord","y_coord")])
nrow(ukb_covid_coord_na)
## [1] 1464
ukb_covid_coords <- cbind(Easting = as.numeric(as.character(ukb_covid_coord_na$x_coord)),
                     Northing = as.numeric(as.character(ukb_covid_coord_na$y_coord)))
### Create the SpatialPointsDataFrame
ukb_covid_SP <- SpatialPointsDataFrame(ukb_covid_coords,
                                  data = ukb_covid_coord_na,
                                  proj4string = CRS("+init=epsg:27700"))

### Convert
ukb_covid_ll <- spTransform(ukb_covid_SP, CRS(latlong))

ukb_covid_ll_df = data.frame('ukb_lon' = coordinates(ukb_covid_ll)[, 1], 
                             'ukb_lat' = coordinates(ukb_covid_ll)[, 2], 
                             'eid' = ukb_covid_ll$eid)
write.csv(ukb_covid_ll_df, ("data_output/ukb_covid_lonlat_df.csv"))
# plot UKB locations
world <- ne_countries(scale = "medium", returnclass = "sf")
ggplot(data = world) +
  geom_sf() +
  geom_point(data = ukb_covid_ll_df, aes(x = ukb_lon, y = ukb_lat), size = 1, 
             shape = 23, fill = "darkred") +
  coord_sf(ylim = c(min(ukb_covid_ll_df$ukb_lat)-2, max(ukb_covid_ll_df$ukb_lat)+4), 
           xlim = c(min(ukb_covid_ll_df$ukb_lon)-4, max(ukb_covid_ll_df$ukb_lon)+3), expand = FALSE)

ggsave('fig_out/UKB_COVID_participants_locations.pdf')

I wrote a Python script, match_air_to_UKB_covid, which matches each participant to their nearest pollution grid point and produces the merged_ukb_*.csv files read below.
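For illustration, a minimal R sketch of that matching step (a hypothetical brute-force version, not the original Python code; it assumes the geosphere package is available) could look like this:

# Hypothetical sketch: for each participant, take the PM2.5 value of the nearest
# grid point by great-circle distance (slow but simple; ~1,464 x ~255,000 distances)
library(geosphere)

ukb_points <- read.csv("data_output/ukb_covid_lonlat_df.csv")
pm25_grid  <- read.csv("data_output/processed_pm25_lonlat.csv")

nearest_pm25 <- function(lon, lat, grid) {
  d <- distHaversine(c(lon, lat), cbind(grid$pm25_lon, grid$pm25_lat))
  i <- which.min(d)
  c(distance = d[i], pm25_val = grid$pm25_val[i])
}

matched <- t(mapply(nearest_pm25, ukb_points$ukb_lon, ukb_points$ukb_lat,
                    MoreArgs = list(grid = pm25_grid)))
merged_ukb_pm25 <- cbind(ukb_points["eid"], as.data.frame(matched))
# write.csv(merged_ukb_pm25, "data_output/merged_ukb_pm25.csv", row.names = FALSE)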

Merge and QC data for PM2.5

ukb_eid_pm25_merged = read.csv('data_output/merged_ukb_pm25.csv')[c("eid","distance","pm25_val")]

ukb_covid_pm25_df = merge(ukb_covid_merge, ukb_eid_pm25_merged, by = 'eid')
colnames(ukb_covid_pm25_df)
##  [1] "eid"             "origin"          "result"          "n_cancers"      
##  [5] "townsend"        "x_coord"         "y_coord"         "smoking"        
##  [9] "fev"             "copd"            "WP_dusty"        "WP_chemicals"   
## [13] "WP_cig"          "WP_diesel"       "breathing"       "whistling"      
## [17] "diabetes"        "sex"             "birthYear"       "diaBP"          
## [21] "sysBP"           "waist"           "hip"             "height"         
## [25] "age"             "whr"             "highBP"          "inpatient_covid"
## [29] "distance"        "pm25_val"
#omit those without coordinates in the ukb data 

### merge pop density----
ukb_cities_eid = read.csv("data_output/2_5_2020_ukb_eid_match_cities.csv")[c("eid","spec_area","gen_area")]
nrow(ukb_cities_eid)
## [1] 1464
popDens_2018 = read.csv("data/2018_official_popDensity.csv")[c("Name","X2018.people.per.sq..km")]
popDens_2018$X2018.people.per.sq..km = gsub(",", "", popDens_2018$X2018.people.per.sq..km)
popDens_2018$X2018.people.per.sq..km = as.numeric(popDens_2018$X2018.people.per.sq..km)

#Make everything lower case
ukb_cities_eid$spec_area = tolower(ukb_cities_eid$spec_area)
ukb_cities_eid$gen_area = tolower(ukb_cities_eid$gen_area)
popDens_2018$Name = tolower(popDens_2018$Name)

#delete repeated strings
ukb_cities_eid$spec_area = gsub("london borough of ", "", ukb_cities_eid$spec_area)
ukb_cities_eid$spec_area = gsub("royal borough of ", "", ukb_cities_eid$spec_area)
ukb_cities_eid$spec_area = gsub("city of ", "", ukb_cities_eid$spec_area)
ukb_cities_eid$spec_area = gsub("st helens", "helens", ukb_cities_eid$spec_area)
ukb_cities_eid$gen_area = gsub(" combined authority", "", ukb_cities_eid$gen_area)
popDens_2018$Name = gsub(", city of", "", popDens_2018$Name)
popDens_2018$Name = gsub("st. ", "", popDens_2018$Name)
#delete west midlands because it's redundant (all cities are well defined) and there are several regions called west midlands, so it's confusing
popDens_2018$Name = gsub("west midlands", "deleted_west_midlands_because_redundant", popDens_2018$Name)
popDens_2018$Name = gsub("\\s*\\([^\\)]+\\)","",popDens_2018$Name)

ukb_cities_eid_merged = merge(ukb_cities_eid, popDens_2018, by.x="spec_area",by.y="Name", all.x=TRUE)
colnames(ukb_cities_eid_merged)[4] <- "spec_Pop_dens_perkm2"
nrow(ukb_cities_eid_merged)
## [1] 1464
#check for duplicated rows in popDens
length(unique(popDens_2018$Name))
## [1] 434
length(popDens_2018$Name) # there is a duplicate
## [1] 435
popDens_2018$Name[duplicated(popDens_2018$Name)]
## [1] "deleted_west_midlands_because_redundant"
ukb_cities_eid_merged2 = merge(ukb_cities_eid_merged, popDens_2018, by.x="gen_area",by.y="Name", all.x=TRUE,all.y=FALSE)
colnames(ukb_cities_eid_merged2)[5] <- "gen_Pop_dens_perkm2"
nrow(ukb_cities_eid_merged2)
## [1] 1464
#QC
sum(is.na(ukb_cities_eid_merged2$spec_Pop_dens_perkm2))
## [1] 214
sum(is.na(ukb_cities_eid_merged2$gen_Pop_dens_perkm2))
## [1] 516
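# use the specific-area density where available; otherwise fall back to the general-area density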
ukb_cities_eid_merged2$merged_popDens_km2 <- ifelse(is.na(ukb_cities_eid_merged2$spec_Pop_dens_perkm2), 
                                                    ukb_cities_eid_merged2$gen_Pop_dens_perkm2, ukb_cities_eid_merged2$spec_Pop_dens_perkm2)
ukb_cities_eid_out = ukb_cities_eid_merged2[c("eid","merged_popDens_km2")]
ukb_covid_pm25_popDens_df = merge(ukb_covid_pm25_df, ukb_cities_eid_out, by = 'eid')
nrow(ukb_covid_pm25_df)
## [1] 1464
ukb_covid_pm25_popDens_df = read.csv("data_output/2_5_2020_full_cov_CV_UKB_air_dataset.csv")
ukb_covid_pm25_popDens_df$smoker = ifelse(ukb_covid_pm25_popDens_df$smoking =="Current", "smoking", "not_smoking")

library(MASS)
summary(ukb_covid_pm25.nb <-glm.nb(data = ukb_covid_pm25_popDens_df, result ~ merged_popDens_km2 + n_cancers +townsend + smoking + whistling + diabetes+
                               whr + sex + age + pm25_val))
## 
## Call:
## glm.nb(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + pm25_val, 
##     data = ukb_covid_pm25_popDens_df, init.theta = 15229.88441, 
##     link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2059  -0.9479  -0.8028   0.6589   1.1708  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                   -2.065e+00  8.543e-01  -2.418  0.01562 * 
## merged_popDens_km2            -9.956e-06  1.884e-05  -0.529  0.59711   
## n_cancers                     -1.258e-01  1.272e-01  -0.989  0.32253   
## townsend                       1.354e-02  1.310e-02   1.034  0.30121   
## smokingNever                   3.665e-01  1.379e-01   2.658  0.00785 **
## smokingPrefer not to answer    7.608e-01  4.478e-01   1.699  0.08929 . 
## smokingPrevious                4.056e-01  1.377e-01   2.946  0.00322 **
## whistlingNo                    2.671e-01  2.698e-01   0.990  0.32220   
## whistlingPrefer not to answer  4.568e-02  1.276e+00   0.036  0.97145   
## whistlingYes                   2.533e-01  2.742e-01   0.924  0.35559   
## diabetesNo                    -1.619e-01  6.106e-01  -0.265  0.79085   
## diabetesPrefer not to answer  -6.877e-01  1.433e+00  -0.480  0.63140   
## diabetesYes                   -1.956e-01  6.195e-01  -0.316  0.75215   
## whr                            7.856e-01  5.839e-01   1.346  0.17846   
## sexMale                        3.340e-02  1.000e-01   0.334  0.73838   
## age                           -5.681e-03  4.748e-03  -1.196  0.23153   
## pm25_val                       6.035e-02  2.866e-02   2.106  0.03520 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(15229.88) family taken to be 1)
## 
##     Null deviance: 1040.2  on 1449  degrees of freedom
## Residual deviance: 1015.6  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 2365.6
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  15230 
##           Std. Err.:  58208 
## Warning while fitting theta: iteration limit reached 
## 
##  2 x log-likelihood:  -2329.646
summary(ukb_covid_pm25.p <-glm(data = ukb_covid_pm25_popDens_df, result ~ merged_popDens_km2 + n_cancers +townsend + smoking + whistling + diabetes+
                                     whr + sex + age + pm25_val, family = 'poisson'))
## 
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + pm25_val, 
##     family = "poisson", data = ukb_covid_pm25_popDens_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2060  -0.9479  -0.8028   0.6589   1.1708  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                   -2.065e+00  8.543e-01  -2.418  0.01562 * 
## merged_popDens_km2            -9.956e-06  1.883e-05  -0.529  0.59711   
## n_cancers                     -1.258e-01  1.272e-01  -0.989  0.32252   
## townsend                       1.354e-02  1.310e-02   1.034  0.30120   
## smokingNever                   3.665e-01  1.379e-01   2.658  0.00785 **
## smokingPrefer not to answer    7.609e-01  4.478e-01   1.699  0.08928 . 
## smokingPrevious                4.056e-01  1.377e-01   2.946  0.00321 **
## whistlingNo                    2.671e-01  2.698e-01   0.990  0.32219   
## whistlingPrefer not to answer  4.567e-02  1.276e+00   0.036  0.97145   
## whistlingYes                   2.533e-01  2.742e-01   0.924  0.35558   
## diabetesNo                    -1.619e-01  6.105e-01  -0.265  0.79084   
## diabetesPrefer not to answer  -6.877e-01  1.433e+00  -0.480  0.63138   
## diabetesYes                   -1.956e-01  6.194e-01  -0.316  0.75214   
## whr                            7.856e-01  5.839e-01   1.346  0.17845   
## sexMale                        3.340e-02  1.000e-01   0.334  0.73837   
## age                           -5.681e-03  4.748e-03  -1.196  0.23151   
## pm25_val                       6.035e-02  2.866e-02   2.106  0.03520 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1040.2  on 1449  degrees of freedom
## Residual deviance: 1015.6  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 2363.6
## 
## Number of Fisher Scoring iterations: 5
# I don't think that pop density should be an offset here
#check for multicolinearity
car::vif(ukb_covid_pm25.nb)
##                        GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 2.443689  1        1.563230
## n_cancers          1.019337  1        1.009622
## townsend           1.442148  1        1.200895
## smoking            1.259363  3        1.039182
## whistling          1.875826  3        1.110535
## diabetes           2.049216  3        1.127019
## whr                1.850854  1        1.360461
## sex                1.619879  1        1.272744
## age                1.138711  1        1.067104
## pm25_val           2.011763  1        1.418366
# according to http://www.jstor.org/stable/2290467, GVIF^(1/(2*Df)) < 5 is ok 
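# A quick sanity check of that rule of thumb (a sketch, using the vif output above):
gvif_tab <- car::vif(ukb_covid_pm25.nb)
any(gvif_tab[, "GVIF^(1/(2*Df))"] > 5)  # FALSE: no term exceeds the threshold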
# model reduction: 

summary(ukb_covid_pm25.nb_red <-glm.nb(data = ukb_covid_pm25_popDens_df, result ~ merged_popDens_km2 + smoking +pm25_val))
## 
## Call:
## glm.nb(formula = result ~ merged_popDens_km2 + smoking + pm25_val, 
##     data = ukb_covid_pm25_popDens_df, init.theta = 15585.93096, 
##     link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1217  -0.9499  -0.8075   0.6755   1.0620  
## 
## Coefficients:
##                               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                 -1.639e+00  2.629e-01  -6.235 4.52e-10 ***
## merged_popDens_km2          -9.261e-07  1.680e-05  -0.055  0.95604    
## smokingNever                 3.369e-01  1.344e-01   2.507  0.01216 *  
## smokingPrefer not to answer  6.113e-01  4.261e-01   1.435  0.15139    
## smokingPrevious              3.899e-01  1.353e-01   2.882  0.00395 ** 
## pm25_val                     5.801e-02  2.833e-02   2.048  0.04055 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(15585.93) family taken to be 1)
## 
##     Null deviance: 1049.0  on 1461  degrees of freedom
## Residual deviance: 1032.7  on 1456  degrees of freedom
##   (2 observations deleted due to missingness)
## AIC: 2370.7
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  15586 
##           Std. Err.:  60471 
## Warning while fitting theta: iteration limit reached 
## 
##  2 x log-likelihood:  -2356.739
summary(ukb_covid_pm25.p_red <-glm(data = ukb_covid_pm25_popDens_df, result ~ merged_popDens_km2 + smoking + pm25_val, family = 'poisson'))
## 
## Call:
## glm(formula = result ~ merged_popDens_km2 + smoking + pm25_val, 
##     family = "poisson", data = ukb_covid_pm25_popDens_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1218  -0.9499  -0.8075   0.6756   1.0620  
## 
## Coefficients:
##                               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                 -1.639e+00  2.629e-01  -6.235 4.52e-10 ***
## merged_popDens_km2          -9.261e-07  1.680e-05  -0.055  0.95604    
## smokingNever                 3.369e-01  1.344e-01   2.508  0.01216 *  
## smokingPrefer not to answer  6.113e-01  4.261e-01   1.435  0.15138    
## smokingPrevious              3.899e-01  1.353e-01   2.882  0.00395 ** 
## pm25_val                     5.801e-02  2.832e-02   2.048  0.04055 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1049.0  on 1461  degrees of freedom
## Residual deviance: 1032.7  on 1456  degrees of freedom
##   (2 observations deleted due to missingness)
## AIC: 2368.7
## 
## Number of Fisher Scoring iterations: 5
#check 
anova(ukb_covid_pm25.nb, ukb_covid_pm25.nb_red) # p > 0.05 -> the full and reduced models are not significantly different
## Likelihood ratio tests of Negative Binomial Models
## 
## Response: result
##                                                                                                     Model
## 1                                                                 merged_popDens_km2 + smoking + pm25_val
## 2 merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes + whr + sex + age + pm25_val
##      theta Resid. df    2 x log-lik.   Test    df LR stat.   Pr(Chi)
## 1 15585.93      1456       -2356.739                                
## 2 15229.88      1433       -2329.646 1 vs 2    23  27.0924 0.2520556

Conclusion:
1. PM2.5 is a significant predictor of COVID-19 infection: there are more cases where PM2.5 levels are high.
2. Smoking interpretation: each coefficient is the expected difference in log counts relative to the reference group (current smokers), so exponentiating it gives a rate ratio (see the sketch after this list). People who never smoked have an infection rate about 1.442676 times that of current smokers; previous smokers have about 1.5001871 times the rate of current smokers.
3. In the actual analysis, I can explore using binary models.
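A minimal sketch of point 2, assuming the fitted model object ukb_covid_pm25.nb from above is still in memory:

# exponentiate the negative-binomial coefficients to obtain rate ratios
# relative to each factor's reference level (current smokers for smoking)
rate_ratios <- exp(coef(ukb_covid_pm25.nb))
rate_ratios["smokingNever"]     # ~1.44x the rate of current smokers
rate_ratios["smokingPrevious"]  # ~1.50x the rate of current smokers
rate_ratios["pm25_val"]         # rate ratio per unit increase in PM2.5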

Convert the remaining pollutant variables from easting/northing (X, Y) to longitude/latitude

### convert NO2 x y to lat lon ----
### shortcuts (from https://stephendavidgregory.github.io/useful/UKgrid_to_LatLon)
no2_raw_dt = na.omit(read.csv("data/processed_30_4_2020_NO2_map2018g.csv", na.strings = "MISSING")[c("x","y","no22018")])

ukgrid <- "+init=epsg:27700"
latlong <- "+init=epsg:4326"

### Create coordinates variable
no2_coords <- cbind(Easting = as.numeric(as.character(no2_raw_dt$x)),
                     Northing = as.numeric(as.character(no2_raw_dt$y)))

no2_LL <- spTransform(SpatialPointsDataFrame(no2_coords,
                                  data = no2_raw_dt,
                                  proj4string = CRS("+init=epsg:27700")), CRS(latlong))

no2_LL_df = data.frame('no2_lon' = coordinates(no2_LL)[, 1], 'no2_lat' = coordinates(no2_LL)[, 2], 'no2_val' = no2_LL$no22018)

write.csv(no2_LL_df,"data_output/processed_no2_lonlat.csv")

### convert SO2 x y to lat lon ----
so2_raw_dt = na.omit(read.csv("data/processed_30_4_2020_SO2_map2018g.csv", na.strings = "MISSING")[c("x","y","so22018")])

so2_coords <- cbind(Easting = as.numeric(as.character(so2_raw_dt$x)),
                    Northing = as.numeric(as.character(so2_raw_dt$y)))

so2_LL <- spTransform(SpatialPointsDataFrame(so2_coords,
                                             data = so2_raw_dt,
                                             proj4string = CRS("+init=epsg:27700")), CRS(latlong))

so2_LL_df = data.frame('so2_lon' = coordinates(so2_LL)[, 1], 'so2_lat' = coordinates(so2_LL)[, 2], 'so2_val' = so2_LL$so22018)

write.csv(so2_LL_df,"data_output/processed_so2_lonlat.csv")

### convert O3 x y to lat lon ----

o3_raw_dt = na.omit(read.csv("data/processed_30_4_2020_O3_map2018g.csv", na.strings = "MISSING")[c("x","y","dgt12018")])

o3_coords <- cbind(Easting = as.numeric(as.character(o3_raw_dt$x)),
                    Northing = as.numeric(as.character(o3_raw_dt$y)))

o3_LL <- spTransform(SpatialPointsDataFrame(o3_coords,
                                             data = o3_raw_dt,
                                             proj4string = CRS("+init=epsg:27700")), CRS(latlong))

o3_LL_df = data.frame('o3_lon' = coordinates(o3_LL)[, 1], 'o3_lat' = coordinates(o3_LL)[, 2], 'o3_val' = o3_LL$dgt12018)

write.csv(o3_LL_df,"data_output/processed_o3_lonlat.csv")

### convert PM10 x y to lat lon ----

pm10_raw_dt = na.omit(read.csv("data/processed_30_4_2020_PM10_mappm102018g.csv", na.strings = "MISSING")[c("x","y","pm102018g")])

pm10_coords <- cbind(Easting = as.numeric(as.character(pm10_raw_dt$x)),
                   Northing = as.numeric(as.character(pm10_raw_dt$y)))

pm10_LL <- spTransform(SpatialPointsDataFrame(pm10_coords,
                                            data = pm10_raw_dt,
                                            proj4string = CRS("+init=epsg:27700")), CRS(latlong))

pm10_LL_df = data.frame('pm10_lon' = coordinates(pm10_LL)[, 1], 'pm10_lat' = coordinates(pm10_LL)[, 2], 'pm10_val' = pm10_LL$pm102018g)

write.csv(pm10_LL_df,"data_output/processed_pm10_lonlat.csv")


### convert NOx x y to lat lon ----

nox_raw_dt = na.omit(read.csv("data/processed_30_4_2020_NOX_map2018g.csv", na.strings = "MISSING")[c("x","y","nox2018")])

nox_coords <- cbind(Easting = as.numeric(as.character(nox_raw_dt$x)),
                     Northing = as.numeric(as.character(nox_raw_dt$y)))

nox_LL <- spTransform(SpatialPointsDataFrame(nox_coords,
                                              data = nox_raw_dt,
                                              proj4string = CRS("+init=epsg:27700")), CRS(latlong))

nox_LL_df = data.frame('nox_lon' = coordinates(nox_LL)[, 1], 'nox_lat' = coordinates(nox_LL)[, 2], 'nox_val' = nox_LL$nox2018)

write.csv(nox_LL_df,"data_output/processed_nox_lonlat.csv")

Analysis

load and merge all data

no2_ukb = read.csv("data_output/merged_ukb_no2.csv")[c('eid','no2_val')]
so2_ukb = read.csv("data_output/merged_ukb_so2.csv")[c('eid','so2_val')]
o3_ukb = read.csv("data_output/merged_ukb_o3.csv")[c('eid','o3_val')]
nox_ukb = read.csv("data_output/merged_ukb_nox.csv")[c('eid','nox_val')]
pm10_ukb = read.csv("data_output/merged_ukb_pm10.csv")[c('eid','pm10_val')]

ukb_additional_pol = merge(no2_ukb, so2_ukb, by = 'eid')
ukb_additional_pol = merge(ukb_additional_pol, o3_ukb, by = 'eid')
ukb_additional_pol = merge(ukb_additional_pol, nox_ukb, by = 'eid')
ukb_additional_pol = merge(ukb_additional_pol, pm10_ukb, by = 'eid')

ukb_covid_allPol_df = merge(ukb_covid_pm25_popDens_df, ukb_additional_pol, by = 'eid')

Descriptive statistics of the UK Biobank data

library(arsenal)
ukb_descript_stats <- tableby(result ~ ., data = ukb_covid_allPol_df[,3:ncol(ukb_covid_allPol_df)])
summary(ukb_descript_stats, title = "Descriptive statistics of the UK Biobank data")
## 
## Table: Descriptive statistics of the UK Biobank data
## 
## |                                       |        0 (N=800)        |        1 (N=664)        |     Total (N=1464)      | p value|
## |:--------------------------------------|:-----------------------:|:-----------------------:|:-----------------------:|-------:|
## |**origin**                             |                         |                         |                         | < 0.001|
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |      0.672 (0.470)      |      0.858 (0.349)      |      0.757 (0.429)      |        |
## |&nbsp;&nbsp;&nbsp;Range                |      0.000 - 1.000      |      0.000 - 1.000      |      0.000 - 1.000      |        |
## |**n_cancers**                          |                         |                         |                         |   0.173|
## |&nbsp;&nbsp;&nbsp;N-Miss               |            0            |            1            |            1            |        |
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |      0.119 (0.350)      |      0.095 (0.309)      |      0.108 (0.332)      |        |
## |&nbsp;&nbsp;&nbsp;Range                |      0.000 - 2.000      |      0.000 - 2.000      |      0.000 - 2.000      |        |
## |**townsend**                           |                         |                         |                         |   0.042|
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |     -0.338 (3.506)      |      0.038 (3.551)      |     -0.167 (3.530)      |        |
## |&nbsp;&nbsp;&nbsp;Range                |     -6.156 - 8.873      |     -6.079 - 8.940      |     -6.156 - 8.940      |        |
## |**x_coord**                            |                         |                         |                         |   0.010|
## |&nbsp;&nbsp;&nbsp;Mean (SD)            | 428271.250 (56003.689)  | 436143.072 (60211.610)  | 431841.530 (58062.493)  |        |
## |&nbsp;&nbsp;&nbsp;Range                | 321000.000 - 542000.000 | 327000.000 - 540000.000 | 321000.000 - 542000.000 |        |
## |**y_coord**                            |                         |                         |                         |   0.108|
## |&nbsp;&nbsp;&nbsp;Mean (SD)            | 342893.750 (133805.914) | 331689.759 (131765.710) | 337812.158 (132956.294) |        |
## |&nbsp;&nbsp;&nbsp;Range                | 149000.000 - 580000.000 | 152000.000 - 580000.000 | 149000.000 - 580000.000 |        |
## |**smoking**                            |                         |                         |                         |   0.002|
## |&nbsp;&nbsp;&nbsp;N-Miss               |            0            |            2            |            2            |        |
## |&nbsp;&nbsp;&nbsp;Current              |       135 (16.9%)       |       68 (10.3%)        |       203 (13.9%)       |        |
## |&nbsp;&nbsp;&nbsp;Never                |       353 (44.1%)       |       302 (45.6%)       |       655 (44.8%)       |        |
## |&nbsp;&nbsp;&nbsp;Prefer not to answer |        4 (0.5%)         |        6 (0.9%)         |        10 (0.7%)        |        |
## |&nbsp;&nbsp;&nbsp;Previous             |       308 (38.5%)       |       286 (43.2%)       |       594 (40.6%)       |        |
## |**fev**                                |                         |                         |                         |   0.084|
## |&nbsp;&nbsp;&nbsp;N-Miss               |           559           |           484           |          1043           |        |
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |      2.899 (0.531)      |      2.992 (0.567)      |      2.939 (0.548)      |        |
## |&nbsp;&nbsp;&nbsp;Range                |      1.658 - 4.522      |      1.954 - 4.098      |      1.658 - 4.522      |        |
## |**copd**                               |                         |                         |                         |   0.055|
## |&nbsp;&nbsp;&nbsp;N-Miss               |           644           |           564           |          1208           |        |
## |&nbsp;&nbsp;&nbsp;No                   |       147 (94.2%)       |       99 (99.0%)        |       246 (96.1%)       |        |
## |&nbsp;&nbsp;&nbsp;Yes                  |        9 (5.8%)         |        1 (1.0%)         |        10 (3.9%)        |        |
## |**WP_dusty**                           |                         |                         |                         |   0.189|
## |&nbsp;&nbsp;&nbsp;N-Miss               |           645           |           566           |          1211           |        |
## |&nbsp;&nbsp;&nbsp;Often                |        7 (4.5%)         |       10 (10.2%)        |        17 (6.7%)        |        |
## |&nbsp;&nbsp;&nbsp;Rarely/never         |       119 (76.8%)       |       73 (74.5%)        |       192 (75.9%)       |        |
## |&nbsp;&nbsp;&nbsp;Sometimes            |       29 (18.7%)        |       15 (15.3%)        |       44 (17.4%)        |        |
## |**WP_chemicals**                       |                         |                         |                         |   0.182|
## |&nbsp;&nbsp;&nbsp;N-Miss               |           646           |           566           |          1212           |        |
## |&nbsp;&nbsp;&nbsp;Do not know          |        1 (0.6%)         |        2 (2.0%)         |        3 (1.2%)         |        |
## |&nbsp;&nbsp;&nbsp;Often                |        2 (1.3%)         |        5 (5.1%)         |        7 (2.8%)         |        |
## |&nbsp;&nbsp;&nbsp;Rarely/never         |       136 (88.3%)       |       79 (80.6%)        |       215 (85.3%)       |        |
## |&nbsp;&nbsp;&nbsp;Sometimes            |        15 (9.7%)        |       12 (12.2%)        |       27 (10.7%)        |        |
## |**WP_cig**                             |                         |                         |                         |   0.843|
## |&nbsp;&nbsp;&nbsp;N-Miss               |           645           |           566           |          1211           |        |
## |&nbsp;&nbsp;&nbsp;Do not know          |        1 (0.6%)         |        0 (0.0%)         |        1 (0.4%)         |        |
## |&nbsp;&nbsp;&nbsp;Often                |        8 (5.2%)         |        6 (6.1%)         |        14 (5.5%)        |        |
## |&nbsp;&nbsp;&nbsp;Rarely/never         |       115 (74.2%)       |       74 (75.5%)        |       189 (74.7%)       |        |
## |&nbsp;&nbsp;&nbsp;Sometimes            |       31 (20.0%)        |       18 (18.4%)        |       49 (19.4%)        |        |
## |**WP_diesel**                          |                         |                         |                         |   0.383|
## |&nbsp;&nbsp;&nbsp;N-Miss               |           645           |           566           |          1211           |        |
## |&nbsp;&nbsp;&nbsp;Do not know          |        2 (1.3%)         |        3 (3.1%)         |        5 (2.0%)         |        |
## |&nbsp;&nbsp;&nbsp;Often                |        2 (1.3%)         |        4 (4.1%)         |        6 (2.4%)         |        |
## |&nbsp;&nbsp;&nbsp;Rarely/never         |       141 (91.0%)       |       85 (86.7%)        |       226 (89.3%)       |        |
## |&nbsp;&nbsp;&nbsp;Sometimes            |        10 (6.5%)        |        6 (6.1%)         |        16 (6.3%)        |        |
## |**breathing**                          |                         |                         |                         |   0.256|
## |&nbsp;&nbsp;&nbsp;N-Miss               |           645           |           566           |          1211           |        |
## |&nbsp;&nbsp;&nbsp;No                   |       144 (92.9%)       |       87 (88.8%)        |       231 (91.3%)       |        |
## |&nbsp;&nbsp;&nbsp;Yes                  |        11 (7.1%)        |       11 (11.2%)        |        22 (8.7%)        |        |
## |**whistling**                          |                         |                         |                         |   0.834|
## |&nbsp;&nbsp;&nbsp;N-Miss               |            0            |            2            |            2            |        |
## |&nbsp;&nbsp;&nbsp;Do not know          |        25 (3.1%)        |        16 (2.4%)        |        41 (2.8%)        |        |
## |&nbsp;&nbsp;&nbsp;No                   |       527 (65.9%)       |       442 (66.8%)       |       969 (66.3%)       |        |
## |&nbsp;&nbsp;&nbsp;Prefer not to answer |        2 (0.2%)         |        1 (0.2%)         |        3 (0.2%)         |        |
## |&nbsp;&nbsp;&nbsp;Yes                  |       246 (30.8%)       |       203 (30.7%)       |       449 (30.7%)       |        |
## |**diabetes**                           |                         |                         |                         |   0.816|
## |&nbsp;&nbsp;&nbsp;N-Miss               |            0            |            2            |            2            |        |
## |&nbsp;&nbsp;&nbsp;Do not know          |        3 (0.4%)         |        3 (0.5%)         |        6 (0.4%)         |        |
## |&nbsp;&nbsp;&nbsp;No                   |       721 (90.1%)       |       588 (88.8%)       |      1309 (89.5%)       |        |
## |&nbsp;&nbsp;&nbsp;Prefer not to answer |        2 (0.2%)         |        1 (0.2%)         |        3 (0.2%)         |        |
## |&nbsp;&nbsp;&nbsp;Yes                  |        74 (9.2%)        |       70 (10.6%)        |       144 (9.8%)        |        |
## |**sex**                                |                         |                         |                         |   0.043|
## |&nbsp;&nbsp;&nbsp;Female               |       393 (49.1%)       |       291 (43.8%)       |       684 (46.7%)       |        |
## |&nbsp;&nbsp;&nbsp;Male                 |       407 (50.9%)       |       373 (56.2%)       |       780 (53.3%)       |        |
## |**birthYear**                          |                         |                         |                         |   0.396|
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |    1950.341 (8.665)     |    1950.729 (8.737)     |    1950.517 (8.697)     |        |
## |&nbsp;&nbsp;&nbsp;Range                |   1937.000 - 1970.000   |   1937.000 - 1969.000   |   1937.000 - 1970.000   |        |
## |**diaBP**                              |                         |                         |                         |   0.045|
## |&nbsp;&nbsp;&nbsp;N-Miss               |           16            |           21            |           37            |        |
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |     81.922 (10.923)     |     83.075 (10.603)     |     82.441 (10.791)     |        |
## |&nbsp;&nbsp;&nbsp;Range                |    46.000 - 125.000     |    57.000 - 120.000     |    46.000 - 125.000     |        |
## |**sysBP**                              |                         |                         |                         |   0.361|
## |&nbsp;&nbsp;&nbsp;N-Miss               |           17            |           21            |           38            |        |
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |    136.498 (19.270)     |    137.440 (19.499)     |    136.923 (19.372)     |        |
## |&nbsp;&nbsp;&nbsp;Range                |    92.000 - 219.000     |    91.000 - 229.000     |    91.000 - 229.000     |        |
## |**waist**                              |                         |                         |                         |   0.010|
## |&nbsp;&nbsp;&nbsp;N-Miss               |            7            |            5            |           12            |        |
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |     94.368 (15.035)     |     96.393 (14.517)     |     95.287 (14.831)     |        |
## |&nbsp;&nbsp;&nbsp;Range                |    61.000 - 151.000     |    62.000 - 166.000     |    61.000 - 166.000     |        |
## |**hip**                                |                         |                         |                         |   0.142|
## |&nbsp;&nbsp;&nbsp;N-Miss               |            7            |            5            |           12            |        |
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |    104.880 (11.008)     |    105.732 (11.006)     |    105.267 (11.012)     |        |
## |&nbsp;&nbsp;&nbsp;Range                |    83.000 - 158.000     |    82.000 - 172.000     |    82.000 - 172.000     |        |
## |**height**                             |                         |                         |                         |   0.995|
## |&nbsp;&nbsp;&nbsp;N-Miss               |            8            |            7            |           15            |        |
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |     168.848 (9.503)     |     168.851 (9.247)     |     168.849 (9.385)     |        |
## |&nbsp;&nbsp;&nbsp;Range                |    143.000 - 197.000    |    141.000 - 196.000    |    141.000 - 197.000    |        |
## |**age**                                |                         |                         |                         |   0.396|
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |     69.659 (8.665)      |     69.271 (8.737)      |     69.483 (8.697)      |        |
## |&nbsp;&nbsp;&nbsp;Range                |     50.000 - 83.000     |     51.000 - 83.000     |     50.000 - 83.000     |        |
## |**whr**                                |                         |                         |                         |   0.010|
## |&nbsp;&nbsp;&nbsp;N-Miss               |            7            |            5            |           12            |        |
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |      0.898 (0.092)      |      0.910 (0.090)      |      0.904 (0.091)      |        |
## |&nbsp;&nbsp;&nbsp;Range                |      0.642 - 1.211      |      0.660 - 1.194      |      0.642 - 1.211      |        |
## |**highBP**                             |                         |                         |                         |   0.199|
## |&nbsp;&nbsp;&nbsp;N-Miss               |           16            |           21            |           37            |        |
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |      0.189 (0.392)      |      0.216 (0.412)      |      0.201 (0.401)      |        |
## |&nbsp;&nbsp;&nbsp;Range                |      0.000 - 1.000      |      0.000 - 1.000      |      0.000 - 1.000      |        |
## |**inpatient_covid**                    |                         |                         |                         | < 0.001|
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |      0.672 (0.470)      |      0.858 (0.349)      |      0.757 (0.429)      |        |
## |&nbsp;&nbsp;&nbsp;Range                |      0.000 - 1.000      |      0.000 - 1.000      |      0.000 - 1.000      |        |
## |**distance**                           |                         |                         |                         |   0.070|
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |    963.123 (21.328)     |    961.081 (21.525)     |    962.197 (21.434)     |        |
## |&nbsp;&nbsp;&nbsp;Range                |   930.406 - 1001.337    |   930.476 - 1001.203    |   930.406 - 1001.337    |        |
## |**pm25_val**                           |                         |                         |                         | < 0.001|
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |      8.859 (1.853)      |      9.218 (1.924)      |      9.022 (1.893)      |        |
## |&nbsp;&nbsp;&nbsp;Range                |     5.630 - 13.650      |     5.769 - 13.667      |     5.630 - 13.667      |        |
## |**merged_popDens_km2**                 |                         |                         |                         |   0.008|
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |   2770.719 (2909.404)   |   3197.601 (3228.642)   |   2964.332 (3064.644)   |        |
## |&nbsp;&nbsp;&nbsp;Range                |   64.000 - 16097.000    |   64.000 - 16097.000    |   64.000 - 16097.000    |        |
## |**smoker**                             |                         |                         |                         | < 0.001|
## |&nbsp;&nbsp;&nbsp;N-Miss               |            0            |            2            |            2            |        |
## |&nbsp;&nbsp;&nbsp;not_smoking          |       665 (83.1%)       |       594 (89.7%)       |      1259 (86.1%)       |        |
## |&nbsp;&nbsp;&nbsp;smoking              |       135 (16.9%)       |       68 (10.3%)        |       203 (13.9%)       |        |
## |**no2_val**                            |                         |                         |                         | < 0.001|
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |     15.485 (6.074)      |     16.789 (6.469)      |     16.076 (6.288)      |        |
## |&nbsp;&nbsp;&nbsp;Range                |     4.987 - 48.672      |     5.632 - 48.672      |     4.987 - 48.672      |        |
## |**so2_val**                            |                         |                         |                         |   0.027|
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |      1.668 (0.546)      |      1.731 (0.544)      |      1.697 (0.545)      |        |
## |&nbsp;&nbsp;&nbsp;Range                |      0.661 - 5.108      |      0.700 - 4.578      |      0.661 - 5.108      |        |
## |**o3_val**                             |                         |                         |                         |   0.753|
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |      5.621 (2.653)      |      5.576 (2.668)      |      5.601 (2.659)      |        |
## |&nbsp;&nbsp;&nbsp;Range                |     0.000 - 13.188      |     0.000 - 12.731      |     0.000 - 13.188      |        |
## |**nox_val**                            |                         |                         |                         | < 0.001|
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |     21.955 (10.770)     |     24.163 (11.841)     |     22.956 (11.318)     |        |
## |&nbsp;&nbsp;&nbsp;Range                |     6.254 - 95.880      |     7.116 - 95.880      |     6.254 - 95.880      |        |
## |**pm10_val**                           |                         |                         |                         | < 0.001|
## |&nbsp;&nbsp;&nbsp;Mean (SD)            |     13.430 (2.683)      |     13.925 (2.816)      |     13.655 (2.755)      |        |
## |&nbsp;&nbsp;&nbsp;Range                |     8.244 - 21.178      |     8.735 - 21.207      |     8.244 - 21.207      |        |
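The table above summarises all covariates by the result groups (0 and 1) that serve as the response variable in the models below. If the table needs to be saved alongside the other outputs, arsenal's write2html helper can be used; the snippet below is a sketch and the file path is an assumption, not part of the original pipeline.

# Export the descriptive table to an HTML file (sketch; path is hypothetical)
write2html(ukb_descript_stats, "data/ukb_descriptive_statistics.html")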

Model with all pollutants: GLM (negative binomial and Poisson)

# PM2.5
summary(ukb_covid_pm25.nb <- glm.nb(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
                                      whr + sex + age + pm25_val))
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## 
## Call:
## glm.nb(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + pm25_val, 
##     data = ukb_covid_allPol_df, init.theta = 15229.88441, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2059  -0.9479  -0.8028   0.6589   1.1708  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                   -2.065e+00  8.543e-01  -2.418  0.01562 * 
## merged_popDens_km2            -9.956e-06  1.884e-05  -0.529  0.59711   
## n_cancers                     -1.258e-01  1.272e-01  -0.989  0.32253   
## townsend                       1.354e-02  1.310e-02   1.034  0.30121   
## smokingNever                   3.665e-01  1.379e-01   2.658  0.00785 **
## smokingPrefer not to answer    7.608e-01  4.478e-01   1.699  0.08929 . 
## smokingPrevious                4.056e-01  1.377e-01   2.946  0.00322 **
## whistlingNo                    2.671e-01  2.698e-01   0.990  0.32220   
## whistlingPrefer not to answer  4.568e-02  1.276e+00   0.036  0.97145   
## whistlingYes                   2.533e-01  2.742e-01   0.924  0.35559   
## diabetesNo                    -1.619e-01  6.106e-01  -0.265  0.79085   
## diabetesPrefer not to answer  -6.877e-01  1.433e+00  -0.480  0.63140   
## diabetesYes                   -1.956e-01  6.195e-01  -0.316  0.75215   
## whr                            7.856e-01  5.839e-01   1.346  0.17846   
## sexMale                        3.340e-02  1.000e-01   0.334  0.73838   
## age                           -5.681e-03  4.748e-03  -1.196  0.23153   
## pm25_val                       6.035e-02  2.866e-02   2.106  0.03520 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(15229.88) family taken to be 1)
## 
##     Null deviance: 1040.2  on 1449  degrees of freedom
## Residual deviance: 1015.6  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 2365.6
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  15230 
##           Std. Err.:  58208 
## Warning while fitting theta: iteration limit reached 
## 
##  2 x log-likelihood:  -2329.646
summary(ukb_covid_pm25.p <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
                                  whr + sex + age + pm25_val, family = 'poisson'))
## 
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + pm25_val, 
##     family = "poisson", data = ukb_covid_allPol_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2060  -0.9479  -0.8028   0.6589   1.1708  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                   -2.065e+00  8.543e-01  -2.418  0.01562 * 
## merged_popDens_km2            -9.956e-06  1.883e-05  -0.529  0.59711   
## n_cancers                     -1.258e-01  1.272e-01  -0.989  0.32252   
## townsend                       1.354e-02  1.310e-02   1.034  0.30120   
## smokingNever                   3.665e-01  1.379e-01   2.658  0.00785 **
## smokingPrefer not to answer    7.609e-01  4.478e-01   1.699  0.08928 . 
## smokingPrevious                4.056e-01  1.377e-01   2.946  0.00321 **
## whistlingNo                    2.671e-01  2.698e-01   0.990  0.32219   
## whistlingPrefer not to answer  4.567e-02  1.276e+00   0.036  0.97145   
## whistlingYes                   2.533e-01  2.742e-01   0.924  0.35558   
## diabetesNo                    -1.619e-01  6.105e-01  -0.265  0.79084   
## diabetesPrefer not to answer  -6.877e-01  1.433e+00  -0.480  0.63138   
## diabetesYes                   -1.956e-01  6.194e-01  -0.316  0.75214   
## whr                            7.856e-01  5.839e-01   1.346  0.17845   
## sexMale                        3.340e-02  1.000e-01   0.334  0.73837   
## age                           -5.681e-03  4.748e-03  -1.196  0.23151   
## pm25_val                       6.035e-02  2.866e-02   2.106  0.03520 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1040.2  on 1449  degrees of freedom
## Residual deviance: 1015.6  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 2363.6
## 
## Number of Fisher Scoring iterations: 5
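The negative binomial fit above converges to a very large theta (with an iteration-limit warning), which already suggests negligible overdispersion relative to the Poisson model. As an additional check, AER (loaded earlier in this pipeline) provides dispersiontest for Poisson GLMs; this is a sketch and is not part of the original output.

# Test the equidispersion assumption of the Poisson fit (H0: no overdispersion).
# A non-significant p-value supports keeping the simpler Poisson model.
dispersiontest(ukb_covid_pm25.p)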
car::vif(ukb_covid_pm25.nb)
##                        GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 2.443689  1        1.563230
## n_cancers          1.019337  1        1.009622
## townsend           1.442148  1        1.200895
## smoking            1.259363  3        1.039182
## whistling          1.875826  3        1.110535
## diabetes           2.049216  3        1.127019
## whr                1.850854  1        1.360461
## sex                1.619879  1        1.272744
## age                1.138711  1        1.067104
## pm25_val           2.011763  1        1.418366
# According to http://www.jstor.org/stable/2290467, GVIF^(1/(2*Df)) < 5 indicates acceptable collinearity; the helper sketched below flags any violations.
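# A small helper (a sketch, not part of the original pipeline) that returns any
# predictor whose adjusted GVIF meets or exceeds the threshold used above:
check_vif <- function(model, threshold = 5) {
  v <- car::vif(model)
  # vif() returns a matrix when the model contains factors, a plain vector otherwise;
  # sqrt(VIF) is the comparable GVIF^(1/(2*Df)) value for 1-df terms.
  adj <- if (is.matrix(v)) v[, "GVIF^(1/(2*Df))"] else sqrt(v)
  adj[adj >= threshold]  # empty named vector when collinearity is acceptable
}
check_vif(ukb_covid_pm25.nb)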
# model reduction: 

summary(ukb_covid_pm25.nb_red <- glm.nb(data = ukb_covid_pm25_popDens_df, result ~ merged_popDens_km2 + smoking + pm25_val))
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## 
## Call:
## glm.nb(formula = result ~ merged_popDens_km2 + smoking + pm25_val, 
##     data = ukb_covid_pm25_popDens_df, init.theta = 15585.93096, 
##     link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1217  -0.9499  -0.8075   0.6755   1.0620  
## 
## Coefficients:
##                               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                 -1.639e+00  2.629e-01  -6.235 4.52e-10 ***
## merged_popDens_km2          -9.261e-07  1.680e-05  -0.055  0.95604    
## smokingNever                 3.369e-01  1.344e-01   2.507  0.01216 *  
## smokingPrefer not to answer  6.113e-01  4.261e-01   1.435  0.15139    
## smokingPrevious              3.899e-01  1.353e-01   2.882  0.00395 ** 
## pm25_val                     5.801e-02  2.833e-02   2.048  0.04055 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(15585.93) family taken to be 1)
## 
##     Null deviance: 1049.0  on 1461  degrees of freedom
## Residual deviance: 1032.7  on 1456  degrees of freedom
##   (2 observations deleted due to missingness)
## AIC: 2370.7
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  15586 
##           Std. Err.:  60471 
## Warning while fitting theta: iteration limit reached 
## 
##  2 x log-likelihood:  -2356.739
summary(ukb_covid_pm25.p_red <- glm(data = ukb_covid_pm25_popDens_df, result ~ merged_popDens_km2 + smoking + pm25_val, family = 'poisson'))
## 
## Call:
## glm(formula = result ~ merged_popDens_km2 + smoking + pm25_val, 
##     family = "poisson", data = ukb_covid_pm25_popDens_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1218  -0.9499  -0.8075   0.6756   1.0620  
## 
## Coefficients:
##                               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                 -1.639e+00  2.629e-01  -6.235 4.52e-10 ***
## merged_popDens_km2          -9.261e-07  1.680e-05  -0.055  0.95604    
## smokingNever                 3.369e-01  1.344e-01   2.508  0.01216 *  
## smokingPrefer not to answer  6.113e-01  4.261e-01   1.435  0.15138    
## smokingPrevious              3.899e-01  1.353e-01   2.882  0.00395 ** 
## pm25_val                     5.801e-02  2.832e-02   2.048  0.04055 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1049.0  on 1461  degrees of freedom
## Residual deviance: 1032.7  on 1456  degrees of freedom
##   (2 observations deleted due to missingness)
## AIC: 2368.7
## 
## Number of Fisher Scoring iterations: 5
# Check whether the reduced model fits as well as the full model
anova(ukb_covid_pm25.nb, ukb_covid_pm25.nb_red) # p > 0.05 -> the models are not significantly different, so the reduced model is adequate
## Likelihood ratio tests of Negative Binomial Models
## 
## Response: result
##                                                                                                     Model
## 1                                                                 merged_popDens_km2 + smoking + pm25_val
## 2 merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes + whr + sex + age + pm25_val
##      theta Resid. df    2 x log-lik.   Test    df LR stat.   Pr(Chi)
## 1 15585.93      1456       -2356.739                                
## 2 15229.88      1433       -2329.646 1 vs 2    23  27.0924 0.2520556
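The likelihood-ratio test above gives p ≈ 0.25, so dropping the non-significant covariates does not significantly worsen the fit. Because theta is estimated to be extremely large (hence the iteration-limit warning), the negative binomial fits are effectively equivalent to their Poisson counterparts; a quick AIC comparison makes this explicit. This snippet is a sketch and not part of the original output.

# Within each pair the models are fitted to the same rows, so the AICs are comparable.
AIC(ukb_covid_pm25.nb, ukb_covid_pm25.p)         # full models
AIC(ukb_covid_pm25.nb_red, ukb_covid_pm25.p_red) # reduced models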
# PM10
summary(ukb_covid_pm10.nb <- glm.nb(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
                                      whr + sex + age + pm10_val))
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## 
## Call:
## glm.nb(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + pm10_val, 
##     data = ukb_covid_allPol_df, init.theta = 15278.17332, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2226  -0.9484  -0.8030   0.6608   1.1713  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                   -2.035e+00  8.581e-01  -2.371  0.01774 * 
## merged_popDens_km2            -9.198e-06  1.962e-05  -0.469  0.63913   
## n_cancers                     -1.255e-01  1.272e-01  -0.987  0.32375   
## townsend                       1.338e-02  1.310e-02   1.021  0.30706   
## smokingNever                   3.635e-01  1.378e-01   2.637  0.00836 **
## smokingPrefer not to answer    7.550e-01  4.479e-01   1.685  0.09189 . 
## smokingPrevious                4.008e-01  1.376e-01   2.913  0.00358 **
## whistlingNo                    2.608e-01  2.697e-01   0.967  0.33356   
## whistlingPrefer not to answer  4.399e-02  1.272e+00   0.035  0.97242   
## whistlingYes                   2.481e-01  2.741e-01   0.905  0.36539   
## diabetesNo                    -1.706e-01  6.104e-01  -0.279  0.77992   
## diabetesPrefer not to answer  -6.936e-01  1.430e+00  -0.485  0.62768   
## diabetesYes                   -2.053e-01  6.193e-01  -0.332  0.74021   
## whr                            7.986e-01  5.841e-01   1.367  0.17153   
## sexMale                        3.228e-02  1.000e-01   0.323  0.74690   
## age                           -5.740e-03  4.748e-03  -1.209  0.22678   
## pm10_val                       3.829e-02  2.056e-02   1.862  0.06254 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(15278.17) family taken to be 1)
## 
##     Null deviance: 1040.2  on 1449  degrees of freedom
## Residual deviance: 1016.5  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 2366.6
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  15278 
##           Std. Err.:  58535 
## Warning while fitting theta: iteration limit reached 
## 
##  2 x log-likelihood:  -2330.591
summary(ukb_covid_pm10.p <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
                                  whr + sex + age + pm10_val, family = 'poisson'))
## 
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + pm10_val, 
##     family = "poisson", data = ukb_covid_allPol_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2226  -0.9484  -0.8030   0.6608   1.1713  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                   -2.035e+00  8.581e-01  -2.371  0.01774 * 
## merged_popDens_km2            -9.198e-06  1.962e-05  -0.469  0.63912   
## n_cancers                     -1.255e-01  1.272e-01  -0.987  0.32374   
## townsend                       1.338e-02  1.310e-02   1.021  0.30705   
## smokingNever                   3.635e-01  1.378e-01   2.637  0.00836 **
## smokingPrefer not to answer    7.550e-01  4.479e-01   1.686  0.09188 . 
## smokingPrevious                4.008e-01  1.376e-01   2.913  0.00358 **
## whistlingNo                    2.608e-01  2.697e-01   0.967  0.33355   
## whistlingPrefer not to answer  4.399e-02  1.272e+00   0.035  0.97242   
## whistlingYes                   2.481e-01  2.741e-01   0.905  0.36538   
## diabetesNo                    -1.706e-01  6.104e-01  -0.279  0.77992   
## diabetesPrefer not to answer  -6.936e-01  1.430e+00  -0.485  0.62766   
## diabetesYes                   -2.053e-01  6.193e-01  -0.332  0.74020   
## whr                            7.986e-01  5.841e-01   1.367  0.17152   
## sexMale                        3.228e-02  1.000e-01   0.323  0.74690   
## age                           -5.740e-03  4.748e-03  -1.209  0.22676   
## pm10_val                       3.829e-02  2.056e-02   1.863  0.06253 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1040.2  on 1449  degrees of freedom
## Residual deviance: 1016.6  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 2364.6
## 
## Number of Fisher Scoring iterations: 5
car::vif(ukb_covid_pm10.nb)
##                        GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 2.690058  1        1.640140
## n_cancers          1.019347  1        1.009627
## townsend           1.448761  1        1.203645
## smoking            1.258582  3        1.039075
## whistling          1.863927  3        1.109357
## diabetes           2.037411  3        1.125934
## whr                1.852457  1        1.361050
## sex                1.620823  1        1.273115
## age                1.139335  1        1.067396
## pm10_val           2.234526  1        1.494833
# NOx
summary(ukb_covid_nox.nb <- glm.nb(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
                                     whr + sex + age + nox_val))
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## 
## Call:
## glm.nb(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + nox_val, 
##     data = ukb_covid_allPol_df, init.theta = 15323.76936, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3845  -0.9475  -0.8039   0.6603   1.1833  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                   -1.764e+00  8.293e-01  -2.127  0.03339 * 
## merged_popDens_km2            -1.289e-05  2.130e-05  -0.605  0.54515   
## n_cancers                     -1.220e-01  1.271e-01  -0.960  0.33727   
## townsend                       6.366e-03  1.337e-02   0.476  0.63384   
## smokingNever                   3.541e-01  1.377e-01   2.571  0.01015 * 
## smokingPrefer not to answer    7.226e-01  4.491e-01   1.609  0.10759   
## smokingPrevious                3.917e-01  1.374e-01   2.851  0.00436 **
## whistlingNo                    2.591e-01  2.697e-01   0.961  0.33670   
## whistlingPrefer not to answer  2.364e-03  1.245e+00   0.002  0.99849   
## whistlingYes                   2.438e-01  2.741e-01   0.890  0.37362   
## diabetesNo                    -1.971e-01  6.103e-01  -0.323  0.74668   
## diabetesPrefer not to answer  -7.274e-01  1.408e+00  -0.516  0.60552   
## diabetesYes                   -2.279e-01  6.191e-01  -0.368  0.71277   
## whr                            8.545e-01  5.847e-01   1.462  0.14387   
## sexMale                        1.812e-02  1.004e-01   0.180  0.85680   
## age                           -5.503e-03  4.747e-03  -1.159  0.24641   
## nox_val                        1.043e-02  5.565e-03   1.874  0.06097 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(15323.77) family taken to be 1)
## 
##     Null deviance: 1040.2  on 1449  degrees of freedom
## Residual deviance: 1016.6  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 2366.6
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  15324 
##           Std. Err.:  58831 
## Warning while fitting theta: iteration limit reached 
## 
##  2 x log-likelihood:  -2330.647
summary(ukb_covid_nox.p <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
                                 whr + sex + age + nox_val, family = 'poisson'))
## 
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + nox_val, 
##     family = "poisson", data = ukb_covid_allPol_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3845  -0.9476  -0.8039   0.6603   1.1833  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                   -1.764e+00  8.293e-01  -2.127  0.03339 * 
## merged_popDens_km2            -1.288e-05  2.130e-05  -0.605  0.54515   
## n_cancers                     -1.220e-01  1.271e-01  -0.960  0.33726   
## townsend                       6.366e-03  1.337e-02   0.476  0.63383   
## smokingNever                   3.541e-01  1.377e-01   2.571  0.01015 * 
## smokingPrefer not to answer    7.226e-01  4.491e-01   1.609  0.10758   
## smokingPrevious                3.917e-01  1.374e-01   2.851  0.00436 **
## whistlingNo                    2.591e-01  2.697e-01   0.961  0.33668   
## whistlingPrefer not to answer  2.361e-03  1.245e+00   0.002  0.99849   
## whistlingYes                   2.438e-01  2.741e-01   0.890  0.37360   
## diabetesNo                    -1.971e-01  6.103e-01  -0.323  0.74667   
## diabetesPrefer not to answer  -7.274e-01  1.408e+00  -0.517  0.60549   
## diabetesYes                   -2.279e-01  6.190e-01  -0.368  0.71277   
## whr                            8.545e-01  5.847e-01   1.462  0.14387   
## sexMale                        1.812e-02  1.004e-01   0.180  0.85679   
## age                           -5.503e-03  4.747e-03  -1.159  0.24640   
## nox_val                        1.043e-02  5.565e-03   1.874  0.06097 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1040.2  on 1449  degrees of freedom
## Residual deviance: 1016.6  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 2364.6
## 
## Number of Fisher Scoring iterations: 5
car::vif(ukb_covid_nox.nb)
##                        GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 3.190492  1        1.786195
## n_cancers          1.020066  1        1.009983
## townsend           1.522514  1        1.233902
## smoking            1.265027  3        1.039960
## whistling          1.781014  3        1.100976
## diabetes           1.961134  3        1.118797
## whr                1.848865  1        1.359730
## sex                1.633899  1        1.278241
## age                1.141394  1        1.068361
## nox_val            3.273151  1        1.809185
# NO2
summary(ukb_covid_no2.nb <- glm.nb(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
                                     whr + sex + age + no2_val))
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## 
## Call:
## glm.nb(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + no2_val, 
##     data = ukb_covid_allPol_df, init.theta = 15256.03065, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3433  -0.9455  -0.8015   0.6594   1.1994  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                   -1.867e+00  8.335e-01  -2.241  0.02506 * 
## merged_popDens_km2            -1.812e-05  2.128e-05  -0.852  0.39434   
## n_cancers                     -1.211e-01  1.271e-01  -0.953  0.34075   
## townsend                       5.084e-03  1.340e-02   0.379  0.70433   
## smokingNever                   3.554e-01  1.377e-01   2.580  0.00987 **
## smokingPrefer not to answer    7.209e-01  4.493e-01   1.605  0.10860   
## smokingPrevious                3.954e-01  1.375e-01   2.876  0.00402 **
## whistlingNo                    2.585e-01  2.700e-01   0.957  0.33843   
## whistlingPrefer not to answer -1.309e-02  1.238e+00  -0.011  0.99156   
## whistlingYes                   2.449e-01  2.744e-01   0.892  0.37217   
## diabetesNo                    -1.911e-01  6.113e-01  -0.313  0.75451   
## diabetesPrefer not to answer  -7.362e-01  1.403e+00  -0.525  0.59967   
## diabetesYes                   -2.218e-01  6.200e-01  -0.358  0.72050   
## whr                            8.312e-01  5.848e-01   1.421  0.15525   
## sexMale                        1.920e-02  1.003e-01   0.191  0.84815   
## age                           -5.471e-03  4.748e-03  -1.152  0.24919   
## no2_val                        2.283e-02  1.039e-02   2.197  0.02804 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(15256.03) family taken to be 1)
## 
##     Null deviance: 1040.2  on 1449  degrees of freedom
## Residual deviance: 1015.3  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 2365.3
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  15256 
##           Std. Err.:  58376 
## Warning while fitting theta: iteration limit reached 
## 
##  2 x log-likelihood:  -2329.3
summary(ukb_covid_no2.p <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
                                 whr + sex + age + no2_val, family = 'poisson'))
## 
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + no2_val, 
##     family = "poisson", data = ukb_covid_allPol_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3434  -0.9455  -0.8015   0.6594   1.1994  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                   -1.867e+00  8.335e-01  -2.241  0.02505 * 
## merged_popDens_km2            -1.812e-05  2.127e-05  -0.852  0.39433   
## n_cancers                     -1.211e-01  1.271e-01  -0.953  0.34074   
## townsend                       5.084e-03  1.340e-02   0.379  0.70432   
## smokingNever                   3.554e-01  1.377e-01   2.580  0.00987 **
## smokingPrefer not to answer    7.209e-01  4.493e-01   1.605  0.10859   
## smokingPrevious                3.954e-01  1.375e-01   2.877  0.00402 **
## whistlingNo                    2.585e-01  2.700e-01   0.957  0.33841   
## whistlingPrefer not to answer -1.310e-02  1.238e+00  -0.011  0.99156   
## whistlingYes                   2.449e-01  2.744e-01   0.892  0.37215   
## diabetesNo                    -1.911e-01  6.113e-01  -0.313  0.75450   
## diabetesPrefer not to answer  -7.362e-01  1.402e+00  -0.525  0.59963   
## diabetesYes                   -2.218e-01  6.200e-01  -0.358  0.72049   
## whr                            8.312e-01  5.848e-01   1.421  0.15524   
## sexMale                        1.920e-02  1.003e-01   0.191  0.84815   
## age                           -5.471e-03  4.747e-03  -1.152  0.24918   
## no2_val                        2.283e-02  1.039e-02   2.197  0.02804 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1040.2  on 1449  degrees of freedom
## Residual deviance: 1015.3  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 2363.3
## 
## Number of Fisher Scoring iterations: 5
car::vif(ukb_covid_no2.nb)
##                        GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 3.144160  1        1.773178
## n_cancers          1.020039  1        1.009970
## townsend           1.523933  1        1.234477
## smoking            1.267137  3        1.040249
## whistling          1.763147  3        1.099128
## diabetes           1.944539  3        1.117213
## whr                1.847941  1        1.359390
## sex                1.628714  1        1.276211
## age                1.141009  1        1.068180
## no2_val            3.255183  1        1.804212
# O3
summary(ukb_covid_o3.nb <- glm.nb(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
                                    whr + sex + age + o3_val))
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## 
## Call:
## glm.nb(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + o3_val, 
##     data = ukb_covid_allPol_df, init.theta = 15368.98351, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2076  -0.9503  -0.8062   0.6636   1.1391  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                   -1.630e+00  8.293e-01  -1.966  0.04931 * 
## merged_popDens_km2             1.614e-05  1.399e-05   1.154  0.24867   
## n_cancers                     -1.289e-01  1.274e-01  -1.012  0.31151   
## townsend                       1.333e-02  1.346e-02   0.990  0.32206   
## smokingNever                   3.585e-01  1.381e-01   2.596  0.00942 **
## smokingPrefer not to answer    7.494e-01  4.493e-01   1.668  0.09528 . 
## smokingPrevious                3.905e-01  1.377e-01   2.837  0.00455 **
## whistlingNo                    2.435e-01  2.687e-01   0.906  0.36477   
## whistlingPrefer not to answer  2.093e-02  1.278e+00   0.016  0.98693   
## whistlingYes                   2.276e-01  2.730e-01   0.834  0.40454   
## diabetesNo                    -1.858e-01  6.083e-01  -0.305  0.76009   
## diabetesPrefer not to answer  -6.932e-01  1.435e+00  -0.483  0.62909   
## diabetesYes                   -2.278e-01  6.172e-01  -0.369  0.71205   
## whr                            8.310e-01  5.842e-01   1.422  0.15489   
## sexMale                        3.618e-02  1.002e-01   0.361  0.71815   
## age                           -5.563e-03  4.749e-03  -1.171  0.24143   
## o3_val                         7.573e-03  1.544e-02   0.491  0.62369   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(15368.98) family taken to be 1)
## 
##     Null deviance: 1040.2  on 1449  degrees of freedom
## Residual deviance: 1019.8  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 2369.8
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  15369 
##           Std. Err.:  59189 
## Warning while fitting theta: iteration limit reached 
## 
##  2 x log-likelihood:  -2333.797
summary(ukb_covid_o3.p <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
                                whr + sex + age + o3_val, family = 'poisson'))
## 
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + o3_val, 
##     family = "poisson", data = ukb_covid_allPol_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2076  -0.9503  -0.8062   0.6636   1.1391  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                   -1.630e+00  8.292e-01  -1.966  0.04931 * 
## merged_popDens_km2             1.614e-05  1.399e-05   1.154  0.24866   
## n_cancers                     -1.289e-01  1.274e-01  -1.012  0.31150   
## townsend                       1.333e-02  1.346e-02   0.990  0.32205   
## smokingNever                   3.585e-01  1.381e-01   2.596  0.00942 **
## smokingPrefer not to answer    7.494e-01  4.492e-01   1.668  0.09527 . 
## smokingPrevious                3.905e-01  1.377e-01   2.837  0.00455 **
## whistlingNo                    2.435e-01  2.687e-01   0.906  0.36476   
## whistlingPrefer not to answer  2.092e-02  1.278e+00   0.016  0.98693   
## whistlingYes                   2.276e-01  2.730e-01   0.834  0.40453   
## diabetesNo                    -1.858e-01  6.083e-01  -0.305  0.76008   
## diabetesPrefer not to answer  -6.932e-01  1.435e+00  -0.483  0.62906   
## diabetesYes                   -2.278e-01  6.172e-01  -0.369  0.71205   
## whr                            8.310e-01  5.842e-01   1.423  0.15488   
## sexMale                        3.618e-02  1.002e-01   0.361  0.71814   
## age                           -5.563e-03  4.749e-03  -1.171  0.24142   
## o3_val                         7.573e-03  1.544e-02   0.491  0.62368   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1040.2  on 1449  degrees of freedom
## Residual deviance: 1019.8  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 2367.8
## 
## Number of Fisher Scoring iterations: 5
car::vif(ukb_covid_o3.nb)
##                        GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 1.370137  1        1.170528
## n_cancers          1.019831  1        1.009867
## townsend           1.535336  1        1.239087
## smoking            1.267302  3        1.040271
## whistling          1.867480  3        1.109710
## diabetes           2.047511  3        1.126863
## whr                1.856088  1        1.362383
## sex                1.627746  1        1.275831
## age                1.140751  1        1.068060
## o3_val             1.098307  1        1.048001
# SO2
summary(ukb_covid_so2.nb <- glm.nb(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
                                     whr + sex + age + so2_val))
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## 
## Call:
## glm.nb(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + so2_val, 
##     data = ukb_covid_allPol_df, init.theta = 15326.12162, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2325  -0.9485  -0.8076   0.6604   1.2053  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                   -1.721e+00  8.342e-01  -2.063  0.03915 * 
## merged_popDens_km2             1.462e-05  1.409e-05   1.037  0.29962   
## n_cancers                     -1.272e-01  1.274e-01  -0.998  0.31824   
## townsend                       8.033e-03  1.345e-02   0.597  0.55043   
## smokingNever                   3.566e-01  1.378e-01   2.587  0.00967 **
## smokingPrefer not to answer    7.159e-01  4.487e-01   1.596  0.11060   
## smokingPrevious                3.901e-01  1.375e-01   2.837  0.00456 **
## whistlingNo                    2.472e-01  2.689e-01   0.920  0.35778   
## whistlingPrefer not to answer -7.566e-03  1.260e+00  -0.006  0.99521   
## whistlingYes                   2.310e-01  2.732e-01   0.846  0.39781   
## diabetesNo                    -1.926e-01  6.089e-01  -0.316  0.75176   
## diabetesPrefer not to answer  -6.932e-01  1.420e+00  -0.488  0.62534   
## diabetesYes                   -2.308e-01  6.178e-01  -0.374  0.70864   
## whr                            8.300e-01  5.845e-01   1.420  0.15560   
## sexMale                        2.495e-02  1.004e-01   0.248  0.80378   
## age                           -5.453e-03  4.748e-03  -1.148  0.25078   
## so2_val                        8.195e-02  7.699e-02   1.065  0.28709   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(15326.12) family taken to be 1)
## 
##     Null deviance: 1040.2  on 1449  degrees of freedom
## Residual deviance: 1018.9  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 2368.9
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  15326 
##           Std. Err.:  58891 
## Warning while fitting theta: iteration limit reached 
## 
##  2 x log-likelihood:  -2332.922
summary(ukb_covid_so2.p <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
                                 whr + sex + age + so2_val, family = 'poisson'))
## 
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + so2_val, 
##     family = "poisson", data = ukb_covid_allPol_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2325  -0.9485  -0.8076   0.6605   1.2054  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                   -1.721e+00  8.342e-01  -2.063  0.03915 * 
## merged_popDens_km2             1.462e-05  1.409e-05   1.037  0.29961   
## n_cancers                     -1.272e-01  1.274e-01  -0.998  0.31823   
## townsend                       8.033e-03  1.345e-02   0.597  0.55042   
## smokingNever                   3.566e-01  1.378e-01   2.587  0.00967 **
## smokingPrefer not to answer    7.159e-01  4.487e-01   1.596  0.11059   
## smokingPrevious                3.901e-01  1.375e-01   2.837  0.00456 **
## whistlingNo                    2.472e-01  2.689e-01   0.920  0.35776   
## whistlingPrefer not to answer -7.570e-03  1.260e+00  -0.006  0.99521   
## whistlingYes                   2.310e-01  2.732e-01   0.846  0.39779   
## diabetesNo                    -1.926e-01  6.089e-01  -0.316  0.75175   
## diabetesPrefer not to answer  -6.933e-01  1.420e+00  -0.488  0.62532   
## diabetesYes                   -2.309e-01  6.178e-01  -0.374  0.70863   
## whr                            8.300e-01  5.845e-01   1.420  0.15559   
## sexMale                        2.495e-02  1.004e-01   0.248  0.80377   
## age                           -5.453e-03  4.748e-03  -1.148  0.25077   
## so2_val                        8.195e-02  7.698e-02   1.065  0.28708   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1040.2  on 1449  degrees of freedom
## Residual deviance: 1018.9  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 2366.9
## 
## Number of Fisher Scoring iterations: 5
car::vif(ukb_covid_so2.nb)
##                        GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 1.386609  1        1.177543
## n_cancers          1.019429  1        1.009668
## townsend           1.534094  1        1.238585
## smoking            1.264270  3        1.039856
## whistling          1.816373  3        1.104589
## diabetes           1.991278  3        1.121645
## whr                1.852603  1        1.361103
## sex                1.633541  1        1.278100
## age                1.141044  1        1.068196
## so2_val            1.185954  1        1.089015
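Before moving to the binary models, the pollutant effects from the negative binomial fits above can be gathered into a single comparison table. The snippet below is a sketch (the helper is not part of the original pipeline) and assumes the model objects defined above.

# Collect each pollutant's coefficient, rate ratio and p-value from the NB models.
nb_models <- list(pm25 = ukb_covid_pm25.nb, pm10 = ukb_covid_pm10.nb,
                  nox  = ukb_covid_nox.nb,  no2  = ukb_covid_no2.nb,
                  o3   = ukb_covid_o3.nb,   so2  = ukb_covid_so2.nb)
pollutant_terms <- c("pm25_val", "pm10_val", "nox_val", "no2_val", "o3_val", "so2_val")
do.call(rbind, Map(function(m, term) {
  est <- summary(m)$coefficients[term, ]
  data.frame(pollutant  = term,
             estimate   = est[["Estimate"]],
             rate_ratio = exp(est[["Estimate"]]),
             p_value    = est[["Pr(>|z|)"]])
}, nb_models, pollutant_terms))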

Model with all pollutants: binary (logistic regression)

These binomial (logistic) models account for the fact that the response variable is binary (0/1).
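For reporting these logistic fits, the log-odds coefficients can be converted to odds ratios with Wald 95% confidence intervals. The helper below is a sketch (not part of the original pipeline) and assumes the model objects fitted in this subsection, e.g. ukb_covid_pm25.b.

# Exponentiate the coefficients and their Wald confidence limits to obtain odds ratios.
odds_ratios <- function(model) {
  round(exp(cbind(OR = coef(model), confint.default(model))), 3)
}
# e.g. odds_ratios(ukb_covid_pm25.b), once the models below have been fitted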

# PM2.5
summary(ukb_covid_pm25.b <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
                                  whr + sex + age + pm25_val, family = 'binomial'))
## 
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + pm25_val, 
##     family = "binomial", data = ukb_covid_allPol_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5015  -1.0980  -0.8723   1.2076   1.7802  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   -2.495e+00  1.201e+00  -2.078 0.037729 *  
## merged_popDens_km2            -1.665e-05  2.673e-05  -0.623 0.533410    
## n_cancers                     -2.279e-01  1.656e-01  -1.376 0.168882    
## townsend                       2.553e-02  1.803e-02   1.416 0.156877    
## smokingNever                   6.317e-01  1.760e-01   3.589 0.000332 ***
## smokingPrefer not to answer    1.517e+00  7.460e-01   2.033 0.042046 *  
## smokingPrevious                7.049e-01  1.764e-01   3.996 6.45e-05 ***
## whistlingNo                    4.731e-01  3.493e-01   1.355 0.175573    
## whistlingPrefer not to answer  1.233e-01  1.636e+00   0.075 0.939912    
## whistlingYes                   4.484e-01  3.556e-01   1.261 0.207277    
## diabetesNo                    -2.793e-01  8.588e-01  -0.325 0.744986    
## diabetesPrefer not to answer  -1.350e+00  1.910e+00  -0.707 0.479679    
## diabetesYes                   -3.466e-01  8.712e-01  -0.398 0.690727    
## whr                            1.465e+00  8.070e-01   1.816 0.069422 .  
## sexMale                        6.331e-02  1.375e-01   0.461 0.645148    
## age                           -1.069e-02  6.540e-03  -1.634 0.102234    
## pm25_val                       1.131e-01  3.964e-02   2.853 0.004331 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1997.4  on 1449  degrees of freedom
## Residual deviance: 1952.5  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 1986.5
## 
## Number of Fisher Scoring iterations: 4
car::vif(ukb_covid_pm25.b)
##                        GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 2.324259  1        1.524552
## n_cancers          1.020306  1        1.010102
## townsend           1.406078  1        1.185782
## smoking            1.361982  3        1.052839
## whistling          1.946471  3        1.117398
## diabetes           2.222505  3        1.142371
## whr                1.864962  1        1.365636
## sex                1.637204  1        1.279533
## age                1.132009  1        1.063959
## pm25_val           1.946202  1        1.395064
# PM10
summary(ukb_covid_pm10.b <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
                                  whr + sex + age + pm10_val, family = 'binomial'))
## 
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + pm10_val, 
##     family = "binomial", data = ukb_covid_allPol_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5265  -1.1001  -0.8752   1.2106   1.7815  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   -2.436e+00  1.204e+00  -2.023 0.043114 *  
## merged_popDens_km2            -1.452e-05  2.764e-05  -0.525 0.599243    
## n_cancers                     -2.270e-01  1.656e-01  -1.371 0.170288    
## townsend                       2.519e-02  1.802e-02   1.398 0.162234    
## smokingNever                   6.253e-01  1.758e-01   3.556 0.000376 ***
## smokingPrefer not to answer    1.503e+00  7.454e-01   2.017 0.043730 *  
## smokingPrevious                6.950e-01  1.761e-01   3.946 7.95e-05 ***
## whistlingNo                    4.602e-01  3.491e-01   1.318 0.187409    
## whistlingPrefer not to answer  1.219e-01  1.635e+00   0.075 0.940573    
## whistlingYes                   4.382e-01  3.554e-01   1.233 0.217639    
## diabetesNo                    -2.959e-01  8.585e-01  -0.345 0.730345    
## diabetesPrefer not to answer  -1.361e+00  1.909e+00  -0.713 0.475962    
## diabetesYes                   -3.652e-01  8.709e-01  -0.419 0.674974    
## whr                            1.483e+00  8.066e-01   1.839 0.065893 .  
## sexMale                        6.211e-02  1.374e-01   0.452 0.651265    
## age                           -1.075e-02  6.536e-03  -1.645 0.099922 .  
## pm10_val                       7.170e-02  2.826e-02   2.537 0.011168 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1997.4  on 1449  degrees of freedom
## Residual deviance: 1954.2  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 1988.2
## 
## Number of Fisher Scoring iterations: 4
car::vif(ukb_covid_pm10.b)
##                        GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 2.479156  1        1.574534
## n_cancers          1.020245  1        1.010072
## townsend           1.405143  1        1.185387
## smoking            1.359251  3        1.052487
## whistling          1.938463  3        1.116631
## diabetes           2.215023  3        1.141729
## whr                1.865545  1        1.365850
## sex                1.637883  1        1.279798
## age                1.132501  1        1.064190
## pm10_val           2.089322  1        1.445449
#nox
summary(ukb_covid_nox.b <-glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers +townsend + smoking + whistling + diabetes+
                                whr + sex + age + nox_val, family = 'binomial'))
## 
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + nox_val, 
##     family = "binomial", data = ukb_covid_allPol_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8158  -1.0981  -0.8681   1.2109   1.7967  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   -1.971e+00  1.167e+00  -1.688 0.091373 .  
## merged_popDens_km2            -2.821e-05  3.058e-05  -0.922 0.356330    
## n_cancers                     -2.220e-01  1.656e-01  -1.341 0.179989    
## townsend                       1.120e-02  1.841e-02   0.608 0.542937    
## smokingNever                   6.121e-01  1.758e-01   3.483 0.000497 ***
## smokingPrefer not to answer    1.418e+00  7.419e-01   1.911 0.056052 .  
## smokingPrevious                6.853e-01  1.761e-01   3.893 9.92e-05 ***
## whistlingNo                    4.534e-01  3.511e-01   1.291 0.196604    
## whistlingPrefer not to answer  3.230e-02  1.630e+00   0.020 0.984188    
## whistlingYes                   4.300e-01  3.574e-01   1.203 0.228927    
## diabetesNo                    -3.427e-01  8.598e-01  -0.399 0.690220    
## diabetesPrefer not to answer  -1.409e+00  1.907e+00  -0.739 0.460161    
## diabetesYes                   -4.073e-01  8.721e-01  -0.467 0.640507    
## whr                            1.558e+00  8.069e-01   1.930 0.053549 .  
## sexMale                        3.870e-02  1.378e-01   0.281 0.778826    
## age                           -1.023e-02  6.537e-03  -1.564 0.117746    
## nox_val                        2.286e-02  8.580e-03   2.665 0.007708 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1997.4  on 1449  degrees of freedom
## Residual deviance: 1953.4  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 1987.4
## 
## Number of Fisher Scoring iterations: 4
#no2
summary(ukb_covid_no2.b <-glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers +townsend + smoking + whistling + diabetes+
                                whr + sex + age + no2_val, family = 'binomial'))
## 
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + no2_val, 
##     family = "binomial", data = ukb_covid_allPol_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7152  -1.0980  -0.8676   1.2074   1.8112  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   -2.146e+00  1.175e+00  -1.826 0.067787 .  
## merged_popDens_km2            -3.474e-05  3.012e-05  -1.153 0.248722    
## n_cancers                     -2.206e-01  1.657e-01  -1.332 0.182872    
## townsend                       9.422e-03  1.844e-02   0.511 0.609470    
## smokingNever                   6.136e-01  1.759e-01   3.489 0.000485 ***
## smokingPrefer not to answer    1.414e+00  7.420e-01   1.906 0.056635 .  
## smokingPrevious                6.904e-01  1.762e-01   3.918 8.94e-05 ***
## whistlingNo                    4.519e-01  3.508e-01   1.288 0.197732    
## whistlingPrefer not to answer  5.918e-03  1.629e+00   0.004 0.997102    
## whistlingYes                   4.301e-01  3.571e-01   1.204 0.228481    
## diabetesNo                    -3.365e-01  8.622e-01  -0.390 0.696330    
## diabetesPrefer not to answer  -1.424e+00  1.908e+00  -0.746 0.455449    
## diabetesYes                   -4.023e-01  8.745e-01  -0.460 0.645508    
## whr                            1.520e+00  8.073e-01   1.883 0.059643 .  
## sexMale                        4.126e-02  1.378e-01   0.300 0.764553    
## age                           -1.014e-02  6.542e-03  -1.550 0.121023    
## no2_val                        4.586e-02  1.507e-02   3.042 0.002350 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1997.4  on 1449  degrees of freedom
## Residual deviance: 1951.3  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 1985.3
## 
## Number of Fisher Scoring iterations: 4
car::vif(ukb_covid_no2.b)
##                        GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 2.908510  1        1.705436
## n_cancers          1.020530  1        1.010213
## townsend           1.465272  1        1.210484
## smoking            1.361503  3        1.052777
## whistling          1.873400  3        1.110295
## diabetes           2.153442  3        1.136376
## whr                1.867270  1        1.366481
## sex                1.643660  1        1.282053
## age                1.131877  1        1.063897
## no2_val            3.004997  1        1.733493
#o3
summary(ukb_covid_o3.b <-glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers +townsend + smoking + whistling + diabetes+
                               whr + sex + age + o3_val, family = 'binomial'))
## 
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + o3_val, 
##     family = "binomial", data = ukb_covid_allPol_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5128  -1.1014  -0.8815   1.2147   1.7384  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   -1.691e+00  1.160e+00  -1.458 0.144974    
## merged_popDens_km2             3.266e-05  2.026e-05   1.613 0.106848    
## n_cancers                     -2.303e-01  1.657e-01  -1.390 0.164576    
## townsend                       2.509e-02  1.857e-02   1.351 0.176671    
## smokingNever                   6.132e-01  1.757e-01   3.491 0.000482 ***
## smokingPrefer not to answer    1.475e+00  7.434e-01   1.985 0.047180 *  
## smokingPrevious                6.728e-01  1.758e-01   3.828 0.000129 ***
## whistlingNo                    4.292e-01  3.476e-01   1.235 0.216935    
## whistlingPrefer not to answer  7.931e-02  1.637e+00   0.048 0.961358    
## whistlingYes                   4.005e-01  3.538e-01   1.132 0.257619    
## diabetesNo                    -3.154e-01  8.525e-01  -0.370 0.711418    
## diabetesPrefer not to answer  -1.337e+00  1.909e+00  -0.700 0.483791    
## diabetesYes                   -3.952e-01  8.650e-01  -0.457 0.647746    
## whr                            1.542e+00  8.047e-01   1.917 0.055253 .  
## sexMale                        6.833e-02  1.373e-01   0.498 0.618821    
## age                           -1.036e-02  6.521e-03  -1.589 0.112170    
## o3_val                         1.394e-02  2.102e-02   0.663 0.507271    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1997.4  on 1449  degrees of freedom
## Residual deviance: 1960.2  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 1994.2
## 
## Number of Fisher Scoring iterations: 4
car::vif(ukb_covid_o3.b)
##                        GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 1.334395  1        1.155160
## n_cancers          1.020762  1        1.010327
## townsend           1.497646  1        1.223783
## smoking            1.363683  3        1.053058
## whistling          1.938926  3        1.116675
## diabetes           2.218764  3        1.142050
## whr                1.865199  1        1.365723
## sex                1.642995  1        1.281794
## age                1.132566  1        1.064221
## o3_val             1.092175  1        1.045072
#so2
summary(ukb_covid_so2.b <-glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers +townsend + smoking + whistling + diabetes+
                                whr + sex + age + so2_val, family = 'binomial'))
## 
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend + 
##     smoking + whistling + diabetes + whr + sex + age + so2_val, 
##     family = "binomial", data = ukb_covid_allPol_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5419  -1.1008  -0.8828   1.2125   1.8154  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   -1.865e+00  1.168e+00  -1.597 0.110294    
## merged_popDens_km2             2.955e-05  2.042e-05   1.447 0.147827    
## n_cancers                     -2.274e-01  1.657e-01  -1.372 0.170114    
## townsend                       1.531e-02  1.853e-02   0.826 0.408529    
## smokingNever                   6.105e-01  1.755e-01   3.478 0.000505 ***
## smokingPrefer not to answer    1.415e+00  7.429e-01   1.905 0.056727 .  
## smokingPrevious                6.719e-01  1.756e-01   3.827 0.000130 ***
## whistlingNo                    4.384e-01  3.479e-01   1.260 0.207518    
## whistlingPrefer not to answer  2.860e-02  1.632e+00   0.018 0.986015    
## whistlingYes                   4.079e-01  3.540e-01   1.152 0.249305    
## diabetesNo                    -3.297e-01  8.535e-01  -0.386 0.699264    
## diabetesPrefer not to answer  -1.338e+00  1.905e+00  -0.703 0.482347    
## diabetesYes                   -4.055e-01  8.659e-01  -0.468 0.639609    
## whr                            1.542e+00  8.050e-01   1.916 0.055368 .  
## sexMale                        4.830e-02  1.377e-01   0.351 0.725700    
## age                           -1.015e-02  6.524e-03  -1.556 0.119623    
## so2_val                        1.553e-01  1.078e-01   1.441 0.149709    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1997.4  on 1449  degrees of freedom
## Residual deviance: 1958.6  on 1433  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 1992.6
## 
## Number of Fisher Scoring iterations: 4
car::vif(ukb_covid_so2.b)
##                        GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 1.352441  1        1.162945
## n_cancers          1.020029  1        1.009965
## townsend           1.489026  1        1.220257
## smoking            1.355795  3        1.052040
## whistling          1.916015  3        1.114465
## diabetes           2.192588  3        1.139793
## whr                1.865722  1        1.365914
## sex                1.649086  1        1.284167
## age                1.131975  1        1.063943
## so2_val            1.177317  1        1.085042

Supplementary Table 6. Effect of air pollutants on the probability of being infected, using the UK Biobank data

Plot binary models

stargazer(ukb_covid_pm25.b, ukb_covid_pm10.b, ukb_covid_nox.b, ukb_covid_no2.b,
          ukb_covid_o3.b, ukb_covid_so2.b,type="html",out = "fig_out/ukb_covid_allPoll_binary.html",
          dep.var.labels="COVID positive or not",
          single.row=TRUE)
Dependent variable: COVID positive or not

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| merged_popDens_km2 | -0.00002 (0.00003) | -0.00001 (0.00003) | -0.00003 (0.00003) | -0.00003 (0.00003) | 0.00003 (0.00002) | 0.00003 (0.00002) |
| n_cancers | -0.228 (0.166) | -0.227 (0.166) | -0.222 (0.166) | -0.221 (0.166) | -0.230 (0.166) | -0.227 (0.166) |
| townsend | 0.026 (0.018) | 0.025 (0.018) | 0.011 (0.018) | 0.009 (0.018) | 0.025 (0.019) | 0.015 (0.019) |
| smokingNever | 0.632*** (0.176) | 0.625*** (0.176) | 0.612*** (0.176) | 0.614*** (0.176) | 0.613*** (0.176) | 0.610*** (0.176) |
| smokingPrefer not to answer | 1.517** (0.746) | 1.503** (0.745) | 1.418* (0.742) | 1.414* (0.742) | 1.475** (0.743) | 1.415* (0.743) |
| smokingPrevious | 0.705*** (0.176) | 0.695*** (0.176) | 0.685*** (0.176) | 0.690*** (0.176) | 0.673*** (0.176) | 0.672*** (0.176) |
| whistlingNo | 0.473 (0.349) | 0.460 (0.349) | 0.453 (0.351) | 0.452 (0.351) | 0.429 (0.348) | 0.438 (0.348) |
| whistlingPrefer not to answer | 0.123 (1.636) | 0.122 (1.635) | 0.032 (1.630) | 0.006 (1.629) | 0.079 (1.637) | 0.029 (1.632) |
| whistlingYes | 0.448 (0.356) | 0.438 (0.355) | 0.430 (0.357) | 0.430 (0.357) | 0.401 (0.354) | 0.408 (0.354) |
| diabetesNo | -0.279 (0.859) | -0.296 (0.858) | -0.343 (0.860) | -0.337 (0.862) | -0.315 (0.853) | -0.330 (0.853) |
| diabetesPrefer not to answer | -1.350 (1.910) | -1.361 (1.909) | -1.409 (1.907) | -1.424 (1.908) | -1.337 (1.909) | -1.338 (1.905) |
| diabetesYes | -0.347 (0.871) | -0.365 (0.871) | -0.407 (0.872) | -0.402 (0.875) | -0.395 (0.865) | -0.405 (0.866) |
| whr | 1.465* (0.807) | 1.483* (0.807) | 1.558* (0.807) | 1.520* (0.807) | 1.542* (0.805) | 1.542* (0.805) |
| sexMale | 0.063 (0.137) | 0.062 (0.137) | 0.039 (0.138) | 0.041 (0.138) | 0.068 (0.137) | 0.048 (0.138) |
| age | -0.011 (0.007) | -0.011* (0.007) | -0.010 (0.007) | -0.010 (0.007) | -0.010 (0.007) | -0.010 (0.007) |
| pm25_val | 0.113*** (0.040) | | | | | |
| pm10_val | | 0.072** (0.028) | | | | |
| nox_val | | | 0.023*** (0.009) | | | |
| no2_val | | | | 0.046*** (0.015) | | |
| o3_val | | | | | 0.014 (0.021) | |
| so2_val | | | | | | 0.155 (0.108) |
| Constant | -2.495** (1.201) | -2.436** (1.204) | -1.971* (1.167) | -2.146* (1.175) | -1.691 (1.160) | -1.865 (1.168) |
| Observations | 1,450 | 1,450 | 1,450 | 1,450 | 1,450 | 1,450 |
| Log Likelihood | -976.235 | -977.093 | -976.681 | -975.636 | -980.105 | -979.286 |
| Akaike Inf. Crit. | 1,986.469 | 1,988.185 | 1,987.362 | 1,985.271 | 1,994.211 | 1,992.572 |

Note: * p<0.1; ** p<0.05; *** p<0.01


Each column of the table corresponds to a binomial (logistic) model predicting whether a participant tested positive for COVID-19 from the concentration of a single air pollutant, together with the covariates. A separate model was fitted for each pollutant because the pollutant concentrations are highly correlated with one another. Raw coefficient estimates are listed with their standard errors in parentheses, and p-values are indicated by asterisks next to the estimates. merged_popDens_km2, average population density per square kilometer; pm25_val, PM2.5 levels; pm10_val, PM10 levels; nox_val, nitrogen oxide levels; no2_val, nitrogen dioxide levels; o3_val, ozone levels; so2_val, sulphur dioxide levels; Akaike Inf. Crit., Akaike's information criterion.
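
As a quick illustration of why single-pollutant models were preferred, here is a minimal sketch (not part of the original pipeline) of the pairwise correlations between the pollutant exposures, assuming the same column names used in the models above:

# Pairwise Pearson correlations between the pollutant exposures (complete cases only).
pollutant_cols <- c("pm25_val", "pm10_val", "nox_val", "no2_val", "o3_val", "so2_val")
round(cor(ukb_covid_allPol_df[, pollutant_cols], use = "complete.obs"), 2)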

Calculate infectivity odds ratios

Format odds ratios

# Odds ratios: exponentiate the coefficients; 95% CIs come from profile likelihood via confint()
ukb_covid_pm25.b_odds = data.frame(cbind(exp(cbind(OR = coef(ukb_covid_pm25.b), confint(ukb_covid_pm25.b))), p_value = summary(ukb_covid_pm25.b)$coefficients[,4]))
ukb_covid_pm10.b_odds = data.frame(cbind(exp(cbind(OR = coef(ukb_covid_pm10.b), confint(ukb_covid_pm10.b))), p_value = summary(ukb_covid_pm10.b)$coefficients[,4]))
ukb_covid_nox.b_odds = data.frame(cbind(exp(cbind(OR = coef(ukb_covid_nox.b), confint(ukb_covid_nox.b))), p_value = summary(ukb_covid_nox.b)$coefficients[,4]))
ukb_covid_no2.b_odds = data.frame(cbind(exp(cbind(OR = coef(ukb_covid_no2.b), confint(ukb_covid_no2.b))), p_value = summary(ukb_covid_no2.b)$coefficients[,4]))
ukb_covid_o3.b_odds = data.frame(cbind(exp(cbind(OR = coef(ukb_covid_o3.b), confint(ukb_covid_o3.b))), p_value = summary(ukb_covid_o3.b)$coefficients[,4]))
ukb_covid_so2.b_odds = data.frame(cbind(exp(cbind(OR = coef(ukb_covid_so2.b), confint(ukb_covid_so2.b))), p_value = summary(ukb_covid_so2.b)$coefficients[,4]))
#make a df just with the pollutants 

ukb_covid_onlyPoll = data.frame(rbind(ukb_covid_pm25.b_odds[nrow(ukb_covid_pm25.b_odds),],
                           ukb_covid_pm10.b_odds[nrow(ukb_covid_pm10.b_odds),],
                           ukb_covid_nox.b_odds[nrow(ukb_covid_nox.b_odds),],
                           ukb_covid_no2.b_odds[nrow(ukb_covid_no2.b_odds),],
                           ukb_covid_o3.b_odds[nrow(ukb_covid_o3.b_odds),],
                           ukb_covid_so2.b_odds[nrow(ukb_covid_so2.b_odds),]))

Plot infectivity odds ratios

Format data for plotting

ukb_covid_onlyPoll$names = row.names(ukb_covid_onlyPoll)
ukb_covid_onlyPoll$significance = "p-value > 0.05"
ukb_covid_onlyPoll$significance[ukb_covid_onlyPoll$p_value < 0.05] <- "p-value < 0.05"

plot

ggplot(ukb_covid_onlyPoll, aes(x=reorder(names, OR), y=OR, color=significance)) + 
    geom_point(fill="white", shape=21, size = 2) +
    geom_errorbar(aes(ymin=X2.5.., ymax=X97.5..),
                  width=.2,                    # Width of the error bars
                  position=position_dodge(.9)) +
  theme_bw() + 
  geom_hline(yintercept = 1, linetype="dotted") +
  coord_flip()+ylab("Infectivity odds ratios") + 
  xlab("Pollutants")+
  ylim(0.75, 1.6)

ggsave('fig_out/UKB_infectivityOR_bar.pdf')
stargazer(ukb_covid_onlyPoll, type ="html", single.row=TRUE, summary = FALSE, out = "fig_out/ukb_covid_onlyPoll.html")
| | OR | 2.5% | 97.5% | p_value | names | significance |
|---|---|---|---|---|---|---|
| pm25_val | 1.120 | 1.036 | 1.211 | 0.004 | pm25_val | p-value < 0.05 |
| pm10_val | 1.074 | 1.017 | 1.136 | 0.011 | pm10_val | p-value < 0.05 |
| nox_val | 1.023 | 1.006 | 1.041 | 0.008 | nox_val | p-value < 0.05 |
| no2_val | 1.047 | 1.017 | 1.079 | 0.002 | no2_val | p-value < 0.05 |
| o3_val | 1.014 | 0.973 | 1.057 | 0.507 | o3_val | p-value > 0.05 |
| so2_val | 1.168 | 0.946 | 1.444 | 0.150 | so2_val | p-value > 0.05 |
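
These odds ratios are per unit increase of each pollutant. Purely as an illustration (not part of the original pipeline), the per-unit coefficients can be rescaled to a larger, hypothetical exposure increment:

# Illustrative rescaling of a per-unit odds ratio to a hypothetical 5-unit increase in PM2.5.
pm25_beta <- coef(ukb_covid_pm25.b)["pm25_val"]
exp(pm25_beta)      # OR per 1-unit increase (matches the table above)
exp(5 * pm25_beta)  # OR per 5-unit increase (illustrative only)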

### UKB analysis with 5-year-average (5YA) air pollution data

#### Adding the 5YA air pollution data

ukb_dt = read.csv("data_output/2_5_2020_full_cov_CV_UKB_air_dataset.csv")
# drop the existing pm25_val column so it is not duplicated when the pollutant values are merged back in
ukb_dt = ukb_dt[ , -which(names(ukb_dt) %in% c("pm25_val"))]
# current and 5-year-average pollutant exposures, keyed by participant ID (eid)
ukb_5yAP = read.csv("data_output/ukb_covid_dt_out.csv")[,c("eid","no2_5yAvg","nox_5yAvg","pm10_5yAvg","o3_5yAvg","pm25_5yAvg","no2_val","nox_val","pm10_val","o3_val","pm25_val")]
# left join: keep every participant already in ukb_dt
ukb_dt = merge(ukb_dt, ukb_5yAP, by = "eid", all.x = T)
nrow(ukb_dt)
## [1] 1464
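
Because the 5-year averages are merged with all.x = TRUE, some participants may lack these values. A small optional check (a sketch, assuming the column names above):

# Optional sanity check: count missing 5-year-average values after the left join.
fiveYA_cols <- c("pm25_5yAvg", "pm10_5yAvg", "no2_5yAvg", "nox_5yAvg", "o3_5yAvg")
colSums(is.na(ukb_dt[, fiveYA_cols]))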

models - cases only

ukb_covid.b_pm25 <-glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers +townsend + smoking + whistling + diabetes+whr + sex + age+
                          pm25_val, family = 'binomial')

ukb_covid.b_no2 <-glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers +townsend + smoking + whistling + diabetes+whr + sex + age+
                          no2_val, family = 'binomial')

ukb_covid.b_nox <-glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers +townsend + smoking + whistling + diabetes+whr + sex + age+
                          nox_val, family = 'binomial')

ukb_covid.b_pm10 <-glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers +townsend + smoking + whistling + diabetes+whr + sex + age+
                          pm10_val, family = 'binomial')

ukb_covid.b_o3 <-glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers +townsend + smoking + whistling + diabetes+whr + sex + age+
                          o3_val, family = 'binomial')


ukb_covid.b_pm25_5yAvg <-glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers +townsend + smoking + whistling + diabetes+whr + sex + age+
                          pm25_5yAvg, family = 'binomial')

ukb_covid.b_no2_5yAvg <-glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers +townsend + smoking + whistling + diabetes+whr + sex + age+
                          no2_5yAvg, family = 'binomial')

ukb_covid.b_nox_5yAvg <-glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers +townsend + smoking + whistling + diabetes+whr + sex + age+
                          nox_5yAvg, family = 'binomial')

ukb_covid.b_pm10_5yAvg <-glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers +townsend + smoking + whistling + diabetes+whr + sex + age+
                          pm10_5yAvg, family = 'binomial')

ukb_covid.b_o3_5yAvg <-glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers +townsend + smoking + whistling + diabetes+whr + sex + age+
                          o3_5yAvg, family = 'binomial')
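
The ten fits above could equivalently be generated in a loop; a compact sketch is shown below (the individually named model objects above are what the stargazer call that follows actually uses):

# Equivalent, more compact way to fit the ten single-pollutant models (sketch only).
pollutant_terms <- c("pm25_val", "pm10_val", "no2_val", "nox_val", "o3_val",
                     "pm25_5yAvg", "pm10_5yAvg", "no2_5yAvg", "nox_5yAvg", "o3_5yAvg")
base_covars <- "merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes + whr + sex + age"
ukb_covid_models <- lapply(pollutant_terms, function(p) {
  glm(as.formula(paste("result ~", base_covars, "+", p)), data = ukb_dt, family = "binomial")
})
names(ukb_covid_models) <- pollutant_terms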

stargazer::stargazer(ukb_covid.b_pm25,ukb_covid.b_pm10,ukb_covid.b_no2,ukb_covid.b_nox,ukb_covid.b_o3,
                     ukb_covid.b_pm25_5yAvg,ukb_covid.b_pm10_5yAvg,ukb_covid.b_no2_5yAvg,
                         ukb_covid.b_nox_5yAvg,ukb_covid.b_o3_5yAvg,
                     type="html",out = "fig_out/ukb_covidInfections_binomial_resub.html",
          dep.var.labels="COVID-19 positive or not",
          single.row=TRUE)
## 
## <table style="text-align:center"><tr><td colspan="11" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="10"><em>Dependent variable:</em></td></tr>
## <tr><td></td><td colspan="10" style="border-bottom: 1px solid black"></td></tr>
## <tr><td style="text-align:left"></td><td colspan="10">COVID-19 positive or not</td></tr>
## <tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td><td>(3)</td><td>(4)</td><td>(5)</td><td>(6)</td><td>(7)</td><td>(8)</td><td>(9)</td><td>(10)</td></tr>
## <tr><td colspan="11" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">merged_popDens_km2</td><td>-0.00002 (0.00003)</td><td>-0.00002 (0.00003)</td><td>-0.00004 (0.00003)</td><td>-0.00003 (0.00003)</td><td>0.00003 (0.00002)</td><td>-0.00001 (0.00003)</td><td>-0.00001 (0.00003)</td><td>-0.00003 (0.00003)</td><td>-0.00003 (0.00003)</td><td>0.00003 (0.00002)</td></tr>
## <tr><td style="text-align:left">n_cancers</td><td>-0.229 (0.166)</td><td>-0.227 (0.166)</td><td>-0.214 (0.166)</td><td>-0.216 (0.166)</td><td>-0.228 (0.166)</td><td>-0.227 (0.166)</td><td>-0.226 (0.165)</td><td>-0.216 (0.166)</td><td>-0.219 (0.166)</td><td>-0.224 (0.166)</td></tr>
## <tr><td style="text-align:left">townsend</td><td>0.025 (0.018)</td><td>0.024 (0.018)</td><td>0.008 (0.018)</td><td>0.010 (0.018)</td><td>0.024 (0.019)</td><td>0.023 (0.018)</td><td>0.023 (0.018)</td><td>0.008 (0.018)</td><td>0.010 (0.018)</td><td>0.020 (0.019)</td></tr>
## <tr><td style="text-align:left">smokingNever</td><td>0.635<sup>***</sup> (0.176)</td><td>0.627<sup>***</sup> (0.176)</td><td>0.614<sup>***</sup> (0.176)</td><td>0.612<sup>***</sup> (0.176)</td><td>0.611<sup>***</sup> (0.176)</td><td>0.624<sup>***</sup> (0.176)</td><td>0.619<sup>***</sup> (0.176)</td><td>0.609<sup>***</sup> (0.176)</td><td>0.608<sup>***</sup> (0.176)</td><td>0.603<sup>***</sup> (0.175)</td></tr>
## <tr><td style="text-align:left">smokingPrefer not to answer</td><td>1.520<sup>**</sup> (0.746)</td><td>1.505<sup>**</sup> (0.746)</td><td>1.411<sup>*</sup> (0.742)</td><td>1.413<sup>*</sup> (0.742)</td><td>1.470<sup>**</sup> (0.743)</td><td>1.500<sup>**</sup> (0.746)</td><td>1.489<sup>**</sup> (0.745)</td><td>1.408<sup>*</sup> (0.743)</td><td>1.411<sup>*</sup> (0.742)</td><td>1.435<sup>*</sup> (0.742)</td></tr>
## <tr><td style="text-align:left">smokingPrevious</td><td>0.703<sup>***</sup> (0.177)</td><td>0.690<sup>***</sup> (0.176)</td><td>0.682<sup>***</sup> (0.176)</td><td>0.677<sup>***</sup> (0.176)</td><td>0.668<sup>***</sup> (0.176)</td><td>0.690<sup>***</sup> (0.176)</td><td>0.682<sup>***</sup> (0.176)</td><td>0.677<sup>***</sup> (0.176)</td><td>0.674<sup>***</sup> (0.176)</td><td>0.658<sup>***</sup> (0.176)</td></tr>
## <tr><td style="text-align:left">whistlingNo</td><td>0.475 (0.349)</td><td>0.461 (0.349)</td><td>0.462 (0.351)</td><td>0.461 (0.351)</td><td>0.427 (0.348)</td><td>0.466 (0.349)</td><td>0.458 (0.349)</td><td>0.462 (0.351)</td><td>0.462 (0.351)</td><td>0.422 (0.348)</td></tr>
## <tr><td style="text-align:left">whistlingPrefer not to answer</td><td>0.123 (1.637)</td><td>0.127 (1.636)</td><td>0.031 (1.629)</td><td>0.053 (1.629)</td><td>0.082 (1.637)</td><td>0.118 (1.633)</td><td>0.126 (1.632)</td><td>0.014 (1.629)</td><td>0.038 (1.629)</td><td>0.070 (1.635)</td></tr>
## <tr><td style="text-align:left">whistlingYes</td><td>0.458 (0.356)</td><td>0.446 (0.356)</td><td>0.450 (0.357)</td><td>0.447 (0.358)</td><td>0.400 (0.354)</td><td>0.448 (0.356)</td><td>0.442 (0.356)</td><td>0.448 (0.357)</td><td>0.445 (0.358)</td><td>0.396 (0.354)</td></tr>
## <tr><td style="text-align:left">diabetesNo</td><td>-0.275 (0.859)</td><td>-0.296 (0.858)</td><td>-0.343 (0.861)</td><td>-0.351 (0.859)</td><td>-0.317 (0.852)</td><td>-0.293 (0.858)</td><td>-0.311 (0.857)</td><td>-0.341 (0.860)</td><td>-0.350 (0.858)</td><td>-0.323 (0.852)</td></tr>
## <tr><td style="text-align:left">diabetesPrefer not to answer</td><td>-1.368 (1.912)</td><td>-1.377 (1.910)</td><td>-1.422 (1.907)</td><td>-1.407 (1.906)</td><td>-1.338 (1.909)</td><td>-1.466 (1.909)</td><td>-1.462 (1.908)</td><td>-1.435 (1.906)</td><td>-1.428 (1.906)</td><td>-1.344 (1.907)</td></tr>
## <tr><td style="text-align:left">diabetesYes</td><td>-0.324 (0.871)</td><td>-0.345 (0.871)</td><td>-0.389 (0.874)</td><td>-0.396 (0.872)</td><td>-0.379 (0.865)</td><td>-0.337 (0.870)</td><td>-0.356 (0.870)</td><td>-0.380 (0.872)</td><td>-0.389 (0.870)</td><td>-0.381 (0.865)</td></tr>
## <tr><td style="text-align:left">whr</td><td>1.487<sup>*</sup> (0.808)</td><td>1.511<sup>*</sup> (0.808)</td><td>1.555<sup>*</sup> (0.808)</td><td>1.595<sup>**</sup> (0.808)</td><td>1.580<sup>**</sup> (0.806)</td><td>1.492<sup>*</sup> (0.808)</td><td>1.508<sup>*</sup> (0.808)</td><td>1.547<sup>*</sup> (0.808)</td><td>1.582<sup>*</sup> (0.808)</td><td>1.588<sup>**</sup> (0.806)</td></tr>
## <tr><td style="text-align:left">sexMale</td><td>0.059 (0.138)</td><td>0.057 (0.138)</td><td>0.034 (0.138)</td><td>0.030 (0.138)</td><td>0.061 (0.138)</td><td>0.050 (0.138)</td><td>0.050 (0.138)</td><td>0.031 (0.138)</td><td>0.028 (0.138)</td><td>0.056 (0.138)</td></tr>
## <tr><td style="text-align:left">age</td><td>-0.011 (0.007)</td><td>-0.011<sup>*</sup> (0.007)</td><td>-0.010 (0.007)</td><td>-0.011 (0.007)</td><td>-0.010 (0.007)</td><td>-0.011<sup>*</sup> (0.007)</td><td>-0.011<sup>*</sup> (0.007)</td><td>-0.011 (0.007)</td><td>-0.011 (0.007)</td><td>-0.010 (0.007)</td></tr>
## <tr><td style="text-align:left">pm25_val</td><td>0.120<sup>***</sup> (0.040)</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">pm10_val</td><td></td><td>0.076<sup>***</sup> (0.028)</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">no2_val</td><td></td><td></td><td>0.049<sup>***</sup> (0.015)</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">nox_val</td><td></td><td></td><td></td><td>0.024<sup>***</sup> (0.009)</td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">o3_val</td><td></td><td></td><td></td><td></td><td>0.012 (0.021)</td><td></td><td></td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">pm25_5yAvg</td><td></td><td></td><td></td><td></td><td></td><td>0.120<sup>***</sup> (0.044)</td><td></td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">pm10_5yAvg</td><td></td><td></td><td></td><td></td><td></td><td></td><td>0.077<sup>**</sup> (0.030)</td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">no2_5yAvg</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td>0.044<sup>***</sup> (0.014)</td><td></td><td></td></tr>
## <tr><td style="text-align:left">nox_5yAvg</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td>0.021<sup>***</sup> (0.008)</td><td></td></tr>
## <tr><td style="text-align:left">o3_5yAvg</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td>-0.019 (0.079)</td></tr>
## <tr><td style="text-align:left">Constant</td><td>-2.577<sup>**</sup> (1.203)</td><td>-2.502<sup>**</sup> (1.205)</td><td>-2.197<sup>*</sup> (1.175)</td><td>-2.006<sup>*</sup> (1.167)</td><td>-1.701 (1.160)</td><td>-2.604<sup>**</sup> (1.213)</td><td>-2.532<sup>**</sup> (1.212)</td><td>-2.203<sup>*</sup> (1.175)</td><td>-2.009<sup>*</sup> (1.167)</td><td>-1.581 (1.171)</td></tr>
## <tr><td colspan="11" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>1,447</td><td>1,447</td><td>1,447</td><td>1,447</td><td>1,447</td><td>1,447</td><td>1,447</td><td>1,447</td><td>1,447</td><td>1,447</td></tr>
## <tr><td style="text-align:left">Log Likelihood</td><td>-973.678</td><td>-974.683</td><td>-972.953</td><td>-974.207</td><td>-978.070</td><td>-974.439</td><td>-975.008</td><td>-973.235</td><td>-974.345</td><td>-978.204</td></tr>
## <tr><td style="text-align:left">Akaike Inf. Crit.</td><td>1,981.357</td><td>1,983.366</td><td>1,979.906</td><td>1,982.413</td><td>1,990.139</td><td>1,982.878</td><td>1,984.015</td><td>1,980.469</td><td>1,982.690</td><td>1,990.408</td></tr>
## <tr><td colspan="11" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="10" style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
## </table>
# extract the pollutant coefficient (the last row of each model's coefficient table) from every model
list_df_full = do.call("rbind", list(
                summary(ukb_covid.b_pm25)$coefficients[nrow(summary(ukb_covid.b_pm25)$coefficients),1:4],
                 summary(ukb_covid.b_pm10)$coefficients[nrow(summary(ukb_covid.b_pm10)$coefficients),1:4],
                 summary(ukb_covid.b_no2)$coefficients[nrow(summary(ukb_covid.b_no2)$coefficients),1:4],
                 summary(ukb_covid.b_nox)$coefficients[nrow(summary(ukb_covid.b_nox)$coefficients),1:4],
                 summary(ukb_covid.b_o3)$coefficients[nrow(summary(ukb_covid.b_o3)$coefficients),1:4],
                 summary(ukb_covid.b_pm25_5yAvg)$coefficients[nrow(summary(ukb_covid.b_pm25_5yAvg)$coefficients),1:4],
                 summary(ukb_covid.b_pm10_5yAvg)$coefficients[nrow(summary(ukb_covid.b_pm10_5yAvg)$coefficients),1:4],
                 summary(ukb_covid.b_no2_5yAvg)$coefficients[nrow(summary(ukb_covid.b_no2_5yAvg)$coefficients),1:4],
                 summary(ukb_covid.b_nox_5yAvg)$coefficients[nrow(summary(ukb_covid.b_nox_5yAvg)$coefficients),1:4],
                 summary(ukb_covid.b_o3_5yAvg)$coefficients[nrow(summary(ukb_covid.b_o3_5yAvg)$coefficients),1:4]))
#add labels 

pollutants = c("pm25","pm10","no2","nox","o3",
         "pm25_5yAvg","pm10_5yAvg","no2_5yAvg","nox_5yAvg","o3_5yAvg")
pollutants = data.frame(pollutants)

list_df_full = cbind(pollutants, list_df_full)

colnames(list_df_full) <- c("pollutants", "Estimate","Std", "z","p")

list_df_full$OR = exp(list_df_full$Estimate)
# Wald 95% confidence limits on the odds-ratio scale: exp(estimate +/- 1.96 * SE)
list_df_full$upper = exp(list_df_full$Estimate + 1.96 * list_df_full$Std)
list_df_full$lower = exp(list_df_full$Estimate - 1.96 * list_df_full$Std)

list_df_full$significance = "p-value<0.05"
list_df_full$significance[list_df_full$p > 0.05] <- "p-value>0.05"

write.csv(list_df_full, "data_output/UKB_covidInfection_IORs_resub.csv", row.names=F)

#### Odds ratios and 95% CI

library(ggplot2)
ggplot(list_df_full, 
       aes(x=reorder(pollutants, OR), y=OR, color = significance)) + 
    geom_point(shape=21, size = 2) +
    geom_errorbar(aes(ymin=lower, ymax=upper),
                  width=.2,                    # Width of the error bars
                  position=position_dodge(.9)) +
  theme_classic() + 
  geom_hline(yintercept = 1, linetype="dotted") +
  coord_flip()+ylab("Infectivity odds ratios, resub") + 
  xlab("pollutants")

ggsave('fig_out/UKB_covidInfection_IORs_resub.pdf', width = 6, height = 4)