Marco Travaglio, Yizhou Yu, Rebeka Popovic, Liza Selley, Nuno Santos Leal, and Luis Miguel Martins
MRC Toxicology Unit, University of Cambridge
Correspondence: Yizhou Yu, MRC Toxicology Unit, University of Cambridge (yzy21 [at] mrc-tox.cam.ac.uk)
September 2020
This pipeline relates to the data included in Figures 3 to 5 of our study titled "Links between air pollution and COVID-19 in England", as well as the tables included in the supplementary materials.
The input files for this analysis pipeline are on the master branch of
this GitHub page:
https://github.com/M1gus/AirPollutionCOVID19
This study was first deposited in a preprint server (medRxiv) in April 2020: https://www.medrxiv.org/content/10.1101/2020.04.16.20067405v5
After peer-review, the study was published in the journal “Environmental Pollution” in October 2020:
https://www.sciencedirect.com/science/article/pii/S0269749120365489?via%3Dihub#abs0020
The main aims of the workflow presented here were as follows:
1. Determine the relationship between air pollutants and COVID-19-associated deaths and cases in England.
2. Investigate whether any relationship between air pollution and COVID-19 remains significant in the presence of confounding factors at the regional and subregional levels.
3. Determine the effect of air pollutants on infectivity at the individual level.
Only the UK Biobank data are not available in the repository, as they require a separate application; please visit ukbiobank.ac.uk for more information. A detailed list of the variables used in the UK Biobank analysis is given below:
library(stargazer)
stargazer(read.csv("data/suppl_table_1.csv"), summary=FALSE, rownames=FALSE, type = "html")
| Variable name | UK Biobank ID | Description of the variable |
| --- | --- | --- |
| x\_coord | 20074 | Home location at assessment - east co-ordinate |
| y\_coord | 20075 | Home location at assessment - north co-ordinate |
| townsend | 189 | Townsend deprivation index at recruitment |
| Age | 34 | 2020 minus year of birth |
| whr | 48, 49 | Waist circumference divided by hip circumference |
| highBP | 4079, 4080 | High blood pressure\* |
| WP\_dusty | 22609 | Workplace very dusty\* |
| WP\_chemicals | 22610 | Workplace full of chemical or other fumes\* |
| WP\_cig | 22611 | Workplace had a lot of cigarette smoke from other people smoking\* |
| WP\_diesel | 22615 | Workplace had a lot of diesel exhaust\* |
| breathing | 22616 | Breathing problems during period of job\* |
| whistling | 2316 | Wheeze or whistling in the chest in last year |
| sex | 31 | Sex |
| fev | 20153 | Forced expiratory volume in 1 second (FEV1), predicted\* |
| copd | 22130 | Doctor-diagnosed COPD (chronic obstructive pulmonary disease)\* |
| diabetes | 2443 | Diabetes diagnosed by doctor |
| n\_cancers | 134 | Number of self-reported cancers |
| smoking | 20116 | Smoking status\* |
\*These variables contained too few data points and were excluded from the analysis. Detailed information on every variable can be retrieved by searching for its UK Biobank ID at ukbiobank.ac.uk.
The regional models take the general form `Deaths/cases ~ air pollutants + population`, followed by model comparisons.
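As a reminder of R's formula interface, the models fitted below share this generic shape (this snippet is purely illustrative; the variable names mirror the columns of the regional data frame):

```r
# Generic shape of the regional models: a count outcome regressed on
# pollutant levels plus a population-density term.
f <- deaths_preL ~ NO.levels + NO2.levels + O3.levels + Average_Pop_density_personkm2
class(f)      # "formula"
all.vars(f)   # the variables the model will look up in the data frame
```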
library(MASS)
library(stargazer)
library(ggplot2)
library(dplyr)
library(tidyverse)
library(purrr)
library(AER)
library(reshape)
library(rgdal) # for spTransform, requires sudo apt install libgdal-dev
require(stringr)
library(sp)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
library(rgeos)
Please note that the following system packages may need to be installed (on Ubuntu, this can be done with `apt-get`):

- liblapack-dev
- liblapack3
- libopenblas-base
- libopenblas-dev
- texlive-fonts-recommended
- lmodern
Read and verify the data:
preL_dt = read.csv("data/26-4-2020_yyAIR_COVID_PRE_LD_dt.csv")[,-1]
head(preL_dt)
## Region cases_preL deaths_preL Date_cases
## 1 East Of England 5356 746 NA
## 2 London 16913 2120 NA
## 3 Midlands 10501 1491 NA
## 4 North East And Yorkshire 8004 893 NA
## 5 North West 9394 847 NA
## 6 South East 8547 783 NA
## Population_size_2018 Average_Pop_density_personkm2 Cases Deaths NO.levels
## 1 6201214 324.0 6499 1448 9.502135
## 2 8908081 5666.0 19511 3522 25.193133
## 3 10704906 380.5 14844 2684 14.529437
## 4 8137524 333.0 10633 1641 16.501209
## 5 7292093 517.0 12093 1801 8.581661
## 6 9133625 479.0 10919 1430 10.646127
## NO2.levels O3.levels
## 1 19.55372 54.36748
## 2 38.51252 36.91391
## 3 24.11985 47.69889
## 4 25.15139 45.29534
## 5 20.06274 48.95081
## 6 20.47013 52.30382
Quickly visualise the distribution of each variable:
require(reshape)
# normalise per column & reshape the data for plotting
preL_dt_scale = as.data.frame(sapply(preL_dt[,-c(1,4)], scale))
preL_dt_scale$Region = preL_dt$Region
preL_dt_long = melt(preL_dt_scale, id.vars = "Region")
ggplot(preL_dt_long, aes (value)) +
geom_density() +
geom_histogram() +
facet_wrap(~variable) +
theme_bw() +
theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
summary(full_lm <- lm(data = preL_dt, deaths_preL ~ Average_Pop_density_personkm2 + NO.levels + NO2.levels + O3.levels))
##
## Call:
## lm(formula = deaths_preL ~ Average_Pop_density_personkm2 + NO.levels +
## NO2.levels + O3.levels, data = preL_dt)
##
## Residuals:
## 1 2 3 4 5 6 7
## -101.701 -1.428 192.949 -187.616 -13.232 44.475 66.554
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.235e+04 4.844e+03 -2.549 0.1256
## Average_Pop_density_personkm2 -6.310e-01 2.544e-01 -2.480 0.1313
## NO.levels -3.326e+02 1.124e+02 -2.958 0.0978 .
## NO2.levels 6.015e+02 1.782e+02 3.375 0.0777 .
## O3.levels 8.824e+01 5.409e+01 1.631 0.2444
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 211.4 on 2 degrees of freedom
## Multiple R-squared: 0.956, Adjusted R-squared: 0.8681
## F-statistic: 10.87 on 4 and 2 DF, p-value: 0.08599
The model is overdispersed, so we use negative binomial and Poisson regressions instead; both are designed for count data. Converting the variables to integers below does not affect the OLS results.
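A quick way to gauge overdispersion in a fitted Poisson GLM is to compare the residual deviance to its degrees of freedom; a ratio well above 1 suggests the negative binomial is the safer choice. A minimal sketch on simulated counts (not the study data):

```r
set.seed(1)
y <- rnbinom(100, mu = 50, size = 2)  # deliberately overdispersed counts
x <- rnorm(100)
fit <- glm(y ~ x, family = "poisson")
# ratio >> 1 indicates overdispersion relative to the Poisson assumption
dispersion <- deviance(fit) / df.residual(fit)
```

The AER package loaded above also provides `dispersiontest()` for a formal test on a Poisson GLM.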
# scale the pollutant levels by 1000 before rounding, so that converting
# them to integers does not discard precision
preL_dt$NO.levels = round(preL_dt$NO.levels * 1000)
preL_dt$NO2.levels = round(preL_dt$NO2.levels * 1000)
preL_dt$O3.levels = round(preL_dt$O3.levels * 1000)
preL_dt$Average_Pop_density_personkm2 = round(preL_dt$Average_Pop_density_personkm2)
It is important to ensure that all model variables are integers:
preL_dt_int = as.data.frame(sapply(preL_dt[,2:ncol(preL_dt)], as.integer))
preL_dt_int$Region = preL_dt$Region
preL_dt_int
## cases_preL deaths_preL Date_cases Population_size_2018
## 1 5356 746 NA 6201214
## 2 16913 2120 NA 8908081
## 3 10501 1491 NA 10704906
## 4 8004 893 NA 8137524
## 5 9394 847 NA 7292093
## 6 8547 783 NA 9133625
## 7 2898 368 NA 5599735
## Average_Pop_density_personkm2 Cases Deaths NO.levels NO2.levels O3.levels
## 1 324 6499 1448 9502 19554 54367
## 2 5666 19511 3522 25193 38513 36914
## 3 380 14844 2684 14529 24120 47699
## 4 333 10633 1641 16501 25151 45295
## 5 517 12093 1801 8582 20063 48951
## 6 479 10919 1430 10646 20470 52304
## 7 235 3913 608 14047 21991 48054
## Region
## 1 East Of England
## 2 London
## 3 Midlands
## 4 North East And Yorkshire
## 5 North West
## 6 South East
## 7 South West
### Number of deaths
summary(full_lm.nb <- glm.nb(data = preL_dt_int, deaths_preL ~ Average_Pop_density_personkm2 + NO.levels + NO2.levels + O3.levels))
##
## Call:
## glm.nb(formula = deaths_preL ~ Average_Pop_density_personkm2 +
## NO.levels + NO2.levels + O3.levels, data = preL_dt_int, init.theta = 196.4212515,
## link = log)
##
## Deviance Residuals:
## 1 2 3 4 5 6 7
## -1.50927 -0.04204 1.13015 -1.12365 -0.18404 1.44286 0.13514
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.090e+01 1.890e+00 -5.767 8.07e-09 ***
## Average_Pop_density_personkm2 -9.331e-04 9.829e-05 -9.494 < 2e-16 ***
## NO.levels -4.728e-04 4.487e-05 -10.539 < 2e-16 ***
## NO2.levels 8.135e-04 7.086e-05 11.482 < 2e-16 ***
## O3.levels 1.200e-04 2.068e-05 5.802 6.56e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(196.4213) family taken to be 1)
##
## Null deviance: 294.5029 on 6 degrees of freedom
## Residual deviance: 6.9535 on 2 degrees of freedom
## AIC: 91.745
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 196
## Std. Err.: 129
##
## 2 x log-likelihood: -79.745
summary(full_lm.p <- glm(data = preL_dt_int, deaths_preL ~ Average_Pop_density_personkm2 + NO.levels + NO2.levels + O3.levels, family = "poisson"))
##
## Call:
## glm(formula = deaths_preL ~ Average_Pop_density_personkm2 + NO.levels +
## NO2.levels + O3.levels, family = "poisson", data = preL_dt_int)
##
## Deviance Residuals:
## 1 2 3 4 5 6 7
## -3.4193 -0.0597 2.5243 -2.8870 -0.4216 3.3526 0.8474
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.159e+01 9.091e-01 -12.75 <2e-16 ***
## Average_Pop_density_personkm2 -9.771e-04 4.570e-05 -21.38 <2e-16 ***
## NO.levels -4.916e-04 2.301e-05 -21.36 <2e-16 ***
## NO2.levels 8.457e-04 3.571e-05 23.68 <2e-16 ***
## O3.levels 1.247e-04 9.462e-06 13.18 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1833.017 on 6 degrees of freedom
## Residual deviance: 38.539 on 2 degrees of freedom
## AIC: 109.09
##
## Number of Fisher Scoring iterations: 4
#pchisq(2 * (logLik(full_lm.nb) - logLik(full_lm.p)), df = 1, lower.tail = FALSE)
#(est <- cbind(Estimate = coef(full_lm.nb), confint(full_lm.nb)))
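The first commented line above hints at a likelihood-ratio test between the Poisson and negative binomial fits. A toy version on simulated data (not the study data): the negative binomial nests the Poisson as theta goes to infinity, so twice the log-likelihood difference is referred to a chi-squared distribution with 1 degree of freedom.

```r
library(MASS)  # for glm.nb

set.seed(1)
y <- rnbinom(200, mu = 30, size = 3)  # overdispersed counts
x <- rnorm(200)
fit_p  <- glm(y ~ x, family = "poisson")
fit_nb <- glm.nb(y ~ x)
# a small p-value favours the negative binomial over the Poisson
p_val <- pchisq(2 * as.numeric(logLik(fit_nb) - logLik(fit_p)),
                df = 1, lower.tail = FALSE)
```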
summary(cases_lm <- lm(data = preL_dt, cases_preL ~ Average_Pop_density_personkm2 + NO.levels + NO2.levels + O3.levels))
##
## Call:
## lm(formula = cases_preL ~ Average_Pop_density_personkm2 + NO.levels +
## NO2.levels + O3.levels, data = preL_dt)
##
## Residuals:
## 1 2 3 4 5 6 7
## -1610.04 -62.57 495.92 -375.69 -200.13 2067.35 -314.85
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.887e+04 4.405e+04 -1.337 0.313
## Average_Pop_density_personkm2 -3.805e+00 2.313e+00 -1.645 0.242
## NO.levels -2.778e+00 1.022e+00 -2.717 0.113
## NO2.levels 4.118e+00 1.620e+00 2.541 0.126
## O3.levels 2.380e-01 4.921e-01 0.484 0.676
##
## Residual standard error: 1923 on 2 degrees of freedom
## Multiple R-squared: 0.9365, Adjusted R-squared: 0.8095
## F-statistic: 7.373 on 4 and 2 DF, p-value: 0.123
summary(cases_lm.nb <- glm.nb(data = preL_dt_int, cases_preL ~ Average_Pop_density_personkm2 + NO.levels + NO2.levels + O3.levels))
##
## Call:
## glm.nb(formula = cases_preL ~ Average_Pop_density_personkm2 +
## NO.levels + NO2.levels + O3.levels, data = preL_dt_int, init.theta = 33.53196858,
## link = log)
##
## Deviance Residuals:
## 1 2 3 4 5 6 7
## -1.62262 -0.06666 -0.12959 0.24412 -0.18307 1.97477 -0.62489
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.451e+00 3.968e+00 -1.122 0.261994
## Average_Pop_density_personkm2 -7.942e-04 2.084e-04 -3.811 0.000138 ***
## NO.levels -4.916e-04 9.215e-05 -5.334 9.60e-08 ***
## NO2.levels 7.404e-04 1.461e-04 5.069 4.01e-07 ***
## O3.levels 6.958e-05 4.430e-05 1.570 0.116300
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(33.532) family taken to be 1)
##
## Null deviance: 55.8888 on 6 degrees of freedom
## Residual deviance: 7.0375 on 2 degrees of freedom
## AIC: 132.87
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 33.5
## Std. Err.: 17.9
##
## 2 x log-likelihood: -120.868
summary(cases_lm.p <- glm(data = preL_dt_int, cases_preL ~ Average_Pop_density_personkm2 + NO.levels + NO2.levels + O3.levels, family = "poisson"))
##
## Call:
## glm(formula = cases_preL ~ Average_Pop_density_personkm2 + NO.levels +
## NO2.levels + O3.levels, family = "poisson", data = preL_dt_int)
##
## Deviance Residuals:
## 1 2 3 4 5 6 7
## -20.3835 -0.5511 -0.2285 1.7246 -2.0609 28.5914 -10.4853
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.934e+00 3.006e-01 -9.761 <2e-16 ***
## Average_Pop_density_personkm2 -7.301e-04 1.543e-05 -47.324 <2e-16 ***
## NO.levels -4.524e-04 7.507e-06 -60.273 <2e-16 ***
## NO2.levels 6.787e-04 1.186e-05 57.231 <2e-16 ***
## O3.levels 5.611e-05 3.104e-06 18.080 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 13239.2 on 6 degrees of freedom
## Residual deviance: 1350.5 on 2 degrees of freedom
## AIC: 1436.1
##
## Number of Fisher Scoring iterations: 4
Cases data: table of model estimates
stargazer(cases_lm.p, cases_lm.nb,type="html",
dep.var.labels="Number of cases until 8 April 2020",
ci=TRUE, ci.level=0.95, single.row=TRUE)
Dependent variable: Number of cases until 8 April 2020

| | Poisson | Negative binomial |
| --- | --- | --- |
| | (1) | (2) |
| Average\_Pop\_density\_personkm2 | -0.001\*\*\* (-0.001, -0.001) | -0.001\*\*\* (-0.001, -0.0004) |
| NO.levels | -0.0005\*\*\* (-0.0005, -0.0004) | -0.0005\*\*\* (-0.001, -0.0003) |
| NO2.levels | 0.001\*\*\* (0.001, 0.001) | 0.001\*\*\* (0.0005, 0.001) |
| O3.levels | 0.0001\*\*\* (0.0001, 0.0001) | 0.0001 (-0.00002, 0.0002) |
| Constant | -2.934\*\*\* (-3.523, -2.345) | -4.451 (-12.229, 3.327) |
| Observations | 7 | 7 |
| Log Likelihood | -713.041 | -61.434 |
| theta | | 33.532\* (17.929) |
| Akaike Inf. Crit. | 1,436.081 | 132.868 |

Note: \*p\<0.1; \*\*p\<0.05; \*\*\*p\<0.01
Each column of the table corresponds to a different type of
regression model as indicated at the top. The raw estimate values of
each model are listed with their 95% confidence intervals in
parentheses. The p-values are indicated using the number of asterisks
beside the estimates. OLS, ordinary least squares;
Average_Pop_densitykm2, average population density per square
kilometer; NO.levels, nitrogen oxide levels; NO2.levels, nitrogen
dioxide levels; O3.levels, ozone levels; Akaike Inf. Crit., Akaike’s
Information Criteria; Residual Std. Error, Residual standard error.
stargazer(full_lm.p, full_lm.nb,type="html",
dep.var.labels="Number of deaths until 8 April 2020",
ci=TRUE, ci.level=0.95, single.row=TRUE)
Dependent variable: Number of deaths until 8 April 2020

| | Poisson | Negative binomial |
| --- | --- | --- |
| | (1) | (2) |
| Average\_Pop\_density\_personkm2 | -0.001\*\*\* (-0.001, -0.001) | -0.001\*\*\* (-0.001, -0.001) |
| NO.levels | -0.0005\*\*\* (-0.001, -0.0004) | -0.0005\*\*\* (-0.001, -0.0004) |
| NO2.levels | 0.001\*\*\* (0.001, 0.001) | 0.001\*\*\* (0.001, 0.001) |
| O3.levels | 0.0001\*\*\* (0.0001, 0.0001) | 0.0001\*\*\* (0.0001, 0.0002) |
| Constant | -11.590\*\*\* (-13.372, -9.809) | -10.900\*\*\* (-14.604, -7.195) |
| Observations | 7 | 7 |
| Log Likelihood | -49.547 | -40.873 |
| theta | | 196.421 (128.791) |
| Akaike Inf. Crit. | 109.094 | 91.745 |

Note: \*p\<0.1; \*\*p\<0.05; \*\*\*p\<0.01
Each column of the table corresponds to a different type of
regression model as indicated at the top. The raw estimate values of
each model are listed with their 95% confidence intervals in
parentheses. The p-values are indicated using the number of asterisks
beside the estimates. OLS, ordinary least squares;
Average_Pop_densitykm2, average population density per square
kilometer; NO.levels, nitrogen oxide levels; NO2.levels, nitrogen
dioxide levels; O3.levels, ozone levels; Akaike Inf. Crit., Akaike’s
Information Criteria; Residual Std. Error, Residual standard error.
Aim: Analyse the effect of air pollutants on COVID cases and deaths at the regional level
pop_dens = read.csv("data/2018_official_popDensity.csv")[c("Code","X2018.people.per.sq..km")]
earnings =read.csv("data/ann_earning_2018_perLA.csv")[c("Code","Mean_ann_earnings")]
age = read.csv("data/processed_median_age_of_population_perLA.csv")[c("Code","median_age_2018","Name")]
covid_deaths = read.csv("data/covid_deaths_until10April_byAreaCode.csv")
covid_deaths$total_deaths = covid_deaths$Home + covid_deaths$Hospital + covid_deaths$Care.home + covid_deaths$Hospice + covid_deaths$Other.communal.establishment + covid_deaths$Elsewhere
covid_deaths = covid_deaths[c("Area.code","total_deaths")]
colnames(covid_deaths) <- c("Code","deaths")
nrow(pop_dens)
## [1] 435
nrow(earnings)
## [1] 422
nrow(age)
## [1] 433
nrow(covid_deaths)
## [1] 346
#merge data
merged_covid_dt_LA = merge(covid_deaths, pop_dens, by = "Code")
merged_covid_dt_LA = merge(merged_covid_dt_LA, earnings, by = "Code")
merged_covid_dt_LA = merge(merged_covid_dt_LA, age, by = "Code")
nrow(merged_covid_dt_LA)
## [1] 334
Now, we will use a python script (in the form of a Jupyter Notebook)
to match air pollution data to local authorities.
#### Analysis

##### Data visualisation for model selection
Load data and set correct formats
# load data
colnames(merged_covid_dt_LA)
## [1] "Code" "deaths"
## [3] "X2018.people.per.sq..km" "Mean_ann_earnings"
## [5] "median_age_2018" "Name"
merged_covid_dt_LA$X2018.people.per.sq..km = as.numeric(gsub(",","",merged_covid_dt_LA$X2018.people.per.sq..km))
merged_covid_dt_LA$Mean_ann_earnings = as.numeric(gsub(",","",merged_covid_dt_LA$Mean_ann_earnings))
write.csv(merged_covid_dt_LA, "data_output/merged_covid_cov_dt_LA.csv")
Re-read the merged data here, now with air pollution values matched to each local authority:
covid_air_dt = read.csv("data_output/merged_covidAir_cov_dt_LA.csv", na.strings = "x")
covid_air_dt$X2018.people.per.sq..km = as.numeric(gsub(",","",covid_air_dt$X2018.people.per.sq..km))
covid_air_dt$Mean_ann_earnings = as.numeric(gsub(",","",covid_air_dt$Mean_ann_earnings))
Make sure that only England data are included (area codes starting with 'E'):
covid_air_dt = covid_air_dt[startsWith(as.character(covid_air_dt$Code), 'E'),]
nrow(covid_air_dt)
## [1] 312
Visualise all numeric data
covid_air_dt_vis = subset(covid_air_dt, select=c(deaths, X2018.people.per.sq..km, Mean_ann_earnings,median_age_2018,pm25_val,no2_val,o3_val,
pm10_val,so2_val, nox_val))
covid_air_dt_vis = as.data.frame(sapply(covid_air_dt_vis, function(x) as.numeric(x) ) )
#normalise per column & reshape the data for plotting
covid_air_dt_vis = as.data.frame(sapply(covid_air_dt_vis, scale))
covid_air_dt_vis$Code = covid_air_dt$Code
covid_air_dt_vis_long = melt(covid_air_dt_vis, id.vars = "Code")
ggplot(covid_air_dt_vis_long, aes (value)) +
geom_histogram() +
geom_density() +
facet_wrap(~variable) +
theme_bw() +
theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Negative binomial regression model
pm25_deaths.nb <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km +
Mean_ann_earnings + median_age_2018 + pm25_val)
car::vif(pm25_deaths.nb)
## X2018.people.per.sq..km Mean_ann_earnings median_age_2018
## 2.080989 1.441089 2.135625
## pm25_val
## 1.950479
pm10_deaths.nb <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km +
Mean_ann_earnings + median_age_2018 + pm10_val)
car::vif(pm10_deaths.nb)
## X2018.people.per.sq..km Mean_ann_earnings median_age_2018
## 2.099725 1.385040 1.950317
## pm10_val
## 1.667899
nox_deaths.nb <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km +
Mean_ann_earnings + median_age_2018 + nox_val)
car::vif(nox_deaths.nb)
## X2018.people.per.sq..km Mean_ann_earnings median_age_2018
## 2.529754 1.277034 2.128073
## nox_val
## 2.650417
no2_deaths.nb <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km +
Mean_ann_earnings + median_age_2018 + no2_val)
car::vif(no2_deaths.nb)
## X2018.people.per.sq..km Mean_ann_earnings median_age_2018
## 2.417227 1.263175 2.314129
## no2_val
## 2.736928
o3_deaths.nb <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km +
Mean_ann_earnings + median_age_2018 + o3_val)
car::vif(o3_deaths.nb)
## X2018.people.per.sq..km Mean_ann_earnings median_age_2018
## 2.094533 1.278499 1.827214
## o3_val
## 1.148228
#summary(so2_deaths.nb <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km + #
# Mean_ann_earnings + median_age_2018 + so2_val))
#car::vif(so2_deaths.nb)
We note that annual earnings are not significant in any of these models, so we remove this variable:
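One way to check that dropping a covariate is harmless is a nested-model likelihood-ratio test. A toy sketch on simulated data (with `x2` standing in for a covariate with no true effect, like earnings here):

```r
set.seed(42)
x1 <- rnorm(300)
x2 <- rnorm(300)                      # no true effect on the outcome
y  <- rpois(300, exp(1 + 0.4 * x1))
full    <- glm(y ~ x1 + x2, family = "poisson")
reduced <- glm(y ~ x1, family = "poisson")
# the Chisq p-value tests whether x2 significantly improves the fit
lrt <- anova(reduced, full, test = "Chisq")
```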
# glm -earnings, since not significant
summary(pm25_deaths.nb_red <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km +
median_age_2018 + pm25_val))
car::vif(pm25_deaths.nb_red)
summary(pm10_deaths.nb_red <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km +
median_age_2018 + pm10_val))
car::vif(pm10_deaths.nb_red)
summary(nox_deaths.nb_red <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km +
median_age_2018 + nox_val))
car::vif(nox_deaths.nb_red)
summary(no2_deaths.nb_red <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km +
median_age_2018 + no2_val))
car::vif(no2_deaths.nb_red)
summary(o3_deaths.nb_red <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km +
median_age_2018 + o3_val))
car::vif(o3_deaths.nb_red)
summary(so2_deaths.nb_red <- glm.nb(data = covid_air_dt, deaths ~ X2018.people.per.sq..km +
median_age_2018 + so2_val))
car::vif(so2_deaths.nb_red)
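The coefficients of these log-link models are exponentiated below to obtain mortality rate ratios with 95% confidence intervals. The logic, shown on a toy fit (illustrative data only):

```r
# For a log-link count model, exp(beta) is the multiplicative change in
# the expected count per unit increase in the predictor (a rate ratio).
fit <- glm(c(2, 4, 8, 16) ~ c(0, 1, 2, 3), family = "poisson")
rr  <- exp(coef(fit)[[2]])
rr  # 2: each unit step doubles the expected count
```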
pm25_deaths.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(pm25_deaths.nb), confint(pm25_deaths.nb))), p_value = summary(pm25_deaths.nb)$coefficients[,4]))
pm10_deaths.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(pm10_deaths.nb), confint(pm10_deaths.nb))), p_value = summary(pm10_deaths.nb)$coefficients[,4]))
nox_deaths.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(nox_deaths.nb), confint(nox_deaths.nb))), p_value = summary(nox_deaths.nb)$coefficients[,4]))
no2_deaths.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(no2_deaths.nb), confint(no2_deaths.nb))), p_value = summary(no2_deaths.nb)$coefficients[,4]))
o3_deaths.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(o3_deaths.nb), confint(o3_deaths.nb))), p_value = summary(o3_deaths.nb)$coefficients[,4]))
#so2_deaths.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(so2_deaths.nb), confint(so2_deaths.nb))), p_value = summary(so2_deaths.nb)$coefficients[,4]))
#make a df just with the pollutants
LA_covid_onlyPoll = data.frame(rbind(pm25_deaths.nb_mrr[nrow(pm25_deaths.nb_mrr),],
pm10_deaths.nb_mrr[nrow(pm10_deaths.nb_mrr),],
nox_deaths.nb_mrr[nrow(nox_deaths.nb_mrr),],
no2_deaths.nb_mrr[nrow(no2_deaths.nb_mrr),],
o3_deaths.nb_mrr[nrow(o3_deaths.nb_mrr),]))
Label the pollutant estimates and flag their significance:
LA_covid_onlyPoll$names = row.names(LA_covid_onlyPoll)
LA_covid_onlyPoll$significance = "p-value > 0.05"
LA_covid_onlyPoll$significance[LA_covid_onlyPoll$p_value < 0.05] <- "p-value < 0.05"
write.csv(LA_covid_onlyPoll, "data_output/LA_covid_onlyPoll_preL_AP2018.csv", row.names = F)
Plot the mortality rate ratios:
ggplot(LA_covid_onlyPoll, aes(x=reorder(names, OR), y=OR, color=significance)) +
geom_point(fill="white", shape=21, size = 2) +
geom_errorbar(aes(ymin=X2.5.., ymax=X97.5..),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
theme_classic() +
geom_hline(yintercept = 1, linetype="dotted") +
coord_flip()+ylab("Mortality rate ratios") +
xlab("Pollutants")+
ylim(0.75, 1.6)
ggsave('fig_out/LA_deaths_MRR.pdf')
cases_LA_raw = read.csv("data/coronavirus-cases_latest-18_5_2020.csv")
cases_LA_dt = subset(cases_LA_raw, Specimen.date == "2020-04-10", select = c(Area.code, Cumulative.lab.confirmed.cases))
cases_LA_dt.agg <-aggregate(cases_LA_dt, by=list(cases_LA_dt$Area.code),
FUN=mean, na.rm=TRUE)
nrow(cases_LA_dt.agg)
## [1] 342
Merge cases data to main data
deaths_LA_dt = read.csv("data_output/merged_covidAir_cov_dt_LA.csv", na.strings = "x")
deaths_LA_dt$X2018.people.per.sq..km = gsub(",", "", deaths_LA_dt$X2018.people.per.sq..km)
deaths_LA_dt$X2018.people.per.sq..km = as.numeric(deaths_LA_dt$X2018.people.per.sq..km)
deaths_LA_dt$Mean_ann_earnings = gsub(",", "", deaths_LA_dt$Mean_ann_earnings)
deaths_LA_dt$Mean_ann_earnings = as.numeric(deaths_LA_dt$Mean_ann_earnings)
cases_deaths_LA_dt = merge(deaths_LA_dt, cases_LA_dt.agg, by.x = "Code",by.y = "Group.1")
nrow(cases_deaths_LA_dt)
## [1] 301
Only 301 observations were matched.
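To see which area codes are dropped by an inner merge like the one above, a left merge (`all.x = TRUE`) keeps all rows and leaves `NA` where nothing matched. A toy illustration with hypothetical codes (not the study files):

```r
deaths <- data.frame(Code = c("E0001", "E0002", "E0003"), deaths = c(5, 8, 2))
cases  <- data.frame(Code = c("E0001", "E0003"), cases = c(40, 11))
left   <- merge(deaths, cases, by = "Code", all.x = TRUE)
# codes present in deaths but absent from cases show up as NA
unmatched <- left$Code[is.na(left$cases)]
unmatched  # "E0002"
```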
Visualisation
ggplot(data = cases_deaths_LA_dt, aes(x = Cumulative.lab.confirmed.cases))+
geom_histogram(stat="count")
cases_deaths_LA_dt$X2018.people.per.sq..km = as.numeric(cases_deaths_LA_dt$X2018.people.per.sq..km)
cases_deaths_LA_dt$Mean_ann_earnings = as.numeric(cases_deaths_LA_dt$Mean_ann_earnings)
Model
summary(pm25_cases.nb <- glm.nb(data = cases_deaths_LA_dt, Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km +
Mean_ann_earnings + median_age_2018 + pm25_val))
##
## Call:
## glm.nb(formula = Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km +
## Mean_ann_earnings + median_age_2018 + pm25_val, data = cases_deaths_LA_dt,
## init.theta = 2.724975873, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.8555 -0.8682 -0.3414 0.2772 2.8464
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.593e+00 5.567e-01 17.232 < 2e-16 ***
## X2018.people.per.sq..km 9.626e-05 1.953e-05 4.930 8.21e-07 ***
## Mean_ann_earnings 6.492e-06 5.788e-06 1.122 0.261996
## median_age_2018 -9.122e-02 1.037e-02 -8.800 < 2e-16 ***
## pm25_val -9.629e-02 2.770e-02 -3.476 0.000509 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(2.725) family taken to be 1)
##
## Null deviance: 562.55 on 294 degrees of freedom
## Residual deviance: 311.82 on 290 degrees of freedom
## (6 observations deleted due to missingness)
## AIC: 3583.7
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 2.725
## Std. Err.: 0.216
##
## 2 x log-likelihood: -3571.745
car::vif(pm25_cases.nb)
## X2018.people.per.sq..km Mean_ann_earnings median_age_2018
## 2.085624 1.427671 2.125130
## pm25_val
## 1.933232
summary(pm10_cases.nb <- glm.nb(data = cases_deaths_LA_dt, Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km +
Mean_ann_earnings + median_age_2018 + pm10_val))
##
## Call:
## glm.nb(formula = Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km +
## Mean_ann_earnings + median_age_2018 + pm10_val, data = cases_deaths_LA_dt,
## init.theta = 2.745559885, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.8501 -0.8823 -0.3287 0.2934 2.9348
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.513e+00 5.216e-01 18.238 < 2e-16 ***
## X2018.people.per.sq..km 1.002e-04 1.956e-05 5.122 3.03e-07 ***
## Mean_ann_earnings 5.806e-06 5.650e-06 1.028 0.304083
## median_age_2018 -8.743e-02 9.861e-03 -8.866 < 2e-16 ***
## pm10_val -6.748e-02 1.762e-02 -3.831 0.000128 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(2.7456) family taken to be 1)
##
## Null deviance: 566.73 on 294 degrees of freedom
## Residual deviance: 311.67 on 290 degrees of freedom
## (6 observations deleted due to missingness)
## AIC: 3581.3
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 2.746
## Std. Err.: 0.218
##
## 2 x log-likelihood: -3569.297
car::vif(pm10_cases.nb)
## X2018.people.per.sq..km Mean_ann_earnings median_age_2018
## 2.109349 1.370273 1.937009
## pm10_val
## 1.650101
summary(nox_cases.nb <- glm.nb(data = cases_deaths_LA_dt, Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km +
Mean_ann_earnings + median_age_2018 + nox_val))
##
## Call:
## glm.nb(formula = Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km +
## Mean_ann_earnings + median_age_2018 + nox_val, data = cases_deaths_LA_dt,
## init.theta = 2.721556683, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.7665 -0.8356 -0.3858 0.3034 3.4862
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.409e+00 4.954e-01 14.956 < 2e-16 ***
## X2018.people.per.sq..km 5.246e-05 2.220e-05 2.363 0.01813 *
## Mean_ann_earnings -4.812e-06 5.385e-06 -0.894 0.37145
## median_age_2018 -5.803e-02 1.051e-02 -5.522 3.35e-08 ***
## nox_val 1.740e-02 5.344e-03 3.255 0.00113 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(2.7216) family taken to be 1)
##
## Null deviance: 561.85 on 294 degrees of freedom
## Residual deviance: 312.05 on 290 degrees of freedom
## (6 observations deleted due to missingness)
## AIC: 3584.4
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 2.722
## Std. Err.: 0.216
##
## 2 x log-likelihood: -3572.364
car::vif(nox_cases.nb)
## X2018.people.per.sq..km Mean_ann_earnings median_age_2018
## 2.692080 1.233985 2.182823
## nox_val
## 2.905970
summary(no2_cases.nb <- glm.nb(data = cases_deaths_LA_dt, Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km +
Mean_ann_earnings + median_age_2018 + no2_val))
##
## Call:
## glm.nb(formula = Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km +
## Mean_ann_earnings + median_age_2018 + no2_val, data = cases_deaths_LA_dt,
## init.theta = 2.729294735, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.7594 -0.8476 -0.3940 0.3071 3.5818
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.190e+00 5.306e-01 13.550 < 2e-16 ***
## X2018.people.per.sq..km 5.698e-05 2.142e-05 2.661 0.00780 **
## Mean_ann_earnings -4.853e-06 5.365e-06 -0.904 0.36574
## median_age_2018 -5.476e-02 1.094e-02 -5.007 5.52e-07 ***
## no2_val 3.005e-02 9.141e-03 3.287 0.00101 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(2.7293) family taken to be 1)
##
## Null deviance: 563.42 on 294 degrees of freedom
## Residual deviance: 312.03 on 290 degrees of freedom
## (6 observations deleted due to missingness)
## AIC: 3583.5
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 2.729
## Std. Err.: 0.216
##
## 2 x log-likelihood: -3571.468
car::vif(no2_cases.nb)
## X2018.people.per.sq..km Mean_ann_earnings median_age_2018
## 2.512648 1.228516 2.370271
## no2_val
## 2.910421
summary(o3_cases.nb <- glm.nb(data = cases_deaths_LA_dt, Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km +
Mean_ann_earnings + median_age_2018 + o3_val))
##
## Call:
## glm.nb(formula = Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km +
## Mean_ann_earnings + median_age_2018 + o3_val, data = cases_deaths_LA_dt,
## init.theta = 2.808347379, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.9822 -0.8344 -0.3153 0.3226 2.8354
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 8.206e+00 4.265e-01 19.241 < 2e-16 ***
## X2018.people.per.sq..km 6.994e-05 1.934e-05 3.616 0.000299 ***
## Mean_ann_earnings 7.689e-06 5.480e-06 1.403 0.160572
## median_age_2018 -6.928e-02 9.456e-03 -7.327 2.36e-13 ***
## o3_val -5.259e-02 1.111e-02 -4.733 2.21e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(2.8083) family taken to be 1)
##
## Null deviance: 579.47 on 294 degrees of freedom
## Residual deviance: 311.40 on 290 degrees of freedom
## (6 observations deleted due to missingness)
## AIC: 3574.1
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 2.808
## Std. Err.: 0.223
##
## 2 x log-likelihood: -3562.114
car::vif(o3_cases.nb)
## X2018.people.per.sq..km Mean_ann_earnings median_age_2018
## 2.108263 1.318626 1.820923
## o3_val
## 1.177405
summary(so2_cases.nb <- glm.nb(data = cases_deaths_LA_dt, Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km +
Mean_ann_earnings + median_age_2018 + so2_val))
##
## Call:
## glm.nb(formula = Cumulative.lab.confirmed.cases ~ X2018.people.per.sq..km +
## Mean_ann_earnings + median_age_2018 + so2_val, data = cases_deaths_LA_dt,
## init.theta = 2.717269061, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.7230 -0.8505 -0.3570 0.3183 3.4351
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.239e+00 5.311e-01 13.629 < 2e-16 ***
## X2018.people.per.sq..km 7.709e-05 1.964e-05 3.925 8.66e-05 ***
## Mean_ann_earnings 2.288e-06 5.451e-06 0.420 0.67462
## median_age_2018 -6.018e-02 1.036e-02 -5.811 6.20e-09 ***
## so2_val 2.370e-01 7.467e-02 3.174 0.00151 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(2.7173) family taken to be 1)
##
## Null deviance: 560.98 on 294 degrees of freedom
## Residual deviance: 312.07 on 290 degrees of freedom
## (6 observations deleted due to missingness)
## AIC: 3584.9
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 2.717
## Std. Err.: 0.215
##
## 2 x log-likelihood: -3572.864
car::vif(so2_cases.nb)
## X2018.people.per.sq..km Mean_ann_earnings median_age_2018
## 2.104075 1.262575 2.115436
## so2_val
## 1.525447
stargazer(pm25_cases.nb, pm10_cases.nb, nox_cases.nb, no2_cases.nb,
o3_cases.nb, so2_cases.nb,type="html",out = "fig_out/LA_covid_allPoll_CASES_nb.html",
dep.var.labels="Number of COVID-19-related cases",
single.row=TRUE)
Dependent variable: | ||||||
Number of COVID-19-related cases | ||||||
(1) | (2) | (3) | (4) | (5) | (6) | |
X2018.people.per.sq..km | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.0001\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) |
Mean\_ann\_earnings | 0.00001 (0.00001) | 0.00001 (0.00001) | \-0.00000 (0.00001) | \-0.00000 (0.00001) | 0.00001 (0.00001) | 0.00000 (0.00001) |
median\_age\_2018 | \-0.091\*\*\* (0.010) | \-0.087\*\*\* (0.010) | \-0.058\*\*\* (0.011) | \-0.055\*\*\* (0.011) | \-0.069\*\*\* (0.009) | \-0.060\*\*\* (0.010) |
pm25\_val | \-0.096\*\*\* (0.028) | |||||
pm10\_val | \-0.067\*\*\* (0.018) | |||||
nox\_val | 0.017\*\*\* (0.005) | |||||
no2\_val | 0.030\*\*\* (0.009) | |||||
o3\_val | \-0.053\*\*\* (0.011) | |||||
so2\_val | 0.237\*\*\* (0.075) | |||||
Constant | 9.593\*\*\* (0.557) | 9.513\*\*\* (0.522) | 7.409\*\*\* (0.495) | 7.190\*\*\* (0.531) | 8.206\*\*\* (0.426) | 7.239\*\*\* (0.531) |
Observations | 295 | 295 | 295 | 295 | 295 | 295 |
Log Likelihood | \-1,786.872 | \-1,785.649 | \-1,787.182 | \-1,786.734 | \-1,782.057 | \-1,787.432 |
theta | 2.725\*\*\* (0.216) | 2.746\*\*\* (0.218) | 2.722\*\*\* (0.216) | 2.729\*\*\* (0.216) | 2.808\*\*\* (0.223) | 2.717\*\*\* (0.215) |
Akaike Inf. Crit. | 3,583.745 | 3,581.297 | 3,584.364 | 3,583.468 | 3,574.114 | 3,584.864 |
Note: | *p\<0.1; **p\<0.05; ***p\<0.01 |
pm25_cases.nb_irr = data.frame(cbind(exp(cbind(OR = coef(pm25_cases.nb), confint(pm25_cases.nb))), p_value = summary(pm25_cases.nb)$coefficients[,4]))
pm10_cases.nb_irr = data.frame(cbind(exp(cbind(OR = coef(pm10_cases.nb), confint(pm10_cases.nb))), p_value = summary(pm10_cases.nb)$coefficients[,4]))
nox_cases.nb_irr = data.frame(cbind(exp(cbind(OR = coef(nox_cases.nb), confint(nox_cases.nb))), p_value = summary(nox_cases.nb)$coefficients[,4]))
no2_cases.nb_irr = data.frame(cbind(exp(cbind(OR = coef(no2_cases.nb), confint(no2_cases.nb))), p_value = summary(no2_cases.nb)$coefficients[,4]))
o3_cases.nb_irr = data.frame(cbind(exp(cbind(OR = coef(o3_cases.nb), confint(o3_cases.nb))), p_value = summary(o3_cases.nb)$coefficients[,4]))
so2_cases.nb_irr = data.frame(cbind(exp(cbind(OR = coef(so2_cases.nb), confint(so2_cases.nb))), p_value = summary(so2_cases.nb)$coefficients[,4]))
#make a df just with the pollutants
LA_covid_CASES_onlyPoll = data.frame(rbind(pm25_cases.nb_irr[nrow(pm25_cases.nb_irr),],
pm10_cases.nb_irr[nrow(pm10_cases.nb_irr),],
nox_cases.nb_irr[nrow(nox_cases.nb_irr),],
no2_cases.nb_irr[nrow(no2_cases.nb_irr),],
o3_cases.nb_irr[nrow(o3_cases.nb_irr),],
so2_cases.nb_irr[nrow(so2_cases.nb_irr),]))
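The six `*_irr` assignments above repeat one pattern; it can be wrapped in a small helper (a hedged sketch, not part of the original pipeline; `extract_irr` is a hypothetical name):

```r
library(MASS)  # for glm.nb models and their profile-based confint method

# Exponentiate the coefficients and profile confidence intervals of a fitted
# model and attach the Wald p-values, mirroring the *_irr blocks above.
extract_irr <- function(model) {
  data.frame(cbind(exp(cbind(OR = coef(model), confint(model))),
                   p_value = summary(model)$coefficients[, 4]))
}

# Usage sketch, with the model objects fitted above:
# pm25_cases.nb_irr <- extract_irr(pm25_cases.nb)
```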
Sort the data:
LA_covid_CASES_onlyPoll$names = row.names(LA_covid_CASES_onlyPoll)
LA_covid_CASES_onlyPoll$significance = "p-value > 0.05"
LA_covid_CASES_onlyPoll$significance[LA_covid_CASES_onlyPoll$p_value < 0.05] <- "p-value < 0.05"
write.csv(LA_covid_CASES_onlyPoll, "data_output/LA_covid_CASES_onlyPoll_preL_2018AP.csv", row.names = F)
Plot the rate ratios:
ggplot(LA_covid_CASES_onlyPoll, aes(x=reorder(names, OR), y=OR, color=significance)) +
geom_point(fill="white", shape=21, size = 2) +
geom_errorbar(aes(ymin=X2.5.., ymax=X97.5..),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
theme_classic() +
geom_hline(yintercept = 1, linetype="dotted") +
coord_flip()+ylab("Infectivity rate ratios") +
xlab("Pollutants")+
ylim(0.75, 1.6)
ggsave('fig_out/LA_CASES_odds_IRR.pdf')
Save the numbers:
library(stargazer)
stargazer(LA_covid_CASES_onlyPoll, summary=FALSE, type="html",
          out="fig_out/LA_covid_infectivity_IRR.html")
OR | X2.5.. | X97.5.. | p\_value | names | significance | |
pm25\_val | 0.908 | 0.860 | 0.958 | 0.001 | pm25\_val | p-value \< 0.05 |
pm10\_val | 0.935 | 0.903 | 0.967 | 0.0001 | pm10\_val | p-value \< 0.05 |
nox\_val | 1.018 | 1.008 | 1.028 | 0.001 | nox\_val | p-value \< 0.05 |
no2\_val | 1.031 | 1.014 | 1.047 | 0.001 | no2\_val | p-value \< 0.05 |
o3\_val | 0.949 | 0.928 | 0.970 | 0.00000 | o3\_val | p-value \< 0.05 |
so2\_val | 1.267 | 1.103 | 1.459 | 0.002 | so2\_val | p-value \< 0.05 |
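To read these rate ratios as percent changes in cases per 1 µg/m3 (an illustrative conversion, not part of the original pipeline):

```r
# Percent change in the expected case count per unit increase in a pollutant,
# derived from a negative binomial coefficient: 100 * (exp(beta) - 1).
irr_to_pct <- function(beta) 100 * (exp(beta) - 1)

irr_to_pct(0.237)   # SO2 coefficient above: ~26.7% more cases per ug/m3
irr_to_pct(-0.053)  # O3 coefficient above:  ~5.2% fewer cases per ug/m3
```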
#### Supplementary Table 4. Effect of air pollutants on the number of COVID-related cases at the subregional level
stargazer(pm25_cases.nb, pm10_cases.nb, nox_cases.nb, no2_cases.nb,
o3_cases.nb, so2_cases.nb,type="html",out = "fig_out/LA_covid_allPoll_CASES_nb.html",
dep.var.labels="Number of COVID-19-related cases",
single.row=TRUE)
Dependent variable: | ||||||
Number of COVID-19-related cases | ||||||
(1) | (2) | (3) | (4) | (5) | (6) | |
X2018.people.per.sq..km | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.0001\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) |
Mean\_ann\_earnings | 0.00001 (0.00001) | 0.00001 (0.00001) | \-0.00000 (0.00001) | \-0.00000 (0.00001) | 0.00001 (0.00001) | 0.00000 (0.00001) |
median\_age\_2018 | \-0.091\*\*\* (0.010) | \-0.087\*\*\* (0.010) | \-0.058\*\*\* (0.011) | \-0.055\*\*\* (0.011) | \-0.069\*\*\* (0.009) | \-0.060\*\*\* (0.010) |
pm25\_val | \-0.096\*\*\* (0.028) | |||||
pm10\_val | \-0.067\*\*\* (0.018) | |||||
nox\_val | 0.017\*\*\* (0.005) | |||||
no2\_val | 0.030\*\*\* (0.009) | |||||
o3\_val | \-0.053\*\*\* (0.011) | |||||
so2\_val | 0.237\*\*\* (0.075) | |||||
Constant | 9.593\*\*\* (0.557) | 9.513\*\*\* (0.522) | 7.409\*\*\* (0.495) | 7.190\*\*\* (0.531) | 8.206\*\*\* (0.426) | 7.239\*\*\* (0.531) |
Observations | 295 | 295 | 295 | 295 | 295 | 295 |
Log Likelihood | \-1,786.872 | \-1,785.649 | \-1,787.182 | \-1,786.734 | \-1,782.057 | \-1,787.432 |
theta | 2.725\*\*\* (0.216) | 2.746\*\*\* (0.218) | 2.722\*\*\* (0.216) | 2.729\*\*\* (0.216) | 2.808\*\*\* (0.223) | 2.717\*\*\* (0.215) |
Akaike Inf. Crit. | 3,583.745 | 3,581.297 | 3,584.364 | 3,583.468 | 3,574.114 | 3,584.864 |
Note: | *p\<0.1; **p\<0.05; ***p\<0.01 |
The values in parentheses represent the standard errors. Summary
of the effects of individual air pollutant species on lab-confirmed
COVID-related cases up to April 10th 2020. The three asterisks next to
theta indicate that the models are significantly better than the null model.
stargazer(pm25_deaths.nb, pm10_deaths.nb, nox_deaths.nb, no2_deaths.nb,
o3_deaths.nb,type="html",out = "fig_out/LA_covid_allPoll_nb.html",
dep.var.labels="Number of COVID-19-related deaths",
single.row=TRUE)
Dependent variable: | |||||
Number of COVID-19-related deaths | |||||
(1) | (2) | (3) | (4) | (5) | |
X2018.people.per.sq..km | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) |
Mean\_ann\_earnings | 0.00000 (0.00001) | 0.00001 (0.00001) | \-0.00000 (0.00001) | \-0.00000 (0.00001) | 0.00001 (0.00001) |
median\_age\_2018 | \-0.079\*\*\* (0.012) | \-0.079\*\*\* (0.011) | \-0.055\*\*\* (0.012) | \-0.050\*\*\* (0.012) | \-0.067\*\*\* (0.011) |
pm25\_val | \-0.039 (0.031) | ||||
pm10\_val | \-0.036\* (0.020) | ||||
nox\_val | 0.016\*\*\* (0.006) | ||||
no2\_val | 0.030\*\*\* (0.010) | ||||
o3\_val | \-0.037\*\*\* (0.013) | ||||
Constant | 6.988\*\*\* (0.635) | 7.103\*\*\* (0.597) | 5.541\*\*\* (0.542) | 5.268\*\*\* (0.580) | 6.323\*\*\* (0.489) |
Observations | 306 | 306 | 306 | 306 | 306 |
Log Likelihood | \-1,395.565 | \-1,394.788 | \-1,392.433 | \-1,391.427 | \-1,392.320 |
theta | 2.099\*\*\* (0.171) | 2.109\*\*\* (0.171) | 2.143\*\*\* (0.175) | 2.157\*\*\* (0.176) | 2.142\*\*\* (0.175) |
Akaike Inf. Crit. | 2,801.130 | 2,799.577 | 2,794.865 | 2,792.853 | 2,794.640 |
Note: | *p\<0.1; **p\<0.05; ***p\<0.01 |
Summary of the effects of individual air pollutant species on
lab-confirmed COVID-related deaths up to April 10th 2020. The three
asterisks next to theta indicate that the models are significantly
better than the null model. The MRR (mortality rate ratio) of O3 is
interesting: higher O3 is associated with roughly 3% fewer deaths. NOx,
NO2 and SO2 are all significant predictors; they increase the number of
deaths by 1.3%, 2.4% and 17.2% respectively, per 1 µg/m3.
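These percentages follow directly from the reported coefficients. As a worked check (an illustrative calculation using the O3 estimate of -0.037 from the table above):

```r
# Mortality rate ratio for O3, from the coefficient reported above (-0.037).
# exp(beta) gives the MRR; 100 * (1 - MRR) gives the percent fewer deaths
# per 1 ug/m3 increase in O3.
o3_beta <- -0.037
mrr <- exp(o3_beta)
pct_fewer <- 100 * (1 - mrr)
round(pct_fewer, 1)  # roughly 3-4% fewer deaths per ug/m3
```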
NO2
no2_2014 = read.csv("data/raw_mapno22014.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'no22014')]
no2_2015 = read.csv("data/raw_mapno22015.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'no22015')]
no2_2016 = read.csv("data/raw_mapno22016.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'no22016')]
no2_2017 = read.csv("data/raw_mapno22017.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'no22017')]
no2_2018 = read.csv("data/raw_mapno22018.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'no22018')]
no2_5y = Reduce(merge, list(no2_2014,no2_2015,no2_2016,no2_2017,no2_2018))
no2_5y$no2_5yAvg = (no2_5y$no22014+no2_5y$no22015+no2_5y$no22016+no2_5y$no22017+no2_5y$no22018)/5
no2_5y = no2_5y[ ,c('ukgridcode', 'no22018','no2_5yAvg')]
NOX
nox_2014 = read.csv("data/raw_mapnox2014.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'nox2014')]
nox_2015 = read.csv("data/raw_mapnox2015.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'nox2015')]
nox_2016 = read.csv("data/raw_mapnox2016.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'nox2016')]
nox_2017 = read.csv("data/raw_mapnox2017.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'nox2017')]
nox_2018 = read.csv("data/raw_mapnox2018.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'nox2018')]
nox_5y = Reduce(merge, list(nox_2014,nox_2015,nox_2016,nox_2017,nox_2018))
nox_5y$nox_5yAvg = (nox_5y$nox2014+nox_5y$nox2015+nox_5y$nox2016+nox_5y$nox2017+nox_5y$nox2018)/5
nox_5y = nox_5y[ ,c('ukgridcode', 'nox2018','nox_5yAvg')]
PM25
pm25_2014 = read.csv("data/raw_mappm252014.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm252014g')]
pm25_2015 = read.csv("data/raw_mappm252015.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm252015g')]
pm25_2016 = read.csv("data/raw_mappm252016.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm252016g')]
pm25_2017 = read.csv("data/raw_mappm252017.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm252017g')]
pm25_2018 = read.csv("data/raw_mappm252018.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm252018g')]
pm25_5y = Reduce(merge, list(pm25_2014,pm25_2015,pm25_2016,pm25_2017,pm25_2018))
pm25_5y$pm25_5yAvg = (pm25_5y$pm252014g+pm25_5y$pm252015g+pm25_5y$pm252016g+pm25_5y$pm252017g+pm25_5y$pm252018g)/5
pm25_5y = pm25_5y[ ,c('ukgridcode', 'pm252018g','pm25_5yAvg')]
PM10
pm10_2014 = read.csv("data/raw_mappm102014.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm102014g')]
pm10_2015 = read.csv("data/raw_mappm102015.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm102015g')]
pm10_2016 = read.csv("data/raw_mappm102016.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm102016g')]
pm10_2017 = read.csv("data/raw_mappm102017.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm102017g')]
pm10_2018 = read.csv("data/raw_mappm102018.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'pm102018g')]
pm10_5y = Reduce(merge, list(pm10_2014,pm10_2015,pm10_2016,pm10_2017,pm10_2018))
pm10_5y$pm10_5yAvg = (pm10_5y$pm102014g+pm10_5y$pm102015g+pm10_5y$pm102016g+pm10_5y$pm102017g+pm10_5y$pm102018g)/5
pm10_5y = pm10_5y[ ,c('ukgridcode', 'pm102018g','pm10_5yAvg')]
O3
o3_2014 = read.csv("data/raw_mapo312014.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'dgt12014')]
o3_2015 = read.csv("data/raw_mapo312015.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'dgt12015')]
o3_2016 = read.csv("data/raw_mapo312016.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'dgt12016')]
o3_2017 = read.csv("data/raw_mapo32017.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'dgt12017')]
o3_2018 = read.csv("data/raw_mapo312018.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'dgt12018')]
o3_5y = Reduce(merge, list(o3_2014,o3_2015,o3_2016,o3_2017,o3_2018))
o3_5y$o3_5yAvg = (o3_5y$dgt12014+o3_5y$dgt12015+o3_5y$dgt12016+o3_5y$dgt12017+o3_5y$dgt12018)/5
o3_5y = o3_5y[ ,c('ukgridcode', 'dgt12018','o3_5yAvg')]
ap_dt = Reduce(merge, list(no2_5y,nox_5y,pm25_5y,pm10_5y,o3_5y))
write.csv(ap_dt, "data_output/5Yaverage_AP_PCMdata.csv", row.names = F)
write.csv(na.omit(ap_dt), "data_output/5Yaverage_AP_PCMdata_na.csv", row.names = F)
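The five per-pollutant blocks above repeat the same merge-and-average steps; the core operation can be factored into one function (a hedged sketch with a hypothetical name, `merge_and_average`; `rowMeans` replaces the explicit five-term sum divided by 5):

```r
# Merge a list of per-year data frames on their shared key column and append
# the mean of the yearly value columns, as done manually per pollutant above.
merge_and_average <- function(dfs, value_cols, out_col) {
  merged <- Reduce(merge, dfs)
  merged[[out_col]] <- rowMeans(merged[, value_cols])
  merged
}

# e.g. no2_5y <- merge_and_average(list(no2_2014, no2_2015, no2_2016,
#                                       no2_2017, no2_2018),
#                                  paste0("no2", 2014:2018), "no2_5yAvg")
```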
Convert the X-Y (easting/northing) coordinates to longitude-latitude
ap_dt = read.csv("data_output/5Yaverage_AP_PCMdata_na.csv")
# Add X Y coordinates from any of the data
o3_2014 = read.csv("data/raw_mapo312014.csv", skip = 5, header = T,na.strings="MISSING")[ ,c('ukgridcode', 'x','y')]
ap_dt = merge(ap_dt,o3_2014, by = "ukgridcode")
ap_dt_out = ap_dt[ , -which(names(ap_dt) %in% c("x","y"))]
### convert to lon lat ----
# transform: easting/northing -> long/lat
### shortcuts (from https://stephendavidgregory.github.io/useful/UKgrid_to_LatLon)
ukgrid <- "+init=epsg:27700"
latlong <- "+init=epsg:4326"
### Create coordinates variable
ap_coords <- cbind(Easting = as.numeric(as.character(ap_dt$x)),
Northing = as.numeric(as.character(ap_dt$y)))
### Create the SpatialPointsDataFrame
ap_SP <- SpatialPointsDataFrame(ap_coords,
data = ap_dt,
proj4string = CRS("+init=epsg:27700"))
### Convert
ap_LL <- spTransform(ap_SP, CRS(latlong))
ap_ll_df = data.frame('lon' = coordinates(ap_LL)[, 1], 'lat' = coordinates(ap_LL)[, 2], 'ukgridcode' = ap_LL$ukgridcode)
ap_dt_ll = merge(ap_dt_out,ap_ll_df, by = "ukgridcode")
write.csv(ap_dt_ll,"data_output/processed_allAP_lonlat.csv", row.names = F)
##### Data curation
avgAP_dt = read.csv("data_output/merged_covidAir_cov_LA_5YA.csv", na.strings = "x")
avgAP_dt$X2018.people.per.sq..km = as.numeric(gsub(",","",avgAP_dt$X2018.people.per.sq..km))
avgAP_dt$Mean_ann_earnings = as.numeric(gsub(",","",avgAP_dt$Mean_ann_earnings))
avgAP_dt$X2018.people.per.sq..km = as.numeric(avgAP_dt$X2018.people.per.sq..km)
avgAP_dt$Mean_ann_earnings = as.numeric(avgAP_dt$Mean_ann_earnings)
Additional variables for 5YA
avgEarnings_2014 = na.omit(read.csv("data_output/avgEarnings_2014.csv", na.strings = "x")[,c("Code","avgEarn_2014")])
avgEarnings_2015 = na.omit(read.csv("data_output/avgEarnings_2015.csv", na.strings = "x")[,c("Code","avgEarn_2015")])
avgEarnings_2016 = na.omit(read.csv("data_output/avgEarnings_2016.csv", na.strings = "x")[,c("Code","avgEarn_2016")])
avgEarnings_2017 = na.omit(read.csv("data_output/avgEarnings_2017.csv", na.strings = "x")[,c("Code","avgEarn_2017")])
avgEarnings_2018 = na.omit(read.csv("data_output/avgEarnings_2018.csv", na.strings = "x")[,c("Code","avgEarn_2018")])
avgEarnings_5ya = Reduce(merge, list(avgEarnings_2014,avgEarnings_2015,avgEarnings_2016,avgEarnings_2017,avgEarnings_2018))
avgEarnings_5ya$avgEarn_2014 = as.numeric(gsub(",","",avgEarnings_5ya$avgEarn_2014))
avgEarnings_5ya$avgEarn_2015 = as.numeric(gsub(",","",avgEarnings_5ya$avgEarn_2015))
avgEarnings_5ya$avgEarn_2016 = as.numeric(gsub(",","",avgEarnings_5ya$avgEarn_2016))
avgEarnings_5ya$avgEarn_2017 = as.numeric(gsub(",","",avgEarnings_5ya$avgEarn_2017))
avgEarnings_5ya$avgEarn_2018 = as.numeric(gsub(",","",avgEarnings_5ya$avgEarn_2018))
avgEarnings_5ya$earnings_5ya = (avgEarnings_5ya$avgEarn_2014 + avgEarnings_5ya$avgEarn_2015 + avgEarnings_5ya$avgEarn_2016 + avgEarnings_5ya$avgEarn_2017 + avgEarnings_5ya$avgEarn_2018)/5
avgEarnings_5ya = avgEarnings_5ya[,c("Code","earnings_5ya")]
Pop_density_5ya = read.csv("data/Pop_density_full_dt.csv")[,c("Code","pop_dens_5ya")]
age_5ya = read.csv("data_output/age_5ya.csv")[,c("Code","age_5ya")]
covariates_5ya = Reduce(merge, list(avgEarnings_5ya,Pop_density_5ya,age_5ya))
#avgAP_dt = merge(avgAP_dt, covariates_5ya, by = "Code")
avgAP_dt = merge(avgAP_dt, covariates_5ya[,-3], by = "Code")
avgAP_dt$pop_dens_5ya = as.numeric(gsub(",","",avgAP_dt$pop_dens_5ya))
write.csv(avgAP_dt,"data_output/merged_covidAir_cov_LA_5YA.csv", row.names = F)
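The repeated `as.numeric(gsub(",", "", ...))` calls here (and again later in this section) implement a single operation, removing thousands separators before coercion; a tiny helper (illustrative, with a hypothetical name) makes the intent explicit:

```r
# Columns such as population density and earnings are read as character
# strings with thousands separators ("1,234"); strip the commas and coerce.
strip_commas <- function(x) as.numeric(gsub(",", "", x))

# e.g. avgAP_dt$pop_dens_5ya <- strip_commas(avgAP_dt$pop_dens_5ya)
```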
summary(pm25_deaths_5YA.nb <- glm.nb(data = avgAP_dt, deaths ~ pop_dens_5ya +
earnings_5ya + age_5ya + pm25_5yAvg))
##
## Call:
## glm.nb(formula = deaths ~ pop_dens_5ya + earnings_5ya + age_5ya +
## pm25_5yAvg, data = avgAP_dt, init.theta = 2.228067951, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.5294 -0.8876 -0.3636 0.2695 3.3767
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.390e+00 5.943e-01 10.752 < 2e-16 ***
## pop_dens_5ya 9.994e-05 2.273e-05 4.397 1.10e-05 ***
## earnings_5ya -2.086e-07 6.848e-06 -0.030 0.976
## age_5ya -7.057e-02 1.204e-02 -5.861 4.59e-09 ***
## pm25_5yAvg 7.720e-03 1.938e-02 0.398 0.690
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(2.2281) family taken to be 1)
##
## Null deviance: 487.47 on 282 degrees of freedom
## Residual deviance: 300.11 on 278 degrees of freedom
## AIC: 2612.1
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 2.228
## Std. Err.: 0.188
##
## 2 x log-likelihood: -2600.111
car::vif(pm25_deaths_5YA.nb)
## pop_dens_5ya earnings_5ya age_5ya pm25_5yAvg
## 2.142889 1.189121 1.982406 1.134312
summary(pm10_deaths_5YA.nb <- glm.nb(data = avgAP_dt, deaths ~ pop_dens_5ya +
earnings_5ya + age_5ya + pm10_5yAvg))
##
## Call:
## glm.nb(formula = deaths ~ pop_dens_5ya + earnings_5ya + age_5ya +
## pm10_5yAvg, data = avgAP_dt, init.theta = 2.227020703, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.5260 -0.8855 -0.3567 0.2658 3.3505
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.456e+00 5.880e-01 10.980 < 2e-16 ***
## pop_dens_5ya 9.992e-05 2.276e-05 4.390 1.13e-05 ***
## earnings_5ya 2.476e-08 6.845e-06 0.004 0.997
## age_5ya -7.137e-02 1.193e-02 -5.983 2.20e-09 ***
## pm10_5yAvg 2.242e-03 1.366e-02 0.164 0.870
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(2.227) family taken to be 1)
##
## Null deviance: 487.25 on 282 degrees of freedom
## Residual deviance: 300.11 on 278 degrees of freedom
## AIC: 2612.2
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 2.227
## Std. Err.: 0.187
##
## 2 x log-likelihood: -2600.239
car::vif(pm10_deaths_5YA.nb)
## pop_dens_5ya earnings_5ya age_5ya pm10_5yAvg
## 2.148181 1.187764 1.945466 1.101626
summary(nox_deaths_5YA.nb <- glm.nb(data = avgAP_dt, deaths ~ pop_dens_5ya +
earnings_5ya + age_5ya + nox_5yAvg))
##
## Call:
## glm.nb(formula = deaths ~ pop_dens_5ya + earnings_5ya + age_5ya +
## nox_5yAvg, data = avgAP_dt, init.theta = 2.330563559, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.4959 -0.8776 -0.3217 0.2737 3.8186
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.264e+00 5.935e-01 8.870 < 2e-16 ***
## pop_dens_5ya 8.440e-05 2.331e-05 3.621 0.000293 ***
## earnings_5ya 2.352e-06 6.711e-06 0.350 0.726007
## age_5ya -5.045e-02 1.244e-02 -4.055 5e-05 ***
## nox_5yAvg 1.469e-02 4.179e-03 3.515 0.000440 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(2.3306) family taken to be 1)
##
## Null deviance: 508.44 on 282 degrees of freedom
## Residual deviance: 299.44 on 278 degrees of freedom
## AIC: 2598.9
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 2.331
## Std. Err.: 0.198
##
## 2 x log-likelihood: -2586.947
car::vif(nox_deaths_5YA.nb)
## pop_dens_5ya earnings_5ya age_5ya nox_5yAvg
## 2.352705 1.191368 2.207728 1.827694
summary(no2_deaths_5YA.nb <- glm.nb(data = avgAP_dt, deaths ~ pop_dens_5ya +
earnings_5ya + age_5ya + no2_5yAvg))
##
## Call:
## glm.nb(formula = deaths ~ pop_dens_5ya + earnings_5ya + age_5ya +
## no2_5yAvg, data = avgAP_dt, init.theta = 2.327684382, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.5071 -0.8829 -0.3292 0.2712 3.8454
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.232e+00 6.097e-01 8.582 < 2e-16 ***
## pop_dens_5ya 8.883e-05 2.288e-05 3.882 0.000103 ***
## earnings_5ya 2.010e-06 6.712e-06 0.299 0.764600
## age_5ya -5.038e-02 1.261e-02 -3.995 6.47e-05 ***
## no2_5yAvg 2.384e-02 7.044e-03 3.385 0.000713 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(2.3277) family taken to be 1)
##
## Null deviance: 507.85 on 282 degrees of freedom
## Residual deviance: 299.49 on 278 degrees of freedom
## AIC: 2599.3
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 2.328
## Std. Err.: 0.197
##
## 2 x log-likelihood: -2587.340
car::vif(no2_deaths_5YA.nb)
## pop_dens_5ya earnings_5ya age_5ya no2_5yAvg
## 2.263950 1.190001 2.265576 1.747459
summary(o3_deaths_5YA.nb <- glm.nb(data = avgAP_dt, deaths ~ pop_dens_5ya +
earnings_5ya + age_5ya + o3_5yAvg))
##
## Call:
## glm.nb(formula = deaths ~ pop_dens_5ya + earnings_5ya + age_5ya +
## o3_5yAvg, data = avgAP_dt, init.theta = 2.353153533, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.3590 -0.8889 -0.3458 0.3287 3.3772
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.250e+00 5.195e-01 12.031 < 2e-16 ***
## pop_dens_5ya 7.587e-05 2.285e-05 3.320 0.000899 ***
## earnings_5ya 9.901e-06 6.856e-06 1.444 0.148705
## age_5ya -5.912e-02 1.171e-02 -5.047 4.49e-07 ***
## o3_5yAvg -1.793e-01 4.201e-02 -4.268 1.97e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(2.3532) family taken to be 1)
##
## Null deviance: 513.04 on 282 degrees of freedom
## Residual deviance: 299.22 on 278 degrees of freedom
## AIC: 2596.1
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 2.353
## Std. Err.: 0.200
##
## 2 x log-likelihood: -2584.060
car::vif(o3_deaths_5YA.nb)
## pop_dens_5ya earnings_5ya age_5ya o3_5yAvg
## 2.282899 1.257509 1.969455 1.311695
stargazer(pm25_deaths_5YA.nb, pm10_deaths_5YA.nb, nox_deaths_5YA.nb, no2_deaths_5YA.nb,o3_deaths_5YA.nb,type="html",out = "fig_out/LA_covid_allPoll_DEATHS_5YA_nb.html",
dep.var.labels="Number of COVID-19-related deaths, 5-year average",
single.row=TRUE)
Dependent variable: | |||||
Number of COVID-19-related deaths, 5-year average | |||||
(1) | (2) | (3) | (4) | (5) | |
pop\_dens\_5ya | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) |
earnings\_5ya | \-0.00000 (0.00001) | 0.00000 (0.00001) | 0.00000 (0.00001) | 0.00000 (0.00001) | 0.00001 (0.00001) |
age\_5ya | \-0.071\*\*\* (0.012) | \-0.071\*\*\* (0.012) | \-0.050\*\*\* (0.012) | \-0.050\*\*\* (0.013) | \-0.059\*\*\* (0.012) |
pm25\_5yAvg | 0.008 (0.019) | ||||
pm10\_5yAvg | 0.002 (0.014) | ||||
nox\_5yAvg | 0.015\*\*\* (0.004) | ||||
no2\_5yAvg | 0.024\*\*\* (0.007) | ||||
o3\_5yAvg | \-0.179\*\*\* (0.042) | ||||
Constant | 6.390\*\*\* (0.594) | 6.456\*\*\* (0.588) | 5.264\*\*\* (0.594) | 5.232\*\*\* (0.610) | 6.250\*\*\* (0.519) |
Observations | 283 | 283 | 283 | 283 | 283 |
Log Likelihood | \-1,301.056 | \-1,301.119 | \-1,294.474 | \-1,294.670 | \-1,293.030 |
theta | 2.228\*\*\* (0.188) | 2.227\*\*\* (0.187) | 2.331\*\*\* (0.198) | 2.328\*\*\* (0.197) | 2.353\*\*\* (0.200) |
Akaike Inf. Crit. | 2,612.111 | 2,612.239 | 2,598.947 | 2,599.340 | 2,596.060 |
Note: | *p\<0.1; **p\<0.05; ***p\<0.01 |
pm25_deaths_5YA.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(pm25_deaths_5YA.nb), confint(pm25_deaths_5YA.nb))), p_value = summary(pm25_deaths_5YA.nb)$coefficients[,4]))
pm10_deaths_5YA.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(pm10_deaths_5YA.nb), confint(pm10_deaths_5YA.nb))), p_value = summary(pm10_deaths_5YA.nb)$coefficients[,4]))
nox_deaths_5YA.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(nox_deaths_5YA.nb), confint(nox_deaths_5YA.nb))), p_value = summary(nox_deaths_5YA.nb)$coefficients[,4]))
no2_deaths_5YA.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(no2_deaths_5YA.nb), confint(no2_deaths_5YA.nb))), p_value = summary(no2_deaths_5YA.nb)$coefficients[,4]))
o3_deaths_5YA.nb_mrr = data.frame(cbind(exp(cbind(OR = coef(o3_deaths_5YA.nb), confint(o3_deaths_5YA.nb))), p_value = summary(o3_deaths_5YA.nb)$coefficients[,4]))
#make a df just with the pollutants
LA_covid_DEATHS_onlyPoll_5YA = data.frame(rbind(pm25_deaths_5YA.nb_mrr[nrow(pm25_deaths_5YA.nb_mrr),],
pm10_deaths_5YA.nb_mrr[nrow(pm10_deaths_5YA.nb_mrr),],
nox_deaths_5YA.nb_mrr[nrow(nox_deaths_5YA.nb_mrr),],
no2_deaths_5YA.nb_mrr[nrow(no2_deaths_5YA.nb_mrr),],
o3_deaths_5YA.nb_mrr[nrow(o3_deaths_5YA.nb_mrr),]))
Sort the data:
LA_covid_DEATHS_onlyPoll_5YA$names = row.names(LA_covid_DEATHS_onlyPoll_5YA)
LA_covid_DEATHS_onlyPoll_5YA$significance = "p-value > 0.05"
LA_covid_DEATHS_onlyPoll_5YA$significance[LA_covid_DEATHS_onlyPoll_5YA$p_value < 0.05] <- "p-value < 0.05"
write.csv(LA_covid_DEATHS_onlyPoll_5YA,"data_output/LA_covid_DEATHS_onlyPoll_5YA.csv", row.names = F)
Plot the rate ratios:
ggplot(LA_covid_DEATHS_onlyPoll_5YA, aes(x=reorder(names, OR), y=OR, color=significance)) +
geom_point(fill="white", shape=21, size = 2) +
geom_errorbar(aes(ymin=X2.5.., ymax=X97.5..),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
geom_hline(yintercept = 1, linetype="dotted") +
coord_flip()+ylab("Mortality rate ratios, pre-lockdown") +
xlab("Pollutants, 5 year average (2014-2018)") + theme_classic()
ggsave('fig_out/LA_CASES_odds_MRR_5YA.pdf')
cases_LA_raw = read.csv("data/coronavirus-cases_latest-18_5_2020.csv")
cases_LA_dt = subset(cases_LA_raw, Specimen.date == "2020-04-10", select = c(Area.code, Cumulative.lab.confirmed.cases))
library(dplyr)
cases_LA_dt = cases_LA_dt %>%
group_by(Area.code) %>%
summarise(cases = mean(Cumulative.lab.confirmed.cases))
## `summarise()` ungrouping output (override with `.groups` argument)
nrow(cases_LA_dt)
## [1] 342
avgAP_dt = read.csv("data_output/merged_covidAir_cov_LA_5YA.csv", na.strings = "x")
avgAP_dt$pop_dens_5ya = as.numeric(gsub(",","",avgAP_dt$pop_dens_5ya))
avgAP_dt$X2018.people.per.sq..km = as.numeric(gsub(",","",avgAP_dt$X2018.people.per.sq..km))
avgAP_dt$Mean_ann_earnings = as.numeric(gsub(",","",avgAP_dt$Mean_ann_earnings))
avgAP_dt$X2018.people.per.sq..km = as.numeric(avgAP_dt$X2018.people.per.sq..km)
avgAP_dt$Mean_ann_earnings = as.numeric(avgAP_dt$Mean_ann_earnings)
#avgAP_dt = merge(avgAP_dt,cases_LA_dt, by.x = "Code",by.y = "Area.code")
#write.csv(avgAP_dt,"data_output/merged_covidAir_cov_LA_5YA.csv", row.names = F)
library(MASS)
pm25_cases_5YA.nb <- glm.nb(data = avgAP_dt, cases ~ pop_dens_5ya +
earnings_5ya + age_5ya + pm25_5yAvg)
car::vif(pm25_cases_5YA.nb)
## pop_dens_5ya earnings_5ya age_5ya pm25_5yAvg
## 2.127469 1.183202 1.976028 1.133826
pm10_cases_5YA.nb <- glm.nb(data = avgAP_dt, cases ~ pop_dens_5ya +
earnings_5ya + age_5ya + pm10_5yAvg)
car::vif(pm10_cases_5YA.nb)
## pop_dens_5ya earnings_5ya age_5ya pm10_5yAvg
## 2.132595 1.181741 1.939209 1.100561
nox_cases_5YA.nb <- glm.nb(data = avgAP_dt, cases ~ pop_dens_5ya +
earnings_5ya + age_5ya + nox_5yAvg)
car::vif(nox_cases_5YA.nb)
## pop_dens_5ya earnings_5ya age_5ya nox_5yAvg
## 2.334119 1.184576 2.204639 1.828731
no2_cases_5YA.nb <- glm.nb(data = avgAP_dt, cases ~ pop_dens_5ya +
earnings_5ya + age_5ya + no2_5yAvg)
car::vif(no2_cases_5YA.nb)
## pop_dens_5ya earnings_5ya age_5ya no2_5yAvg
## 2.246629 1.183251 2.261649 1.748043
o3_cases_5YA.nb <- glm.nb(data = avgAP_dt, cases ~ pop_dens_5ya +
earnings_5ya + age_5ya + o3_5yAvg)
car::vif(o3_cases_5YA.nb)
## pop_dens_5ya earnings_5ya age_5ya o3_5yAvg
## 2.261263 1.250229 1.966950 1.314072
stargazer(pm25_cases_5YA.nb, pm10_cases_5YA.nb, nox_cases_5YA.nb, no2_cases_5YA.nb,o3_cases_5YA.nb,type="html",out = "fig_out/LA_covid_allPoll_CASES_5YA_nb.html",
dep.var.labels="Number of COVID-19-related cases, pre-lockdown, 5-year averages",
single.row=TRUE)
Dependent variable: | |||||
Number of COVID-19-related cases, pre-lockdown, 5-year averages | |||||
(1) | (2) | (3) | (4) | (5) | |
pop\_dens\_5ya | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.00005\*\* (0.00002) |
earnings\_5ya | \-0.00000 (0.00001) | \-0.00000 (0.00001) | \-0.00000 (0.00001) | \-0.00000 (0.00001) | 0.00001 (0.00001) |
age\_5ya | \-0.082\*\*\* (0.011) | \-0.082\*\*\* (0.011) | \-0.062\*\*\* (0.011) | \-0.062\*\*\* (0.012) | \-0.063\*\*\* (0.010) |
pm25\_5yAvg | \-0.028 (0.018) | ||||
pm10\_5yAvg | \-0.023\* (0.012) | ||||
nox\_5yAvg | 0.012\*\*\* (0.004) | ||||
no2\_5yAvg | 0.019\*\*\* (0.006) | ||||
o3\_5yAvg | \-0.228\*\*\* (0.037) | ||||
Constant | 8.855\*\*\* (0.540) | 8.900\*\*\* (0.533) | 7.538\*\*\* (0.544) | 7.527\*\*\* (0.559) | 8.194\*\*\* (0.460) |
Observations | 283 | 283 | 283 | 283 | 283 |
Log Likelihood | \-1,729.358 | \-1,728.939 | \-1,725.318 | \-1,725.733 | \-1,713.342 |
theta | 2.598\*\*\* (0.209) | 2.605\*\*\* (0.210) | 2.667\*\*\* (0.215) | 2.660\*\*\* (0.215) | 2.882\*\*\* (0.234) |
Akaike Inf. Crit. | 3,468.715 | 3,467.878 | 3,460.637 | 3,461.467 | 3,436.684 |
Note: | *p\<0.1; **p\<0.05; ***p\<0.01 |
pm25_cases_5YA.nb_irr = data.frame(cbind(exp(cbind(OR = coef(pm25_cases_5YA.nb), confint(pm25_cases_5YA.nb))), p_value = summary(pm25_cases_5YA.nb)$coefficients[,4]))
pm10_cases_5YA.nb_irr = data.frame(cbind(exp(cbind(OR = coef(pm10_cases_5YA.nb), confint(pm10_cases_5YA.nb))), p_value = summary(pm10_cases_5YA.nb)$coefficients[,4]))
nox_cases_5YA.nb_irr = data.frame(cbind(exp(cbind(OR = coef(nox_cases_5YA.nb), confint(nox_cases_5YA.nb))), p_value = summary(nox_cases_5YA.nb)$coefficients[,4]))
no2_cases_5YA.nb_irr = data.frame(cbind(exp(cbind(OR = coef(no2_cases_5YA.nb), confint(no2_cases_5YA.nb))), p_value = summary(no2_cases_5YA.nb)$coefficients[,4]))
o3_cases_5YA.nb_irr = data.frame(cbind(exp(cbind(OR = coef(o3_cases_5YA.nb), confint(o3_cases_5YA.nb))), p_value = summary(o3_cases_5YA.nb)$coefficients[,4]))
#make a df just with the pollutants
LA_covid_cases_onlyPoll_5YA = data.frame(rbind(pm25_cases_5YA.nb_irr[nrow(pm25_cases_5YA.nb_irr),],
pm10_cases_5YA.nb_irr[nrow(pm10_cases_5YA.nb_irr),],
nox_cases_5YA.nb_irr[nrow(nox_cases_5YA.nb_irr),],
no2_cases_5YA.nb_irr[nrow(no2_cases_5YA.nb_irr),],
o3_cases_5YA.nb_irr[nrow(o3_cases_5YA.nb_irr),]))
Sort the data:
LA_covid_cases_onlyPoll_5YA$names = row.names(LA_covid_cases_onlyPoll_5YA)
LA_covid_cases_onlyPoll_5YA$significance = "p-value > 0.05"
LA_covid_cases_onlyPoll_5YA$significance[LA_covid_cases_onlyPoll_5YA$p_value < 0.05] <- "p-value < 0.05"
write.csv(LA_covid_cases_onlyPoll_5YA,"data_output/LA_covid_cases_onlyPoll_5YA.csv", row.names = F)
Plot the rate ratios:
ggplot(LA_covid_cases_onlyPoll_5YA, aes(x=reorder(names, OR), y=OR, color=significance)) +
geom_point(fill="white", shape=21, size = 2) +
geom_errorbar(aes(ymin=X2.5.., ymax=X97.5..),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
geom_hline(yintercept = 1, linetype="dotted") +
coord_flip()+ylab("Infectivity rate ratios, pre-lockdown") +
xlab("Pollutants, 5 year average (2014-2018)") + theme_classic()
ggsave('fig_out/LA_CASES_odds_IRR_5YA.pdf')
april_deaths = read.csv("data_output/LA_deaths_April2020.csv")[,c("Code","april_deaths")]
april_cases = read.csv("data_output/COVID_LA_cases.csv")[,c("lad19_cd","april_26_cases")]
colnames(april_cases) <- c("Code","april_cases")
april_cases= april_cases %>%
group_by(Code) %>%
summarise(april_cases = sum(april_cases))
## `summarise()` ungrouping output (override with `.groups` argument)
resub_dt = merge(april_cases,april_deaths, by = "Code")
resub_dt = merge(avgAP_dt,resub_dt, by = "Code", all.x=T)
#ADD 2018 data
resub_dt = merge(resub_dt,
read.csv("data_output/merged_covidAir_cov_dt_LA.csv")[,c("Code",
"pm25_val",
"no2_val",
"o3_val",
"pm10_val",
"nox_val")], by = "Code", all.x=T)
write.csv(resub_dt, "data_output/LA_dt_resub_full.csv", row.names = F)
LA_covid_pm25 <-glm.nb(data = resub_dt, april_cases ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
pm25_val)
LA_covid_no2 <-glm.nb(data = resub_dt, april_cases ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
no2_val)
LA_covid_nox <-glm.nb(data = resub_dt, april_cases ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
nox_val)
LA_covid_pm10 <-glm.nb(data = resub_dt, april_cases ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
pm10_val)
LA_covid_o3 <-glm.nb(data = resub_dt, april_cases ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
o3_val)
LA_covid_pm25_5yAvg <-glm.nb(data = resub_dt, april_cases ~ pop_dens_5ya + earnings_5ya + age_5ya +
pm25_5yAvg)
LA_covid_no2_5yAvg <-glm.nb(data = resub_dt, april_cases ~ pop_dens_5ya + earnings_5ya + age_5ya +
no2_5yAvg)
LA_covid_nox_5yAvg <-glm.nb(data = resub_dt, april_cases ~ pop_dens_5ya + earnings_5ya + age_5ya +
nox_5yAvg)
LA_covid_pm10_5yAvg <-glm.nb(data = resub_dt, april_cases ~ pop_dens_5ya + earnings_5ya + age_5ya +
pm10_5yAvg)
LA_covid_o3_5yAvg <-glm.nb(data = resub_dt, april_cases ~ pop_dens_5ya + earnings_5ya + age_5ya +
o3_5yAvg)
stargazer::stargazer(LA_covid_pm25,LA_covid_pm10,LA_covid_no2,LA_covid_nox,LA_covid_o3,
LA_covid_pm25_5yAvg,LA_covid_pm10_5yAvg,LA_covid_no2_5yAvg,
LA_covid_nox_5yAvg,LA_covid_o3_5yAvg,
type="html",out = "fig_out/LA_covid_NB_model_resub_cases.html",
dep.var.labels="COVID positive or not",
single.row=TRUE)
Dependent variable: | ||||||||||
COVID positive or not | ||||||||||
(1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | |
X2018.people.per.sq..km | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.00004 (0.00002) | 0.00003 (0.00003) | 0.00004\*\* (0.00002) | |||||
Mean\_ann\_earnings | \-0.00000 (0.00001) | \-0.00000 (0.00001) | \-0.00002\*\* (0.00001) | \-0.00002\*\* (0.00001) | 0.00000 (0.00001) | |||||
median\_age\_2018 | \-0.099\*\*\* (0.012) | \-0.095\*\*\* (0.011) | \-0.055\*\*\* (0.013) | \-0.059\*\*\* (0.012) | \-0.072\*\*\* (0.011) | |||||
pm25\_val | \-0.120\*\*\* (0.031) | |||||||||
pm10\_val | \-0.087\*\*\* (0.020) | |||||||||
no2\_val | 0.033\*\*\* (0.010) | |||||||||
nox\_val | 0.019\*\*\* (0.006) | |||||||||
o3\_val | \-0.065\*\*\* (0.012) | |||||||||
pop\_dens\_5ya | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.00005\*\* (0.00002) | 0.00005\* (0.00002) | 0.00002 (0.00002) | |||||
earnings\_5ya | \-0.00001\*\* (0.00001) | \-0.00001\*\* (0.00001) | \-0.00001\*\* (0.00001) | \-0.00001\* (0.00001) | \-0.00000 (0.00001) | |||||
age\_5ya | \-0.089\*\*\* (0.012) | \-0.089\*\*\* (0.012) | \-0.065\*\*\* (0.013) | \-0.066\*\*\* (0.013) | \-0.067\*\*\* (0.011) | |||||
pm25\_5yAvg | \-0.038\*\* (0.019) | |||||||||
pm10\_5yAvg | \-0.032\*\* (0.014) | |||||||||
no2\_5yAvg | 0.020\*\*\* (0.007) | |||||||||
nox\_5yAvg | 0.012\*\*\* (0.004) | |||||||||
o3\_5yAvg | \-0.256\*\*\* (0.041) | |||||||||
Constant | 10.872\*\*\* (0.629) | 10.801\*\*\* (0.586) | 7.974\*\*\* (0.605) | 8.233\*\*\* (0.564) | 9.129\*\*\* (0.480) | 10.074\*\*\* (0.594) | 10.149\*\*\* (0.586) | 8.463\*\*\* (0.617) | 8.493\*\*\* (0.601) | 9.243\*\*\* (0.505) |
Observations | 283 | 283 | 283 | 283 | 283 | 283 | 283 | 283 | 283 | 283 |
Log Likelihood | \-1,867.267 | \-1,865.012 | \-1,869.058 | \-1,869.567 | \-1,861.414 | \-1,873.401 | \-1,872.673 | \-1,870.542 | \-1,870.358 | \-1,856.444 |
theta | 2.217\*\*\* (0.176) | 2.248\*\*\* (0.179) | 2.193\*\*\* (0.174) | 2.186\*\*\* (0.174) | 2.300\*\*\* (0.184) | 2.134\*\*\* (0.169) | 2.143\*\*\* (0.170) | 2.172\*\*\* (0.173) | 2.175\*\*\* (0.173) | 2.375\*\*\* (0.190) |
Akaike Inf. Crit. | 3,744.535 | 3,740.024 | 3,748.115 | 3,749.134 | 3,732.829 | 3,756.802 | 3,755.346 | 3,751.084 | 3,750.716 | 3,722.888 |
Note: | *p\<0.1; **p\<0.05; ***p\<0.01 |
list_df_full = do.call("rbind", list(
summary(LA_covid_pm25)$coefficients[nrow(summary(LA_covid_pm25)$coefficients),1:4],
summary(LA_covid_pm10)$coefficients[nrow(summary(LA_covid_pm10)$coefficients),1:4],
summary(LA_covid_no2)$coefficients[nrow(summary(LA_covid_no2)$coefficients),1:4],
summary(LA_covid_nox)$coefficients[nrow(summary(LA_covid_nox)$coefficients),1:4],
summary(LA_covid_o3)$coefficients[nrow(summary(LA_covid_o3)$coefficients),1:4],
summary(LA_covid_pm25_5yAvg)$coefficients[nrow(summary(LA_covid_pm25_5yAvg)$coefficients),1:4],
summary(LA_covid_pm10_5yAvg)$coefficients[nrow(summary(LA_covid_pm10_5yAvg)$coefficients),1:4],
summary(LA_covid_no2_5yAvg)$coefficients[nrow(summary(LA_covid_no2_5yAvg)$coefficients),1:4],
summary(LA_covid_nox_5yAvg)$coefficients[nrow(summary(LA_covid_nox_5yAvg)$coefficients),1:4],
summary(LA_covid_o3_5yAvg)$coefficients[nrow(summary(LA_covid_o3_5yAvg)$coefficients),1:4]))
#add labels
pollutants = c("pm25","pm10","no2","nox","o3",
"pm25_5yAvg","pm10_5yAvg","no2_5yAvg","nox_5yAvg","o3_5yAvg")
pollutants = data.frame(pollutants)
list_df_full = cbind(pollutants, list_df_full)
colnames(list_df_full) <- c("pollutants", "Estimate","Std", "z","p")
list_df_full$OR = exp(list_df_full$Estimate)
list_df_full$upper = exp(list_df_full$Estimate + list_df_full$Std)
list_df_full$lower = exp(list_df_full$Estimate - list_df_full$Std)
list_df_full$significance = "p-value<0.05"
list_df_full$significance[list_df_full$p > 0.05] <- "p-value>0.05"
write.csv(list_df_full, "data_output/LA_covidInfection_RRs_fullData_resub.csv", row.names=F)
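Note that `upper`/`lower` above are `exp(Estimate ± 1 SE)`, roughly a 68% band rather than a full 95% interval. A Wald 95% CI would scale the standard error by 1.96; a sketch with illustrative numbers (not taken from the table):

```r
# Wald 95% CI on the rate-ratio scale (illustrative numbers, not study output)
est <- 0.058; se <- 0.028
c(lower = exp(est - 1.96 * se), RR = exp(est), upper = exp(est + 1.96 * se))
```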
## rate ratios with exp(Estimate +/- 1 SE) error bars
library(ggplot2)
ggplot(list_df_full,
aes(x=reorder(pollutants, OR), y=OR, color = significance)) +
geom_point(shape=21, size = 2) +
geom_errorbar(aes(ymin=lower, ymax=upper),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
theme_classic() +
geom_hline(yintercept = 1, linetype="dotted") +
coord_flip()+ylab("Infectivity rate ratios, resub") +
xlab("pollutants")
ggsave('fig_out/LA_covidInfection_RRs_fullData_resub.pdf', width = 6, height = 4)
LA_covid_pm25 <-glm.nb(data = resub_dt, april_deaths ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
pm25_val)
LA_covid_no2 <-glm.nb(data = resub_dt, april_deaths ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
no2_val)
LA_covid_nox <-glm.nb(data = resub_dt, april_deaths ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
nox_val)
LA_covid_pm10 <-glm.nb(data = resub_dt, april_deaths ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
pm10_val)
LA_covid_o3 <-glm.nb(data = resub_dt, april_deaths ~ X2018.people.per.sq..km + Mean_ann_earnings + median_age_2018 +
o3_val)
LA_covid_pm25_5yAvg <-glm.nb(data = resub_dt, april_deaths ~ pop_dens_5ya + earnings_5ya + age_5ya +
pm25_5yAvg)
LA_covid_no2_5yAvg <-glm.nb(data = resub_dt, april_deaths ~ pop_dens_5ya + earnings_5ya + age_5ya +
no2_5yAvg)
LA_covid_nox_5yAvg <-glm.nb(data = resub_dt, april_deaths ~ pop_dens_5ya + earnings_5ya + age_5ya +
nox_5yAvg)
LA_covid_pm10_5yAvg <-glm.nb(data = resub_dt, april_deaths ~ pop_dens_5ya + earnings_5ya + age_5ya +
pm10_5yAvg)
LA_covid_o3_5yAvg <-glm.nb(data = resub_dt, april_deaths ~ pop_dens_5ya + earnings_5ya + age_5ya +
o3_5yAvg)
stargazer::stargazer(LA_covid_pm25,LA_covid_pm10,LA_covid_no2,LA_covid_nox,LA_covid_o3,
LA_covid_pm25_5yAvg,LA_covid_pm10_5yAvg,LA_covid_no2_5yAvg,
LA_covid_nox_5yAvg,LA_covid_o3_5yAvg,
type="html",out = "fig_out/LA_covid_NB_model_resub_deaths.html",
dep.var.labels="COVID-19 death",
single.row=TRUE)
Dependent variable: | ||||||||||
COVID-19 death | ||||||||||
(1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | |
X2018.people.per.sq..km | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.00003 (0.00002) | 0.00003 (0.00002) | 0.00005\*\* (0.00002) | |||||
Mean\_ann\_earnings | 0.00000 (0.00001) | 0.00000 (0.00001) | \-0.00001 (0.00001) | \-0.00001 (0.00001) | 0.00000 (0.00001) | |||||
median\_age\_2018 | \-0.076\*\*\* (0.011) | \-0.074\*\*\* (0.010) | \-0.043\*\*\* (0.011) | \-0.046\*\*\* (0.011) | \-0.059\*\*\* (0.010) | |||||
pm25\_val | \-0.060\*\* (0.029) | |||||||||
pm10\_val | \-0.047\*\* (0.018) | |||||||||
no2\_val | 0.031\*\*\* (0.009) | |||||||||
nox\_val | 0.018\*\*\* (0.005) | |||||||||
o3\_val | \-0.042\*\*\* (0.011) | |||||||||
pop\_dens\_5ya | 0.0001\*\*\* (0.00002) | 0.0001\*\*\* (0.00002) | 0.00004\*\* (0.00002) | 0.00004\* (0.00002) | 0.00003 (0.00002) | |||||
earnings\_5ya | \-0.00001 (0.00001) | \-0.00000 (0.00001) | \-0.00000 (0.00001) | \-0.00000 (0.00001) | 0.00000 (0.00001) | |||||
age\_5ya | \-0.069\*\*\* (0.011) | \-0.069\*\*\* (0.011) | \-0.046\*\*\* (0.011) | \-0.046\*\*\* (0.011) | \-0.056\*\*\* (0.010) | |||||
pm25\_5yAvg | 0.005 (0.018) | |||||||||
pm10\_5yAvg | 0.0004 (0.012) | |||||||||
no2\_5yAvg | 0.025\*\*\* (0.006) | |||||||||
nox\_5yAvg | 0.016\*\*\* (0.004) | |||||||||
o3\_5yAvg | \-0.183\*\*\* (0.038) | |||||||||
Constant | 8.145\*\*\* (0.577) | 8.170\*\*\* (0.540) | 6.087\*\*\* (0.544) | 6.310\*\*\* (0.508) | 7.170\*\*\* (0.442) | 7.466\*\*\* (0.538) | 7.536\*\*\* (0.533) | 6.150\*\*\* (0.548) | 6.196\*\*\* (0.534) | 7.254\*\*\* (0.467) |
Observations | 283 | 283 | 283 | 283 | 283 | 283 | 283 | 283 | 283 | 283 |
Log Likelihood | \-1,535.037 | \-1,534.045 | \-1,530.920 | \-1,531.456 | \-1,530.728 | \-1,537.505 | \-1,537.548 | \-1,528.701 | \-1,528.580 | \-1,526.697 |
theta | 2.683\*\*\* (0.220) | 2.700\*\*\* (0.221) | 2.759\*\*\* (0.227) | 2.749\*\*\* (0.226) | 2.761\*\*\* (0.227) | 2.641\*\*\* (0.216) | 2.641\*\*\* (0.216) | 2.799\*\*\* (0.230) | 2.801\*\*\* (0.230) | 2.837\*\*\* (0.234) |
Akaike Inf. Crit. | 3,080.074 | 3,078.090 | 3,071.840 | 3,072.912 | 3,071.456 | 3,085.011 | 3,085.096 | 3,067.402 | 3,067.160 | 3,063.394 |
Note: | *p\<0.1; **p\<0.05; ***p\<0.01 |
list_df_full = do.call("rbind", list(
summary(LA_covid_pm25)$coefficients[nrow(summary(LA_covid_pm25)$coefficients),1:4],
summary(LA_covid_pm10)$coefficients[nrow(summary(LA_covid_pm10)$coefficients),1:4],
summary(LA_covid_no2)$coefficients[nrow(summary(LA_covid_no2)$coefficients),1:4],
summary(LA_covid_nox)$coefficients[nrow(summary(LA_covid_nox)$coefficients),1:4],
summary(LA_covid_o3)$coefficients[nrow(summary(LA_covid_o3)$coefficients),1:4],
summary(LA_covid_pm25_5yAvg)$coefficients[nrow(summary(LA_covid_pm25_5yAvg)$coefficients),1:4],
summary(LA_covid_pm10_5yAvg)$coefficients[nrow(summary(LA_covid_pm10_5yAvg)$coefficients),1:4],
summary(LA_covid_no2_5yAvg)$coefficients[nrow(summary(LA_covid_no2_5yAvg)$coefficients),1:4],
summary(LA_covid_nox_5yAvg)$coefficients[nrow(summary(LA_covid_nox_5yAvg)$coefficients),1:4],
summary(LA_covid_o3_5yAvg)$coefficients[nrow(summary(LA_covid_o3_5yAvg)$coefficients),1:4]))
#add labels
pollutants = c("pm25","pm10","no2","nox","o3",
"pm25_5yAvg","pm10_5yAvg","no2_5yAvg","nox_5yAvg","o3_5yAvg")
pollutants = data.frame(pollutants)
list_df_full = cbind(pollutants, list_df_full)
colnames(list_df_full) <- c("pollutants", "Estimate","Std", "z","p")
list_df_full$OR = exp(list_df_full$Estimate)
list_df_full$upper = exp(list_df_full$Estimate + list_df_full$Std)
list_df_full$lower = exp(list_df_full$Estimate - list_df_full$Std)
list_df_full$significance = "p-value<0.05"
list_df_full$significance[list_df_full$p > 0.05] <- "p-value>0.05"
write.csv(list_df_full, "data_output/LA_covidDEATH_RRs_fullData_resub.csv", row.names=F)
## rate ratios with exp(Estimate +/- 1 SE) error bars
library(ggplot2)
ggplot(list_df_full,
aes(x=reorder(pollutants, OR), y=OR, color = significance)) +
geom_point(shape=21, size = 2) +
geom_errorbar(aes(ymin=lower, ymax=upper),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
theme_classic() +
geom_hline(yintercept = 1, linetype="dotted") +
coord_flip()+ylab("Mortality rate ratios, resub") +
xlab("pollutants")
ggsave('fig_out/LA_covidDEATH_RRs_fullData_resub.pdf', width = 6, height = 4)
covid_df = read.csv("data/covid19_result.txt", sep = '\t')
covid_id = covid_df[,c(1,5,6)]
colnames(covid_id)
## [1] "eid" "origin" "result"
covid_id_unique = aggregate(covid_id, by=list(covid_id$eid),
FUN=max)
ggplot(data = covid_id_unique, aes(x = result))+
geom_bar()
ggplot(data = covid_id_unique, aes(x = origin))+
geom_bar()
Look at cases over time
covid_df_time = covid_df
covid_df_time$specdate = as.Date(covid_df_time$specdate, format="%d/%m/%Y")
covid_df_time <- covid_df_time %>%
arrange(specdate) %>%
group_by(specdate) %>%
summarise(daily_cases = sum(result)) %>%  # positives per day
mutate(culm_cases = cumsum(daily_cases))  # cumulative cases over time
ggplot(covid_df_time, aes(x=specdate, y=culm_cases)) +
geom_bar(stat="identity") +
scale_x_date(date_labels = "%d/%m/%Y")
Data curation
### load non-covid UKB phenotype data ----
ukb_covid = read.csv("data/29_4_2020_ukb41646_covid19_subset.csv")[,-1]
#check distribution
#here, I will aggregate the columns by selecting the last result for each participant
# reshape
ukb_covid_t = melt(ukb_covid, id='eid')
ukb_covid_t <- ukb_covid_t[order(ukb_covid_t$variable),]
#delete everything after period
ukb_covid_t$variable = gsub("\\..*","",ukb_covid_t$variable)
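The `gsub` above drops everything from the first period onward, collapsing the instance/array suffixes into the bare field ID (assuming `read.csv` rendered the UK Biobank columns like `X22609.0.0`):

```r
# Strip everything after the first period: instance/array suffix -> field ID
gsub("\\..*", "", "X22609.0.0")  # "X22609"
```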
#aggregate by last
ukb_covid_t_na = na.omit(ukb_covid_t)
ukb_covid_t_na = aggregate(ukb_covid_t_na, by=list(ukb_covid_t_na$eid,ukb_covid_t_na$variable), FUN=last)
# put it back straight
ukb_covid_t_na$variable = as.factor(ukb_covid_t_na$variable)
ukb_covid_t_na = ukb_covid_t_na[,-c(1,2)]
ukb_covid_cur = cast(ukb_covid_t_na, eid~variable)
### label UKB data ----
#get levels and labels
lvl.0493 <- c(-141,-131,-121,0)
lbl.0493 <- c("Often","Sometimes","Do not know","Rarely/never")
lvl.0007 <- c(0,1)
lbl.0007 <- c("No","Yes")
lvl.100349 <- c(-3,-1,0,1)
lbl.100349 <- c("Prefer not to answer","Do not know","No","Yes")
lvl.0090 <- c(-3,0,1,2)
lbl.0090 <- c("Prefer not to answer","Never","Previous","Current")
lvl.0009 <- c(0,1)
lbl.0009 <- c("Female","Male")
### eid replacement, saved here ----
yy_replace <- function(string, patterns, replacements) {
for (i in seq_along(patterns))
string <- gsub(patterns[i], replacements[i], string, perl=TRUE)
string
}
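A quick check of `yy_replace` on one of the level/label pairs (the function is restated so the snippet is self-contained). Note it is substring-based `gsub` replacement, so it relies on the codes not appearing as substrings of each other's labels:

```r
yy_replace <- function(string, patterns, replacements) {
  for (i in seq_along(patterns))
    string <- gsub(patterns[i], replacements[i], string, perl = TRUE)
  string
}
# Field 0007 coding: 0 = "No", 1 = "Yes"
yy_replace(c("0", "1"), c(0, 1), c("No", "Yes"))  # "No" "Yes"
```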
#label = replacement, lvl = to replace
ukb_covid_cur$X22609 <- yy_replace(ukb_covid_cur$X22609, lvl.0493, lbl.0493)
ukb_covid_cur$X22610 <- yy_replace(ukb_covid_cur$X22610, lvl.0493, lbl.0493)
ukb_covid_cur$X22611 <- yy_replace(ukb_covid_cur$X22611, lvl.0493, lbl.0493)
ukb_covid_cur$X22615 <- yy_replace(ukb_covid_cur$X22615, lvl.0493, lbl.0493)
ukb_covid_cur$X22616 <- yy_replace(ukb_covid_cur$X22616, lvl.0007, lbl.0007)
ukb_covid_cur$X2316 <- yy_replace(ukb_covid_cur$X2316, lvl.100349, lbl.100349)
ukb_covid_cur$X31 <- yy_replace(ukb_covid_cur$X31, lvl.0009, lbl.0009)
ukb_covid_cur$X22130 <- yy_replace(ukb_covid_cur$X22130, lvl.0007, lbl.0007)
ukb_covid_cur$X2443 <- yy_replace(ukb_covid_cur$X2443, lvl.100349, lbl.100349)
ukb_covid_cur$X20116 <- yy_replace(ukb_covid_cur$X20116, lvl.0090, lbl.0090)
### put custom column names----
#get the ID that I designed
UID_names = read.csv("data/UKB_list_columns_AIRcovid.csv")[c("FieldID","my_colname")]
UID_names$FieldID = paste("X",UID_names$FieldID, sep = "")
UID_names$my_colname = lapply(UID_names$my_colname,toString)
UID_names[nrow(UID_names) + 1,] = c("eid","eid")
# sort it according to the dataset
UID_names_match = colnames(ukb_covid_cur)
UID_sort_df = data.frame(UID_names_match)
colnames(UID_sort_df) <- "FieldID"
#merge to sort
UID_sort_df = merge(UID_sort_df, UID_names, by = "FieldID")
# this is now sorted according to the correct df
#replace column names
ukb_covid_df = ukb_covid_cur
colnames(ukb_covid_df) <- UID_sort_df$my_colname
### merge both ukb datasets together ----
ukb_covid_merge = merge(covid_id_unique[,-1], ukb_covid_df, by = "eid")
nrow(ukb_covid_merge)
## [1] 1474
further processing…
# use 2020 since the COVID-19 outbreak occurred in this year
ukb_covid_merge$age = 2020 - ukb_covid_merge$birthYear
ukb_covid_merge$whr = ukb_covid_merge$waist / ukb_covid_merge$hip
#definition from the NHS: Diastolic >= 90 OR Systolic >=140
ukb_covid_merge$highBP[ukb_covid_merge$diaBP>=90 | ukb_covid_merge$sysBP>=140] <-1
ukb_covid_merge$highBP[ukb_covid_merge$diaBP<90 & ukb_covid_merge$sysBP<140] <-0 #AND: only normal when both readings are below threshold, otherwise the flag set above would be overwritten
# add a column: COVID + and inpatient
ukb_covid_merge$inpatient_covid <-0
ukb_covid_merge$inpatient_covid[ukb_covid_merge$result == 1 & ukb_covid_merge$origin == 1] <- 1
write.csv(ukb_covid_merge, "data_output/processed_29_4_2020_ukb41646_covid19.csv")
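The blood-pressure assignments can also be wrapped in a compact helper (hypothetical name `high_bp_flag`); note the "normal" branch needs both readings below threshold, while the "high" branch needs either above:

```r
# NHS definition: high BP if diastolic >= 90 OR systolic >= 140;
# normal only if BOTH are below threshold
high_bp_flag <- function(dia, sys) {
  flag <- rep(NA_integer_, length(dia))
  flag[dia >= 90 | sys >= 140] <- 1L
  flag[dia < 90 & sys < 140]  <- 0L
  flag
}
high_bp_flag(dia = c(95, 80, 80), sys = c(130, 150, 120))  # 1 1 0
```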
I will load the air pollutants & match the covid patients to their nearest pollution climate mapping location.
### load pm2.5 data ----
#note the pop weighted data have the area code, but not the x y coords
pm25 = read.csv("data/processed_PM25_uk-air_annual_mean_mappm252018g.csv", na.strings = "MISSING")[c("x","y","pm252018g")]
nrow(pm25)
## [1] 281802
pm25_na = na.omit(pm25)
nrow(pm25_na)
## [1] 254877
### convert to lon lat - pm2.5 ----
# transform: easting/northing -> long/lat
### shortcuts (from https://stephendavidgregory.github.io/useful/UKgrid_to_LatLon)
ukgrid <- "+init=epsg:27700"
latlong <- "+init=epsg:4326"
### Create coordinates variable
pm25_coords <- cbind(Easting = as.numeric(as.character(pm25_na$x)),
Northing = as.numeric(as.character(pm25_na$y)))
### Create the SpatialPointsDataFrame
pm25_SP <- SpatialPointsDataFrame(pm25_coords,
data = pm25_na,
proj4string = CRS("+init=epsg:27700"))
### Convert
pm25_LL <- spTransform(pm25_SP, CRS(latlong))
pm25_ll_df = data.frame('pm25_lon' = coordinates(pm25_LL)[, 1], 'pm25_lat' = coordinates(pm25_LL)[, 2], 'pm25_val' = pm25_LL$pm252018g)
write.csv(pm25_ll_df,"data_output/processed_pm25_lonlat.csv")
# convert ukb data eastings northing to lon lat
ukb_covid_coord_na = na.omit(ukb_covid_merge[c("eid","x_coord","y_coord")])
nrow(ukb_covid_coord_na)
## [1] 1464
ukb_covid_coords <- cbind(Easting = as.numeric(as.character(ukb_covid_coord_na$x_coord)),
Northing = as.numeric(as.character(ukb_covid_coord_na$y_coord)))
### Create the SpatialPointsDataFrame
ukb_covid_SP <- SpatialPointsDataFrame(ukb_covid_coords,
data = ukb_covid_coord_na,
proj4string = CRS("+init=epsg:27700"))
### Convert
ukb_covid_ll <- spTransform(ukb_covid_SP, CRS(latlong))
ukb_covid_ll_df = data.frame('ukb_lon' = coordinates(ukb_covid_ll)[, 1],
'ukb_lat' = coordinates(ukb_covid_ll)[, 2],
'eid' = ukb_covid_ll$eid)
write.csv(ukb_covid_ll_df, ("data_output/ukb_covid_lonlat_df.csv"))
# plot UKB locations
world <- ne_countries(scale = "medium", returnclass = "sf")
ggplot(data = world) +
geom_sf() +
geom_point(data = ukb_covid_ll_df, aes(x = ukb_lon, y = ukb_lat), size = 1,
shape = 23, fill = "darkred") +
coord_sf(ylim = c(min(ukb_covid_ll_df$ukb_lat)-2, max(ukb_covid_ll_df$ukb_lat)+4),
xlim = c(min(ukb_covid_ll_df$ukb_lon)-4, max(ukb_covid_ll_df$ukb_lon)+3), expand = FALSE)
ggsave('fig_out/UKB_COVID_participants_locations.pdf')
I wrote a python script: match_air_to_UKB_covid
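The script does nearest-neighbour matching of each participant's coordinates to the closest pollution-grid point. The idea, sketched in R on toy data (all names hypothetical; the actual matching is the python script above):

```r
# Nearest-neighbour lookup: for each point, take the grid value
# with the smallest (squared) Euclidean distance
nearest_value <- function(px, py, gx, gy, gval) {
  sapply(seq_along(px), function(i) {
    d2 <- (gx - px[i])^2 + (gy - py[i])^2
    gval[which.min(d2)]
  })
}
# Toy 3-point grid with pollutant values 10, 20, 30
nearest_value(px = c(0.1, 1.9), py = c(0, 0),
              gx = c(0, 1, 2), gy = c(0, 0, 0), gval = c(10, 20, 30))  # 10 30
```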
ukb_eid_pm25_merged = read.csv('data_output/merged_ukb_pm25.csv')[c("eid","distance","pm25_val")]
ukb_covid_pm25_df = merge(ukb_covid_merge, ukb_eid_pm25_merged, by = 'eid')
colnames(ukb_covid_pm25_df)
## [1] "eid" "origin" "result" "n_cancers"
## [5] "townsend" "x_coord" "y_coord" "smoking"
## [9] "fev" "copd" "WP_dusty" "WP_chemicals"
## [13] "WP_cig" "WP_diesel" "breathing" "whistling"
## [17] "diabetes" "sex" "birthYear" "diaBP"
## [21] "sysBP" "waist" "hip" "height"
## [25] "age" "whr" "highBP" "inpatient_covid"
## [29] "distance" "pm25_val"
#omit those without coordinates in the ukb data
### merge pop density----
ukb_cities_eid = read.csv("data_output/2_5_2020_ukb_eid_match_cities.csv")[c("eid","spec_area","gen_area")]
nrow(ukb_cities_eid)
## [1] 1464
popDens_2018 = read.csv("data/2018_official_popDensity.csv")[c("Name","X2018.people.per.sq..km")]
popDens_2018$X2018.people.per.sq..km = gsub(",", "", popDens_2018$X2018.people.per.sq..km)
popDens_2018$X2018.people.per.sq..km = as.numeric(popDens_2018$X2018.people.per.sq..km)
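The two lines above turn comma-grouped strings into numbers; in one step:

```r
as.numeric(gsub(",", "", "1,234"))  # 1234
```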
#Make everything lower case
ukb_cities_eid$spec_area = tolower(ukb_cities_eid$spec_area)
ukb_cities_eid$gen_area = tolower(ukb_cities_eid$gen_area)
popDens_2018$Name = tolower(popDens_2018$Name)
#delete repeated strings
ukb_cities_eid$spec_area = gsub("london borough of ", "", ukb_cities_eid$spec_area)
ukb_cities_eid$spec_area = gsub("royal borough of ", "", ukb_cities_eid$spec_area)
ukb_cities_eid$spec_area = gsub("city of ", "", ukb_cities_eid$spec_area)
ukb_cities_eid$spec_area = gsub("st helens", "helens", ukb_cities_eid$spec_area)
ukb_cities_eid$gen_area = gsub(" combined authority", "", ukb_cities_eid$gen_area)
popDens_2018$Name = gsub(", city of", "", popDens_2018$Name)
popDens_2018$Name = gsub("st. ", "", popDens_2018$Name)
#delete west midlands because it's redundant (all cities are well defined) and there are several regions called west midlands, which is confusing
popDens_2018$Name = gsub("west midlands", "deleted_west_midlands_because_redundant", popDens_2018$Name)
popDens_2018$Name = gsub("\\s*\\([^\\)]+\\)","",popDens_2018$Name)
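The last `gsub` strips a trailing parenthetical qualifier, plus any space before it, e.g. (illustrative input):

```r
gsub("\\s*\\([^\\)]+\\)", "", "herefordshire (county of)")  # "herefordshire"
```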
ukb_cities_eid_merged = merge(ukb_cities_eid, popDens_2018, by.x="spec_area",by.y="Name", all.x=TRUE)
colnames(ukb_cities_eid_merged)[4] <- "spec_Pop_dens_perkm2"
nrow(ukb_cities_eid_merged)
## [1] 1464
#check for duplicated rows in popDens
length(unique(popDens_2018$Name))
## [1] 434
length(popDens_2018$Name) # there is a duplicate
## [1] 435
popDens_2018$Name[duplicated(popDens_2018$Name)]
## [1] "deleted_west_midlands_because_redundant"
ukb_cities_eid_merged2 = merge(ukb_cities_eid_merged, popDens_2018, by.x="gen_area",by.y="Name", all.x=TRUE,all.y=FALSE)
colnames(ukb_cities_eid_merged2)[5] <- "gen_Pop_dens_perkm2"
nrow(ukb_cities_eid_merged2)
## [1] 1464
#QC
sum(is.na(ukb_cities_eid_merged2$spec_Pop_dens_perkm2))
## [1] 214
sum(is.na(ukb_cities_eid_merged2$gen_Pop_dens_perkm2))
## [1] 516
ukb_cities_eid_merged2$merged_popDens_km2 <- ifelse(is.na(ukb_cities_eid_merged2$spec_Pop_dens_perkm2),
ukb_cities_eid_merged2$gen_Pop_dens_perkm2, ukb_cities_eid_merged2$spec_Pop_dens_perkm2)
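The `ifelse` above implements a coalesce: prefer the specific-area density, fall back to the general-area one. `dplyr::coalesce` (dplyr is already used earlier in the pipeline) says the same thing directly:

```r
library(dplyr)
# First non-NA element position-wise: specific value wins, else the fallback
coalesce(c(NA, 120, NA), c(400, 999, NA))  # 400 120 NA
```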
ukb_cities_eid_out = ukb_cities_eid_merged2[c("eid","merged_popDens_km2")]
ukb_covid_pm25_popDens_df = merge(ukb_covid_pm25_df, ukb_cities_eid_out, by = 'eid')
nrow(ukb_covid_pm25_df)
## [1] 1464
ukb_covid_pm25_popDens_df = read.csv("data_output/2_5_2020_full_cov_CV_UKB_air_dataset.csv")
ukb_covid_pm25_popDens_df$smoker = ifelse(ukb_covid_pm25_popDens_df$smoking =="Current", "smoking", "not_smoking")
library(MASS)
summary(ukb_covid_pm25.nb <-glm.nb(data = ukb_covid_pm25_popDens_df, result ~ merged_popDens_km2 + n_cancers +townsend + smoking + whistling + diabetes+
whr + sex + age + pm25_val))
##
## Call:
## glm.nb(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + pm25_val,
## data = ukb_covid_pm25_popDens_df, init.theta = 15229.88441,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2059 -0.9479 -0.8028 0.6589 1.1708
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.065e+00 8.543e-01 -2.418 0.01562 *
## merged_popDens_km2 -9.956e-06 1.884e-05 -0.529 0.59711
## n_cancers -1.258e-01 1.272e-01 -0.989 0.32253
## townsend 1.354e-02 1.310e-02 1.034 0.30121
## smokingNever 3.665e-01 1.379e-01 2.658 0.00785 **
## smokingPrefer not to answer 7.608e-01 4.478e-01 1.699 0.08929 .
## smokingPrevious 4.056e-01 1.377e-01 2.946 0.00322 **
## whistlingNo 2.671e-01 2.698e-01 0.990 0.32220
## whistlingPrefer not to answer 4.568e-02 1.276e+00 0.036 0.97145
## whistlingYes 2.533e-01 2.742e-01 0.924 0.35559
## diabetesNo -1.619e-01 6.106e-01 -0.265 0.79085
## diabetesPrefer not to answer -6.877e-01 1.433e+00 -0.480 0.63140
## diabetesYes -1.956e-01 6.195e-01 -0.316 0.75215
## whr 7.856e-01 5.839e-01 1.346 0.17846
## sexMale 3.340e-02 1.000e-01 0.334 0.73838
## age -5.681e-03 4.748e-03 -1.196 0.23153
## pm25_val 6.035e-02 2.866e-02 2.106 0.03520 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(15229.88) family taken to be 1)
##
## Null deviance: 1040.2 on 1449 degrees of freedom
## Residual deviance: 1015.6 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 2365.6
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 15230
## Std. Err.: 58208
## Warning while fitting theta: iteration limit reached
##
## 2 x log-likelihood: -2329.646
summary(ukb_covid_pm25.p <-glm(data = ukb_covid_pm25_popDens_df, result ~ merged_popDens_km2 + n_cancers +townsend + smoking + whistling + diabetes+
whr + sex + age + pm25_val, family = 'poisson'))
##
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + pm25_val,
## family = "poisson", data = ukb_covid_pm25_popDens_df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2060 -0.9479 -0.8028 0.6589 1.1708
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.065e+00 8.543e-01 -2.418 0.01562 *
## merged_popDens_km2 -9.956e-06 1.883e-05 -0.529 0.59711
## n_cancers -1.258e-01 1.272e-01 -0.989 0.32252
## townsend 1.354e-02 1.310e-02 1.034 0.30120
## smokingNever 3.665e-01 1.379e-01 2.658 0.00785 **
## smokingPrefer not to answer 7.609e-01 4.478e-01 1.699 0.08928 .
## smokingPrevious 4.056e-01 1.377e-01 2.946 0.00321 **
## whistlingNo 2.671e-01 2.698e-01 0.990 0.32219
## whistlingPrefer not to answer 4.567e-02 1.276e+00 0.036 0.97145
## whistlingYes 2.533e-01 2.742e-01 0.924 0.35558
## diabetesNo -1.619e-01 6.105e-01 -0.265 0.79084
## diabetesPrefer not to answer -6.877e-01 1.433e+00 -0.480 0.63138
## diabetesYes -1.956e-01 6.194e-01 -0.316 0.75214
## whr 7.856e-01 5.839e-01 1.346 0.17845
## sexMale 3.340e-02 1.000e-01 0.334 0.73837
## age -5.681e-03 4.748e-03 -1.196 0.23151
## pm25_val 6.035e-02 2.866e-02 2.106 0.03520 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1040.2 on 1449 degrees of freedom
## Residual deviance: 1015.6 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 2363.6
##
## Number of Fisher Scoring iterations: 5
# I don't think that population density should be an offset here
# check for multicollinearity
car::vif(ukb_covid_pm25.nb)
## GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 2.443689 1 1.563230
## n_cancers 1.019337 1 1.009622
## townsend 1.442148 1 1.200895
## smoking 1.259363 3 1.039182
## whistling 1.875826 3 1.110535
## diabetes 2.049216 3 1.127019
## whr 1.850854 1 1.360461
## sex 1.619879 1 1.272744
## age 1.138711 1 1.067104
## pm25_val 2.011763 1 1.418366
# according to http://www.jstor.org/stable/2290467, GVIF^(1/(2*Df)) < 5 is ok
# model reduction:
summary(ukb_covid_pm25.nb_red <-glm.nb(data = ukb_covid_pm25_popDens_df, result ~ merged_popDens_km2 + smoking +pm25_val))
##
## Call:
## glm.nb(formula = result ~ merged_popDens_km2 + smoking + pm25_val,
## data = ukb_covid_pm25_popDens_df, init.theta = 15585.93096,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1217 -0.9499 -0.8075 0.6755 1.0620
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.639e+00 2.629e-01 -6.235 4.52e-10 ***
## merged_popDens_km2 -9.261e-07 1.680e-05 -0.055 0.95604
## smokingNever 3.369e-01 1.344e-01 2.507 0.01216 *
## smokingPrefer not to answer 6.113e-01 4.261e-01 1.435 0.15139
## smokingPrevious 3.899e-01 1.353e-01 2.882 0.00395 **
## pm25_val 5.801e-02 2.833e-02 2.048 0.04055 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(15585.93) family taken to be 1)
##
## Null deviance: 1049.0 on 1461 degrees of freedom
## Residual deviance: 1032.7 on 1456 degrees of freedom
## (2 observations deleted due to missingness)
## AIC: 2370.7
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 15586
## Std. Err.: 60471
## Warning while fitting theta: iteration limit reached
##
## 2 x log-likelihood: -2356.739
summary(ukb_covid_pm25.p_red <-glm(data = ukb_covid_pm25_popDens_df, result ~ merged_popDens_km2 + smoking + pm25_val, family = 'poisson'))
##
## Call:
## glm(formula = result ~ merged_popDens_km2 + smoking + pm25_val,
## family = "poisson", data = ukb_covid_pm25_popDens_df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1218 -0.9499 -0.8075 0.6756 1.0620
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.639e+00 2.629e-01 -6.235 4.52e-10 ***
## merged_popDens_km2 -9.261e-07 1.680e-05 -0.055 0.95604
## smokingNever 3.369e-01 1.344e-01 2.508 0.01216 *
## smokingPrefer not to answer 6.113e-01 4.261e-01 1.435 0.15138
## smokingPrevious 3.899e-01 1.353e-01 2.882 0.00395 **
## pm25_val 5.801e-02 2.832e-02 2.048 0.04055 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1049.0 on 1461 degrees of freedom
## Residual deviance: 1032.7 on 1456 degrees of freedom
## (2 observations deleted due to missingness)
## AIC: 2368.7
##
## Number of Fisher Scoring iterations: 5
#check
anova(ukb_covid_pm25.nb, ukb_covid_pm25.nb_red) # p > 0.05 -> the reduced model is not significantly worse
## Likelihood ratio tests of Negative Binomial Models
##
## Response: result
## Model
## 1 merged_popDens_km2 + smoking + pm25_val
## 2 merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes + whr + sex + age + pm25_val
## theta Resid. df 2 x log-lik. Test df LR stat. Pr(Chi)
## 1 15585.93 1456 -2356.739
## 2 15229.88 1433 -2329.646 1 vs 2 23 27.0924 0.2520556
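The `Pr(Chi)` above can be reproduced from the two `2 x log-lik.` values and the difference in residual degrees of freedom:

```r
# Likelihood-ratio test by hand, using the figures printed above
lr_stat <- -2329.646 - (-2356.739)  # full minus reduced, on the 2*log-lik scale
df_diff <- 1456 - 1433              # parameters dropped in the reduced model
pchisq(lr_stat, df = df_diff, lower.tail = FALSE)  # ~0.252
```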
Conclusions:
1. PM2.5 is a significant predictor of COVID-19 infection: case counts are
higher where PM2.5 levels are high.
2. Smoking interpretation: exponentiating the coefficients gives the expected
infection rate relative to current smokers, the reference group. People who
never smoked have 1.442676 times the rate of current smokers, and previous
smokers have 1.5001871 times the rate of current smokers.
3. In the actual analysis, I can explore using binary models.
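The rate ratios quoted in point 2 come from exponentiating the smoking coefficients of the full negative binomial model above:

```r
exp(0.3665)  # ~1.443: never-smokers vs current smokers
exp(0.4056)  # ~1.500: previous smokers vs current smokers
```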
### convert NO2 x y to lat lon ----
### shortcuts (from https://stephendavidgregory.github.io/useful/UKgrid_to_LatLon)
no2_raw_dt = na.omit(read.csv("data/processed_30_4_2020_NO2_map2018g.csv", na.strings = "MISSING")[c("x","y","no22018")])
ukgrid <- "+init=epsg:27700"
latlong <- "+init=epsg:4326"
### Create coordinates variable
no2_coords <- cbind(Easting = as.numeric(as.character(no2_raw_dt$x)),
Northing = as.numeric(as.character(no2_raw_dt$y)))
no2_LL <- spTransform(SpatialPointsDataFrame(no2_coords,
data = no2_raw_dt,
proj4string = CRS("+init=epsg:27700")), CRS(latlong))
no2_LL_df = data.frame('no2_lon' = coordinates(no2_LL)[, 1], 'no2_lat' = coordinates(no2_LL)[, 2], 'no2_val' = no2_LL$no22018)
write.csv(no2_LL_df,"data_output/processed_no2_lonlat.csv")
### convert SO2 x y to lat lon ----
so2_raw_dt = na.omit(read.csv("data/processed_30_4_2020_SO2_map2018g.csv", na.strings = "MISSING")[c("x","y","so22018")])
so2_coords <- cbind(Easting = as.numeric(as.character(so2_raw_dt$x)),
Northing = as.numeric(as.character(so2_raw_dt$y)))
so2_LL <- spTransform(SpatialPointsDataFrame(so2_coords,
data = so2_raw_dt,
proj4string = CRS("+init=epsg:27700")), CRS(latlong))
so2_LL_df = data.frame('so2_lon' = coordinates(so2_LL)[, 1], 'so2_lat' = coordinates(so2_LL)[, 2], 'so2_val' = so2_LL$so22018)
write.csv(so2_LL_df,"data_output/processed_so2_lonlat.csv")
### convert 03 x y to lat lon ----
o3_raw_dt = na.omit(read.csv("data/processed_30_4_2020_O3_map2018g.csv", na.strings = "MISSING")[c("x","y","dgt12018")])
o3_coords <- cbind(Easting = as.numeric(as.character(o3_raw_dt$x)),
Northing = as.numeric(as.character(o3_raw_dt$y)))
o3_LL <- spTransform(SpatialPointsDataFrame(o3_coords,
data = o3_raw_dt,
proj4string = CRS("+init=epsg:27700")), CRS(latlong))
o3_LL_df = data.frame('o3_lon' = coordinates(o3_LL)[, 1], 'o3_lat' = coordinates(o3_LL)[, 2], 'o3_val' = o3_LL$dgt12018)
write.csv(o3_LL_df,"data_output/processed_o3_lonlat.csv")
### convert PM10 x y to lat lon ----
pm10_raw_dt = na.omit(read.csv("data/processed_30_4_2020_PM10_mappm102018g.csv", na.strings = "MISSING")[c("x","y","pm102018g")])
pm10_coords <- cbind(Easting = as.numeric(as.character(pm10_raw_dt$x)),
Northing = as.numeric(as.character(pm10_raw_dt$y)))
pm10_LL <- spTransform(SpatialPointsDataFrame(pm10_coords,
data = pm10_raw_dt,
proj4string = CRS("+init=epsg:27700")), CRS(latlong))
pm10_LL_df = data.frame('pm10_lon' = coordinates(pm10_LL)[, 1], 'pm10_lat' = coordinates(pm10_LL)[, 2], 'pm10_val' = pm10_LL$pm102018g)
write.csv(pm10_LL_df,"data_output/processed_pm10_lonlat.csv")
### convert NOx x y to lat lon ----
nox_raw_dt = na.omit(read.csv("data/processed_30_4_2020_NOX_map2018g.csv", na.strings = "MISSING")[c("x","y","nox2018")])
nox_coords <- cbind(Easting = as.numeric(as.character(nox_raw_dt$x)),
Northing = as.numeric(as.character(nox_raw_dt$y)))
nox_LL <- spTransform(SpatialPointsDataFrame(nox_coords,
data = nox_raw_dt,
proj4string = CRS("+init=epsg:27700")), CRS(latlong))
nox_LL_df = data.frame('nox_lon' = coordinates(nox_LL)[, 1], 'nox_lat' = coordinates(nox_LL)[, 2], 'nox_val' = nox_LL$nox2018)
write.csv(nox_LL_df,"data_output/processed_nox_lonlat.csv")
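The five conversion chunks above differ only in the input file and value column. Under the same sp-based approach they could be collapsed into a single helper; this is a sketch (it assumes the same file paths and column names used above, and requires the sp and rgdal packages):

```r
library(sp)

# Convert one pollutant grid from OSGB36 eastings/northings (EPSG:27700)
# to WGS84 longitude/latitude (EPSG:4326) and write the result out.
convert_pollutant <- function(name, value_col, in_file, out_file) {
  raw <- na.omit(read.csv(in_file, na.strings = "MISSING")[c("x", "y", value_col)])
  pts <- SpatialPointsDataFrame(
    cbind(Easting = as.numeric(raw$x), Northing = as.numeric(raw$y)),
    data = raw, proj4string = CRS("+init=epsg:27700"))
  ll <- spTransform(pts, CRS("+init=epsg:4326"))
  out <- data.frame(coordinates(ll)[, 1], coordinates(ll)[, 2], ll[[value_col]])
  names(out) <- paste0(name, c("_lon", "_lat", "_val"))
  write.csv(out, out_file)
}

convert_pollutant("no2", "no22018",
                  "data/processed_30_4_2020_NO2_map2018g.csv",
                  "data_output/processed_no2_lonlat.csv")
# ... and likewise for so2 ("so22018"), o3 ("dgt12018"),
# pm10 ("pm102018g") and nox ("nox2018")
```

This keeps the projection strings in one place, so a change of CRS only needs editing once.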
Load and merge all of the pollutant data:
no2_ukb = read.csv("data_output/merged_ukb_no2.csv")[c('eid','no2_val')]
so2_ukb = read.csv("data_output/merged_ukb_so2.csv")[c('eid','so2_val')]
o3_ukb = read.csv("data_output/merged_ukb_o3.csv")[c('eid','o3_val')]
nox_ukb = read.csv("data_output/merged_ukb_nox.csv")[c('eid','nox_val')]
pm10_ukb = read.csv("data_output/merged_ukb_pm10.csv")[c('eid','pm10_val')]
ukb_additional_pol = merge(no2_ukb, so2_ukb, by = 'eid')
ukb_additional_pol = merge(ukb_additional_pol, o3_ukb, by = 'eid')
ukb_additional_pol = merge(ukb_additional_pol, nox_ukb, by = 'eid')
ukb_additional_pol = merge(ukb_additional_pol, pm10_ukb, by = 'eid')
ukb_covid_allPol_df = merge(ukb_covid_pm25_popDens_df, ukb_additional_pol, by = 'eid')
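The repeated pairwise merges above can be written more compactly with `Reduce`; a minimal illustration with toy data frames (the names `d1`, `d2`, `d3` are hypothetical stand-ins for the pollutant tables):

```r
# Merging a list of data frames on a shared key with Reduce is
# equivalent to chaining merge() calls pairwise.
d1 <- data.frame(eid = 1:3, no2_val = c(10, 20, 30))
d2 <- data.frame(eid = 1:3, so2_val = c(1, 2, 3))
d3 <- data.frame(eid = 1:3, o3_val = c(5, 6, 7))

merged <- Reduce(function(x, y) merge(x, y, by = "eid"), list(d1, d2, d3))
print(merged)  # one row per eid, with all pollutant columns side by side
```

With five pollutant tables this replaces four nested `merge` assignments with a single expression.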
library(arsenal)
ukb_descript_stats <- tableby(result ~ ., data = ukb_covid_allPol_df[,3:ncol(ukb_covid_allPol_df)])
summary(ukb_descript_stats, title = "Descriptive statistics of the UK Biobank data")
##
## Table: Descriptive statistics of the UK Biobank data
##
## | | 0 (N=800) | 1 (N=664) | Total (N=1464) | p value|
## |:--------------------------------------|:-----------------------:|:-----------------------:|:-----------------------:|-------:|
## |**origin** | | | | < 0.001|
## | Mean (SD) | 0.672 (0.470) | 0.858 (0.349) | 0.757 (0.429) | |
## | Range | 0.000 - 1.000 | 0.000 - 1.000 | 0.000 - 1.000 | |
## |**n_cancers** | | | | 0.173|
## | N-Miss | 0 | 1 | 1 | |
## | Mean (SD) | 0.119 (0.350) | 0.095 (0.309) | 0.108 (0.332) | |
## | Range | 0.000 - 2.000 | 0.000 - 2.000 | 0.000 - 2.000 | |
## |**townsend** | | | | 0.042|
## | Mean (SD) | -0.338 (3.506) | 0.038 (3.551) | -0.167 (3.530) | |
## | Range | -6.156 - 8.873 | -6.079 - 8.940 | -6.156 - 8.940 | |
## |**x_coord** | | | | 0.010|
## | Mean (SD) | 428271.250 (56003.689) | 436143.072 (60211.610) | 431841.530 (58062.493) | |
## | Range | 321000.000 - 542000.000 | 327000.000 - 540000.000 | 321000.000 - 542000.000 | |
## |**y_coord** | | | | 0.108|
## | Mean (SD) | 342893.750 (133805.914) | 331689.759 (131765.710) | 337812.158 (132956.294) | |
## | Range | 149000.000 - 580000.000 | 152000.000 - 580000.000 | 149000.000 - 580000.000 | |
## |**smoking** | | | | 0.002|
## | N-Miss | 0 | 2 | 2 | |
## | Current | 135 (16.9%) | 68 (10.3%) | 203 (13.9%) | |
## | Never | 353 (44.1%) | 302 (45.6%) | 655 (44.8%) | |
## | Prefer not to answer | 4 (0.5%) | 6 (0.9%) | 10 (0.7%) | |
## | Previous | 308 (38.5%) | 286 (43.2%) | 594 (40.6%) | |
## |**fev** | | | | 0.084|
## | N-Miss | 559 | 484 | 1043 | |
## | Mean (SD) | 2.899 (0.531) | 2.992 (0.567) | 2.939 (0.548) | |
## | Range | 1.658 - 4.522 | 1.954 - 4.098 | 1.658 - 4.522 | |
## |**copd** | | | | 0.055|
## | N-Miss | 644 | 564 | 1208 | |
## | No | 147 (94.2%) | 99 (99.0%) | 246 (96.1%) | |
## | Yes | 9 (5.8%) | 1 (1.0%) | 10 (3.9%) | |
## |**WP_dusty** | | | | 0.189|
## | N-Miss | 645 | 566 | 1211 | |
## | Often | 7 (4.5%) | 10 (10.2%) | 17 (6.7%) | |
## | Rarely/never | 119 (76.8%) | 73 (74.5%) | 192 (75.9%) | |
## | Sometimes | 29 (18.7%) | 15 (15.3%) | 44 (17.4%) | |
## |**WP_chemicals** | | | | 0.182|
## | N-Miss | 646 | 566 | 1212 | |
## | Do not know | 1 (0.6%) | 2 (2.0%) | 3 (1.2%) | |
## | Often | 2 (1.3%) | 5 (5.1%) | 7 (2.8%) | |
## | Rarely/never | 136 (88.3%) | 79 (80.6%) | 215 (85.3%) | |
## | Sometimes | 15 (9.7%) | 12 (12.2%) | 27 (10.7%) | |
## |**WP_cig** | | | | 0.843|
## | N-Miss | 645 | 566 | 1211 | |
## | Do not know | 1 (0.6%) | 0 (0.0%) | 1 (0.4%) | |
## | Often | 8 (5.2%) | 6 (6.1%) | 14 (5.5%) | |
## | Rarely/never | 115 (74.2%) | 74 (75.5%) | 189 (74.7%) | |
## | Sometimes | 31 (20.0%) | 18 (18.4%) | 49 (19.4%) | |
## |**WP_diesel** | | | | 0.383|
## | N-Miss | 645 | 566 | 1211 | |
## | Do not know | 2 (1.3%) | 3 (3.1%) | 5 (2.0%) | |
## | Often | 2 (1.3%) | 4 (4.1%) | 6 (2.4%) | |
## | Rarely/never | 141 (91.0%) | 85 (86.7%) | 226 (89.3%) | |
## | Sometimes | 10 (6.5%) | 6 (6.1%) | 16 (6.3%) | |
## |**breathing** | | | | 0.256|
## | N-Miss | 645 | 566 | 1211 | |
## | No | 144 (92.9%) | 87 (88.8%) | 231 (91.3%) | |
## | Yes | 11 (7.1%) | 11 (11.2%) | 22 (8.7%) | |
## |**whistling** | | | | 0.834|
## | N-Miss | 0 | 2 | 2 | |
## | Do not know | 25 (3.1%) | 16 (2.4%) | 41 (2.8%) | |
## | No | 527 (65.9%) | 442 (66.8%) | 969 (66.3%) | |
## | Prefer not to answer | 2 (0.2%) | 1 (0.2%) | 3 (0.2%) | |
## | Yes | 246 (30.8%) | 203 (30.7%) | 449 (30.7%) | |
## |**diabetes** | | | | 0.816|
## | N-Miss | 0 | 2 | 2 | |
## | Do not know | 3 (0.4%) | 3 (0.5%) | 6 (0.4%) | |
## | No | 721 (90.1%) | 588 (88.8%) | 1309 (89.5%) | |
## | Prefer not to answer | 2 (0.2%) | 1 (0.2%) | 3 (0.2%) | |
## | Yes | 74 (9.2%) | 70 (10.6%) | 144 (9.8%) | |
## |**sex** | | | | 0.043|
## | Female | 393 (49.1%) | 291 (43.8%) | 684 (46.7%) | |
## | Male | 407 (50.9%) | 373 (56.2%) | 780 (53.3%) | |
## |**birthYear** | | | | 0.396|
## | Mean (SD) | 1950.341 (8.665) | 1950.729 (8.737) | 1950.517 (8.697) | |
## | Range | 1937.000 - 1970.000 | 1937.000 - 1969.000 | 1937.000 - 1970.000 | |
## |**diaBP** | | | | 0.045|
## | N-Miss | 16 | 21 | 37 | |
## | Mean (SD) | 81.922 (10.923) | 83.075 (10.603) | 82.441 (10.791) | |
## | Range | 46.000 - 125.000 | 57.000 - 120.000 | 46.000 - 125.000 | |
## |**sysBP** | | | | 0.361|
## | N-Miss | 17 | 21 | 38 | |
## | Mean (SD) | 136.498 (19.270) | 137.440 (19.499) | 136.923 (19.372) | |
## | Range | 92.000 - 219.000 | 91.000 - 229.000 | 91.000 - 229.000 | |
## |**waist** | | | | 0.010|
## | N-Miss | 7 | 5 | 12 | |
## | Mean (SD) | 94.368 (15.035) | 96.393 (14.517) | 95.287 (14.831) | |
## | Range | 61.000 - 151.000 | 62.000 - 166.000 | 61.000 - 166.000 | |
## |**hip** | | | | 0.142|
## | N-Miss | 7 | 5 | 12 | |
## | Mean (SD) | 104.880 (11.008) | 105.732 (11.006) | 105.267 (11.012) | |
## | Range | 83.000 - 158.000 | 82.000 - 172.000 | 82.000 - 172.000 | |
## |**height** | | | | 0.995|
## | N-Miss | 8 | 7 | 15 | |
## | Mean (SD) | 168.848 (9.503) | 168.851 (9.247) | 168.849 (9.385) | |
## | Range | 143.000 - 197.000 | 141.000 - 196.000 | 141.000 - 197.000 | |
## |**age** | | | | 0.396|
## | Mean (SD) | 69.659 (8.665) | 69.271 (8.737) | 69.483 (8.697) | |
## | Range | 50.000 - 83.000 | 51.000 - 83.000 | 50.000 - 83.000 | |
## |**whr** | | | | 0.010|
## | N-Miss | 7 | 5 | 12 | |
## | Mean (SD) | 0.898 (0.092) | 0.910 (0.090) | 0.904 (0.091) | |
## | Range | 0.642 - 1.211 | 0.660 - 1.194 | 0.642 - 1.211 | |
## |**highBP** | | | | 0.199|
## | N-Miss | 16 | 21 | 37 | |
## | Mean (SD) | 0.189 (0.392) | 0.216 (0.412) | 0.201 (0.401) | |
## | Range | 0.000 - 1.000 | 0.000 - 1.000 | 0.000 - 1.000 | |
## |**inpatient_covid** | | | | < 0.001|
## | Mean (SD) | 0.672 (0.470) | 0.858 (0.349) | 0.757 (0.429) | |
## | Range | 0.000 - 1.000 | 0.000 - 1.000 | 0.000 - 1.000 | |
## |**distance** | | | | 0.070|
## | Mean (SD) | 963.123 (21.328) | 961.081 (21.525) | 962.197 (21.434) | |
## | Range | 930.406 - 1001.337 | 930.476 - 1001.203 | 930.406 - 1001.337 | |
## |**pm25_val** | | | | < 0.001|
## | Mean (SD) | 8.859 (1.853) | 9.218 (1.924) | 9.022 (1.893) | |
## | Range | 5.630 - 13.650 | 5.769 - 13.667 | 5.630 - 13.667 | |
## |**merged_popDens_km2** | | | | 0.008|
## | Mean (SD) | 2770.719 (2909.404) | 3197.601 (3228.642) | 2964.332 (3064.644) | |
## | Range | 64.000 - 16097.000 | 64.000 - 16097.000 | 64.000 - 16097.000 | |
## |**smoker** | | | | < 0.001|
## | N-Miss | 0 | 2 | 2 | |
## | not_smoking | 665 (83.1%) | 594 (89.7%) | 1259 (86.1%) | |
## | smoking | 135 (16.9%) | 68 (10.3%) | 203 (13.9%) | |
## |**no2_val** | | | | < 0.001|
## | Mean (SD) | 15.485 (6.074) | 16.789 (6.469) | 16.076 (6.288) | |
## | Range | 4.987 - 48.672 | 5.632 - 48.672 | 4.987 - 48.672 | |
## |**so2_val** | | | | 0.027|
## | Mean (SD) | 1.668 (0.546) | 1.731 (0.544) | 1.697 (0.545) | |
## | Range | 0.661 - 5.108 | 0.700 - 4.578 | 0.661 - 5.108 | |
## |**o3_val** | | | | 0.753|
## | Mean (SD) | 5.621 (2.653) | 5.576 (2.668) | 5.601 (2.659) | |
## | Range | 0.000 - 13.188 | 0.000 - 12.731 | 0.000 - 13.188 | |
## |**nox_val** | | | | < 0.001|
## | Mean (SD) | 21.955 (10.770) | 24.163 (11.841) | 22.956 (11.318) | |
## | Range | 6.254 - 95.880 | 7.116 - 95.880 | 6.254 - 95.880 | |
## |**pm10_val** | | | | < 0.001|
## | Mean (SD) | 13.430 (2.683) | 13.925 (2.816) | 13.655 (2.755) | |
## | Range | 8.244 - 21.178 | 8.735 - 21.207 | 8.244 - 21.207 | |
# PM2.5
summary(ukb_covid_pm25.nb <- glm.nb(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
    whr + sex + age + pm25_val))
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
##
## Call:
## glm.nb(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + pm25_val,
## data = ukb_covid_allPol_df, init.theta = 15229.88441, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2059 -0.9479 -0.8028 0.6589 1.1708
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.065e+00 8.543e-01 -2.418 0.01562 *
## merged_popDens_km2 -9.956e-06 1.884e-05 -0.529 0.59711
## n_cancers -1.258e-01 1.272e-01 -0.989 0.32253
## townsend 1.354e-02 1.310e-02 1.034 0.30121
## smokingNever 3.665e-01 1.379e-01 2.658 0.00785 **
## smokingPrefer not to answer 7.608e-01 4.478e-01 1.699 0.08929 .
## smokingPrevious 4.056e-01 1.377e-01 2.946 0.00322 **
## whistlingNo 2.671e-01 2.698e-01 0.990 0.32220
## whistlingPrefer not to answer 4.568e-02 1.276e+00 0.036 0.97145
## whistlingYes 2.533e-01 2.742e-01 0.924 0.35559
## diabetesNo -1.619e-01 6.106e-01 -0.265 0.79085
## diabetesPrefer not to answer -6.877e-01 1.433e+00 -0.480 0.63140
## diabetesYes -1.956e-01 6.195e-01 -0.316 0.75215
## whr 7.856e-01 5.839e-01 1.346 0.17846
## sexMale 3.340e-02 1.000e-01 0.334 0.73838
## age -5.681e-03 4.748e-03 -1.196 0.23153
## pm25_val 6.035e-02 2.866e-02 2.106 0.03520 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(15229.88) family taken to be 1)
##
## Null deviance: 1040.2 on 1449 degrees of freedom
## Residual deviance: 1015.6 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 2365.6
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 15230
## Std. Err.: 58208
## Warning while fitting theta: iteration limit reached
##
## 2 x log-likelihood: -2329.646
summary(ukb_covid_pm25.p <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
    whr + sex + age + pm25_val, family = 'poisson'))
##
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + pm25_val,
## family = "poisson", data = ukb_covid_allPol_df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2060 -0.9479 -0.8028 0.6589 1.1708
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.065e+00 8.543e-01 -2.418 0.01562 *
## merged_popDens_km2 -9.956e-06 1.883e-05 -0.529 0.59711
## n_cancers -1.258e-01 1.272e-01 -0.989 0.32252
## townsend 1.354e-02 1.310e-02 1.034 0.30120
## smokingNever 3.665e-01 1.379e-01 2.658 0.00785 **
## smokingPrefer not to answer 7.609e-01 4.478e-01 1.699 0.08928 .
## smokingPrevious 4.056e-01 1.377e-01 2.946 0.00321 **
## whistlingNo 2.671e-01 2.698e-01 0.990 0.32219
## whistlingPrefer not to answer 4.567e-02 1.276e+00 0.036 0.97145
## whistlingYes 2.533e-01 2.742e-01 0.924 0.35558
## diabetesNo -1.619e-01 6.105e-01 -0.265 0.79084
## diabetesPrefer not to answer -6.877e-01 1.433e+00 -0.480 0.63138
## diabetesYes -1.956e-01 6.194e-01 -0.316 0.75214
## whr 7.856e-01 5.839e-01 1.346 0.17845
## sexMale 3.340e-02 1.000e-01 0.334 0.73837
## age -5.681e-03 4.748e-03 -1.196 0.23151
## pm25_val 6.035e-02 2.866e-02 2.106 0.03520 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1040.2 on 1449 degrees of freedom
## Residual deviance: 1015.6 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 2363.6
##
## Number of Fisher Scoring iterations: 5
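Because both models use a log link, the PM2.5 coefficient can be exponentiated into a rate ratio. A quick base-R check using the estimate and standard error printed above:

```r
# Rate ratio per 1 ug/m3 increase in PM2.5, from the fitted coefficient
beta_pm25 <- 6.035e-02
rr <- exp(beta_pm25)
round(rr, 3)  # ~1.062, i.e. ~6% higher rate per unit PM2.5

# 95% confidence interval from the printed standard error
se <- 2.866e-02
round(exp(beta_pm25 + c(-1.96, 1.96) * se), 3)
```

The interval excludes 1, consistent with the p-value of 0.035 reported in the model summary.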
car::vif(ukb_covid_pm25.nb)
## GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 2.443689 1 1.563230
## n_cancers 1.019337 1 1.009622
## townsend 1.442148 1 1.200895
## smoking 1.259363 3 1.039182
## whistling 1.875826 3 1.110535
## diabetes 2.049216 3 1.127019
## whr 1.850854 1 1.360461
## sex 1.619879 1 1.272744
## age 1.138711 1 1.067104
## pm25_val 2.011763 1 1.418366
# According to Fox & Monette (1992; http://www.jstor.org/stable/2290467), GVIF^(1/(2*Df)) < 5 indicates acceptable collinearity
# model reduction:
summary(ukb_covid_pm25.nb_red <- glm.nb(data = ukb_covid_pm25_popDens_df, result ~ merged_popDens_km2 + smoking + pm25_val))
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
##
## Call:
## glm.nb(formula = result ~ merged_popDens_km2 + smoking + pm25_val,
## data = ukb_covid_pm25_popDens_df, init.theta = 15585.93096,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1217 -0.9499 -0.8075 0.6755 1.0620
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.639e+00 2.629e-01 -6.235 4.52e-10 ***
## merged_popDens_km2 -9.261e-07 1.680e-05 -0.055 0.95604
## smokingNever 3.369e-01 1.344e-01 2.507 0.01216 *
## smokingPrefer not to answer 6.113e-01 4.261e-01 1.435 0.15139
## smokingPrevious 3.899e-01 1.353e-01 2.882 0.00395 **
## pm25_val 5.801e-02 2.833e-02 2.048 0.04055 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(15585.93) family taken to be 1)
##
## Null deviance: 1049.0 on 1461 degrees of freedom
## Residual deviance: 1032.7 on 1456 degrees of freedom
## (2 observations deleted due to missingness)
## AIC: 2370.7
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 15586
## Std. Err.: 60471
## Warning while fitting theta: iteration limit reached
##
## 2 x log-likelihood: -2356.739
summary(ukb_covid_pm25.p_red <- glm(data = ukb_covid_pm25_popDens_df, result ~ merged_popDens_km2 + smoking + pm25_val, family = 'poisson'))
##
## Call:
## glm(formula = result ~ merged_popDens_km2 + smoking + pm25_val,
## family = "poisson", data = ukb_covid_pm25_popDens_df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1218 -0.9499 -0.8075 0.6756 1.0620
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.639e+00 2.629e-01 -6.235 4.52e-10 ***
## merged_popDens_km2 -9.261e-07 1.680e-05 -0.055 0.95604
## smokingNever 3.369e-01 1.344e-01 2.508 0.01216 *
## smokingPrefer not to answer 6.113e-01 4.261e-01 1.435 0.15138
## smokingPrevious 3.899e-01 1.353e-01 2.882 0.00395 **
## pm25_val 5.801e-02 2.832e-02 2.048 0.04055 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1049.0 on 1461 degrees of freedom
## Residual deviance: 1032.7 on 1456 degrees of freedom
## (2 observations deleted due to missingness)
## AIC: 2368.7
##
## Number of Fisher Scoring iterations: 5
# check whether the reduced model fits as well as the full model
anova(ukb_covid_pm25.nb, ukb_covid_pm25.nb_red) # p > 0.05 -> the models are not significantly different, so the reduced model suffices
## Likelihood ratio tests of Negative Binomial Models
##
## Response: result
## Model
## 1 merged_popDens_km2 + smoking + pm25_val
## 2 merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes + whr + sex + age + pm25_val
## theta Resid. df 2 x log-lik. Test df LR stat. Pr(Chi)
## 1 15585.93 1456 -2356.739
## 2 15229.88 1433 -2329.646 1 vs 2 23 27.0924 0.2520556
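The likelihood-ratio statistic above can be reconstructed by hand from the two printed 2×log-likelihood values and residual degrees of freedom; a sketch in base R:

```r
# LR statistic = difference in 2 x log-likelihood between full and reduced models
ll2_full    <- -2329.646   # full model, 1433 residual df
ll2_reduced <- -2356.739   # reduced model, 1456 residual df
lr_stat <- ll2_full - ll2_reduced    # ~27.09, matching the LR stat above
df_diff <- 1456 - 1433               # 23 extra parameters in the full model
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
round(p_value, 3)  # ~0.252: the extra covariates do not significantly improve fit
```

A non-significant result here supports keeping the reduced (population density + smoking + PM2.5) model.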
# PM10
summary(ukb_covid_pm10.nb <- glm.nb(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
    whr + sex + age + pm10_val))
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
##
## Call:
## glm.nb(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + pm10_val,
## data = ukb_covid_allPol_df, init.theta = 15278.17332, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2226 -0.9484 -0.8030 0.6608 1.1713
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.035e+00 8.581e-01 -2.371 0.01774 *
## merged_popDens_km2 -9.198e-06 1.962e-05 -0.469 0.63913
## n_cancers -1.255e-01 1.272e-01 -0.987 0.32375
## townsend 1.338e-02 1.310e-02 1.021 0.30706
## smokingNever 3.635e-01 1.378e-01 2.637 0.00836 **
## smokingPrefer not to answer 7.550e-01 4.479e-01 1.685 0.09189 .
## smokingPrevious 4.008e-01 1.376e-01 2.913 0.00358 **
## whistlingNo 2.608e-01 2.697e-01 0.967 0.33356
## whistlingPrefer not to answer 4.399e-02 1.272e+00 0.035 0.97242
## whistlingYes 2.481e-01 2.741e-01 0.905 0.36539
## diabetesNo -1.706e-01 6.104e-01 -0.279 0.77992
## diabetesPrefer not to answer -6.936e-01 1.430e+00 -0.485 0.62768
## diabetesYes -2.053e-01 6.193e-01 -0.332 0.74021
## whr 7.986e-01 5.841e-01 1.367 0.17153
## sexMale 3.228e-02 1.000e-01 0.323 0.74690
## age -5.740e-03 4.748e-03 -1.209 0.22678
## pm10_val 3.829e-02 2.056e-02 1.862 0.06254 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(15278.17) family taken to be 1)
##
## Null deviance: 1040.2 on 1449 degrees of freedom
## Residual deviance: 1016.5 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 2366.6
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 15278
## Std. Err.: 58535
## Warning while fitting theta: iteration limit reached
##
## 2 x log-likelihood: -2330.591
summary(ukb_covid_pm10.p <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
    whr + sex + age + pm10_val, family = 'poisson'))
##
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + pm10_val,
## family = "poisson", data = ukb_covid_allPol_df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2226 -0.9484 -0.8030 0.6608 1.1713
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.035e+00 8.581e-01 -2.371 0.01774 *
## merged_popDens_km2 -9.198e-06 1.962e-05 -0.469 0.63912
## n_cancers -1.255e-01 1.272e-01 -0.987 0.32374
## townsend 1.338e-02 1.310e-02 1.021 0.30705
## smokingNever 3.635e-01 1.378e-01 2.637 0.00836 **
## smokingPrefer not to answer 7.550e-01 4.479e-01 1.686 0.09188 .
## smokingPrevious 4.008e-01 1.376e-01 2.913 0.00358 **
## whistlingNo 2.608e-01 2.697e-01 0.967 0.33355
## whistlingPrefer not to answer 4.399e-02 1.272e+00 0.035 0.97242
## whistlingYes 2.481e-01 2.741e-01 0.905 0.36538
## diabetesNo -1.706e-01 6.104e-01 -0.279 0.77992
## diabetesPrefer not to answer -6.936e-01 1.430e+00 -0.485 0.62766
## diabetesYes -2.053e-01 6.193e-01 -0.332 0.74020
## whr 7.986e-01 5.841e-01 1.367 0.17152
## sexMale 3.228e-02 1.000e-01 0.323 0.74690
## age -5.740e-03 4.748e-03 -1.209 0.22676
## pm10_val 3.829e-02 2.056e-02 1.863 0.06253 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1040.2 on 1449 degrees of freedom
## Residual deviance: 1016.6 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 2364.6
##
## Number of Fisher Scoring iterations: 5
car::vif(ukb_covid_pm10.nb)
## GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 2.690058 1 1.640140
## n_cancers 1.019347 1 1.009627
## townsend 1.448761 1 1.203645
## smoking 1.258582 3 1.039075
## whistling 1.863927 3 1.109357
## diabetes 2.037411 3 1.125934
## whr 1.852457 1 1.361050
## sex 1.620823 1 1.273115
## age 1.139335 1 1.067396
## pm10_val 2.234526 1 1.494833
# NOx
summary(ukb_covid_nox.nb <- glm.nb(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
    whr + sex + age + nox_val))
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
##
## Call:
## glm.nb(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + nox_val,
## data = ukb_covid_allPol_df, init.theta = 15323.76936, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3845 -0.9475 -0.8039 0.6603 1.1833
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.764e+00 8.293e-01 -2.127 0.03339 *
## merged_popDens_km2 -1.289e-05 2.130e-05 -0.605 0.54515
## n_cancers -1.220e-01 1.271e-01 -0.960 0.33727
## townsend 6.366e-03 1.337e-02 0.476 0.63384
## smokingNever 3.541e-01 1.377e-01 2.571 0.01015 *
## smokingPrefer not to answer 7.226e-01 4.491e-01 1.609 0.10759
## smokingPrevious 3.917e-01 1.374e-01 2.851 0.00436 **
## whistlingNo 2.591e-01 2.697e-01 0.961 0.33670
## whistlingPrefer not to answer 2.364e-03 1.245e+00 0.002 0.99849
## whistlingYes 2.438e-01 2.741e-01 0.890 0.37362
## diabetesNo -1.971e-01 6.103e-01 -0.323 0.74668
## diabetesPrefer not to answer -7.274e-01 1.408e+00 -0.516 0.60552
## diabetesYes -2.279e-01 6.191e-01 -0.368 0.71277
## whr 8.545e-01 5.847e-01 1.462 0.14387
## sexMale 1.812e-02 1.004e-01 0.180 0.85680
## age -5.503e-03 4.747e-03 -1.159 0.24641
## nox_val 1.043e-02 5.565e-03 1.874 0.06097 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(15323.77) family taken to be 1)
##
## Null deviance: 1040.2 on 1449 degrees of freedom
## Residual deviance: 1016.6 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 2366.6
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 15324
## Std. Err.: 58831
## Warning while fitting theta: iteration limit reached
##
## 2 x log-likelihood: -2330.647
summary(ukb_covid_nox.p <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
    whr + sex + age + nox_val, family = 'poisson'))
##
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + nox_val,
## family = "poisson", data = ukb_covid_allPol_df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3845 -0.9476 -0.8039 0.6603 1.1833
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.764e+00 8.293e-01 -2.127 0.03339 *
## merged_popDens_km2 -1.288e-05 2.130e-05 -0.605 0.54515
## n_cancers -1.220e-01 1.271e-01 -0.960 0.33726
## townsend 6.366e-03 1.337e-02 0.476 0.63383
## smokingNever 3.541e-01 1.377e-01 2.571 0.01015 *
## smokingPrefer not to answer 7.226e-01 4.491e-01 1.609 0.10758
## smokingPrevious 3.917e-01 1.374e-01 2.851 0.00436 **
## whistlingNo 2.591e-01 2.697e-01 0.961 0.33668
## whistlingPrefer not to answer 2.361e-03 1.245e+00 0.002 0.99849
## whistlingYes 2.438e-01 2.741e-01 0.890 0.37360
## diabetesNo -1.971e-01 6.103e-01 -0.323 0.74667
## diabetesPrefer not to answer -7.274e-01 1.408e+00 -0.517 0.60549
## diabetesYes -2.279e-01 6.190e-01 -0.368 0.71277
## whr 8.545e-01 5.847e-01 1.462 0.14387
## sexMale 1.812e-02 1.004e-01 0.180 0.85679
## age -5.503e-03 4.747e-03 -1.159 0.24640
## nox_val 1.043e-02 5.565e-03 1.874 0.06097 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1040.2 on 1449 degrees of freedom
## Residual deviance: 1016.6 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 2364.6
##
## Number of Fisher Scoring iterations: 5
car::vif(ukb_covid_nox.nb)
## GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 3.190492 1 1.786195
## n_cancers 1.020066 1 1.009983
## townsend 1.522514 1 1.233902
## smoking 1.265027 3 1.039960
## whistling 1.781014 3 1.100976
## diabetes 1.961134 3 1.118797
## whr 1.848865 1 1.359730
## sex 1.633899 1 1.278241
## age 1.141394 1 1.068361
## nox_val 3.273151 1 1.809185
# NO2
summary(ukb_covid_no2.nb <- glm.nb(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
    whr + sex + age + no2_val))
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
##
## Call:
## glm.nb(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + no2_val,
## data = ukb_covid_allPol_df, init.theta = 15256.03065, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3433 -0.9455 -0.8015 0.6594 1.1994
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.867e+00 8.335e-01 -2.241 0.02506 *
## merged_popDens_km2 -1.812e-05 2.128e-05 -0.852 0.39434
## n_cancers -1.211e-01 1.271e-01 -0.953 0.34075
## townsend 5.084e-03 1.340e-02 0.379 0.70433
## smokingNever 3.554e-01 1.377e-01 2.580 0.00987 **
## smokingPrefer not to answer 7.209e-01 4.493e-01 1.605 0.10860
## smokingPrevious 3.954e-01 1.375e-01 2.876 0.00402 **
## whistlingNo 2.585e-01 2.700e-01 0.957 0.33843
## whistlingPrefer not to answer -1.309e-02 1.238e+00 -0.011 0.99156
## whistlingYes 2.449e-01 2.744e-01 0.892 0.37217
## diabetesNo -1.911e-01 6.113e-01 -0.313 0.75451
## diabetesPrefer not to answer -7.362e-01 1.403e+00 -0.525 0.59967
## diabetesYes -2.218e-01 6.200e-01 -0.358 0.72050
## whr 8.312e-01 5.848e-01 1.421 0.15525
## sexMale 1.920e-02 1.003e-01 0.191 0.84815
## age -5.471e-03 4.748e-03 -1.152 0.24919
## no2_val 2.283e-02 1.039e-02 2.197 0.02804 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(15256.03) family taken to be 1)
##
## Null deviance: 1040.2 on 1449 degrees of freedom
## Residual deviance: 1015.3 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 2365.3
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 15256
## Std. Err.: 58376
## Warning while fitting theta: iteration limit reached
##
## 2 x log-likelihood: -2329.3
summary(ukb_covid_no2.p <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
    whr + sex + age + no2_val, family = 'poisson'))
##
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + no2_val,
## family = "poisson", data = ukb_covid_allPol_df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3434 -0.9455 -0.8015 0.6594 1.1994
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.867e+00 8.335e-01 -2.241 0.02505 *
## merged_popDens_km2 -1.812e-05 2.127e-05 -0.852 0.39433
## n_cancers -1.211e-01 1.271e-01 -0.953 0.34074
## townsend 5.084e-03 1.340e-02 0.379 0.70432
## smokingNever 3.554e-01 1.377e-01 2.580 0.00987 **
## smokingPrefer not to answer 7.209e-01 4.493e-01 1.605 0.10859
## smokingPrevious 3.954e-01 1.375e-01 2.877 0.00402 **
## whistlingNo 2.585e-01 2.700e-01 0.957 0.33841
## whistlingPrefer not to answer -1.310e-02 1.238e+00 -0.011 0.99156
## whistlingYes 2.449e-01 2.744e-01 0.892 0.37215
## diabetesNo -1.911e-01 6.113e-01 -0.313 0.75450
## diabetesPrefer not to answer -7.362e-01 1.402e+00 -0.525 0.59963
## diabetesYes -2.218e-01 6.200e-01 -0.358 0.72049
## whr 8.312e-01 5.848e-01 1.421 0.15524
## sexMale 1.920e-02 1.003e-01 0.191 0.84815
## age -5.471e-03 4.747e-03 -1.152 0.24918
## no2_val 2.283e-02 1.039e-02 2.197 0.02804 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1040.2 on 1449 degrees of freedom
## Residual deviance: 1015.3 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 2363.3
##
## Number of Fisher Scoring iterations: 5
car::vif(ukb_covid_no2.nb)
## GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 3.144160 1 1.773178
## n_cancers 1.020039 1 1.009970
## townsend 1.523933 1 1.234477
## smoking 1.267137 3 1.040249
## whistling 1.763147 3 1.099128
## diabetes 1.944539 3 1.117213
## whr 1.847941 1 1.359390
## sex 1.628714 1 1.276211
## age 1.141009 1 1.068180
## no2_val 3.255183 1 1.804212
# O3
summary(ukb_covid_o3.nb <- glm.nb(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
    whr + sex + age + o3_val))
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
##
## Call:
## glm.nb(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + o3_val,
## data = ukb_covid_allPol_df, init.theta = 15368.98351, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2076 -0.9503 -0.8062 0.6636 1.1391
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.630e+00 8.293e-01 -1.966 0.04931 *
## merged_popDens_km2 1.614e-05 1.399e-05 1.154 0.24867
## n_cancers -1.289e-01 1.274e-01 -1.012 0.31151
## townsend 1.333e-02 1.346e-02 0.990 0.32206
## smokingNever 3.585e-01 1.381e-01 2.596 0.00942 **
## smokingPrefer not to answer 7.494e-01 4.493e-01 1.668 0.09528 .
## smokingPrevious 3.905e-01 1.377e-01 2.837 0.00455 **
## whistlingNo 2.435e-01 2.687e-01 0.906 0.36477
## whistlingPrefer not to answer 2.093e-02 1.278e+00 0.016 0.98693
## whistlingYes 2.276e-01 2.730e-01 0.834 0.40454
## diabetesNo -1.858e-01 6.083e-01 -0.305 0.76009
## diabetesPrefer not to answer -6.932e-01 1.435e+00 -0.483 0.62909
## diabetesYes -2.278e-01 6.172e-01 -0.369 0.71205
## whr 8.310e-01 5.842e-01 1.422 0.15489
## sexMale 3.618e-02 1.002e-01 0.361 0.71815
## age -5.563e-03 4.749e-03 -1.171 0.24143
## o3_val 7.573e-03 1.544e-02 0.491 0.62369
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(15368.98) family taken to be 1)
##
## Null deviance: 1040.2 on 1449 degrees of freedom
## Residual deviance: 1019.8 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 2369.8
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 15369
## Std. Err.: 59189
## Warning while fitting theta: iteration limit reached
##
## 2 x log-likelihood: -2333.797
summary(ukb_covid_o3.p <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
    whr + sex + age + o3_val, family = 'poisson'))
##
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + o3_val,
## family = "poisson", data = ukb_covid_allPol_df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2076 -0.9503 -0.8062 0.6636 1.1391
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.630e+00 8.292e-01 -1.966 0.04931 *
## merged_popDens_km2 1.614e-05 1.399e-05 1.154 0.24866
## n_cancers -1.289e-01 1.274e-01 -1.012 0.31150
## townsend 1.333e-02 1.346e-02 0.990 0.32205
## smokingNever 3.585e-01 1.381e-01 2.596 0.00942 **
## smokingPrefer not to answer 7.494e-01 4.492e-01 1.668 0.09527 .
## smokingPrevious 3.905e-01 1.377e-01 2.837 0.00455 **
## whistlingNo 2.435e-01 2.687e-01 0.906 0.36476
## whistlingPrefer not to answer 2.092e-02 1.278e+00 0.016 0.98693
## whistlingYes 2.276e-01 2.730e-01 0.834 0.40453
## diabetesNo -1.858e-01 6.083e-01 -0.305 0.76008
## diabetesPrefer not to answer -6.932e-01 1.435e+00 -0.483 0.62906
## diabetesYes -2.278e-01 6.172e-01 -0.369 0.71205
## whr 8.310e-01 5.842e-01 1.423 0.15488
## sexMale 3.618e-02 1.002e-01 0.361 0.71814
## age -5.563e-03 4.749e-03 -1.171 0.24142
## o3_val 7.573e-03 1.544e-02 0.491 0.62368
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1040.2 on 1449 degrees of freedom
## Residual deviance: 1019.8 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 2367.8
##
## Number of Fisher Scoring iterations: 5
car::vif(ukb_covid_o3.nb)
## GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 1.370137 1 1.170528
## n_cancers 1.019831 1 1.009867
## townsend 1.535336 1 1.239087
## smoking 1.267302 3 1.040271
## whistling 1.867480 3 1.109710
## diabetes 2.047511 3 1.126863
## whr 1.856088 1 1.362383
## sex 1.627746 1 1.275831
## age 1.140751 1 1.068060
## o3_val 1.098307 1 1.048001
#so2
summary(ukb_covid_so2.nb <- glm.nb(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
    whr + sex + age + so2_val))
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
##
## Call:
## glm.nb(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + so2_val,
## data = ukb_covid_allPol_df, init.theta = 15326.12162, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2325 -0.9485 -0.8076 0.6604 1.2053
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.721e+00 8.342e-01 -2.063 0.03915 *
## merged_popDens_km2 1.462e-05 1.409e-05 1.037 0.29962
## n_cancers -1.272e-01 1.274e-01 -0.998 0.31824
## townsend 8.033e-03 1.345e-02 0.597 0.55043
## smokingNever 3.566e-01 1.378e-01 2.587 0.00967 **
## smokingPrefer not to answer 7.159e-01 4.487e-01 1.596 0.11060
## smokingPrevious 3.901e-01 1.375e-01 2.837 0.00456 **
## whistlingNo 2.472e-01 2.689e-01 0.920 0.35778
## whistlingPrefer not to answer -7.566e-03 1.260e+00 -0.006 0.99521
## whistlingYes 2.310e-01 2.732e-01 0.846 0.39781
## diabetesNo -1.926e-01 6.089e-01 -0.316 0.75176
## diabetesPrefer not to answer -6.932e-01 1.420e+00 -0.488 0.62534
## diabetesYes -2.308e-01 6.178e-01 -0.374 0.70864
## whr 8.300e-01 5.845e-01 1.420 0.15560
## sexMale 2.495e-02 1.004e-01 0.248 0.80378
## age -5.453e-03 4.748e-03 -1.148 0.25078
## so2_val 8.195e-02 7.699e-02 1.065 0.28709
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(15326.12) family taken to be 1)
##
## Null deviance: 1040.2 on 1449 degrees of freedom
## Residual deviance: 1018.9 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 2368.9
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 15326
## Std. Err.: 58891
## Warning while fitting theta: iteration limit reached
##
## 2 x log-likelihood: -2332.922
summary(ukb_covid_so2.p <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
    whr + sex + age + so2_val, family = 'poisson'))
##
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + so2_val,
## family = "poisson", data = ukb_covid_allPol_df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2325 -0.9485 -0.8076 0.6605 1.2054
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.721e+00 8.342e-01 -2.063 0.03915 *
## merged_popDens_km2 1.462e-05 1.409e-05 1.037 0.29961
## n_cancers -1.272e-01 1.274e-01 -0.998 0.31823
## townsend 8.033e-03 1.345e-02 0.597 0.55042
## smokingNever 3.566e-01 1.378e-01 2.587 0.00967 **
## smokingPrefer not to answer 7.159e-01 4.487e-01 1.596 0.11059
## smokingPrevious 3.901e-01 1.375e-01 2.837 0.00456 **
## whistlingNo 2.472e-01 2.689e-01 0.920 0.35776
## whistlingPrefer not to answer -7.570e-03 1.260e+00 -0.006 0.99521
## whistlingYes 2.310e-01 2.732e-01 0.846 0.39779
## diabetesNo -1.926e-01 6.089e-01 -0.316 0.75175
## diabetesPrefer not to answer -6.933e-01 1.420e+00 -0.488 0.62532
## diabetesYes -2.309e-01 6.178e-01 -0.374 0.70863
## whr 8.300e-01 5.845e-01 1.420 0.15559
## sexMale 2.495e-02 1.004e-01 0.248 0.80377
## age -5.453e-03 4.748e-03 -1.148 0.25077
## so2_val 8.195e-02 7.698e-02 1.065 0.28708
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1040.2 on 1449 degrees of freedom
## Residual deviance: 1018.9 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 2366.9
##
## Number of Fisher Scoring iterations: 5
car::vif(ukb_covid_so2.nb)
## GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 1.386609 1 1.177543
## n_cancers 1.019429 1 1.009668
## townsend 1.534094 1 1.238585
## smoking 1.264270 3 1.039856
## whistling 1.816373 3 1.104589
## diabetes 1.991278 3 1.121645
## whr 1.852603 1 1.361103
## sex 1.633541 1 1.278100
## age 1.141044 1 1.068196
## so2_val 1.185954 1 1.089015
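The repeated `theta.ml` warnings above are informative: with a 0/1 response, the negative binomial dispersion parameter `theta` drifts towards infinity (here, ~15,000 with a standard error several times larger), so the negative binomial fit effectively collapses onto the Poisson fit, and their AICs differ by roughly 2, the penalty for the extra `theta` parameter. The sketch below reproduces this behaviour on simulated data (not the UK Biobank extract; `x` and `y` are illustrative variables):

```r
# Sketch on simulated data (NOT the UK Biobank extract): with a binary
# outcome, theta is pushed towards infinity (hence "iteration limit
# reached"), and glm.nb() effectively reduces to the Poisson model.
library(MASS)

set.seed(1)
x <- rnorm(500)
y <- rbinom(500, size = 1, prob = plogis(-0.5 + 0.4 * x))  # binary response

pois_fit <- glm(y ~ x, family = "poisson")
nb_fit   <- suppressWarnings(glm.nb(y ~ x))  # same warnings as above

nb_fit$theta                 # very large: the NB extra dispersion is unused
AIC(nb_fit) - AIC(pois_fit)  # close to 2, matching the AIC gap seen above
```

This is why the binary (logistic) models in the next block are the appropriate specification here.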
We next fitted binomial (logistic) models to account for the fact that the response variable is binary (1/0).
# PM2.5
summary(ukb_covid_pm25.b <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
    whr + sex + age + pm25_val, family = 'binomial'))
##
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + pm25_val,
## family = "binomial", data = ukb_covid_allPol_df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5015 -1.0980 -0.8723 1.2076 1.7802
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.495e+00 1.201e+00 -2.078 0.037729 *
## merged_popDens_km2 -1.665e-05 2.673e-05 -0.623 0.533410
## n_cancers -2.279e-01 1.656e-01 -1.376 0.168882
## townsend 2.553e-02 1.803e-02 1.416 0.156877
## smokingNever 6.317e-01 1.760e-01 3.589 0.000332 ***
## smokingPrefer not to answer 1.517e+00 7.460e-01 2.033 0.042046 *
## smokingPrevious 7.049e-01 1.764e-01 3.996 6.45e-05 ***
## whistlingNo 4.731e-01 3.493e-01 1.355 0.175573
## whistlingPrefer not to answer 1.233e-01 1.636e+00 0.075 0.939912
## whistlingYes 4.484e-01 3.556e-01 1.261 0.207277
## diabetesNo -2.793e-01 8.588e-01 -0.325 0.744986
## diabetesPrefer not to answer -1.350e+00 1.910e+00 -0.707 0.479679
## diabetesYes -3.466e-01 8.712e-01 -0.398 0.690727
## whr 1.465e+00 8.070e-01 1.816 0.069422 .
## sexMale 6.331e-02 1.375e-01 0.461 0.645148
## age -1.069e-02 6.540e-03 -1.634 0.102234
## pm25_val 1.131e-01 3.964e-02 2.853 0.004331 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1997.4 on 1449 degrees of freedom
## Residual deviance: 1952.5 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 1986.5
##
## Number of Fisher Scoring iterations: 4
car::vif(ukb_covid_pm25.b)
## GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 2.324259 1 1.524552
## n_cancers 1.020306 1 1.010102
## townsend 1.406078 1 1.185782
## smoking 1.361982 3 1.052839
## whistling 1.946471 3 1.117398
## diabetes 2.222505 3 1.142371
## whr 1.864962 1 1.365636
## sex 1.637204 1 1.279533
## age 1.132009 1 1.063959
## pm25_val 1.946202 1 1.395064
#PM10
summary(ukb_covid_pm10.b <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
    whr + sex + age + pm10_val, family = 'binomial'))
##
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + pm10_val,
## family = "binomial", data = ukb_covid_allPol_df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5265 -1.1001 -0.8752 1.2106 1.7815
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.436e+00 1.204e+00 -2.023 0.043114 *
## merged_popDens_km2 -1.452e-05 2.764e-05 -0.525 0.599243
## n_cancers -2.270e-01 1.656e-01 -1.371 0.170288
## townsend 2.519e-02 1.802e-02 1.398 0.162234
## smokingNever 6.253e-01 1.758e-01 3.556 0.000376 ***
## smokingPrefer not to answer 1.503e+00 7.454e-01 2.017 0.043730 *
## smokingPrevious 6.950e-01 1.761e-01 3.946 7.95e-05 ***
## whistlingNo 4.602e-01 3.491e-01 1.318 0.187409
## whistlingPrefer not to answer 1.219e-01 1.635e+00 0.075 0.940573
## whistlingYes 4.382e-01 3.554e-01 1.233 0.217639
## diabetesNo -2.959e-01 8.585e-01 -0.345 0.730345
## diabetesPrefer not to answer -1.361e+00 1.909e+00 -0.713 0.475962
## diabetesYes -3.652e-01 8.709e-01 -0.419 0.674974
## whr 1.483e+00 8.066e-01 1.839 0.065893 .
## sexMale 6.211e-02 1.374e-01 0.452 0.651265
## age -1.075e-02 6.536e-03 -1.645 0.099922 .
## pm10_val 7.170e-02 2.826e-02 2.537 0.011168 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1997.4 on 1449 degrees of freedom
## Residual deviance: 1954.2 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 1988.2
##
## Number of Fisher Scoring iterations: 4
car::vif(ukb_covid_pm10.b)
## GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 2.479156 1 1.574534
## n_cancers 1.020245 1 1.010072
## townsend 1.405143 1 1.185387
## smoking 1.359251 3 1.052487
## whistling 1.938463 3 1.116631
## diabetes 2.215023 3 1.141729
## whr 1.865545 1 1.365850
## sex 1.637883 1 1.279798
## age 1.132501 1 1.064190
## pm10_val 2.089322 1 1.445449
#nox
summary(ukb_covid_nox.b <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
    whr + sex + age + nox_val, family = 'binomial'))
##
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + nox_val,
## family = "binomial", data = ukb_covid_allPol_df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8158 -1.0981 -0.8681 1.2109 1.7967
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.971e+00 1.167e+00 -1.688 0.091373 .
## merged_popDens_km2 -2.821e-05 3.058e-05 -0.922 0.356330
## n_cancers -2.220e-01 1.656e-01 -1.341 0.179989
## townsend 1.120e-02 1.841e-02 0.608 0.542937
## smokingNever 6.121e-01 1.758e-01 3.483 0.000497 ***
## smokingPrefer not to answer 1.418e+00 7.419e-01 1.911 0.056052 .
## smokingPrevious 6.853e-01 1.761e-01 3.893 9.92e-05 ***
## whistlingNo 4.534e-01 3.511e-01 1.291 0.196604
## whistlingPrefer not to answer 3.230e-02 1.630e+00 0.020 0.984188
## whistlingYes 4.300e-01 3.574e-01 1.203 0.228927
## diabetesNo -3.427e-01 8.598e-01 -0.399 0.690220
## diabetesPrefer not to answer -1.409e+00 1.907e+00 -0.739 0.460161
## diabetesYes -4.073e-01 8.721e-01 -0.467 0.640507
## whr 1.558e+00 8.069e-01 1.930 0.053549 .
## sexMale 3.870e-02 1.378e-01 0.281 0.778826
## age -1.023e-02 6.537e-03 -1.564 0.117746
## nox_val 2.286e-02 8.580e-03 2.665 0.007708 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1997.4 on 1449 degrees of freedom
## Residual deviance: 1953.4 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 1987.4
##
## Number of Fisher Scoring iterations: 4
#no2
summary(ukb_covid_no2.b <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
    whr + sex + age + no2_val, family = 'binomial'))
##
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + no2_val,
## family = "binomial", data = ukb_covid_allPol_df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7152 -1.0980 -0.8676 1.2074 1.8112
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.146e+00 1.175e+00 -1.826 0.067787 .
## merged_popDens_km2 -3.474e-05 3.012e-05 -1.153 0.248722
## n_cancers -2.206e-01 1.657e-01 -1.332 0.182872
## townsend 9.422e-03 1.844e-02 0.511 0.609470
## smokingNever 6.136e-01 1.759e-01 3.489 0.000485 ***
## smokingPrefer not to answer 1.414e+00 7.420e-01 1.906 0.056635 .
## smokingPrevious 6.904e-01 1.762e-01 3.918 8.94e-05 ***
## whistlingNo 4.519e-01 3.508e-01 1.288 0.197732
## whistlingPrefer not to answer 5.918e-03 1.629e+00 0.004 0.997102
## whistlingYes 4.301e-01 3.571e-01 1.204 0.228481
## diabetesNo -3.365e-01 8.622e-01 -0.390 0.696330
## diabetesPrefer not to answer -1.424e+00 1.908e+00 -0.746 0.455449
## diabetesYes -4.023e-01 8.745e-01 -0.460 0.645508
## whr 1.520e+00 8.073e-01 1.883 0.059643 .
## sexMale 4.126e-02 1.378e-01 0.300 0.764553
## age -1.014e-02 6.542e-03 -1.550 0.121023
## no2_val 4.586e-02 1.507e-02 3.042 0.002350 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1997.4 on 1449 degrees of freedom
## Residual deviance: 1951.3 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 1985.3
##
## Number of Fisher Scoring iterations: 4
car::vif(ukb_covid_no2.b)
## GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 2.908510 1 1.705436
## n_cancers 1.020530 1 1.010213
## townsend 1.465272 1 1.210484
## smoking 1.361503 3 1.052777
## whistling 1.873400 3 1.110295
## diabetes 2.153442 3 1.136376
## whr 1.867270 1 1.366481
## sex 1.643660 1 1.282053
## age 1.131877 1 1.063897
## no2_val 3.004997 1 1.733493
#o3
summary(ukb_covid_o3.b <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
    whr + sex + age + o3_val, family = 'binomial'))
##
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + o3_val,
## family = "binomial", data = ukb_covid_allPol_df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5128 -1.1014 -0.8815 1.2147 1.7384
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.691e+00 1.160e+00 -1.458 0.144974
## merged_popDens_km2 3.266e-05 2.026e-05 1.613 0.106848
## n_cancers -2.303e-01 1.657e-01 -1.390 0.164576
## townsend 2.509e-02 1.857e-02 1.351 0.176671
## smokingNever 6.132e-01 1.757e-01 3.491 0.000482 ***
## smokingPrefer not to answer 1.475e+00 7.434e-01 1.985 0.047180 *
## smokingPrevious 6.728e-01 1.758e-01 3.828 0.000129 ***
## whistlingNo 4.292e-01 3.476e-01 1.235 0.216935
## whistlingPrefer not to answer 7.931e-02 1.637e+00 0.048 0.961358
## whistlingYes 4.005e-01 3.538e-01 1.132 0.257619
## diabetesNo -3.154e-01 8.525e-01 -0.370 0.711418
## diabetesPrefer not to answer -1.337e+00 1.909e+00 -0.700 0.483791
## diabetesYes -3.952e-01 8.650e-01 -0.457 0.647746
## whr 1.542e+00 8.047e-01 1.917 0.055253 .
## sexMale 6.833e-02 1.373e-01 0.498 0.618821
## age -1.036e-02 6.521e-03 -1.589 0.112170
## o3_val 1.394e-02 2.102e-02 0.663 0.507271
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1997.4 on 1449 degrees of freedom
## Residual deviance: 1960.2 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 1994.2
##
## Number of Fisher Scoring iterations: 4
car::vif(ukb_covid_o3.b)
## GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 1.334395 1 1.155160
## n_cancers 1.020762 1 1.010327
## townsend 1.497646 1 1.223783
## smoking 1.363683 3 1.053058
## whistling 1.938926 3 1.116675
## diabetes 2.218764 3 1.142050
## whr 1.865199 1 1.365723
## sex 1.642995 1 1.281794
## age 1.132566 1 1.064221
## o3_val 1.092175 1 1.045072
#so2
summary(ukb_covid_so2.b <- glm(data = ukb_covid_allPol_df, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes +
    whr + sex + age + so2_val, family = 'binomial'))
##
## Call:
## glm(formula = result ~ merged_popDens_km2 + n_cancers + townsend +
## smoking + whistling + diabetes + whr + sex + age + so2_val,
## family = "binomial", data = ukb_covid_allPol_df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5419 -1.1008 -0.8828 1.2125 1.8154
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.865e+00 1.168e+00 -1.597 0.110294
## merged_popDens_km2 2.955e-05 2.042e-05 1.447 0.147827
## n_cancers -2.274e-01 1.657e-01 -1.372 0.170114
## townsend 1.531e-02 1.853e-02 0.826 0.408529
## smokingNever 6.105e-01 1.755e-01 3.478 0.000505 ***
## smokingPrefer not to answer 1.415e+00 7.429e-01 1.905 0.056727 .
## smokingPrevious 6.719e-01 1.756e-01 3.827 0.000130 ***
## whistlingNo 4.384e-01 3.479e-01 1.260 0.207518
## whistlingPrefer not to answer 2.860e-02 1.632e+00 0.018 0.986015
## whistlingYes 4.079e-01 3.540e-01 1.152 0.249305
## diabetesNo -3.297e-01 8.535e-01 -0.386 0.699264
## diabetesPrefer not to answer -1.338e+00 1.905e+00 -0.703 0.482347
## diabetesYes -4.055e-01 8.659e-01 -0.468 0.639609
## whr 1.542e+00 8.050e-01 1.916 0.055368 .
## sexMale 4.830e-02 1.377e-01 0.351 0.725700
## age -1.015e-02 6.524e-03 -1.556 0.119623
## so2_val 1.553e-01 1.078e-01 1.441 0.149709
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1997.4 on 1449 degrees of freedom
## Residual deviance: 1958.6 on 1433 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 1992.6
##
## Number of Fisher Scoring iterations: 4
car::vif(ukb_covid_so2.b)
## GVIF Df GVIF^(1/(2*Df))
## merged_popDens_km2 1.352441 1 1.162945
## n_cancers 1.020029 1 1.009965
## townsend 1.489026 1 1.220257
## smoking 1.355795 3 1.052040
## whistling 1.916015 3 1.114465
## diabetes 2.192588 3 1.139793
## whr 1.865722 1 1.365914
## sex 1.649086 1 1.284167
## age 1.131975 1 1.063943
## so2_val 1.177317 1 1.085042
#plot binary models
stargazer(ukb_covid_pm25.b, ukb_covid_pm10.b, ukb_covid_nox.b, ukb_covid_no2.b,
ukb_covid_o3.b, ukb_covid_so2.b,type="html",out = "fig_out/ukb_covid_allPoll_binary.html",
dep.var.labels="COVID positive or not",
single.row=TRUE)
Dependent variable: | ||||||
COVID positive or not | ||||||
(1) | (2) | (3) | (4) | (5) | (6) | |
merged\_popDens\_km2 | \-0.00002 (0.00003) | \-0.00001 (0.00003) | \-0.00003 (0.00003) | \-0.00003 (0.00003) | 0.00003 (0.00002) | 0.00003 (0.00002) |
n\_cancers | \-0.228 (0.166) | \-0.227 (0.166) | \-0.222 (0.166) | \-0.221 (0.166) | \-0.230 (0.166) | \-0.227 (0.166) |
townsend | 0.026 (0.018) | 0.025 (0.018) | 0.011 (0.018) | 0.009 (0.018) | 0.025 (0.019) | 0.015 (0.019) |
smokingNever | 0.632\*\*\* (0.176) | 0.625\*\*\* (0.176) | 0.612\*\*\* (0.176) | 0.614\*\*\* (0.176) | 0.613\*\*\* (0.176) | 0.610\*\*\* (0.176) |
smokingPrefer not to answer | 1.517\*\* (0.746) | 1.503\*\* (0.745) | 1.418\* (0.742) | 1.414\* (0.742) | 1.475\*\* (0.743) | 1.415\* (0.743) |
smokingPrevious | 0.705\*\*\* (0.176) | 0.695\*\*\* (0.176) | 0.685\*\*\* (0.176) | 0.690\*\*\* (0.176) | 0.673\*\*\* (0.176) | 0.672\*\*\* (0.176) |
whistlingNo | 0.473 (0.349) | 0.460 (0.349) | 0.453 (0.351) | 0.452 (0.351) | 0.429 (0.348) | 0.438 (0.348) |
whistlingPrefer not to answer | 0.123 (1.636) | 0.122 (1.635) | 0.032 (1.630) | 0.006 (1.629) | 0.079 (1.637) | 0.029 (1.632) |
whistlingYes | 0.448 (0.356) | 0.438 (0.355) | 0.430 (0.357) | 0.430 (0.357) | 0.401 (0.354) | 0.408 (0.354) |
diabetesNo | \-0.279 (0.859) | \-0.296 (0.858) | \-0.343 (0.860) | \-0.337 (0.862) | \-0.315 (0.853) | \-0.330 (0.853) |
diabetesPrefer not to answer | \-1.350 (1.910) | \-1.361 (1.909) | \-1.409 (1.907) | \-1.424 (1.908) | \-1.337 (1.909) | \-1.338 (1.905) |
diabetesYes | \-0.347 (0.871) | \-0.365 (0.871) | \-0.407 (0.872) | \-0.402 (0.875) | \-0.395 (0.865) | \-0.405 (0.866) |
whr | 1.465\* (0.807) | 1.483\* (0.807) | 1.558\* (0.807) | 1.520\* (0.807) | 1.542\* (0.805) | 1.542\* (0.805) |
sexMale | 0.063 (0.137) | 0.062 (0.137) | 0.039 (0.138) | 0.041 (0.138) | 0.068 (0.137) | 0.048 (0.138) |
age | \-0.011 (0.007) | \-0.011\* (0.007) | \-0.010 (0.007) | \-0.010 (0.007) | \-0.010 (0.007) | \-0.010 (0.007) |
pm25\_val | 0.113\*\*\* (0.040) | |||||
pm10\_val | 0.072\*\* (0.028) | |||||
nox\_val | 0.023\*\*\* (0.009) | |||||
no2\_val | 0.046\*\*\* (0.015) | |||||
o3\_val | 0.014 (0.021) | |||||
so2\_val | 0.155 (0.108) | |||||
Constant | \-2.495\*\* (1.201) | \-2.436\*\* (1.204) | \-1.971\* (1.167) | \-2.146\* (1.175) | \-1.691 (1.160) | \-1.865 (1.168) |
Observations | 1,450 | 1,450 | 1,450 | 1,450 | 1,450 | 1,450 |
Log Likelihood | \-976.235 | \-977.093 | \-976.681 | \-975.636 | \-980.105 | \-979.286 |
Akaike Inf. Crit. | 1,986.469 | 1,988.185 | 1,987.362 | 1,985.271 | 1,994.211 | 1,992.572 |
Note: | *p\<0.1; **p\<0.05; ***p\<0.01 |
Each column of the table corresponds to a binomial (logistic) model used to
predict whether an individual tested positive for COVID-19 based on the
concentration of a single air pollutant, together with the covariates. We
fitted a separate model for each pollutant because the pollutants are highly
correlated with one another. The raw coefficient estimates of each model are
listed with their standard errors in parentheses, and the p-values are
indicated by the number of asterisks beside the estimates.
merged_popDens_km2, average population density per square kilometre;
pm25_val, PM2.5 levels; pm10_val, PM10 levels; nox_val, nitrogen oxides
levels; no2_val, nitrogen dioxide levels; o3_val, ozone levels; so2_val,
sulphur dioxide levels; Akaike Inf. Crit., Akaike's Information Criterion.
ukb_covid_pm25.b_odds = data.frame(cbind(exp(cbind(OR = coef(ukb_covid_pm25.b), confint(ukb_covid_pm25.b))), p_value = summary(ukb_covid_pm25.b)$coefficients[,4]))
ukb_covid_pm10.b_odds = data.frame(cbind(exp(cbind(OR = coef(ukb_covid_pm10.b), confint(ukb_covid_pm10.b))), p_value = summary(ukb_covid_pm10.b)$coefficients[,4]))
ukb_covid_nox.b_odds = data.frame(cbind(exp(cbind(OR = coef(ukb_covid_nox.b), confint(ukb_covid_nox.b))), p_value = summary(ukb_covid_nox.b)$coefficients[,4]))
ukb_covid_no2.b_odds = data.frame(cbind(exp(cbind(OR = coef(ukb_covid_no2.b), confint(ukb_covid_no2.b))), p_value = summary(ukb_covid_no2.b)$coefficients[,4]))
ukb_covid_o3.b_odds = data.frame(cbind(exp(cbind(OR = coef(ukb_covid_o3.b), confint(ukb_covid_o3.b))), p_value = summary(ukb_covid_o3.b)$coefficients[,4]))
ukb_covid_so2.b_odds = data.frame(cbind(exp(cbind(OR = coef(ukb_covid_so2.b), confint(ukb_covid_so2.b))), p_value = summary(ukb_covid_so2.b)$coefficients[,4]))
#make a df just with the pollutants
ukb_covid_onlyPoll = data.frame(rbind(ukb_covid_pm25.b_odds[nrow(ukb_covid_pm25.b_odds),],
ukb_covid_pm10.b_odds[nrow(ukb_covid_pm10.b_odds),],
ukb_covid_nox.b_odds[nrow(ukb_covid_nox.b_odds),],
ukb_covid_no2.b_odds[nrow(ukb_covid_no2.b_odds),],
ukb_covid_o3.b_odds[nrow(ukb_covid_o3.b_odds),],
ukb_covid_so2.b_odds[nrow(ukb_covid_so2.b_odds),]))
#sort data
ukb_covid_onlyPoll$names = row.names(ukb_covid_onlyPoll)
ukb_covid_onlyPoll$significance = "p-value > 0.05"
ukb_covid_onlyPoll$significance[ukb_covid_onlyPoll$p_value < 0.05] <- "p-value < 0.05"
#plot
ggplot(ukb_covid_onlyPoll, aes(x=reorder(names, OR), y=OR, color=significance)) +
geom_point(fill="white", shape=21, size = 2) +
geom_errorbar(aes(ymin=X2.5.., ymax=X97.5..),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
theme_bw() +
geom_hline(yintercept = 1, linetype="dotted") +
coord_flip()+ylab("Infectivity odds ratios") +
xlab("Pollutants")+
ylim(0.75, 1.6)
ggsave('fig_out/UKB_infectivityOR_bar.pdf')
stargazer(ukb_covid_onlyPoll, type ="html", single.row=TRUE, summary = FALSE, out = "fig_out/ukb_covid_onlyPoll.html")
OR | X2.5.. | X97.5.. | p\_value | names | significance | |
pm25\_val | 1.120 | 1.036 | 1.211 | 0.004 | pm25\_val | p-value \< 0.05 |
pm10\_val | 1.074 | 1.017 | 1.136 | 0.011 | pm10\_val | p-value \< 0.05 |
nox\_val | 1.023 | 1.006 | 1.041 | 0.008 | nox\_val | p-value \< 0.05 |
no2\_val | 1.047 | 1.017 | 1.079 | 0.002 | no2\_val | p-value \< 0.05 |
o3\_val | 1.014 | 0.973 | 1.057 | 0.507 | o3\_val | p-value \> 0.05 |
so2\_val | 1.168 | 0.946 | 1.444 | 0.150 | so2\_val | p-value \> 0.05 |
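The odds ratios above are simply the exponentiated logistic coefficients, with the confidence bounds exponentiated likewise. As a worked check (using the pm25_val coefficient and standard error from the regression table above, and a Wald approximation, whereas `confint()` profiles the likelihood, so the table's bounds can differ slightly):

```r
# Worked check of the pm25_val row: exponentiate the logistic coefficient.
# beta and se are read off the regression table above; the CI here is a
# Wald approximation, not the profile-likelihood CI that confint() returns.
beta <- 0.1131
se   <- 0.03964

or      <- exp(beta)                          # odds ratio per unit PM2.5
wald_ci <- exp(beta + c(-1, 1) * 1.96 * se)   # approximate 95% CI

round(c(OR = or, lower = wald_ci[1], upper = wald_ci[2]), 3)
```

This reproduces the tabulated OR of ~1.12 (95% CI ~1.04 to ~1.21) for pm25_val.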
### UKB analysis with 5YA
#### Adding the 5YA air pollution data
ukb_dt = read.csv("data_output/2_5_2020_full_cov_CV_UKB_air_dataset.csv")
ukb_dt= ukb_dt[ , -which(names(ukb_dt) %in% c("pm25_val"))]
ukb_5yAP = read.csv("data_output/ukb_covid_dt_out.csv")[,c("eid","no2_5yAvg","nox_5yAvg","pm10_5yAvg","o3_5yAvg","pm25_5yAvg","no2_val", "nox_val", "pm10_val", "o3_val", "pm25_val")]
ukb_dt = merge(ukb_dt,ukb_5yAP, by = "eid", all.x = T)
nrow(ukb_dt)
## [1] 1464
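Because the merge above uses `all.x = TRUE`, it is a left join: every participant in `ukb_dt` is retained, with `NA` filled in where no 5-year-average record exists, so `nrow(ukb_dt)` is unchanged by the merge. A minimal illustration on toy data (not the UKB extract; `a`, `b`, and the values are made up):

```r
# Toy illustration of merge(..., all.x = TRUE): a left join that keeps
# every row of the left table and fills NA for unmatched eids.
a <- data.frame(eid = 1:3, age = c(60, 55, 70))
b <- data.frame(eid = c(1, 3), pm25_5yAvg = c(10.2, 9.8))

m <- merge(a, b, by = "eid", all.x = TRUE)

nrow(m)                   # 3: all left-table rows survive
sum(is.na(m$pm25_5yAvg))  # 1: eid 2 has no 5-year-average record
```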
ukb_covid.b_pm25 <- glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes + whr + sex + age +
    pm25_val, family = 'binomial')
ukb_covid.b_no2 <- glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes + whr + sex + age +
    no2_val, family = 'binomial')
ukb_covid.b_nox <- glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes + whr + sex + age +
    nox_val, family = 'binomial')
ukb_covid.b_pm10 <- glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes + whr + sex + age +
    pm10_val, family = 'binomial')
ukb_covid.b_o3 <- glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes + whr + sex + age +
    o3_val, family = 'binomial')
ukb_covid.b_pm25_5yAvg <- glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes + whr + sex + age +
    pm25_5yAvg, family = 'binomial')
ukb_covid.b_no2_5yAvg <- glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes + whr + sex + age +
    no2_5yAvg, family = 'binomial')
ukb_covid.b_nox_5yAvg <- glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes + whr + sex + age +
    nox_5yAvg, family = 'binomial')
ukb_covid.b_pm10_5yAvg <- glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes + whr + sex + age +
    pm10_5yAvg, family = 'binomial')
ukb_covid.b_o3_5yAvg <- glm(data = ukb_dt, result ~ merged_popDens_km2 + n_cancers + townsend + smoking + whistling + diabetes + whr + sex + age +
    o3_5yAvg, family = 'binomial')
stargazer::stargazer(ukb_covid.b_pm25,ukb_covid.b_pm10,ukb_covid.b_no2,ukb_covid.b_nox,ukb_covid.b_o3,
ukb_covid.b_pm25_5yAvg,ukb_covid.b_pm10_5yAvg,ukb_covid.b_no2_5yAvg,
ukb_covid.b_nox_5yAvg,ukb_covid.b_o3_5yAvg,
type="html",out = "fig_out/ukb_covidInfections_binomial_resub.html",
dep.var.labels="COVID-19 positive or not",
single.row=TRUE)
##
## <table style="text-align:center"><tr><td colspan="11" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="10"><em>Dependent variable:</em></td></tr>
## <tr><td></td><td colspan="10" style="border-bottom: 1px solid black"></td></tr>
## <tr><td style="text-align:left"></td><td colspan="10">COVID-19 positive or not</td></tr>
## <tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td><td>(3)</td><td>(4)</td><td>(5)</td><td>(6)</td><td>(7)</td><td>(8)</td><td>(9)</td><td>(10)</td></tr>
## <tr><td colspan="11" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">merged_popDens_km2</td><td>-0.00002 (0.00003)</td><td>-0.00002 (0.00003)</td><td>-0.00004 (0.00003)</td><td>-0.00003 (0.00003)</td><td>0.00003 (0.00002)</td><td>-0.00001 (0.00003)</td><td>-0.00001 (0.00003)</td><td>-0.00003 (0.00003)</td><td>-0.00003 (0.00003)</td><td>0.00003 (0.00002)</td></tr>
## <tr><td style="text-align:left">n_cancers</td><td>-0.229 (0.166)</td><td>-0.227 (0.166)</td><td>-0.214 (0.166)</td><td>-0.216 (0.166)</td><td>-0.228 (0.166)</td><td>-0.227 (0.166)</td><td>-0.226 (0.165)</td><td>-0.216 (0.166)</td><td>-0.219 (0.166)</td><td>-0.224 (0.166)</td></tr>
## <tr><td style="text-align:left">townsend</td><td>0.025 (0.018)</td><td>0.024 (0.018)</td><td>0.008 (0.018)</td><td>0.010 (0.018)</td><td>0.024 (0.019)</td><td>0.023 (0.018)</td><td>0.023 (0.018)</td><td>0.008 (0.018)</td><td>0.010 (0.018)</td><td>0.020 (0.019)</td></tr>
## <tr><td style="text-align:left">smokingNever</td><td>0.635<sup>***</sup> (0.176)</td><td>0.627<sup>***</sup> (0.176)</td><td>0.614<sup>***</sup> (0.176)</td><td>0.612<sup>***</sup> (0.176)</td><td>0.611<sup>***</sup> (0.176)</td><td>0.624<sup>***</sup> (0.176)</td><td>0.619<sup>***</sup> (0.176)</td><td>0.609<sup>***</sup> (0.176)</td><td>0.608<sup>***</sup> (0.176)</td><td>0.603<sup>***</sup> (0.175)</td></tr>
## <tr><td style="text-align:left">smokingPrefer not to answer</td><td>1.520<sup>**</sup> (0.746)</td><td>1.505<sup>**</sup> (0.746)</td><td>1.411<sup>*</sup> (0.742)</td><td>1.413<sup>*</sup> (0.742)</td><td>1.470<sup>**</sup> (0.743)</td><td>1.500<sup>**</sup> (0.746)</td><td>1.489<sup>**</sup> (0.745)</td><td>1.408<sup>*</sup> (0.743)</td><td>1.411<sup>*</sup> (0.742)</td><td>1.435<sup>*</sup> (0.742)</td></tr>
## <tr><td style="text-align:left">smokingPrevious</td><td>0.703<sup>***</sup> (0.177)</td><td>0.690<sup>***</sup> (0.176)</td><td>0.682<sup>***</sup> (0.176)</td><td>0.677<sup>***</sup> (0.176)</td><td>0.668<sup>***</sup> (0.176)</td><td>0.690<sup>***</sup> (0.176)</td><td>0.682<sup>***</sup> (0.176)</td><td>0.677<sup>***</sup> (0.176)</td><td>0.674<sup>***</sup> (0.176)</td><td>0.658<sup>***</sup> (0.176)</td></tr>
## <tr><td style="text-align:left">whistlingNo</td><td>0.475 (0.349)</td><td>0.461 (0.349)</td><td>0.462 (0.351)</td><td>0.461 (0.351)</td><td>0.427 (0.348)</td><td>0.466 (0.349)</td><td>0.458 (0.349)</td><td>0.462 (0.351)</td><td>0.462 (0.351)</td><td>0.422 (0.348)</td></tr>
## <tr><td style="text-align:left">whistlingPrefer not to answer</td><td>0.123 (1.637)</td><td>0.127 (1.636)</td><td>0.031 (1.629)</td><td>0.053 (1.629)</td><td>0.082 (1.637)</td><td>0.118 (1.633)</td><td>0.126 (1.632)</td><td>0.014 (1.629)</td><td>0.038 (1.629)</td><td>0.070 (1.635)</td></tr>
## <tr><td style="text-align:left">whistlingYes</td><td>0.458 (0.356)</td><td>0.446 (0.356)</td><td>0.450 (0.357)</td><td>0.447 (0.358)</td><td>0.400 (0.354)</td><td>0.448 (0.356)</td><td>0.442 (0.356)</td><td>0.448 (0.357)</td><td>0.445 (0.358)</td><td>0.396 (0.354)</td></tr>
## <tr><td style="text-align:left">diabetesNo</td><td>-0.275 (0.859)</td><td>-0.296 (0.858)</td><td>-0.343 (0.861)</td><td>-0.351 (0.859)</td><td>-0.317 (0.852)</td><td>-0.293 (0.858)</td><td>-0.311 (0.857)</td><td>-0.341 (0.860)</td><td>-0.350 (0.858)</td><td>-0.323 (0.852)</td></tr>
## <tr><td style="text-align:left">diabetesPrefer not to answer</td><td>-1.368 (1.912)</td><td>-1.377 (1.910)</td><td>-1.422 (1.907)</td><td>-1.407 (1.906)</td><td>-1.338 (1.909)</td><td>-1.466 (1.909)</td><td>-1.462 (1.908)</td><td>-1.435 (1.906)</td><td>-1.428 (1.906)</td><td>-1.344 (1.907)</td></tr>
## <tr><td style="text-align:left">diabetesYes</td><td>-0.324 (0.871)</td><td>-0.345 (0.871)</td><td>-0.389 (0.874)</td><td>-0.396 (0.872)</td><td>-0.379 (0.865)</td><td>-0.337 (0.870)</td><td>-0.356 (0.870)</td><td>-0.380 (0.872)</td><td>-0.389 (0.870)</td><td>-0.381 (0.865)</td></tr>
## <tr><td style="text-align:left">whr</td><td>1.487<sup>*</sup> (0.808)</td><td>1.511<sup>*</sup> (0.808)</td><td>1.555<sup>*</sup> (0.808)</td><td>1.595<sup>**</sup> (0.808)</td><td>1.580<sup>**</sup> (0.806)</td><td>1.492<sup>*</sup> (0.808)</td><td>1.508<sup>*</sup> (0.808)</td><td>1.547<sup>*</sup> (0.808)</td><td>1.582<sup>*</sup> (0.808)</td><td>1.588<sup>**</sup> (0.806)</td></tr>
## <tr><td style="text-align:left">sexMale</td><td>0.059 (0.138)</td><td>0.057 (0.138)</td><td>0.034 (0.138)</td><td>0.030 (0.138)</td><td>0.061 (0.138)</td><td>0.050 (0.138)</td><td>0.050 (0.138)</td><td>0.031 (0.138)</td><td>0.028 (0.138)</td><td>0.056 (0.138)</td></tr>
## <tr><td style="text-align:left">age</td><td>-0.011 (0.007)</td><td>-0.011<sup>*</sup> (0.007)</td><td>-0.010 (0.007)</td><td>-0.011 (0.007)</td><td>-0.010 (0.007)</td><td>-0.011<sup>*</sup> (0.007)</td><td>-0.011<sup>*</sup> (0.007)</td><td>-0.011 (0.007)</td><td>-0.011 (0.007)</td><td>-0.010 (0.007)</td></tr>
## <tr><td style="text-align:left">pm25_val</td><td>0.120<sup>***</sup> (0.040)</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">pm10_val</td><td></td><td>0.076<sup>***</sup> (0.028)</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">no2_val</td><td></td><td></td><td>0.049<sup>***</sup> (0.015)</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">nox_val</td><td></td><td></td><td></td><td>0.024<sup>***</sup> (0.009)</td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">o3_val</td><td></td><td></td><td></td><td></td><td>0.012 (0.021)</td><td></td><td></td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">pm25_5yAvg</td><td></td><td></td><td></td><td></td><td></td><td>0.120<sup>***</sup> (0.044)</td><td></td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">pm10_5yAvg</td><td></td><td></td><td></td><td></td><td></td><td></td><td>0.077<sup>**</sup> (0.030)</td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">no2_5yAvg</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td>0.044<sup>***</sup> (0.014)</td><td></td><td></td></tr>
## <tr><td style="text-align:left">nox_5yAvg</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td>0.021<sup>***</sup> (0.008)</td><td></td></tr>
## <tr><td style="text-align:left">o3_5yAvg</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td>-0.019 (0.079)</td></tr>
## <tr><td style="text-align:left">Constant</td><td>-2.577<sup>**</sup> (1.203)</td><td>-2.502<sup>**</sup> (1.205)</td><td>-2.197<sup>*</sup> (1.175)</td><td>-2.006<sup>*</sup> (1.167)</td><td>-1.701 (1.160)</td><td>-2.604<sup>**</sup> (1.213)</td><td>-2.532<sup>**</sup> (1.212)</td><td>-2.203<sup>*</sup> (1.175)</td><td>-2.009<sup>*</sup> (1.167)</td><td>-1.581 (1.171)</td></tr>
## <tr><td colspan="11" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>1,447</td><td>1,447</td><td>1,447</td><td>1,447</td><td>1,447</td><td>1,447</td><td>1,447</td><td>1,447</td><td>1,447</td><td>1,447</td></tr>
## <tr><td style="text-align:left">Log Likelihood</td><td>-973.678</td><td>-974.683</td><td>-972.953</td><td>-974.207</td><td>-978.070</td><td>-974.439</td><td>-975.008</td><td>-973.235</td><td>-974.345</td><td>-978.204</td></tr>
## <tr><td style="text-align:left">Akaike Inf. Crit.</td><td>1,981.357</td><td>1,983.366</td><td>1,979.906</td><td>1,982.413</td><td>1,990.139</td><td>1,982.878</td><td>1,984.015</td><td>1,980.469</td><td>1,982.690</td><td>1,990.408</td></tr>
## <tr><td colspan="11" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="10" style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
## </table>
# Extract the pollutant term (the last row of each model's coefficient table:
# estimate, standard error, z value and p-value) from each fitted model
list_df_full = do.call("rbind", list(
summary(ukb_covid.b_pm25)$coefficients[nrow(summary(ukb_covid.b_pm25)$coefficients),1:4],
summary(ukb_covid.b_pm10)$coefficients[nrow(summary(ukb_covid.b_pm10)$coefficients),1:4],
summary(ukb_covid.b_no2)$coefficients[nrow(summary(ukb_covid.b_no2)$coefficients),1:4],
summary(ukb_covid.b_nox)$coefficients[nrow(summary(ukb_covid.b_nox)$coefficients),1:4],
summary(ukb_covid.b_o3)$coefficients[nrow(summary(ukb_covid.b_o3)$coefficients),1:4],
summary(ukb_covid.b_pm25_5yAvg)$coefficients[nrow(summary(ukb_covid.b_pm25_5yAvg)$coefficients),1:4],
summary(ukb_covid.b_pm10_5yAvg)$coefficients[nrow(summary(ukb_covid.b_pm10_5yAvg)$coefficients),1:4],
summary(ukb_covid.b_no2_5yAvg)$coefficients[nrow(summary(ukb_covid.b_no2_5yAvg)$coefficients),1:4],
summary(ukb_covid.b_nox_5yAvg)$coefficients[nrow(summary(ukb_covid.b_nox_5yAvg)$coefficients),1:4],
summary(ukb_covid.b_o3_5yAvg)$coefficients[nrow(summary(ukb_covid.b_o3_5yAvg)$coefficients),1:4]))
# add pollutant labels, in the same order as the models above
pollutants = c("pm25","pm10","no2","nox","o3",
"pm25_5yAvg","pm10_5yAvg","no2_5yAvg","nox_5yAvg","o3_5yAvg")
pollutants = data.frame(pollutants)
list_df_full = cbind(pollutants, list_df_full)
colnames(list_df_full) <- c("pollutants", "Estimate", "Std", "z", "p")
# convert log-odds estimates to odds ratios
list_df_full$OR = exp(list_df_full$Estimate)
# interval bounds: exponentiated estimate plus/minus one standard error
list_df_full$upper = exp(list_df_full$Estimate + list_df_full$Std)
list_df_full$lower = exp(list_df_full$Estimate - list_df_full$Std)
# flag significance at the 5% level for colouring the plot
list_df_full$significance = "p-value<0.05"
list_df_full$significance[list_df_full$p > 0.05] <- "p-value>0.05"
write.csv(list_df_full, "data_output/UKB_covidInfection_IORs_resub.csv", row.names=F)
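The ten near-identical `summary(...)$coefficients` extractions above can be condensed by iterating over a named list of the fitted models. This is an equivalent sketch, not the published pipeline; the model object names are taken from the code above and must already exist in the session:

```r
# Compact alternative: pull the last coefficient row from each fitted model
models <- list(
  pm25 = ukb_covid.b_pm25, pm10 = ukb_covid.b_pm10,
  no2 = ukb_covid.b_no2, nox = ukb_covid.b_nox, o3 = ukb_covid.b_o3,
  pm25_5yAvg = ukb_covid.b_pm25_5yAvg, pm10_5yAvg = ukb_covid.b_pm10_5yAvg,
  no2_5yAvg = ukb_covid.b_no2_5yAvg, nox_5yAvg = ukb_covid.b_nox_5yAvg,
  o3_5yAvg = ukb_covid.b_o3_5yAvg)
last_row <- function(m) {
  cf <- summary(m)$coefficients
  cf[nrow(cf), 1:4]  # the pollutant term is fitted last in each model
}
list_df_full <- data.frame(pollutants = names(models),
                           t(sapply(models, last_row)))
```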
## plot odds ratios; error bars span one standard error on each side of the estimate
library(ggplot2)
ggplot(list_df_full,
aes(x=reorder(pollutants, OR), y=OR, color = significance)) +
geom_point(shape=21, size = 2) +
geom_errorbar(aes(ymin=lower, ymax=upper),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
theme_classic() +
geom_hline(yintercept = 1, linetype="dotted") +
coord_flip()+ylab("Infectivity odds ratios, resub") +
xlab("pollutants")
ggsave('fig_out/UKB_covidInfection_IORs_resub.pdf', width = 6, height = 4)
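Note that the error bars plotted above span `exp(Estimate ± Std)`, i.e. one standard error on each side of the estimate on the log-odds scale. A conventional 95% confidence interval would instead use ±1.96 standard errors before exponentiating. A minimal sketch of the difference, using illustrative numbers (not values from the fitted models):

```r
# Illustrative estimate and standard error on the log-odds scale
est <- 0.120
se  <- 0.040

or       <- exp(est)              # odds ratio
se_lower <- exp(est - se)         # one-SE band, as plotted above
se_upper <- exp(est + se)
ci_lower <- exp(est - 1.96 * se)  # conventional 95% CI
ci_upper <- exp(est + 1.96 * se)

round(c(OR = or, ci_lower = ci_lower, ci_upper = ci_upper), 3)
# OR = 1.127, 95% CI roughly 1.042 to 1.219 for these illustrative inputs
```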