Yizhou Yu, Marco Travaglio,Rebeka Popovic, Nuno Santos Leal, and Luis Miguel Martins
MRC Toxicology Unit, University of Cambridge
Yizhou Yu, MRC Toxicology Unit, University of
Cambridge, (yzy21 [at] mrc-tox.cam.ac.uk)
November 2020
This study was first deposited in a preprint server (medRxiv), in November 2020, under the title “Alzheimer’s and Parkinson’s diseases predict different COVID-19 outcomes, a UK Biobank study” by Yizhou Yu, Marco Travaglio, Rebeka Popovic, Nuno Santos Leal, L. Miguel Martins: The preprint version (v01 on medRxiv) can be found here
After peer-review, the revised study was published in January 2021 in the journal Geriatrics: https://www.mdpi.com/2308-3417/6/1/10
The input files for this analysis pipeline are on the master branch of this GitHub page: https://github.com/M1gus/AD_PD_COVID19
Here, I will first extract the UK Biobank data, from all >500,000 people. I will then subset the data from all people who have been tested for COVID-19.
I will analyse the data of participants who were tested and infected (without information on death). This will allow me to determine what drives infection.
In this analysis, I will try to include: UK Biobank phenotype and genotype markers (e.g. BMI, smoking status, dementia, etc.), the crowd mobility that they were exposed to, etc.
I will then look at how these markers affect lethality.
Access to the UK Biobank data is granted under the application 60124.
Note that I had to downgrade R to run it with rpy2
update.packages(ask=FALSE,
checkBuilt=TRUE,
repos=”https://cloud.r-project.org”)
import pandas as pd
import tensorflow as tf
#load R
%load_ext rpy2.ipython
/Users/yizhouyu/miniconda3/envs/yy_37_env2/lib/python3.7/site-packages/rpy2/robjects/pandas2ri.py:17: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
from pandas.core.index import Index as PandasIndex
def UKB_select_rows(input_dt,col_list,output_dt):
#input_dt = full UKB dataset
#col_list = metadata
#output_dt = name of output data
input_name = str(''.join(input_dt))
output_name = str(''.join(output_dt))
col_list_name = str(''.join(col_list))
#create the list of columns to select
col_df = pd.read_csv(col_list_name, sep = ',', usecols = ["FieldID","Instances",'Array'])
#this is the array of columns to select, dont forget to include eid...
col_list_selected = ['eid']
#here I make a loop to collect all instances + arrays of a field
for index, row in col_df.iterrows():
# range automatically goes from 0 to n-1
instance_list = list(range(0,row['Instances'].astype(int)))
list_single_field = []
for instance_item in instance_list:
field_instance = [row['FieldID'].astype(str) + "-" + str(instance_item)]
list_single_field = list_single_field + field_instance
array_list = list(range(0,row['Array'].astype(int)))
for fi_item in list_single_field:
for array_item in array_list:
field_instance_array = [str(fi_item) + "." + str(array_item)]
col_list_selected = col_list_selected + field_instance_array
#print(len(col_list_selected))
#col_df['UID'] = col_df["FieldID"].astype(str) + "-" + col_df["Instances"].astype(str) + "." + col_df['Array'].astype(str)
#selected_UID_list = col_df['UID'].tolist()
# read the large csv file with specified chunksize
#I tried with many chunksizes...
df_chunk = pd.read_csv(input_name, sep = ',', chunksize=10000, dtype={'eid': int},encoding= 'unicode_escape', usecols = col_list_selected)
# Each chunk is in df format -> save them
# only write header once...
write_header = True
for chunk in df_chunk:
print("10K done, still running...")
chunk.to_csv(output_name, mode='a', index=False, header=write_header)
write_header = False
UKB_select_rows(input_dt="/Users/yizhouyu/camDrive/UKB_dt_18-4-2020/ukb41646.csv",
col_list = "/Users/yizhouyu/camDrive/Miguel_shared_Yizhou/COVID_project_yizhou/AirPoll_UKB_6000/ukb_covid19/data/UKB_metadt_AIRcovid2.csv",
output_dt = "/Users/yizhouyu/camDrive/Miguel_shared_Yizhou/COVID_project_yizhou/AirPoll_UKB_6000/ukb_covid19/data_output/UKB_subset_AP_analysis.csv")
/Users/yizhouyu/py_37_env/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3254: DtypeWarning: Columns (16450,16460,16462) have mixed types.Specify dtype option on import or set low_memory=False.
if (await self.run_code(code, result, async_=asy)):
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
The data has still remains quite large
# pd.read_csv("/Users/yizhouyu/camDrive/Miguel_shared_Yizhou/COVID_project_yizhou/AirPoll_UKB_6000/ukb_covid19/data_output/UKB_subset_AP_analysis.csv").head()
%%R
raw_ukb_covid = read.csv("/Users/yizhouyu/camDrive/ARUK_project/additional_datasets/covid_refresh_16_8_2020/covid19_result_17_8_2020.txt", sep = "\t")
nrow(raw_ukb_covid)
[1] 20554
#### Get the date range of the test results
%%R
raw_ukb_covid$specdate = as.Date(raw_ukb_covid$specdate,tryFormats = "%d/%m/%Y")
max(raw_ukb_covid$specdate)
[1] "2020-07-26"
For the the positive cases, I would like to know when they are first tested positive, i.e. the earliest date only.
%%R
raw_ukb_covid.pos = subset(raw_ukb_covid, result == 1)
raw_ukb_covid.pos_cur = aggregate(raw_ukb_covid.pos, list(raw_ukb_covid.pos$eid), FUN=head, 1)
For the negative test results, I should keep the last date because that is the most up-to-date test.
%%R
raw_ukb_covid.neg = subset(raw_ukb_covid, result == 0)
raw_ukb_covid.neg_cur = aggregate(raw_ukb_covid.neg, list(raw_ukb_covid.neg$eid), FUN=tail, 1)
%%R
ukb_covid_bind = rbind(raw_ukb_covid.pos_cur, raw_ukb_covid.neg_cur)
%%R
print(paste(length((ukb_covid_bind$eid)),length(unique(ukb_covid_bind$eid)),sep = " "))
[1] "13940 13502"
There are still ~400 eids that are duplicates. I will sum them to keep the positive tests only, with the latest date.
%%R
ukb_covid_bind = ukb_covid_bind[c("eid","specdate","result")]
ukb_covid_bind$specdate = as.Date(ukb_covid_bind$specdate,tryFormats = "%d/%m/%Y")
ukb_covid_bind_test = aggregate(ukb_covid_bind, list(ukb_covid_bind$eid), FUN=max)[c("eid","specdate","result")]
nrow(ukb_covid_bind_test)
[1] 13502
### Save curated data
The curated data now is: if a participant is positive, then save this result & date, if not, save the latest (most up-to-date) negative test result.
%%R
write.csv(ukb_covid_bind_test,"data_output/curated_ukb_covid_test_results.csv", row.names=FALSE)
#### Select rows by chunk
def UKB_select_rows(input_dt,row_list,output_dt):
input_name = str(''.join(input_dt))
row_list_name = str(''.join(row_list))
output_name = str(''.join(output_dt))
#get the list of IDs
ID_list_df = pd.read_csv(row_list_name, sep = ',', usecols = ['eid'], dtype={'eid': int})
selected_ID_list = ID_list_df['eid'].tolist()
#print(selected_ID_list)
df_chunk = pd.read_csv(input_name, sep = ',', chunksize=10000,
encoding= 'unicode_escape')
#make a chunks list
chunk_list = []
# Each chunk is in df format
for chunk in df_chunk:
# perform data filtering
chunk_filter = chunk[chunk['eid'].isin(selected_ID_list)]
print("10K done, still running...")
#get the rows that are in the ID list
#print(chunk_filter)
# Once the data filtering is done, append the chunk to list
chunk_list.append(chunk_filter)
# concat the list into dataframe
#header_chunk.append(chunk_list)
df_concat = pd.concat(chunk_list)
#print(df_concat)
df_concat.to_csv(output_name)
#### Select rows by chunk
UKB_select_rows(input_dt="data_output/UKB_subset_AP_analysis.csv",
row_list="data_output/curated_ukb_covid_test_results.csv",
output_dt="data_output/full_ukb_covidTested_results.csv")
/Users/yizhouyu/miniconda3/envs/yy_37_env2/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3254: DtypeWarning: Columns (0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,276,278,280) have mixed types.Specify dtype option on import or set low_memory=False.
if (await self.run_code(code, result, async_=asy)):
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
/Users/yizhouyu/miniconda3/envs/yy_37_env2/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3254: DtypeWarning: Columns (275) have mixed types.Specify dtype option on import or set low_memory=False.
if (await self.run_code(code, result, async_=asy)):
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
10K done, still running...
%%R
# load non-covid UKB phenotype data
ukb_pheno_dt = read.csv("data_output/full_ukb_covidTested_results.csv")[-1]
#-1: delete first column of index from the previous function
#aggregate the columns by selecting the last results, for each
library(reshape)
ukb_pheno_dt_t = melt(ukb_pheno_dt, id='eid')
# head(ukb_pheno_dt_t)
#order by variable to make sure I am will be replacing the latest value
ukb_pheno_dt_t <- ukb_pheno_dt_t[order(ukb_pheno_dt_t$variable),]
#delete everything after period
ukb_pheno_dt_t$variable = gsub("\\..*","",ukb_pheno_dt_t$variable)
#aggregate by last
ukb_covid_t_na = na.omit(ukb_pheno_dt_t)
ukb_covid_t_na = aggregate(ukb_covid_t_na, by=list(ukb_covid_t_na$eid,ukb_covid_t_na$variable), FUN=tail, n = 1)
# Curate
ukb_covid_t_na = ukb_covid_t_na[c("eid", "variable", "value")]
ukb_covid_t_na$variable = as.factor(ukb_covid_t_na$variable)
# put it back straight
ukb_covid_cur = cast(ukb_covid_t_na, eid~variable)
%%R
#get levels and labels
lvl.0493 <- c(-141,-131,-121,0)
lbl.0493 <- c("Often","Sometimes","Do not know","Rarely/never")
lvl.0007 <- c(0,1)
lbl.0007 <- c("FALSE","TRUE")
lvl.100349 <- c(-3,-1,0,1)
lbl.100349 <- c("Prefer not to answer","Do not know","FALSE","TRUE")
lvl.0090 <- c(-3,0,1,2)
lbl.0090 <- c("Prefer not to answer","Never","Previous","Current")
lvl.0009 <- c(0,1)
lbl.0009 <- c("Female","Male")
lvl.0272 <- c(1)
lbl.0272 <- c("Date is unknown")
lvl.0300 <- c(0,1,2)
lbl.0300 <- c("Self-reported only","Hospital admission","Death only")
lvl.100290 <- c(-10,-3,-1)
lbl.100290 <- c("Less than a year","Prefer not to answer","Do not know")
lvl.100291 <- c(-3,-1)
lbl.100291 <- c("Prefer not to answer","Do not know")
lvl.100294 <- c(-3,-1,1,2,3,4,5)
lbl.100294 <- c("Prefer not to answer","Do not know",
"Less than 18,000","18,000 to 30,999","31,000 to 51,999",
"52,000 to 100,000","Greater than 100,000")
lvl.100298 <- c(-10,-3,-1)
lbl.100298 <- c("Less than once a week","Prefer not to answer","Do not know")
lvl.1001 <- c(-3,-1,1001,1002,1003,2001,2002,2003,2004,3001,3002,3003,3004,4001,4002,4003,1,2,3,4,5,6)
lbl.1001 <- c("Prefer not to answer","Do not know",
"British","Irish","Any other white background","White and Black Caribbean",
"White and Black African","White and Asian","Any other mixed background","Indian",
"Pakistani","Bangladeshi","Any other Asian background","Caribbean","African","Any other Black background",
"White","Mixed","Asian or Asian British","Black or Black British","Chinese","Other ethnic group")
### eid replacement, saved here ----
yy_replace <- function(string, patterns, replacements) {
for (i in seq_along(patterns))
string <- gsub(patterns[i], replacements[i], string, perl=TRUE)
string
}
#label = replacement, lvl = to replace
#### Decode UKB phenotype data
%%R
ukb_covid_decode = ukb_covid_cur
%%R
ukb_covid_decode$X22609 <- yy_replace(ukb_covid_decode$X22609, lvl.0493, lbl.0493)
ukb_covid_decode$X22610 <- yy_replace(ukb_covid_decode$X22610, lvl.0493, lbl.0493)
ukb_covid_decode$X22611 <- yy_replace(ukb_covid_decode$X22611, lvl.0493, lbl.0493)
ukb_covid_decode$X22615 <- yy_replace(ukb_covid_decode$X22615, lvl.0493, lbl.0493)
ukb_covid_decode$X22616 <- yy_replace(ukb_covid_decode$X22616, lvl.0007, lbl.0007)
ukb_covid_decode$X2316 <- yy_replace(ukb_covid_decode$X2316, lvl.100349, lbl.100349)
ukb_covid_decode$X31 <- yy_replace(ukb_covid_decode$X31, lvl.0009, lbl.0009)
ukb_covid_decode$X22130 <- yy_replace(ukb_covid_decode$X22130, lvl.0007, lbl.0007)
ukb_covid_decode$X2443 <- yy_replace(ukb_covid_decode$X2443, lvl.100349, lbl.100349)
ukb_covid_decode$X20116 <- yy_replace(ukb_covid_decode$X20116, lvl.0090, lbl.0090)
ukb_covid_decode$X42030 <- yy_replace(ukb_covid_decode$X42030, lvl.0272, lbl.0272)
ukb_covid_decode$X42031 <- yy_replace(ukb_covid_decode$X42031, lvl.0300, lbl.0300)
ukb_covid_decode$X42032 <- yy_replace(ukb_covid_decode$X42032, lvl.0272, lbl.0272)
ukb_covid_decode$X42033 <- yy_replace(ukb_covid_decode$X42033, lvl.0300, lbl.0300)
#ukb_covid_cur$X100291 <- yy_replace(ukb_covid_cur$X100291, lvl.100290, lbl.100290)
ukb_covid_decode$X709 <- yy_replace(ukb_covid_decode$X709, lvl.100291, lbl.100291)
ukb_covid_decode$X777 <- yy_replace(ukb_covid_decode$X777, lvl.100298, lbl.100298)
#household income
ukb_covid_decode$X738 <- ordered(ukb_covid_decode$X738, levels = lvl.100294,
labels = lbl.100294)
#ethnicity
ukb_covid_decode$X21000 <- ordered(ukb_covid_decode$X21000, levels = lvl.1001,
labels = lbl.1001)
ukb_covid_decode$X42021 <- yy_replace(ukb_covid_decode$X42021, lvl.0090, lbl.0090)
ukb_covid_decode$X42020 <- yy_replace(ukb_covid_decode$X42020, lvl.0272, lbl.0272)
%%R
#get the ID that I designed
UID_names = read.csv("data/UKB_metadt_AIRcovid2.csv")[c("FieldID","my_colname")]
UID_names$FieldID = paste("X",UID_names$FieldID, sep = "")
UID_names$my_colname = lapply(UID_names$my_colname,toString)
UID_names[nrow(UID_names) + 1,] = c("eid","eid")
# sort it according to the dataset
UID_names_match = colnames(ukb_covid_cur)
UID_sort_df = data.frame(UID_names_match)
colnames(UID_sort_df) <- "FieldID"
#merge to sort
UID_sort_df = merge(UID_sort_df, UID_names, by = "FieldID")
# this is now sorted according to the correct df
#replace column names
ukb_covid_pheno_df = ukb_covid_decode
colnames(ukb_covid_pheno_df) <- UID_sort_df$my_colname
head(ukb_covid_pheno_df)
eid n_cancers townsend x_coord y_coord smoking fev ethnicity
1 1016273 0 -5.87237 441000 214000 Previous <NA> British
2 1016298 0 2.91871 420000 562000 Previous <NA> British
3 1016708 0 3.43245 532000 168000 Never <NA> Caribbean
4 1017007 0 -2.30303 396000 4e+05 Previous <NA> British
5 1017365 0 -0.00099431 408000 282000 Previous <NA> British
6 1017713 0 -3.92656 329000 407000 Never 3.11074 British
copd WP_dusty WP_chemicals WP_cig WP_diesel breathing whistling
1 <NA> <NA> <NA> <NA> <NA> <NA> FALSE
2 FALSE Rarely/never Rarely/never Rarely/never Rarely/never FALSE FALSE
3 <NA> <NA> <NA> <NA> <NA> <NA> FALSE
4 <NA> <NA> <NA> <NA> <NA> <NA> FALSE
5 FALSE Rarely/never Rarely/never Rarely/never Rarely/never FALSE FALSE
6 <NA> <NA> <NA> <NA> <NA> <NA> FALSE
diabetes sex birthYear diaBP sysBP ad_date ad pd_date_1 pd_1 pd_date_2
1 FALSE Male 1950 76 114 <NA> <NA>
2 FALSE Female 1948 81 143 <NA> <NA>
3 FALSE Female 1967 61 95 <NA> <NA>
4 FALSE Male 1939 96 163 <NA> <NA>
5 FALSE Female 1947 89 138 <NA> <NA>
6 FALSE Male 1945 85 156 <NA> <NA>
pd_2 waist hip height yearAtlocation peopleInHousehold
1 <NA> 86 101 171 10 2
2 <NA> 100 110 154 6 2
3 <NA> 93 104 175 5 1
4 <NA> 103 107 175 32 2
5 <NA> 93 101 152 33 1
6 <NA> 120 117 172 -3 Prefer not to answer
AverageHouseholdIncome travelToWork
1 Prefer not to answer 4
2 18,000 to 30,999 4
3 Do not know <NA>
4 18,000 to 30,999 <NA>
5 18,000 to 30,999 <NA>
6 Prefer not to answer <NA>
### Merge the UK Biobank phenotype data with the UK Biobank COVID-19 data
%%R
ukb_covid_pheno_merged = merge(ukb_covid_pheno_df, ukb_covid_bind_test, by = "eid")
write.csv(ukb_covid_pheno_merged,"data_output/ukb_covid19_pheno_merged.csv", row.names=FALSE)
#### Curate the PD/AD diagnosis data
%%R
ukb_covid_pheno_merged = read.csv("data_output/ukb_covid19_pheno_merged.csv")
# PD diagnosis
library(dplyr)
ukb_covid_pheno_PDcur = ukb_covid_pheno_merged %>%
mutate(PD_diag = c("FALSE", "TRUE")[(!is.na(pd_1) |
!is.na(pd_2)
)+1] )
# Ad diagnosis
ukb_covid_pheno_PDADcur = ukb_covid_pheno_PDcur %>%
mutate(AD_diag = c("FALSE", "TRUE")[(!is.na(ad)
)+1] )
/Users/yizhouyu/miniconda3/envs/yy_37_env2/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning:
Attaching package: ‘dplyr’
warnings.warn(x, RRuntimeWarning)
/Users/yizhouyu/miniconda3/envs/yy_37_env2/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: The following object is masked from ‘package:reshape’:
rename
warnings.warn(x, RRuntimeWarning)
/Users/yizhouyu/miniconda3/envs/yy_37_env2/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: The following objects are masked from ‘package:stats’:
filter, lag
warnings.warn(x, RRuntimeWarning)
/Users/yizhouyu/miniconda3/envs/yy_37_env2/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
warnings.warn(x, RRuntimeWarning)
%%R
ukb_covid_pheno_PDADcur = ukb_covid_pheno_PDADcur %>%
mutate(highBP = c("FALSE", "TRUE")[(diaBP >= 90 | sysBP > 140
)+1] )
#### Calculate Age: 2020 - birth year
%%R
ukb_covid_pheno_PDADcur$age = 2020 - ukb_covid_pheno_PDADcur$birthYear
%%R
vars_drop = c("specdate","X","yearAtlocation",
"ad_date","ad","pd_date_1",
"pd_1","pd_date_2","pd_2","diaBP","sysBP")
ukb_covid_pheno_subset = ukb_covid_pheno_PDADcur[ , !(names(ukb_covid_pheno_PDADcur) %in% vars_drop)]
%%R
# Ad diagnosis
head(ukb_covid_pheno_subset)
eid n_cancers townsend x_coord y_coord smoking fev ethnicity
1 1016273 0 -5.87237000 441000 214000 Previous NA British
2 1016298 0 2.91871000 420000 562000 Previous NA British
3 1016708 0 3.43245000 532000 168000 Never NA Caribbean
4 1017007 0 -2.30303000 396000 400000 Previous NA British
5 1017365 0 -0.00099431 408000 282000 Previous NA British
6 1017713 0 -3.92656000 329000 407000 Never 3.11074 British
copd WP_dusty WP_chemicals WP_cig WP_diesel breathing whistling
1 NA <NA> <NA> <NA> <NA> NA FALSE
2 FALSE Rarely/never Rarely/never Rarely/never Rarely/never FALSE FALSE
3 NA <NA> <NA> <NA> <NA> NA FALSE
4 NA <NA> <NA> <NA> <NA> NA FALSE
5 FALSE Rarely/never Rarely/never Rarely/never Rarely/never FALSE FALSE
6 NA <NA> <NA> <NA> <NA> NA FALSE
diabetes sex birthYear waist hip height peopleInHousehold
1 FALSE Male 1950 86 101 171 2
2 FALSE Female 1948 100 110 154 2
3 FALSE Female 1967 93 104 175 1
4 FALSE Male 1939 103 107 175 2
5 FALSE Female 1947 93 101 152 1
6 FALSE Male 1945 120 117 172 Prefer not to answer
AverageHouseholdIncome travelToWork result PD_diag AD_diag highBP age
1 Prefer not to answer 4 0 FALSE FALSE FALSE 70
2 18,000 to 30,999 4 0 FALSE FALSE TRUE 72
3 Do not know <NA> 0 FALSE FALSE FALSE 53
4 18,000 to 30,999 <NA> 0 FALSE FALSE TRUE 81
5 18,000 to 30,999 <NA> 0 FALSE FALSE FALSE 73
6 Prefer not to answer <NA> 0 FALSE FALSE TRUE 75
%%R
write.csv(ukb_covid_pheno_subset,"data_output/merged_ukb_covid_test_results.csv", row.names=FALSE)
### Create Descriptive table - obsolete
# %%R
# library(arsenal)
# ukb_covid_pheno_subset = read.csv("data_output/merged_ukb_covid_test_results.csv")
# ukb_descript_stats <- tableby(result ~ ., data = ukb_covid_pheno_subset[,2:ncol(ukb_covid_pheno_subset)])
# write2pdf(ukb_descript_stats, "ukb_descript_stats.pdf",
# keep.md = TRUE,
# quiet = TRUE, # passed to rmarkdown::render
# title = "Descriptive statistics of the UK Biobank data") # passed to summary.tableby
# write2word(ukb_descript_stats, "ukb_descript_stats.doc",
# keep.md = TRUE,
# quiet = TRUE, # passed to rmarkdown::render
# title = "Descriptive statistics of the UK Biobank data") # passed to summary.tableby
The raw air pollution files were obtained from DEFRA, UK: https://uk-air.defra.gov.uk/data/pcm-data.
The raw X Y location values were converted to long-lat using an R code from Travaglio et. al., 2020, available here: https://github.com/M1gus/AirPollutionCOVID19/.
The air pollution values are from 2018.
%%R
ukb_covid_pheno_subset = read.csv("data_output/merged_ukb_covid_test_results.csv")
ukgrid <- "+init=epsg:27700"
latlong <- "+init=epsg:4326"
ukb_coords = na.omit(ukb_covid_pheno_subset[c("eid","x_coord","y_coord")])
ukb_coords_NE <- data.frame(cbind(Easting = as.numeric(as.character(ukb_coords$x_coord)),
Northing = as.numeric(as.character(ukb_coords$y_coord))))
### Create the SpatialPointsDataFrame
library(sp)
ukb_covid_SP <- SpatialPointsDataFrame(ukb_coords_NE,
data = ukb_coords,
proj4string = CRS("+init=epsg:27700"))
%%R
### Convert
library(rgdal)
ukb_covid_ll <- spTransform(ukb_covid_SP, CRS(latlong))
ukb_covid_ll_df = data.frame('ukb_lon' = coordinates(ukb_covid_ll)[, 1],
'ukb_lat' = coordinates(ukb_covid_ll)[, 2],
'eid' = ukb_covid_ll$eid)
write.csv(ukb_covid_ll_df, ("data_output/ukb_covid_lonlat_df.csv"))
# plot UKB locations
/Users/yizhouyu/miniconda3/envs/yy_37_env2/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: rgdal: version: 1.4-4, (SVN revision 833)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 2.4.2, released 2019/06/28
Path to GDAL shared files: /Users/yizhouyu/miniconda3/share/gdal
GDAL binary built with GEOS: TRUE
Loaded PROJ.4 runtime: Rel. 6.1.0, May 15th, 2019, [PJ_VERSION: 610]
Path to PROJ.4 shared files: /Users/yizhouyu/miniconda3/share/proj
Linking to sp version: 1.3-1
warnings.warn(x, RRuntimeWarning)
### Graph the location of the UK Biobank participants
from shapely.geometry import Point
import geopandas as gpd
from geopandas import GeoDataFrame
import pandas as pd
ukb_covid_ll_df = pd.read_csv("data_output/ukb_covid_lonlat_df.csv")
geometry = [Point(xy) for xy in zip(ukb_covid_ll_df['ukb_lon'], ukb_covid_ll_df['ukb_lat'])]
gdf = GeoDataFrame(ukb_covid_ll_df, geometry=geometry)
#this is a simple map that goes with geopandas
cities = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
UK_map = cities[cities["name"] == 'United Kingdom']
# gdf.plot(ax=UK_map.plot(figsize=(10, 6)), marker='o', color='red', markersize=15)
cities = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
UK_map = cities[cities["name"] == 'United Kingdom']
ukb_participants_plt = gdf.plot(ax=UK_map.plot(figsize=(10, 6)), marker='.', color='red', markersize=15)
ukb_participants_fig = ukb_participants_plt.get_figure()
ukb_participants_fig.savefig('fig_out/ukb_participants_plt.pdf')
In this python script, I use a function in sklearn to find the nearest modelled air pollutant value based longitude and latitude values. I then asign this value to each UK Biobank participant in the COVID-19 subset. Source: https://automating-gis-processes.github.io/site/develop/notebooks/L3/nearest-neighbor-faster.html
#build functions
from sklearn.neighbors import BallTree
import numpy as np
def get_nearest(src_points, candidates, k_neighbors=1):
"""Find nearest neighbors for all source points from a set of candidate points"""
# Create tree from the candidate points
tree = BallTree(candidates, leaf_size=15, metric='haversine')
# Find closest points and distances
distances, indices = tree.query(src_points, k=k_neighbors)
# Transpose to get distances and indices into arrays
distances = distances.transpose()
indices = indices.transpose()
# Get closest indices and distances (i.e. array at index 0)
# note: for the second closest points, you would take index 1, etc.
closest = indices[0]
closest_dist = distances[0]
# Return indices and distances
return (closest, closest_dist)
def nearest_neighbor(left_gdf, right_gdf, return_dist=False):
"""
For each point in left_gdf, find closest point in right GeoDataFrame and return them.
NOTICE: Assumes that the input Points are in WGS84 projection (lat/lon).
"""
left_geom_col = left_gdf.geometry.name
right_geom_col = right_gdf.geometry.name
# Ensure that index in right gdf is formed of sequential numbers
right = right_gdf.copy().reset_index(drop=True)
# Parse coordinates from points and insert them into a numpy array as RADIANS
left_radians = np.array(left_gdf[left_geom_col].apply(lambda geom: (geom.x * np.pi / 180, geom.y * np.pi / 180)).to_list())
right_radians = np.array(right[right_geom_col].apply(lambda geom: (geom.x * np.pi / 180, geom.y * np.pi / 180)).to_list())
# Find the nearest points
# -----------------------
# closest ==> index in right_gdf that corresponds to the closest point
# dist ==> distance between the nearest neighbors (in meters)
closest, dist = get_nearest(src_points=left_radians, candidates=right_radians)
# Return points from right GeoDataFrame that are closest to points in left GeoDataFrame
closest_points = right.loc[closest]
# Ensure that the index corresponds the one in left_gdf
closest_points = closest_points.reset_index(drop=True)
# Add distance if requested
if return_dist:
# Convert to meters from radians
earth_radius = 6371000 # meters
closest_points['distance'] = dist * earth_radius
return closest_points
# load AP data
import geopandas as gpd
no2_df = pd.read_csv('data/processed_no2_lonlat.csv', usecols = ['no2_lon','no2_lat','no2_val'])
o3_df = pd.read_csv('data/processed_o3_lonlat.csv', usecols = ['o3_lon','o3_lat','o3_val'])
pm10_df = pd.read_csv('data/processed_pm10_lonlat.csv', usecols = ['pm10_lon','pm10_lat','pm10_val'])
so2_df = pd.read_csv('data/processed_so2_lonlat.csv', usecols = ['so2_lon','so2_lat','so2_val'])
nox_df = pd.read_csv('data/processed_nox_lonlat.csv', usecols = ['nox_lon','nox_lat','nox_val'])
pm25_df = pd.read_csv("data/processed_pm25_lonlat.csv", usecols=['pm25_lon','pm25_lat','pm25_val'])
no2_gdf = gpd.GeoDataFrame(
no2_df, geometry=gpd.points_from_xy(no2_df.no2_lon, no2_df.no2_lat))
nox_gdf = gpd.GeoDataFrame(
nox_df, geometry=gpd.points_from_xy(nox_df.nox_lon, nox_df.nox_lat))
so2_gdf = gpd.GeoDataFrame(
so2_df, geometry=gpd.points_from_xy(so2_df.so2_lon, so2_df.so2_lat))
pm25_gdf = gpd.GeoDataFrame(
pm25_df, geometry=gpd.points_from_xy(pm25_df.pm25_lon, pm25_df.pm25_lat))
pm10_gdf = gpd.GeoDataFrame(
pm10_df, geometry=gpd.points_from_xy(pm10_df.pm10_lon, pm10_df.pm10_lat))
o3_gdf = gpd.GeoDataFrame(
o3_df, geometry=gpd.points_from_xy(o3_df.o3_lon, o3_df.o3_lat))
ukb_covid_df = pd.read_csv("data_output/ukb_covid_lonlat_df.csv", usecols=['ukb_lon','ukb_lat','eid'])
ukb_covid_gdf = gpd.GeoDataFrame(
ukb_covid_df, geometry=gpd.points_from_xy(ukb_covid_df.ukb_lon, ukb_covid_df.ukb_lat))
#match no2
no2_matched_df = nearest_neighbor(ukb_covid_gdf, no2_gdf, return_dist=True).rename(columns={'geometry': 'no2_geom'})
#join
no2_ukb_df = ukb_covid_gdf.join(no2_matched_df)[["eid", "no2_val"]]
#match nox
nox_matched_df = nearest_neighbor(ukb_covid_gdf, nox_gdf, return_dist=True).rename(columns={'geometry': 'nox_geom'})
nox_ukb_df = ukb_covid_gdf.join(nox_matched_df)[["eid", "nox_val"]]
#match so2
so2_matched_df = nearest_neighbor(ukb_covid_gdf, so2_gdf, return_dist=True).rename(columns={'geometry': 'so2_geom'})
so2_ukb_df = ukb_covid_gdf.join(so2_matched_df)[["eid", "so2_val"]]
#match pm10
pm10_matched_df = nearest_neighbor(ukb_covid_gdf, pm10_gdf, return_dist=True).rename(columns={'geometry': 'pm10_geom'})
pm10_ukb_df = ukb_covid_gdf.join(pm10_matched_df)[["eid", "pm10_val"]]
#match o3
o3_matched_df = nearest_neighbor(ukb_covid_gdf, o3_gdf, return_dist=True).rename(columns={'geometry': 'o3_geom'})
o3_ukb_df = ukb_covid_gdf.join(o3_matched_df)[["eid", "o3_val"]]
pm25_matched_df = nearest_neighbor(ukb_covid_gdf, pm25_gdf, return_dist=True).rename(columns={'geometry': 'pm25_geom'})
pm25_ukb_df = ukb_covid_gdf.join(pm25_matched_df)[["eid", "pm25_val"]]
#merge everything
from functools import reduce
conc_dfs = [no2_ukb_df,nox_ukb_df,so2_ukb_df,pm10_ukb_df,o3_ukb_df,pm25_ukb_df,]
ukb_covid_AP_vals_merged = reduce(lambda left,right: pd.merge(left,right,on='eid'), conc_dfs)
# ukb_covid_AP_vals_merged.head()
ukb_covid_AP_vals_merged.to_csv("data_output/ukb_covid_AP_vals_merged.csv", index = False)
For the population density data, I downloaded the most precise and up-to-date models. This data has longitude and latitude values for every 30m2 as well as the modelled population value. In other words, these are estimations of “the number of people living within 30-meter grid tiles in nearly every country around the world.” - as stated on their website.
Citation:
Facebook Connectivity Lab and Center for International Earth Science Information Network - CIESIN - Columbia University. 2016. High Resolution Settlement Layer (HRSL). Source imagery for HRSL © 2016 DigitalGlobe. Accessed 25 JUNE 2020.
Note: This dataset is NOT included in this repository because it is too large. The raw data can be downloaded here: https://data.humdata.org/dataset/united-kingdom-high-resolution-population-density-maps-demographic-estimates
#load - this data is >1GB
pop_dens = pd.read_csv('data/population_gbr_2019-07-01.csv')
pop_dens_gdf = gpd.GeoDataFrame(
pop_dens, geometry=gpd.points_from_xy(pop_dens.Lon, pop_dens.Lat))
pop_dens_matched_df = nearest_neighbor(ukb_covid_gdf, pop_dens_gdf, return_dist=True).rename(columns={'geometry': 'pop_dens_geom'})
#join
pop_dens_joined_df = ukb_covid_gdf.join(pop_dens_matched_df)[["eid", "Population"]]
ukb_covid_AP_popDens_merged = pd.merge(ukb_covid_AP_vals_merged, pop_dens_joined_df, on='eid')
# ukb_covid_AP_popDens_merged.head()
ukb_covid_AP_popDens_merged.to_csv("data_output/ukbCov_ukb_covid_AP_popDens_merged.csv", index = False)
#### Investigation on movement data
Here, I also include movement data from Facebook: https://research.fb.com/blog/2020/06/protecting-privacy-in-facebook-mobility-data-during-the-covid-19-response/
Columns, from (https://s3.eu-central-1.amazonaws.com/hdx-ckan-filestore-prod/resources/435ed157-6f7a-4e8f-a63a-2aa177b9bd05/readme.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=180&X-Amz-Credential=AKIARZNKTAO7XQY7ED6N%2F20200625%2Feu-central-1%2Fs3%2Faws4_request&X-Amz-SignedHeaders=host&X-Amz-Date=20200625T232711Z&X-Amz-Signature=3d37cad1da6ab0ffbd9d0fa103ccf3afa8998e3004b74b4634b8f0e24a069a0b)
fb_mvnt = pd.read_csv('data/movement-range-2020-06-23.txt', sep = "\t")
fb_mvnt_gb = fb_mvnt[fb_mvnt['country'] == "GBR"]
/Users/yizhouyu/miniconda3/envs/yy_37_env2/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3063: DtypeWarning: Columns (3) have mixed types.Specify dtype option on import or set low_memory=False.
interactivity=interactivity, compiler=compiler, result=result)
fb_mvnt_gb['polygon_name'].describe()
count 456
unique 4
top Northern Ireland
freq 114
Name: polygon_name, dtype: object
This shows that the data is not precise enough. There is only data at the regional level. I am therefore satisfied to only use the well-curated population density data.
ukb_covidAP_merged = pd.merge(pd.read_csv("data_output/merged_ukb_covid_test_results.csv"),
pd.read_csv("data_output/ukbCov_ukb_covid_AP_popDens_merged.csv"), on='eid')
ukb_covidAP_merged.columns
Index(['eid', 'n_cancers', 'townsend', 'x_coord', 'y_coord', 'smoking', 'fev',
'ethnicity', 'copd', 'WP_dusty', 'WP_chemicals', 'WP_cig', 'WP_diesel',
'breathing', 'whistling', 'diabetes', 'sex', 'birthYear', 'waist',
'hip', 'height', 'peopleInHousehold', 'AverageHouseholdIncome',
'travelToWork', 'result', 'PD_diag', 'AD_diag', 'highBP', 'age',
'no2_val', 'nox_val', 'so2_val', 'pm10_val', 'o3_val', 'pm25_val',
'Population'],
dtype='object')
ukb_covidAP_merged['whr'] = ukb_covidAP_merged['waist']/ukb_covidAP_merged['hip']
ukb_covidAP_merged.to_csv("data_output/ukb_covidAP_merged_full_curatedData.csv")
### Drop unused columns
ukb_covidAP = ukb_covidAP_merged.drop([
'x_coord','y_coord',
'fev','copd',
'WP_dusty', 'WP_chemicals',
'WP_cig', 'WP_diesel', 'breathing',
'birthYear',
'height','travelToWork','waist', 'hip'], axis=1)
ukb_covidAP.to_csv("data_output/ukb_covidAP_finalCuratedData.csv", index = False)
I extracted these variables and stored them in a separate folder so that they are not uploaded on github. Access to those data should be done via UK Biobank.
%%R
care_home_edu_dt = read.csv("/Users/yizhouyu/camDrive/ARUK_project/additional_datasets/data/care_home_edu_dt.csv")
#aggregate the columns by selecting the last results, for each
library(reshape)
care_home_edu_dt_t = melt(care_home_edu_dt, id='eid')
#order by variable to make sure I am will be replacing the latest value
care_home_edu_dt_t <- care_home_edu_dt_t[order(care_home_edu_dt_t$variable),]
#delete everything after period
care_home_edu_dt_t$variable = gsub("\\..*","",care_home_edu_dt_t$variable)
#aggregate by last
care_home_edu_dt_t_na = na.omit(care_home_edu_dt_t)
care_home_edu_dt_t_na = aggregate(care_home_edu_dt_t_na, by=list(care_home_edu_dt_t_na$eid,care_home_edu_dt_t_na$variable), FUN=tail, n = 1)
# Curate
care_home_edu_dt_t_na = care_home_edu_dt_t_na[c("eid", "variable", "value")]
care_home_edu_dt_t_na$variable = as.factor(care_home_edu_dt_t_na$variable)
# put it back straight
care_home_edu_cur = cast(care_home_edu_dt_t_na, eid~variable)
head(care_home_edu_cur)
eid X6138 X670
1 1000013 2 1
2 1000024 -7 1
3 1000036 3 1
4 1000048 6 1
5 1000055 2 1
6 1000067 6 1
Prefer not to answer is considered as NA
%%R
# I keep the ordered function because all of the variables need to be decoded
care_home_edu_dec = care_home_edu_cur
#housing
lvl.100286 <- c(-7,-3,1,2,3,4,5)
lbl.100286 <- c("None of the above","NA","A house or bungalow","A flat, maisonette or apartment","Mobile or temporary structure (i.e. caravan)","Sheltered accommodation","Care home")
care_home_edu_dec$X670 <- ordered(care_home_edu_dec$X670, levels=lvl.100286, labels=lbl.100286)
#education
lvl.100305 <- c(-7,-3,1,2,3,4,5,6)
lbl.100305 <- c("None of the above","NA","College or University degree","A levels/AS levels or equivalent","O levels/GCSEs or equivalent","CSEs or equivalent","NVQ or HND or HNC or equivalent","Other professional qualifications eg: nursing, teaching")
care_home_edu_dec$X6138 <- ordered(care_home_edu_dec$X6138, levels=lvl.100305, labels=lbl.100305)
colnames(care_home_edu_dec) <- c("eid","edu_level","house_type")
write.csv(care_home_edu_dec,"data_output/care_home_edu_decoded.csv", row.names = FALSE)
%%R
#Merge to data
temp_merged = merge(read.csv("data_output/ukb_covidAP_finalCuratedData.csv"), read.csv("data_output/care_home_edu_decoded.csv"), by = "eid")
write.csv(temp_merged,"data_output/ukb_covidAP_finalCuratedData_house.csv")
### Replace “No Answer” as NA
%%R
ukb_covidAP = read.csv("data_output/ukb_covidAP_finalCuratedData_house.csv")
ukb_covidAP[] <- lapply(ukb_covidAP, function(x) (gsub("Prefer not to answer", "NA", x)))
ukb_covidAP[] <- lapply(ukb_covidAP, function(x) (gsub("Do not know", "NA", x)))
ukb_covidAP$ethnicity_sim <- ifelse(ukb_covidAP$ethnicity == "White" |
ukb_covidAP$ethnicity == "British", "white", "minority")
write.csv(ukb_covidAP,"data_output/ukb_covidAP_finalCuratedData_house.csv", row.names=FALSE)
### Add addtional variables: blood type, C-reactive proteins and image derived variable
%%R
dem_dt = (read.csv("/Users/yizhouyu/camDrive/ARUK_project/additional_datasets/data/vasc_fronto_all_cause_AD_PD_dt.csv"))
idp_dt = read.csv("/Users/yizhouyu/camDrive/ARUK_project/additional_datasets/data/IDP_subset_dt.csv")
add_dt_merged = merge(dem_dt, idp_dt, by = "eid")
print(nrow(add_dt_merged))
#get the most up-to-date data from all of these variables
library(reshape)
add_dt_merged_t = melt(add_dt_merged, id='eid')
#order by variable to make sure I am will be replacing the latest value
add_dt_merged_t <- add_dt_merged_t[order(add_dt_merged_t$variable),]
#delete everything after period
add_dt_merged_t$variable = gsub("\\..*","",add_dt_merged_t$variable)
#aggregate by last
add_dt_merged_t = na.omit(add_dt_merged_t)
add_dt_merged_t = aggregate(add_dt_merged_t, by=list(add_dt_merged_t$eid,add_dt_merged_t$variable), FUN=tail, n = 1)
# Curate
add_dt_merged_t = add_dt_merged_t[c("eid", "variable", "value")]
add_dt_merged_t$variable = as.factor(add_dt_merged_t$variable)
# put it back straight
add_dt_cur = cast(add_dt_merged_t, eid~variable)
nrow(add_dt_cur)
[1] 502505
[1] 502505
#### Decode
%%R
# I keep the ordered function because all of the variables need to be decoded
add_dt_dec = add_dt_cur
#housing
lvl.0272 <- c(1)
lbl.0272 <- c("Date is unknown")
lvl.0300 <- c(0,1,2)
lbl.0300 <- c("Self-reported only","Hospital admission","Death only")
add_dt_dec$X42018 <- yy_replace(add_dt_dec$X42018, lvl.0272, lbl.0272)
add_dt_dec$X42019 <- yy_replace(add_dt_dec$X42019, lvl.0300, lbl.0300)
add_dt_dec$X42020 <- yy_replace(add_dt_dec$X42020, lvl.0272, lbl.0272)
add_dt_dec$X42021 <- yy_replace(add_dt_dec$X42021, lvl.0300, lbl.0300)
add_dt_dec$X42022 <- yy_replace(add_dt_dec$X42022, lvl.0272, lbl.0272)
add_dt_dec$X42023 <- yy_replace(add_dt_dec$X42023, lvl.0300, lbl.0300)
add_dt_dec$X42024 <- yy_replace(add_dt_dec$X42024, lvl.0272, lbl.0272)
add_dt_dec$X42025 <- yy_replace(add_dt_dec$X42025, lvl.0300, lbl.0300)
Add custom colun names for the new variables
%%R
dem_metadt = (read.csv("/Users/yizhouyu/camDrive/ARUK_project/additional_datasets/vasc_fronto_all_cause_AD_PD.csv"))[c("my_colname","FieldID")]
idp_metadt = read.csv("/Users/yizhouyu/camDrive/ARUK_project/additional_datasets/ ")[c("my_colname","FieldID")]
# dem_metadt$FieldID = toString(dem_metadt$FieldID)
idp_metadt$FieldID = as.integer(gsub("-.*","",idp_metadt$FieldID))
metadt = rbind(idp_metadt,dem_metadt)
metadt$FieldID = paste("X",metadt$FieldID, sep = "")
metadt$my_colname = gsub("_1","",metadt$my_colname)
metadt$my_colname = gsub("_2","",metadt$my_colname)
metadt = metadt[order(metadt$FieldID, decreasing = FALSE), ]
# metadt
The columns are in order as shown here, so I can just order the metadt and replace the column names
%%R
colnames(add_dt_dec)
# length(metadt$my_colname)
colnames(add_dt_dec) <- c("eid",unique(metadt$my_colname))
#### Curate dementia-related variables
%%R
library(dplyr)
# dementia diagnosis
add_dt_cur = add_dt_dec %>%
mutate(allDem_diag = c("FALSE", "TRUE")[(!is.na(all_cause_dem_source)
)+1] )
# Ad diagnosis
add_dt_cur = add_dt_cur %>%
mutate(AD_diag = c("FALSE", "TRUE")[(!is.na(AD_source)
)+1] )
# fronto_dem_source diagnosis
add_dt_cur = add_dt_cur %>%
mutate(frontoDem_diag = c("FALSE", "TRUE")[(!is.na(fronto_dem_source)
)+1] )
# vasc_dem_source diagnosis
add_dt_cur = add_dt_cur %>%
mutate(vascDem_diag = c("FALSE", "TRUE")[(!is.na(vasc_dem_source)
)+1] )
# head(add_dt_cur)
/Users/yizhouyu/miniconda3/envs/yy_37_env2/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning:
Attaching package: ‘dplyr’
warnings.warn(x, RRuntimeWarning)
/Users/yizhouyu/miniconda3/envs/yy_37_env2/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: The following object is masked from ‘package:reshape’:
rename
warnings.warn(x, RRuntimeWarning)
/Users/yizhouyu/miniconda3/envs/yy_37_env2/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: The following objects are masked from ‘package:stats’:
filter, lag
warnings.warn(x, RRuntimeWarning)
/Users/yizhouyu/miniconda3/envs/yy_37_env2/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
warnings.warn(x, RRuntimeWarning)
%%R
add_dt_cur$crp_norm = as.numeric(paste(add_dt_cur$crp))/as.numeric(paste(add_dt_cur$tot_p))
add_dt_cur$wmh_norm = as.numeric(paste(add_dt_cur$wmh))/as.numeric(paste(add_dt_cur$brainVol))
%%R
write.csv(add_dt_cur,"/Users/yizhouyu/camDrive/ARUK_project/additional_datasets/data/curated_idp_dementia_CRP.csv", row.names = FALSE)
%%R
new_dt = read.csv("/Users/yizhouyu/camDrive/ARUK_project/additional_datasets/data/curated_idp_dementia_CRP.csv")
new_dt = subset(new_dt, select = -c(AD_diag))
merged_dt = merge(read.csv("data_output/ukb_covidAP_finalCuratedData_house.csv"),
new_dt,
by = "eid", all.x = TRUE)
final_merged_dt = merge(merged_dt,read.csv("data/covid19_misc_2_8_2020.txt", sep = "\t"),
by = "eid", all.x = TRUE)
write.csv(final_merged_dt, "data_output/ukb_covid_dt_out.csv", row.names=FALSE)
%%R
library(arsenal)
ukb_covidAP = read.csv("data_output/ukb_covid_dt_out.csv")
ukb_descript_stats <- tableby(result ~ ., data = ukb_covidAP[,2:ncol(ukb_covidAP)])
write2pdf(ukb_descript_stats, "ukb_descriptStats.pdf",
keep.md = TRUE,
quiet = TRUE, # passed to rmarkdown::render
title = "Descriptive statistics of the UK Biobank data") # passed to summary.tableby
write2word(ukb_descript_stats, "ukb_descriptStats.doc",
keep.md = TRUE,
quiet = TRUE, # passed to rmarkdown::render
title = "Descriptive statistics of the UK Biobank data") # passed to summary.tableby
%%R
library(tidyr)
library(purrr)
library(ggplot2)
gen_dt = read.csv("data_output/ukb_covid_dt_out.csv")
cur_dt = subset(gen_dt, select = c(result, Population, n_cancers,townsend, smoking, whistling, diabetes,
whr , sex , age ,peopleInHousehold,AverageHouseholdIncome,ethnicity_sim,highBP,
PD_diag, AD_diag , vascDem_diag,frontoDem_diag,
greyVol , whiteVol , wmh_norm,
edu_score, crp_norm,
edu_level , house_type,blood_group,
pm25_val,no2_val,nox_val,so2_val,pm10_val,o3_val))
/Users/yizhouyu/miniconda3/envs/yy_37_env2/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning:
Attaching package: ‘tidyr’
warnings.warn(x, RRuntimeWarning)
/Users/yizhouyu/miniconda3/envs/yy_37_env2/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: The following objects are masked from ‘package:reshape’:
expand, smiths
warnings.warn(x, RRuntimeWarning)
/Users/yizhouyu/miniconda3/envs/yy_37_env2/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Use suppressPackageStartupMessages() to eliminate package startup
messages
warnings.warn(x, RRuntimeWarning)
%%R
cur_dt %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()+
theme_classic()
/Users/yizhouyu/miniconda3/envs/yy_37_env2/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
warnings.warn(x, RRuntimeWarning)
%%R
cur_dt$log_wmh = -log(cur_dt$wmh_norm)
cur_dt$log_crp = -log(cur_dt$crp_norm)
cur_dt$log_edu = log(cur_dt$edu_score)
cur_dt %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()+
theme_classic()
ggsave('fig_out/UKB_all_vars_numeric_distribution.pdf', width = 10, height = 10)
ggsave('fig_out/UKB_all_vars_numeric_distribution.png', width = 10, height = 10)
wmh, CRP and education scores need to be logged.
%%R
gen_dt$log_wmh = log(gen_dt$wmh_norm)
gen_dt$log_crp = log(gen_dt$crp_norm)
gen_dt$log_edu = log(gen_dt$edu_score)
write.csv(gen_dt,"data_output/ukb_covid_dt_out.csv",row.names=FALSE)
%%R
cur_dt %>%
keep(is.factor) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram(stat="count") +
theme_classic()+
theme(axis.text.x = element_text(angle = 90))
ggsave('fig_out/UKB_all_vars_non-numeric_distribution.pdf', width = 10, height = 10)
ggsave('fig_out/UKB_all_vars_non-numeric_distribution.png', width = 10, height = 10)
%%R
gen_dt = read.csv("data_output/ukb_covid_dt_out.csv")
colnames(gen_dt)
cur_dt = subset(gen_dt, select = c(result, Population, n_cancers,townsend, smoking, whistling, diabetes,
whr , sex , age ,peopleInHousehold,AverageHouseholdIncome,ethnicity_sim,highBP,
PD_diag, AD_diag , vascDem_diag,frontoDem_diag,
# greyVol , whiteVol , wmh_norm,
edu_score, log_crp,blood_group,log_edu,
edu_level , house_type))
%%R
cur_dt_na = na.omit(cur_dt)
summary(ukb_covid.b <-glm(data = cur_dt_na, result ~ Population + n_cancers +townsend + smoking + whistling + diabetes+
whr + sex + age +peopleInHousehold+AverageHouseholdIncome+ethnicity_sim+highBP+
PD_diag+ AD_diag + vascDem_diag+frontoDem_diag+
# greyVol + whiteVol + wmh_norm+
log_edu+ log_crp+blood_group+
edu_level + house_type
, family = 'binomial'))
Call:
glm(formula = result ~ Population + n_cancers + townsend + smoking +
whistling + diabetes + whr + sex + age + peopleInHousehold +
AverageHouseholdIncome + ethnicity_sim + highBP + PD_diag +
AD_diag + vascDem_diag + frontoDem_diag + log_edu + log_crp +
blood_group + edu_level + house_type, family = "binomial",
data = cur_dt_na)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.1790 -0.5366 -0.4556 -0.3817 2.5883
Coefficients:
Estimate
(Intercept) -6.381e-01
Population -3.813e-03
n_cancers -2.025e-01
townsend 2.865e-02
smokingNever 1.295e-01
smokingPrevious 1.908e-01
whistlingTRUE 1.459e-02
diabetesTRUE 1.997e-02
whr 5.835e-01
sexMale 1.503e-01
age -3.405e-02
peopleInHousehold 5.918e-02
AverageHouseholdIncome18,000 to 30,999 -1.241e-01
AverageHouseholdIncome31,000 to 51,999 -6.947e-02
AverageHouseholdIncome52,000 to 100,000 -2.111e-01
AverageHouseholdIncomeGreater than 100,000 -4.973e-01
AverageHouseholdIncomeLess than 18,000 -5.053e-02
ethnicity_simwhite -3.192e-01
highBPFalse 5.085e-02
highBPTrue 2.302e-02
PD_diagTrue 6.663e-01
AD_diagTrue 1.521e+00
vascDem_diagTRUE 1.498e+00
frontoDem_diagTRUE 1.705e+01
log_edu 9.257e-02
log_crp 3.459e-03
blood_groupAB -8.089e-02
blood_groupAO -2.660e-02
blood_groupBB -1.843e-01
blood_groupBO 5.725e-02
blood_groupOO -5.122e-02
edu_levelCollege or University degree 2.570e-01
edu_levelCSEs or equivalent 2.349e-01
edu_levelNone of the above 4.750e-01
edu_levelNVQ or HND or HNC or equivalent 3.046e-01
edu_levelO levels/GCSEs or equivalent 6.255e-02
edu_levelOther professional qualifications eg: nursing, teaching 3.150e-01
house_typeA house or bungalow 3.317e-02
house_typeCare home -1.310e+01
house_typeMobile or temporary structure (i.e. caravan) -1.239e+01
house_typeNone of the above -5.282e-01
house_typeSheltered accommodation -1.251e+01
Std. Error
(Intercept) 1.017e+03
Population 2.343e-03
n_cancers 1.087e-01
townsend 1.320e-02
smokingNever 1.127e-01
smokingPrevious 1.142e-01
whistlingTRUE 7.958e-02
diabetesTRUE 1.184e-01
whr 5.243e-01
sexMale 9.152e-02
age 4.813e-03
peopleInHousehold 2.167e-02
AverageHouseholdIncome18,000 to 30,999 1.017e+03
AverageHouseholdIncome31,000 to 51,999 1.017e+03
AverageHouseholdIncome52,000 to 100,000 1.017e+03
AverageHouseholdIncomeGreater than 100,000 1.017e+03
AverageHouseholdIncomeLess than 18,000 1.017e+03
ethnicity_simwhite 9.838e-02
highBPFalse 2.406e-01
highBPTrue 2.420e-01
PD_diagTrue 2.702e-01
AD_diagTrue 3.271e-01
vascDem_diagTRUE 4.343e-01
frontoDem_diagTRUE 8.827e+02
log_edu 3.517e-02
log_crp 3.388e-02
blood_groupAB 2.205e-01
blood_groupAO 1.313e-01
blood_groupBB 4.587e-01
blood_groupBO 1.587e-01
blood_groupOO 1.298e-01
edu_levelCollege or University degree 2.436e-01
edu_levelCSEs or equivalent 2.585e-01
edu_levelNone of the above 2.362e-01
edu_levelNVQ or HND or HNC or equivalent 2.371e-01
edu_levelO levels/GCSEs or equivalent 2.388e-01
edu_levelOther professional qualifications eg: nursing, teaching 2.296e-01
house_typeA house or bungalow 1.181e-01
house_typeCare home 8.827e+02
house_typeMobile or temporary structure (i.e. caravan) 2.428e+02
house_typeNone of the above 7.531e-01
house_typeSheltered accommodation 5.059e+02
z value
(Intercept) -0.001
Population -1.627
n_cancers -1.864
townsend 2.170
smokingNever 1.149
smokingPrevious 1.671
whistlingTRUE 0.183
diabetesTRUE 0.169
whr 1.113
sexMale 1.643
age -7.075
peopleInHousehold 2.732
AverageHouseholdIncome18,000 to 30,999 0.000
AverageHouseholdIncome31,000 to 51,999 0.000
AverageHouseholdIncome52,000 to 100,000 0.000
AverageHouseholdIncomeGreater than 100,000 0.000
AverageHouseholdIncomeLess than 18,000 0.000
ethnicity_simwhite -3.244
highBPFalse 0.211
highBPTrue 0.095
PD_diagTrue 2.466
AD_diagTrue 4.651
vascDem_diagTRUE 3.450
frontoDem_diagTRUE 0.019
log_edu 2.632
log_crp 0.102
blood_groupAB -0.367
blood_groupAO -0.203
blood_groupBB -0.402
blood_groupBO 0.361
blood_groupOO -0.394
edu_levelCollege or University degree 1.055
edu_levelCSEs or equivalent 0.909
edu_levelNone of the above 2.011
edu_levelNVQ or HND or HNC or equivalent 1.285
edu_levelO levels/GCSEs or equivalent 0.262
edu_levelOther professional qualifications eg: nursing, teaching 1.372
house_typeA house or bungalow 0.281
house_typeCare home -0.015
house_typeMobile or temporary structure (i.e. caravan) -0.051
house_typeNone of the above -0.701
house_typeSheltered accommodation -0.025
Pr(>|z|)
(Intercept) 0.99950
Population 0.10365
n_cancers 0.06238 .
townsend 0.03001 *
smokingNever 0.25067
smokingPrevious 0.09471 .
whistlingTRUE 0.85453
diabetesTRUE 0.86600
whr 0.26577
sexMale 0.10044
age 1.50e-12 ***
peopleInHousehold 0.00630 **
AverageHouseholdIncome18,000 to 30,999 0.99990
AverageHouseholdIncome31,000 to 51,999 0.99995
AverageHouseholdIncome52,000 to 100,000 0.99983
AverageHouseholdIncomeGreater than 100,000 0.99961
AverageHouseholdIncomeLess than 18,000 0.99996
ethnicity_simwhite 0.00118 **
highBPFalse 0.83261
highBPTrue 0.92421
PD_diagTrue 0.01368 *
AD_diagTrue 3.31e-06 ***
vascDem_diagTRUE 0.00056 ***
frontoDem_diagTRUE 0.98459
log_edu 0.00848 **
log_crp 0.91868
blood_groupAB 0.71374
blood_groupAO 0.83948
blood_groupBB 0.68789
blood_groupBO 0.71836
blood_groupOO 0.69323
edu_levelCollege or University degree 0.29151
edu_levelCSEs or equivalent 0.36356
edu_levelNone of the above 0.04436 *
edu_levelNVQ or HND or HNC or equivalent 0.19883
edu_levelO levels/GCSEs or equivalent 0.79340
edu_levelOther professional qualifications eg: nursing, teaching 0.17014
house_typeA house or bungalow 0.77887
house_typeCare home 0.98816
house_typeMobile or temporary structure (i.e. caravan) 0.95931
house_typeNone of the above 0.48307
house_typeSheltered accommodation 0.98028
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 6421.9 on 8799 degrees of freedom
Residual deviance: 6205.7 on 8758 degrees of freedom
AIC: 6289.7
Number of Fisher Scoring iterations: 13
This model shows that health-related factors, like waist-hig ratios (whr), blood pressure (BP), previous history of cancers, sex and diabetes are not predictors of infections. This makes sense since, technically, people with existing cardiovascular and health issues are equally at risk of getting the disease.
The Townsend index, PM2.5, number of people in a house hold, Alzheimer’s disease disgnosis and age are predictors of infections. Possible explanations could be:
Here, I will create a model without the house_type variable. The reason for including this variable was to accound for whether people in care homes are more likely to get infected. However, only a few individuals in this cohort report being in a care home. This variable is therefore not relevant anymore.
%%R
stargazer::stargazer(ukb_covid.b, type="html",out = "fig_out/ukb_covid_full_model_binary.html",
dep.var.labels="COVID positive or not",
single.row=TRUE)
<table style="text-align:center"><tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td>COVID positive or not</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Population</td><td>-0.004 (0.002)</td></tr>
<tr><td style="text-align:left">n_cancers</td><td>-0.202<sup>*</sup> (0.109)</td></tr>
<tr><td style="text-align:left">townsend</td><td>0.029<sup>**</sup> (0.013)</td></tr>
<tr><td style="text-align:left">smokingNever</td><td>0.129 (0.113)</td></tr>
<tr><td style="text-align:left">smokingPrevious</td><td>0.191<sup>*</sup> (0.114)</td></tr>
<tr><td style="text-align:left">whistling</td><td>0.015 (0.080)</td></tr>
<tr><td style="text-align:left">diabetes</td><td>0.020 (0.118)</td></tr>
<tr><td style="text-align:left">whr</td><td>0.583 (0.524)</td></tr>
<tr><td style="text-align:left">sexMale</td><td>0.150 (0.092)</td></tr>
<tr><td style="text-align:left">age</td><td>-0.034<sup>***</sup> (0.005)</td></tr>
<tr><td style="text-align:left">peopleInHousehold</td><td>0.059<sup>***</sup> (0.022)</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncome18,000 to 30,999</td><td>-0.124 (1,017.458)</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncome31,000 to 51,999</td><td>-0.069 (1,017.458)</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncome52,000 to 100,000</td><td>-0.211 (1,017.458)</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncomeGreater than 100,000</td><td>-0.497 (1,017.458)</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncomeLess than 18,000</td><td>-0.051 (1,017.458)</td></tr>
<tr><td style="text-align:left">ethnicity_simwhite</td><td>-0.319<sup>***</sup> (0.098)</td></tr>
<tr><td style="text-align:left">highBPFalse</td><td>0.051 (0.241)</td></tr>
<tr><td style="text-align:left">highBPTrue</td><td>0.023 (0.242)</td></tr>
<tr><td style="text-align:left">PD_diagTrue</td><td>0.666<sup>**</sup> (0.270)</td></tr>
<tr><td style="text-align:left">AD_diagTrue</td><td>1.521<sup>***</sup> (0.327)</td></tr>
<tr><td style="text-align:left">vascDem_diag</td><td>1.498<sup>***</sup> (0.434)</td></tr>
<tr><td style="text-align:left">frontoDem_diag</td><td>17.051 (882.743)</td></tr>
<tr><td style="text-align:left">log_edu</td><td>0.093<sup>***</sup> (0.035)</td></tr>
<tr><td style="text-align:left">log_crp</td><td>0.003 (0.034)</td></tr>
<tr><td style="text-align:left">blood_groupAB</td><td>-0.081 (0.221)</td></tr>
<tr><td style="text-align:left">blood_groupAO</td><td>-0.027 (0.131)</td></tr>
<tr><td style="text-align:left">blood_groupBB</td><td>-0.184 (0.459)</td></tr>
<tr><td style="text-align:left">blood_groupBO</td><td>0.057 (0.159)</td></tr>
<tr><td style="text-align:left">blood_groupOO</td><td>-0.051 (0.130)</td></tr>
<tr><td style="text-align:left">edu_levelCollege or University degree</td><td>0.257 (0.244)</td></tr>
<tr><td style="text-align:left">edu_levelCSEs or equivalent</td><td>0.235 (0.258)</td></tr>
<tr><td style="text-align:left">edu_levelNone of the above</td><td>0.475<sup>**</sup> (0.236)</td></tr>
<tr><td style="text-align:left">edu_levelNVQ or HND or HNC or equivalent</td><td>0.305 (0.237)</td></tr>
<tr><td style="text-align:left">edu_levelO levels/GCSEs or equivalent</td><td>0.063 (0.239)</td></tr>
<tr><td style="text-align:left">edu_levelOther professional qualifications eg: nursing, teaching</td><td>0.315 (0.230)</td></tr>
<tr><td style="text-align:left">house_typeA house or bungalow</td><td>0.033 (0.118)</td></tr>
<tr><td style="text-align:left">house_typeCare home</td><td>-13.096 (882.743)</td></tr>
<tr><td style="text-align:left">house_typeMobile or temporary structure (i.e. caravan)</td><td>-12.389 (242.809)</td></tr>
<tr><td style="text-align:left">house_typeNone of the above</td><td>-0.528 (0.753)</td></tr>
<tr><td style="text-align:left">house_typeSheltered accommodation</td><td>-12.505 (505.950)</td></tr>
<tr><td style="text-align:left">Constant</td><td>-0.638 (1,017.458)</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>8,800</td></tr>
<tr><td style="text-align:left">Log Likelihood</td><td>-3,102.836</td></tr>
<tr><td style="text-align:left">Akaike Inf. Crit.</td><td>6,289.672</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>
%%R
## odds ratios and 95% CI
ukb_covid.b_odds =
data.frame(cbind(exp(cbind(OR = coef(ukb_covid.b),
conf_int_low = summary(ukb_covid.b)$coefficients[,1] -
summary(ukb_covid.b)$coefficients[,2],
conf_int_high = summary(ukb_covid.b)$coefficients[,1] +
summary(ukb_covid.b)$coefficients[,2])),
p_value = summary(ukb_covid.b)$coefficients[,4]))
stargazer::stargazer(ukb_covid.b_odds, type ="html",
single.row=TRUE, summary = FALSE,
out = "fig_out/ukb_covid_odds_fullmodel.html")
write.csv(ukb_covid.b_odds,"data_output/ukb_covid_full_model_odds.csv", row.names=TRUE)
ukb_covid.b_odds$significance = "p-value > 0.05"
ukb_covid.b_odds$significance[ukb_covid.b_odds$p_value < 0.05] <- "p-value < 0.05"
ukb_covid.b_odds$variables = row.names(ukb_covid.b_odds)
library(ggplot2)
ggplot(ukb_covid.b_odds[which(ukb_covid.b_odds$significance == "p-value < 0.05"),],
aes(x=reorder(variables, OR), y=OR)) +
geom_point(shape=21, size = 2) +
geom_errorbar(aes(ymin=conf_int_low, ymax=conf_int_high),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
theme_classic() +
geom_hline(yintercept = 1, linetype="dotted") +
coord_flip()+ylab("Infectivity odds ratios") +
xlab("Variables")
ggsave('fig_out/UKB_covidInfection_full_model_ORplot_sig_only.pdf', width = 6, height = 4)
ggsave('fig_out/UKB_covidInfection_full_model_ORplot_sig_only.png', width = 6, height = 4)
<table style="text-align:center"><tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td>OR</td><td>conf_int_low</td><td>conf_int_high</td><td>p_value</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">(Intercept)</td><td>0.528</td><td>0</td><td>Inf</td><td>0.999</td></tr>
<tr><td style="text-align:left">Population</td><td>0.996</td><td>0.994</td><td>0.999</td><td>0.104</td></tr>
<tr><td style="text-align:left">n_cancers</td><td>0.817</td><td>0.733</td><td>0.910</td><td>0.062</td></tr>
<tr><td style="text-align:left">townsend</td><td>1.029</td><td>1.016</td><td>1.043</td><td>0.030</td></tr>
<tr><td style="text-align:left">smokingNever</td><td>1.138</td><td>1.017</td><td>1.274</td><td>0.251</td></tr>
<tr><td style="text-align:left">smokingPrevious</td><td>1.210</td><td>1.080</td><td>1.357</td><td>0.095</td></tr>
<tr><td style="text-align:left">whistlingTRUE</td><td>1.015</td><td>0.937</td><td>1.099</td><td>0.855</td></tr>
<tr><td style="text-align:left">diabetesTRUE</td><td>1.020</td><td>0.906</td><td>1.148</td><td>0.866</td></tr>
<tr><td style="text-align:left">whr</td><td>1.792</td><td>1.061</td><td>3.028</td><td>0.266</td></tr>
<tr><td style="text-align:left">sexMale</td><td>1.162</td><td>1.061</td><td>1.274</td><td>0.100</td></tr>
<tr><td style="text-align:left">age</td><td>0.967</td><td>0.962</td><td>0.971</td><td>0</td></tr>
<tr><td style="text-align:left">peopleInHousehold</td><td>1.061</td><td>1.038</td><td>1.084</td><td>0.006</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncome18,000 to 30,999</td><td>0.883</td><td>0</td><td>Inf</td><td>1.000</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncome31,000 to 51,999</td><td>0.933</td><td>0</td><td>Inf</td><td>1.000</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncome52,000 to 100,000</td><td>0.810</td><td>0</td><td>Inf</td><td>1.000</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncomeGreater than 100,000</td><td>0.608</td><td>0</td><td>Inf</td><td>1.000</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncomeLess than 18,000</td><td>0.951</td><td>0</td><td>Inf</td><td>1.000</td></tr>
<tr><td style="text-align:left">ethnicity_simwhite</td><td>0.727</td><td>0.659</td><td>0.802</td><td>0.001</td></tr>
<tr><td style="text-align:left">highBPFalse</td><td>1.052</td><td>0.827</td><td>1.338</td><td>0.833</td></tr>
<tr><td style="text-align:left">highBPTrue</td><td>1.023</td><td>0.803</td><td>1.303</td><td>0.924</td></tr>
<tr><td style="text-align:left">PD_diagTrue</td><td>1.947</td><td>1.486</td><td>2.551</td><td>0.014</td></tr>
<tr><td style="text-align:left">AD_diagTrue</td><td>4.578</td><td>3.301</td><td>6.350</td><td>0.00000</td></tr>
<tr><td style="text-align:left">vascDem_diagTRUE</td><td>4.475</td><td>2.898</td><td>6.909</td><td>0.001</td></tr>
<tr><td style="text-align:left">frontoDem_diagTRUE</td><td>25,406,220.000</td><td>0</td><td>Inf</td><td>0.985</td></tr>
<tr><td style="text-align:left">log_edu</td><td>1.097</td><td>1.059</td><td>1.136</td><td>0.008</td></tr>
<tr><td style="text-align:left">log_crp</td><td>1.003</td><td>0.970</td><td>1.038</td><td>0.919</td></tr>
<tr><td style="text-align:left">blood_groupAB</td><td>0.922</td><td>0.740</td><td>1.150</td><td>0.714</td></tr>
<tr><td style="text-align:left">blood_groupAO</td><td>0.974</td><td>0.854</td><td>1.110</td><td>0.839</td></tr>
<tr><td style="text-align:left">blood_groupBB</td><td>0.832</td><td>0.526</td><td>1.316</td><td>0.688</td></tr>
<tr><td style="text-align:left">blood_groupBO</td><td>1.059</td><td>0.903</td><td>1.241</td><td>0.718</td></tr>
<tr><td style="text-align:left">blood_groupOO</td><td>0.950</td><td>0.834</td><td>1.082</td><td>0.693</td></tr>
<tr><td style="text-align:left">edu_levelCollege or University degree</td><td>1.293</td><td>1.013</td><td>1.650</td><td>0.292</td></tr>
<tr><td style="text-align:left">edu_levelCSEs or equivalent</td><td>1.265</td><td>0.977</td><td>1.638</td><td>0.364</td></tr>
<tr><td style="text-align:left">edu_levelNone of the above</td><td>1.608</td><td>1.270</td><td>2.037</td><td>0.044</td></tr>
<tr><td style="text-align:left">edu_levelNVQ or HND or HNC or equivalent</td><td>1.356</td><td>1.070</td><td>1.719</td><td>0.199</td></tr>
<tr><td style="text-align:left">edu_levelO levels/GCSEs or equivalent</td><td>1.065</td><td>0.838</td><td>1.352</td><td>0.793</td></tr>
<tr><td style="text-align:left">edu_levelOther professional qualifications eg: nursing, teaching</td><td>1.370</td><td>1.089</td><td>1.724</td><td>0.170</td></tr>
<tr><td style="text-align:left">house_typeA house or bungalow</td><td>1.034</td><td>0.919</td><td>1.163</td><td>0.779</td></tr>
<tr><td style="text-align:left">house_typeCare home</td><td>0.00000</td><td>0</td><td>Inf</td><td>0.988</td></tr>
<tr><td style="text-align:left">house_typeMobile or temporary structure (i.e. caravan)</td><td>0.00000</td><td>0</td><td>11,761,589,003,334,755,352,685,426,304,803,959,958,831,693,829,258,798,628,853,146,637,434,447,308,612,822,624,784,022,818,131,017,728</td><td>0.959</td></tr>
<tr><td style="text-align:left">house_typeNone of the above</td><td>0.590</td><td>0.278</td><td>1.252</td><td>0.483</td></tr>
<tr><td style="text-align:left">house_typeSheltered accommodation</td><td>0.00000</td><td>0</td><td>1</td><td>0.980</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr></table>
%%R
library(MASS)
# Stepwise regression model
ukb_covid.b_step <- stepAIC(ukb_covid.b, direction = "both")
summary(ukb_covid.b_step)
Start: AIC=6289.67
result ~ Population + n_cancers + townsend + smoking + whistling +
diabetes + whr + sex + age + peopleInHousehold + AverageHouseholdIncome +
ethnicity_sim + highBP + PD_diag + AD_diag + vascDem_diag +
frontoDem_diag + log_edu + log_crp + blood_group + edu_level +
house_type
Df Deviance AIC
- blood_group 5 6206.8 6280.8
- house_type 5 6210.4 6284.4
- highBP 2 6205.9 6285.9
- AverageHouseholdIncome 5 6212.4 6286.4
- log_crp 1 6205.7 6287.7
- diabetes 1 6205.7 6287.7
- whistling 1 6205.7 6287.7
- smoking 2 6208.6 6288.6
- whr 1 6206.9 6288.9
<none> 6205.7 6289.7
- sex 1 6208.4 6290.4
- Population 1 6209.1 6291.1
- n_cancers 1 6209.4 6291.4
- edu_level 6 6219.6 6291.6
- townsend 1 6210.4 6292.4
- frontoDem_diag 1 6210.8 6292.8
- PD_diag 1 6211.0 6293.0
- peopleInHousehold 1 6212.3 6294.3
- log_edu 1 6212.8 6294.8
- vascDem_diag 1 6215.6 6297.6
- ethnicity_sim 1 6215.8 6297.8
- AD_diag 1 6223.3 6305.3
- age 1 6255.6 6337.6
Step: AIC=6280.8
result ~ Population + n_cancers + townsend + smoking + whistling +
diabetes + whr + sex + age + peopleInHousehold + AverageHouseholdIncome +
ethnicity_sim + highBP + PD_diag + AD_diag + vascDem_diag +
frontoDem_diag + log_edu + log_crp + edu_level + house_type
Df Deviance AIC
- house_type 5 6211.6 6275.6
- highBP 2 6207.0 6277.0
- AverageHouseholdIncome 5 6213.6 6277.6
- log_crp 1 6206.8 6278.8
- diabetes 1 6206.8 6278.8
- whistling 1 6206.8 6278.8
- smoking 2 6209.7 6279.7
- whr 1 6208.0 6280.0
<none> 6206.8 6280.8
- sex 1 6209.5 6281.5
- Population 1 6210.3 6282.3
- n_cancers 1 6210.5 6282.5
- edu_level 6 6220.8 6282.8
- townsend 1 6211.5 6283.5
- frontoDem_diag 1 6211.9 6283.9
- PD_diag 1 6212.2 6284.2
- peopleInHousehold 1 6213.7 6285.7
- log_edu 1 6213.9 6285.9
- vascDem_diag 1 6216.7 6288.7
- ethnicity_sim 1 6217.4 6289.4
+ blood_group 5 6205.7 6289.7
- AD_diag 1 6224.4 6296.4
- age 1 6256.6 6328.6
Step: AIC=6275.61
result ~ Population + n_cancers + townsend + smoking + whistling +
diabetes + whr + sex + age + peopleInHousehold + AverageHouseholdIncome +
ethnicity_sim + highBP + PD_diag + AD_diag + vascDem_diag +
frontoDem_diag + log_edu + log_crp + edu_level
Df Deviance AIC
- highBP 2 6211.8 6271.8
- AverageHouseholdIncome 5 6218.3 6272.3
- log_crp 1 6211.6 6273.6
- diabetes 1 6211.6 6273.6
- whistling 1 6211.6 6273.6
- smoking 2 6214.6 6274.6
- whr 1 6212.8 6274.8
<none> 6211.6 6275.6
- sex 1 6214.3 6276.3
- Population 1 6215.2 6277.2
- n_cancers 1 6215.3 6277.3
- edu_level 6 6225.6 6277.6
- townsend 1 6216.7 6278.7
- frontoDem_diag 1 6216.7 6278.7
- PD_diag 1 6217.1 6279.1
+ house_type 5 6206.8 6280.8
- peopleInHousehold 1 6219.0 6281.0
- log_edu 1 6219.2 6281.2
- vascDem_diag 1 6221.5 6283.5
- ethnicity_sim 1 6222.0 6284.0
+ blood_group 5 6210.4 6284.4
- AD_diag 1 6229.3 6291.3
- age 1 6261.3 6323.3
Step: AIC=6271.79
result ~ Population + n_cancers + townsend + smoking + whistling +
diabetes + whr + sex + age + peopleInHousehold + AverageHouseholdIncome +
ethnicity_sim + PD_diag + AD_diag + vascDem_diag + frontoDem_diag +
log_edu + log_crp + edu_level
Df Deviance AIC
- AverageHouseholdIncome 5 6218.5 6268.5
- log_crp 1 6211.8 6269.8
- whistling 1 6211.8 6269.8
- diabetes 1 6211.8 6269.8
- smoking 2 6214.7 6270.7
- whr 1 6213.0 6271.0
<none> 6211.8 6271.8
- sex 1 6214.4 6272.4
- Population 1 6215.3 6273.3
- n_cancers 1 6215.5 6273.5
- edu_level 6 6225.8 6273.8
- frontoDem_diag 1 6216.9 6274.9
- townsend 1 6217.0 6275.0
- PD_diag 1 6217.3 6275.3
+ highBP 2 6211.6 6275.6
+ house_type 5 6207.0 6277.0
- peopleInHousehold 1 6219.3 6277.3
- log_edu 1 6219.3 6277.3
- vascDem_diag 1 6221.7 6279.7
- ethnicity_sim 1 6222.2 6280.2
+ blood_group 5 6210.6 6280.6
- AD_diag 1 6229.5 6287.5
- age 1 6263.3 6321.3
Step: AIC=6268.47
result ~ Population + n_cancers + townsend + smoking + whistling +
diabetes + whr + sex + age + peopleInHousehold + ethnicity_sim +
PD_diag + AD_diag + vascDem_diag + frontoDem_diag + log_edu +
log_crp + edu_level
Df Deviance AIC
- log_crp 1 6218.5 6266.5
- whistling 1 6218.5 6266.5
- diabetes 1 6218.5 6266.5
- smoking 2 6221.2 6267.2
- whr 1 6219.9 6267.9
<none> 6218.5 6268.5
- sex 1 6220.5 6268.5
- Population 1 6221.9 6269.9
- n_cancers 1 6222.2 6270.2
- edu_level 6 6233.3 6271.3
- frontoDem_diag 1 6223.8 6271.8
+ AverageHouseholdIncome 5 6211.8 6271.8
- PD_diag 1 6224.0 6272.0
- townsend 1 6224.1 6272.1
+ highBP 2 6218.3 6272.3
- peopleInHousehold 1 6224.6 6272.6
+ house_type 5 6213.7 6273.7
- vascDem_diag 1 6228.3 6276.3
+ blood_group 5 6217.2 6277.2
- ethnicity_sim 1 6229.4 6277.4
- log_edu 1 6230.1 6278.1
- AD_diag 1 6236.5 6284.5
- age 1 6268.0 6316.0
Step: AIC=6266.49
result ~ Population + n_cancers + townsend + smoking + whistling +
diabetes + whr + sex + age + peopleInHousehold + ethnicity_sim +
PD_diag + AD_diag + vascDem_diag + frontoDem_diag + log_edu +
edu_level
Df Deviance AIC
- whistling 1 6218.5 6264.5
- diabetes 1 6218.5 6264.5
- smoking 2 6221.2 6265.2
- whr 1 6220.1 6266.1
<none> 6218.5 6266.5
- sex 1 6220.5 6266.5
- Population 1 6221.9 6267.9
- n_cancers 1 6222.2 6268.2
+ log_crp 1 6218.5 6268.5
- edu_level 6 6233.4 6269.4
- frontoDem_diag 1 6223.8 6269.8
+ AverageHouseholdIncome 5 6211.8 6269.8
- PD_diag 1 6224.0 6270.0
- townsend 1 6224.1 6270.1
+ highBP 2 6218.4 6270.4
- peopleInHousehold 1 6224.6 6270.6
+ house_type 5 6213.7 6271.7
- vascDem_diag 1 6228.3 6274.3
+ blood_group 5 6217.2 6275.2
- ethnicity_sim 1 6229.4 6275.4
- log_edu 1 6230.2 6276.2
- AD_diag 1 6236.5 6282.5
- age 1 6268.1 6314.1
Step: AIC=6264.52
result ~ Population + n_cancers + townsend + smoking + diabetes +
whr + sex + age + peopleInHousehold + ethnicity_sim + PD_diag +
AD_diag + vascDem_diag + frontoDem_diag + log_edu + edu_level
Df Deviance AIC
- diabetes 1 6218.6 6262.6
- smoking 2 6221.2 6263.2
- whr 1 6220.2 6264.2
<none> 6218.5 6264.5
- sex 1 6220.5 6264.5
- Population 1 6222.0 6266.0
- n_cancers 1 6222.2 6266.2
+ whistling 1 6218.5 6266.5
+ log_crp 1 6218.5 6266.5
- edu_level 6 6233.5 6267.5
- frontoDem_diag 1 6223.8 6267.8
+ AverageHouseholdIncome 5 6211.8 6267.8
- PD_diag 1 6224.0 6268.0
- townsend 1 6224.2 6268.2
+ highBP 2 6218.4 6268.4
- peopleInHousehold 1 6224.6 6268.6
+ house_type 5 6213.8 6269.8
- vascDem_diag 1 6228.3 6272.3
+ blood_group 5 6217.3 6273.3
- ethnicity_sim 1 6229.4 6273.4
- log_edu 1 6230.3 6274.3
- AD_diag 1 6236.6 6280.6
- age 1 6268.1 6312.1
Step: AIC=6262.57
result ~ Population + n_cancers + townsend + smoking + whr +
sex + age + peopleInHousehold + ethnicity_sim + PD_diag +
AD_diag + vascDem_diag + frontoDem_diag + log_edu + edu_level
Df Deviance AIC
- smoking 2 6221.3 6261.3
- whr 1 6220.5 6262.5
- sex 1 6220.5 6262.5
<none> 6218.6 6262.6
- Population 1 6222.0 6264.0
- n_cancers 1 6222.3 6264.3
+ diabetes 1 6218.5 6264.5
+ whistling 1 6218.5 6264.5
+ log_crp 1 6218.5 6264.5
- edu_level 6 6233.6 6265.6
+ AverageHouseholdIncome 5 6211.8 6265.8
- frontoDem_diag 1 6223.9 6265.9
- PD_diag 1 6224.0 6266.0
- townsend 1 6224.3 6266.3
+ highBP 2 6218.4 6266.4
- peopleInHousehold 1 6224.6 6266.6
+ house_type 5 6213.8 6267.8
- vascDem_diag 1 6228.5 6270.5
+ blood_group 5 6217.3 6271.3
- ethnicity_sim 1 6229.5 6271.5
- log_edu 1 6230.4 6272.4
- AD_diag 1 6236.7 6278.7
- age 1 6268.2 6310.2
Step: AIC=6261.32
result ~ Population + n_cancers + townsend + whr + sex + age +
peopleInHousehold + ethnicity_sim + PD_diag + AD_diag + vascDem_diag +
frontoDem_diag + log_edu + edu_level
Df Deviance AIC
- sex 1 6223.3 6261.3
<none> 6221.3 6261.3
- whr 1 6223.3 6261.3
+ smoking 2 6218.6 6262.6
- Population 1 6224.7 6262.7
- n_cancers 1 6224.9 6262.9
+ diabetes 1 6221.2 6263.2
+ log_crp 1 6221.3 6263.3
+ whistling 1 6221.3 6263.3
- edu_level 6 6236.3 6264.3
- townsend 1 6226.6 6264.6
- frontoDem_diag 1 6226.7 6264.7
- PD_diag 1 6226.7 6264.7
+ AverageHouseholdIncome 5 6214.8 6264.8
+ highBP 2 6221.2 6265.2
- peopleInHousehold 1 6227.7 6265.7
+ house_type 5 6216.5 6266.5
- vascDem_diag 1 6231.2 6269.2
+ blood_group 5 6220.1 6270.1
- ethnicity_sim 1 6232.3 6270.3
- log_edu 1 6232.8 6270.8
- AD_diag 1 6239.3 6277.3
- age 1 6269.0 6307.0
Step: AIC=6261.27
result ~ Population + n_cancers + townsend + whr + age + peopleInHousehold +
ethnicity_sim + PD_diag + AD_diag + vascDem_diag + frontoDem_diag +
log_edu + edu_level
Df Deviance AIC
<none> 6223.3 6261.3
+ sex 1 6221.3 6261.3
+ smoking 2 6220.5 6262.5
- Population 1 6226.6 6262.6
- n_cancers 1 6227.0 6263.0
+ log_crp 1 6223.2 6263.2
+ diabetes 1 6223.2 6263.2
+ whistling 1 6223.3 6263.3
- edu_level 6 6238.0 6264.0
- townsend 1 6228.3 6264.3
- frontoDem_diag 1 6228.7 6264.7
- PD_diag 1 6228.9 6264.9
+ highBP 2 6223.2 6265.2
+ AverageHouseholdIncome 5 6217.4 6265.4
- peopleInHousehold 1 6229.9 6265.9
+ house_type 5 6218.5 6266.5
- whr 1 6232.2 6268.2
- vascDem_diag 1 6233.1 6269.1
- ethnicity_sim 1 6233.8 6269.8
+ blood_group 5 6222.0 6270.0
- log_edu 1 6234.3 6270.3
- AD_diag 1 6241.2 6277.2
- age 1 6270.9 6306.9
Call:
glm(formula = result ~ Population + n_cancers + townsend + whr +
age + peopleInHousehold + ethnicity_sim + PD_diag + AD_diag +
vascDem_diag + frontoDem_diag + log_edu + edu_level, family = "binomial",
data = cur_dt_na)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.2324 -0.5385 -0.4593 -0.3854 2.5955
Coefficients:
Estimate
(Intercept) -1.241031
Population -0.003740
n_cancers -0.204119
townsend 0.026925
whr 1.118596
age -0.031004
peopleInHousehold 0.057422
ethnicity_simwhite -0.317547
PD_diagTrue 0.681123
AD_diagTrue 1.530243
vascDem_diagTRUE 1.473491
frontoDem_diagTRUE 14.237561
log_edu 0.109024
edu_levelCollege or University degree 0.225516
edu_levelCSEs or equivalent 0.257416
edu_levelNone of the above 0.487372
edu_levelNVQ or HND or HNC or equivalent 0.320160
edu_levelO levels/GCSEs or equivalent 0.069233
edu_levelOther professional qualifications eg: nursing, teaching 0.304212
Std. Error
(Intercept) 0.477290
Population 0.002323
n_cancers 0.108362
townsend 0.011949
whr 0.374575
age 0.004491
peopleInHousehold 0.021130
ethnicity_simwhite 0.096044
PD_diagTrue 0.269720
AD_diagTrue 0.325651
vascDem_diagTRUE 0.430448
frontoDem_diagTRUE 196.967726
log_edu 0.033280
edu_levelCollege or University degree 0.242680
edu_levelCSEs or equivalent 0.257745
edu_levelNone of the above 0.234589
edu_levelNVQ or HND or HNC or equivalent 0.236533
edu_levelO levels/GCSEs or equivalent 0.238376
edu_levelOther professional qualifications eg: nursing, teaching 0.228949
z value
(Intercept) -2.600
Population -1.610
n_cancers -1.884
townsend 2.253
whr 2.986
age -6.904
peopleInHousehold 2.718
ethnicity_simwhite -3.306
PD_diagTrue 2.525
AD_diagTrue 4.699
vascDem_diagTRUE 3.423
frontoDem_diagTRUE 0.072
log_edu 3.276
edu_levelCollege or University degree 0.929
edu_levelCSEs or equivalent 0.999
edu_levelNone of the above 2.078
edu_levelNVQ or HND or HNC or equivalent 1.354
edu_levelO levels/GCSEs or equivalent 0.290
edu_levelOther professional qualifications eg: nursing, teaching 1.329
Pr(>|z|)
(Intercept) 0.009318 **
Population 0.107372
n_cancers 0.059610 .
townsend 0.024238 *
whr 0.002824 **
age 5.07e-12 ***
peopleInHousehold 0.006577 **
ethnicity_simwhite 0.000945 ***
PD_diagTrue 0.011560 *
AD_diagTrue 2.61e-06 ***
vascDem_diagTRUE 0.000619 ***
frontoDem_diagTRUE 0.942376
log_edu 0.001053 **
edu_levelCollege or University degree 0.352748
edu_levelCSEs or equivalent 0.317930
edu_levelNone of the above 0.037750 *
edu_levelNVQ or HND or HNC or equivalent 0.175879
edu_levelO levels/GCSEs or equivalent 0.771481
edu_levelOther professional qualifications eg: nursing, teaching 0.183936
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 6421.9 on 8799 degrees of freedom
Residual deviance: 6223.3 on 8781 degrees of freedom
AIC: 6261.3
Number of Fisher Scoring iterations: 10
The reduced model is:
glm(formula = result ~ Population + n_cancers + townsend + whr +
age + peopleInHousehold + ethnicity_sim + PD_diag + AD_diag +
vascDem_diag + frontoDem_diag + log_edu + edu_level, family = “binomial”,
data = cur_dt_na)
%%R
ukb_covid.b_red <-glm(formula = result ~ Population + n_cancers + townsend + whr +
age + peopleInHousehold + ethnicity_sim + PD_diag + AD_diag +
vascDem_diag + frontoDem_diag + log_edu + edu_level, family = "binomial",
data = cur_dt_na)
#### Comparison of the simplified model versus the full model
%%R
anova(ukb_covid.b, ukb_covid.b_red, test="Chisq")
Analysis of Deviance Table
Model 1: result ~ Population + n_cancers + townsend + smoking + whistling +
diabetes + whr + sex + age + peopleInHousehold + AverageHouseholdIncome +
ethnicity_sim + highBP + PD_diag + AD_diag + vascDem_diag +
frontoDem_diag + log_edu + log_crp + blood_group + edu_level +
house_type
Model 2: result ~ Population + n_cancers + townsend + whr + age + peopleInHousehold +
ethnicity_sim + PD_diag + AD_diag + vascDem_diag + frontoDem_diag +
log_edu + edu_level
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 8758 6205.7
2 8781 6223.3 -23 -17.6 0.7788
This means that removing those variables does not significantly change the model.
%%R
ukb_covid.b_red <-glm(formula = result ~ Population + n_cancers + townsend + whr +
age + peopleInHousehold + ethnicity_sim + PD_diag + AD_diag +
vascDem_diag + frontoDem_diag + edu_level, family = "binomial",
data = cur_dt)
ukb_covid.b_red_odds =
data.frame(cbind(exp(cbind(OR = coef(ukb_covid.b_red),
conf_int_low = summary(ukb_covid.b_red)$coefficients[,1] -
summary(ukb_covid.b_red)$coefficients[,2],
conf_int_high = summary(ukb_covid.b_red)$coefficients[,1] +
summary(ukb_covid.b_red)$coefficients[,2])),
p_value = summary(ukb_covid.b_red)$coefficients[,4]))
stargazer::stargazer(ukb_covid.b_red_odds, type ="html",
single.row=TRUE, summary = FALSE,
out = "fig_out/ukb_covid_b_red_odds.html")
write.csv(ukb_covid.b_red_odds,"data_output/ukb_covid_b_red_odds.csv", row.names=TRUE)
ukb_covid.b_red_odds$significance = "p-value > 0.05"
ukb_covid.b_red_odds$significance[ukb_covid.b_red_odds$p_value < 0.05] <- "p-value < 0.05"
ukb_covid.b_red_odds$variables = row.names(ukb_covid.b_red_odds)
library(ggplot2)
ggplot(ukb_covid.b_red_odds, aes(x=reorder(variables, OR), y=OR, color=significance)) +
geom_point(fill="white", shape=21, size = 2) +
geom_errorbar(aes(ymin=conf_int_low, ymax=conf_int_high),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
theme_classic() +
geom_hline(yintercept = 1, linetype="dotted") +
coord_flip()+ylab("Infectivity odds ratios") +
xlab("Variables")
ggsave('fig_out/UKB_infectivityOR_plot_optimised.pdf', width = 8, height = 8)
ggsave('fig_out/UKB_infectivityOR_plot_optimised.png', width = 8, height = 8)
<table style="text-align:center"><tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td>OR</td><td>conf_int_low</td><td>conf_int_high</td><td>p_value</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">(Intercept)</td><td>0.269</td><td>0.182</td><td>0.397</td><td>0.001</td></tr>
<tr><td style="text-align:left">Population</td><td>0.998</td><td>0.996</td><td>0.999</td><td>0.150</td></tr>
<tr><td style="text-align:left">n_cancers</td><td>0.810</td><td>0.742</td><td>0.885</td><td>0.017</td></tr>
<tr><td style="text-align:left">townsend</td><td>1.038</td><td>1.029</td><td>1.047</td><td>0.00002</td></tr>
<tr><td style="text-align:left">whr</td><td>3.088</td><td>2.273</td><td>4.195</td><td>0.0002</td></tr>
<tr><td style="text-align:left">age</td><td>0.974</td><td>0.971</td><td>0.978</td><td>0</td></tr>
<tr><td style="text-align:left">peopleInHousehold</td><td>1.054</td><td>1.037</td><td>1.072</td><td>0.002</td></tr>
<tr><td style="text-align:left">ethnicity_simwhite</td><td>0.721</td><td>0.668</td><td>0.778</td><td>0.00002</td></tr>
<tr><td style="text-align:left">PD_diagTrue</td><td>1.712</td><td>1.371</td><td>2.138</td><td>0.016</td></tr>
<tr><td style="text-align:left">AD_diagTrue</td><td>4.153</td><td>3.225</td><td>5.349</td><td>0.00000</td></tr>
<tr><td style="text-align:left">vascDem_diagTRUE</td><td>2.511</td><td>1.697</td><td>3.717</td><td>0.019</td></tr>
<tr><td style="text-align:left">frontoDem_diagTRUE</td><td>3.090</td><td>1.161</td><td>8.221</td><td>0.249</td></tr>
<tr><td style="text-align:left">edu_levelCollege or University degree</td><td>1.155</td><td>0.944</td><td>1.413</td><td>0.474</td></tr>
<tr><td style="text-align:left">edu_levelCSEs or equivalent</td><td>1.413</td><td>1.143</td><td>1.747</td><td>0.103</td></tr>
<tr><td style="text-align:left">edu_levelNone of the above</td><td>1.654</td><td>1.366</td><td>2.003</td><td>0.008</td></tr>
<tr><td style="text-align:left">edu_levelNVQ or HND or HNC or equivalent</td><td>1.418</td><td>1.167</td><td>1.724</td><td>0.073</td></tr>
<tr><td style="text-align:left">edu_levelO levels/GCSEs or equivalent</td><td>1.123</td><td>0.923</td><td>1.367</td><td>0.555</td></tr>
<tr><td style="text-align:left">edu_levelOther professional qualifications eg: nursing, teaching</td><td>1.419</td><td>1.174</td><td>1.715</td><td>0.064</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr></table>
#### Create a model with only dementia - all diagnosis
%%R
cur_dt_dem = subset(gen_dt, select = c(result, Population, n_cancers,townsend, smoking, whistling, diabetes,
whr , sex , age ,peopleInHousehold,AverageHouseholdIncome,ethnicity_sim,highBP,
PD_diag, AD_diag , vascDem_diag,frontoDem_diag,allDem_diag,
# greyVol , whiteVol , wmh_norm,
edu_score, log_crp,blood_group,log_edu,
edu_level , house_type))
ukb_covid.b_red_dem <-glm(formula = result ~ Population + n_cancers + townsend + whr +
age + peopleInHousehold + ethnicity_sim + allDem_diag + edu_level+allDem_diag,
family = "binomial",
data = cur_dt_dem)
ukb_covid.b_red_odds =
data.frame(cbind(exp(cbind(OR = coef(ukb_covid.b_red_dem),
conf_int_low = summary(ukb_covid.b_red_dem)$coefficients[,1] -
summary(ukb_covid.b_red_dem)$coefficients[,2],
conf_int_high = summary(ukb_covid.b_red_dem)$coefficients[,1] +
summary(ukb_covid.b_red_dem)$coefficients[,2])),
p_value = summary(ukb_covid.b_red_dem)$coefficients[,4]))
stargazer::stargazer(ukb_covid.b_red_dem, type ="html",
single.row=TRUE, summary = FALSE,
out = "fig_out/ukb_covid_b_dementia_model.html")
write.csv(ukb_covid.b_red_odds,"data_output/ukb_covid_b_red_odds_dementia.csv", row.names=TRUE)
ukb_covid.b_red_odds$significance = "p-value > 0.05"
ukb_covid.b_red_odds$significance[ukb_covid.b_red_odds$p_value < 0.05] <- "p-value < 0.05"
ukb_covid.b_red_odds$variables = row.names(ukb_covid.b_red_odds)
library(ggplot2)
ggplot(ukb_covid.b_red_odds, aes(x=reorder(variables, OR), y=OR, color=significance)) +
geom_point(fill="white", shape=21, size = 2) +
geom_errorbar(aes(ymin=conf_int_low, ymax=conf_int_high),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
theme_classic() +
geom_hline(yintercept = 1, linetype="dotted") +
coord_flip()+ylab("Infectivity odds ratios") +
xlab("Variables")
# ggsave('fig_out/UKB_infectivityOR_plot_optimised_dementia.pdf', width = 8, height = 8)
ggsave('fig_out/UKB_infectivityOR_plot_optimised_dementia.png', width = 8, height = 8)
<table style="text-align:center"><tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td>result</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Population</td><td>-0.002 (0.002)</td></tr>
<tr><td style="text-align:left">n_cancers</td><td>-0.204<sup>**</sup> (0.088)</td></tr>
<tr><td style="text-align:left">townsend</td><td>0.036<sup>***</sup> (0.009)</td></tr>
<tr><td style="text-align:left">whr</td><td>1.124<sup>***</sup> (0.306)</td></tr>
<tr><td style="text-align:left">age</td><td>-0.026<sup>***</sup> (0.004)</td></tr>
<tr><td style="text-align:left">peopleInHousehold</td><td>0.053<sup>***</sup> (0.017)</td></tr>
<tr><td style="text-align:left">ethnicity_simwhite</td><td>-0.327<sup>***</sup> (0.077)</td></tr>
<tr><td style="text-align:left">allDem_diag</td><td>1.181<sup>***</sup> (0.174)</td></tr>
<tr><td style="text-align:left">edu_levelCollege or University degree</td><td>0.150 (0.201)</td></tr>
<tr><td style="text-align:left">edu_levelCSEs or equivalent</td><td>0.358<sup>*</sup> (0.212)</td></tr>
<tr><td style="text-align:left">edu_levelNone of the above</td><td>0.515<sup>***</sup> (0.191)</td></tr>
<tr><td style="text-align:left">edu_levelNVQ or HND or HNC or equivalent</td><td>0.357<sup>*</sup> (0.195)</td></tr>
<tr><td style="text-align:left">edu_levelO levels/GCSEs or equivalent</td><td>0.121 (0.196)</td></tr>
<tr><td style="text-align:left">edu_levelOther professional qualifications eg: nursing, teaching</td><td>0.356<sup>*</sup> (0.189)</td></tr>
<tr><td style="text-align:left">Constant</td><td>-1.329<sup>***</sup> (0.388)</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>12,788</td></tr>
<tr><td style="text-align:left">Log Likelihood</td><td>-4,618.582</td></tr>
<tr><td style="text-align:left">Akaike Inf. Crit.</td><td>9,267.165</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>
%%R
ukb_covid.b_red_noDem <-glm(formula = result ~ Population + n_cancers + townsend + whr +
age + peopleInHousehold + ethnicity_sim + edu_level, family = "binomial", data = cur_dt_na)
anova(ukb_covid.b, ukb_covid.b_red_noDem, test="Chisq")
Analysis of Deviance Table
Model 1: result ~ Population + n_cancers + townsend + smoking + whistling +
diabetes + whr + sex + age + peopleInHousehold + AverageHouseholdIncome +
ethnicity_sim + highBP + PD_diag + AD_diag + vascDem_diag +
frontoDem_diag + log_edu + log_crp + blood_group + edu_level +
house_type
Model 2: result ~ Population + n_cancers + townsend + whr + age + peopleInHousehold +
ethnicity_sim + edu_level
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 8758 6205.7
2 8786 6274.2 -28 -68.535 2.974e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Taking out AD and PD significantly change the model. Dementia-related variables are therefore important for COVID-19-related cases.
%%R
ukb_death = read.csv("/Users/yizhouyu/camDrive/ARUK_project/additional_datasets/covid_refresh_16_8_2020/death_cause_17_8_2020.txt", sep = "\t")[c('eid', 'level', 'cause_icd10')]
ukb_dt = read.csv('data_output/ukb_covid_dt_out.csv', sep = ",")
ukb_dt_death = merge(ukb_dt, ukb_death, by = "eid", all.x = TRUE)
print(nrow((ukb_dt_death)))
[1] 14769
Here, I look at the different causes of death. I realise that there are multiple causes and it seems unclear whether COVID-19 contributed to those deaths.
%%R
ukb_death_code = read.csv("data/coding19.tsv", sep = "\t")[c('coding', 'meaning')]
# Here I create a for loop to loop through all ICD codings... because for some reason the built in function do not work
# subset the death code present in my dataset
ukb_death_code_cleaned <- ukb_death_code[ukb_death_code$coding %in% na.omit(unique(ukb_dt_death$cause_icd10)), ]
ukb_dt_death$cause_icd10 <- ordered(ukb_dt_death$cause_icd10, levels = ukb_death_code_cleaned$coding,
labels = ukb_death_code_cleaned$meaning)
unique(na.omit(ukb_dt_death$cause_icd10))
[1] I50.0 Congestive heart failure
[2] D46.9 Myelodysplastic syndrome, unspecified
[3] C19 Malignant neoplasm of rectosigmoid junction
[4] F01.9 Vascular dementia, unspecified
[5] U07.1 COVID-19 virus identified
[6] J12.9 Viral pneumonia, unspecified
[7] J44.9 Chronic obstructive pulmonary disease, unspecified
[8] J18.0 Bronchopneumonia, unspecified
[9] C34.9 Bronchus or lung, unspecified
[10] J18.9 Pneumonia, unspecified
[11] F03 Unspecified dementia
[12] N18.9 Chronic renal failure, unspecified
[13] R68.8 Other specified general symptoms and signs
[14] E11.9 Without complications
[15] E23.7 Disorder of pituitary gland, unspecified
[16] Y83.9 Surgical procedure, unspecified
[17] R26.3 Immobility
[18] T81.0 Haemorrhage and haematoma complicating a procedure, not elsewhere classified
[19] W19 Unspecified fall
[20] F10.1 Harmful use
[21] S06.9 Intracranial injury, unspecified
[22] F41.9 Anxiety disorder, unspecified
[23] F32.9 Depressive episode, unspecified
[24] R54 Senility
[25] K72.9 Hepatic failure, unspecified
[26] K74.3 Primary biliary cirrhosis
[27] K65.0 Acute peritonitis
[28] I10 Essential (primary) hypertension
[29] I26.9 Pulmonary embolism without mention of acute cor pulmonale
[30] J18.1 Lobar pneumonia, unspecified
[31] I21.9 Acute myocardial infarction, unspecified
[32] J45.9 Asthma, unspecified
[33] I35.9 Aortic valve disorder, unspecified
[34] E87.1 Hypo-osmolality and hyponatraemia
[35] E10.9 Without complications
[36] I20.9 Angina pectoris, unspecified
[37] J96.9 Respiratory failure, unspecified
[38] I50.9 Heart failure, unspecified
[39] I70.9 Generalised and unspecified atherosclerosis
[40] L03.9 Cellulitis, unspecified
[41] A41.9 Septicaemia, unspecified
[42] E66.9 Obesity, unspecified
[43] I89.0 Lymphoedema, not elsewhere classified
[44] D61.9 Aplastic anaemia, unspecified
[45] C83.1 Small cleaved cell (diffuse)
[46] Y43.3 Other antineoplastic drugs
[47] D64.9 Anaemia, unspecified
[48] E78.0 Pure hypercholesterolaemia
[49] U07.2 COVID-19 virus not identified
[50] C92.0 Acute myeloid leukaemia
[51] I25.9 Chronic ischaemic heart disease, unspecified
[52] I24.8 Other forms of acute ischaemic heart disease
[53] J69.0 Pneumonitis due to food and vomit
[54] G30.9 Alzheimer's disease, unspecified
[55] C50.9 Breast, unspecified
[56] J81 Pulmonary oedema
[57] K21.9 Gastro-oesophageal reflux disease without oesophagitis
[58] G93.1 Anoxic brain damage, not elsewhere classified
[59] J44.1 Chronic obstructive pulmonary disease with acute exacerbation, unspecified
[60] I46.9 Cardiac arrest, unspecified
[61] I49.0 Ventricular fibrillation and flutter
[62] C91.1 Chronic lymphocytic leukaemia
[63] C85.9 Non-Hodgkin's lymphoma, unspecified type
[64] C80.9 Malignant neoplasm, unspecified
[65] N39.0 Urinary tract infection, site not specified
[66] C16.9 Stomach, unspecified
[67] I80.2 Phlebitis and thrombophlebitis of other deep vessels of lower extremities
[68] I62.9 Intracranial haemorrhage (nontraumatic), unspecified
[69] C79.3 Secondary malignant neoplasm of brain and cerebral meninges
[70] C43.9 Malignant melanoma of skin, unspecified
[71] C79.9 Secondary malignant neoplasm, unspecified site
[72] T14.9 Injury, unspecified
[73] I63.9 Cerebral infarction, unspecified
[74] C25.9 Pancreas, unspecified
[75] C92.1 Chronic myeloid leukaemia
[76] J10.1 Influenza with other respiratory manifestations, influenza virus identified
[77] J10.0 Influenza with pneumonia, influenza virus identified
[78] C90.0 Multiple myeloma
[79] C17.0 Duodenum
[80] C15.9 Oesophagus, unspecified
[81] K55.0 Acute vascular disorders of intestine
[82] E14.9 Without complications
[83] J80 Adult respiratory distress syndrome
[84] C18.9 Colon, unspecified
[85] G30.1 Alzheimer's disease with late onset
[86] N32.1 Vesicointestinal fistula
[87] K57.9 Diverticular disease of intestine, part unspecified, without perforation or abscess
[88] M34.9 Systemic sclerosis, unspecified
[89] I48.9 Atrial fibrillation and atrial flutter, unspecified
[90] K51.9 Ulcerative colitis, unspecified
[91] J84.1 Other interstitial pulmonary diseases with fibrosis
[92] G47.3 Sleep apnoea
[93] I51.6 Cardiovascular disease, unspecified
[94] Z94.0 Kidney transplant status
[95] K86.8 Other specified diseases of pancreas
[96] N18.5 Chronic kidney disease, stage 5
[97] Y43.4 Immunosuppressive agents
[98] B24 Unspecified human immunodeficiency virus [HIV] disease
[99] C67.9 Bladder, unspecified
[100] M35.3 Polymyalgia rheumatica
[101] J43.9 Emphysema, unspecified
[102] I48.0 Paroxysmal atrial fibrillation
[103] J15.9 Bacterial pneumonia, unspecified
[104] I74.9 Embolism and thrombosis of unspecified artery
[105] M47.9 Spondylosis, unspecified
[106] K57.8 Diverticular disease of intestine, part unspecified, with perforation and abscess
[107] K74.6 Other and unspecified cirrhosis of liver
[108] G20 Parkinson's disease
[109] S72.0 Fracture of neck of femur
[110] Z51.9 Medical care, unspecified
[111] E78.5 Hyperlipidaemia, unspecified
[112] C79.5 Secondary malignant neoplasm of bone and bone marrow
[113] C61 Malignant neoplasm of prostate
[114] E03.9 Hypothyroidism, unspecified
[115] G31.8 Other specified degenerative diseases of nervous system
[116] K70.3 Alcoholic cirrhosis of liver
[117] B18.2 Chronic viral hepatitis C
[118] I50.1 Left ventricular failure
[119] E66.8 Other obesity
[120] K70.0 Alcoholic fatty liver
[121] I11.0 Hypertensive heart disease with (congestive) heart failure
[122] I11.9 Hypertensive heart disease without (congestive) heart failure
[123] C80.0 Malignant neoplasm, primary site unknown, so stated
[124] I64 Stroke, not specified as haemorrhage or infarction
[125] K92.2 Gastro-intestinal haemorrhage, unspecified
[126] C64 Malignant neoplasm of kidney, except renal pelvis
[127] F10.9 Unspecified mental and behavioural disorder
[128] I85.9 Oesophageal varices without bleeding
[129] K76.0 Fatty (change of) liver, not elsewhere classified
[130] C71.9 Brain, unspecified
[131] I51.8 Other ill-defined heart diseases
[132] C85.1 B-cell lymphoma, unspecified
[133] C90.1 Plasma cell leukaemia
[134] M35.0 Sicca syndrome [Sjogren]
[135] I45.9 Conduction disorder, unspecified
[136] C26.0 Intestinal tract, part unspecified
[137] I27.2 Other secondary pulmonary hypertension
[138] M34.8 Other forms of systemic sclerosis
[139] I67.9 Cerebrovascular disease, unspecified
[140] J22 Unspecified acute lower respiratory infection
[141] J44.0 Chronic obstructive pulmonary disease with acute lower respiratory infection
[142] J12.8 Other viral pneumonia
[143] D41.4 Bladder
[144] Z95.1 Presence of aortocoronary bypass graft
[145] Z95.0 Presence of cardiac pacemaker
[146] I49.9 Cardiac arrhythmia, unspecified
[147] I25.1 Atherosclerotic heart disease
[148] J15.4 Pneumonia due to other streptococci
[149] I69.4 Sequelae of stroke, not specified as haemorrhage or infarction
[150] Q85.0 Neurofibromatosis (nonmalignant)
[151] N17.9 Acute renal failure, unspecified
[152] I38 Endocarditis, valve unspecified
[153] I82.9 Embolism and thrombosis of unspecified vein
[154] M10.9 Gout, unspecified
[155] N04.9 Unspecified
[156] N28.9 Disorder of kidney and ureter, unspecified
[157] E85.4 Organ-limited amyloidosis
[158] N03.9 Unspecified
[159] D50.9 Iron deficiency anaemia, unspecified
[160] J84.9 Interstitial pulmonary disease, unspecified
[161] K56.6 Other and unspecified intestinal obstruction
[162] G04.9 Encephalitis, myelitis and encephalomyelitis, unspecified
[163] D76.1 Haemophagocytic lymphohistiocytosis
[164] K83.0 Cholangitis
[165] C22.9 Liver, unspecified
[166] S32.5 Fracture of pubis
[167] X59.0 Home
[168] C78.7 Secondary malignant neoplasm of liver
[169] C73 Malignant neoplasm of thyroid gland
[170] K28.9 Unspecified as acute or chronic, without haemorrhage or perforation
[171] K28.4 Chronic or unspecified with haemorrhage
[172] K80.5 Calculus of bile duct without cholangitis or cholecystitis
[173] K27.9 Unspecified as acute or chronic, without haemorrhage or perforation
[174] K27.4 Chronic or unspecified with haemorrhage
[175] K66.0 Peritoneal adhesions
[176] K43.3 Parastomal hernia with obstruction, without gangrene
[177] K63.9 Disease of intestine, unspecified
[178] I06.0 Rheumatic aortic stenosis
[179] I05.1 Rheumatic mitral insufficiency
[180] I21.4 Acute subendocardial myocardial infarction
[181] K43.5 Parastomal hernia without obstruction or gangrene
[182] Z90.4 Acquired absence of other parts of digestive tract
[183] F50.0 Anorexia nervosa
[184] N61 Inflammatory disorders of breast
[185] I42.9 Cardiomyopathy, unspecified
[186] K80.2 Calculus of gallbladder without cholecystitis
[187] K82.2 Perforation of gallbladder
[188] K76.7 Hepatorenal syndrome
[189] C22.1 Intrahepatic bile duct carcinoma
[190] K57.2 Diverticular disease of large intestine with perforation and abscess
[191] K65.9 Peritonitis, unspecified
[192] R57.0 Cardiogenic shock
[193] M85.8 Other specified disorders of bone density and structure
[194] I72.2 Aneurysm of renal artery
[195] I71.9 Aortic aneurysm of unspecified site, without mention of rupture
[196] I77.2 Rupture of artery
[197] K70.4 Alcoholic hepatic failure
[198] I73.9 Peripheral vascular disease, unspecified
[199] D69.0 Allergic purpura
[200] A49.8 Other bacterial infections of unspecified site
[201] I25.5 Ischaemic cardiomyopathy
[202] M86.9 Osteomyelitis, unspecified
[203] K81.9 Cholecystitis, unspecified
[204] I33.0 Acute and subacute infective endocarditis
[205] N19 Unspecified renal failure
[206] C18.7 Sigmoid colon
[207] K56.7 Ileus, unspecified
[208] K56.2 Volvulus
[209] Z51.8 Other specified medical care
[210] J93.9 Pneumothorax, unspecified
[211] K55.9 Vascular disorder of intestine, unspecified
[212] C49.9 Connective and soft tissue, unspecified
[213] M13.9 Arthritis, unspecified
[214] D51.0 Vitamin B12 deficiency anaemia due to intrinsic factor deficiency
[215] J98.9 Respiratory disorder, unspecified
[216] K59.0 Constipation
[217] K70.9 Alcoholic liver disease, unspecified
[218] C20 Malignant neoplasm of rectum
[219] I00 Rheumatic fever without mention of heart involvement
[220] Z95.2 Presence of prosthetic heart valve
[221] C18.1 Appendix
[222] Q27.9 Congenital malformation of peripheral vascular system, unspecified
[223] C18.0 Caecum
[224] K80.4 Calculus of bile duct with cholecystitis
[225] S22.4 Multiple fractures of ribs
[226] N40 Hyperplasia of prostate
[227] M19.9 Arthrosis, unspecified
[228] G91.2 Normal-pressure hydrocephalus
[229] I70.1 Atherosclerosis of renal artery
[230] C78.0 Secondary malignant neoplasm of lung
[231] G45.9 Transient cerebral ischaemic attack, unspecified
[232] M81.9 Osteoporosis, unspecified
[233] M06.9 Rheumatoid arthritis, unspecified
[234] Y83.5 Amputation of limb(s)
[235] T87.4 Infection of amputation stump
[236] M60.0 Infective myositis
[237] I35.0 Aortic (valve) stenosis
[238] E27.4 Other and unspecified adrenocortical insufficiency
[239] K75.0 Abscess of liver
[240] C93.1 Chronic monocytic leukaemia
[241] I71.1 Thoracic aortic aneurysm, ruptured
[242] I31.2 Haemopericardium, not elsewhere classified
[243] M41.9 Scoliosis, unspecified
[244] Z96.6 Presence of orthopaedic joint implants
[245] A49.0 Staphylococcal infection, unspecified
[246] R99 Other ill-defined and unspecified causes of mortality
[247] R00.1 Bradycardia, unspecified
[248] M25.9 Joint disorder, unspecified
[249] C37 Malignant neoplasm of thymus
[250] J86.0 Pyothorax with fistula
[251] C34.1 Upper lobe, bronchus or lung
[252] C56 Malignant neoplasm of ovary
[253] F31.9 Bipolar affective disorder, unspecified
[254] C24.1 Ampulla of Vater
[255] E63.9 Nutritional deficiency, unspecified
[256] W80 Inhalation and ingestion of other objects causing obstruction of respiratory tract
[257] T17.9 Foreign body in respiratory tract, part unspecified
[258] C16.0 Cardia
[259] C81.9 Hodgkin's disease, unspecified
[260] D35.2 Pituitary gland
[261] G12.2 Motor neuron disease
[262] I12.0 Hypertensive renal disease with renal failure
[263] Y84.1 Kidney dialysis
[264] B37.8 Candidiasis of other sites
[265] T85.7 Infection and inflammatory reaction due to other internal prosthetic devices, implants and grafts
[266] D70 Agranulocytosis
[267] I60.8 Other subarachnoid haemorrhage
[268] I61.9 Intracerebral haemorrhage, unspecified
[269] N28.0 Ischaemia and infarction of kidney
[270] C22.0 Liver cell carcinoma
[271] D37.2 Small intestine
[272] D59.3 Haemolytic-uraemic syndrome
[273] J98.0 Diseases of bronchus, not elsewhere classified
[274] R04.0 Epistaxis
[275] C78.6 Secondary malignant neoplasm of retroperitoneum and peritoneum
[276] C82.9 Follicular non-Hodgkin's lymphoma, unspecified
[277] C02.9 Tongue, unspecified
[278] C55 Malignant neoplasm of uterus, part unspecified
[279] D80.1 Nonfamilial hypogammaglobulinaemia
[280] J47 Bronchiectasis
[281] G70.0 Myasthenia gravis
[282] Z90.8 Acquired absence of other organs
[283] G35 Multiple sclerosis
[284] B34.2 Coronavirus infection, unspecified
[285] M32.9 Systemic lupus erythematosus, unspecified
[286] I27.9 Pulmonary heart disease, unspecified
[287] C83.0 Small cell (diffuse)
[288] Y83.4 Other reconstructive surgery
[289] Y44.2 Anticoagulants
[290] K56.5 Intestinal adhesions [bands] with obstruction
[291] G61.0 Guillain-Barre syndrome
[292] I69.1 Sequelae of intracerebral haemorrhage
[293] G83.9 Paralytic syndrome, unspecified
[294] Y83.1 Surgical operation with implant of artificial internal device
[295] I05.9 Mitral valve disease, unspecified
[296] M00.9 Pyogenic arthritis, unspecified
[297] C23 Malignant neoplasm of gallbladder
[298] I42.0 Dilated cardiomyopathy
[299] J60 Coalworker's pneumoconiosis
[300] F20.9 Schizophrenia, unspecified
[301] A49.1 Streptococcal infection, unspecified
[302] I71.0 Dissection of aorta [any part]
[303] A04.7 Enterocolitis due to Clostridium difficile
[304] I24.9 Acute ischaemic heart disease, unspecified
[305] I13.2 Hypertensive heart and renal disease with both (congestive) heart failure and renal failure
[306] I13.9 Hypertensive heart and renal disease, unspecified
[307] C45.9 Mesothelioma, unspecified
[308] R09.2 Respiratory arrest
[309] M35.9 Systemic involvement of connective tissue, unspecified
[310] C76.2 Abdomen
[311] R57.2 Septic shock
[312] C84.4 Peripheral T-cell lymphoma
[313] C54.1 Endometrium
[314] I42.1 Obstructive hypertrophic cardiomyopathy
[315] D32.0 Cerebral meninges
[316] J18.2 Hypostatic pneumonia, unspecified
[317] G52.1 Disorders of glossopharyngeal nerve
[318] C44.9 Malignant neoplasm of skin, unspecified
[319] I44.2 Atrioventricular block, complete
[320] R02 Gangrene, not elsewhere classified
[321] C79.8 Secondary malignant neoplasm of other specified sites
[322] C10.9 Oropharynx, unspecified
[323] Z93.4 Other artificial openings of gastro-intestinal tract status
[324] Y83.6 Removal of other organ (partial) (total)
[325] R13 Dysphagia
[326] V89.2 Person injured in unspecified motor-vehicle accident, traffic
[327] K92.0 Haematemesis
[328] G10 Huntington's disease
[329] G61.8 Other inflammatory polyneuropathies
[330] A41.5 Septicaemia due to other Gram-negative organisms
[331] Y84.8 Other medical procedures
[332] T81.4 Infection following a procedure, not elsewhere classified
[333] X59.9 Unspecified place
[334] S14.1 Other and unspecified injuries of cervical spinal cord
[335] C95.9 Leukaemia, unspecified
[336] I60.9 Subarachnoid haemorrhage, unspecified
[337] L40.9 Psoriasis, unspecified
[338] G23.2 Striatonigral degeneration
[339] R19.8 Other specified symptoms and signs involving the digestive system and abdomen
[340] K66.9 Disorder of peritoneum, unspecified
[341] D40.0 Prostate
[342] C12 Malignant neoplasm of pyriform sinus
[343] E11.5 With peripheral circulatory complications
[344] K83.1 Obstruction of bile duct
[345] L40.5 Arthropathic psoriasis
[346] G40.9 Epilepsy, unspecified
[347] G30.0 Alzheimer's disease with early onset
[348] M45 Ankylosing spondylitis
[349] G60.0 Hereditary motor and sensory neuropathy
[350] K85.9 Acute pancreatitis, unspecified
[351] I40.9 Acute myocarditis, unspecified
[352] E11.7 With multiple complications
[353] K63.1 Perforation of intestine (nontraumatic)
[354] E14.2 Withrenal complications
[355] T86.4 Liver transplant failure and rejection
[356] Y83.0 Surgical operation with transplant of whole organ
[357] E87.2 Acidosis
[358] R41.8 Other and unspecified symptoms and signs involving cognitive functions and awareness
[359] C83.3 Large cell (diffuse)
[360] F43.9 Reaction to severe stress, unspecified
[361] C75.5 Aortic body and other paraganglia
[362] M80.9 Unspecified osteoporosis with pathological fracture
[363] K86.1 Other chronic pancreatitis
363 Levels: A04.7 Enterocolitis due to Clostridium difficile < ...
Here, I will consider that a death is related to COVID-19 if the participant tested positive for COVID-19 and died.
This approach is anostic to the reported cause of death of the participant, and considers that COVID-19 can cause deat via different means.
%%R
ukb_death = read.csv("/Users/yizhouyu/camDrive/ARUK_project/additional_datasets/covid_refresh_16_8_2020/death_cause_17_8_2020.txt", sep = "\t")[c('eid', 'level', 'cause_icd10')]
ukb_dt = read.csv('data_output/ukb_covid_dt_out.csv', sep = ",")
ukb_dt_death = merge(ukb_dt, ukb_death, by = "eid")
ukb_deaths_eids = unique(ukb_dt_death$eid)
ukb_dt$death = ukb_dt$eid %in% ukb_deaths_eids
print(sum(ukb_dt$death))
write.csv(ukb_dt,"data_output/ukb_covid_addDeath.csv", row.names=FALSE)
[1] 647
67% of deaths were associated with COVID-19
%%R
ukb_dt = read.csv("data_output/ukb_covid_addDeath.csv")
ukb_dt_died = subset(ukb_dt, death == TRUE)
print(sum(ukb_dt_died$result)/sum(ukb_dt$death))
[1] 0.4621329
### Mortality odds while acccounting for other factors including COVID-19
%%R
ukb_dt$peopleInHousehold = scale(ukb_dt$peopleInHousehold)
ukb_dt$age = scale(ukb_dt$age)
ukb_dt$whr = scale(ukb_dt$whr)
ukb_dt$pm25_val = scale(ukb_dt$pm25_val)
ukb_dt$no2_val = scale(ukb_dt$no2_val)
ukb_dt$nox_val = scale(ukb_dt$nox_val)
ukb_dt$so2_val = scale(ukb_dt$so2_val)
ukb_dt$pm10_val = scale(ukb_dt$pm10_val)
ukb_dt$o3_val = scale(ukb_dt$o3_val)
#### General model on death, all causes
%%R
ukb_dt_subset = subset(ukb_dt, select = c(result,death, Population, n_cancers,townsend, smoking, whistling, diabetes,
whr , sex , age ,peopleInHousehold,AverageHouseholdIncome,ethnicity_sim,highBP,
PD_diag, AD_diag , vascDem_diag,frontoDem_diag,
# greyVol , whiteVol , wmh_norm,
edu_score, crp_norm,log_crp,log_edu,
edu_level , house_type))
ukb_dt_subset_na = na.omit(ukb_dt_subset)
summary(ukb_covid_death.b <-glm(data = ukb_dt_subset_na, death ~ Population + n_cancers +townsend + smoking + whistling + diabetes+
whr + sex + age +peopleInHousehold+AverageHouseholdIncome+ethnicity_sim+highBP+
PD_diag+ AD_diag + vascDem_diag+frontoDem_diag+
# greyVol + whiteVol + wmh_norm+
edu_score+ log_crp + log_edu + house_type, family = 'binomial'))
stargazer::stargazer(ukb_covid_death.b, type ="html",
single.row=TRUE,
out = "fig_out/ukb_covid_death_coefficients_fullmodel.html")
<table style="text-align:center"><tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td>death</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Population</td><td>0.002 (0.002)</td></tr>
<tr><td style="text-align:left">n_cancers</td><td>-0.140 (0.147)</td></tr>
<tr><td style="text-align:left">townsend</td><td>0.005 (0.022)</td></tr>
<tr><td style="text-align:left">smokingNever</td><td>-0.518<sup>***</sup> (0.157)</td></tr>
<tr><td style="text-align:left">smokingPrevious</td><td>-0.505<sup>***</sup> (0.154)</td></tr>
<tr><td style="text-align:left">whistling</td><td>0.153 (0.116)</td></tr>
<tr><td style="text-align:left">diabetes</td><td>0.255<sup>*</sup> (0.153)</td></tr>
<tr><td style="text-align:left">whr</td><td>0.410 (0.797)</td></tr>
<tr><td style="text-align:left">sexMale</td><td>0.368<sup>***</sup> (0.142)</td></tr>
<tr><td style="text-align:left">age</td><td>0.075<sup>***</sup> (0.009)</td></tr>
<tr><td style="text-align:left">peopleInHousehold</td><td>0.036 (0.035)</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncome18,000 to 30,999</td><td>0.451 (1,674.884)</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncome31,000 to 51,999</td><td>0.534 (1,674.884)</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncome52,000 to 100,000</td><td>0.818 (1,674.884)</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncomeGreater than 100,000</td><td>-0.106 (1,674.884)</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncomeLess than 18,000</td><td>0.742 (1,674.884)</td></tr>
<tr><td style="text-align:left">ethnicity_simwhite</td><td>-0.114 (0.169)</td></tr>
<tr><td style="text-align:left">highBPFalse</td><td>0.089 (0.399)</td></tr>
<tr><td style="text-align:left">highBPTrue</td><td>0.349 (0.398)</td></tr>
<tr><td style="text-align:left">PD_diagTrue</td><td>-0.136 (0.469)</td></tr>
<tr><td style="text-align:left">AD_diagTrue</td><td>1.472<sup>***</sup> (0.366)</td></tr>
<tr><td style="text-align:left">vascDem_diag</td><td>0.476 (0.582)</td></tr>
<tr><td style="text-align:left">frontoDem_diag</td><td>18.008 (1,455.398)</td></tr>
<tr><td style="text-align:left">edu_score</td><td>0.003 (0.005)</td></tr>
<tr><td style="text-align:left">log_crp</td><td>0.155<sup>***</sup> (0.051)</td></tr>
<tr><td style="text-align:left">log_edu</td><td>-0.035 (0.075)</td></tr>
<tr><td style="text-align:left">house_typeA house or bungalow</td><td>-0.090 (0.182)</td></tr>
<tr><td style="text-align:left">house_typeCare home</td><td>-13.033 (1,455.398)</td></tr>
<tr><td style="text-align:left">house_typeMobile or temporary structure (i.e. caravan)</td><td>-12.774 (392.309)</td></tr>
<tr><td style="text-align:left">house_typeNone of the above</td><td>-12.505 (295.194)</td></tr>
<tr><td style="text-align:left">house_typeSheltered accommodation</td><td>-12.846 (828.888)</td></tr>
<tr><td style="text-align:left">Constant</td><td>-8.740 (1,674.884)</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>8,897</td></tr>
<tr><td style="text-align:left">Log Likelihood</td><td>-1,545.468</td></tr>
<tr><td style="text-align:left">Akaike Inf. Crit.</td><td>3,154.936</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>
Calculate odds ratios
%%R
ukb_covid_death.b_odds =
data.frame(cbind(exp(cbind(OR = coef(ukb_covid_death.b),
conf_int_low = summary(ukb_covid_death.b)$coefficients[,1] -
summary(ukb_covid_death.b)$coefficients[,2],
conf_int_high = summary(ukb_covid_death.b)$coefficients[,1] +
summary(ukb_covid_death.b)$coefficients[,2])),
p_value = summary(ukb_covid_death.b)$coefficients[,4]))
stargazer::stargazer(ukb_covid_death.b_odds, type ="html",
single.row=TRUE, summary = FALSE,
out = "fig_out/ukb_covid_death_odds_fullmodel.html")
write.csv(ukb_covid_death.b_odds,"data_output/ukb_covid_allDeath_odds.csv", row.names=TRUE)
ukb_covid_death.b_odds$significance = "p-value > 0.05"
ukb_covid_death.b_odds$significance[ukb_covid_death.b_odds$p_value < 0.05] <- "p-value < 0.05"
ukb_covid_death.b_odds$variables = row.names(ukb_covid_death.b_odds)
library(ggplot2)
ggplot(ukb_covid_death.b_odds[which(ukb_covid_death.b_odds$significance == "p-value < 0.05"),], aes(x=reorder(variables, OR), y=OR)) +
geom_point(shape=21, size = 2) +
geom_errorbar(aes(ymin=conf_int_low, ymax=conf_int_high),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
theme_classic() +
geom_hline(yintercept = 1, linetype="dotted") +
coord_flip()+ylab("Mortality odds ratios") +
xlab("Variables")
ggsave('fig_out/UKB_deathOR_plot_sig_only.pdf', width = 6, height = 4)
ggsave('fig_out/UKB_deathOR_plot_sig_only.png', width = 6, height = 4)
<table style="text-align:center"><tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td>OR</td><td>conf_int_low</td><td>conf_int_high</td><td>p_value</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">(Intercept)</td><td>0.0002</td><td>0</td><td>Inf</td><td>0.996</td></tr>
<tr><td style="text-align:left">Population</td><td>1.002</td><td>1.000</td><td>1.004</td><td>0.305</td></tr>
<tr><td style="text-align:left">n_cancers</td><td>0.870</td><td>0.751</td><td>1.007</td><td>0.342</td></tr>
<tr><td style="text-align:left">townsend</td><td>1.005</td><td>0.983</td><td>1.027</td><td>0.837</td></tr>
<tr><td style="text-align:left">smokingNever</td><td>0.596</td><td>0.509</td><td>0.697</td><td>0.001</td></tr>
<tr><td style="text-align:left">smokingPrevious</td><td>0.604</td><td>0.518</td><td>0.704</td><td>0.001</td></tr>
<tr><td style="text-align:left">whistlingTRUE</td><td>1.165</td><td>1.038</td><td>1.308</td><td>0.186</td></tr>
<tr><td style="text-align:left">diabetesTRUE</td><td>1.291</td><td>1.108</td><td>1.504</td><td>0.095</td></tr>
<tr><td style="text-align:left">whr</td><td>1.507</td><td>0.679</td><td>3.344</td><td>0.607</td></tr>
<tr><td style="text-align:left">sexMale</td><td>1.445</td><td>1.254</td><td>1.666</td><td>0.010</td></tr>
<tr><td style="text-align:left">age</td><td>1.078</td><td>1.068</td><td>1.087</td><td>0</td></tr>
<tr><td style="text-align:left">peopleInHousehold</td><td>1.036</td><td>1.001</td><td>1.073</td><td>0.303</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncome18,000 to 30,999</td><td>1.570</td><td>0</td><td>Inf</td><td>1.000</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncome31,000 to 51,999</td><td>1.705</td><td>0</td><td>Inf</td><td>1.000</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncome52,000 to 100,000</td><td>2.266</td><td>0</td><td>Inf</td><td>1.000</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncomeGreater than 100,000</td><td>0.900</td><td>0</td><td>Inf</td><td>1.000</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncomeLess than 18,000</td><td>2.101</td><td>0</td><td>Inf</td><td>1.000</td></tr>
<tr><td style="text-align:left">ethnicity_simwhite</td><td>0.892</td><td>0.753</td><td>1.057</td><td>0.500</td></tr>
<tr><td style="text-align:left">highBPFalse</td><td>1.094</td><td>0.734</td><td>1.630</td><td>0.823</td></tr>
<tr><td style="text-align:left">highBPTrue</td><td>1.417</td><td>0.953</td><td>2.109</td><td>0.380</td></tr>
<tr><td style="text-align:left">PD_diagTrue</td><td>0.873</td><td>0.546</td><td>1.395</td><td>0.771</td></tr>
<tr><td style="text-align:left">AD_diagTrue</td><td>4.360</td><td>3.022</td><td>6.289</td><td>0.0001</td></tr>
<tr><td style="text-align:left">vascDem_diagTRUE</td><td>1.610</td><td>0.900</td><td>2.882</td><td>0.413</td></tr>
<tr><td style="text-align:left">frontoDem_diagTRUE</td><td>66,155,689.000</td><td>0</td><td>Inf</td><td>0.990</td></tr>
<tr><td style="text-align:left">edu_score</td><td>1.003</td><td>0.997</td><td>1.009</td><td>0.588</td></tr>
<tr><td style="text-align:left">log_crp</td><td>1.167</td><td>1.109</td><td>1.228</td><td>0.002</td></tr>
<tr><td style="text-align:left">log_edu</td><td>0.965</td><td>0.895</td><td>1.041</td><td>0.641</td></tr>
<tr><td style="text-align:left">house_typeA house or bungalow</td><td>0.914</td><td>0.763</td><td>1.096</td><td>0.622</td></tr>
<tr><td style="text-align:left">house_typeCare home</td><td>0.00000</td><td>0</td><td>Inf</td><td>0.993</td></tr>
<tr><td style="text-align:left">house_typeMobile or temporary structure (i.e. caravan)</td><td>0.00000</td><td>0</td><td>6</td><td>0.974</td></tr>
<tr><td style="text-align:left">house_typeNone of the above</td><td>0.00000</td><td>0</td><td>5</td><td>0.966</td></tr>
<tr><td style="text-align:left">house_typeSheltered accommodation</td><td>0.00000</td><td>0</td><td>Inf</td><td>0.988</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr></table>
This shows that COVID-19 is a significant predictor of the risk of dying.
%%R
ukb_dt = read.csv("data_output/ukb_covid_addDeath.csv")
ukb_dt$covid_death = ifelse(ukb_dt$result == 1 & ukb_dt$death == TRUE,
1, 0)
print(sum(ukb_dt$covid_death))
write.csv(ukb_dt,"data_output/ukb_covid_addDeath.csv",row.names=FALSE)
[1] 299
There are here 299 participants that died and tested positive for COVID-19
#### Dementia-only model - COVID death
%%R
ukb_dt = read.csv("data_output/ukb_covid_addDeath.csv")
ukb_dt_dementia = subset(ukb_dt, select = c(result,covid_death,death, Population, n_cancers,townsend, smoking, whistling, diabetes,
whr , sex , age ,peopleInHousehold,AverageHouseholdIncome,ethnicity_sim,highBP,
PD_diag, AD_diag , vascDem_diag,frontoDem_diag,allDem_diag,
# greyVol , whiteVol , wmh_norm,
edu_score, crp_norm,log_crp,log_edu,
edu_level , house_type))
result covid_death death Population n_cancers townsend smoking whistling
1 0 0 FALSE 3.541627 0 -5.87237000 Previous FALSE
2 0 0 FALSE 2.932501 0 2.91871000 Previous FALSE
3 0 0 FALSE 10.947637 0 3.43245000 Never FALSE
4 0 0 TRUE 6.636715 0 -2.30303000 Previous FALSE
5 0 0 FALSE 11.849605 0 -0.00099431 Previous FALSE
6 0 0 FALSE 3.617117 0 -3.92656000 Never FALSE
diabetes whr sex age peopleInHousehold AverageHouseholdIncome
1 FALSE 0.8514851 Male 70 2 <NA>
2 FALSE 0.9090909 Female 72 2 18,000 to 30,999
3 FALSE 0.8942308 Female 53 1 <NA>
4 FALSE 0.9626168 Male 81 2 18,000 to 30,999
5 FALSE 0.9207921 Female 73 1 18,000 to 30,999
6 FALSE 1.0256410 Male 75 NA <NA>
ethnicity_sim highBP PD_diag AD_diag vascDem_diag frontoDem_diag allDem_diag
1 white False False False FALSE FALSE FALSE
2 white True False False FALSE FALSE FALSE
3 minority False False False FALSE FALSE FALSE
4 white True False False FALSE FALSE FALSE
5 white False False False FALSE FALSE FALSE
6 white True False False FALSE FALSE FALSE
edu_score crp_norm log_crp log_edu
1 3.57 0.127777778 -2.057463 1.2725656
2 23.23 0.060053619 -2.812517 3.1454445
3 17.37 0.003726548 -5.592273 2.8547446
4 25.85 0.022968705 -3.773623 3.2523106
5 2.96 0.036278039 -3.316543 1.0851893
6 1.45 0.030045435 -3.505045 0.3715636
edu_level
1 NVQ or HND or HNC or equivalent
2 None of the above
3 Other professional qualifications eg: nursing, teaching
4 College or University degree
5 Other professional qualifications eg: nursing, teaching
6 <NA>
house_type
1 A house or bungalow
2 A house or bungalow
3 A flat, maisonette or apartment
4 A house or bungalow
5 A house or bungalow
6 A house or bungalow
%%R
# I took out AverageHouseholdIncome and house_type because they are not significant, and their confidence intervals
# are so large that I cannot plot it
summary(ukb_covid_and_death.b <-glm(data = ukb_dt_dementia, covid_death ~ Population + n_cancers +townsend + smoking + whistling + diabetes+
whr + sex + age +peopleInHousehold+ethnicity_sim+highBP+
allDem_diag+
# greyVol + whiteVol + wmh_norm+
log_crp, family = 'binomial'))
stargazer::stargazer(ukb_covid_and_death.b, type ="html",
single.row=TRUE,
out = "fig_out/ukb_deathAndCOVID_coefficients_dementia_model.html")
ukb_covid_death_covid.b_odds =
data.frame(cbind(exp(cbind(OR = coef(ukb_covid_and_death.b),
conf_int_low = summary(ukb_covid_and_death.b)$coefficients[,1] -
summary(ukb_covid_and_death.b)$coefficients[,2],
conf_int_high = summary(ukb_covid_and_death.b)$coefficients[,1] +
summary(ukb_covid_and_death.b)$coefficients[,2])),
p_value = summary(ukb_covid_and_death.b)$coefficients[,4]))
write.csv(ukb_covid_death_covid.b_odds,"data_output/ukb_deathAndCOVID_odds_dementia.csv", row.names=TRUE)
ukb_covid_death_covid.b_odds$significance = "p-value > 0.05"
ukb_covid_death_covid.b_odds$significance[ukb_covid_death_covid.b_odds$p_value < 0.05] <- "p-value < 0.05"
ukb_covid_death_covid.b_odds$variables = row.names(ukb_covid_death_covid.b_odds)
library(ggplot2)
ggplot(ukb_covid_death_covid.b_odds,
aes(x=reorder(variables, OR), y=OR, color = significance)) +
geom_point(shape=21, size = 2) +
geom_errorbar(aes(ymin=conf_int_low, ymax=conf_int_high),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
theme_classic() +
geom_hline(yintercept = 1, linetype="dotted") +
coord_flip()+ylab("Mortality odds ratios, death and COVID-19") +
xlab("")
ggsave('fig_out/UKB_deathAndCOVID_ORplot_dementiaModel.png', width = 6, height = 4)
<table style="text-align:center"><tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td>covid_death</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Population</td><td>0.002 (0.003)</td></tr>
<tr><td style="text-align:left">n_cancers</td><td>-0.564<sup>**</sup> (0.236)</td></tr>
<tr><td style="text-align:left">townsend</td><td>0.072<sup>***</sup> (0.020)</td></tr>
<tr><td style="text-align:left">smokingNever</td><td>-0.158 (0.221)</td></tr>
<tr><td style="text-align:left">smokingPrevious</td><td>-0.085 (0.214)</td></tr>
<tr><td style="text-align:left">whistling</td><td>0.058 (0.152)</td></tr>
<tr><td style="text-align:left">diabetes</td><td>0.067 (0.197)</td></tr>
<tr><td style="text-align:left">whr</td><td>1.306 (1.023)</td></tr>
<tr><td style="text-align:left">sexMale</td><td>0.367<sup>**</sup> (0.184)</td></tr>
<tr><td style="text-align:left">age</td><td>0.090<sup>***</sup> (0.012)</td></tr>
<tr><td style="text-align:left">peopleInHousehold</td><td>0.012 (0.050)</td></tr>
<tr><td style="text-align:left">ethnicity_simwhite</td><td>-0.333<sup>*</sup> (0.197)</td></tr>
<tr><td style="text-align:left">highBPFalse</td><td>-0.260 (0.435)</td></tr>
<tr><td style="text-align:left">highBPTrue</td><td>0.045 (0.431)</td></tr>
<tr><td style="text-align:left">allDem_diag</td><td>1.465<sup>***</sup> (0.260)</td></tr>
<tr><td style="text-align:left">log_crp</td><td>0.102 (0.067)</td></tr>
<tr><td style="text-align:left">Constant</td><td>-10.892<sup>***</sup> (1.350)</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>10,821</td></tr>
<tr><td style="text-align:left">Log Likelihood</td><td>-1,035.227</td></tr>
<tr><td style="text-align:left">Akaike Inf. Crit.</td><td>2,104.454</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>
#### Dementia subtypes
%%R
ukb_dt_subset = subset(ukb_dt, select = c(result,covid_death,death, Population, n_cancers,townsend, smoking, whistling, diabetes,
whr , sex , age ,peopleInHousehold,AverageHouseholdIncome,ethnicity_sim,highBP,
PD_diag, AD_diag , vascDem_diag,frontoDem_diag,
# greyVol , whiteVol , wmh_norm,
edu_score, crp_norm,log_crp,log_edu,
edu_level , house_type,
pm25_val,no2_val,nox_val,so2_val,pm10_val,o3_val))
ukb_dt_subset_na = na.omit(ukb_dt_subset)
summary(ukb_covid_and_death.b <-glm(data = ukb_dt_subset_na, covid_death ~ Population + n_cancers +townsend + smoking + whistling + diabetes+
whr + sex + age +peopleInHousehold+AverageHouseholdIncome+ethnicity_sim+highBP+
PD_diag+ AD_diag + vascDem_diag+frontoDem_diag+
# greyVol + whiteVol + wmh_norm+
edu_score+ log_crp+ house_type, family = 'binomial'))
stargazer::stargazer(ukb_covid_and_death.b, type ="html",
single.row=TRUE,
out = "fig_out/ukb_covid_deathAndCOVID_coefficients_fullmodel.html")
ukb_covid_death_covid.b_odds =
data.frame(cbind(exp(cbind(OR = coef(ukb_covid_and_death.b),
conf_int_low = summary(ukb_covid_and_death.b)$coefficients[,1] -
summary(ukb_covid_and_death.b)$coefficients[,2],
conf_int_high = summary(ukb_covid_and_death.b)$coefficients[,1] +
summary(ukb_covid_and_death.b)$coefficients[,2])),
p_value = summary(ukb_covid_and_death.b)$coefficients[,4]))
stargazer::stargazer(ukb_covid_death_covid.b_odds, type ="html",
single.row=TRUE, summary = FALSE,
out = "fig_out/ukb_covid_deathAndCOVID_odds_fullmodel.html")
write.csv(ukb_covid_death.b_odds,"data_output/ukb_covid_deathAndCOVID_odds.csv", row.names=TRUE)
ukb_covid_death_covid.b_odds$significance = "p-value > 0.05"
ukb_covid_death_covid.b_odds$significance[ukb_covid_death_covid.b_odds$p_value < 0.05] <- "p-value < 0.05"
ukb_covid_death_covid.b_odds$variables = row.names(ukb_covid_death_covid.b_odds)
library(ggplot2)
ggplot(ukb_covid_death_covid.b_odds[which(ukb_covid_death_covid.b_odds$significance == "p-value < 0.05"),],
aes(x=reorder(variables, OR), y=OR)) +
geom_point(shape=21, size = 2) +
geom_errorbar(aes(ymin=conf_int_low, ymax=conf_int_high),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
theme_classic() +
geom_hline(yintercept = 1, linetype="dotted") +
coord_flip()+ylab("Mortality odds ratios, death and COVID-19") +
xlab("Variables")
ggsave('fig_out/UKB_deathAndCOVID_ORplot_sig_only.pdf', width = 6, height = 4)
ggsave('fig_out/UKB_deathAndCOVID_ORplot_sig_only.png', width = 6, height = 4)
<table style="text-align:center"><tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td>covid_death</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Population</td><td>0.002 (0.003)</td></tr>
<tr><td style="text-align:left">n_cancers</td><td>-0.381 (0.244)</td></tr>
<tr><td style="text-align:left">townsend</td><td>0.068<sup>**</sup> (0.031)</td></tr>
<tr><td style="text-align:left">smokingNever</td><td>-0.124 (0.254)</td></tr>
<tr><td style="text-align:left">smokingPrevious</td><td>0.009 (0.244)</td></tr>
<tr><td style="text-align:left">whistling</td><td>0.086 (0.172)</td></tr>
<tr><td style="text-align:left">diabetes</td><td>0.197 (0.214)</td></tr>
<tr><td style="text-align:left">whr</td><td>1.853 (1.170)</td></tr>
<tr><td style="text-align:left">sexMale</td><td>0.399<sup>*</sup> (0.213)</td></tr>
<tr><td style="text-align:left">age</td><td>0.088<sup>***</sup> (0.014)</td></tr>
<tr><td style="text-align:left">peopleInHousehold</td><td>0.033 (0.054)</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncome18,000 to 30,999</td><td>0.282 (2,758.201)</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncome31,000 to 51,999</td><td>0.770 (2,758.201)</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncome52,000 to 100,000</td><td>0.705 (2,758.201)</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncomeGreater than 100,000</td><td>-0.307 (2,758.201)</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncomeLess than 18,000</td><td>0.754 (2,758.201)</td></tr>
<tr><td style="text-align:left">ethnicity_simwhite</td><td>-0.357 (0.224)</td></tr>
<tr><td style="text-align:left">highBPFalse</td><td>-0.169 (0.527)</td></tr>
<tr><td style="text-align:left">highBPTrue</td><td>0.062 (0.525)</td></tr>
<tr><td style="text-align:left">PD_diagTrue</td><td>0.409 (0.537)</td></tr>
<tr><td style="text-align:left">AD_diagTrue</td><td>1.738<sup>***</sup> (0.442)</td></tr>
<tr><td style="text-align:left">vascDem_diag</td><td>0.832 (0.670)</td></tr>
<tr><td style="text-align:left">frontoDem_diag</td><td>19.358 (2,399.545)</td></tr>
<tr><td style="text-align:left">edu_score</td><td>-0.003 (0.005)</td></tr>
<tr><td style="text-align:left">log_crp</td><td>0.102 (0.077)</td></tr>
<tr><td style="text-align:left">house_typeA house or bungalow</td><td>-0.369 (0.240)</td></tr>
<tr><td style="text-align:left">house_typeCare home</td><td>-14.119 (2,399.545)</td></tr>
<tr><td style="text-align:left">house_typeMobile or temporary structure (i.e. caravan)</td><td>-12.958 (643.231)</td></tr>
<tr><td style="text-align:left">house_typeNone of the above</td><td>-12.831 (486.051)</td></tr>
<tr><td style="text-align:left">house_typeSheltered accommodation</td><td>-13.383 (1,360.094)</td></tr>
<tr><td style="text-align:left">Constant</td><td>-11.670 (2,758.201)</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>8,897</td></tr>
<tr><td style="text-align:left">Log Likelihood</td><td>-804.906</td></tr>
<tr><td style="text-align:left">Akaike Inf. Crit.</td><td>1,671.811</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>
<table style="text-align:center"><tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td>OR</td><td>conf_int_low</td><td>conf_int_high</td><td>p_value</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">(Intercept)</td><td>0.00001</td><td>0</td><td>Inf</td><td>0.997</td></tr>
<tr><td style="text-align:left">Population</td><td>1.002</td><td>0.999</td><td>1.005</td><td>0.561</td></tr>
<tr><td style="text-align:left">n_cancers</td><td>0.683</td><td>0.535</td><td>0.872</td><td>0.118</td></tr>
<tr><td style="text-align:left">townsend</td><td>1.070</td><td>1.038</td><td>1.104</td><td>0.027</td></tr>
<tr><td style="text-align:left">smokingNever</td><td>0.883</td><td>0.685</td><td>1.139</td><td>0.626</td></tr>
<tr><td style="text-align:left">smokingPrevious</td><td>1.009</td><td>0.791</td><td>1.289</td><td>0.969</td></tr>
<tr><td style="text-align:left">whistlingTRUE</td><td>1.089</td><td>0.918</td><td>1.293</td><td>0.618</td></tr>
<tr><td style="text-align:left">diabetesTRUE</td><td>1.218</td><td>0.984</td><td>1.508</td><td>0.356</td></tr>
<tr><td style="text-align:left">whr</td><td>6.379</td><td>1.980</td><td>20.556</td><td>0.113</td></tr>
<tr><td style="text-align:left">sexMale</td><td>1.491</td><td>1.205</td><td>1.844</td><td>0.060</td></tr>
<tr><td style="text-align:left">age</td><td>1.092</td><td>1.077</td><td>1.107</td><td>0</td></tr>
<tr><td style="text-align:left">peopleInHousehold</td><td>1.034</td><td>0.980</td><td>1.091</td><td>0.536</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncome18,000 to 30,999</td><td>1.326</td><td>0</td><td>Inf</td><td>1.000</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncome31,000 to 51,999</td><td>2.160</td><td>0</td><td>Inf</td><td>1.000</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncome52,000 to 100,000</td><td>2.024</td><td>0</td><td>Inf</td><td>1.000</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncomeGreater than 100,000</td><td>0.736</td><td>0</td><td>Inf</td><td>1.000</td></tr>
<tr><td style="text-align:left">AverageHouseholdIncomeLess than 18,000</td><td>2.126</td><td>0</td><td>Inf</td><td>1.000</td></tr>
<tr><td style="text-align:left">ethnicity_simwhite</td><td>0.700</td><td>0.559</td><td>0.876</td><td>0.111</td></tr>
<tr><td style="text-align:left">highBPFalse</td><td>0.845</td><td>0.499</td><td>1.432</td><td>0.749</td></tr>
<tr><td style="text-align:left">highBPTrue</td><td>1.064</td><td>0.630</td><td>1.798</td><td>0.905</td></tr>
<tr><td style="text-align:left">PD_diagTrue</td><td>1.505</td><td>0.880</td><td>2.575</td><td>0.446</td></tr>
<tr><td style="text-align:left">AD_diagTrue</td><td>5.684</td><td>3.655</td><td>8.840</td><td>0.0001</td></tr>
<tr><td style="text-align:left">vascDem_diagTRUE</td><td>2.299</td><td>1.176</td><td>4.492</td><td>0.214</td></tr>
<tr><td style="text-align:left">frontoDem_diagTRUE</td><td>255,357,389.000</td><td>0</td><td>Inf</td><td>0.994</td></tr>
<tr><td style="text-align:left">edu_score</td><td>0.997</td><td>0.992</td><td>1.002</td><td>0.591</td></tr>
<tr><td style="text-align:left">log_crp</td><td>1.107</td><td>1.026</td><td>1.196</td><td>0.182</td></tr>
<tr><td style="text-align:left">house_typeA house or bungalow</td><td>0.691</td><td>0.544</td><td>0.879</td><td>0.124</td></tr>
<tr><td style="text-align:left">house_typeCare home</td><td>0.00000</td><td>0</td><td>Inf</td><td>0.995</td></tr>
<tr><td style="text-align:left">house_typeMobile or temporary structure (i.e. caravan)</td><td>0.00000</td><td>0</td><td>5</td><td>0.984</td></tr>
<tr><td style="text-align:left">house_typeNone of the above</td><td>0.00000</td><td>0</td><td>3</td><td>0.979</td></tr>
<tr><td style="text-align:left">house_typeSheltered accommodation</td><td>0.00000</td><td>0</td><td>Inf</td><td>0.992</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr></table>
Note that stepAIC must not have any NAs
%%R
library(MASS)
ukb_covid_and_death.b_step <- stepAIC(ukb_covid_and_death.b, direction = "both")
summary(ukb_covid_and_death.b_step)
Start: AIC=1671.81
covid_death ~ Population + n_cancers + townsend + smoking + whistling +
diabetes + whr + sex + age + peopleInHousehold + AverageHouseholdIncome +
ethnicity_sim + highBP + PD_diag + AD_diag + vascDem_diag +
frontoDem_diag + edu_score + log_crp + log_edu + +house_type
Df Deviance AIC
- house_type 5 1611.7 1665.7
- smoking 2 1608.5 1668.5
- highBP 2 1610.0 1670.0
- whistling 1 1608.0 1670.0
- Population 1 1608.1 1670.1
- peopleInHousehold 1 1608.1 1670.1
- edu_score 1 1608.3 1670.3
- PD_diag 1 1608.4 1670.4
- diabetes 1 1608.7 1670.7
- AverageHouseholdIncome 5 1617.1 1671.1
- vascDem_diag 1 1609.1 1671.1
- log_crp 1 1609.7 1671.7
- log_edu 1 1609.8 1671.8
<none> 1607.8 1671.8
- ethnicity_sim 1 1610.3 1672.3
- whr 1 1610.4 1672.4
- n_cancers 1 1610.7 1672.7
- sex 1 1611.3 1673.3
- townsend 1 1612.8 1674.8
- frontoDem_diag 1 1613.2 1675.2
- AD_diag 1 1619.2 1681.2
- age 1 1653.4 1715.4
Step: AIC=1665.65
covid_death ~ Population + n_cancers + townsend + smoking + whistling +
diabetes + whr + sex + age + peopleInHousehold + AverageHouseholdIncome +
ethnicity_sim + highBP + PD_diag + AD_diag + vascDem_diag +
frontoDem_diag + edu_score + log_crp + log_edu
Df Deviance AIC
- smoking 2 1612.4 1662.4
- peopleInHousehold 1 1611.8 1663.8
- highBP 2 1613.8 1663.8
- edu_score 1 1611.8 1663.8
- whistling 1 1611.9 1663.9
- Population 1 1612.0 1664.0
- PD_diag 1 1612.2 1664.2
- diabetes 1 1612.6 1664.6
- vascDem_diag 1 1613.0 1665.0
- log_crp 1 1613.5 1665.5
<none> 1611.7 1665.7
- AverageHouseholdIncome 5 1621.7 1665.7
- log_edu 1 1613.8 1665.8
- ethnicity_sim 1 1614.3 1666.3
- whr 1 1614.4 1666.4
- n_cancers 1 1614.5 1666.5
- sex 1 1615.4 1667.4
- frontoDem_diag 1 1616.9 1668.9
+ house_type 5 1607.8 1671.8
- townsend 1 1622.6 1674.6
- AD_diag 1 1622.8 1674.8
- age 1 1655.7 1707.7
Step: AIC=1662.38
covid_death ~ Population + n_cancers + townsend + whistling +
diabetes + whr + sex + age + peopleInHousehold + AverageHouseholdIncome +
ethnicity_sim + highBP + PD_diag + AD_diag + vascDem_diag +
frontoDem_diag + edu_score + log_crp + log_edu
Df Deviance AIC
- peopleInHousehold 1 1612.5 1660.5
- highBP 2 1614.5 1660.5
- edu_score 1 1612.6 1660.6
- Population 1 1612.7 1660.7
- whistling 1 1612.8 1660.8
- PD_diag 1 1612.8 1660.8
- diabetes 1 1613.3 1661.3
- vascDem_diag 1 1613.7 1661.7
<none> 1612.4 1662.4
- log_crp 1 1614.4 1662.4
- AverageHouseholdIncome 5 1622.5 1662.5
- log_edu 1 1614.5 1662.5
- ethnicity_sim 1 1615.0 1663.0
- n_cancers 1 1615.2 1663.2
- whr 1 1615.4 1663.4
- sex 1 1616.5 1664.5
+ smoking 2 1611.7 1665.7
- frontoDem_diag 1 1617.8 1665.8
+ house_type 5 1608.5 1668.5
- AD_diag 1 1623.5 1671.5
- townsend 1 1623.6 1671.6
- age 1 1658.2 1706.2
Step: AIC=1660.5
covid_death ~ Population + n_cancers + townsend + whistling +
diabetes + whr + sex + age + AverageHouseholdIncome + ethnicity_sim +
highBP + PD_diag + AD_diag + vascDem_diag + frontoDem_diag +
edu_score + log_crp + log_edu
Df Deviance AIC
- highBP 2 1614.6 1658.6
- edu_score 1 1612.7 1658.7
- Population 1 1612.8 1658.8
- whistling 1 1612.9 1658.9
- PD_diag 1 1613.0 1659.0
- diabetes 1 1613.4 1659.4
- vascDem_diag 1 1613.8 1659.8
- AverageHouseholdIncome 5 1622.5 1660.5
<none> 1612.5 1660.5
- log_crp 1 1614.5 1660.5
- log_edu 1 1614.7 1660.7
- ethnicity_sim 1 1615.2 1661.2
- n_cancers 1 1615.3 1661.3
- whr 1 1615.5 1661.5
+ peopleInHousehold 1 1612.4 1662.4
- sex 1 1616.6 1662.6
+ smoking 2 1611.8 1663.8
- frontoDem_diag 1 1617.9 1663.9
+ house_type 5 1608.8 1666.8
- townsend 1 1623.6 1669.6
- AD_diag 1 1623.6 1669.6
- age 1 1659.2 1705.2
Step: AIC=1658.59
covid_death ~ Population + n_cancers + townsend + whistling +
diabetes + whr + sex + age + AverageHouseholdIncome + ethnicity_sim +
PD_diag + AD_diag + vascDem_diag + frontoDem_diag + edu_score +
log_crp + log_edu
Df Deviance AIC
- edu_score 1 1614.8 1656.8
- Population 1 1614.9 1656.9
- whistling 1 1615.0 1657.0
- PD_diag 1 1615.0 1657.0
- diabetes 1 1615.4 1657.4
- vascDem_diag 1 1616.0 1658.0
<none> 1614.6 1658.6
- log_edu 1 1616.7 1658.7
- log_crp 1 1616.7 1658.7
- AverageHouseholdIncome 5 1624.9 1658.9
- ethnicity_sim 1 1617.4 1659.4
- n_cancers 1 1617.4 1659.4
- whr 1 1617.8 1659.8
+ highBP 2 1612.5 1660.5
+ peopleInHousehold 1 1614.5 1660.5
- sex 1 1618.9 1660.9
+ smoking 2 1613.9 1661.9
- frontoDem_diag 1 1620.2 1662.2
+ house_type 5 1610.9 1664.9
- townsend 1 1625.2 1667.2
- AD_diag 1 1625.9 1667.9
- age 1 1664.5 1706.5
Step: AIC=1656.83
covid_death ~ Population + n_cancers + townsend + whistling +
diabetes + whr + sex + age + AverageHouseholdIncome + ethnicity_sim +
PD_diag + AD_diag + vascDem_diag + frontoDem_diag + log_crp +
log_edu
Df Deviance AIC
- Population 1 1615.1 1655.1
- whistling 1 1615.2 1655.2
- PD_diag 1 1615.3 1655.3
- diabetes 1 1615.6 1655.6
- vascDem_diag 1 1616.2 1656.2
<none> 1614.8 1656.8
- log_crp 1 1617.0 1657.0
- AverageHouseholdIncome 5 1625.2 1657.2
- ethnicity_sim 1 1617.5 1657.5
- log_edu 1 1617.6 1657.6
- n_cancers 1 1617.6 1657.6
- whr 1 1618.0 1658.0
+ edu_score 1 1614.6 1658.6
+ highBP 2 1612.7 1658.7
+ peopleInHousehold 1 1614.7 1658.7
- sex 1 1619.2 1659.2
+ smoking 2 1614.1 1660.1
- frontoDem_diag 1 1620.4 1660.4
+ house_type 5 1611.4 1663.4
- AD_diag 1 1626.0 1666.0
- townsend 1 1627.6 1667.6
- age 1 1664.7 1704.7
Step: AIC=1655.13
covid_death ~ n_cancers + townsend + whistling + diabetes + whr +
sex + age + AverageHouseholdIncome + ethnicity_sim + PD_diag +
AD_diag + vascDem_diag + frontoDem_diag + log_crp + log_edu
Df Deviance AIC
- whistling 1 1615.5 1653.5
- PD_diag 1 1615.6 1653.6
- diabetes 1 1615.9 1653.9
- vascDem_diag 1 1616.5 1654.5
<none> 1615.1 1655.1
- log_crp 1 1617.3 1655.3
- AverageHouseholdIncome 5 1625.5 1655.5
- ethnicity_sim 1 1617.8 1655.8
- n_cancers 1 1618.0 1656.0
- log_edu 1 1618.0 1656.0
- whr 1 1618.3 1656.3
+ Population 1 1614.8 1656.8
+ edu_score 1 1614.9 1656.9
+ highBP 2 1613.0 1657.0
+ peopleInHousehold 1 1615.0 1657.0
- sex 1 1619.5 1657.5
+ smoking 2 1614.4 1658.4
- frontoDem_diag 1 1620.7 1658.7
+ house_type 5 1611.7 1661.7
- AD_diag 1 1626.5 1664.5
- townsend 1 1628.5 1666.5
- age 1 1664.8 1702.8
Step: AIC=1653.51
covid_death ~ n_cancers + townsend + diabetes + whr + sex + age +
AverageHouseholdIncome + ethnicity_sim + PD_diag + AD_diag +
vascDem_diag + frontoDem_diag + log_crp + log_edu
Df Deviance AIC
- PD_diag 1 1616.0 1652.0
- diabetes 1 1616.3 1652.3
- vascDem_diag 1 1617.0 1653.0
<none> 1615.5 1653.5
- log_crp 1 1617.9 1653.9
- AverageHouseholdIncome 5 1625.9 1653.9
- ethnicity_sim 1 1618.1 1654.1
- n_cancers 1 1618.3 1654.3
- log_edu 1 1618.3 1654.3
- whr 1 1619.0 1655.0
+ whistling 1 1615.1 1655.1
+ Population 1 1615.2 1655.2
+ edu_score 1 1615.3 1655.3
+ peopleInHousehold 1 1615.4 1655.4
+ highBP 2 1613.5 1655.5
- sex 1 1619.8 1655.8
+ smoking 2 1614.7 1656.7
- frontoDem_diag 1 1621.0 1657.0
+ house_type 5 1612.0 1660.0
- AD_diag 1 1626.8 1662.8
- townsend 1 1629.1 1665.1
- age 1 1665.2 1701.2
Step: AIC=1651.97
covid_death ~ n_cancers + townsend + diabetes + whr + sex + age +
AverageHouseholdIncome + ethnicity_sim + AD_diag + vascDem_diag +
frontoDem_diag + log_crp + log_edu
Df Deviance AIC
- diabetes 1 1616.8 1650.8
- vascDem_diag 1 1617.5 1651.5
<none> 1616.0 1652.0
- log_crp 1 1618.3 1652.3
- AverageHouseholdIncome 5 1626.5 1652.5
- ethnicity_sim 1 1618.6 1652.6
- n_cancers 1 1618.7 1652.7
- log_edu 1 1618.8 1652.8
+ PD_diag 1 1615.5 1653.5
- whr 1 1619.5 1653.5
+ whistling 1 1615.6 1653.6
+ Population 1 1615.7 1653.7
+ edu_score 1 1615.8 1653.8
+ peopleInHousehold 1 1615.9 1653.9
+ highBP 2 1613.9 1653.9
- sex 1 1620.3 1654.3
+ smoking 2 1615.2 1655.2
- frontoDem_diag 1 1621.5 1655.5
+ house_type 5 1612.5 1658.5
- AD_diag 1 1627.1 1661.1
- townsend 1 1629.7 1663.7
- age 1 1666.1 1700.1
Step: AIC=1650.77
covid_death ~ n_cancers + townsend + whr + sex + age + AverageHouseholdIncome +
ethnicity_sim + AD_diag + vascDem_diag + frontoDem_diag +
log_crp + log_edu
Df Deviance AIC
- vascDem_diag 1 1618.5 1650.5
<none> 1616.8 1650.8
- log_crp 1 1619.1 1651.1
- AverageHouseholdIncome 5 1627.4 1651.4
- log_edu 1 1619.5 1651.5
- n_cancers 1 1619.5 1651.5
- ethnicity_sim 1 1619.5 1651.5
+ diabetes 1 1616.0 1652.0
+ PD_diag 1 1616.3 1652.3
+ whistling 1 1616.4 1652.4
+ Population 1 1616.5 1652.5
+ edu_score 1 1616.6 1652.6
+ peopleInHousehold 1 1616.7 1652.7
- sex 1 1620.8 1652.8
+ highBP 2 1614.8 1652.8
- whr 1 1621.2 1653.2
+ smoking 2 1616.0 1654.0
- frontoDem_diag 1 1622.2 1654.2
+ house_type 5 1613.3 1657.3
- AD_diag 1 1628.2 1660.2
- townsend 1 1630.8 1662.8
- age 1 1667.7 1699.7
Step: AIC=1650.47
covid_death ~ n_cancers + townsend + whr + sex + age + AverageHouseholdIncome +
ethnicity_sim + AD_diag + frontoDem_diag + log_crp + log_edu
Df Deviance AIC
<none> 1618.5 1650.5
+ vascDem_diag 1 1616.8 1650.8
- log_crp 1 1620.8 1650.8
- AverageHouseholdIncome 5 1629.0 1651.0
- n_cancers 1 1621.2 1651.2
- ethnicity_sim 1 1621.3 1651.3
- log_edu 1 1621.4 1651.4
+ diabetes 1 1617.5 1651.5
+ PD_diag 1 1617.9 1651.9
+ whistling 1 1618.1 1652.1
+ Population 1 1618.1 1652.1
+ edu_score 1 1618.3 1652.3
+ peopleInHousehold 1 1618.3 1652.3
- sex 1 1622.4 1652.4
+ highBP 2 1616.5 1652.5
- whr 1 1623.2 1653.2
+ smoking 2 1617.8 1653.8
- frontoDem_diag 1 1623.9 1653.9
+ house_type 5 1614.9 1656.9
- AD_diag 1 1630.8 1660.8
- townsend 1 1632.5 1662.5
- age 1 1670.1 1700.1
Call:
glm(formula = covid_death ~ n_cancers + townsend + whr + sex +
age + AverageHouseholdIncome + ethnicity_sim + AD_diag +
frontoDem_diag + log_crp + log_edu, family = "binomial",
data = ukb_dt_subset_na)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.9767 -0.2305 -0.1643 -0.1084 3.4518
Coefficients:
Estimate Std. Error z value
(Intercept) -22.30594 535.41339 -0.042
n_cancers -0.37601 0.24317 -1.546
townsend 0.09685 0.02535 3.821
whr 2.44369 1.12285 2.176
sexMale 0.40982 0.20927 1.958
age 0.08768 0.01331 6.589
AverageHouseholdIncome18,000 to 30,999 10.41858 535.41129 0.019
AverageHouseholdIncome31,000 to 51,999 10.87387 535.41130 0.020
AverageHouseholdIncome52,000 to 100,000 10.76709 535.41134 0.020
AverageHouseholdIncomeGreater than 100,000 9.67883 535.41177 0.018
AverageHouseholdIncomeLess than 18,000 10.94271 535.41126 0.020
ethnicity_simwhite -0.38248 0.21965 -1.741
AD_diagTrue 1.82048 0.43361 4.198
frontoDem_diagTRUE 16.25771 535.41123 0.030
log_crp 0.11620 0.07562 1.537
log_edu -0.12269 0.07075 -1.734
Pr(>|z|)
(Intercept) 0.966769
n_cancers 0.122043
townsend 0.000133 ***
whr 0.029531 *
sexMale 0.050193 .
age 4.44e-11 ***
AverageHouseholdIncome18,000 to 30,999 0.984475
AverageHouseholdIncome31,000 to 51,999 0.983797
AverageHouseholdIncome52,000 to 100,000 0.983956
AverageHouseholdIncomeGreater than 100,000 0.985577
AverageHouseholdIncomeLess than 18,000 0.983694
ethnicity_simwhite 0.081636 .
AD_diagTrue 2.69e-05 ***
frontoDem_diagTRUE 0.975776
log_crp 0.124371
log_edu 0.082875 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1783.7 on 8896 degrees of freedom
Residual deviance: 1618.5 on 8881 degrees of freedom
AIC: 1650.5
Number of Fisher Scoring iterations: 12
The simplified model is:
glm(formula = covid_death ~ n_cancers + townsend + whr + sex + age + AverageHouseholdIncome + ethnicity_sim + AD_diag + frontoDem_diag + log_crp + log_edu + so2_val, family = “binomial”, data = ukb_dt_subset_na)
#### Simplified COVID death model
%%R
ukb_covid_death_covid_red.b = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim + AD_diag + frontoDem_diag +
log_crp,
family = "binomial", data = ukb_dt)
stargazer::stargazer(ukb_covid_death_covid_red.b, type ="html",
single.row=TRUE,
out = "fig_out/ukb_covid_deathAndCOVID_coefficients_simplifiedModel.html")
ukb_covid_death_covid_red.b_odds =
data.frame(cbind(exp(cbind(OR = coef(ukb_covid_death_covid_red.b),
conf_int_low = summary(ukb_covid_death_covid_red.b)$coefficients[,1] -
summary(ukb_covid_death_covid_red.b)$coefficients[,2],
conf_int_high = summary(ukb_covid_death_covid_red.b)$coefficients[,1] +
summary(ukb_covid_death_covid_red.b)$coefficients[,2])),
p_value = summary(ukb_covid_death_covid_red.b)$coefficients[,4]))
stargazer::stargazer(ukb_covid_death_covid_red.b_odds, type ="html",
single.row=TRUE, summary = FALSE,
out = "fig_out/ukb_covid_deathAndCOVID_odds_simplifiedModel.html")
write.csv(ukb_covid_death_covid_red.b_odds,"data_output/ukb_covid_deathAndCOVID_odds_simplifiedModel.csv", row.names=TRUE)
ukb_covid_death_covid_red.b_odds$significance = "p-value > 0.05"
ukb_covid_death_covid_red.b_odds$significance[ukb_covid_death_covid_red.b_odds$p_value < 0.05] <- "p-value < 0.05"
ukb_covid_death_covid_red.b_odds$variables = row.names(ukb_covid_death_covid_red.b_odds)
library(ggplot2)
ggplot(ukb_covid_death_covid_red.b_odds,
aes(x=reorder(variables, OR), y=OR, color = significance)) +
geom_point(shape=21, size = 2) +
geom_errorbar(aes(ymin=conf_int_low, ymax=conf_int_high),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
theme_classic() +
geom_hline(yintercept = 1, linetype="dotted") +
coord_flip()+ylab("Mortality odds ratios, death and COVID-19, simplified model") +
xlab("")
ggsave('fig_out/UKB_deathAndCOVID_ORplot_simplifiedModel.png', width = 6, height = 4)
<table style="text-align:center"><tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td>covid_death</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">n_cancers</td><td>-0.449<sup>**</sup> (0.218)</td></tr>
<tr><td style="text-align:left">townsend</td><td>0.070<sup>***</sup> (0.019)</td></tr>
<tr><td style="text-align:left">diabetes</td><td>0.087 (0.190)</td></tr>
<tr><td style="text-align:left">whr</td><td>1.763<sup>*</sup> (0.983)</td></tr>
<tr><td style="text-align:left">sexMale</td><td>0.329<sup>*</sup> (0.176)</td></tr>
<tr><td style="text-align:left">age</td><td>0.089<sup>***</sup> (0.011)</td></tr>
<tr><td style="text-align:left">ethnicity_simwhite</td><td>-0.274 (0.190)</td></tr>
<tr><td style="text-align:left">AD_diagTrue</td><td>1.428<sup>***</sup> (0.372)</td></tr>
<tr><td style="text-align:left">frontoDem_diag</td><td>2.795<sup>**</sup> (1.100)</td></tr>
<tr><td style="text-align:left">log_crp</td><td>0.103 (0.065)</td></tr>
<tr><td style="text-align:left">Constant</td><td>-11.384<sup>***</sup> (1.189)</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>11,255</td></tr>
<tr><td style="text-align:left">Log Likelihood</td><td>-1,103.131</td></tr>
<tr><td style="text-align:left">Akaike Inf. Crit.</td><td>2,228.262</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>
<table style="text-align:center"><tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td>OR</td><td>conf_int_low</td><td>conf_int_high</td><td>p_value</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">(Intercept)</td><td>0.00001</td><td>0.00000</td><td>0.00004</td><td>0</td></tr>
<tr><td style="text-align:left">n_cancers</td><td>0.639</td><td>0.514</td><td>0.794</td><td>0.039</td></tr>
<tr><td style="text-align:left">townsend</td><td>1.073</td><td>1.052</td><td>1.094</td><td>0.0003</td></tr>
<tr><td style="text-align:left">diabetesTRUE</td><td>1.091</td><td>0.902</td><td>1.320</td><td>0.648</td></tr>
<tr><td style="text-align:left">whr</td><td>5.831</td><td>2.182</td><td>15.581</td><td>0.073</td></tr>
<tr><td style="text-align:left">sexMale</td><td>1.389</td><td>1.165</td><td>1.658</td><td>0.062</td></tr>
<tr><td style="text-align:left">age</td><td>1.093</td><td>1.082</td><td>1.105</td><td>0</td></tr>
<tr><td style="text-align:left">ethnicity_simwhite</td><td>0.760</td><td>0.629</td><td>0.919</td><td>0.148</td></tr>
<tr><td style="text-align:left">AD_diagTrue</td><td>4.172</td><td>2.876</td><td>6.054</td><td>0.0001</td></tr>
<tr><td style="text-align:left">frontoDem_diagTRUE</td><td>16.361</td><td>5.446</td><td>49.157</td><td>0.011</td></tr>
<tr><td style="text-align:left">log_crp</td><td>1.108</td><td>1.039</td><td>1.182</td><td>0.111</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr></table>
%%R
ukb_covid_death_covid_AD.b = glm(formula = covid_death ~
age + AD_diag+sex+whr, family = "binomial",
data = ukb_dt)
stargazer::stargazer(ukb_covid_death_covid_AD.b, type ="html",
single.row=TRUE,
out = "fig_out/ukb_covid_deathAndCOVID_coefficients_ADmodel.html")
ukb_covid_death_covid_AD.b_odds =
data.frame(cbind(exp(cbind(OR = coef(ukb_covid_death_covid_AD.b),
conf_int_low = summary(ukb_covid_death_covid_AD.b)$coefficients[,1] -
summary(ukb_covid_death_covid_AD.b)$coefficients[,2],
conf_int_high = summary(ukb_covid_death_covid_AD.b)$coefficients[,1] +
summary(ukb_covid_death_covid_AD.b)$coefficients[,2])),
p_value = summary(ukb_covid_death_covid_AD.b)$coefficients[,4]))
stargazer::stargazer(ukb_covid_death_covid_AD.b_odds, type ="html",
single.row=TRUE, summary = FALSE,
out = "fig_out/ukb_covid_deathAndCOVID_odds_ADmodel.html")
write.csv(ukb_covid_death_covid_AD.b_odds,"data_output/ukb_covid_deathAndCOVID_odds_ADmodel.csv", row.names=TRUE)
ukb_covid_death_covid_AD.b_odds$significance = "p-value > 0.05"
ukb_covid_death_covid_AD.b_odds$significance[ukb_covid_death_covid_AD.b_odds$p_value < 0.05] <- "p-value < 0.05"
ukb_covid_death_covid_AD.b_odds$variables = row.names(ukb_covid_death_covid_AD.b_odds)
library(ggplot2)
ggplot(ukb_covid_death_covid_AD.b_odds,
aes(x=reorder(variables, OR), y=OR, color = significance)) +
geom_point(shape=21, size = 2) +
geom_errorbar(aes(ymin=conf_int_low, ymax=conf_int_high),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
theme_classic() +
geom_hline(yintercept = 1, linetype="dotted") +
coord_flip()+ylab("Mortality odds ratios, death and COVID-19, AD model") +
xlab("Variables")
ggsave('fig_out/UKB_deathAndCOVID_ORplot_sig_only_ADmodel.pdf', width = 6, height = 4)
ggsave('fig_out/UKB_deathAndCOVID_ORplot_sig_only_ADmodel.png', width = 6, height = 4)
<table style="text-align:center"><tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td>covid_death</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">age</td><td>0.667<sup>***</sup> (0.080)</td></tr>
<tr><td style="text-align:left">AD_diagTrue</td><td>1.440<sup>***</sup> (0.349)</td></tr>
<tr><td style="text-align:left">sexMale</td><td>0.158 (0.156)</td></tr>
<tr><td style="text-align:left">whr</td><td>0.322<sup>***</sup> (0.077)</td></tr>
<tr><td style="text-align:left">Constant</td><td>-4.178<sup>***</sup> (0.111)</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>13,252</td></tr>
<tr><td style="text-align:left">Log Likelihood</td><td>-1,328.771</td></tr>
<tr><td style="text-align:left">Akaike Inf. Crit.</td><td>2,667.542</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>
<table style="text-align:center"><tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td>OR</td><td>conf_int_low</td><td>conf_int_high</td><td>p_value</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">(Intercept)</td><td>0.015</td><td>0.014</td><td>0.017</td><td>0</td></tr>
<tr><td style="text-align:left">age</td><td>1.948</td><td>1.798</td><td>2.110</td><td>0</td></tr>
<tr><td style="text-align:left">AD_diagTrue</td><td>4.220</td><td>2.978</td><td>5.982</td><td>0.00004</td></tr>
<tr><td style="text-align:left">sexMale</td><td>1.171</td><td>1.003</td><td>1.368</td><td>0.310</td></tr>
<tr><td style="text-align:left">whr</td><td>1.381</td><td>1.279</td><td>1.491</td><td>0.00003</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr></table>
#### White matter hypersintensity model
%%R
summary(ukb_covid_and_death.b <-glm(data = ukb_dt, covid_death ~ #Population + n_cancers + smoking + whistling + diabetes+
whr + sex + age +townsend
# +peopleInHousehold+AverageHouseholdIncome
+ethnicity_sim+highBP+
# PD_diag+ AD_diag + vascDem_diag+frontoDem_diag+
# greyVol
# + whiteVol
+ AD_diag
# edu_score+ crp_norm+
# edu_level + house_type+
# pm25_val+no2_val+nox_val+so2_val+pm10_val+o3_val
, family = 'binomial'))
Call:
glm(formula = covid_death ~ whr + sex + age + townsend + ethnicity_sim +
highBP + +AD_diag, family = "binomial", data = ukb_dt)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.8392 -0.2439 -0.1808 -0.1240 3.3356
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -11.868701 1.034713 -11.471 < 2e-16 ***
whr 2.812646 0.845737 3.326 0.000882 ***
sexMale 0.227886 0.157048 1.451 0.146762
age 0.081450 0.009763 8.342 < 2e-16 ***
townsend 0.074186 0.017420 4.259 2.06e-05 ***
ethnicity_simwhite -0.276915 0.172010 -1.610 0.107425
highBPFalse -0.441129 0.338972 -1.301 0.193131
highBPTrue -0.085265 0.334391 -0.255 0.798735
AD_diagTrue 1.435865 0.350335 4.099 4.16e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2812.2 on 13229 degrees of freedom
Residual deviance: 2617.3 on 13221 degrees of freedom
(108 observations deleted due to missingness)
AIC: 2635.3
Number of Fisher Scoring iterations: 7
%%R
summary(ukb_covid_and_death.b <-glm(data = ukb_dt, covid_death ~ #Population + n_cancers + smoking + whistling + diabetes+
whr + sex + age +townsend+
# +peopleInHousehold+AverageHouseholdIncome
# +ethnicity_sim+highBP+
# PD_diag+ AD_diag + vascDem_diag+frontoDem_diag+
# greyVol
# + whiteVol
log_wmh
# edu_score+ crp_norm+
# edu_level + house_type+
# pm25_val+no2_val+nox_val+so2_val+pm10_val+o3_val
, family = 'binomial'))
Call:
glm(formula = covid_death ~ whr + sex + age + townsend + log_wmh,
family = "binomial", data = ukb_dt)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.7815 -0.0728 -0.0183 -0.0037 3.6233
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -40.73758 16.53183 -2.464 0.0137 *
whr 11.18633 7.31094 1.530 0.1260
sexMale -1.56655 1.24340 -1.260 0.2077
age 0.39223 0.16850 2.328 0.0199 *
townsend 0.06657 0.15811 0.421 0.6737
log_wmh 0.52021 0.48487 1.073 0.2833
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 61.174 on 836 degrees of freedom
Residual deviance: 42.614 on 831 degrees of freedom
(12501 observations deleted due to missingness)
AIC: 54.614
Number of Fisher Scoring iterations: 10
%%R
ukb_dt_wmh = na.omit(subset(ukb_dt, select = c(AD_diag, log_wmh, covid_death)))
sum(ukb_dt_wmh$AD_diag==TRUE)
[1] 0
The white matter hyperintensity analysis was not significant because there were only 2 participants who both died and received brain scans. Future work could elucidate this further.
The CRP variable was deleted due to NA. Adding it does not change the main conclusion of the model.
%%R
ukb_dt_AD = subset(ukb_dt, AD_diag == "True")
ukb_covid_death_covid_red.b = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim,
family = "binomial", data = ukb_dt_AD)
stargazer::stargazer(ukb_covid_death_covid_red.b, type ="html",
single.row=TRUE,
out = "fig_out/ukb_deathAndCOVID_coefficients_subsetAD.html")
ukb_covid_death_covid_red.b_odds =
data.frame(cbind(exp(cbind(OR = coef(ukb_covid_death_covid_red.b),
conf_int_low = summary(ukb_covid_death_covid_red.b)$coefficients[,1] -
summary(ukb_covid_death_covid_red.b)$coefficients[,2],
conf_int_high = summary(ukb_covid_death_covid_red.b)$coefficients[,1] +
summary(ukb_covid_death_covid_red.b)$coefficients[,2])),
p_value = summary(ukb_covid_death_covid_red.b)$coefficients[,4]))
write.csv(ukb_covid_death_covid_red.b_odds,"data_output/ukb_deathAndCOVID_odds_simplifiedModel_subsetAD.csv", row.names=TRUE)
ukb_covid_death_covid_red.b_odds$significance = "p-value > 0.05"
ukb_covid_death_covid_red.b_odds$significance[ukb_covid_death_covid_red.b_odds$p_value < 0.05] <- "p-value < 0.05"
ukb_covid_death_covid_red.b_odds$variables = row.names(ukb_covid_death_covid_red.b_odds)
library(ggplot2)
ggplot(ukb_covid_death_covid_red.b_odds,
aes(x=reorder(variables, OR), y=OR, color = significance)) +
geom_point(shape=21, size = 2) +
geom_errorbar(aes(ymin=conf_int_low, ymax=conf_int_high),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
theme_classic() +
geom_hline(yintercept = 1, linetype="dotted") +
coord_flip()+ylab("Mortality odds ratios, death and COVID-19, simplified model for AD participants") +
xlab("")
ggsave('fig_out/UKB_deathAndCOVID_ORplot_simplifiedModel_subsetAD.png', width = 6, height = 4)
<table style="text-align:center"><tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td>covid_death</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">n_cancers</td><td>1.453 (1.097)</td></tr>
<tr><td style="text-align:left">townsend</td><td>-0.096 (0.118)</td></tr>
<tr><td style="text-align:left">diabetes</td><td>0.250 (0.944)</td></tr>
<tr><td style="text-align:left">whr</td><td>-8.547 (6.453)</td></tr>
<tr><td style="text-align:left">sexMale</td><td>1.400 (0.992)</td></tr>
<tr><td style="text-align:left">age</td><td>0.015 (0.107)</td></tr>
<tr><td style="text-align:left">ethnicity_simwhite</td><td>15.363 (1,546.944)</td></tr>
<tr><td style="text-align:left">Constant</td><td>-11.942 (1,546.975)</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>83</td></tr>
<tr><td style="text-align:left">Log Likelihood</td><td>-27.042</td></tr>
<tr><td style="text-align:left">Akaike Inf. Crit.</td><td>70.083</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>
%%R
med_raw = read.csv("/Users/yizhouyu/camDrive/ARUK_project/additional_datasets/data/medications_20003_dt.csv")
med_coding = read.csv("/Users/yizhouyu/camDrive/ARUK_project/additional_datasets/medications_coding4.tsv",
sep = "\t")
%%R
drug_list = c("fluoxetine",
"warfarin",
"hydrocortisone",
"amantadine",
"fluvoxamine",
"donepezil hydrochloride",
"galantamine",
"memantine hydrochloride","doxycycline")
coding_subset = med_coding[med_coding$meaning %in% drug_list, ]
# length(drug_list)
nrow(coding_subset)
[1] 9
%%R
#subset
eid_covid = read.csv("/Users/yizhouyu/camDrive/Miguel_shared_Yizhou/COVID_project_yizhou/AirPoll_UKB_6000/ukb_covid19/data_output/ukb_covid_addDeath.csv")[,c("eid")]
med_covid = med_raw[med_raw$eid %in% eid_covid, ]
### Create separate columns for each of the 8 drugs investigated
%%R
coding_subset
coding meaning
2577 1140873394 doxycycline
2960 1140874896 hydrocortisone
3509 1140879540 fluoxetine
3510 1140879544 fluvoxamine
3523 1140879644 amantadine
4247 1140888266 warfarin
5199 1141150834 donepezil hydrochloride
5828 1141171578 galantamine
6219 1141182732 memantine hydrochloride
%%R
med_covid$hydrocortisone = apply(med_covid, 1, function(x) any(x %in% c("1140874896")))
med_covid$fluoxetine = apply(med_covid, 1, function(x) any(x %in% c("1140879540")))
med_covid$fluvoxamine = apply(med_covid, 1, function(x) any(x %in% c("1140879544")))
med_covid$amantadine = apply(med_covid, 1, function(x) any(x %in% c("1140879644")))
med_covid$warfarin = apply(med_covid, 1, function(x) any(x %in% c("1140888266")))
med_covid$donepezil = apply(med_covid, 1, function(x) any(x %in% c("1141150834")))
med_covid$galantamine = apply(med_covid, 1, function(x) any(x %in% c("1141171578")))
med_covid$memantine = apply(med_covid, 1, function(x) any(x %in% c("1141182732")))
med_covid$doxycycline = apply(med_covid, 1, function(x) any(x %in% c("1140873394")))
%%R
med_covid_out = med_covid[,c("eid","hydrocortisone","fluoxetine",
"fluvoxamine","amantadine","warfarin","donepezil","galantamine","memantine",
"doxycycline")]
write.csv(med_covid_out,"data_output/covid_drug_subset.csv",row.names=FALSE)
### Check how many people take each drug in the drug dataset
%%R
colSums(med_covid_out[,-1])
hydrocortisone fluoxetine fluvoxamine amantadine warfarin
36 228 1 9 249
donepezil galantamine memantine doxycycline
1 0 0 15
%%R
write.csv(merge(
read.csv(
"/Users/yizhouyu/camDrive/Miguel_shared_Yizhou/COVID_project_yizhou/AirPoll_UKB_6000/ukb_covid19/data_output/ukb_covid_addDeath.csv"),
read.csv("data_output/covid_drug_subset.csv"), by = "eid",all.x = TRUE),
"data_output/ukb_covid_addDrugs.csv", row.names = F)
### Descriptive stats for the drugs
%%R
ukb_dt = read.csv("data_output/ukb_covid_addDrugs.csv")
ukb_dt$covid_death = ifelse(ukb_dt$result == 1 & ukb_dt$death == TRUE,
1, 0)
library(arsenal)
ukb_descript_stats <- tableby(covid_death ~ ., data = ukb_dt[,2:ncol(ukb_dt)])
write2pdf(ukb_descript_stats, "ukb_descriptStats_drugs.pdf",
keep.md = TRUE,
quiet = TRUE, # passed to rmarkdown::render
title = "Descriptive statistics of the UK Biobank data") # passed to summary.tableby
write2word(ukb_descript_stats, "ukb_descriptStats_drugs.doc",
keep.md = TRUE,
quiet = TRUE, # passed to rmarkdown::render
title = "Descriptive statistics of the UK Biobank data") # passed to summary.tableby
%%R
ukb_dt = read.csv("data_output/ukb_covid_addDrugs.csv")
ukb_dt$covid_death = ifelse(ukb_dt$result == 1 & ukb_dt$death == TRUE,
1, 0)
ukb_dt_pos = subset(ukb_dt, result == 1)
ukb_dt_AD = subset(ukb_dt_pos, AD_diag == "True")
ukb_dt_PD = subset(ukb_dt_pos, PD_diag == "True")
ukb_dt_dem = subset(ukb_dt_pos, allDem_diag == TRUE)
library(arsenal)
ukb_descript_COVIDpos <- tableby(covid_death ~ ., data = ukb_dt_pos[,3:ncol(ukb_dt_pos)])
write2word(ukb_descript_COVIDpos, "ukb_descript_COVIDpos.doc",
keep.md = TRUE,
quiet = TRUE, # passed to rmarkdown::render
title = "Descriptive statistics of the UK Biobank data, all participants were tested positive") # passed to summary.tableby
%%R
# infection with drugs
ukb_dt = read.csv("data_output/ukb_covid_addDrugs.csv")
colnames(ukb_dt)
[1] "eid" "n_cancers" "townsend"
[4] "smoking" "ethnicity" "whistling"
[7] "diabetes" "sex" "peopleInHousehold"
[10] "AverageHouseholdIncome" "result" "PD_diag"
[13] "AD_diag" "highBP" "age"
[16] "no2_val" "nox_val" "so2_val"
[19] "pm10_val" "o3_val" "pm25_val"
[22] "Population" "whr" "edu_level"
[25] "house_type" "ethnicity_sim" "greyVol"
[28] "whiteVol" "brainVol" "wmh"
[31] "edu_score" "crp" "tot_p"
[34] "all_cause_dem_date" "all_cause_dem_source" "AD_date"
[37] "AD_source" "vasc_dem_date" "vasc_dem_source"
[40] "fronto_dem_date" "fronto_dem_source" "allDem_diag"
[43] "frontoDem_diag" "vascDem_diag" "crp_norm"
[46] "wmh_norm" "blood_group" "log_wmh"
[49] "log_crp" "log_edu" "death"
[52] "hydrocortisone" "fluoxetine" "fluvoxamine"
[55] "amantadine" "warfarin" "donepezil"
[58] "galantamine" "memantine" "doxycycline"
%%R
# infection with drugs
ukb_dt = read.csv("data_output/ukb_covid_addDrugs.csv")
ukb_descript_COVID_infection <- tableby(result ~ ., data = ukb_dt[,c(11,12,13,15,43,44,52:ncol(ukb_dt))])
write2word(ukb_descript_COVID_infection, "ukb_descript_COVID_infectionDrugs.doc",
keep.md = TRUE,
quiet = TRUE, # passed to rmarkdown::render
title = "Descriptive statistics of the UK Biobank data, drugs & COVID-19 infection") # passed to summary.tableby
ukb_descript_COVID_amantidine <- tableby(amantadine ~ ., data = ukb_dt[,c(11,12,13,15,43,44,52:ncol(ukb_dt))])
write2word(ukb_descript_COVID_amantidine, "ukb_descript_COVID_infectionAmantidine.doc",
keep.md = TRUE,
quiet = TRUE, # passed to rmarkdown::render
title = "Descriptive statistics of the UK Biobank data, Amantadine") # passed to summary.tableby
%%R
ukb_descript_COVIDpos_AD <- tableby(covid_death ~ ., data = ukb_dt_AD[,54:ncol(ukb_dt_AD)])
write2word(ukb_descript_COVIDpos_AD, "ukb_descript_COVIDpos_AD.doc",
keep.md = TRUE,
quiet = TRUE, # passed to rmarkdown::render
title = "Descriptive statistics of the UK Biobank data, all participants were tested positive and have AD") # passed to summary.tableby
ukb_descript_COVIDpos_PD <- tableby(covid_death ~ ., data = ukb_dt_PD[,54:ncol(ukb_dt_PD)])
write2word(ukb_descript_COVIDpos_PD, "ukb_descript_COVIDpos_PD.doc",
keep.md = TRUE,
quiet = TRUE, # passed to rmarkdown::render
title = "Descriptive statistics of the UK Biobank data, all participants were tested positive and have PD") # passed to summary.tableby
ukb_descript_COVIDpos_dem <- tableby(covid_death ~ ., data = ukb_dt_dem[,54:ncol(ukb_dt_dem)])
write2word(ukb_descript_COVIDpos_dem, "ukb_descript_COVIDpos_dem.doc",
keep.md = TRUE,
quiet = TRUE, # passed to rmarkdown::render
title = "Descriptive statistics of the UK Biobank data, all participants were tested positive and have dementia") # passed to summary.tableby
%%R
# covid_hydro.b = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
# whr + sex + age + ethnicity_sim + AD_diag + frontoDem_diag +
# log_crp + log_edu+
# hydrocortisone,
# family = "binomial", data = ukb_dt)
ukb_dt = read.csv("data_output/ukb_covid_addDrugs.csv")
ukb_dt$covid_death = ifelse(ukb_dt$result == 1 & ukb_dt$death == TRUE,
1, 0)
ukb_dt_covid = subset(ukb_dt, result == 1)
covid_hydro.b = glm(formula = covid_death ~ sex + age +
hydrocortisone,
family = "binomial", data = ukb_dt_covid)
covid_fluo.b = glm(formula = covid_death ~ sex + age +
fluoxetine,
family = "binomial", data = ukb_dt_covid)
covid_aman.b = glm(formula = covid_death ~ sex + age +
amantadine,
family = "binomial", data = ukb_dt_covid)
covid_war.b = glm(formula = covid_death ~ sex + age +
warfarin,
family = "binomial", data = ukb_dt_covid)
covid_dox.b = glm(formula = covid_death ~ sex + age +
doxycycline,
family = "binomial", data = ukb_dt_covid)
stargazer::stargazer(covid_hydro.b,covid_fluo.b,covid_aman.b,covid_war.b,covid_dox.b, type ="html",
single.row=TRUE,
out = "fig_out/ukb_deathAndCOVID_drug_models.html")
<table style="text-align:center"><tr><td colspan="6" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="5"><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="5" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td colspan="5">covid_death</td></tr>
<tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td><td>(3)</td><td>(4)</td><td>(5)</td></tr>
<tr><td colspan="6" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">sexMale</td><td>0.443<sup>***</sup> (0.141)</td><td>0.443<sup>***</sup> (0.141)</td><td>0.454<sup>***</sup> (0.141)</td><td>0.447<sup>***</sup> (0.141)</td><td>0.449<sup>***</sup> (0.141)</td></tr>
<tr><td style="text-align:left">age</td><td>0.115<sup>***</sup> (0.010)</td><td>0.115<sup>***</sup> (0.010)</td><td>0.115<sup>***</sup> (0.010)</td><td>0.114<sup>***</sup> (0.010)</td><td>0.115<sup>***</sup> (0.010)</td></tr>
<tr><td style="text-align:left">hydrocortisone</td><td>1.614 (1.272)</td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left">fluoxetine</td><td></td><td>-1.229 (1.043)</td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left">amantadine</td><td></td><td></td><td>-12.314 (324.744)</td><td></td><td></td></tr>
<tr><td style="text-align:left">warfarin</td><td></td><td></td><td></td><td>0.344 (0.421)</td><td></td></tr>
<tr><td style="text-align:left">doxycycline</td><td></td><td></td><td></td><td></td><td>0.395 (1.262)</td></tr>
<tr><td style="text-align:left">Constant</td><td>-9.894<sup>***</sup> (0.717)</td><td>-9.859<sup>***</sup> (0.715)</td><td>-9.922<sup>***</sup> (0.717)</td><td>-9.873<sup>***</sup> (0.717)</td><td>-9.904<sup>***</sup> (0.716)</td></tr>
<tr><td colspan="6" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>1,626</td><td>1,626</td><td>1,626</td><td>1,626</td><td>1,626</td></tr>
<tr><td style="text-align:left">Log Likelihood</td><td>-669.139</td><td>-669.025</td><td>-669.436</td><td>-669.684</td><td>-669.963</td></tr>
<tr><td style="text-align:left">Akaike Inf. Crit.</td><td>1,346.279</td><td>1,346.049</td><td>1,346.872</td><td>1,347.367</td><td>1,347.926</td></tr>
<tr><td colspan="6" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="5" style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>
%%R
ukb_dt = read.csv("data_output/ukb_covid_addDrugs.csv")
ukb_dt_amantadine = subset(ukb_dt, amantadine == TRUE)
write.csv(ukb_dt_amantadine, "data_output/ukb_dt_amantadine_susbet.csv")
# AD / PD drug analysis
## Data curation
### Subset AD & PD eids
%%R
ukb_dt = read.csv("data_output/ukb_covid_addDrugs.csv")
ad_pd = subset(ukb_dt, AD_diag == TRUE | PD_diag == TRUE)
length(ad_pd$eid)
[1] 239
%%R
med_raw = read.csv("/Users/yizhouyu/camDrive/ARUK_project/additional_datasets/data/medications_20003_dt.csv")
med_coding = read.csv("/Users/yizhouyu/camDrive/ARUK_project/additional_datasets/medications_coding4.tsv",
sep = "\t")
ad_pd_med = med_raw[med_raw$eid %in% ad_pd$eid, ]
%%R
ad_pd_med <- ad_pd_med[,colSums(is.na(ad_pd_med))<nrow(ad_pd_med)]
ncol(ad_pd_med)
write.csv(ad_pd_med,"data_output/ukb_adpd_med.csv",row.names = F)
%%R
ad_pd_med = read.csv("data_output/ukb_adpd_med.csv")
ad_pd_med_dt = ad_pd_med[,-c(1:4)]
Make the dataframe into a single vector
%%R
ad_pd_med_vec = c()
for (rowi in 1:nrow(ad_pd_med_dt)){
curr_row = as.vector(unlist(ad_pd_med_dt[rowi,]))
ad_pd_med_vec = c(ad_pd_med_vec,curr_row)
}
%%R
ad_pd_med_vec_na = na.omit(ad_pd_med_vec)
length(ad_pd_med_vec_na)
[1] 1147
Count the frequencies of drugs present in this list
%%R
ad_pd_med_freq = table(ad_pd_med_vec_na)
ad_pd_med_freq[order(-as.numeric(ad_pd_med_freq))]
ad_pd_med_vec_na
2038460150 1140868226 1140861958 99999 1140865634 1140860806 1140884600
64 58 49 34 26 24 21
1140871310 1141146234 1141164060 1141188442 1140864752 1141194794 1140928274
19 19 18 18 17 17 15
1141191044 1140866738 1140921600 1193 1140879802 1140879616 1140882112
15 14 13 12 12 11 11
1140923346 1140872338 1140879668 1140860696 1140861998 1140909674 1140872420
11 10 10 9 9 9 8
1140884488 1140888560 1140909708 1140926934 1141164872 1141189134 1140852976
8 8 8 8 8 8 7
1140879406 1140879644 1140879760 1140888648 1140925800 1141168318 1140874420
7 7 7 7 7 7 6
1140879540 1140879778 1140888266 1140888552 1140922174 1141176832 1140864992
6 6 6 6 6 6 5
1140865350 1140865800 1140870422 1140871050 1140874744 1140875408 1140881856
5 5 5 5 5 5 5
1140883066 1140883476 1141156836 1141169700 1141177526 1140867888 1140871462
5 5 5 5 5 4 4
1140872228 1140872320 1140883504 1140888646 1140910766 1140916282 1140928338
4 4 4 4 4 4 4
1141152732 1140861090 1140861806 1140863152 1140865716 1140865752 1140866122
4 3 3 3 3 3 3
1140869848 1140870390 1140870856 1140871024 1140872072 1140872150 1140875420
3 3 3 3 3 3 3
1140876592 1140879428 1140879634 1140879842 1140883568 1140888098 1140888510
3 3 3 3 3 3 3
1140888688 1140909786 1140909806 1140909816 1141162764 1141164068 1141171948
3 3 3 3 3 3 3
1141192410 1187 1203 1140860954 1140861088 1140861954 1140862572
3 2 2 2 2 2 2
1140862772 1140863144 1140865396 1140866078 1140866236 1140867878 1140868170
2 2 2 2 2 2 2
1140868550 1140869034 1140871112 1140871692 1140872200 1140872434 1140874930
2 2 2 2 2 2 2
1140876404 1140879430 1140882110 1140882374 1140883524 1140884444 1140884672
2 2 2 2 2 2 2
1140888366 1140909702 1140910832 1140911734 1140923336 1140928916 1141146612
2 2 2 2 2 2 2
1141150944 1141156846 1141157126 1141163114 1141164086 1141164828 1141167594
2 2 2 2 2 2 2
1141171932 1141174520 1141175684 1141179992 1141187810 1141192736 1141200322
2 2 2 2 2 2 2
2038459814 1195 1140851812 1140861120 1140861166 1140861416 1140861884
2 1 1 1 1 1 1
1140861924 1140862016 1140862260 1140862476 1140862810 1140862952 1140863182
1 1 1 1 1 1 1
1140863202 1140863552 1140864286 1140864950 1140865010 1140865414 1140865416
1 1 1 1 1 1 1
1140865426 1140865580 1140865588 1140865654 1140865668 1140866026 1140866116
1 1 1 1 1 1 1
1140866280 1140866406 1140867304 1140867420 1140867444 1140867504 1140867726
1 1 1 1 1 1 1
1140867818 1140868120 1140868134 1140868192 1140868258 1140868408 1140868472
1 1 1 1 1 1 1
1140868848 1140870306 1140870554 1140870788 1140870932 1140870954 1140871168
1 1 1 1 1 1 1
1140871188 1140871196 1140871266 1140872198 1140872290 1140873474 1140873548
1 1 1 1 1 1 1
1140873642 1140874138 1140874718 1140874794 1140875452 1140875596 1140875840
1 1 1 1 1 1 1
1140875990 1140876076 1140876136 1140876312 1140876424 1140877892 1140878512
1 1 1 1 1 1 1
1140879048 1140879404 1140879464 1140879494 1140879620 1140879648 1140879774
1 1 1 1 1 1 1
1140879794 1140879854 1140880014 1140881318 1140881334 1140881442 1140881446
1 1 1 1 1 1 1
1140881472 1140882236 1140882618 1140882694 1140883494 1140883548 1140883560
1 1 1 1 1 1 1
1140883664 1140884498 1140884516 1140888386 1140888502 1140888538 1140888758
1 1 1 1 1 1 1
1140909368 1140909480 1140909788 1140911658 1140911698 1140911730 1140911732
1 1 1 1 1 1 1
1140916790 1140923350 1140923402 1140923670 1140926732 1140926848 1140927086
1 1 1 1 1 1 1
1140927328 1140929012 1141145660 1141145668 1141146138 1141152098 1141152590
1 1 1 1 1 1 1
1141152848 1141152998 1141153490 1141156858 1141157252 1141157260 1141157364
1 1 1 1 1 1 1
1141157398 1141162824 1141165512 1141168326 1141168554 1141168680 1141168822
1 1 1 1 1 1 1
1141168936 1141168944 1141169666 1141171496 1141171502 1141171646 1141172698
1 1 1 1 1 1 1
1141172924 1141173342 1141173872 1141173888 1141174032 1141176732 1141180200
1 1 1 1 1 1 1
1141180212 1141180392 1141181700 1141182628 1141184748 1141187776 1141188146
1 1 1 1 1 1 1
1141189090 1141189132 1141189680 1141190160 1141192248 1141192740 1141192916
1 1 1 1 1 1 1
1141193290 1141193346 1141194800 1141199858 1141199940 1141200004 1141200072
1 1 1 1 1 1 1
1141200708
1
2038460150 - paracetamol - 64
1140868226 - aspirin - 58
1140861958 - simvastatin - 49
99999 - NA
1140865634 - omeprazole - 26
1140860806 - ramipril - 24
1140884600 - metformin - 21
1140871310 - ibuprofen - 19
1141146234 - atorvastatin - 19
1141164060 - pramipexole - 18
1141188442 - glucosamine - 18
1140864752 - lansoprazole - 17
1141194794 - bendroflumethiazide - 17
1140928274 - ropinirole - 15
1141191044 - levothyroxine - 15
I should take away paracetamol, aspirin, ibuprofen as these are common drugs
%%R
ad_pd_med = read.csv("data_output/ukb_adpd_med.csv")
ad_pd_med$simvastatin = apply(ad_pd_med, 1, function(x) any(x %in% c("1140861958")))
ad_pd_med$omeprazole = apply(ad_pd_med, 1, function(x) any(x %in% c("1140865634")))
ad_pd_med$ramipril = apply(ad_pd_med, 1, function(x) any(x %in% c("1140860806")))
ad_pd_med$metformin = apply(ad_pd_med, 1, function(x) any(x %in% c("1140884600")))
ad_pd_med$atorvastatin = apply(ad_pd_med, 1, function(x) any(x %in% c("1141146234")))
ad_pd_med$pramipexole = apply(ad_pd_med, 1, function(x) any(x %in% c("1141164060")))
ad_pd_med$glucosamine = apply(ad_pd_med, 1, function(x) any(x %in% c("1141188442")))
ad_pd_med$lansoprazole = apply(ad_pd_med, 1, function(x) any(x %in% c("1140864752")))
ad_pd_med$bendroflumethiazide = apply(ad_pd_med, 1, function(x) any(x %in% c("1141194794")))
ad_pd_med$ropinirole = apply(ad_pd_med, 1, function(x) any(x %in% c("1140928274")))
ad_pd_med$levothyroxine = apply(ad_pd_med, 1, function(x) any(x %in% c("1141191044")))
%%R
ad_pd_med_subs = subset(ad_pd_med, select = c(eid,simvastatin,omeprazole,ramipril,
metformin,atorvastatin,pramipexole,
glucosamine,lansoprazole,
bendroflumethiazide,ropinirole,levothyroxine))
%%R
ad_pd_merge = merge(ad_pd,ad_pd_med_subs,by="eid")
ad_pd_merge$covid_death = ifelse(ad_pd_merge$result == 1 & ad_pd_merge$death == TRUE,
1, 0)
write.csv(ad_pd_merge, "data_output/ad_pd_drugs_fulldt.csv", row.names=F)
%%R
ad_pd = read.csv("data_output/ad_pd_drugs_fulldt.csv")
colnames(ad_pd)
[1] "eid" "n_cancers" "townsend"
[4] "smoking" "ethnicity" "whistling"
[7] "diabetes" "sex" "peopleInHousehold"
[10] "AverageHouseholdIncome" "result" "PD_diag"
[13] "AD_diag" "highBP" "age"
[16] "no2_val" "nox_val" "so2_val"
[19] "pm10_val" "o3_val" "pm25_val"
[22] "Population" "whr" "edu_level"
[25] "house_type" "ethnicity_sim" "greyVol"
[28] "whiteVol" "brainVol" "wmh"
[31] "edu_score" "crp" "tot_p"
[34] "all_cause_dem_date" "all_cause_dem_source" "AD_date"
[37] "AD_source" "vasc_dem_date" "vasc_dem_source"
[40] "fronto_dem_date" "fronto_dem_source" "allDem_diag"
[43] "frontoDem_diag" "vascDem_diag" "crp_norm"
[46] "wmh_norm" "blood_group" "log_wmh"
[49] "log_crp" "log_edu" "death"
[52] "hydrocortisone" "fluoxetine" "fluvoxamine"
[55] "amantadine" "warfarin" "donepezil"
[58] "galantamine" "memantine" "doxycycline"
[61] "simvastatin" "omeprazole" "ramipril"
[64] "metformin" "atorvastatin" "pramipexole"
[67] "glucosamine" "lansoprazole" "bendroflumethiazide"
[70] "ropinirole" "levothyroxine" "covid_death"
In this dataset, the updated drug data starts from column number 61 and ends on column number 71. Drug names: simvastatin,omeprazole,ramipril, metformin,atorvastatin,pramipexole, glucosamine,lansoprazole, bendroflumethiazide,ropinirole,levothyroxine
%%R
library(arsenal)
ad_med_descrip = subset(ad_pd, AD_diag == TRUE, select = c(covid_death,simvastatin,omeprazole,ramipril,
metformin,atorvastatin,pramipexole,
glucosamine,lansoprazole,
bendroflumethiazide,ropinirole,levothyroxine))
pd_med_descrip = subset(ad_pd, PD_diag == TRUE, select = c(covid_death,simvastatin,omeprazole,ramipril,
metformin,atorvastatin,pramipexole,
glucosamine,lansoprazole,
bendroflumethiazide,ropinirole,levothyroxine))
ad_med_descrip_t <- tableby(covid_death ~ ., data = ad_med_descrip)
pd_med_descrip_t <- tableby(covid_death ~ ., data = pd_med_descrip)
write2word(ad_med_descrip_t, "ad_drugs_ukb_descriptStats_deaths.doc",
keep.md = TRUE,
quiet = TRUE, # passed to rmarkdown::render
title = "Descriptive statistics of the UK Biobank data, drugs in AD patients, 1 = death from covid") # passed to summary.tableby
write2word(pd_med_descrip_t, "pd_drugs_ukb_descriptStats_deaths.doc",
keep.md = TRUE,
quiet = TRUE, # passed to rmarkdown::render
title = "Descriptive statistics of the UK Biobank data, drugs in PD patients, 1 = death from covid") # passed to summary.tableby
In this dataset, the updated drug data starts from column number 61 and ends on column number 71. Drug names: simvastatin,omeprazole,ramipril, metformin,atorvastatin,pramipexole, glucosamine,lansoprazole, bendroflumethiazide,ropinirole,levothyroxine
I will do separate models for AD vs PD
#### Complex model
%%R
# AD only dataset
ad_drugs_dt = subset(ad_pd, AD_diag == TRUE)
colnames(ad_drugs_dt)
[1] "eid" "n_cancers" "townsend"
[4] "smoking" "ethnicity" "whistling"
[7] "diabetes" "sex" "peopleInHousehold"
[10] "AverageHouseholdIncome" "result" "PD_diag"
[13] "AD_diag" "highBP" "age"
[16] "no2_val" "nox_val" "so2_val"
[19] "pm10_val" "o3_val" "pm25_val"
[22] "Population" "whr" "edu_level"
[25] "house_type" "ethnicity_sim" "greyVol"
[28] "whiteVol" "brainVol" "wmh"
[31] "edu_score" "crp" "tot_p"
[34] "all_cause_dem_date" "all_cause_dem_source" "AD_date"
[37] "AD_source" "vasc_dem_date" "vasc_dem_source"
[40] "fronto_dem_date" "fronto_dem_source" "allDem_diag"
[43] "frontoDem_diag" "vascDem_diag" "crp_norm"
[46] "wmh_norm" "blood_group" "log_wmh"
[49] "log_crp" "log_edu" "death"
[52] "hydrocortisone" "fluoxetine" "fluvoxamine"
[55] "amantadine" "warfarin" "donepezil"
[58] "galantamine" "memantine" "doxycycline"
[61] "simvastatin" "omeprazole" "ramipril"
[64] "metformin" "atorvastatin" "pramipexole"
[67] "glucosamine" "lansoprazole" "bendroflumethiazide"
[70] "ropinirole" "levothyroxine" "covid_death"
%%R
simvastatin = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
simvastatin,
family = "binomial", data = ad_drugs_dt)
omeprazole = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
omeprazole,
family = "binomial", data = ad_drugs_dt)
ramipril = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
ramipril,
family = "binomial", data = ad_drugs_dt)
metformin = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
metformin,
family = "binomial", data = ad_drugs_dt)
atorvastatin = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
atorvastatin,
family = "binomial", data = ad_drugs_dt)
# pramipexole = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
# whr + sex + age + ethnicity_sim +
# log_crp +
# pramipexole,
# family = "binomial", data = ad_drugs_dt)
glucosamine = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
glucosamine,
family = "binomial", data = ad_drugs_dt)
lansoprazole = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
lansoprazole,
family = "binomial", data = ad_drugs_dt)
bendroflumethiazide = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
bendroflumethiazide,
family = "binomial", data = ad_drugs_dt)
# ropinirole = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
# whr + sex + age + ethnicity_sim +
# log_crp +
# ropinirole,
# family = "binomial", data = ad_drugs_dt)
levothyroxine = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
levothyroxine,
family = "binomial", data = ad_drugs_dt)
AD_drugs_df = data.frame(drug = c("simvastatin","omeprazole","ramipril",
"metformin","atorvastatin",
# "pramipexole",
"glucosamine","lansoprazole",
"bendroflumethiazide",
# "ropinirole",
"levothyroxine"),
OR = c(summary(simvastatin)$coefficients[10,1],
summary(omeprazole)$coefficients[10,1],
summary(ramipril)$coefficients[10,1],
summary(metformin)$coefficients[10,1],
summary(atorvastatin)$coefficients[10,1],
# summary(pramipexole)$coefficients[10,1],
summary(glucosamine)$coefficients[10,1],
summary(lansoprazole)$coefficients[10,1],
summary(bendroflumethiazide)$coefficients[10,1],
# summary(ropinirole)$coefficients[10,1],
summary(levothyroxine)$coefficients[10,1]),
p_val = c(summary(simvastatin)$coefficients[10,4],
summary(omeprazole)$coefficients[10,4],
summary(ramipril)$coefficients[10,4],
summary(metformin)$coefficients[10,4],
summary(atorvastatin)$coefficients[10,4],
# summary(pramipexole)$coefficients[10,4],
summary(glucosamine)$coefficients[10,4],
summary(lansoprazole)$coefficients[10,4],
summary(bendroflumethiazide)$coefficients[10,4],
# summary(ropinirole)$coefficients[10,4],
summary(levothyroxine)$coefficients[10,4]))
AD_drugs_df
drug OR p_val
1 simvastatin -1.3623323 0.2611871
2 omeprazole 1.6841169 0.2615778
3 ramipril 1.1253548 0.4517741
4 metformin 17.4057213 0.9946937
5 atorvastatin 1.7336503 0.1490702
6 glucosamine 1.4611078 0.2085539
7 lansoprazole -15.5097862 0.9966842
8 bendroflumethiazide -16.9287283 0.9948159
9 levothyroxine 0.9282706 0.4837088
%%R
write.csv(AD_drugs_df,"data_output/AD_drugs_models_full.csv",row.names=F)
#### Models with only age, sex and whr as covariates
%%R
simvastatin = glm(formula = covid_death ~ whr + sex + age +
simvastatin,
family = "binomial", data = ad_drugs_dt)
omeprazole = glm(formula = covid_death ~ whr + sex + age +
omeprazole,
family = "binomial", data = ad_drugs_dt)
ramipril = glm(formula = covid_death ~ whr + sex + age +
ramipril,
family = "binomial", data = ad_drugs_dt)
metformin = glm(formula = covid_death ~ whr + sex + age +
metformin,
family = "binomial", data = ad_drugs_dt)
atorvastatin = glm(formula = covid_death ~ whr + sex + age +
atorvastatin,
family = "binomial", data = ad_drugs_dt)
pramipexole = glm(formula = covid_death ~ whr + sex + age +
pramipexole,
family = "binomial", data = ad_drugs_dt)
glucosamine = glm(formula = covid_death ~ whr + sex + age +
glucosamine,
family = "binomial", data = ad_drugs_dt)
lansoprazole = glm(formula = covid_death ~ whr + sex + age +
lansoprazole,
family = "binomial", data = ad_drugs_dt)
bendroflumethiazide = glm(formula = covid_death ~ whr + sex + age +
bendroflumethiazide,
family = "binomial", data = ad_drugs_dt)
ropinirole = glm(formula = covid_death ~ whr + sex + age +
ropinirole,
family = "binomial", data = ad_drugs_dt)
levothyroxine = glm(formula = covid_death ~ whr + sex + age +
levothyroxine,
family = "binomial", data = ad_drugs_dt)
AD_drugs_df = data.frame(drug = c("simvastatin","omeprazole","ramipril",
"metformin","atorvastatin",
# "pramipexole",
"glucosamine","lansoprazole",
"bendroflumethiazide",
# "ropinirole",
"levothyroxine"),
OR = c(summary(simvastatin)$coefficients[5,1],
summary(omeprazole)$coefficients[5,1],
summary(ramipril)$coefficients[5,1],
summary(metformin)$coefficients[5,1],
summary(atorvastatin)$coefficients[5,1],
# summary(pramipexole)$coefficients[5,1],
summary(glucosamine)$coefficients[5,1],
summary(lansoprazole)$coefficients[5,1],
summary(bendroflumethiazide)$coefficients[5,1],
# summary(ropinirole)$coefficients[5,1],
summary(levothyroxine)$coefficients[5,1]),
p_val = c(summary(simvastatin)$coefficients[5,4],
summary(omeprazole)$coefficients[5,4],
summary(ramipril)$coefficients[5,4],
summary(metformin)$coefficients[5,4],
summary(atorvastatin)$coefficients[5,4],
# summary(pramipexole)$coefficients[5,4],
summary(glucosamine)$coefficients[5,4],
summary(lansoprazole)$coefficients[5,4],
summary(bendroflumethiazide)$coefficients[5,4],
# summary(ropinirole)$coefficients[5,4],
summary(levothyroxine)$coefficients[5,4]))
AD_drugs_df
drug OR p_val
1 simvastatin -1.2312245 0.2742804
2 omeprazole 0.9135951 0.4585437
3 ramipril -0.2833312 0.8075671
4 metformin 0.6881912 0.4810317
5 atorvastatin 0.5498309 0.5564945
6 glucosamine 1.0631918 0.2631753
7 lansoprazole -15.8980908 0.9933369
8 bendroflumethiazide -16.4780011 0.9945993
9 levothyroxine 1.0232247 0.4181616
%%R
write.csv(AD_drugs_df,"data_output/AD_drugs_models_simplified.csv",row.names=F)
%%R
simvastatin = glm(formula = covid_death ~# whr + sex + age +
simvastatin,
family = "binomial", data = ad_drugs_dt)
omeprazole = glm(formula = covid_death ~ #whr + sex + age +
omeprazole,
family = "binomial", data = ad_drugs_dt)
ramipril = glm(formula = covid_death ~ #whr + sex + age +
ramipril,
family = "binomial", data = ad_drugs_dt)
metformin = glm(formula = covid_death ~ #whr + sex + age +
metformin,
family = "binomial", data = ad_drugs_dt)
atorvastatin = glm(formula = covid_death ~ #whr + sex + age +
atorvastatin,
family = "binomial", data = ad_drugs_dt)
pramipexole = glm(formula = covid_death ~ #whr + sex + age +
pramipexole,
family = "binomial", data = ad_drugs_dt)
glucosamine = glm(formula = covid_death ~ #whr + sex + age +
glucosamine,
family = "binomial", data = ad_drugs_dt)
lansoprazole = glm(formula = covid_death ~ #whr + sex + age +
lansoprazole,
family = "binomial", data = ad_drugs_dt)
bendroflumethiazide = glm(formula = covid_death ~ #whr + sex + age +
bendroflumethiazide,
family = "binomial", data = ad_drugs_dt)
ropinirole = glm(formula = covid_death ~ #whr + sex + age +
ropinirole,
family = "binomial", data = ad_drugs_dt)
levothyroxine = glm(formula = covid_death ~ #whr + sex + age +
levothyroxine,
family = "binomial", data = ad_drugs_dt)
AD_drugs_df = data.frame(drug = c("simvastatin","omeprazole","ramipril",
"metformin","atorvastatin",
# "pramipexole",
"glucosamine","lansoprazole",
"bendroflumethiazide",
# "ropinirole",
"levothyroxine"),
OR = c(summary(simvastatin)$coefficients[2,1],
summary(omeprazole)$coefficients[2,1],
summary(ramipril)$coefficients[2,1],
summary(metformin)$coefficients[2,1],
summary(atorvastatin)$coefficients[2,1],
# summary(pramipexole)$coefficients[2,1],
summary(glucosamine)$coefficients[2,1],
summary(lansoprazole)$coefficients[2,1],
summary(bendroflumethiazide)$coefficients[2,1],
# summary(ropinirole)$coefficients[2,1],
summary(levothyroxine)$coefficients[2,1]),
p_val = c(summary(simvastatin)$coefficients[2,4],
summary(omeprazole)$coefficients[2,4],
summary(ramipril)$coefficients[2,4],
summary(metformin)$coefficients[2,4],
summary(atorvastatin)$coefficients[2,4],
# summary(pramipexole)$coefficients[2,4],
summary(glucosamine)$coefficients[2,4],
summary(lansoprazole)$coefficients[2,4],
summary(bendroflumethiazide)$coefficients[2,4],
# summary(ropinirole)$coefficients[2,4],
summary(levothyroxine)$coefficients[2,4]))
AD_drugs_df
drug OR p_val
1 simvastatin -0.9873867 0.3648760
2 omeprazole 0.4274440 0.7104472
3 ramipril -0.2200619 0.8431803
4 metformin 0.3589451 0.6747753
5 atorvastatin 0.5908683 0.4954917
6 glucosamine 1.2383742 0.1765039
7 lansoprazole -15.6201583 0.9936995
8 bendroflumethiazide -15.6639610 0.9916419
9 levothyroxine 0.2305237 0.8393149
%%R
write.csv(AD_drugs_df,"data_output/AD_drugs_models_onlyDrug.csv",row.names=F)
%%R
pd_drugs_dt = subset(ad_pd, PD_diag == TRUE)
%%R
simvastatin = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
simvastatin,
family = "binomial", data = pd_drugs_dt)
omeprazole = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
omeprazole,
family = "binomial", data = pd_drugs_dt)
ramipril = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
ramipril,
family = "binomial", data = pd_drugs_dt)
metformin = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
metformin,
family = "binomial", data = pd_drugs_dt)
atorvastatin = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
atorvastatin,
family = "binomial", data = pd_drugs_dt)
pramipexole = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
pramipexole,
family = "binomial", data = pd_drugs_dt)
glucosamine = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
glucosamine,
family = "binomial", data = pd_drugs_dt)
lansoprazole = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
lansoprazole,
family = "binomial", data = pd_drugs_dt)
bendroflumethiazide = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
bendroflumethiazide,
family = "binomial", data = pd_drugs_dt)
ropinirole = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
ropinirole,
family = "binomial", data = pd_drugs_dt)
levothyroxine = glm(formula = covid_death ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
levothyroxine,
family = "binomial", data = pd_drugs_dt)
PD_drugs_df = data.frame(drug = c("simvastatin","omeprazole","ramipril",
"metformin","atorvastatin",
"pramipexole",
"glucosamine","lansoprazole",
"bendroflumethiazide",
"ropinirole",
"levothyroxine"),
OR = c(summary(simvastatin)$coefficients[10,1],
summary(omeprazole)$coefficients[10,1],
summary(ramipril)$coefficients[10,1],
summary(metformin)$coefficients[10,1],
summary(atorvastatin)$coefficients[10,1],
summary(pramipexole)$coefficients[10,1],
summary(glucosamine)$coefficients[10,1],
summary(lansoprazole)$coefficients[10,1],
summary(bendroflumethiazide)$coefficients[10,1],
summary(ropinirole)$coefficients[10,1],
summary(levothyroxine)$coefficients[10,1]),
p_val = c(summary(simvastatin)$coefficients[10,4],
summary(omeprazole)$coefficients[10,4],
summary(ramipril)$coefficients[10,4],
summary(metformin)$coefficients[10,4],
summary(atorvastatin)$coefficients[10,4],
summary(pramipexole)$coefficients[10,4],
summary(glucosamine)$coefficients[10,4],
summary(lansoprazole)$coefficients[10,4],
summary(bendroflumethiazide)$coefficients[10,4],
summary(ropinirole)$coefficients[10,4],
summary(levothyroxine)$coefficients[10,4]))
PD_drugs_df
drug OR p_val
1 simvastatin 0.4658756 0.7674519
2 omeprazole 1.2807060 0.5388554
3 ramipril -18.2709600 0.9964870
4 metformin -0.7207236 0.9999425
5 atorvastatin -12.6315602 0.9982307
6 pramipexole -16.7533019 0.9963677
7 glucosamine -16.7480751 0.9972084
8 lansoprazole -17.7079015 0.9967769
9 bendroflumethiazide 0.8652933 0.5511678
10 ropinirole 2.2390895 0.1834277
11 levothyroxine 20.0595389 0.9955084
%%R
write.csv(PD_drugs_df,"data_output/PD_drugs_models_full.csv",row.names=F)
I went through the drugs one by one here, but did not find any significance in any of these drugs.
%%R
simvastatin = glm(formula = covid_death ~ whr + sex + age +
simvastatin,
family = "binomial", data = pd_drugs_dt)
omeprazole = glm(formula = covid_death ~ whr + sex + age +
omeprazole,
family = "binomial", data = pd_drugs_dt)
ramipril = glm(formula = covid_death ~ whr + sex + age +
ramipril,
family = "binomial", data = pd_drugs_dt)
metformin = glm(formula = covid_death ~ whr + sex + age +
metformin,
family = "binomial", data = pd_drugs_dt)
atorvastatin = glm(formula = covid_death ~ whr + sex + age +
atorvastatin,
family = "binomial", data = pd_drugs_dt)
pramipexole = glm(formula = covid_death ~ whr + sex + age +
pramipexole,
family = "binomial", data = pd_drugs_dt)
glucosamine = glm(formula = covid_death ~ whr + sex + age +
glucosamine,
family = "binomial", data = pd_drugs_dt)
lansoprazole = glm(formula = covid_death ~ whr + sex + age +
lansoprazole,
family = "binomial", data = pd_drugs_dt)
bendroflumethiazide = glm(formula = covid_death ~ whr + sex + age +
bendroflumethiazide,
family = "binomial", data = pd_drugs_dt)
ropinirole = glm(formula = covid_death ~ whr + sex + age +
ropinirole,
family = "binomial", data = pd_drugs_dt)
levothyroxine = glm(formula = covid_death ~ whr + sex + age +
levothyroxine,
family = "binomial", data = pd_drugs_dt)
PD_drugs_df = data.frame(drug = c("simvastatin","omeprazole","ramipril",
"metformin","atorvastatin",
"pramipexole",
"glucosamine","lansoprazole",
"bendroflumethiazide",
"ropinirole",
"levothyroxine"),
OR = c(summary(simvastatin)$coefficients[5,1],
summary(omeprazole)$coefficients[5,1],
summary(ramipril)$coefficients[5,1],
summary(metformin)$coefficients[5,1],
summary(atorvastatin)$coefficients[5,1],
summary(pramipexole)$coefficients[5,1],
summary(glucosamine)$coefficients[5,1],
summary(lansoprazole)$coefficients[5,1],
summary(bendroflumethiazide)$coefficients[5,1],
summary(ropinirole)$coefficients[5,1],
summary(levothyroxine)$coefficients[5,1]),
p_val = c(summary(simvastatin)$coefficients[5,4],
summary(omeprazole)$coefficients[5,4],
summary(ramipril)$coefficients[5,4],
summary(metformin)$coefficients[5,4],
summary(atorvastatin)$coefficients[5,4],
summary(pramipexole)$coefficients[5,4],
summary(glucosamine)$coefficients[5,4],
summary(lansoprazole)$coefficients[5,4],
summary(bendroflumethiazide)$coefficients[5,4],
summary(ropinirole)$coefficients[5,4],
summary(levothyroxine)$coefficients[5,4]))
PD_drugs_df
drug OR p_val
1 simvastatin 0.07162034 0.95304938
2 omeprazole 1.74720240 0.09091277
3 ramipril -16.50258464 0.99512349
4 metformin -14.88151427 0.99361670
5 atorvastatin -14.11293706 0.99438294
6 pramipexole -16.46162860 0.99438476
7 glucosamine -14.93117251 0.99368414
8 lansoprazole -15.91273856 0.99531145
9 bendroflumethiazide 0.89648212 0.46777092
10 ropinirole 1.43610977 0.26555036
11 levothyroxine 1.65734261 0.28396684
%%R
summary(omeprazole)
Call:
glm(formula = covid_death ~ whr + sex + age + omeprazole, family = "binomial",
data = pd_drugs_dt)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.97748 -0.27678 -0.16898 -0.09128 2.92397
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.1369 12.1535 -0.587 0.5570
whr -14.0625 6.9706 -2.017 0.0437 *
sexMale 1.6665 1.1387 1.464 0.1433
age 0.1955 0.1369 1.428 0.1532
omeprazoleTRUE 1.7472 1.0335 1.691 0.0909 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 50.785 on 154 degrees of freedom
Residual deviance: 39.988 on 150 degrees of freedom
(2 observations deleted due to missingness)
AIC: 49.988
Number of Fisher Scoring iterations: 7
%%R
write.csv(PD_drugs_df,"data_output/PD_drugs_models_simplified.csv",row.names=F)
#### Model with only drug - raw odds
%%R
simvastatin = glm(formula = covid_death ~ #whr + sex + age +
simvastatin,
family = "binomial", data = pd_drugs_dt)
omeprazole = glm(formula = covid_death ~ #whr + sex + age +
omeprazole,
family = "binomial", data = pd_drugs_dt)
ramipril = glm(formula = covid_death ~ #whr + sex + age +
ramipril,
family = "binomial", data = pd_drugs_dt)
metformin = glm(formula = covid_death ~ #whr + sex + age +
metformin,
family = "binomial", data = pd_drugs_dt)
atorvastatin = glm(formula = covid_death ~ #whr + sex + age +
atorvastatin,
family = "binomial", data = pd_drugs_dt)
pramipexole = glm(formula = covid_death ~ #whr + sex + age +
pramipexole,
family = "binomial", data = pd_drugs_dt)
glucosamine = glm(formula = covid_death ~ #whr + sex + age +
glucosamine,
family = "binomial", data = pd_drugs_dt)
lansoprazole = glm(formula = covid_death ~ #whr + sex + age +
lansoprazole,
family = "binomial", data = pd_drugs_dt)
bendroflumethiazide = glm(formula = covid_death ~ #whr + sex + age +
bendroflumethiazide,
family = "binomial", data = pd_drugs_dt)
ropinirole = glm(formula = covid_death ~ #whr + sex + age +
ropinirole,
family = "binomial", data = pd_drugs_dt)
levothyroxine = glm(formula = covid_death ~ #whr + sex + age +
levothyroxine,
family = "binomial", data = pd_drugs_dt)
PD_drugs_df = data.frame(drug = c("simvastatin","omeprazole","ramipril",
"metformin","atorvastatin",
"pramipexole",
"glucosamine","lansoprazole",
"bendroflumethiazide",
"ropinirole",
"levothyroxine"),
OR = c(summary(simvastatin)$coefficients[2,1],
summary(omeprazole)$coefficients[2,1],
summary(ramipril)$coefficients[2,1],
summary(metformin)$coefficients[2,1],
summary(atorvastatin)$coefficients[2,1],
summary(pramipexole)$coefficients[2,1],
summary(glucosamine)$coefficients[2,1],
summary(lansoprazole)$coefficients[2,1],
summary(bendroflumethiazide)$coefficients[2,1],
summary(ropinirole)$coefficients[2,1],
summary(levothyroxine)$coefficients[2,1]),
p_val = c(summary(simvastatin)$coefficients[2,4],
summary(omeprazole)$coefficients[2,4],
summary(ramipril)$coefficients[2,4],
summary(metformin)$coefficients[2,4],
summary(atorvastatin)$coefficients[2,4],
summary(pramipexole)$coefficients[2,4],
summary(glucosamine)$coefficients[2,4],
summary(lansoprazole)$coefficients[2,4],
summary(bendroflumethiazide)$coefficients[2,4],
summary(ropinirole)$coefficients[2,4],
summary(levothyroxine)$coefficients[2,4]))
PD_drugs_df
drug OR p_val
1 simvastatin -0.2148447 0.8471060
2 omeprazole 1.2452158 0.1665432
3 ramipril -15.4378471 0.9929342
4 metformin -15.3949833 0.9946736
5 atorvastatin -15.4020009 0.9943479
6 pramipexole -16.4674789 0.9948164
7 glucosamine -15.4161856 0.9937456
8 lansoprazole -15.4305743 0.9931944
9 bendroflumethiazide 1.1491646 0.3168656
10 ropinirole 0.8401294 0.4595782
11 levothyroxine 1.4144652 0.2234264
%%R
write.csv(PD_drugs_df,"data_output/PD_drugs_models_drugsOnly.csv",row.names=F)
### AD infection models
%%R
ad_pd = read.csv("data_output/ad_pd_drugs_fulldt.csv")
library(arsenal)
pd_med_descrip_cases = subset(ad_pd, PD_diag == TRUE, select = c(result,simvastatin,omeprazole,ramipril,
metformin,atorvastatin,pramipexole,
glucosamine,lansoprazole,
bendroflumethiazide,ropinirole,levothyroxine))
ad_med_descrip_cases = subset(ad_pd, AD_diag == TRUE, select = c(result,simvastatin,omeprazole,ramipril,
metformin,atorvastatin,pramipexole,
glucosamine,lansoprazole,
bendroflumethiazide,ropinirole,levothyroxine))
pd_med_descrip_t <- tableby(result ~ ., data = pd_med_descrip_cases)
ad_med_descrip_t <- tableby(result ~ ., data = ad_med_descrip_cases)
write2word(ad_med_descrip_t, "ad_drugs_ukb_descriptStats_cases.doc",
keep.md = TRUE,
quiet = TRUE, # passed to rmarkdown::render
title = "Descriptive statistics of the UK Biobank data, drugs in AD patients, COVID-19 positive = 1") # passed to summary.tableby
write2word(pd_med_descrip_t, "pd_drugs_ukb_descriptStats_cases.doc",
keep.md = TRUE,
quiet = TRUE, # passed to rmarkdown::render
title = "Descriptive statistics of the UK Biobank data, drugs in PD patients, COVID-19 positive = 1") # passed to summary.tableby
#### Complex model
%%R
ad_drugs_dt = subset(ad_pd, AD_diag == TRUE)
simvastatin = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
simvastatin,
family = "binomial", data = ad_drugs_dt)
omeprazole = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
omeprazole,
family = "binomial", data = ad_drugs_dt)
ramipril = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
ramipril,
family = "binomial", data = ad_drugs_dt)
metformin = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
metformin,
family = "binomial", data = ad_drugs_dt)
atorvastatin = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
atorvastatin,
family = "binomial", data = ad_drugs_dt)
# pramipexole = glm(formula = result ~ n_cancers + townsend + diabetes +
# whr + sex + age + ethnicity_sim +
# log_crp +
# pramipexole,
# family = "binomial", data = ad_drugs_dt)
glucosamine = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
glucosamine,
family = "binomial", data = ad_drugs_dt)
lansoprazole = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
lansoprazole,
family = "binomial", data = ad_drugs_dt)
bendroflumethiazide = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
bendroflumethiazide,
family = "binomial", data = ad_drugs_dt)
# ropinirole = glm(formula = result ~ n_cancers + townsend + diabetes +
# whr + sex + age + ethnicity_sim +
# log_crp +
# ropinirole,
# family = "binomial", data = ad_drugs_dt)
levothyroxine = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
levothyroxine,
family = "binomial", data = ad_drugs_dt)
AD_drugs_df = data.frame(drug = c("simvastatin","omeprazole","ramipril",
"metformin","atorvastatin",
# "pramipexole",
"glucosamine","lansoprazole",
"bendroflumethiazide",
# "ropinirole",
"levothyroxine"),
OR = c(summary(simvastatin)$coefficients[10,1],
summary(omeprazole)$coefficients[10,1],
summary(ramipril)$coefficients[10,1],
summary(metformin)$coefficients[10,1],
summary(atorvastatin)$coefficients[10,1],
# summary(pramipexole)$coefficients[10,1],
summary(glucosamine)$coefficients[10,1],
summary(lansoprazole)$coefficients[10,1],
summary(bendroflumethiazide)$coefficients[10,1],
# summary(ropinirole)$coefficients[10,1],
summary(levothyroxine)$coefficients[10,1]),
p_val = c(summary(simvastatin)$coefficients[10,4],
summary(omeprazole)$coefficients[10,4],
summary(ramipril)$coefficients[10,4],
summary(metformin)$coefficients[10,4],
summary(atorvastatin)$coefficients[10,4],
# summary(pramipexole)$coefficients[10,4],
summary(glucosamine)$coefficients[10,4],
summary(lansoprazole)$coefficients[10,4],
summary(bendroflumethiazide)$coefficients[10,4],
# summary(ropinirole)$coefficients[10,4],
summary(levothyroxine)$coefficients[10,4]))
AD_drugs_df
drug OR p_val
1 simvastatin 0.36144397 0.61416383
2 omeprazole 0.67235271 0.62079398
3 ramipril 0.76086798 0.49491516
4 metformin -0.04267783 0.97664534
5 atorvastatin 1.79817407 0.06132538
6 glucosamine 0.79224227 0.45546781
7 lansoprazole -16.17441979 0.99422477
8 bendroflumethiazide -1.14082976 0.34534770
9 levothyroxine -0.77575472 0.52331018
#### Simplified model
%%R
simvastatin = glm(formula = result ~ whr + sex + age +
simvastatin,
family = "binomial", data = ad_drugs_dt)
omeprazole = glm(formula = result ~ whr + sex + age +
omeprazole,
family = "binomial", data = ad_drugs_dt)
ramipril = glm(formula = result ~ whr + sex + age +
ramipril,
family = "binomial", data = ad_drugs_dt)
metformin = glm(formula = result ~ whr + sex + age +
metformin,
family = "binomial", data = ad_drugs_dt)
atorvastatin = glm(formula = result ~ whr + sex + age +
atorvastatin,
family = "binomial", data = ad_drugs_dt)
pramipexole = glm(formula = result ~ whr + sex + age +
pramipexole,
family = "binomial", data = ad_drugs_dt)
glucosamine = glm(formula = result ~ whr + sex + age +
glucosamine,
family = "binomial", data = ad_drugs_dt)
lansoprazole = glm(formula = result ~ whr + sex + age +
lansoprazole,
family = "binomial", data = ad_drugs_dt)
bendroflumethiazide = glm(formula = result ~ whr + sex + age +
bendroflumethiazide,
family = "binomial", data = ad_drugs_dt)
ropinirole = glm(formula = result ~ whr + sex + age +
ropinirole,
family = "binomial", data = ad_drugs_dt)
levothyroxine = glm(formula = result ~ whr + sex + age +
levothyroxine,
family = "binomial", data = ad_drugs_dt)
AD_drugs_df = data.frame(drug = c("simvastatin","omeprazole","ramipril",
"metformin","atorvastatin",
# "pramipexole",
"glucosamine","lansoprazole",
"bendroflumethiazide",
# "ropinirole",
"levothyroxine"),
OR = c(summary(simvastatin)$coefficients[5,1],
summary(omeprazole)$coefficients[5,1],
summary(ramipril)$coefficients[5,1],
summary(metformin)$coefficients[5,1],
summary(atorvastatin)$coefficients[5,1],
# summary(pramipexole)$coefficients[5,1],
summary(glucosamine)$coefficients[5,1],
summary(lansoprazole)$coefficients[5,1],
summary(bendroflumethiazide)$coefficients[5,1],
# summary(ropinirole)$coefficients[5,1],
summary(levothyroxine)$coefficients[5,1]),
p_val = c(summary(simvastatin)$coefficients[5,4],
summary(omeprazole)$coefficients[5,4],
summary(ramipril)$coefficients[5,4],
summary(metformin)$coefficients[5,4],
summary(atorvastatin)$coefficients[5,4],
# summary(pramipexole)$coefficients[5,4],
summary(glucosamine)$coefficients[5,4],
summary(lansoprazole)$coefficients[5,4],
summary(bendroflumethiazide)$coefficients[5,4],
# summary(ropinirole)$coefficients[5,4],
summary(levothyroxine)$coefficients[5,4]))
AD_drugs_df
drug OR p_val
1 simvastatin -0.094957715 0.8764135
2 omeprazole 0.322974403 0.7303070
3 ramipril -0.198887226 0.7973555
4 metformin -0.001851328 0.9979772
5 atorvastatin 0.116301789 0.8707328
6 glucosamine 1.059994442 0.1990808
7 lansoprazole -16.116506142 0.9888694
8 bendroflumethiazide -1.027919542 0.3589906
9 levothyroxine -0.779117535 0.4958791
### PD infection models
%%R
pd_drugs_dt = subset(ad_pd, PD_diag == TRUE)
simvastatin = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
simvastatin,
family = "binomial", data = pd_drugs_dt)
omeprazole = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
omeprazole,
family = "binomial", data = pd_drugs_dt)
ramipril = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
ramipril,
family = "binomial", data = pd_drugs_dt)
metformin = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
metformin,
family = "binomial", data = pd_drugs_dt)
atorvastatin = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
atorvastatin,
family = "binomial", data = pd_drugs_dt)
pramipexole = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
pramipexole,
family = "binomial", data = pd_drugs_dt)
glucosamine = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
glucosamine,
family = "binomial", data = pd_drugs_dt)
lansoprazole = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
lansoprazole,
family = "binomial", data = pd_drugs_dt)
bendroflumethiazide = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
bendroflumethiazide,
family = "binomial", data = pd_drugs_dt)
ropinirole = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
ropinirole,
family = "binomial", data = pd_drugs_dt)
levothyroxine = glm(formula = result ~ n_cancers + townsend + diabetes +
whr + sex + age + ethnicity_sim +
log_crp +
levothyroxine,
family = "binomial", data = pd_drugs_dt)
PD_drugs_df = data.frame(drug = c("simvastatin","omeprazole","ramipril",
"metformin","atorvastatin",
"pramipexole",
"glucosamine","lansoprazole",
"bendroflumethiazide",
"ropinirole",
"levothyroxine"),
OR = c(summary(simvastatin)$coefficients[10,1],
summary(omeprazole)$coefficients[10,1],
summary(ramipril)$coefficients[10,1],
summary(metformin)$coefficients[10,1],
summary(atorvastatin)$coefficients[10,1],
summary(pramipexole)$coefficients[10,1],
summary(glucosamine)$coefficients[10,1],
summary(lansoprazole)$coefficients[10,1],
summary(bendroflumethiazide)$coefficients[10,1],
summary(ropinirole)$coefficients[10,1],
summary(levothyroxine)$coefficients[10,1]),
p_val = c(summary(simvastatin)$coefficients[10,4],
summary(omeprazole)$coefficients[10,4],
summary(ramipril)$coefficients[10,4],
summary(metformin)$coefficients[10,4],
summary(atorvastatin)$coefficients[10,4],
summary(pramipexole)$coefficients[10,4],
summary(glucosamine)$coefficients[10,4],
summary(lansoprazole)$coefficients[10,4],
summary(bendroflumethiazide)$coefficients[10,4],
summary(ropinirole)$coefficients[10,4],
summary(levothyroxine)$coefficients[10,4]))
PD_drugs_df
drug OR p_val
1 simvastatin 0.25023003 0.7005671
2 omeprazole -1.16952379 0.3544743
3 ramipril -0.98028857 0.3899831
4 metformin 0.11757809 0.9999603
5 atorvastatin -16.07450429 0.9939753
6 pramipexole -0.27469056 0.7583992
7 glucosamine -1.02818106 0.3594908
8 lansoprazole -0.08191310 0.9463307
9 bendroflumethiazide 0.07602376 0.9351695
10 ropinirole 0.88815282 0.2796343
11 levothyroxine -0.73294971 0.5415217
#### Simplified model
%%R
simvastatin = glm(formula = result ~ whr + sex + age +
simvastatin,
family = "binomial", data = pd_drugs_dt)
omeprazole = glm(formula = covid_death ~ whr + sex + age +
omeprazole,
family = "binomial", data = pd_drugs_dt)
ramipril = glm(formula = result ~ whr + sex + age +
ramipril,
family = "binomial", data = pd_drugs_dt)
metformin = glm(formula = result ~ whr + sex + age +
metformin,
family = "binomial", data = pd_drugs_dt)
atorvastatin = glm(formula = result ~ whr + sex + age +
atorvastatin,
family = "binomial", data = pd_drugs_dt)
pramipexole = glm(formula = result ~ whr + sex + age +
pramipexole,
family = "binomial", data = pd_drugs_dt)
glucosamine = glm(formula = result ~ whr + sex + age +
glucosamine,
family = "binomial", data = pd_drugs_dt)
lansoprazole = glm(formula = result ~ whr + sex + age +
lansoprazole,
family = "binomial", data = pd_drugs_dt)
bendroflumethiazide = glm(formula = result ~ whr + sex + age +
bendroflumethiazide,
family = "binomial", data = pd_drugs_dt)
ropinirole = glm(formula = result ~ whr + sex + age +
ropinirole,
family = "binomial", data = pd_drugs_dt)
levothyroxine = glm(formula = result ~ whr + sex + age +
levothyroxine,
family = "binomial", data = pd_drugs_dt)
PD_drugs_df = data.frame(drug = c("simvastatin","omeprazole","ramipril",
"metformin","atorvastatin",
"pramipexole",
"glucosamine","lansoprazole",
"bendroflumethiazide",
"ropinirole",
"levothyroxine"),
OR = c(summary(simvastatin)$coefficients[5,1],
summary(omeprazole)$coefficients[5,1],
summary(ramipril)$coefficients[5,1],
summary(metformin)$coefficients[5,1],
summary(atorvastatin)$coefficients[5,1],
summary(pramipexole)$coefficients[5,1],
summary(glucosamine)$coefficients[5,1],
summary(lansoprazole)$coefficients[5,1],
summary(bendroflumethiazide)$coefficients[5,1],
summary(ropinirole)$coefficients[5,1],
summary(levothyroxine)$coefficients[5,1]),
p_val = c(summary(simvastatin)$coefficients[5,4],
summary(omeprazole)$coefficients[5,4],
summary(ramipril)$coefficients[5,4],
summary(metformin)$coefficients[5,4],
summary(atorvastatin)$coefficients[5,4],
summary(pramipexole)$coefficients[5,4],
summary(glucosamine)$coefficients[5,4],
summary(lansoprazole)$coefficients[5,4],
summary(bendroflumethiazide)$coefficients[5,4],
summary(ropinirole)$coefficients[5,4],
summary(levothyroxine)$coefficients[5,4]))
PD_drugs_df
drug OR p_val
1 simvastatin 0.24650610 0.64874657
2 omeprazole 1.74720240 0.09091277
3 ramipril -1.27956865 0.23458240
4 metformin -0.20326644 0.85681866
5 atorvastatin -15.85869318 0.99028600
6 pramipexole -0.61906401 0.44903942
7 glucosamine -0.83502936 0.43646734
8 lansoprazole -0.90239215 0.39972759
9 bendroflumethiazide 0.03655082 0.96506836
10 ropinirole 0.45563021 0.52247216
11 levothyroxine -0.50047378 0.66130760
%%R
ukb_covid_death_covid_red.b_odds =
data.frame(cbind(exp(cbind(OR = coef(ukb_covid_death_covid_red.b),
conf_int_low = summary(ukb_covid_death_covid_red.b)$coefficients[,1] -
summary(ukb_covid_death_covid_red.b)$coefficients[,2],
conf_int_high = summary(ukb_covid_death_covid_red.b)$coefficients[,1] +
summary(ukb_covid_death_covid_red.b)$coefficients[,2])),
p_value = summary(ukb_covid_death_covid_red.b)$coefficients[,4]))
write.csv(ukb_covid_death_covid_red.b_odds,"data_output/ukb_covid_deathAndCOVID_odds_simplifiedModel.csv", row.names=TRUE)
ukb_covid_death_covid_red.b_odds$significance = "p-value > 0.05"
ukb_covid_death_covid_red.b_odds$significance[ukb_covid_death_covid_red.b_odds$p_value < 0.05] <- "p-value < 0.05"
ukb_covid_death_covid_red.b_odds$variables = row.names(ukb_covid_death_covid_red.b_odds)
library(ggplot2)
ggplot(ukb_covid_death_covid_red.b_odds,
aes(x=reorder(variables, OR), y=OR, color = significance)) +
geom_point(shape=21, size = 2) +
geom_errorbar(aes(ymin=conf_int_low, ymax=conf_int_high),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
theme_classic() +
geom_hline(yintercept = 1, linetype="dotted") +
coord_flip()+ylab("Mortality odds ratios, death and COVID-19, simplified model") +
xlab("")
ggsave('fig_out/UKB_deathAndCOVID_ORplot_simplifiedModel.png', width = 6, height = 4)