Genetic Associations and Architecture of Asthma-COPD Overlap

Background: Some people have characteristics of both asthma and COPD (asthma-COPD overlap), and evidence suggests they experience worse outcomes than those with either condition alone.
Research Question: What is the genetic architecture of asthma-COPD overlap, and do the determinants of risk for asthma-COPD overlap differ from those for COPD or asthma?
Study Design and Methods: We conducted a genome-wide association study in 8,068 asthma-COPD overlap case subjects and 40,360 control subjects without asthma or COPD of European ancestry in UK Biobank (stage 1). We followed up promising signals (P < 5 × 10⁻⁶) that remained associated in analyses comparing (1) asthma-COPD overlap vs asthma-only control subjects, and (2) asthma-COPD overlap vs COPD-only control subjects. These variants were analyzed in 12 independent cohorts (stage 2).
Results: We selected 31 independent variants for further investigation in stage 2, and discovered eight novel signals (P < 5 × 10⁻⁸) for asthma-COPD overlap (meta-analysis of stage 1 and 2 studies). These signals suggest a spectrum of shared genetic influences, some predominantly influencing asthma (FAM105A, GLB1, PHB, TSLP), others predominantly influencing fixed airflow obstruction (IL17RD, C5orf56, HLA-DQB1). One intergenic signal on chromosome 5 had not been previously associated with asthma, COPD, or lung function. Subgroup analyses suggested that associations at these eight signals were not driven by smoking or age at asthma diagnosis, and in phenome-wide scans, eosinophil counts, atopy, and asthma traits were prominent.
Interpretation: We identified eight signals for asthma-COPD overlap, which may represent loci that predispose to type 2 inflammation, and serious long-term consequences of asthma.

Asthma and COPD have substantial global impacts. 1 They are heterogeneous conditions 2-4 that share some common features, including airflow obstruction with differing degrees of reversibility. Inflammatory processes are important in both conditions, and cytokine profiles identify both distinct and overlapping groups of patients. 5 People with characteristics of both conditions have until recently been referred to as having "asthma-COPD overlap" (ACO), 4 and a number of studies have suggested that such patients have significantly worse outcomes than those with either condition alone. [6][7][8][9][10][11][12][13] Guidelines emphasize that asthma and COPD are different conditions, but may coexist in the same patient. 14 People with features of both diseases risk being excluded from research that might provide evidence about the most effective treatment strategies. 3
Take-home Points
Study Question: What are the genetic determinants of risk for asthma-COPD overlap, and how do these differ from those for COPD or asthma?
Results: We discovered eight novel signals for asthma-COPD overlap in a meta-analysis of 12,369 case subjects and 88,969 control subjects; most signals suggested a spectrum of shared genetic influences on asthma, COPD, or lung function, and in phenome-wide scans of these signals, eosinophil counts, atopy, and asthma traits were prominent.
Interpretation: We identified eight signals for asthma-COPD overlap, not driven by smoking or age at asthma diagnosis, which may represent loci that predispose to type 2 inflammation, and serious long-term consequences of asthma.
ABBREVIATIONS: ACO = asthma-COPD overlap; EAF = effect allele frequency; eQTL = expression quantitative trait locus; GWAS = genome-wide association study; LDSC = linkage disequilibrium score regression; rg = genetic correlation; SNP = single-nucleotide polymorphism
Environmental risk factors, notably smoking in COPD, are central, but genetics also plays an important role in both asthma and COPD, [15][16][17] and it has long been hypothesized that there may be a shared, underlying genetic predisposition to both diseases. 2,18 Genome-wide association studies (GWASs) examine variants across the genome in an unbiased manner, to identify variant-trait associations that inform our understanding of disease biology and potential treatment strategies. GWASs have identified many loci associated with asthma or COPD in European populations (e-Appendix 1). The genetic correlation (rg) between asthma and COPD is 0.38 (P = 6.2 × 10⁻⁵), suggesting a shared genetic etiology. 19 A GWAS of ACO compared with COPD alone (n = 3,570) did not identify any variants associated at the conventional threshold, 8 and a meta-analysis of an asthma and COPD GWAS found one association, driven by COPD. 20 Eighteen loci outside the HLA (human leukocyte antigen) region have been identified as associated with both asthma and lung function/COPD at P < 5 × 10⁻⁸, but have not been specifically described as ACO loci.
Notwithstanding the controversies of changing terminology for people with both asthma and COPD, we refer to this case status as "ACO." Improved knowledge of genetic variants associated with coexisting asthma and COPD would contribute to our understanding of underlying molecular pathways, and potentially inform diagnostic terminology and specific management strategies for those with coexisting asthma and COPD. Accordingly, using spirometry, self-report, and electronic health care record data to define case subjects with both asthma and COPD (ACO) and suitable control subjects, we undertook the largest GWAS of coexisting asthma and COPD to date, including up to 12,369 case subjects and 88,969 control subjects, in a two-stage design incorporating 13 studies.

Stage 1
The data source for this study was UK Biobank. 21 Eligibility criteria, genotyping, and quality control are described in e-Appendix 1. A total of 321,057 people and 37 million single-nucleotide polymorphisms (SNPs) were included.
We defined participants as ACO case subjects if they had self-reported asthma (see e-Appendix 1) and FEV1/FVC < 0.7 with Global Initiative for Chronic Obstructive Lung Disease (GOLD) 2+ airflow limitation (FEV1 < 80% predicted). Case subjects who reported alpha-1 antitrypsin deficiency were excluded. Control subjects reported no asthma or COPD (e-Appendix 1), and had FEV1 ≥ 80% predicted and FEV1/FVC > 0.7. Five control subjects were randomly selected for each case. Case subjects and control subjects were unrelated (no pair related to the second degree or closer). Two additional control sets were defined for signal prioritization: people with asthma but without COPD, and people with COPD but without asthma. Asthma and COPD were defined as above.
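The case and control definitions above can be sketched as a simple classification rule. This is an illustration only, not the study's code: the `Participant` record, its field names, and the returned labels are hypothetical, and the full phenotype definitions in e-Appendix 1 are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical record; field names are illustrative assumptions.
    self_reported_asthma: bool
    reported_copd: bool
    fev1_fvc: float                 # FEV1/FVC ratio
    fev1_pct_predicted: float       # FEV1 as % of predicted
    alpha1_antitrypsin_deficiency: bool = False


def classify(p: Participant) -> str:
    """Apply the stage 1 thresholds stated in the text."""
    # GOLD 2+ airflow limitation: FEV1/FVC < 0.7 and FEV1 < 80% predicted.
    obstructed = p.fev1_fvc < 0.7 and p.fev1_pct_predicted < 80
    # Control spirometry: FEV1/FVC > 0.7 and FEV1 >= 80% predicted.
    normal = p.fev1_fvc > 0.7 and p.fev1_pct_predicted >= 80

    if p.self_reported_asthma and obstructed:
        if p.alpha1_antitrypsin_deficiency:
            return "excluded"
        return "ACO case"
    if not p.self_reported_asthma and not p.reported_copd and normal:
        return "control"
    # Asthma-only and COPD-only sets are not modeled in this sketch.
    return "other"
```

Random selection of five control subjects per case and the relatedness filter would be applied after this step.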
Association testing was undertaken in SNPTEST, version 2.5.2 ("score" option), 22 under an additive model. Age, sex, smoking status (ever/never), genotyping array, and 10 principal components were included as covariates. Variants were filtered on the basis of minor allele frequency (MAF) > 0.01 and imputation quality (INFO) > 0.5. P values and SEs were adjusted for the linkage disequilibrium score regression (LDSC) intercept 23 (e-Fig 1).
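One common way to adjust summary statistics for an LDSC intercept is to inflate each SE by the square root of the intercept and recompute the P value. The sketch below assumes that approach; the text does not spell out the exact procedure used, and the function name is hypothetical.

```python
import math


def adjust_for_ldsc_intercept(beta: float, se: float, intercept: float):
    """Return (adjusted SE, adjusted two-sided P value).

    Assumption: the chi-square statistic is divided by the intercept,
    which is equivalent to multiplying the SE by sqrt(intercept).
    """
    se_adj = se * math.sqrt(intercept)
    z = beta / se_adj
    # Two-sided P value under a standard normal null.
    p_adj = math.erfc(abs(z) / math.sqrt(2))
    return se_adj, p_adj
```

With an intercept close to 1, such as the 1.018 reported in the Results, this adjustment changes SEs by under 1%.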
In stage 1, we defined distinct signals passing a P value threshold of P < 5 × 10⁻⁶. We defined regions of association around the most strongly associated variant (sentinel variant) ± 1 Mb. To identify distinct signals, and additional signals within the regions described above, conditional analyses were undertaken with GCTA-COJO 24 (e-Appendix 1, e-Fig 2).
Two further "signal prioritization" analyses were undertaken to ascertain the extent to which signals were driven by association with COPD and/or asthma alone. These included the same cases as the primary analysis, plus the two additional control sets described above. Variants were selected for follow-up in stage 2 if they were associated at P < 5 × 10⁻⁶ in the main stage 1 analysis and at P < .01 in both signal prioritization analyses.
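The selection rule for stage 2 follow-up amounts to three P value filters applied together. A minimal sketch, assuming each variant is a dictionary with hypothetical keys for the three stage 1 analyses:

```python
MAIN_THRESHOLD = 5e-6            # main stage 1 analysis
PRIORITIZATION_THRESHOLD = 0.01  # each signal prioritization analysis


def select_for_stage2(variants):
    """Return IDs of variants passing all three thresholds.

    Keys 'p_main', 'p_vs_asthma_only', and 'p_vs_copd_only' are
    illustrative names, not taken from the study's data files.
    """
    return [
        v["id"]
        for v in variants
        if v["p_main"] < MAIN_THRESHOLD
        and v["p_vs_asthma_only"] < PRIORITIZATION_THRESHOLD
        and v["p_vs_copd_only"] < PRIORITIZATION_THRESHOLD
    ]
```

A variant that is strongly associated in the main analysis but not against asthma-only control subjects would be dropped, on the reasoning that its signal may reflect asthma alone.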

Stage 2 and Joint Analysis
SNPs identified in stage 1 signal prioritization analyses were tested for association in 12 independent studies of European ancestry (up to 4,301 case subjects and 48,609 control subjects). Case subjects had both asthma and COPD. Asthma was defined as any lifetime self-report of asthma, or asthma diagnosis in the health care record, including billing codes (see e-Appendix 1 for further details and validation). 25 All case subjects had spirometry results indicating FEV1/FVC < 0.7, and FEV1 < 80% predicted. All control subjects had FEV1/FVC > 0.7, FEV1 ≥ 80% predicted, and no asthma diagnosis. When possible, studies excluded people with alpha-1 antitrypsin deficiency.
Details of statistical analysis in stage 2 studies are presented in e-Appendix 1 (e-Table 3). Results were combined across stage 2 studies, using fixed-effect meta-analysis. Heterogeneity was assessed using the I² statistic. We combined these results with those from UK Biobank (stage 1).
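For reference, a fixed-effect meta-analysis is commonly computed by inverse-variance weighting, with I² derived from Cochran's Q. The sketch below assumes that standard formulation on per-study log odds ratios; the software actually used is described in e-Appendix 1, and the function name is hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted pooled estimate, its SE, and I² (%)."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I² = max(0, (Q - df) / Q), expressed as a percentage.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return beta, se, i2
```

The same function could combine the pooled stage 2 estimate with the stage 1 estimate, since a fixed-effect combination of two estimates is again an inverse-variance weighted average.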
We performed a sensitivity analysis to assess whether the way COPD was defined changed our stage 2 results (e-Appendix 1).
To assess whether associations with our stage 1 signals changed according to age at asthma diagnosis, we divided case subjects into those who self-reported their age at asthma diagnosis as < 12 years and those who reported it as > 25 years. 26 We then repeated the association tests in UK Biobank. In addition, we repeated association testing after stratifying our sample into ever/never smokers.

Definition of Top Signals for Bioinformatic Analyses
We undertook bioinformatic analyses on ACO signals reaching P < 5 × 10⁻⁸ in the joint analysis of stages 1 and 2, and which also had a lower P value in the joint analysis than in UK Biobank (stage 1) alone or had P < .05 in stage 2. For each of these, we identified the set of SNPs that was 99% likely to contain the causal variant ("99% credible set"), assuming that the causal variant was included in the data set (e-Appendix 1). 27 For bioinformatic analysis methods, see e-Appendix 1.
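A 99% credible set is often built from approximate Bayes factors under the assumption of a single causal variant per region. The sketch below uses Wakefield's approximation with a prior variance of 0.04 on the log odds ratio; both the method and the prior are assumptions made for illustration, as the exact procedure is given only in e-Appendix 1.

```python
import math


def credible_set(variants, coverage=0.99, prior_var=0.04):
    """Return [(variant_id, posterior_probability), ...] for the credible set.

    variants: list of (variant_id, beta, se) tuples for one region.
    Assumes exactly one causal variant, present in the input.
    """
    log_abfs = []
    for _, beta, se in variants:
        v = se ** 2
        z2 = (beta / se) ** 2
        # Wakefield log approximate Bayes factor in favor of association.
        log_abfs.append(
            0.5 * math.log(v / (v + prior_var))
            + 0.5 * z2 * prior_var / (v + prior_var)
        )

    # Normalize in log space to avoid overflow for large z-scores.
    max_log = max(log_abfs)
    weights = [math.exp(l - max_log) for l in log_abfs]
    total = sum(weights)
    posteriors = [(vid, w / total) for (vid, _, _), w in zip(variants, weights)]

    # Add variants in decreasing posterior order until coverage is reached.
    posteriors.sort(key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for vid, pp in posteriors:
        chosen.append((vid, pp))
        cumulative += pp
        if cumulative >= coverage:
            break
    return chosen
```

A set containing a single variant corresponds to a sentinel with posterior probability of at least 0.99, whereas a diffuse signal yields a large set of low-probability variants.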

Approvals
The research was conducted using UK Biobank, under application 648. UK Biobank has ethical approval from the UK National Health Service National Research Ethics Service (11/NW/0382). All included studies were approved by the appropriate research ethics committee or institutional review board (e-Appendix 1). All participants gave informed consent.

Results
In stage 1, 8,068 ACO case subjects were selected from UK Biobank, and 40,360 as healthy control subjects free of asthma and COPD. For signal prioritization analyses, another 16,762 people were selected as control subjects with COPD alone (without asthma), and 26,815 as control subjects with asthma alone (without COPD). Descriptive statistics for case subjects and control subjects are presented in Table 1. ACO case subjects were slightly older than healthy control subjects, and included more men and ever smokers.
After filtering on MAF and INFO, 7,693,381 variants were analyzed. The LDSC intercept was 1.018, suggesting that results were not strongly inflated due to population structure (e-Fig 1). 28

ACO Association Signals
In stage 1, there were 83 distinct signals at P < 5 × 10⁻⁶ (see e-Table 4 for results). Of these, 31 retained significance (P < .01) in signal prioritization analyses comparing ACO case subjects separately with either COPD case subjects or asthma case subjects, to determine whether signals were driven by asthma or COPD alone (e-Table 4). In stage 2, comprising 12 independent cohorts (4,301 case subjects, 48,609 control subjects) (e-Tables 1, 2), 26 of 31 signals had a direction of effect concordant with stage 1 (e-Table 5), and the median value for heterogeneity (I²) across these signals was 15%. Although the sample size of participants of African American ethnicity was small (297 case subjects, 1,335 control subjects) and CIs were broad, 22 of 31 signals had a direction of effect consistent with the European ancestry studies (e-Table 5).
Results for the stage 2 sensitivity analysis (9,638 case subjects and 128,273 control subjects from 15 studies), in which COPD was defined either by available spirometry or, alternatively, by electronic health care record diagnoses (e-Appendix 1), are in e-Table 6.

Subgroup Analyses
Effect sizes for the 31 signals among case subjects with childhood-onset asthma were highly correlated with those among people with adult-onset asthma (R = 0.883) (e-Table 7, e-Fig 3). Effect sizes in ever and never smokers were also closely correlated (R = 0.911) (e-Table 7, e-Fig 4).

Eight Top Signals for ACO Defined From Joint Analysis
After meta-analysis combining stage 1 and stage 2, 13 signals were genome-wide significant (P < 5 × 10⁻⁸) (e-Table 4; see e-Fig 2 for flow diagram). Of these, eight either had a lower P value in the joint analysis than in stage 1 alone, or P < .05 in stage 2 studies alone (Table 2, e-Figs 5, 6). None of these eight signals were previously reported as associated specifically with ACO. 8
For the novel intergenic ACO signal on chromosome 5 (rs80101740, effect allele frequency [EAF], 0.015; OR, 1.42; P = 3.72 × 10⁻⁸) (e-Table 5), which has not been previously associated with asthma, lung function, or COPD, the sentinel SNP had the largest posterior probability (0.77) of being the true causal variant, assuming the causal variant was genotyped/imputed (e-Table 8). There was no evidence of colocalization with expression quantitative trait locus (eQTL) signals at this locus (e-Tables 9, 10), and no chromatin interactions were identified.
Four of our novel signals for ACO were previously reported for asthma but not COPD/lung function. [35][36][37] For rs35570272 in GLB1 (OR, 1.10; EAF, 0.398; P = 2.44 × 10⁻⁹), there were 11 SNPs in the credible set, and the intronic sentinel SNP had the highest posterior probability (0.655). There were significant chromatin interactions nearby in GLB1 in fetal lung fibroblasts. GLB1 encodes the β-galactosidase enzyme involved in lysosomal function, and an elastin-binding protein involved in extracellular elastic fiber formation. Two SNPs (both with a posterior probability of approximately 0.13) in the 99% credible set, rs7646283 and rs34064757, were eQTLs for the gene encoding cartilage-associated protein (CRTAP) in lung (e-Table 9), implicated in bone development and osteogenesis imperfecta.
Another signal (previously reported for asthma) 26,35 was rs16903574 (EAF, 0.077; OR, 1.20; P = 3.8 × 10⁻¹⁰), a missense variant in FAM105A, deleterious according to its combined annotation-dependent depletion (CADD) score (22.6). 38 FAM105A encodes a pseudoenzyme, possibly involved in protein-protein interactions. 39 This sentinel had a posterior probability of 0.99. A previous study in asthma predicted FAM105A as the target based on chromatin interactions and correlation between enhancer epigenetic marks and gene expression, although we did not identify any eQTL evidence in lung or whole blood. 35 We also identified a highly significant chromatin interaction in fetal lung fibroblasts overlapping FAM105A and another nearby gene (TRIO), but not in adult lung.
An intergenic signal between PHB and ZNF652 (rs2584662) (EAF, 0.42; OR, 0.92; P = 2.21 × 10⁻⁸) was previously associated with asthma and reported as a blood eQTL for GNGT2 (implicated in NF-κB activation), 29,35 although we did not identify this in our eQTL analysis. In our analysis, eight SNPs were in the credible set (posterior probabilities all ≤ 0.2). Hi-C data suggested a significant chromatin interaction in ZNF652, with another, less significant peak close to GNGT2. Nearby loci in ZNF652 have previously been associated with asthma/allergic disease and moderate-to-severe asthma. 35
We also identified rs1837253, an intergenic signal near TSLP (EAF, 0.739; OR, 1.16; P = 1.53 × 10⁻¹⁸), with a posterior probability of 1, that is, the only variant in the credible set. No eQTL evidence was identified. There were highly significant chromatin interactions with SLC25A46 in fetal lung fibroblasts and in adult lung tissue, and with a region between TSLP and SLC25A46 in fetal lung fibroblasts only. The gene encoding thymic stromal lymphopoietin (TSLP) was implicated in asthma and allergic disease before the GWAS era, 40 and an anti-TSLP antibody has been trialled in allergic asthma. 41
Another signal, rs6787279 in IL17RD (EAF, 0.169; OR, 0.89; P = 7.87 × 10⁻⁹), has been previously reported for lung function and COPD. 31,42 There were 55 variants in the credible set, all with posterior probability ≤ 0.12, meaning it is not yet possible to fine-map this signal confidently. One SNP in the credible set was exonic and possibly damaging (rs17057718), but the posterior probability was only 0.012. Multiple SNPs at this locus were eQTLs for IL17RD in lung, with the ACO risk allele corresponding to decreased IL17RD expression. IL17RD is in the IL-17 signaling pathway, which is implicated in asthma, 43 and in COPD pathogenesis, 44,45 potentially by mediating effects of cigarette smoke.
Figure caption: The x axis shows genomic location by chromosome; the y axis shows the -log10 P value, corrected for the intercept of linkage disequilibrium score regression (1.018). The eight top signals (from joint analysis) are highlighted in red, and labeled with rsIDs (reference SNP [single-nucleotide polymorphism] ID numbers). The black line indicates P = 5 × 10⁻⁸ (commonly known as genome-wide significance), and the dotted line corresponds to P = 5 × 10⁻⁶ (genome-wide suggestive threshold). A quantile-quantile plot is shown in e-Figure 1. For further details on the eight SNPs shown here, see also Table 2.
Two ACO signals have previously been reported separately for both asthma and lung function or COPD: rs9273410 in HLA-DQB1 (EAF, 0.445; OR, 1.20; P = 9.19 × 10⁻²⁸) and rs3749833 in C5orf56 (EAF, 0.263; OR, 1.12; P = 9.37 × 10⁻¹²). HLA-DQB1 encodes a major histocompatibility complex type II molecule involved in antigen presentation. HLA-DQB1 alleles are associated with numerous inflammatory and autoimmune diseases. In our analysis, the sentinel was the only SNP in the credible set. For lung function, an amino acid change in the gene product HLA-DQβ1 has been identified as the main driver of signals in the major histocompatibility complex region. 33 Analyses in asthma have identified HLA-DQA1 as the likely driver gene. 35
C5orf56 is located on a cytokine gene cluster on chromosome 5, including IL3, IL4, and IL5. Several interleukins in this region have been considered as therapeutic targets in asthma. In severe eosinophilic asthma, the anti-IL-5 monoclonal antibodies mepolizumab and reslizumab reduce exacerbations and improve quality of life. [46][47][48] SNPs in the credible set were eQTLs in lung and/or blood for SLC22A5, AC116366.6, RAD50, and a noncoding Y RNA. SLC22A4 has been identified as the most likely candidate gene for the lung function association. 33 The gene products of SLC22A4 and SLC22A5 are involved in bronchial uptake of bronchodilators and anticholinergic drugs. 49 An analysis in asthma predicted C5orf56 (which encodes the interferon regulatory factor 1 antisense RNA, IRF1-AS1) as the causal gene. 35
In our phenome-wide scan, all ACO loci previously associated with asthma showed association with blood cell counts, particularly eosinophils and neutrophils, and atopic traits (e-Table 11). The HLA locus was associated with a wide range of autoimmune/inflammatory traits. Another locus (rs2584662, near PHB and ZNF652) was associated with anthropometric traits, cardiovascular phenotypes, and chronic diseases/multimorbidity, whereas rs3749833 (near C5orf56) was associated with anthropometric traits and inflammatory bowel disease. SNPs in the credible set for the intergenic chromosome 5 signal (rs80101740) were associated with cardiovascular and a range of other traits.

Discussion
We conducted the largest GWAS of ACO to date, and identified 83 independent signals associated at P < 5 × 10⁻⁶ in stage 1. After excluding variants associated with asthma only or COPD only, we studied 31 variants in stage 2, with eight distinct signals for ACO showing genome-wide significance (P < 5 × 10⁻⁸) in a stage 1 and stage 2 meta-analysis.
Our study contributes to understanding of the genetic architecture of ACO. We showed strong genetic correlation between ACO and COPD/lung function, and between ACO and asthma, especially moderate-severe asthma. Furthermore, we showed that the genetic correlation of blood eosinophil counts with ACO was more similar in magnitude to the genetic correlation of eosinophils with asthma than of eosinophils with FEV1/FVC and COPD. Increased eosinophils are associated with asthma and COPD exacerbations, [50][51][52] and with lung function decline in subjects with and without asthma. 53 Eosinophil counts, atopy, and asthma traits were prominent in phenome-wide scans of our top eight signals, consistent with an important role for type 2 inflammation in ACO. 54,55
One intergenic signal on chromosome 5 (rs80101740) has not previously been reported as associated with asthma, COPD, or lung function. Although it lies near a putative signal for lung function without replication support (rs377731, r² = 0.02 with rs80101740), 33 the ACO sentinel was independent of this lung function signal in conditional analyses. Evidence from eQTL studies suggests that the nearby lung function signal is associated with RGMB and LINC02062 expression.
Four of the eight signals identified as novel (GLB1, FAM105A, PHB, TSLP) are known signals for asthma or allergic disease, but not COPD. Our results suggest that these loci also have a role in COPD. All four have been associated with child- and adult-onset asthma, and could represent an opportunity to intervene in early life to prevent serious long-term sequelae. 26 One ACO signal (IL17RD) is a known lung function and COPD locus; our findings demonstrate its relevance in reversible airflow obstruction. Together, these loci could represent targets for intervention, potentially to prevent development of fixed airflow obstruction.
Two signals are known to be associated with asthma and COPD/lung function, including the HLA-DQB1 locus (the first signal identified as associated with both asthma and COPD), and a signal at C5orf56, encoding IRF1-AS1, on chromosome 5, near a cytokine gene cluster.
In subgroup analyses, there was a strong positive correlation between stage 1 effect sizes for ACO in ever and never smokers, suggesting that ACO is not due solely to smoking in people with asthma, although childhood asthma in smokers increases COPD risk compared with nonasthmatics, possibly via early lung development. 56 Similarly, when stratifying by child- vs adult-onset asthma, there was a strong correlation between effect sizes in both groups. Nevertheless, for some of the eight top signals, we found evidence of chromatin interactions in fetal but not adult lung. Although this may implicate developmental processes in ACO, inference is difficult, due to differences in experimental conditions, sample sizes, and reporting practices. Clearer conclusions may become possible as functional genomic assays advance.
Our study has some potential limitations. The stage 2 sample size (4,301 case subjects) was substantial, although relatively underpowered compared with stage 1 (8,068 case subjects). All signals reported met commonly adopted criteria for genome-wide significance, but stricter criteria are starting to be used for genome sequencing studies 57; future work using sequence data would provide an opportunity to reevaluate the genomic regions we highlight. Misclassification of asthma and COPD diagnoses is possible: asthma in older patients may mimic COPD, and physicians may be less likely to suspect COPD in nonsmokers. To mitigate this, we used GOLD 2+ spirometric criteria to define COPD wherever possible, and note that self-reported asthma has been shown to accurately identify subjects with clinical and genetic characteristics of asthma. 56 We hypothesize that any remaining misclassification would attenuate effect estimates toward the null, that is, reduce power to detect true genetic associations with ACO. Our main analysis was undertaken in European ancestry populations only; although for many loci there was good concordance in a small sample of participants of African American ethnicity, it is essential to study this trait further in diverse populations.

Interpretation
In the largest genome-wide association study to date, we identified eight signals associated with ACO. Our findings suggest a spectrum of shared genetic influences, from variants predominantly influencing asthma to those predominantly influencing fixed airflow obstruction. We focus on variants that tend toward an intermediate phenotype with features of both asthma and fixed airflow obstruction, with pathways implicating innate and adaptive immunity and potentially bone development, and signals for which the biology remains unclear. Further biological understanding is likely to be important for therapeutics to prevent the development of fixed airflow obstruction among people with asthma.

Role of sponsors:
The funders had no role in the design of the analyses or conduct of the study.