Coronavirus disease 2019 (COVID-19) is sweeping the globe. Despite multiple case series, actionable knowledge for proactive, individualized decision-making is missing.
Can a statistical model accurately predict infection with COVID-19?
Study Design and Methods
We developed a prospective registry of all patients tested for COVID-19 at the Cleveland Clinic to create individualized risk prediction models. We focus here on the likelihood of a positive nasal or oropharyngeal COVID-19 test. A least absolute shrinkage and selection operator (LASSO) logistic regression algorithm was constructed that removed variables that did not contribute to the model's cross-validated concordance index. After external validation in a temporally and geographically distinct cohort, the statistical prediction model was illustrated as a nomogram and deployed in an online risk calculator.
In the development cohort, 11,672 patients fulfilled study criteria, including 818 patients (7.0%) who tested positive for COVID-19; in the validation cohort, 2295 patients fulfilled criteria, including 290 patients (12.6%) who tested positive for COVID-19. Male sex, African American race, older age, and known COVID-19 exposure were associated with a higher risk of testing positive for COVID-19. Risk was reduced in those who had received pneumococcal polysaccharide or influenza vaccine or who were on melatonin, paroxetine, or carvedilol. Our model had favorable discrimination (c-statistic = 0.863 in the development cohort and 0.840 in the validation cohort) and calibration. We present sensitivity, specificity, negative predictive value, and positive predictive value at different prediction cutoff points. The calculator is freely available at https://riskcalc.org/COVID19.
Prediction of a COVID-19 positive test is possible and could help direct health care resources. We demonstrate relevance of age, race, sex, and socioeconomic characteristics in COVID-19 susceptibility and suggest a potential modifying role of certain common vaccinations and drugs that have been identified in drug-repurposing studies.
The first infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the novel virus responsible for coronavirus disease 2019 (COVID-19), was reported in the United States on January 21, 2020.
Three months later, the US health care system and our society are struggling in an ever-changing environment of social distancing policies and projected utilization requirements, with constantly shifting treatment guidelines. A scientific approach to planning and delivering health care is sorely needed to match our limited resources with the persistently unmet demand. This supply-vs-demand gap is most obvious with diagnostic testing. Plagued with technical and regulatory challenges,
the production of COVID-19 test reagents and tests is lagging behind what is needed to fight a pandemic of this scale. Consequently, most hospitals are limiting testing to symptomatic patients and their own exposed health care workers. This is occurring at a time when experts are calling for expanding testing capabilities beyond symptomatic individuals to better measure the infection’s transmissibility, limit the spread by quarantine of those infected, and characterize COVID-19’s epidemiologic components.
Recent loosening of the Food and Drug Administration testing regulations and the development of point-of-care testing will make more tests available; however, given the anticipated demand, it is unlikely that testing supply will be enough. Even if enough testing supplies become available, indications driven by scientific data are still needed. Another challenge is the suboptimal diagnostic performance of the test,
which raises concerns about false-negative results complicating efforts to contain the pandemic. Unless we develop intelligent targeting of our testing capabilities, we will be handicapped significantly in our ability to make progress in assessing the extent of the disease, directing clinical care, and ultimately controlling COVID-19.
We developed a prospective registry aligning data collection for research with clinical care of all patients who are tested for COVID-19 in our integrated health system. We present here the first analysis of our Cleveland Clinic COVID-19 Registry, with the aim to develop and validate a statistical prediction model to guide utilization of this scarce resource by predicting an individualized risk of a “positive test.” A nomogram is a visual statistical tool that can take into account numerous variables to predict an outcome of interest for a patient.
We included all patients, regardless of age, who were tested for COVID-19 at all Cleveland Clinic locations in Ohio and Florida. Albeit imperfect, this provides better representation of the population than testing restricted to the Cleveland Clinic main campus. The Cleveland Clinic Institutional Review Board approval was obtained concurrently with the initiation of testing capabilities (IRB#20-283). The requirement for written informed consent was waived.
Cleveland Clinic COVID-19 Registry
Demographics, comorbidities, travel, and COVID-19 exposure history, medications, presenting symptoms, treatment, and disease outcomes are collected (e-Appendix 1). Registry variables were chosen to reflect available literature on COVID-19 disease characterization, progression, and proposed treatments, including medications proposed to have potential benefits through drug-repurposing studies.
Capture of detailed research data was facilitated by the creation of standardized clinical templates implemented across the health care system as patients sought care for COVID-19-related concerns.
Data were extracted via previously validated automated feeds
from our electronic health record (EPIC; EPIC Systems Corporation, Madison, WI) and manually by a study team trained on uniform sources for the study variables. Study data were collected and managed with the use of Research Electronic Data Capture (REDCap; Vanderbilt University, Nashville, TN) electronic data capture tools hosted at Cleveland Clinic.
The clinical framework for our testing practice is shown in Figure 1. As testing demand increased, we adapted our organizational policies and protocols to reconcile demand with patient and caregiver safety. This occurred in three phases.
Phase I (March 12-13, 2020)
We expanded primary care through telemedicine. If patients called with concerns that they had COVID-19, they were screened through a virtual visit using Cleveland Clinic's Express Care Online or by calling their primary care provider. If they needed to travel to our locations, we asked them to call ahead before arrival. Our goal was to limit exposure to caregivers and to ensure that physicians could order testing when appropriate, while following the Centers for Disease Control and Prevention testing recommendations. A doctor's order was required for testing.
Phase II (March 14-17, 2020)
Drive-through testing was initiated on Saturday, March 14. Patients still needed a doctor's order for a COVID-19 test, and testing guidelines remained the same as in Phase I. On arrival at the drive-through location, patients stayed in their car, provided their doctor's order, and remained in their car as samples were collected. Patients were tested regardless of their ability to pay and were not charged copays.
Phase III (March 18-onwards)
Given high testing demand, low initial testing yield, and a backlog of tests awaiting processing, testing shifted to high-risk patients (Fig 1).
Processing of COVID-19 Tests
Test samples were obtained through nasopharyngeal and oropharyngeal swabs; both were collected and pooled for testing. Tests were run with the Centers for Disease Control and Prevention assay using Roche MagNA Pure extraction and ABI 7500 Dx PCR instruments, per standard laboratory practice in our organization.
Data from 11,672 patients who were tested before April 2 were used to develop the model (development cohort). Baseline data are presented as median (interquartile range) and number (percentage). Continuous variables were compared with the Mann-Whitney U test, and categoric variables were compared with the chi-square test. A full multivariable logistic model was constructed initially to predict the COVID-19 nasopharyngeal swab test result based on demographics, comorbidities, immunization history, symptoms, travel history, laboratory variables, and medications identified before testing. For modeling purposes, two approaches to imputing missing laboratory values were compared: median imputation and multivariate imputation by chained equations via the R package mice. Restricted cubic splines with 3 knots were applied to continuous variables to relax the linearity assumption. A least absolute shrinkage and selection operator (LASSO) logistic regression algorithm was performed to retain the most predictive features. Ten-fold cross-validation was applied to find the regularization parameter lambda that gave the best mean cross-validated concordance index. Predictors with nonzero coefficients in the LASSO regression model were retained for calculating predicted risk.
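The pipeline described above (median imputation of missing laboratory values, then L1-penalized logistic regression with 10-fold cross-validation over the regularization path) can be sketched in Python. The paper's analysis was done in R; this is an illustrative sketch on synthetic data, and all variable names and parameter choices below are hypothetical, not the authors' code.

```python
# Illustrative sketch only: mimics median imputation + LASSO logistic
# regression with 10-fold CV on synthetic data (not the authors' R code).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=0)
# Introduce missingness, as in the registry's laboratory variables
mask = rng.random(X.shape) < 0.3
X[mask] = np.nan

model = make_pipeline(
    SimpleImputer(strategy="median"),   # median imputation of missing labs
    LogisticRegressionCV(penalty="l1", solver="liblinear",
                         Cs=10, cv=10, scoring="roc_auc", max_iter=1000),
)
model.fit(X, y)

lasso = model.named_steps["logisticregressioncv"]
n_dropped = int((lasso.coef_ == 0).sum())   # features removed by the L1 penalty
risk = model.predict_proba(X)[:, 1]         # individualized predicted risk
```

The cross-validation here scores by ROC AUC, which for a binary outcome is the concordance index; predictors whose coefficients are shrunk exactly to zero drop out of the final risk calculation.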
The final model was first internally validated by assessment of the discrimination and calibration with 1000 bootstrap resamples. The LASSO procedure, which included 10-fold cross validation for optimizing lambda, was repeated within each resample. We then validated it in a temporally and geographically distinct cohort of 2295 patients tested at the Cleveland Clinic hospitals in Florida from April 2-16, 2020. This was done to assess the model’s stability over time and its generalizability to another geographical region.
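Internal validation with bootstrap resampling, as described above, commonly follows Harrell's optimism-correction procedure: refit the model on each resample, compare its performance on the resample vs the original data, and subtract the mean optimism from the apparent performance. The sketch below (Python, synthetic data, not the authors' code) uses a plain logistic model and 50 resamples for speed; the paper refit the full LASSO procedure in each of 1000 resamples.

```python
# Sketch of bootstrap optimism correction (assumed standard procedure,
# not the authors' code); synthetic data, fewer resamples for speed.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
rng = np.random.default_rng(0)

full = LogisticRegression(max_iter=1000).fit(X, y)
apparent = roc_auc_score(y, full.predict_proba(X)[:, 1])

optimism = []
for _ in range(50):                        # paper used 1000 resamples
    idx = rng.integers(0, len(y), len(y))  # bootstrap resample with replacement
    m = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    boot_auc = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
    orig_auc = roc_auc_score(y, m.predict_proba(X)[:, 1])
    optimism.append(boot_auc - orig_auc)   # how much resample fit flatters itself

corrected = apparent - np.mean(optimism)   # bootstrap-corrected concordance
```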
Discrimination was measured with the concordance index.
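For a binary outcome such as a positive test, the concordance index is the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case (ties counting one-half), which equals the area under the ROC curve. A minimal sketch with illustrative values:

```python
# Minimal sketch: pairwise definition of the concordance index for a
# binary outcome. Values below are illustrative only.
import numpy as np

def concordance_index(y_true, y_score):
    y_score = np.asarray(y_score, dtype=float)
    y_true = np.asarray(y_true)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every positive/negative pair; ties count 1/2
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

y = [0, 0, 1, 1, 0, 1]
p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
c = concordance_index(y, p)   # 8 of 9 pairs ranked correctly -> 8/9
```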
Calibration was assessed visually by plotting the nomogram predicted probabilities against the observed event proportions. The closer the calibration curve lies along the 45-degree line, the better the calibration. A scaled Brier score (index of prediction accuracy [IPA])
was also calculated, because this has some advantages over the more popular concordance index. The IPA ranges from -1 to 1, where a value of 0 indicates a useless model, and negative values imply a harmful model. Finally, decision curve analysis was conducted to inform clinicians about the range of threshold probabilities for which the prediction model might be of clinical value.
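Assuming the standard definition, the IPA rescales the Brier score against a "null" model that predicts the overall event rate for everyone: IPA = 1 - Brier(model) / Brier(null). A perfect model scores 1, a model no better than the event rate scores 0, and a harmful model scores below 0. A minimal sketch:

```python
# Sketch of the index of prediction accuracy (scaled Brier score),
# assuming the standard definition: 1 - Brier(model)/Brier(null).
import numpy as np

def ipa(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    brier_model = np.mean((y_pred - y_true) ** 2)
    prevalence = y_true.mean()
    brier_null = np.mean((prevalence - y_true) ** 2)  # predict event rate for all
    return 1.0 - brier_model / brier_null
```

Predicting the outcomes exactly yields IPA = 1, while predicting the prevalence for every subject yields IPA = 0, matching the interpretation in the text.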
We then calculated sensitivity, specificity, positive predictive value, and negative predictive value for different recommended test cutoffs (Fig 2). We adhered to the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) checklist for prediction model development.
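The four operating characteristics at a given prediction cutoff follow directly from the 2x2 table of flagged vs unflagged subjects against true test results. A minimal sketch (illustrative values, hypothetical function name):

```python
# Sketch: sensitivity, specificity, PPV, and NPV at a prediction cutoff.
import numpy as np

def operating_characteristics(y_true, risk, cutoff):
    y = np.asarray(y_true)
    flagged = np.asarray(risk) >= cutoff   # "would be tested" at this cutoff
    tp = np.sum(flagged & (y == 1))
    fp = np.sum(flagged & (y == 0))
    fn = np.sum(~flagged & (y == 1))
    tn = np.sum(~flagged & (y == 0))
    return {
        "sensitivity": tp / (tp + fn),   # positives correctly flagged
        "specificity": tn / (tn + fp),   # negatives correctly not flagged
        "ppv": tp / (tp + fp),           # flagged who are truly positive
        "npv": tn / (tn + fn),           # unflagged who are truly negative
    }

stats = operating_characteristics([1, 1, 0, 0], [0.9, 0.2, 0.8, 0.1], 0.5)
```

Sweeping the cutoff and recomputing these values produces the tradeoff curves of the kind shown in Figure 2.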
There were 11,672 patients who presented with symptoms of a respiratory tract infection or with other risk factors for COVID-19 before April 2, 2020, and who underwent testing according to the framework illustrated in Figure 1. The testing yield changed as the selection criteria became stricter (e-Fig 1). Between April 2 and 16, 2020, 2295 patients were tested in Florida (Florida validation cohort). The clinical characteristics of the development cohort and validation cohort are found in Table 1.
Table 1. Baseline Demographic and Clinical Characteristics of 11,672 Patients Who Tested Positive vs Negative for COVID-19 in the Development Cohort (Cleveland Clinic Health System, Before April 2, 2020) and of 2,295 Patients in the Florida Cleveland Clinic Health System Validation Cohort (Tested April 2-16, 2020)
[Table 1 data columns are not recoverable here. Row variables include: testing indication (physician discretion); race; ethnicity; smoking; age (missing: 0.3%); exposure history (exposed to COVID-19, family member with COVID-19); presenting symptoms (including shortness of breath and loss of appetite); BMI (missing: 43.3%); comorbidities (COPD/emphysema, asthma, diabetes mellitus, hypertension, coronary artery disease, heart failure, cancer, transplantation history, multiple sclerosis, connective tissue disease, inflammatory bowel disease, immunosuppressive disease); vaccination history (including pneumococcal polysaccharide vaccine); and pretesting laboratory findings (platelets, AST, BUN, chloride, creatinine, hematocrit, potassium; missing: 67%-73%).]
Imputation methods were evaluated with 1000 repeated bootstrapped samples. Models based on median imputation appeared to outperform those based on multivariate imputation by chained equations, so median imputation was selected as the basis of the final model. Variables evaluated that did not add value beyond those included in our final model for predicting the COVID-19 test result included being a Cleveland Clinic health care worker, fatigue, sputum production, shortness of breath, diarrhea, and transplantation history. The bootstrap-corrected concordance index in the development cohort was 0.863 (95% CI, 0.852-0.874), and the IPA was 20.9% (95% CI, 18.1%-23.7%). The concordance index in the Florida validation cohort was 0.839 (95% CI, 0.817-0.861), and the IPA was 18.7% (95% CI, 13.6%-23.9%). Figure 3 shows the calibration curves in the development and validation cohorts. In the development cohort, the predicted risk matched observed proportions at low predictions before the model began to overpredict at high risk levels. Calibration in the Florida validation cohort was acceptable, although predictions >40% became increasingly too high as the predicted probability increased.
Cutoff Definition
Given that the tool provides the probability that an individual subject will test positive, the challenge is how to use the tool in practice. This usually requires choosing a cutoff below which the risk is sufficiently low that the subject would not be tested. Figure 2 shows the tradeoff by plotting the proportion of negative tests avoided vs the proportion of positive tests retained as the cutoff is increased. A decision curve analysis showed that, if the threshold of action is ≤1.3%, the model is no better than simply assuming everyone is "high risk." Once the threshold exceeds 1.3%, however, using the model to determine who is high risk is preferable. The nomogram and its online version are shown in Figure 4.
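Decision curve analysis compares the model's "net benefit" at each threshold against the default strategies of testing everyone and testing no one; assuming the standard net-benefit formula, a sketch (illustrative values, not the authors' code):

```python
# Sketch of decision-curve net benefit, assuming the standard formula
# net_benefit(t) = TP/n - (FP/n) * t/(1-t) at threshold probability t.
# The model adds value where its curve exceeds both "test everyone"
# and "test no one" (net benefit 0).
import numpy as np

def net_benefit(y_true, risk, t):
    y = np.asarray(y_true)
    n = len(y)
    flagged = np.asarray(risk) >= t
    tp = np.sum(flagged & (y == 1))
    fp = np.sum(flagged & (y == 0))
    return tp / n - (fp / n) * (t / (1 - t))

def net_benefit_test_all(y_true, t):
    # "Test everyone" flags all subjects: TP = prevalence, FP = 1 - prevalence
    prev = np.mean(y_true)
    return prev - (1 - prev) * (t / (1 - t))

nb_model = net_benefit([1, 0, 0, 0], [0.9, 0.1, 0.1, 0.1], 0.5)
nb_all = net_benefit_test_all([1, 0, 0, 0], 0.5)
```

In this toy example the model's curve lies above the test-everyone curve at t = 0.5, the situation the text describes for thresholds above 1.3%.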
The COVID-19 pandemic has impacted the world significantly, changing medical practice and our society. Some countries are now recovering from it, but many regions are just beginning to be affected. In the United States, some states are still preparing for a “surge” that may overwhelm the health care delivery system, while others are preparing to “reopen” and lift social distancing measures. In a “presurge” situation, resources needed to address every step of a patient’s trajectory through COVID-19 are limited, starting from testing through hospitalization and intensive care if needed. In a “pre-reopening” situation, tools to better identify individuals who are at risk of experiencing COVID-19 are sorely needed to inform policy.
We developed the Cleveland Clinic COVID-19 Registry to include all patients who were tested for COVID-19 (rather than just those with the disease) to better understand disease epidemiology and to develop nomograms, which are tools that go beyond cohort descriptions to individualize risk prediction for any given patient. This could empower front-line health care providers and inform decision-making, immediately impacting clinical care. We present here our first such nomogram, one that predicts the risk of a positive COVID-19 test. We want to emphasize that our work should not be interpreted as “accepting” or rationalizing inadequate testing capacity. Our tool should not take the pressure off being able to do what is right clinically for individual patients by expanding testing capabilities.
COVID-19 Testing Challenge
Available COVID-19 clinical literature is based mostly on small case series or descriptive cohort studies of patients already documented to have COVID-19 (Yang et al; Lancet Respir Med, in press). These studies provide some information on the populations at greatest risk of adverse outcomes if infected with the virus but do little to inform us on who is most at risk of becoming infected. The proportion of COVID-19-negative tests fell significantly with stricter testing guidelines (e-Fig 1), but the yield remained very low, which suggests that our ability to clinically differentiate COVID-19 from other respiratory illnesses at the early stages of the disease is limited, further supporting the need for better tools to individualize testing indications.
COVID-19 Risk Factors
Some of our predictors of COVID-19 positivity confirm previous literature. For example, we corroborate a recent World Health Organization report suggesting that men may be at higher risk of COVID-19, which is thought to reflect underlying hormonal or genetic risk. Our finding of a higher COVID-19 risk with advancing age can be explained by known age-related changes in the renin-angiotensin system in mice that may facilitate infection with SARS-CoV-2, which binds to host cells through the angiotensin-converting enzyme 2 (ACE2) receptor. A family member with COVID-19 also increased the risk of testing positive in our cohort, which is consistent with familial disease clustering observed in China and highlights the limitations of disease containment strategies that focus on home lock-down without isolation of sick individuals. In addition, our study provides several unique insights made possible by our large sample size and our inclusion of a control cohort of patients who tested negative for COVID-19. The following are critical findings that ultimately were relevant to our model's performance.
The lower risk of testing COVID-19 positive among Asian individuals relative to white individuals in our cohort is intriguing, given the higher rates of spread and disease severity observed in the Western hemisphere compared with China.
The lower risk observed with pneumococcal polysaccharide vaccine and influenza vaccine is also a unique finding. The mechanism could be biologic, possibly related to the documented sustained activation of Toll-like receptor 7 by the influenza vaccine: Toll-like receptor 7 recognizes single-stranded RNA respiratory viruses, such as SARS-CoV-2, and may thus explain some cross-protection. Alternatively, this correlation may simply reflect generally safer health practices among people who seek and obtain vaccination.
The higher risk observed with poor socioeconomic status is another unique finding. Using zip code, our team inferred estimated population per square kilometer and estimated median income from the 5-year American Community Survey dataset (end year 2018). The critical role played by these variables in our final model emphasizes the importance of social determinants of health and their influence on disparities in health care outcomes.
Most potentially impactful is the reduced risk of testing positive in patients who were on melatonin, carvedilol, or paroxetine, drugs identified in drug-repurposing studies as having potential benefit against COVID-19. It is unclear whether these drugs have similar effects on ACE2 in lung endothelium. With ACE2 being key to the pathophysiologic findings of SARS-CoV-2 infection, our findings are intriguing.
These findings would have to be reproduced and validated in clinical trials before their full significance can be assessed. When interpreting our multivariable model, it is important to recognize that a single predictor cannot be interpreted in isolation. For example, it is artificial to claim that a drug is reducing risk because, in reality, other variables tend to be different for a patient who is on, or not on, a drug. Moving a patient on a nomogram axis, holding all other axes constant, is hypothetical, because he or she is likely moving on other axes when moved on one. This is the case for all multivariable statistical prediction models.
Model performance, as measured by the concordance index, was very good in both the development and validation cohorts (c-statistic = 0.863 and 0.839, respectively). This level of discrimination is clearly superior to a coin toss or to assuming all patients are at equivalent risk (both c-statistics = 0.5). The internal calibration of the model is excellent at low predicted probabilities (Fig 3), but some regression to the mean is apparent at predictions above roughly 40% in the validation cohort. That the model overpredicts risk at that level seems of little concern, because such predictions are already clinically high risk and likely beyond any threshold of action. Moreover, the IPA, a metric that incorporates calibration, confirms that the model predicts better than chance or no model at all. The good performance of our model in a geographically distinct region (Florida) and over time (the validation cohort was tested in a later timeframe) suggests that the patterns and predictors identified in our model are likely consistent across health systems and regions, rather than specific to the unique spread of the virus within Cleveland's social structures.
As with any predictive tool, the utility of a nomogram depends on the clinical context. The decision curve analysis suggests that, if the goal is to distinguish patients with a predicted risk below 1.3% (or any higher cutoff) from those at higher risk, the prediction model is useful. In other words, using the model to determine whom to test detects more true positives per test performed than testing everyone, as long as one is willing to test 1000 subjects to detect 13 cases. Any cutoff choice involves a tradeoff between avoiding negative tests and missing positive cases (Fig 2). Using a low prediction cutoff (eg, 1.3% from the tool) as a trigger to order testing would allow us to continue to identify the vast majority of COVID-19-positive cases (assuming our other selection criteria for testing remain constant) while avoiding testing a large proportion of patients who are indeed COVID-19 negative. This may be appropriate when testing supplies are abundant and the goal is to comprehensively survey the extent of COVID-19 in the population. Conversely, in a resource-limited setting (eg, a hospital facing a surge), a higher cutoff may be more appropriate to avoid unnecessary testing.
Available real-time reverse transcriptase polymerase chain reaction tests of nasopharyngeal swabs have typically been used for diagnosis, but data suggest suboptimal test performance: in patients with known disease, the assay detected SARS-CoV-2 in only 63% of nasal swabs and 32% of pharyngeal swabs.
In our study, we collected both nasal and oropharyngeal swabs, hoping to at least partly address this limitation. Although we validated our model in a temporally and geographically distinct cohort, we acknowledge that our results depend on the particular time and place of data collection. As the pandemic evolves, our results may not reflect the updated distribution of the virus in any given region, and the model will need to be recalibrated and refit over time to accommodate an ever-increasing COVID-19 prevalence; the online calculator will reflect this updating. Our online risk calculator is publicly available, but direct integration with the electronic health record could further improve its utility. Our study is not designed to evaluate the very real issue of health care disparities, which would require a population-based approach to studying health care delivery that is beyond the scope of the work presented here. Our conclusions depend heavily on access to testing sites and doctors' orders rather than on population-based predictors of positive results.
We provide an online risk calculator that effectively can identify individualized risk of a positive COVID-19 test. Such a tool provides immediate benefit to the patients and health care providers as we face anticipated increased demand and limited resources but does not obviate the critical need for adequate testing. The scarcity of resources must not be accepted as an unalterable fact, and we should resist the inevitability of lack of resources and inequities in health care. We also provide some mechanistic and therapeutic insights.
Author contributions: L. J. is the guarantor of submission and participated in literature search, figures, study design, data collection, data interpretation, and writing; X. J. participated in data analysis and figures; A. M. participated in data collection and data analysis; S. E. participated in data interpretation, study design, and writing; B. R., S. G., and J. Y. participated in data interpretation and writing; and M. W. K. participated in literature search, study design, data interpretation, data analysis, and writing.
Financial/nonfinancial disclosures: The authors have reported to CHEST the following: A. M. reports personal fees from nPhase, during the conduct of the study; grants from Novo Nordisk, Boehringer Ingelheim, Merck, Novartis, and NIH, outside the submitted work. M. W. K. reports personal fees from nPhase, during the conduct of the study; grants from Novo Nordisk, Boehringer Ingelheim, Merck, Novartis, and NIH, and consulting for Stratify Genomics and RenalytixAI, outside the submitted work. None declared (L. J., X. J., S. E., B. R., S. G., J. Y.).
Role of sponsors: The sponsor had no role in the design of the study, the collection and analysis of the data, or the preparation of the manuscript.
Additional information: The e-Appendix and e-Figure can be found in the Supplemental Materials section of the online article.
Yang X, Yu Y, Xu J, et al. Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. Lancet Respir Med. In press.