Forecasting has emerged as an important component of informed, data-driven decision-making in a wide array of fields. We introduce a new data model for probabilistic predictions that encompasses a wide range of forecasting settings. This framework clearly defines the constituent parts of a probabilistic forecast and proposes one approach for representing these data elements. The data model is implemented in Zoltar, a new software application that stores forecasts using the data model and provides standardized API access to the data. In one real-time case study, an instance of the Zoltar web application was used to store, provide access to, and evaluate real-time forecast data on the order of 10^7 rows, provided by over 20 international research teams from academia and industry making forecasts of the COVID-19 outbreak in the US. Tools and data infrastructure for probabilistic forecasts, such as those introduced here, will play an increasingly important role in ensuring that future forecasting research adheres to a strict set of rigorous and reproducible standards.
With an estimated $10.4 billion in medical costs and 31.4 million outpatient visits each year, influenza poses a serious burden of disease in the United States. To provide insights into, and advance warning of, the spread of influenza, the U.S. Centers for Disease Control and Prevention (CDC) runs a challenge for forecasting weighted influenza-like illness (wILI) at the national and regional level. Many models produce independent forecasts for each geographical unit, ignoring the constraint that the national wILI is a weighted sum of regional wILI, where the weights correspond to the population size of the region. We propose a novel algorithm that transforms a set of independent forecast distributions to obey this constraint, a property we refer to as probabilistic coherence. Enforcing probabilistic coherence led to an increase in forecast skill for 90% of the models we tested over multiple flu seasons, highlighting the importance of respecting the forecasting system's geographical hierarchy.
For practical reasons, many forecasts of case, hospitalization and death counts in the context of the current COVID-19 pandemic are issued in the form of central predictive intervals at various levels. This is also the case for the forecasts collected in the COVID-19 Forecast Hub run by the UMass-Amherst Influenza Forecasting Center of Excellence. Forecast evaluation metrics like the logarithmic score, which has been applied in several infectious disease forecasting challenges, are then not available as they require full predictive distributions. This note provides an overview of how established methods for the evaluation of quantile and interval forecasts can be applied to epidemic forecasts. Specifically, we discuss the computation and interpretation of the weighted interval score, which is a proper score that approximates the continuous ranked probability score. It can be interpreted as a generalization of the absolute error to probabilistic forecasts and allows for a simple decomposition into a measure of sharpness and penalties for over- and underprediction.
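The weighted interval score described above can be sketched in a few lines. The abstract does not spell out the formula, so this sketch assumes the commonly published definition: each central (1 - alpha) interval receives the interval score (width plus penalties scaled by 2/alpha), the median receives weight 1/2, interval k receives weight alpha_k/2, and the sum is divided by K + 1/2. The function names and argument layout are hypothetical.

```python
from typing import Sequence, Tuple


def interval_score(lower: float, upper: float, y: float, alpha: float) -> float:
    """Interval score of a central (1 - alpha) prediction interval [lower, upper]."""
    sharpness = upper - lower
    # Observation below the interval: the forecast was too high (overprediction).
    overprediction = (2.0 / alpha) * max(lower - y, 0.0)
    # Observation above the interval: the forecast was too low (underprediction).
    underprediction = (2.0 / alpha) * max(y - upper, 0.0)
    return sharpness + overprediction + underprediction


def weighted_interval_score(
    median: float,
    intervals: Sequence[Tuple[float, float, float]],
    y: float,
) -> float:
    """WIS for a predictive median plus K central intervals.

    `intervals` holds (alpha, lower, upper) triples. Assumed weights:
    1/2 for the median term and alpha/2 for each interval.
    """
    total = 0.5 * abs(y - median)
    for alpha, lower, upper in intervals:
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)
```

With no intervals supplied, the score reduces to the absolute error of the median, which matches the interpretation of the score as a generalization of the absolute error.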
Short-term probabilistic forecasts of the trajectory of the COVID-19 pandemic in the United States have served as a visible and important communication channel between the scientific modeling community and both the general public and decision-makers. Forecasting models provide specific, quantitative, and evaluable predictions that inform short-term decisions such as healthcare staffing needs, school closures, and allocation of medical supplies. In 2020, the COVID-19 Forecast Hub (https://covid19forecasthub.org/) collected, disseminated, and synthesized hundreds of thousands of specific predictions from more than 50 different academic, industry, and independent research groups. This manuscript systematically evaluates 23 models that regularly submitted forecasts of reported weekly incident COVID-19 mortality counts in the US at the state and national level. One of these models was a multi-model ensemble that combined all available forecasts each week. The performance of individual models showed high variability across time, geospatial units, and forecast horizons. Half of the models evaluated showed better accuracy than a naïve baseline model. In combining the forecasts from all teams, the ensemble showed the best overall probabilistic accuracy of any model. Forecast accuracy degraded as models made predictions farther into the future, with probabilistic accuracy at a 20-week horizon more than 5 times worse than when predicting at a 1-week horizon. This project underscores the role that collaboration and active coordination between governmental public health agencies, academic modeling teams, and industry partners can play in developing modern modeling capabilities to support local, state, and federal response to outbreaks.
High quality epidemic forecasting and prediction are critical to support response to local, regional and global infectious disease threats. Other fields of biomedical research use consensus reporting guidelines to ensure standardization and quality of research practice among researchers, and to provide a framework for end-users to interpret the validity of study results. The purpose of this study was to determine whether guidelines exist specifically for epidemic forecast and prediction publications.
We undertook a formal systematic review to identify and evaluate any published infectious disease epidemic forecasting and prediction reporting guidelines. This review leveraged a team of 18 investigators from US Government and academic sectors.
A literature database search through May 26, 2019, identified 1467 publications (MEDLINE n = 584, EMBASE n = 883), and a grey-literature review identified a further 407 publications, yielding a total of 1777 unique publications. A paired-reviewer system screened in 25 potentially eligible publications, of which two were ultimately deemed eligible. A qualitative review of these two published reporting guidelines indicated that neither was specific to epidemic forecasting and prediction, although they described reporting items that may be relevant to epidemic forecasting and prediction studies.
This systematic review confirms that no specific guidelines have been published to standardize the reporting of epidemic forecasting and prediction studies. These findings underscore the need to develop such reporting guidelines in order to improve the transparency, quality and implementation of epidemic forecasting and prediction research in operational public health.
The COVID-19 pandemic emerged in late December 2019. In the first six months of the global outbreak, the US reported more cases and deaths than any other country in the world. Effective modeling of the course of the pandemic can help assist with public health resource planning, intervention efforts, and vaccine clinical trials. However, building applied forecasting models presents unique challenges during a pandemic. First, case data available to models in real-time represent a non-stationary fraction of the true case incidence due to changes in available diagnostic tests and test-seeking behavior. Second, interventions varied across time and geography leading to large changes in transmissibility over the course of the pandemic. We propose a mechanistic Bayesian model (MechBayes) that builds upon the classic compartmental susceptible-exposed-infected-recovered (SEIR) model to operationalize COVID-19 forecasting in real time. This framework includes non-parametric modeling of varying transmission rates, non-parametric modeling of case and death discrepancies due to testing and reporting issues, and a joint observation likelihood on new case counts and new deaths; it is implemented in a probabilistic programming language to automate the use of Bayesian reasoning for quantifying uncertainty in probabilistic forecasts. The model has been used to submit forecasts to the US Centers for Disease Control, through the COVID-19 Forecast Hub. We examine the performance relative to a baseline model as well as alternate models submitted to the Forecast Hub. Additionally, we include an ablation test of our extensions to the classic SEIR models. We demonstrate a significant gain in both point and probabilistic forecast scoring measures using MechBayes when compared to a baseline model. We show that MechBayes ranks as one of the top models out of those submitted to the COVID-19 Forecast Hub. 
Finally, we demonstrate that MechBayes performs significantly better than the classical SEIR model.
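For orientation, the classic deterministic SEIR model that MechBayes builds upon can be sketched as follows. This is not MechBayes itself: it omits the time-varying transmission, the observation model, and the Bayesian inference described above. The function name, the Euler discretization, and all parameter values in the usage note are illustrative assumptions.

```python
from typing import List, Tuple

State = Tuple[float, float, float, float]  # (S, E, I, R)


def simulate_seir(
    beta: float,
    sigma: float,
    gamma: float,
    s0: float,
    e0: float,
    i0: float,
    r0: float,
    days: int,
    steps_per_day: int = 10,
) -> List[State]:
    """Euler integration of the classic SEIR equations with constant rates.

    beta: transmission rate, sigma: rate of leaving the exposed class,
    gamma: recovery rate (all per day). Returns one state per day,
    including day 0.
    """
    n = s0 + e0 + i0 + r0
    dt = 1.0 / steps_per_day
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    trajectory: List[State] = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        trajectory.append((s, e, i, r))
    return trajectory
```

For example, `simulate_seir(beta=0.5, sigma=0.2, gamma=0.1, s0=990, e0=0, i0=10, r0=0, days=30)` returns 31 daily states whose compartments always sum to the initial population.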
In Southeast Asia, endemic dengue follows strong spatio-temporal patterns with major epidemics occurring every 2-5 years. However, important spatio-temporal variation in seasonal dengue epidemics remains poorly understood. Using 13 years (2003-2015) of dengue surveillance data from 926 districts in Thailand and wavelet analysis, we show that rural epidemics lead urban epidemics within a dengue season, both nationally and within health regions. However, local dengue fade-outs are more likely in rural areas than in urban areas during the off season, suggesting rural areas are not the source of viral dispersion. Simple dynamic models show that stronger seasonal forcing in rural areas could explain the inconsistency between earlier rural epidemics and dengue “overwintering” in urban areas. These results add important nuance to earlier work showing the importance of urban areas in driving multi-annual patterns of dengue incidence in Thailand. Feedback between geographically linked locations with markedly different ecology is key to explaining full disease dynamics across the urban-rural gradient.
During early stages of the COVID-19 pandemic, forecasts provided actionable information about disease transmission to public health decision-makers. Between February and May 2020, experts in infectious disease modeling made weekly predictions about the impact of the pandemic in the U.S. We aggregated these predictions into consensus predictions. In March and April 2020, experts predicted that the number of COVID-19 related deaths in the U.S. by the end of 2020 would be in the range of 150,000 to 250,000, with scenarios of near 1m deaths considered plausible. The wide range of possible future outcomes underscored the uncertainty surrounding the outbreak’s trajectory. Experts’ predictions of measurable short-term outcomes had varying levels of accuracy over the surveys but showed appropriate levels of uncertainty when aggregated. An expert consensus model can provide important insight early on in an emerging global catastrophe.
Background: The COVID-19 pandemic has driven demand for forecasts to guide policy and planning. Previous research has suggested that combining forecasts from multiple models into a single “ensemble” forecast can increase the robustness of forecasts. Here we evaluate the real-time application of an open, collaborative ensemble to forecast deaths attributable to COVID-19 in the U.S. Methods: Beginning on April 13, 2020, we collected and combined one- to four-week ahead forecasts of cumulative deaths for U.S. jurisdictions in standardized, probabilistic formats to generate real-time, publicly available ensemble forecasts. We evaluated the point prediction accuracy and calibration of these forecasts compared to reported deaths. Results: Analysis of 2,512 ensemble forecasts made April 27 to July 20 with outcomes observed in the weeks ending May 23 through July 25, 2020 revealed precise short-term forecasts, with accuracy deteriorating at longer prediction horizons of up to four weeks. At all prediction horizons, the prediction intervals were well calibrated with 92-96% of observations falling within the rounded 95% prediction intervals. Conclusions: This analysis demonstrates that real-time, publicly available ensemble forecasts issued in April-July 2020 provided robust short-term predictions of reported COVID-19 deaths in the United States. With the ongoing need for forecasts of impacts and resource needs for the COVID-19 response, the results underscore the importance of combining multiple probabilistic models and assessing forecast skill at different prediction horizons. Careful development, assessment, and communication of ensemble forecasts can provide reliable insight to public health decision makers.
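The calibration check reported above (the share of observations falling inside the 95% prediction intervals) can be illustrated with a minimal sketch. The function name and input layout are hypothetical, and the abstract's rounding of interval bounds is not reproduced here.

```python
from typing import Sequence


def empirical_coverage(
    lowers: Sequence[float],
    uppers: Sequence[float],
    observations: Sequence[float],
) -> float:
    """Fraction of observations that fall inside their prediction interval."""
    if not (len(lowers) == len(uppers) == len(observations)):
        raise ValueError("all inputs must have the same length")
    if len(observations) == 0:
        raise ValueError("at least one forecast is required")
    hits = sum(
        1 for lo, hi, y in zip(lowers, uppers, observations) if lo <= y <= hi
    )
    return hits / len(observations)
```

For a well-calibrated set of 95% intervals, this fraction should be close to 0.95 when computed over many forecasts.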
Importance: Efforts to track the severity and public health impact of coronavirus disease 2019 (COVID-19) in the United States have been hampered by state-level differences in diagnostic test availability, differing strategies for prioritization of individuals for testing, and delays between testing and reporting. Evaluating unexplained increases in deaths due to all causes or attributed to nonspecific outcomes, such as pneumonia and influenza, can provide a more complete picture of the burden of COVID-19. Objective: To estimate the burden of all deaths related to COVID-19 in the United States from March to May 2020. Design, Setting, and Population: This observational study evaluated the numbers of US deaths from any cause and deaths from pneumonia, influenza, and/or COVID-19 from March 1 through May 30, 2020, using public data of the entire US population from the National Center for Health Statistics (NCHS). These numbers were compared with those from the same period of previous years. All data analyzed were accessed on June 12, 2020. Main Outcomes and Measures: Increases in weekly deaths due to any cause or deaths due to pneumonia/influenza/COVID-19 above a baseline, which was adjusted for time of year, influenza activity, and reporting delays. These estimates were compared with reported deaths attributed to COVID-19 and with testing data. Results: There were approximately 781 000 total deaths in the United States from March 1 to May 30, 2020, representing 122 300 (95% prediction interval, 116 800-127 000) more deaths than would typically be expected at that time of year. There were 95 235 reported deaths officially attributed to COVID-19 from March 1 to May 30, 2020. The number of excess all-cause deaths was 28% higher than the official tally of COVID-19–reported deaths during that period. In several states, these deaths occurred before increases in the availability of COVID-19 diagnostic tests and were not counted in official COVID-19 death records. 
There was substantial variability between states in the difference between official COVID-19 deaths and the estimated burden of excess deaths. Conclusions and Relevance: Excess deaths provide an estimate of the full COVID-19 burden and indicate that official tallies likely undercount deaths due to the virus. The mortality burden and the completeness of the tallies vary markedly between states.
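The 28% figure above follows directly from the two reported totals. A small sketch of the arithmetic, using only numbers stated in the abstract (the helper name is hypothetical):

```python
def percent_higher(excess_deaths: float, reported_deaths: float) -> float:
    """Percentage by which excess deaths exceed officially reported deaths."""
    return (excess_deaths / reported_deaths - 1.0) * 100.0


REPORTED_COVID_DEATHS = 95_235       # officially attributed, March 1 to May 30, 2020
EXCESS_ALL_CAUSE_DEATHS = 122_300    # point estimate of excess all-cause deaths
EXCESS_INTERVAL = (116_800, 127_000) # 95% prediction interval

point = percent_higher(EXCESS_ALL_CAUSE_DEATHS, REPORTED_COVID_DEATHS)
low = percent_higher(EXCESS_INTERVAL[0], REPORTED_COVID_DEATHS)
high = percent_higher(EXCESS_INTERVAL[1], REPORTED_COVID_DEATHS)
print(f"{point:.0f}% higher (interval bounds: {low:.0f}% to {high:.0f}%)")
```

Applying the same ratio to the bounds of the prediction interval gives a sense of how the uncertainty in the excess-death estimate carries over to the percentage.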
Forecasts support decision making in a variety of applications. Statistical models can produce accurate forecasts given abundant training data, but when data is sparse or rapidly changing, statistical models may not be able to make accurate predictions. Expert judgmental forecasts—models that combine expert‐generated predictions into a single forecast—can make predictions when training data is limited by relying on human intuition. Researchers have proposed a wide array of algorithms to combine expert predictions into a single forecast, but there is no consensus on an optimal aggregation model. This review surveyed recent literature on aggregating expert‐elicited predictions. We gathered common terminology, aggregation methods, and forecasting performance metrics, and offer guidance to strengthen future work that is growing at an accelerated pace.
Forecasting transmission of infectious diseases, especially for vector-borne diseases, poses unique challenges for researchers. Behaviors of and interactions between viruses, vectors, hosts, and the environment each play a part in determining the transmission of a disease. Public health surveillance systems and other sources provide valuable data that can be used to accurately forecast disease incidence. However, many aspects of common infectious disease surveillance data are imperfect: cases may be reported with a delay or in some cases not at all, data on vectors may not be available, and case data may not be available at high geographical or temporal resolution. In the face of these challenges, researchers must make assumptions to either account for these underlying processes in a mechanistic model or to justify their exclusion altogether in a statistical model. Whether a model is mechanistic or statistical, researchers should evaluate their model using accepted best practices from the emerging field of infectious disease forecasting while adopting conventions from other fields that have been developing forecasting methods for decades. Accounting for assumptions and properly evaluating models will allow researchers to generate forecasts that have the potential to provide valuable insights for public health officials. This chapter provides a background to the practice of forecasting in general, discusses the biological and statistical models used for infectious disease forecasting, presents technical details about making and evaluating forecasting models, and explores the issues in communicating forecasting results in a public health context.
Background: A novel human coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was identified in China in December 2019. There is limited support for many of its key epidemiologic features, including the incubation period for clinical disease (coronavirus disease 2019 [COVID-19]), which has important implications for surveillance and control activities. Objective: To estimate the length of the incubation period of COVID-19 and describe its public health implications. Design: Pooled analysis of confirmed COVID-19 cases reported between 4 January 2020 and 24 February 2020. Setting: News reports and press releases from 50 provinces, regions, and countries outside Wuhan, Hubei province, China. Participants: Persons with confirmed SARS-CoV-2 infection outside Hubei province, China. Measurements: Patient demographic characteristics and dates and times of possible exposure, symptom onset, fever onset, and hospitalization. Results: There were 181 confirmed cases with identifiable exposure and symptom onset windows to estimate the incubation period of COVID-19. The median incubation period was estimated to be 5.1 days (95% CI, 4.5 to 5.8 days), and 97.5% of those who develop symptoms will do so within 11.5 days (CI, 8.2 to 15.6 days) of infection. These estimates imply that, under conservative assumptions, 101 out of every 10 000 cases (99th percentile, 482) will develop symptoms after 14 days of active monitoring or quarantine. Limitation: Publicly reported cases may overrepresent severe cases, the incubation period for which may differ from that of mild cases. Conclusion: This work provides additional evidence for a median incubation period for COVID-19 of approximately 5 days, similar to SARS. Our results support current proposals for the length of quarantine or active monitoring of persons potentially exposed to SARS-CoV-2, although longer monitoring periods might be justified in extreme cases. Primary Funding Source: U.S. Centers for Disease Control and Prevention, National Institute of Allergy and Infectious Diseases, National Institute of General Medical Sciences, and Alexander von Humboldt Foundation.
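To see how the reported median and 97.5th percentile relate to the tail beyond 14 days, one can fit a two-parameter distribution to those two quantiles. The sketch below assumes a log-normal incubation period, which the abstract does not state, and uses only the point estimates (5.1 and 11.5 days). It gives the share of symptomatic cases whose onset falls after day 14 under that assumption; this is a different quantity from the abstract's figure of 101 per 10 000, which rests on the authors' own conservative assumptions.

```python
import math
from statistics import NormalDist
from typing import Tuple


def lognormal_from_quantiles(median: float, q975: float) -> Tuple[float, float]:
    """Log-scale mean and sd of a log-normal matching a median and a 97.5th percentile."""
    mu = math.log(median)
    z975 = NormalDist().inv_cdf(0.975)
    sigma = (math.log(q975) - mu) / z975
    return mu, sigma


def prob_onset_after(days: float, mu: float, sigma: float) -> float:
    """P(incubation period > days) under the assumed log-normal distribution."""
    return 1.0 - NormalDist(mu, sigma).cdf(math.log(days))


mu, sigma = lognormal_from_quantiles(median=5.1, q975=11.5)
tail_14 = prob_onset_after(14.0, mu, sigma)
print(f"Share with onset after 14 days: {tail_14:.4f}")
```

Under this assumed distribution, a little under 1% of symptomatic cases would develop symptoms after 14 days.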
Health planners from global to local levels must anticipate year‐to‐year and week‐to‐week variation in seasonal influenza activity when planning for and responding to epidemics to mitigate their impact. To help with this, countries routinely collect incidence of mild and severe respiratory illness and virologic data on circulating subtypes and use these data for situational awareness, burden of disease estimates and severity assessments. Advanced analytics and modelling are increasingly used to aid planning and response activities by describing key features of influenza activity for a given location and generating forecasts that can be translated to useful actions such as enhanced risk communications, and informing clinical supply chains. Here, we describe the formation of the Influenza Incidence Analytics Group (IIAG), a coordinated global effort to apply advanced analytics and modelling to public influenza data, both epidemiological and virologic, in real‐time and thus provide additional insights to countries that provide routine surveillance data to WHO. Our objectives are to systematically increase the value of data to health planners by applying advanced analytics and forecasting and for results to be immediately reproducible and deployable using an open repository of data and code. We expect the resources we develop and the associated community to provide an attractive option for the open analysis of key epidemiological data during seasonal epidemics and the early stages of an influenza pandemic.
We developed a contactless syndromic surveillance platform, FluSense, that aims to expand the current paradigm of influenza-like illness (ILI) surveillance by capturing crowd-level bio-clinical signals directly related to physical symptoms of ILI from hospital waiting areas in an unobtrusive and privacy-sensitive manner. FluSense consists of a novel edge-computing sensor system, models and data processing pipelines to track crowd behaviors and influenza-related indicators, such as coughs, and to predict daily ILI and laboratory-confirmed influenza caseloads. FluSense uses a microphone array and a thermal camera along with a neural computing engine to passively and continuously characterize speech and cough sounds along with changes in crowd density on the edge in a real-time manner. We conducted an IRB-approved 7-month-long study from December 10, 2018 to July 12, 2019 where we deployed FluSense in four public waiting areas within the hospital of a large university. During this period, the FluSense platform collected and analyzed more than 350,000 waiting room thermal images and 21 million non-speech audio samples from the hospital waiting areas. FluSense can accurately predict daily patient counts with a Pearson correlation coefficient of 0.95. We also compared signals from FluSense with the gold standard laboratory-confirmed influenza case data obtained in the same facility and found that our sensor-based features are strongly correlated with laboratory-confirmed influenza trends.
Estimation of epidemic onset timing is an important component of controlling the spread of seasonal infectious diseases within community healthcare sites. The Above Local Elevated Respiratory Illness Threshold (ALERT) algorithm uses a threshold-based approach to suggest incidence levels that historically have indicated the transition from endemic to epidemic activity. In this paper, we present the first detailed overview of the computational approach underlying the algorithm. In the motivating example section, we evaluate the performance of ALERT in determining the onset of increased respiratory virus incidence using laboratory testing data from the Children’s Hospital of Colorado. At a threshold of 10 cases per week, ALERT-selected intervention periods performed better than the observed hospital site periods (2004/2005-2012/2013) and a CUSUM method. Additional simulation studies show how data properties may affect ALERT performance on novel data. We found that the conditions under which ALERT showed ideal performance generally included high seasonality and low off-season incidence.
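The threshold-crossing idea behind such an approach can be illustrated with a minimal sketch. This is not the ALERT algorithm, whose details the abstract does not give; it only shows how a fixed weekly threshold (10 cases per week in the example above) maps a series of counts to a candidate onset week. The function name and the sample counts are hypothetical.

```python
from typing import Optional, Sequence


def first_week_at_or_above(
    weekly_counts: Sequence[int], threshold: int
) -> Optional[int]:
    """Index of the first week whose count reaches the threshold, or None."""
    for week, count in enumerate(weekly_counts):
        if count >= threshold:
            return week
    return None


# Hypothetical weekly laboratory-confirmed case counts for one season.
season = [2, 4, 9, 10, 15, 7]
onset_week = first_week_at_or_above(season, threshold=10)
```

Comparing the onset weeks produced by several candidate thresholds against historical seasons is one way to reason about which threshold marks the shift from endemic to epidemic activity.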
Seasonal influenza results in substantial annual morbidity and mortality in the United States and worldwide. Accurate forecasts of key features of influenza epidemics, such as the timing and severity of the peak incidence in a given season, can inform public health response to outbreaks. As part of ongoing efforts to incorporate data and advanced analytical methods into public health decision-making, the United States Centers for Disease Control and Prevention (CDC) has organized seasonal influenza forecasting challenges since the 2013/2014 season. In the 2017/2018 season, 22 teams participated. A subset of four teams created a research consortium called the FluSight Network in early 2017. During the 2017/2018 season they worked together to produce a collaborative multi-model ensemble that combined 21 separate component models into a single model using a machine learning technique called stacking. This approach creates a weighted average of predictive densities where the weight for each component is based on that component's forecast accuracy in past seasons. In the 2017/2018 influenza season, one of the largest seasonal outbreaks in the last 15 years, this multi-model ensemble performed better on average than all individual component models and placed second overall in the CDC challenge. It also outperformed the baseline multi-model ensemble created by the CDC that took a simple average of all models submitted to the forecasting challenge. This project shows that collaborative efforts between research teams to develop ensemble forecasting approaches can bring measurable improvements in forecast accuracy and important reductions in the variability of performance from year to year. 
Efforts such as this, that emphasize real-time testing and evaluation of forecasting models and facilitate the close collaboration between public health officials and modeling researchers, are essential to improving our understanding of how best to use forecasts to improve public health response to seasonal and emerging epidemic threats.
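The combination step described above, a weighted average of predictive densities, can be sketched for forecasts expressed as probabilities over a shared set of bins. The weights are taken as given here; estimating them from each component's accuracy in past seasons, which is the core of stacking, is not shown. Function and argument names are hypothetical.

```python
from typing import List, Sequence


def combine_densities(
    component_probs: Sequence[Sequence[float]],
    weights: Sequence[float],
    tol: float = 1e-9,
) -> List[float]:
    """Weighted average of binned predictive distributions.

    component_probs[m][b] is the probability model m assigns to bin b.
    Weights must be non-negative and sum to 1.
    """
    if len(component_probs) == 0 or len(component_probs) != len(weights):
        raise ValueError("need one weight per component model")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must be non-negative and sum to 1")
    n_bins = len(component_probs[0])
    if any(len(p) != n_bins for p in component_probs):
        raise ValueError("all components must use the same bins")
    return [
        sum(w * probs[b] for w, probs in zip(weights, component_probs))
        for b in range(n_bins)
    ]
```

Setting every weight to 1/M for M models gives the simple average used as the baseline ensemble in the comparison above.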
A wide range of research has promised new tools for forecasting infectious disease dynamics, but little of that research is currently being applied in practice, because tools do not address key public health needs, do not produce probabilistic forecasts, have not been evaluated on external data, or do not provide sufficient forecast skill to be useful. We developed an open collaborative forecasting challenge to assess probabilistic forecasts for seasonal epidemics of dengue, a major global public health problem. Sixteen teams used a variety of methods and data to generate forecasts for 3 epidemiological targets (peak incidence, the week of the peak, and total incidence) over 8 dengue seasons in Iquitos, Peru and San Juan, Puerto Rico. Forecast skill was highly variable across teams and targets. While numerous forecasts showed high skill for midseason situational awareness, early season skill was low, and skill was generally lowest for high incidence seasons, those for which forecasts would be most valuable. A comparison of modeling approaches revealed that average forecast skill was lower for models including biologically meaningful data and mechanisms and that both multimodel and multiteam ensemble forecasts consistently outperformed individual model forecasts. Leveraging these insights, data, and the forecasting framework will be critical to improve forecast skill and the application of forecasts in real time for epidemic preparedness and response. Moreover, key components of this project—integration with public health needs, a common forecasting framework, shared and standardized data, and open participation—can help advance infectious disease forecasting beyond dengue.
We often seek to estimate the causal effect of an exposure on a particular outcome in both randomized and observational settings. One such estimation method is the covariate-adjusted residuals estimator, which was designed for individually or cluster randomized trials. In this manuscript, we study the properties of this estimator and develop a new estimator that utilizes both covariate adjustment and inverse probability weighting. We support our theoretical results with a simulation study and an application in an infectious disease setting. The covariate-adjusted residuals estimator is an efficient and unbiased estimator of the average treatment effect in randomized trials; however, it is not guaranteed to be unbiased in observational studies. Our novel estimator, the covariate-adjusted residuals estimator with inverse probability weighting, is unbiased in randomized and observational settings, under a reasonable set of assumptions. Furthermore, when these assumptions hold, it provides efficiency gains over inverse probability weighting in observational studies. The covariate-adjusted residuals estimator is valid for use in randomized trials, but should not be used in observational studies. The covariate-adjusted residuals estimator with inverse probability weighting provides an efficient alternative for use in randomized and observational settings.
Arianna Kazemi, Connor Kennedy and Gabri Silverman, undergraduate winners of the ASA Public Health Data Challenge, and their advisor Nicholas G. Reich, explore differences in the death, arrest and reoffending rates for opioid users in the USA.
Evaluating probabilistic forecasts in the context of a real-time public health surveillance system is a complicated business. We agree with Bracher’s (1) observations that the scores established by the US Centers for Disease Control and Prevention (CDC) and used to evaluate our forecasts of seasonal influenza in the United States are not “proper” by definition (2). We thank him for raising this important issue. A key advantage of proper scoring is that it incentivizes forecasters to provide their best probabilistic estimates of the fundamental unit of prediction. In the case of the FluSight competition targets, the units are intervals or bins containing dates or values representing influenza-like illness (ILI) activity. A forecast assigns probabilities to each bin. During the evolution of the FluSight challenge, the organizers at CDC made a conscious decision to use a “moving window” or “multibin” score that rewards forecasts for assigning substantial probability to values within a window of the eventually observed value. This decision was driven by the need to find a balance between 1) strictly proper scoring and high-resolution binning (e.g., at 0.1% increments for ILI values) and 2) the need for coarser categorizations for communication and decision-making purposes. Because final observations from a surveillance system are only estimates of an underlying “ground truth” measure of disease activity, a wider window for evaluating accuracy was considered. In the end, CDC elected to allow nearby “windows” of the truth to be considered accurate (e.g., within ±0.5% of the observed ILI value), understanding that there was a downside to not using a proper score. Given the increasing visibility and public availability of infectious disease forecasts, such as those from the FluSight challenge (3), forecasts are being used and interpreted for multiple purposes by more end users than when the challenge was originally conceived. 
Using a proper logarithmic score would require that forecasts be evaluated at a fixed resolution, e.g., for prespecified bins of 0.1% or 0.5%. Even if forecasts were optimized for and formally evaluated at one specific resolution, this use would not preclude the transformation of forecast outputs to a variety of resolutions appropriate for the specific decision or communication. Therefore, Bracher’s (1) letter raises an interesting and timely question about whether to institute a proper scoring rule for evaluating these public health forecasts. Regarding the impact of the impropriety of the score on the results in our original paper, we confirm that none of the forecasts presented in our original paper were manipulated in the way that Bracher shows is possible (4). Furthermore, evaluating forecasts by the proper logarithmic score metric does not substantially change the quality of the component models relative to each other (Fig. 1). Bracher’s (1) letter contributes to an existing and robust dialogue among quantitative modelers and public health decision makers about how to meaningfully evaluate probabilistic forecasts and support effective real-time decision making. We welcome this ongoing public discussion of both scientific and public policy considerations in the evaluation of forecasts.
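The difference between the two scores discussed in this letter can be made concrete for a binned forecast. The sketch below assumes the single-bin logarithmic score is the log of the probability assigned to the observed bin, and the multibin score is the log of the total probability within a window of bins around it (for 0.1% bins and a ±0.5% window, `window=5`). Function names are hypothetical, and any operational details such as score truncation are not covered by the text and are omitted.

```python
import math
from typing import Sequence


def log_score(bin_probs: Sequence[float], observed_bin: int) -> float:
    """Proper logarithmic score: log probability of the observed bin."""
    p = bin_probs[observed_bin]
    return math.log(p) if p > 0 else float("-inf")


def multibin_log_score(
    bin_probs: Sequence[float], observed_bin: int, window: int
) -> float:
    """Log of the probability assigned to bins within `window` of the observed bin."""
    lo = max(observed_bin - window, 0)
    hi = min(observed_bin + window, len(bin_probs) - 1)
    p = sum(bin_probs[lo : hi + 1])
    return math.log(p) if p > 0 else float("-inf")
```

Because the window always contains the observed bin, the multibin score is never lower than the single-bin score for the same forecast, which is why it rewards probability placed near, rather than exactly on, the observed value.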
Forecasting is beginning to be integrated into decision-making processes for infectious disease outbreak response. We discuss how technologies could accelerate the adoption of forecasting among public health practitioners, improve epidemic management, save lives, and reduce the economic impact of outbreaks.
Seasonal influenza infects between 10 and 50 million people in the United States every year, overburdening hospitals during weeks of peak incidence. Named by the CDC as an important tool to fight the damaging effects of these epidemics, accurate forecasts of influenza and influenza-like illness (ILI) forewarn public health officials about when, and where, seasonal influenza outbreaks will hit hardest. Multi-model ensemble forecasts, which are weighted combinations of component models, have shown positive results in forecasting. Ensemble forecasts of influenza outbreaks have been static, training on all past ILI data at the beginning of a season, generating a set of optimal weights for each model in the ensemble, and keeping the weights constant. We propose an adaptive ensemble forecast that (i) changes model weights week-by-week throughout the influenza season, (ii) only needs the current influenza season's data to make predictions, and (iii) by introducing a prior distribution, shrinks weights toward the reference equal weighting approach and adjusts for observed ILI percentages that are subject to future revisions. We investigate the prior's ability to impact adaptive ensemble performance and, after finding an optimal prior via a cross-validation approach, compare our adaptive ensemble's performance to equal-weighted and static ensembles. Applied to forecasts of short-term ILI incidence at the regional and national level in the US, our adaptive model outperforms a naive equal-weighted ensemble and performs similarly to or better than the static ensemble, which requires multiple years of training data. Adaptive ensembles are able to quickly train and forecast during epidemics, and provide a practical tool to public health officials looking for forecasts that can conform to unique features of a specific season.
Infectious disease modeling has played a prominent role in recent outbreaks, yet integrating these analyses into public health decision-making has been challenging. We recommend establishing ‘outbreak science’ as an inter-disciplinary field to improve applied epidemic modeling.
Influenza infects an estimated 9 to 35 million individuals each year in the United States and is a contributing cause for between 12,000 and 56,000 deaths annually. Seasonal outbreaks of influenza are common in temperate regions of the world, with highest incidence typically occurring in colder and drier months of the year. Real-time forecasts of influenza transmission can inform public health response to outbreaks. We present the results of a multi-institution collaborative effort to standardize the collection and evaluation of forecasting models for influenza in the US for the 2010/2011 through 2016/2017 influenza seasons. For these seven seasons, we assembled weekly real-time forecasts of 7 targets of public health interest from 22 different models. We compared forecast accuracy of each model relative to a historical baseline seasonal average. Across all regions of the US, over half of the models showed consistently better performance than the historical baseline when forecasting incidence of influenza-like illness 1, 2 and 3 weeks ahead of available data and when forecasting the timing and magnitude of the seasonal peak. In some regions, delays in data reporting were strongly and negatively associated with forecast accuracy. More timely reporting and an improved overall accessibility to novel and traditional data sources are needed to improve forecasting accuracy and its integration with real-time public health decision-making.
McGowan C, Biggerstaff M, Johansson M, Apfeldorf K, Ben-Nun M, Brooks L, Convertino M, Erraguntla M, Farrow D, Freeze J, Ghosh S, Hyun S, Kandula S, Lega J, Liu Y, Michaud N, Morita H, Niemi J, Ramakrishnan N, Ray EL, Reich NG, Riley P, Shaman J, Tibshirani R, Vespignani A, Zhang Q, Reed C (2019). Sci Rep, 9(683).
Since 2013, the Centers for Disease Control and Prevention (CDC) has hosted an annual influenza season forecasting challenge. The 2015–2016 challenge consisted of weekly probabilistic forecasts of multiple targets from fourteen models submitted by eleven teams. Forecast skill was evaluated using a modified logarithmic score. We averaged submitted forecasts into a mean ensemble model and compared them against predictions based on historical trends. Forecast skill was highest for seasonal peak intensity and short-term forecasts, while forecast skill for timing of season onset and peak week was generally low. Higher forecast skill was associated with team participation in previous influenza forecasting challenges and utilization of ensemble forecasting techniques. The mean ensemble consistently performed well and outperformed historical trend predictions. CDC and contributing teams will continue to advance influenza forecasting and work to improve the accuracy and reliability of forecasts to facilitate increased incorporation into public health response efforts.
Dengue hemorrhagic fever (DHF), a severe manifestation of dengue viral infection that can cause severe bleeding, organ impairment, and even death, affects between 15,000 and 105,000 people each year in Thailand. While all Thai provinces experience at least one DHF case most years, the distribution of cases shifts regionally from year to year. Accurately forecasting where DHF outbreaks occur before the dengue season could help public health officials prioritize public health activities. We develop statistical models that use biologically plausible covariates, observed by April each year, to forecast the cumulative DHF incidence for the remainder of the year. We perform cross-validation during the training phase (2000–2009) to select the covariates for these models. A parsimonious model based on preseason incidence outperforms the 10-y median for 65% of province-level annual forecasts, reduces the mean absolute error by 19%, and successfully forecasts outbreaks (area under the receiver operating characteristic curve = 0.84) over the testing period (2010–2014). We find that functions of past incidence contribute most strongly to model performance, whereas the importance of environmental covariates varies regionally. This work illustrates that accurate forecasts of dengue risk are possible in a policy-relevant timeframe.
Accurate and reliable predictions of infectious disease dynamics can be valuable to public health organizations that plan interventions to decrease or prevent disease transmission. A great variety of models have been developed for this task, using different model structures, covariates, and targets for prediction. Experience has shown that the performance of these models varies; some tend to do better or worse in different seasons or at different points within a season. Ensemble methods combine multiple models to obtain a single prediction that leverages the strengths of each model. We considered a range of ensemble methods that each form a predictive density for a target of interest as a weighted sum of the predictive densities from component models. In the simplest case, equal weight is assigned to each component model; in the most complex case, the weights vary with the region, prediction target, week of the season when the predictions are made, a measure of component model uncertainty, and recent observations of disease incidence. We applied these methods to predict measures of influenza season timing and severity in the United States, both at the national and regional levels, using three component models. We trained the models on retrospective predictions from 14 seasons (1997/1998 - 2010/2011) and evaluated each model's prospective, out-of-sample performance in the five subsequent influenza seasons. In this test phase, the ensemble methods showed overall performance that was similar to the best of the component models, but offered more consistent performance across seasons than the component models. Ensemble methods offer the potential to deliver more reliable predictions to public health decision makers.
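As a minimal sketch of the simplest case described above, the following combines binned predictive distributions as a weighted sum. It assumes each component model reports probabilities over the same set of bins. The function name and the data layout are illustrative and do not come from the original work.

```python
def ensemble_density(component_probs, weights):
    """Weighted sum of binned predictive distributions.

    component_probs: one list of bin probabilities per component model,
                     all over the same bins (an assumption of this sketch).
    weights: one non-negative weight per model, summing to 1.
    """
    if len(component_probs) != len(weights):
        raise ValueError("need exactly one weight per component model")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n_bins = len(component_probs[0])
    if any(len(p) != n_bins for p in component_probs):
        raise ValueError("all components must use the same bins")
    return [
        sum(w * p[b] for w, p in zip(weights, component_probs))
        for b in range(n_bins)
    ]
```

Equal weighting corresponds to passing `[1 / k] * k` for `k` component models. The more complex schemes described above would replace that constant vector with weights that depend on region, target, or week of the season.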
Preprints—manuscripts posted openly online prior to peer review—offer an opportunity to accelerate the dissemination of scientific findings to support responses to infectious disease outbreaks. Preprints posted during the Ebola and Zika outbreaks included novel analyses and new data, and most of those that were matched to peer-reviewed publications were available more than 100 days before publication. Despite the advantages of preprints and the endorsement of journals and funders in the context of outbreaks, less than 5% of Ebola and Zika journal articles were posted as preprints prior to publication in journals. With broader adoption by scientists, journals, and funding agencies, preprints can complement peer-reviewed publication and ensure the early, open, and transparent dissemination of science relevant to the prevention and control of disease outbreaks.
During outbreaks of deadly emerging pathogens (e.g., Ebola, MERS-CoV) and bioterror threats (e.g., smallpox), actively monitoring potentially infected individuals aims to limit disease transmission and morbidity. Guidance issued by CDC on active monitoring was a cornerstone of its response to the West Africa Ebola outbreak. There are limited data on how to balance the costs and performance of this important public health activity. We present a framework that estimates the risks and costs of specific durations of active monitoring for pathogens of significant public health concern. We analyze data from New York City's Ebola active monitoring program over a 16-month period in 2014-2016. For monitored individuals, we identified unique durations of active monitoring that minimize expected costs for those at “low (but not zero) risk” and “some or high risk”: 21 and 31 days, respectively. Extending our analysis to smallpox and MERS-CoV, we found that the optimal length of active monitoring relative to the median incubation period was reduced compared to Ebola due to less variable incubation periods. Active monitoring can save lives but is expensive. Resources can be most effectively allocated by using exposure-risk categories to modify the duration or intensity of active monitoring.
Objective: To determine the effect of mandatory and nonmandatory influenza vaccination policies on vaccination rates and symptomatic absenteeism among healthcare personnel (HCP). Design: Retrospective observational cohort study. Setting: This study took place at 3 university medical centers with mandatory influenza vaccination policies and 4 Veterans Affairs (VA) healthcare systems with nonmandatory influenza vaccination policies. Participants: The study included 2,304 outpatient HCP at mandatory vaccination sites and 1,759 outpatient HCP at nonmandatory vaccination sites. Methods: To determine the incidence and duration of absenteeism in outpatient settings, HCP participating in the Respiratory Protection Effectiveness Clinical Trial at both mandatory and nonmandatory vaccination sites over 3 viral respiratory illness (VRI) seasons (2012–2015) reported their influenza vaccination status and symptomatic days absent from work weekly throughout a 12-week period during the peak VRI season each year. The adjusted effects of vaccination and other modulating factors on absenteeism rates were estimated using multivariable regression models. Results: The proportion of participants who received influenza vaccination was lower each year at nonmandatory than at mandatory vaccination sites (odds ratio [OR], 0.09; 95% confidence interval [CI], 0.07–0.11). Among HCP who reported at least 1 sick day, vaccinated HCP had lower symptomatic days absent compared to unvaccinated HCP (OR for 2012–2013 and 2013–2014, 0.82; 95% CI, 0.72–0.93; OR for 2014–2015, 0.81; 95% CI, 0.69–0.95). Conclusions: These data suggest that mandatory HCP influenza vaccination policies increase influenza vaccination rates and that HCP symptomatic absenteeism diminishes as rates of influenza vaccination increase. These findings should be considered in formulating HCP influenza vaccination policies.
Creating statistical models that generate accurate predictions of infectious disease incidence over multiple time points is a challenging problem whose solution could benefit public health decision makers. We develop a new approach to this problem using kernel conditional density estimation (KCDE) and copulas. We obtain predictive distributions for incidence in individual weeks using KCDE and tie those distributions together into joint distributions using copulas. This strategy enables us to create predictions for the timing of and incidence in the peak week of the season. Our implementation of KCDE incorporates two novel kernel components: a periodic component that captures seasonality in disease incidence, and a component that allows for a full parameterization of the bandwidth matrix with discrete variables. We demonstrate via simulation that a fully parameterized bandwidth matrix can be beneficial for estimating conditional densities. We apply the method to predicting dengue fever and influenza, and compare to a seasonal autoregressive integrated moving average (SARIMA) model and a previously published generalized linear model for infectious disease incidence known as HHH4. KCDE outperforms the baseline methods for predictions of dengue incidence in individual weeks. KCDE also offers more consistent performance than the baseline models for predictions of incidence in the peak week, and is comparable to the baseline models on the other prediction targets. Using the periodic kernel function led to better predictions of incidence. Our approach and extensions of it could yield improved predictions for public health decision makers, particularly in diseases with heterogeneous seasonal dynamics such as dengue fever.
The rapid emergence of infectious disease outbreaks from both new and known pathogens remains a critical concern of health officials worldwide. Improving communication between teams of scientific researchers who assemble forecasts of outbreaks before and during epidemics, and policy makers who could integrate these data into decision-making, has been identified as a critical area for innovation (Chretien et al. 2015). In an attempt to address this issue, we have developed flusight, a tool for visualizing infectious disease forecasts. It provides an interactive interface for real-time comparison, exploration, and evaluation of infectious disease forecast models over time and geographic regions. A live version shows forecasts of influenza in the US that are updated weekly during the US influenza season. Flusight uses D3 (Bostock 2016) for generating visualizations from a single static file that summarizes the entities to be visualized (such as predicted and actual weekly influenza incidence and the predicted week of peak incidence for the season). It is written to keep hosting overhead minimal and pre-generates the data file by parsing model predictions and live influenza data from delphi-API (undefx 2016). All content is bundled into a static web page. The data collection step can be replaced to visualize data and forecasts from custom sources instead of the ones used in the current repository. This allows future users to plug in similar time-series-based disease prediction models for visualization. This application has potential to be widely used by infectious disease forecasters who generate forecasts in real-time. In this way, we hope that flusight will facilitate dissemination, comparison, and standardized evaluation of outbreak predictions.
Confidence intervals provide a way to determine plausible values for a population parameter. They are omnipresent in research articles involving statistical analyses. Appropriately, a key statistical literacy learning objective is the ability to interpret and understand confidence intervals in a wide range of settings. As instructors, we devote a considerable amount of time and effort to ensure that students master this topic in introductory courses and beyond. Yet, studies continue to find that confidence intervals are commonly misinterpreted and that even experts have trouble calibrating their individual confidence levels. In this article, we present a ten-minute trivia game-based activity that addresses these misconceptions by exposing students to confidence intervals from a personal perspective. We describe how the activity can be integrated into a statistics course as a one-time activity or with repetition at intervals throughout a course, discuss results of using the activity in class, and present possible extensions.
Epidemics of communicable diseases place a huge burden on public health infrastructures across the world. Producing accurate and actionable forecasts of infectious disease incidence at short and long time scales will improve public health response to outbreaks. However, scientists and public health officials face many obstacles in trying to create such real-time forecasts of infectious disease incidence. Dengue is a mosquito-borne virus that annually infects over 400 million people worldwide. We developed a real-time forecasting model for dengue hemorrhagic fever in the 77 provinces of Thailand. We created a practical computational infrastructure that generated multi-step predictions of dengue incidence in Thai provinces every two weeks throughout 2014. These predictions show mixed performance across provinces, out-performing seasonal baseline models in over half of provinces at a 1.5 month horizon. Additionally, to assess the degree to which delays in case reporting make long-range prediction a challenging task, we compared the performance of our real-time predictions with predictions made with fully reported data. This paper provides valuable lessons for the implementation of real-time predictions in the context of public health decision making.
Dengue viruses, which infect millions of people per year worldwide, cause large epidemics that strain healthcare systems. Despite diverse efforts to develop forecasting tools including autoregressive time series, climate-driven statistical, and mechanistic biological models, little work has been done to understand the contribution of different components to improved prediction. We developed a framework to assess and compare dengue forecasts produced from different types of models and evaluated the performance of seasonal autoregressive models with and without climate variables for forecasting dengue incidence in Mexico. Climate data did not significantly improve the predictive power of seasonal autoregressive models. Short-term and seasonal autocorrelation were key to improving short-term and long-term forecasts, respectively. Seasonal autoregressive models captured a substantial amount of dengue variability, but better models are needed to improve dengue forecasting. This framework contributes to the sparse literature of infectious disease prediction model evaluation, using state-of-the-art validation techniques such as out-of-sample testing and comparison to an appropriate reference model.
Statistical prediction models inform decision-making processes in many real-world settings. Prior to using predictions in practice, one must rigorously test and validate candidate models to ensure that the proposed predictions have sufficient accuracy to be used in practice. In this article, we present a framework for evaluating time series predictions, which emphasizes computational simplicity and an intuitive interpretation using the relative mean absolute error metric. For a single time series, this metric enables comparisons of candidate model predictions against naïve reference models, a method that can provide useful and standardized performance benchmarks. Additionally, in applications with multiple time series, this framework facilitates comparisons of one or more models’ predictive performance across different sets of data. We illustrate the use of this metric with a case study comparing predictions of dengue hemorrhagic fever incidence in two provinces of Thailand. This example demonstrates the utility and interpretability of the relative mean absolute error metric in practice, and underscores the practical advantages of using relative performance metrics when evaluating predictions.
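A minimal sketch of the metric follows. It assumes relative mean absolute error is computed as the ratio of a candidate model's mean absolute error to that of a reference model on the same observations. The function names are illustrative.

```python
def mean_absolute_error(predictions, observations):
    """Average absolute difference between predictions and observations."""
    if len(predictions) != len(observations) or not observations:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(abs(p - o) for p, o in zip(predictions, observations)) / len(observations)


def relative_mae(model_preds, reference_preds, observations):
    """MAE of the candidate model divided by MAE of the reference model."""
    reference_error = mean_absolute_error(reference_preds, observations)
    if reference_error == 0:
        raise ValueError("reference model has zero error; ratio is undefined")
    return mean_absolute_error(model_preds, observations) / reference_error
```

Under this definition a value below 1 means the candidate model had smaller average error than the reference, and a value above 1 means it had larger average error. Because the ratio is unitless, values can be compared across time series with very different incidence levels.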
OBJECTIVE: To estimate the timing of key events in the natural history of Zika virus infection. METHODS: In February 2016, we searched PubMed, Scopus and the Web of Science for publications containing the term Zika. By pooling data, we estimated the incubation period, the time to seroconversion and the duration of viral shedding. We estimated the risk of Zika virus contaminated blood donations. FINDINGS: We identified 20 articles on 25 patients with Zika virus infection. The median incubation period for the infection was estimated to be 5.9 days (95% credible interval, CrI: 4.4-7.6), with 95% of people who developed symptoms doing so within 11.2 days (95% CrI: 7.6-18.0) after infection. On average, seroconversion occurred 9.1 days (95% CrI: 7.0-11.6) after infection. The virus was detectable in blood for 9.9 days (95% CrI: 6.9-21.4) on average. Without screening, the estimated risk that a blood donation would come from an infected individual increased by approximately 1 in 10 000 for every 1 per 100 000 person-days increase in the incidence of Zika virus infection. Symptom-based screening may reduce this rate by 7% (relative risk, RR: 0.93; 95% CrI: 0.89-0.99) and antibody screening, by 29% (RR: 0.71; 95% CrI: 0.28-0.88). CONCLUSION: Neither symptom- nor antibody-based screening for Zika virus infection substantially reduced the risk that blood donations would be contaminated by the virus. Polymerase chain reaction testing should be considered for identifying blood safe for use in pregnant women in high-incidence areas.
Although N95 filtering facepiece respirators and medical masks are commonly used for protection against respiratory infections in healthcare settings, more clinical evidence is needed to understand the optimal settings and exposure circumstances for healthcare personnel to use these devices. A lack of clinically germane research has led to equivocal, and occasionally conflicting, healthcare respiratory protection recommendations from public health organizations, professional societies, and experts. The Respiratory Protection Effectiveness Clinical Trial (ResPECT) is a prospective comparison of respiratory protective equipment to be conducted at multiple U.S. study sites. Healthcare personnel who work in outpatient settings will be cluster-randomized to wear N95 respirators or medical masks for protection against infections during respiratory virus season. Outcome measures will include laboratory-confirmed viral respiratory infections, acute respiratory illness, and influenza-like illness. Participant exposures to patients, coworkers, and others with symptoms and signs of respiratory infection, both within and beyond the workplace, will be recorded in daily diaries. Adherence to study protocols will be monitored by the study team. ResPECT is designed to better understand the extent to which N95s and MMs reduce clinical illness among healthcare personnel. A fully successful study would produce clinically relevant results that help clinician-leaders make reasoned decisions about protection of healthcare personnel against occupationally acquired respiratory infections and prevention of spread within healthcare systems.
Early, accurate predictions of the onset of influenza season enable targeted implementation of control efforts. Our objective was to develop a tool to assist public health practitioners, researchers, and clinicians in defining the community-level onset of seasonal influenza epidemics. Using recent surveillance data on virologically confirmed infections of influenza, we developed the Above Local Elevated Respiratory Illness Threshold (ALERT) algorithm, a method to identify the period of highest seasonal influenza activity. We used data from 2 large hospitals that serve Baltimore, Maryland and Denver, Colorado, and the surrounding geographic areas. The data used by ALERT are routinely collected surveillance data: weekly case counts of laboratory-confirmed influenza A virus. The main outcome is the percentage of prospective seasonal influenza cases identified by the ALERT algorithm. When ALERT thresholds designed to capture 90% of all cases were applied prospectively to the 2011–2012 and 2012–2013 influenza seasons in both hospitals, 71%–91% of all reported cases fell within the ALERT period. The ALERT algorithm provides a simple, robust, and accurate metric for determining the onset of elevated influenza activity at the community level. This new algorithm provides valuable information that can impact infection prevention recommendations, public health practice, and healthcare delivery.
The frequency of cluster-randomized trials (CRTs) in peer-reviewed literature has increased exponentially over the past two decades. CRTs are a valuable tool for studying interventions that cannot be effectively implemented or randomized at the individual level. However, some aspects of the design and analysis of data from CRTs are more complex than those for individually randomized controlled trials. One of the key components to designing a successful CRT is calculating the proper sample size (i.e. number of clusters) needed to attain an acceptable level of statistical power. In order to do this, a researcher must make assumptions about the value of several variables, including a fixed mean cluster size. In practice, cluster size can often vary dramatically. Few studies account for the effect of cluster size variation when assessing the statistical power for a given trial. We conducted a simulation study to investigate how the statistical power of CRTs changes with variable cluster sizes. In general, we observed that increases in cluster size variability lead to a decrease in power.
Dengue, a mosquito-borne virus of humans, infects over 50 million people annually. Infection with any of the four dengue serotypes induces protective immunity to that serotype, but does not confer long-term protection against infection by other serotypes. The immunological interactions between serotypes are of central importance in understanding epidemiological dynamics and anticipating the impact of dengue vaccines. We analysed a 38-year time series with 12 197 serotyped dengue infections from a hospital in Bangkok, Thailand. Using novel mechanistic models to represent different hypothesized immune interactions between serotypes, we found strong evidence that infection with dengue provides substantial short-term cross-protection against other serotypes (approx. 1–3 years). This is the first quantitative evidence that short-term cross-protection exists since human experimental infection studies performed in the 1950s. These findings will impact strategies for designing dengue vaccine studies, future multi-strain modelling efforts, and our understanding of evolutionary pressures in multi-strain disease systems.
In recent years, the number of studies using a cluster-randomized design has grown dramatically. In addition, the cluster-randomized crossover design has been touted as a methodological advance that can increase efficiency of cluster-randomized studies in certain situations. While the cluster-randomized crossover trial has become a popular tool, standards of design, analysis, reporting and implementation have not been established for this emergent design. We address one particular aspect of cluster-randomized and cluster-randomized crossover trial design: estimating statistical power. We present a general framework for estimating power via simulation in cluster-randomized studies with or without one or more crossover periods. We have implemented this framework in the clusterPower software package for R, freely available online from the Comprehensive R Archive Network. Our simulation framework is easy to implement and users may customize the methods used for data analysis. We give four examples of using the software in practice. The clusterPower package could play an important role in the design of future cluster-randomized and cluster-randomized crossover studies. This work is the first to establish a universal method for calculating power for both cluster-randomized and cluster-randomized crossover clinical trials. More research is needed to develop standardized and recommended methodology for cluster-randomized crossover studies.
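The idea of estimating power by simulation can be sketched in a few lines. The sketch is pure Python and is not the clusterPower interface. It assumes a parallel (no crossover) design, a continuous outcome with normally distributed cluster and individual effects, and an analysis that compares cluster means using a normal approximation. That approximation is a simplification and will be somewhat anti-conservative when the number of clusters is small. All function and parameter names are hypothetical.

```python
import math
import random
import statistics


def trial_rejects_null(rng, clusters_per_arm, cluster_size, effect,
                       sd_between, sd_within, alpha=0.05):
    """Simulate one trial and test the difference in arm-level cluster means."""
    arm_means = []
    for arm in (0, 1):
        means = []
        for _ in range(clusters_per_arm):
            cluster_effect = rng.gauss(0.0, sd_between)  # shared within a cluster
            outcomes = [arm * effect + cluster_effect + rng.gauss(0.0, sd_within)
                        for _ in range(cluster_size)]
            means.append(statistics.fmean(outcomes))
        arm_means.append(means)
    control, treated = arm_means
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(statistics.variance(control) / clusters_per_arm
                   + statistics.variance(treated) / clusters_per_arm)
    critical = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    return abs(diff / se) > critical


def estimate_power(n_sims, seed=1, **design):
    """Fraction of simulated trials in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = sum(trial_rejects_null(rng, **design) for _ in range(n_sims))
    return rejections / n_sims
```

A call such as `estimate_power(500, clusters_per_arm=10, cluster_size=20, effect=0.5, sd_between=0.3, sd_within=1.0)` returns the estimated power for that design. Swapping the body of `trial_rejects_null` for a different data-generating model or analysis method is how a simulation approach accommodates other designs.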
Knowing which populations are most at risk for severe outcomes from an emerging infectious disease is crucial in deciding the optimal allocation of resources during an outbreak response. The case fatality ratio (CFR) is the fraction of cases that die after contracting a disease. The relative CFR is the factor by which the case fatality in one group is greater or less than that in a second group. Incomplete reporting of the number of infected individuals, both recovered and dead, can lead to biased estimates of the CFR. We define conditions under which the CFR and the relative CFR are identifiable. Furthermore, we propose an estimator for the relative CFR that controls for time‐varying reporting rates. We generalize our methods to account for elapsed time between infection and death. To demonstrate the new methodology, we use data from the 1918 influenza pandemic to estimate relative CFRs between counties in Maryland. A simulation study evaluates the performance of the methods in outbreak scenarios. An R software package makes the methods and data presented here freely available. Our work highlights the limitations and challenges associated with estimating absolute and relative CFRs in practice. However, in certain situations, the methods presented here can help identify vulnerable subpopulations early in an outbreak of an emerging pathogen such as pandemic influenza.
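The two definitions above can be written out directly. This sketch computes only the crude quantities from fully reported counts. It does not implement the proposed estimator that controls for time-varying reporting rates, and the function names are illustrative.

```python
def case_fatality_ratio(deaths, cases):
    """Fraction of cases that die; assumes complete reporting of both counts."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    if not 0 <= deaths <= cases:
        raise ValueError("deaths must be between 0 and cases")
    return deaths / cases


def relative_cfr(deaths_a, cases_a, deaths_b, cases_b):
    """Factor by which the CFR in group A differs from the CFR in group B."""
    cfr_b = case_fatality_ratio(deaths_b, cases_b)
    if cfr_b == 0:
        raise ValueError("CFR in the comparison group is zero; ratio is undefined")
    return case_fatality_ratio(deaths_a, cases_a) / cfr_b
```

If the same unknown fraction of deaths and of surviving cases goes unreported in both groups, those fractions cancel in the ratio. This is one intuition for why a relative CFR may be estimable even when the absolute CFR is biased by incomplete reporting.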
The incubation period, the time between infection and disease onset, is important in the surveillance and control of infectious diseases but is often coarsely observed. Coarse data arise because the time of infection, the time of disease onset, or both are not known precisely. Accurate estimates of an incubation period distribution are useful in real-time outbreak investigations and in modeling public health interventions. We compare two methods of estimating such distributions. The first method represents the data as doubly interval-censored. The second introduces a data reduction technique that makes the computation more tractable. In a simulation study, the methods perform similarly when estimating the median, but the first method yields more reliable estimates of the distributional tails. We conduct a sensitivity analysis of the two methods to violations of model assumptions, and we apply these methods to historical incubation period data on influenza A and respiratory syncytial virus. The analysis of reduced data is less computationally intensive and performs well for estimating the median under a wide range of conditions. However, for estimation of the tails of the distribution, the doubly interval-censored analysis is the recommended procedure.