A Dirichlet process model for classifying and forecasting epidemic curves
Authored by Madhav V Marathe, Elaine O Nsoesie, Scotland C Leman
Date Published: 2014
DOI: 10.1186/1471-2334-14-12
Sponsors:
Department of Interior National Business Center
United States Defense Threat Reduction Agency (DTRA)
United States National Institutes of Health (NIH)
United States National Science Foundation (NSF)
Platforms:
R
Model Documentation:
Other Narrative
Flow charts
Pseudocode
Mathematical description
Model Code URLs:
Model code not found
Abstract
Background: A forecast can be defined as an endeavor to quantitatively
estimate a future event or probabilities assigned to a future
occurrence. Forecasting stochastic processes such as epidemics is
challenging since there are several biological, behavioral, and
environmental factors that influence the number of cases observed at
each point during an epidemic. However, accurate forecasts of epidemics
would support the timely and effective implementation of public health
interventions. In this study, we introduce a Dirichlet process (DP)
model for classifying and forecasting influenza epidemic curves.
Methods: The DP model is a nonparametric Bayesian approach that enables
the matching of current influenza activity to simulated and historical
patterns, identifies epidemic curves different from those observed in
the past and enables prediction of the expected epidemic peak time. The
method was validated using simulated influenza epidemics from an
individual-based model and the accuracy was compared to that of the
tree-based classification technique, Random Forest (RF), which has been
shown to achieve high accuracy in the early prediction of epidemic
curves using a classification approach. We also applied the method to
forecasting influenza outbreaks in the United States from 1997-2013
using influenza-like illness (ILI) data from the Centers for Disease
Control and Prevention (CDC).
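
The matching step described in Methods can be illustrated with a minimal sketch. This is not the authors' implementation (the record lists the model code as not found); it is a simplified Chinese-restaurant-process-style assignment under stated assumptions: Gaussian observation noise with a fixed standard deviation `sigma`, a base measure centred on the pooled library mean with spread `tau`, and a concentration parameter `alpha`. All function names, parameter values, and the toy curves are hypothetical. A partially observed curve is assigned either to a known class, with weight proportional to class size times likelihood, or to a new class, with weight proportional to `alpha` times the marginal likelihood under the base measure. The peak time is then read off the most probable known class.

```python
import math

NEW_CLASS = "<new>"  # hypothetical label for "unlike any library curve"

def _gauss_loglik(ys, means, sd):
    """Log-likelihood of ys under independent Gaussians with the given means."""
    return sum(-0.5 * ((y - m) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))
               for y, m in zip(ys, means))

def class_probabilities(observed, library, alpha=1.0, sigma=5.0, tau=50.0):
    """Posterior probability of each known class and of a new class.

    library maps class name -> (mean curve, number of curves in the class).
    alpha, sigma and tau are illustrative assumptions, not values from the paper.
    """
    n = len(observed)
    log_w = {}
    for name, (curve, count) in library.items():
        # Existing class: weight is class size times likelihood of the partial curve.
        log_w[name] = math.log(count) + _gauss_loglik(observed, curve[:n], sigma)
    # New class: weight is alpha times the marginal likelihood under a broad
    # base measure centred on the pooled mean of the library curves.
    curves = [curve for curve, _ in library.values()]
    pooled = [sum(c[t] for c in curves) / len(curves) for t in range(n)]
    broad_sd = math.sqrt(sigma ** 2 + tau ** 2)
    log_w[NEW_CLASS] = math.log(alpha) + _gauss_loglik(observed, pooled, broad_sd)
    # Normalise in log space for numerical stability.
    top = max(log_w.values())
    total = sum(math.exp(v - top) for v in log_w.values())
    return {k: math.exp(v - top) / total for k, v in log_w.items()}

def forecast_peak(probs, library):
    """Return (most probable class, its peak time index, or None for a new class)."""
    best = max(probs, key=probs.get)
    if best == NEW_CLASS:
        return best, None
    curve, _ = library[best]
    return best, curve.index(max(curve))

# Toy library of two simulated curve classes (weekly case counts).
library = {
    "early": ([5, 20, 60, 100, 60, 20, 5, 2], 10),
    "late": ([2, 5, 10, 20, 50, 80, 100, 60], 10),
}
probs = class_probabilities([6, 22, 57], library)
print(forecast_peak(probs, library))
```

Because the new-class weight never drops to zero, a curve that fits none of the library classes can be flagged as novel instead of being forced into the nearest known class, which is the property the abstract contrasts with a fully supervised classifier.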
Results: We made the following observations. First, the DP model
performed as well as RF in identifying several of the simulated
epidemics. Second, the DP model correctly forecasted the peak time
several days in advance for most of the simulated epidemics. Third, the
accuracy of identifying epidemics different from those already observed
improved with additional data, as expected. Fourth, both methods
classified epidemics with higher reproduction numbers (R) more
accurately than epidemics with lower R values. Lastly, in
the classification of seasonal influenza epidemics based on ILI data
from the CDC, the methods' performance was comparable.
Conclusions: Although RF requires less computational time than the DP
model, the algorithm is fully supervised, implying that epidemic curves
different from those previously observed will always be misclassified.
In contrast, the DP model can be unsupervised, semi-supervised or fully
supervised. Since both methods have their
relative merits, an approach that uses both RF and the DP model could be
beneficial.
Tags
Influenza
Populations
Line