BioScience Trends. 2017;11(5):533-541. (DOI: 10.5582/bst.2017.01257)

A comparative study on predicting influenza outbreaks.

Zhang J, Nawata K


Worldwide, influenza is estimated to result in approximately 3 to 5 million cases of severe illness and approximately 250,000 to 500,000 deaths each year. An accurate time-series model is needed to predict the number of influenza patients. Although time-series models that use different time lags as their feature space can differ in accuracy, past studies simply adopted a single time lag without comparing alternatives or selecting an appropriate number of lags. We investigated the performance of 6 different time lags in 6 different models: Auto-Regressive Integrated Moving Average (ARIMA), Support Vector Regression (SVR), Random Forest (RF), Gradient Boosting (GB), Artificial Neural Network (ANN), and Long Short Term Memory (LSTM) with hyperparameter adjustment. To the best of our knowledge, this is the first time that LSTM has been used to predict influenza outbreaks. We found that a time lag of 52 weeks led to the lowest Mean Absolute Percentage Error (MAPE) in ARIMA, ANN, and LSTM, while the machine learning models (SVR, RF, GB) achieved their lowest MAPEs with a time lag of 4 weeks. We also found that the MAPEs of the machine learning models were lower than that of ARIMA, and the MAPEs of the deep learning models (ANN, LSTM) were lower than those of the machine learning models. Across all models, the 4-layer LSTM model reached the lowest MAPE of 5.4%, and the 5-layer LSTM model with regularization reached the lowest root mean squared error (RMSE) of 0.00210.
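
To make the role of the time lag concrete, the following is a minimal sketch of how a weekly series might be turned into lagged feature vectors and how MAPE and RMSE are computed from predictions. It is not the authors' implementation: the function names (`make_lagged`, `mape`, `rmse`), the toy series, and the persistence baseline (predict the most recent observed value) are all assumptions chosen for illustration, and the numbers are made up rather than taken from the study.

```python
from math import sqrt


def make_lagged(series, lag):
    """Pair each value series[t] with the `lag` values that precede it.

    Returns (X, y), where X[i] is a list of `lag` past observations
    and y[i] is the value to be predicted from them.
    """
    X, y = [], []
    for t in range(lag, len(series)):
        X.append(list(series[t - lag:t]))
        y.append(series[t])
    return X, y


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(total / len(actual))


# Hypothetical toy series (not data from the study).
toy_series = [1.0, 2.0, 4.0, 5.0, 4.0, 2.0]

# A time lag of 2 means each target is predicted from the 2 preceding values.
X, y = make_lagged(toy_series, lag=2)

# Persistence baseline (an assumption for illustration): predict the last observed value.
predictions = [window[-1] for window in X]

toy_mape = mape(y, predictions)
toy_rmse = rmse(y, predictions)
```

With a lag of 52, each feature vector would instead hold the previous 52 weekly observations, so a model can see a full year of seasonality at the cost of fewer training rows. Comparing several lags then amounts to rebuilding `X` and `y` for each lag and scoring every fitted model with the same metrics.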

KEYWORDS: Time series, Influenza-Like Illness, time lag, Long Short Term Memory (LSTM)
