Predict the Influenza Season
Centers for Disease Control and Prevention (CDC)
Each year influenza epidemics occur in the United States but vary in their timing and intensity. The CDC tracks influenza activity through nationwide surveillance systems, but these systems lag behind real-time flu activity. Infectious disease forecasting, a promising new approach, could provide a more timely and forward-looking tool that predicts rather than monitors flu activity so that health officials can be more proactive in their influenza prevention and control strategies.
To better understand influenza forecasts and improve their usefulness to public health decision-making, CDC organized the Predict the Influenza Season Challenge, a competition designed to foster innovation in flu modeling and prediction. The challenge sought models which would successfully predict the timing, peak and intensity of the 2013-2014 flu season using digital data (e.g., Twitter, Internet search data, web surveys) and novel methodological techniques. With this challenge, CDC hoped to encourage exploration into how digital data can be used to predict flu activity and complement CDC's routine systems for monitoring flu.
Academics, scientists in private industry and experts in big data participated in the challenge. Eight teams that completed the challenge were composed of individuals from multiple universities, and one team was from a private company. By hosting this challenge, CDC was able to receive and evaluate 13 influenza-season forecasts based on a variety of digital data sources and methodologies, a high number that stands in contrast to what CDC would have likely received using more traditional means of outside engagement.
Challenge goals were to:
- Improve the utility of influenza forecasts to CDC and connect forecasts to public health decision-making
- Increase interest in influenza forecasting and encourage researchers to utilize novel sources of digital surveillance data
- Examine the accuracy of influenza forecasts
The total cash prize for the Predict the Influenza Season Challenge was $75,000.
Nonmonetary incentives were also used to motivate participants. Participating teams were invited to travel to Atlanta (one member of each team received travel funds) and present on their methodology and results to other participating teams and CDC staff. CDC acknowledged the challenge winner on its website. Members of participating teams also had the option of being an author on a peer-reviewed manuscript describing the challenge, the results and the conclusions.
This challenge represents the first nationally coordinated effort to forecast an influenza season in the United States. Planning for the challenge began in August 2013, and the challenge was announced in November 2013. Sixteen individuals or teams initially registered for the challenge, 15 entered at least one forecast and nine submitted at least one forecast every other week for all four required forecasting milestones. After a review of the forecasts, the team led by Dr. Jeff Shaman, an assistant professor in the Department of Environmental Health Sciences at the Mailman School of Public Health at Columbia University, was named the winner on June 18, 2014. Dr. Shaman's submissions stood out because he determined in real time the reliability of his forecasts and presented the results in a similar manner to how a meteorologist provides the chance of rain for each day's weather forecast. This approach helped communicate flu forecasting in a way that was meaningful to both public health officials and the public.
The results of the challenge indicated that forecasting has become technically feasible and reasonably accurate forecasts can be made in the short term. CDC continues to work with the researchers who participated in the challenge to forecast subsequent influenza seasons and determine the best uses of forecasts to inform public health decision-making. A single cash prize of $75,000 was awarded to the challenge winner.
Areas of Excellence
Area of Excellence #1: "1.4 Determine if a Prize is Appropriate"
By hosting this challenge, CDC was able to create a sense of urgency and excitement around influenza forecasting, ignite a sense of competitiveness and receive and evaluate 13 influenza-season forecasts based on a variety of digital data sources and methodologies. These forecasts were submitted by teams affiliated with a diverse set of organizations, including universities and private industry, and came from both highly published investigators and investigators who are relatively new to the field. The high number of forecasts received from a diverse set of organizations and individuals contrasts with the number that likely would have been received through a more traditional method of outside engagement, such as a contract or grant. The challenge mechanism also allowed CDC to seek solutions for accurately forecasting influenza season milestones without having to choose the forecasting methodologies, and to evaluate forecasts for accuracy and quality before awarding the challenge prize.
Area of Excellence #2: "3.2 Accept Solutions"
The challenge was data intensive for both the participating teams and CDC. Teams were required to submit biweekly forecasts for the start, peak, length and intensity of the 2013–2014 influenza season from Dec. 1, 2013, to March 27, 2014. Forecasts were required for the national level, but teams could also submit up to 10 additional forecasts for the U.S. Department of Health and Human Services (HHS) Regions. The selection of the winner for this challenge was based on an evaluation of the methodology used to make the forecasts and the accuracy of the forecasts. Contestant submissions were judged by a panel of reviewers that included two CDC staff outside the Influenza Division and one faculty member from a noncompeting university. Judges scored submissions on a scale of 0 to 100 points using the following criteria:
- the strength of the methodology (25 points), which assessed how clearly the results and uncertainty in the forecasts were presented and how the data sources and forecast methodology were described
- the accuracy, timeliness and reliability of the forecasts for the start, peak week and intensity of the influenza season (65 points)
- the scope of the geography (U.S. plus one or more HHS Regions) that the source data represented (10 points)
Up to 50 bonus points were awarded to any contestant that submitted forecasts for the 10 HHS Regions. Solutions submitted by the teams varied in format and complexity as no common standard existed for receiving and evaluating the accuracy of influenza forecasts, making comparison and interpretation by the judges difficult. CDC challenge management worked with the judges to provide a comprehensive and clear compilation of individual team forecasts. The challenges in data management and judging were noted, and standardized forecasting formats and accuracy assessments were developed for subsequent challenges.
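As an illustration only, the scoring rubric described above can be sketched in a few lines of Python. The point caps (25, 65, 10 and up to 50 bonus points) come from the rubric; everything else is an assumption made for the sketch. In particular, the names JudgeScore and panel_score are hypothetical, and this write-up does not say how the three judges' scores were combined or how bonus points related to the 0-to-100 scale, so the simple sum and the simple average below are placeholders rather than CDC's actual procedure.

```python
from dataclasses import dataclass
from statistics import mean

# Point caps taken from the rubric: methodology, accuracy, geography.
MAX_POINTS = {"methodology": 25, "accuracy": 65, "geography": 10}
# Cap on bonus points for forecasts covering the 10 HHS Regions.
MAX_BONUS = 50


@dataclass(frozen=True)
class JudgeScore:
    """One judge's scores for one submission (hypothetical structure)."""
    methodology: float
    accuracy: float
    geography: float
    regional_bonus: float = 0.0

    def total(self) -> float:
        # Reject any criterion score outside its allowed range.
        for name, cap in MAX_POINTS.items():
            value = getattr(self, name)
            if not 0 <= value <= cap:
                raise ValueError(f"{name} must be between 0 and {cap}, got {value}")
        if not 0 <= self.regional_bonus <= MAX_BONUS:
            raise ValueError(f"regional_bonus must be between 0 and {MAX_BONUS}")
        # Assumption: bonus points are simply added to the base score.
        return self.methodology + self.accuracy + self.geography + self.regional_bonus


def panel_score(scores):
    """Assumption: the panel score is the plain average of the judges' totals."""
    return mean(score.total() for score in scores)


# Made-up scores from a three-judge panel, for illustration only.
panel = [
    JudgeScore(methodology=20, accuracy=50, geography=8, regional_bonus=30),
    JudgeScore(methodology=22, accuracy=55, geography=10, regional_bonus=25),
    JudgeScore(methodology=18, accuracy=48, geography=9, regional_bonus=20),
]
print(panel_score(panel))
```

Validating each criterion against its cap is the kind of check a standardized scoring format makes possible, since every submission is then scored on the same terms.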
Area of Excellence #3: "5.2 Document the Challenge"
Participating teams were invited to travel to Atlanta to present on their methodology and results and discuss lessons learned and the next steps, including participation in future forecasting challenges. Participating teams used the opportunity to share datasets and forecasting methodologies and provided valuable input that helped shape subsequent forecasting challenges.
CDC also coordinated a scientific manuscript documenting the challenge, the results and the lessons learned to ensure that the information was captured and available to the public in an open-access, peer-reviewed journal published by BioMed Central. CDC chose to summarize the challenge in a scientific manuscript because authorship gave participating teams an additional incentive and because peer review allowed experts in the field to assess the results and conclusions, increasing the credibility of the findings.
CDC hosted this challenge to spur innovation in the development of mathematical and statistical models to predict the timing, peak and intensity of the influenza season. This challenge required the development of forecasting models that used open-access data from existing CDC surveillance systems, including the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet) and Internet-derived data on influenza activity (e.g., Twitter data, Internet search term data, Internet-based surveys), which have been shown to correlate with influenza activity.
Because of the various data sources utilized, the challenge encouraged a strong connection between forecasters, subject matter experts and public health decision makers. Forecasters needed support understanding the nuances of CDC's surveillance data while public health decision makers needed support understanding the different digital data sources and forecasting methodologies. This challenge identified a number of areas that need further research before forecasting can be routinely incorporated into decision-making, including the best metrics to assess forecast accuracy, the best way to communicate forecast uncertainty and the types of decisions best aided by forecasts. To help fill these research gaps, CDC has built upon the success of the original challenge to host additional challenges to predict subsequent influenza seasons.
America COMPETES Act