Synthetic Health Data Challenge
Create and test novel solutions to further cultivate Synthea™ capabilities and the synthetic data it generates for healthcare and research purposes.
Department of Health & Human Services - Office of the National Coordinator for Health Information Technology
Type of Challenge: Software and apps
Submission Start: 01/19/2021 09:00 AM ET
Submission End: 07/13/2021 05:00 PM ET
** Welcome to the Synthetic Health Data Challenge Webpage! Watch this space for important updates. **
REMINDER: The deadline for submitting Phase II prototypes/solutions is Tuesday, July 13, 2021, at 5:00 p.m. ET. Participants are competing for $100,000 in total awards. Here are the Phase II competitors and their Proposals for Innovative Models:
- Battellion: A Generic Quality Construct Module for Integrated Testing of eCQM using Synthea
- CodeRx: Medication Diversification Tool
- Generalistas: Virtual Generalist
- LMI: On Improving Realism of Disease Modules in Synthea: Social Determinant-Based Enhancements to Conditional Transition Logic
- Menrva.AI: Incorporating SDOH Data to Predict Diabetes Progression in Patients with Laboratory-Defined Prediabetes
- Particle Health: The Necessity of Realistic Synthetic Health Data Development Environments
- Team TeMa #1: Empirical Inference of Underlying Condition Probabilities Using Synthea-Generated Synthetic Health Data
- Team TeMa #2: Modification and Use of Synthea to Account for Patient Vaccination Choice
- UI Health: Spatiotemporal Big Data Analysis of Opioid Epidemic in Illinois
UPDATED FAQs (4/20/21): The FAQs have been updated to include questions from the Phase II Informational Webinar. Visit the Resources section below.
The Office of the National Coordinator for Health Information Technology (ONC), a division of the Department of Health and Human Services, has led and collaborated on many projects supporting the adoption and implementation of a patient-centered outcomes research (PCOR) data infrastructure. Projects funded by the Patient-Centered Outcomes Research Trust Fund, administered by the Assistant Secretary for Planning and Evaluation (ASPE), support the development of data capacity and infrastructure that can engage patients in health care decision-making and incorporate their responses into research. The Synthetic Health Data Challenge (Challenge) is an important component of the Synthetic Health Data Generation to Accelerate PCOR Project, through which ONC seeks to accelerate PCOR by furthering the development of Synthea™, a synthetic health data engine. The Challenge invites providers, researchers, and technology developers to develop innovative tools and resources that support validation and novel uses of synthetic data for PCOR researchers and/or health IT developers.
Clinical data are critical for patient-centered outcomes research (PCOR), which focuses on the effectiveness of prevention and treatment options. However, high-quality health care data are often difficult to access due to cost, patient consent, privacy concerns, or other legal or institutional review board (IRB) restrictions.
Synthetic health data can augment the PCOR infrastructure by providing researchers with a low-risk, readily available, synthetic data source to complement their use of real clinical data. Early access to synthetic data while researchers await access to real clinical data may enhance their ability to test rigorous analyses and/or software systems that may generate relevant findings to inform health and treatment decisions.
Synthea is an open-source synthetic patient generator that models the medical history of synthetic patients. The resulting data are free from cost, privacy, and security restrictions and have the potential to support a variety of academic, research, industry, and government initiatives. Synthea can use publicly available health statistics and other research sources. Because the software uses publicly available statistical data to generate synthetic data sets, the barriers to resource availability and privacy concerns are lower than for other synthetic data generation technologies that rely on manipulating actual patient data. The software includes a temporal model that covers a patient’s entire lifetime instead of focusing on one health problem or disease recorded at a single point in time. Like other synthetic data sets, the synthetic data generated by Synthea must be validated to ensure they are clinically relevant and realistic.
The Challenge seeks a wide array of innovators, researchers, and technology developers to create and test innovative and novel solutions that will further cultivate the capabilities of Synthea and the synthetic data it generates.
The Challenge will be conducted in two (2) phases:
- Phase I – Proposal for Innovative Models: Participants will submit a written proposal describing their proposed solution, including methodology and intended outcomes. Selected Phase I proposals will proceed to Phase II. There is no limit to the number of qualified proposals that may be selected to move to Phase II; however, a minimum of four (4) qualified proposals is required for the Challenge to proceed to Phase II.
- Phase II – Prototype/Solution Development: Participants whose Phase I proposals are selected to proceed to Phase II will develop their prototype/solution at this stage.
Participants will propose a solution in one of two (2) Challenge categories.
- Category I – Enhancements to Synthea: Solutions in this category include, but are not limited to, development and/or enhancement of Synthea modules and development of solutions that enhance or address limitations of Synthea.
- Category II – Novel Uses of Synthea-Generated Synthetic Data: Solutions in this category include, but are not limited to, novel uses of Synthea-generated data for research and technology development.
Resources
- Frequently Asked Questions Document (updated 04/20/2021)
- Technical Guidance and Tips
- Example Modules, Module Companion Guides
- Synthetic Health Data Challenge Registration Form
- Phase I Informational Webinar (Feb 2, 2021 12:00 PM ET)
- Phase I Submission Period Opens: 01/19/2021 09:00 AM ET
- Phase I Informational Webinar: 02/02/2021 12:00 PM ET
- Phase I Submission Period Closes: 03/02/2021 05:00 PM ET
- Phase I Finalists Announced: 03/23/2021 09:00 AM ET
- Phase II Submission Period Opens: 03/23/2021 09:00 AM ET
- Phase II Informational Webinar: 04/06/2021 12:00 PM ET
- Phase II Submission Period Closes: 07/13/2021 05:00 PM ET
- Award Announcement: 09/21/2021 09:00 AM ET
- Winning Solutions Webinar: 10/19/2021 12:00 PM ET
Total Cash Prize Pool: $100,000
Up to six (6) winners will be selected for prizes ranging from $10,000 – $50,000.
- First place winner(s) will receive $25,000 – $50,000
- Second place winner(s) will receive $15,000 – $30,000
- Third place winner(s) will receive $10,000 – $20,000
Honorable Mentions may be awarded but will not receive a monetary prize.
Each winning entry and Honorable Mention will be invited to present during the Winning Solutions Webinar on October 19, 2021 at 12:00 PM ET.
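The prize structure above can be expressed as a small consistency check. The following is only an illustrative sketch, not an official ONC tool: the function name `is_valid_allocation` and the rule that the awards may sum to at most (rather than exactly) the pool are assumptions made for this example.

```python
from typing import List, Tuple

TOTAL_POOL = 100_000   # total cash prize pool stated above
MAX_WINNERS = 6        # "up to six (6) winners"

# Stated prize range (minimum, maximum) in dollars for each place.
PRIZE_RANGES = {
    1: (25_000, 50_000),
    2: (15_000, 30_000),
    3: (10_000, 20_000),
}


def is_valid_allocation(awards: List[Tuple[int, int]]) -> bool:
    """Check a hypothetical list of (place, amount) awards against the stated limits.

    Assumption: the awards may sum to at most the pool; the source does not
    say whether the full pool must be paid out.
    """
    if len(awards) > MAX_WINNERS:
        return False
    for place, amount in awards:
        if place not in PRIZE_RANGES:
            return False
        low, high = PRIZE_RANGES[place]
        if not low <= amount <= high:
            return False
    return sum(amount for _, amount in awards) <= TOTAL_POOL


if __name__ == "__main__":
    # One winner per place at the top of each range uses the pool exactly.
    print(is_valid_allocation([(1, 50_000), (2, 30_000), (3, 20_000)]))
```

Because several winners may share a place, the ranges matter: two first-place awards at the maximum would already consume the entire pool.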
To be eligible to win a prize under this Challenge, an individual or entity:
- Shall have registered to participate in the Challenge under the rules promulgated by ONC.
- Shall have complied with all the stated requirements of the appropriate Phase of the “Synthetic Health Data Challenge.”
- In the case of an entity, shall be incorporated in and maintain a primary place of business in the United States, and in the case of an individual, whether participating singly or in a group, shall be a citizen or permanent resident of the United States.
- Shall not be an HHS employee.
- May not be a federal entity or federal employee acting within the scope of their employment. We recommend that all non-HHS federal employees consult with their agency Ethics Official to determine whether the federal ethics rules will limit or prohibit the acceptance of a prize under this prize competition.
- Federal grantees may not use federal funds to participate in this prize competition unless such participation is consistent with the purpose of their grant award.
- Federal contractors may not use federal funds from a contract to participate in this prize competition or to fund efforts in support of a submission for this prize competition.
- All individual members of a team must meet the eligibility requirements.
- An individual or entity shall not be deemed ineligible because the individual or entity used federal facilities or consulted with federal employees during a prize competition if the facilities and employees are made available to all individuals and entities participating in the prize competition on an equitable basis.
- Participants must agree to assume any and all risks and waive claims against the federal government and its related entities, except in the case of willful misconduct, for any injury, death, damage, or loss of property, revenue, or profits, whether direct, indirect, or consequential, arising from participation in this prize contest, whether the injury, death, damage, or loss arises through negligence or otherwise.
- Participants shall be financially responsible for claims by (A) any third party for death, bodily injury, or property damage, or loss resulting from an activity carried out in connection with participation in the prize competition and all registered participants agree to indemnify the federal government against third party claims for damages arising from or related to their prize competition activities; and (B) the federal government for damage or loss to government property resulting from such an activity.
Terms and Conditions
- No HHS or ONC logo – The product must not use HHS’ or ONC’s logos or official seals and must not claim endorsement.
- Functionality/Accuracy – A product may be disqualified if it fails to function as expressed in the description provided, or if it provides inaccurate or incomplete information.
- Security – Submissions must be free of malware. Participants must agree that ONC may conduct testing on the product to determine whether malware or other security threats may be present. ONC may disqualify the submission if, in ONC’s judgment, it may damage government or others’ equipment or operating environment.
Technical reviewers with expertise relevant to the Challenge will evaluate the Solutions based on their ability to achieve the criteria listed below. The Solutions and evaluation statements from the technical reviewers will then be reviewed by federal employees serving as judges, who will select up to six (6) Challenge winners as well as any honorable mentions, subject to a final decision by the Award Approving Official.
The Award Approving Official will be Dr. Micky Tripathi, the National Coordinator for Health Information Technology.
Basis Upon Which Phase I Proposals Will be Evaluated
- The proposal complies with all Phase I submission requirements.
- The proposal clearly articulates the Challenge category it plans to address.
- The proposal supports enhancements of Synthea and/or Synthea-generated synthetic data.
- The proposal includes a validation component.
- The proposal addresses at least one of the following use cases: Opioids, Pediatrics, or Complex Care.
- The participant agrees to submit non-proprietary source code as part of their Phase II submission.
Basis Upon Which Phase II Solutions Will be Evaluated
Evaluation Criterion 1: Impact and Innovation (15 points)
- The extent to which the solution is novel, groundbreaking, and/or a creative application of an existing approach.
- The extent to which the solution supports enhancements of Synthea and/or Synthea-generated synthetic data.
- The extent to which the solution may impact the field of research.
- The extent to which the solution provides new insights on the use of synthetic data for research.
- The extent to which the solution may encourage the use of Synthea and/or Synthea-generated synthetic data.
Evaluation Criterion 2: Functionality and Implementation (20 points)
- The extent to which the solution enhances Synthea functionality and/or addresses known limitations of Synthea.
- The extent to which other developers and/or implementers can reproduce and/or reuse the solution.
- The extent to which the demonstration of the solution functionality (via YouTube video) aligns with the defined objectives of the solution.
Evaluation Criterion 3: Validation (15 points)
- The extent to which the solution supports and/or improves validation capabilities of Synthea.
- The extent to which the solution improves the clinical relevance of Synthea-generated synthetic data.
- The extent to which other developers and/or implementers can reproduce and/or reuse the validation methods produced by the solution.
Bonus Points (up to 5 points)
May be awarded for going beyond the Challenge requirements to demonstrate or provide recommendations for:
- Overcoming the limitations of Synthea,
- Addressing validation anomalies, and/or
- Other recommendations for solutions.
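The point values above add up to 50 base points plus up to 5 bonus points. The sketch below shows that arithmetic, assuming the criterion scores are simply summed; the source lists the maximum points per criterion but does not state how reviewers combine them, and the names `MAX_POINTS` and `total_score` are hypothetical.

```python
from typing import Mapping

# Maximum points per criterion, taken from the evaluation criteria above.
MAX_POINTS = {
    "impact_and_innovation": 15,
    "functionality_and_implementation": 20,
    "validation": 15,
}
MAX_BONUS = 5  # "Bonus Points (up to 5 points)"


def total_score(scores: Mapping[str, float], bonus: float = 0.0) -> float:
    """Sum the three criterion scores and the bonus, rejecting out-of-range values.

    Assumption: a plain sum. The source does not describe weighting or averaging
    across reviewers.
    """
    if set(scores) != set(MAX_POINTS):
        raise ValueError("scores must cover exactly the three criteria")
    for name, value in scores.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_POINTS[name]}")
    if not 0 <= bonus <= MAX_BONUS:
        raise ValueError(f"bonus must be between 0 and {MAX_BONUS}")
    return sum(scores.values()) + bonus


if __name__ == "__main__":
    example = {
        "impact_and_innovation": 12,
        "functionality_and_implementation": 18,
        "validation": 10,
    }
    print(total_score(example, bonus=3))
```

Under this assumption the highest possible total is 55 points, with Functionality and Implementation carrying the most weight of any single criterion.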
How To Enter
Each Phase I Proposal must be accompanied by a completed Synthetic Health Data Challenge Registration Form. The form can be submitted with the Phase I Proposal as a separate attachment. If participants submit as a team or as an entity, they must identify a team leader who will serve as a point of contact and submit the Phase I Proposal on behalf of the team.
Phase I Submission Requirements
Only complete and correctly formatted Phase I Proposals will be reviewed. Each Phase I Proposal must include:
- a registration form; and
- a cover page; and
- a written description of the solution, including approach and utility.
Detailed instructions on the content of the Phase I Proposal are listed below.
Phase I Proposal Packages must be submitted as a single PDF file to SyntheticDataChallenge@govhealth.com by 05:00 PM ET on March 2, 2021. Late submissions will not be accepted.
- The Phase I Proposal must consist of a single PDF file with at least 1-inch margins. Font size must be no smaller than 11-point font. All Proposals must be written in English and not exceed six (6) pages, single-spaced.
- The Phase I Proposal must contain a Cover Page that includes the solution name, organization and contact information of the submitter(s), title, Challenge category that your solution will address, and an abstract that describes the solution. Explain how researchers and/or technology developers can benefit from your solution and why the approach is innovative. Limit one (1) page.
- The Phase I Proposal must contain a Methods section. Clearly describe your solution to the Challenge. Explain the methods used to meet the Challenge requirements. Cite appropriate references to support the work. Include figures/illustrations where appropriate (figures/illustrations will count toward the page limit). Limit five (5) pages.
- Solutions must include a synthetic data validation component.
- Solutions must use Synthea, Synthea modules, and/or Synthea-generated synthetic data.
- Solutions must address at least one of the following use cases: Opioids, Pediatrics, or Complex Care.
- References will not count toward the page limit.
- Participants must plan to submit non-proprietary source code as part of their Phase II submission and contribute the source code to the open-source community.
Phase II Submission Requirements
Participants selected to proceed to Phase II will design and test the solution defined in the approved Phase I Proposal. Each submission requires a complete “Phase II Solution Package.” If participants submit as a team or as an entity, they should identify a team leader who will serve as a point of contact and submit the Solution Package on behalf of the team. Only complete and correctly formatted Solution Packages will be reviewed. Detailed instructions on the content of the Solution Package are listed below. Participants must submit non-proprietary source code as part of their Phase II submission and contribute the source code to the open-source community. Phase II Solution Packages must be submitted to SyntheticDataChallenge@govhealth.com by 05:00 PM ET on July 13, 2021. Late submissions will not be accepted.
The Phase II Solution Package must contain:
- A cover page that includes the title of the final solution, organization and contact information of the submitter(s), and the Challenge category that your solution will address. Limit one (1) page.
- A final paper describing the solution. The paper must be a single PDF file with at least 1-inch margins. Font size must be no smaller than 11-point font. The paper must be written in English and not exceed six (6) pages, single-spaced. The paper must contain a title, abstract, a clear description of the solution, a summary of methods, evidence of validation, and how the solution meets the Challenge requirements. Include figures/illustrations where appropriate; figures/illustrations will count toward the six (6) page limit. Provide references as appropriate; references do not count toward the page limit.
- A five (5) minute video that demonstrates the solution and evidence of validation, uploaded to YouTube. The URL of the video should be included in the final paper.
- If necessary, evidence of validation may be supported by supplemental documentation or attachments, which will not count toward the final paper page limit.
- All non-proprietary source code developed as part of the solution should be uploaded to GitHub. The URL of the source code should be included in the final paper.
Payment of the Prize
Prizes awarded under this competition will be paid by electronic funds transfer and may be subject to Federal income taxes. Awardees will need to provide an institutional bank account and routing information to receive the award funds. Payments will comply with the Internal Revenue Service withholding and reporting requirements, where applicable. ONC reserves the right, at its sole discretion, to (a) cancel, suspend, or modify this prize competition, or any part of it, for any reason, and/or (b) not award any prizes if no submissions are deemed worthy.
Point of Contact
Have feedback or questions about this challenge? Send the challenge manager an email