Since research began in 1975, the overall mortality rate for childhood cancer has dropped 53 percent, from 5.1 to 2.4 deaths per 100,000 people per year in 2012. Overall, the 5-year survival rate for children diagnosed with cancer is 84 percent, compared with just 58 percent in 1975 (American Cancer Institute). For breast cancer, the average 5-year survival rate is 89 percent, the 10-year rate is 83 percent, and the 15-year rate is 78 percent. If the cancer is confined to the breast, the 5-year relative survival rate is 99 percent (http://www.cancer.net/cancer-types/breast-cancer/statistics).
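As a quick arithmetic check, the 53 percent figure follows from the two childhood cancer mortality rates quoted above (deaths per 100,000 people per year):

```latex
\frac{5.1 - 2.4}{5.1} = \frac{2.7}{5.1} \approx 0.529 \approx 53\%
```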
Over the last 10 years, however, the 5-year survival rate across all types of cancer has stagnated at around 90 percent (American Cancer Institute). Can patent data reveal insights that help direct funding toward projects that could change this?
An important step toward making informed funding decisions is understanding where federal cancer-related research funding currently goes and what type of research it supports. The interactive created for this project combines several data sets: the original USPTO data set provided for the Cancer Moonshot Challenge, Federal RePORTER funding awarded by the National Cancer Institute (NCI) from 2011 to 2015, NIH ExPORTER funding awarded to clinical trials from 2001 to 2016, and clinical trials data available through clinicaltrials.gov. All data was queried and sorted using the same cancer keywords and common MeSH terms, as defined by the query parameters for the USPTO data set.
Within the USPTO data set of cancer-related patents, only about 1% of patents filed since 2011 resulted from NIH funding (Figure 1). So where are NIH grants going, and what are they being spent on?
USPTO patent data reveals that from 2001 to 2010, most of the patents granted to NIH grant recipients were Drug inventive patents (Figure 2.a). That remained true for patents granted to NIH recipients from 2011 to 2016, but a small shift occurred: Genetic inventive patents saw an increase of about 2 percent (Figure 2.b).
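The shift between Figures 2.a and 2.b comes down to comparing each category's share of patents in two periods. The sketch below is a minimal illustration using only the Python standard library; the category labels and counts are hypothetical toy data, not the USPTO records. It reports the change in percentage points, which is one way to read the roughly 2 percent increase described above.

```python
from collections import Counter


def category_shares(categories):
    """Return each category's share (in percent) of a list of patent category labels."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: 100.0 * n / total for cat, n in counts.items()}


def share_change(before, after):
    """Percentage-point change in share for every category seen in either period."""
    old, new = category_shares(before), category_shares(after)
    return {cat: new.get(cat, 0.0) - old.get(cat, 0.0) for cat in set(old) | set(new)}


# Hypothetical toy labels, not the USPTO data.
patents_2001_2010 = ["Drug"] * 70 + ["Genetic"] * 10 + ["Biological"] * 20
patents_2011_2016 = ["Drug"] * 68 + ["Genetic"] * 12 + ["Biological"] * 20
print(share_change(patents_2001_2010, patents_2011_2016))
```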
Using the institutions that both received NIH grants and filed patents, a connection to clinical trials could be established to try to determine what those institutions were studying. Clinical trials data supports the 2011-2016 patent trend: most interventions in clinical trials over that period were drug interventions (Figure 2.c).
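Linking patents to clinical trials through shared institutions is essentially a join on institution name, and names rarely match exactly across data sets. A minimal sketch, assuming each patent record has hypothetical `assignee` and `patent_number` fields and each trial record has hypothetical `sponsor` and `nct_id` fields (the real ExPORTER and clinicaltrials.gov column names differ):

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Lower-case a name, turn punctuation into spaces, and collapse whitespace."""
    name = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(name.split())


def link_by_institution(patents, trials):
    """Map each normalized institution found in both data sets to its patents and trials."""
    patents_by_inst = defaultdict(list)
    for p in patents:
        patents_by_inst[normalize(p["assignee"])].append(p["patent_number"])
    trials_by_inst = defaultdict(list)
    for t in trials:
        trials_by_inst[normalize(t["sponsor"])].append(t["nct_id"])
    # Keep only institutions that appear in both data sets.
    shared = patents_by_inst.keys() & trials_by_inst.keys()
    return {
        inst: {"patents": patents_by_inst[inst], "trials": trials_by_inst[inst]}
        for inst in sorted(shared)
    }


# Hypothetical toy records.
patents = [
    {"patent_number": "P1", "assignee": "Example Cancer Center"},
    {"patent_number": "P2", "assignee": "Example University"},
]
trials = [
    {"nct_id": "T1", "sponsor": "EXAMPLE CANCER CENTER"},
    {"nct_id": "T2", "sponsor": "Example  Cancer Center."},
    {"nct_id": "T3", "sponsor": "Other Hospital"},
]
print(link_by_institution(patents, trials))
```

This normalization only handles case, punctuation, and spacing; abbreviations such as "Univ." versus "University" would still need a lookup table or fuzzy matching.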
As federal grants from the NCI have decreased each year since 2011 (Figure 3), so have the number of patents filed by and granted to NIH grant recipients and the number of clinical trials conducted. It was difficult to determine the specific form of cancer a patent related to, but clustering the cancer types studied in clinical trials from 2011 to 2016 was straightforward using the NIH grant institution information (Figure 4.a). This cluster shows a year-over-year decrease in the number of clinical trials conducted for the top 10 forms of cancer (ranked by number of trials conducted). A similar cluster can be created for the top 10 MeSH terms in the same clinical trials data set to see which drugs and chemical compounds appear most often (Figure 4.b). This gives somewhat more insight into how the focus of clinical trials shifted from 2011 to 2016.
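Grouping trials by year and cancer type, as in Figure 4.a, can be done with simple counting. A sketch, assuming each trial record carries a hypothetical `start_year` and a list of `conditions`; this is plain frequency counting rather than a clustering algorithm, and is not necessarily how the figures were produced:

```python
from collections import Counter, defaultdict


def top_conditions_by_year(trials, n=10):
    """Return {year: [(condition, trial_count), ...]} with the n most common conditions per year."""
    counts = defaultdict(Counter)
    for trial in trials:
        for condition in trial["conditions"]:
            counts[trial["start_year"]][condition] += 1
    # Years in ascending order; most_common sorts conditions by descending count.
    return {year: counter.most_common(n) for year, counter in sorted(counts.items())}


# Hypothetical toy records.
trials = [
    {"start_year": 2011, "conditions": ["Breast Cancer"]},
    {"start_year": 2011, "conditions": ["Breast Cancer", "Lymphoma"]},
    {"start_year": 2012, "conditions": ["Lymphoma"]},
]
for year, top in top_conditions_by_year(trials).items():
    print(year, top)
```

A trial listing several conditions is counted once under each of them, so per-year totals can exceed the number of trials. The same function works for MeSH terms if they are stored in the `conditions` list.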
A year-over-year plot for 2016 and beyond can be built from the NIH grant institution data to see which forms of cancer grant recipient institutions plan to study in the coming years (Figure 5).
The performance of NIH grant recipient institutions can also be measured with this data. Figures 7.a through 7.x show, for each of the top 25 recipients of NCI grants by funding, how many clinical trials the institution conducted, how many patents it filed, and how many patents it was granted. If the goal is to fund research projects and clinical trials that lead to patents, relatively few grants achieved this, even at the most heavily funded institutions.
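Comparing institutions fairly means normalizing outcomes by funding. A minimal sketch, with a hypothetical `InstitutionSummary` record and made-up figures, computing patents granted per US$1 million of funding and the share of filed patents that were granted:

```python
from dataclasses import dataclass


@dataclass
class InstitutionSummary:
    """Hypothetical per-institution totals; field names are illustrative."""
    name: str
    funding_usd: float
    trials: int
    patents_filed: int
    patents_granted: int

    def granted_per_million(self) -> float:
        """Patents granted per US$1 million of NCI funding (0 if no funding)."""
        if self.funding_usd <= 0:
            return 0.0
        return self.patents_granted / (self.funding_usd / 1_000_000)

    def grant_rate(self) -> float:
        """Fraction of filed patents that were granted (0 if nothing was filed)."""
        return self.patents_granted / self.patents_filed if self.patents_filed else 0.0


# Made-up numbers for illustration only.
institutions = [
    InstitutionSummary("Institution A", 50_000_000, 40, 10, 5),
    InstitutionSummary("Institution B", 20_000_000, 15, 4, 4),
]
ranked = sorted(institutions, key=lambda i: i.granted_per_million(), reverse=True)
for inst in ranked:
    print(f"{inst.name}: {inst.granted_per_million():.2f} granted per $1M, "
          f"grant rate {inst.grant_rate():.0%}")
```

Ranking by a per-dollar rate rather than raw counts keeps a heavily funded institution from looking productive simply because of its size.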
What stands out to me in this data is that while patent production at the most heavily funded grant recipient institutions is relatively low, these institutions play a vital role in research and in testing patient interventions. The shift from drug inventions dominating patent filings toward a slight increase in genetic and biological filings, however small, could be vital to increasing the 5-year survival rate for most cancers. Drug interventions may have reached peak effectiveness at treating cancer, and grant recipient institutions may be paving the way, over the next few years, toward solving cancer at the genetic and biological level.
Connecting clinical trials data to patent filings would provide greater insight into the patent life cycle and the effectiveness of federal dollars. I attempted that here, but this data provides only a broad overview of funding by institution year over year and does not give direct insight into research and treatment for individual forms of cancer. One improvement to the patent filing process, or to clinical trial registration, would be to link each patent to the clinical trials conducted for it. This would provide deeper insight into exactly how NIH grants are spent.
Access and Testing
The final presentation can be accessed here:
And the final dataset produced can be found here:
To avoid reinventing the wheel, Plotly was used to host the presentation and the data behind its graphs. A tool such as Tableau could have produced a more powerful and interactive experience, but Plotly is free to use and easy to share publicly. All data sets used in the presentation are also publicly viewable under my Plotly profile.
The process behind this data is documented in IPython notebooks hosted on my GitHub account, which you can access here:
Only the cleaned data sets are included, as the original downloads were too large. The data sets used were:
NIH ExPORTER – https://exporter.nih.gov/
Federal RePORTER – https://federalreporter.nih.gov/ (FY full data sets, 2011-2015)
Clinical Trials – https://clinicaltrials.gov/ (full database export – 2015)
The data was filtered on the same set of keywords and terms provided in the USPTO query documentation and limited to United States data only. Please see the IPython notebooks on my GitHub for my filtering and thought process, as well as the different outputs for each data set.
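The filtering step can be sketched as a case-insensitive keyword match combined with a country check. This is an illustrative sketch, not the notebooks' actual code: the keyword list below is a hypothetical subset (the real terms come from the USPTO query documentation, and MeSH terms could be added the same way), and the `country`, `title`, and `abstract` field names are assumptions.

```python
import re

# Hypothetical subset of search terms; the real list comes from the USPTO query documentation.
KEYWORDS = ["cancer", "neoplasm", "tumor", "carcinoma", "leukemia", "lymphoma"]
# Match at a word start so plurals such as "tumors" also match (case-insensitive).
PATTERN = re.compile(r"\b(?:" + "|".join(map(re.escape, KEYWORDS)) + r")", re.IGNORECASE)


def filter_records(records, text_fields=("title", "abstract"), country="United States"):
    """Keep records from `country` whose text fields mention at least one keyword."""
    kept = []
    for rec in records:
        if rec.get("country") != country:
            continue
        # Missing or empty fields are treated as empty strings.
        text = " ".join(rec.get(field) or "" for field in text_fields)
        if PATTERN.search(text):
            kept.append(rec)
    return kept


# Hypothetical toy records.
records = [
    {"id": 1, "country": "United States", "title": "Targeted therapy for breast cancer"},
    {"id": 2, "country": "United States", "title": "Cardiac imaging study", "abstract": "No oncology terms"},
    {"id": 3, "country": "Canada", "title": "Lymphoma outcomes"},
]
print([rec["id"] for rec in filter_records(records)])
```

Anchoring matches at a word start catches plurals but misses compounds such as "anticancer"; a real filter would need to decide how to handle those.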