This series features mid-course projects for our Data Science Bootcamp. Students were tasked with asking an interesting data question and finding a dataset to answer the question. Next, they spent time cleaning, wrangling, and exploring the data, before designing and building an interactive Shiny app.
Early diagnosis is key to surviving cancer. In his research, Ashutosh Singhal found that “[the] five-year survival rate for diagnosis at early-stage lung cancer is around 56 percent, which means people diagnosed early will live for at least five years after diagnosis. However, the five-year survival rate for people diagnosed with late-stage lung cancer is only 5 percent.”
“Studies show that both biological and social determinants limit the early diagnosis of cancer,” explained Ashutosh. “One of the most important factors from the socio-economic factors that impact the life-expectancy of cancer patients is the health insurance status/type.”
The Data Question
For his mid-course project, he hypothesized, “Status and type of health insurance affect the stage of the cancer diagnosis and thereby life-expectancy.”
Cleaning The Data
At first, Ashutosh was not expecting any issues with the data. However, he ran into a problem when filtering with dplyr. One of the columns contained characters that could only be matched with the str_detect() function from the stringr package. He needed the data to work with dplyr's filter() function to build his Shiny app, so he wrote a function that converted the values into a usable format and stored them in a new column.
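The cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not Ashutosh's actual code: the data frame `patients`, the column names `insurance_raw` and `insurance_type`, the helper `clean_insurance()`, and all of the sample values are hypothetical. It uses stringr's `str_detect()` for pattern matching and writes the result to a new column so that a plain equality test works inside `filter()`.

```r
library(dplyr)
library(stringr)

# Hypothetical raw values: invented for illustration, not taken from the project data.
patients <- data.frame(
  insurance_raw = c("Private (HMO/PPO)", "Medicaid*", "Not insured",
                    "Medicare (w/ supplement)", "private - other"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map messy free-text labels to clean categories
# by pattern matching rather than exact comparison.
clean_insurance <- function(x) {
  case_when(
    str_detect(x, regex("private", ignore_case = TRUE))     ~ "Private",
    str_detect(x, regex("medicaid", ignore_case = TRUE))    ~ "Medicaid",
    str_detect(x, regex("medicare", ignore_case = TRUE))    ~ "Medicare",
    str_detect(x, regex("not insured", ignore_case = TRUE)) ~ "Uninsured",
    TRUE                                                    ~ "Other"
  )
}

# Store the cleaned labels in a new column, leaving the raw column untouched.
patients <- patients %>%
  mutate(insurance_type = clean_insurance(insurance_raw))

# Exact-match filtering now works on the cleaned column.
private_only <- patients %>%
  filter(insurance_type == "Private")
```

Keeping the raw column alongside the cleaned one makes it easy to spot-check the mapping, and a Shiny input can then pass its selected value straight into the `filter()` comparison.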
Visualizing The Data
Explore Ashutosh’s Shiny app, Cancer Diagnosis & Health Insurance.
Ashutosh used the ggplot2 and plotly packages to build two types of visualizations:
- Mosaic chart: a 2D bar chart to visualize the percentage of health insurance types and the percentage of cancer stages in those health insurance types
- Cleveland dot plot: to explore differences in the relationship between cancer stage and health insurance type before and after the Affordable Care Act (ACA)
Through his visualizations, Ashutosh was able to compare five health insurance types and seven different cancer stages.
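For readers unfamiliar with Cleveland dot plots, here is a minimal ggplot2 sketch of the idea. It is an assumption about how such a plot could be built, not the code behind the app: the data frame `stage_share`, its column names, and every number in it are placeholders invented so the example runs, and only three stages are shown for brevity.

```r
library(ggplot2)

# Placeholder values only: these numbers are invented to make the sketch run
# and are NOT results from the project.
stage_share <- data.frame(
  stage  = rep(c("Stage 1", "Stage 2", "Stage 3"), times = 2),
  period = rep(c("Pre-ACA", "Post-ACA"), each = 3),
  pct    = c(30, 25, 20, 35, 24, 18)
)

# Cleveland dot plot: one row per stage, one dot per period,
# with a thin grey line joining the two dots for each stage.
p <- ggplot(stage_share, aes(x = pct, y = stage)) +
  geom_line(aes(group = stage), colour = "grey60") +
  geom_point(aes(colour = period), size = 3) +
  labs(x = "Share of diagnoses (%)", y = NULL, colour = NULL)

# Printing `p` draws the static chart; passing it to plotly::ggplotly(p)
# would produce an interactive version.
```

Placing the categories on the y-axis keeps long stage labels readable, and the connecting line draws the eye to the size of the before/after gap rather than to the absolute values.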
The Results
The data supported Ashutosh’s hypothesis: people with health insurance had a higher rate of early cancer detection than those without. Additionally, those with private insurance had a higher rate of early detection than those with government insurance such as Medicare or Medicaid.
Cancer Stage Diagnosis By Health Insurance Type in 2016
He also discovered that post-ACA there was an increase in early detection across all health insurance types. The government-sponsored category (Other/Govt) saw a significant increase, about 10%, in patients diagnosed with early-stage cancer (stage 1).
Cleveland Plot of Percentage Difference In Cancer Stage After ACA Went Into Effect for “Other/Govt” Insurance Types