Kaggle is a predictive modeling, analytics, and data sharing site where more than 500,000 users compete and collaborate to explore datasets made available by business partners who believe crowdsourced analytics may bring them greater insight. Cash prizes are awarded to top performers, and the community learns from one another by sharing strategy and code. More than 200 competitions have been run since Kaggle launched in 2010, and Kaggle has become a playground of sorts for data scientists and others interested in honing their analytic skills.
This year Kaggle conducted its first ‘State of Data Science & Machine Learning’ survey and received responses from more than 16,000 people. The full survey data, along with visualizations of selected responses, is available at kaggle.com.
Highlights of the survey are as follows:
- Education – of 15,015 respondents, almost 42% have a master’s degree; 32% have a bachelor’s degree; 15.6% have a doctorate.
- Tools – of 7,955 respondents, 76% use Python; 59% use R; 40% use Jupyter notebooks.
- Algorithms/Methods – of 7,301 respondents, 63.5% use logistic regression; about 50% use decision trees; 46.3% use random forests; 37.6% use neural networks.
- Data – of 8,024 respondents, 65.5% work with relational data; 53% work with text data; 18% work with image data.
- Challenges – of 7,376 respondents, 49.4% cited dirty data; 41.6% cited a lack of data science talent; 37.2% cited a lack of management and/or financial support; 30.4% cited the lack of a clear question to answer.
Our Data Science Bootcamp is aligned with the results of this survey in terms of the tools and methods we teach. Our first cohort is currently learning to write Python in Jupyter notebooks and will learn R this spring. We also reinforce and build on the skills students have learned by focusing each new unit on understanding and answering a specific data question, because, as the survey responses suggest, a clear question is a key part of the data science process. To address the most commonly cited challenge, dirty data, we have acquired some real-life messy data to help students get acquainted with the work of data munging. As the bootcamp continues to evolve, we aspire to help grow the data science talent pool in Nashville and to demonstrate how data science can add value to a business.
As the field of data science grows, particularly here in Nashville, we’ll be curious to see how these challenges change.