
Data Science Cohort 2 Dives Into Data Cleaning and Exploration

Written by Mary van Valkenburg | Sep 10, 2018

Last week, Data Science Cohort 2 started work on its first team project. The data for Data Question 2 describes computing jobs submitted over the course of a year to ACCRE, Vanderbilt’s Advanced Computing Center for Research and Education. Dr. Will French, Director of Research Computing Operations at ACCRE, introduced the class to the world of high-performance computing and presented the questions he hopes students will be able to answer as they practice data cleaning and exploration.

After introducing students to ACCRE’s co-op-like approach to sharing resources, which lets intensive computing jobs be completed more effectively and efficiently, Will gave an overview of SLURM (Simple Linux Utility for Resource Management) and job scheduling. He posed three questions he hopes the teams can answer:

  1. How do the type of requested resource and max wall time impact queue time for a job?
  2. Are there particular nodes that are resulting in job failures more frequently than others?
  3. Are there users running large numbers (> 500) of very short (< 5 minutes) jobs within a 4-hour window? Are any of them repeat offenders?
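Question 3 lends itself to a sliding-window count. The sketch below is one possible approach, not ACCRE's or the class's method, and it assumes hypothetical columns `user`, `submit_time` (datetime) and `elapsed` (timedelta), since the dataset's schema is not described here. It keeps only jobs shorter than 5 minutes, then, for each user, slides a 4-hour window over the sorted submit times and greedily counts non-overlapping windows holding more than 500 jobs. Treating a user with two or more such bursts as a "repeat offender" is likewise just one reasonable reading.

```python
import numpy as np
import pandas as pd

# Thresholds from question 3. Column names used below (user, submit_time,
# elapsed) are hypothetical placeholders for whatever the real data uses.
SHORT_JOB = pd.Timedelta(minutes=5)
WINDOW = pd.Timedelta(hours=4)
MIN_JOBS = 500


def count_bursts(times: np.ndarray, window: np.timedelta64, min_jobs: int) -> int:
    """Greedily count non-overlapping windows containing more than min_jobs jobs.

    `times` must be a sorted numpy datetime64 array.
    """
    bursts, i, n = 0, 0, len(times)
    while i < n:
        # Index of the first job at or after times[i] + window.
        j = np.searchsorted(times, times[i] + window, side="left")
        if j - i > min_jobs:
            bursts += 1
            i = j  # jump past this burst so windows do not overlap
        else:
            i += 1
    return bursts


def short_job_offenders(jobs: pd.DataFrame,
                        min_jobs: int = MIN_JOBS,
                        short_job: pd.Timedelta = SHORT_JOB,
                        window: pd.Timedelta = WINDOW) -> pd.Series:
    """Return burst counts per user, keeping only users with at least one burst."""
    short = jobs[jobs["elapsed"] < short_job]
    bursts = {
        user: count_bursts(np.sort(group["submit_time"].to_numpy()),
                           window.to_timedelta64(), min_jobs)
        for user, group in short.groupby("user")
    }
    result = pd.Series(bursts, dtype="int64", name="bursts")
    # Users with bursts >= 2 would be the "repeat offenders" under this definition.
    return result[result > 0].sort_values(ascending=False)
```

Whether windows should be anchored on submit, start, or end time is a judgment call the teams would need to make from the actual data.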

The first challenge? How to read a 3 GB text file into a pandas DataFrame!
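A common way to get a file of this size into pandas is to parse only the columns you need, pin their dtypes up front, and read the file in chunks. The following is a minimal sketch under those assumptions; the path, separator, and column names are placeholders, since the file's layout is not described here.

```python
import pandas as pd


def read_large_table(path, sep=",", usecols=None, dtype=None,
                     category_cols=(), chunksize=1_000_000):
    """Read a large delimited text file in chunks and return one DataFrame.

    Only `usecols` are parsed, and `dtype` fixes column types so pandas does
    not have to infer them. `sep` and the column names are assumptions about
    the file, not known facts about it.
    """
    pieces = []
    with pd.read_csv(path, sep=sep, usecols=usecols, dtype=dtype,
                     chunksize=chunksize) as reader:
        for chunk in reader:
            # Per-chunk row filtering (dropping rows never needed) could go here.
            pieces.append(chunk)
    df = pd.concat(pieces, ignore_index=True)
    # Highly repetitive string columns (e.g. user or node names) shrink
    # considerably when stored as categoricals.
    for col in category_cols:
        df[col] = df[col].astype("category")
    return df
```

Converting to categoricals after concatenation, rather than per chunk, sidesteps chunks ending up with different category sets, which `pd.concat` would otherwise fall back to `object` dtype for.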

Will returns to class on September 20 to see what the students have uncovered. We are grateful to our first education data partner of the second cohort for an engaging presentation and interesting questions.