At Nashville Software School, we provide our students and graduates who are still searching for their first job in tech with opportunities to dive deeper and continue their education while on the job hunt. These graduates are referred to as Seekers.
This blog post is adapted from a talk Gaurav Mittal, data science manager at Thermo Fisher Scientific, presented to our Seekers.
Ever wonder how Amazon knows exactly what else you might want to buy? When you add items to your cart and see "suggested items" that are relevant to you, you're experiencing data science in action. Behind this seemingly simple feature lies a sophisticated system of predictive analytics algorithms processing your shopping data in real time.
Figure 1: Relationship Venn Diagram for data science, machine learning, and artificial intelligence shared during Gaurav Mittal's presentation.
Data Science, machine learning (ML), and artificial intelligence (AI) are closely connected and often overlap, but they each have distinct roles.
Data Science encompasses the entire process of extracting insights from data. It involves data cleaning, analysis, and interpretation using various tools and techniques.
Machine Learning is a subset of capabilities within data science and AI, focusing on algorithms that can learn from and make predictions based on data. Common applications include text classification and predictive analytics.
Artificial Intelligence is the broadest field, with machine learning as one of its components. AI includes the development of systems that can perform tasks requiring human-like intelligence. It handles complex tasks like face recognition and image processing through deep learning models using libraries like Keras and TensorFlow.
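For a rough sense of what that looks like in practice, here is a bare-bones image classifier defined with Keras; the architecture and input size are arbitrary choices for illustration, not something presented in the talk.

```python
# A minimal Keras model definition, just to show the shape of a deep
# learning workflow; the layer sizes and two-class output are arbitrary.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(64, 64, 3)),          # small RGB images
    keras.layers.Conv2D(16, 3, activation="relu"),  # learn local visual features
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(2, activation="softmax"),    # e.g. "face" vs. "not a face"
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()
```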
Turning raw data into actionable insights isn't magic. It's a careful process that happens in three essential steps. It begins with data processing and cleaning, often considered the most crucial phase. Just like you need a strong foundation to build a house, you need clean data to derive meaningful insights. Data scientists spend significant time handling duplicate entries, managing missing values, and removing outliers to ensure their data is reliable and consistent.
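To make that cleaning step concrete, here is a minimal sketch in pandas; the DataFrame and column names are hypothetical, not examples from the talk.

```python
# A minimal data-cleaning sketch with pandas. The column names
# (order_id, price) are hypothetical, used only for illustration.
import pandas as pd

def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    # Remove exact duplicate rows (e.g. the same order ingested twice)
    df = df.drop_duplicates()

    # Handle missing values: fill numeric gaps with the median,
    # and drop rows that lack a required identifier
    df["price"] = df["price"].fillna(df["price"].median())
    df = df.dropna(subset=["order_id"])

    # Remove outliers: keep prices within 3 standard deviations of the mean
    mean, std = df["price"].mean(), df["price"].std()
    return df[(df["price"] - mean).abs() <= 3 * std]
```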
Once the data is clean, the analysis and modeling phase begins. This is where machine learning models come into play, powered by programming languages like Python or R. Data scientists select and apply various models, from basic regression analysis to sophisticated deep learning algorithms, depending on what questions they're trying to answer and what patterns they're hoping to uncover.
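For example, a basic regression workflow in Python with scikit-learn might look like the sketch below; it runs on synthetic stand-in data rather than anything from the talk.

```python
# A basic modeling sketch: fit a regression model and check its error
# on held-out data. The data here is synthetic, purely for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in data: 200 samples with 3 numeric features
rng = np.random.default_rng(42)
features = rng.normal(size=(200, 3))
target = features @ np.array([1.5, -2.0, 0.7]) + rng.normal(scale=0.1, size=200)

# Hold out 20% of the data to see how well the model generalizes
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print("Mean absolute error:", mean_absolute_error(y_test, predictions))
```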
The final step transforms findings into clear, actionable insights through interpretation and visualization. Using tools like Power BI and Tableau, data scientists and analysts create intuitive dashboards and visual representations that help non-technical team members grasp the implications of the data and make informed decisions.
So let’s revisit the Amazon cart example. Every time you add an item to your cart, data engineers capture and process that information, feeding it into machine learning models that analyze patterns in shopping behavior. Models like recommendation systems or collaborative filtering make predictions about what products you might want to buy next. Visualizations provide insights about your buying behavior to help different departments understand everything from sales trends to product performance. When you see those eerily accurate product recommendations, you're witnessing the result of these technologies working in concert to create a personalized shopping experience.
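As a toy illustration of the collaborative-filtering idea (not Amazon's actual system), here is an item-based sketch built on a made-up purchase matrix.

```python
# Item-based collaborative filtering on a tiny, made-up purchase matrix.
import numpy as np

# Rows = users, columns = products; 1 means the user bought that product
purchases = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
])

# Cosine similarity between product columns ("people who bought X also bought Y")
norms = np.linalg.norm(purchases, axis=0)
similarity = (purchases.T @ purchases) / np.outer(norms, norms)

def recommend(user_index: int, top_n: int = 2):
    """Score unpurchased products by their similarity to what the user already bought."""
    bought = purchases[user_index]
    scores = similarity @ bought
    scores[bought == 1] = -1  # skip items already in the cart
    return np.argsort(scores)[::-1][:top_n]

print(recommend(0))  # product indices most similar to user 0's purchases
```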
Consider a common business challenge: managing high volumes of customer support emails from multiple global clients. Without automation, staff must manually review, categorize, and route each incoming message – a time-consuming process prone to delays and human error. “Help desk teams are always short of staff, and their [timelines to respond are] very tight,” Gaurav explains. “[For example,] if customers are facing a login error, it’s a P1 issue. You have to provide an answer immediately.”
The solution Gaurav worked on combines multiple AI and machine learning approaches to automate the classification of these emails, starting with Natural Language Processing (NLP). NLP acts as a sophisticated reader, scanning through email content to extract key information, determine the nature and urgency of requests, and classify issues by type and priority. Think of it as a highly efficient digital assistant that can instantly tell whether an email contains a critical system outage report or a routine feature request.
Figure 2: NLP Process Flow Diagram for email classification shared during Gaurav Mittal's presentation. This diagram illustrates the four main stages of Natural Language Processing that Gaurav discussed: Data Collection (involving web scraping and language detection), Data Preprocessing (including removing HTML tags, tokenization, converting to lowercase, removing stopwords, and stemming), Text Mining Techniques (featuring sentiment analysis, named entity recognition, and topic modeling), and finally ML Models (using tools like TfidfVectorizer). The flow shows how text from emails progresses through these stages to ultimately classify messages by their nature and urgency.
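To make those stages concrete, here is a heavily condensed sketch in scikit-learn; the sample emails and labels are invented, and a real pipeline would also include the scraping, stemming, and entity-recognition steps shown in the figure.

```python
# A condensed version of the Figure 2 pipeline: vectorize email text with
# TfidfVectorizer, then classify it. The emails and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

emails = [
    "Users cannot log in to the portal, getting error 500",
    "Could you add a dark mode option to the dashboard?",
    "Production outage: all API calls are timing out",
    "Feature request: export reports to CSV",
]
labels = ["P1", "feature_request", "P1", "feature_request"]

# TfidfVectorizer handles lowercasing, tokenization, and stopword removal;
# the classifier then learns to separate urgent issues from routine requests
classifier = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LogisticRegression(),
)
classifier.fit(emails, labels)

print(classifier.predict(["Login page is down for every customer"]))
```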
Gaurav’s team then used image recognition technology to identify and verify client logos within email signatures and letterheads. This feature proves particularly valuable when dealing with global organizations that have region-specific branding. For instance, the system can distinguish between Walmart USA and Walmart Canada based on subtle differences in their logos, ensuring messages are routed to the appropriate regional support teams. Sentiment analysis adds another layer of intelligence to the solution. By evaluating the urgency and tone of messages, it can quickly flag high-priority issues that require immediate attention or recognize when a user is simply providing feedback.
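As one way to approximate that urgency-and-tone check (not necessarily the technique Gaurav's team used), here is a small sketch using NLTK's VADER sentiment analyzer.

```python
# Rough urgency scoring with NLTK's VADER sentiment analyzer; the threshold
# and example messages are arbitrary choices for illustration.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

messages = [
    "Everything is broken and our customers are furious!",
    "Thanks for the quick fix last week, the new release looks great.",
]

for message in messages:
    scores = analyzer.polarity_scores(message)
    # A strongly negative compound score hints that the message may be urgent
    label = "possible high priority" if scores["compound"] < -0.5 else "routine / feedback"
    print(f"{label}: {message}")
```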
This automated prioritization ensures that critical problems receive rapid response while routine inquiries are handled in an orderly fashion. The system then routes each request to the appropriate team, creating a streamlined workflow that maximizes efficiency and minimizes response times.
Figure 3: Automated bug triage workflow diagram from Gaurav's presentation. This flowchart illustrates the automated system for processing customer support emails. When a bug report or email is received, the system first checks if it contains the client's name. If not, it uses web scraping to extract relevant information. For emails containing images, an image classification model trained on client logos helps identify the specific client. By combining text analysis and image recognition, it can route issues to the appropriate teams without manual intervention.
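In code, the decision flow in Figure 3 might look roughly like the sketch below; the helper functions are simple stand-ins for the real NLP, image-classification, and web-scraping components, and the routing rules are invented for illustration.

```python
# A simplified triage sketch. Each helper is a placeholder for a real model
# or service; only the overall decision flow mirrors Figure 3.

def classify_logo(attachments):          # stand-in for the logo image model
    return "Walmart Canada" if attachments else None

def scrape_client_info(sender_domain):   # stand-in for web scraping
    return sender_domain.split(".")[0].title()

def classify_text(body):                 # stand-in for the NLP classifier
    urgent = any(word in body.lower() for word in ("outage", "login error", "down"))
    return ("P1" if urgent else "P3", "support")

def route_ticket(email):
    client = email.get("client_name")
    if not client and email.get("attachments"):
        client = classify_logo(email["attachments"])   # fall back to logo recognition
    if not client:
        client = scrape_client_info(email["sender_domain"])
    priority, category = classify_text(email["body"])
    return {"client": client, "priority": priority, "queue": f"{category}-{priority}"}

print(route_ticket({
    "client_name": None,
    "attachments": ["logo.png"],
    "sender_domain": "walmart.ca",
    "body": "Customers are reporting a login error across all regions.",
}))
```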
The email classification system described above illustrates a crucial point about modern data science: the most valuable applications often combine multiple technologies to solve real business problems. While terms like AI and machine learning might sound abstract, they're already transforming everyday business operations. From analyzing shopping patterns on Amazon to automatically routing urgent customer support emails, these technologies are creating tangible improvements in efficiency and customer experience.
What makes these implementations successful isn't just the sophistication of the technology, but the careful consideration of practical business needs. For instance, the email classification system works not because it uses AI, but because it addresses specific pain points: overworked help desk teams, the need for rapid response to urgent issues, and the complexity of supporting global clients. So before you start talking about models, make sure you understand the problem you’re trying to solve.