Understanding the DevOps Pipeline: Insights from a Data Science Manager

Sep 11, 2024
Mandy Arola

At Nashville Software School, we provide our students and graduates who are still searching for their first job in tech with opportunities to dive deeper and continue their education while on the job hunt. These graduates are referred to as Seekers.

In a recent Seeker meeting, Gaurav Mittal, a data science manager at Thermo Fisher Scientific, shared valuable insights on DevOps pipelines and their benefits.

Understanding DevOps, CI, and CD

[Diagram: DevOps pipeline]

DevOps, CI, and CD are often used together, but they are distinct terms.

  • DevOps is an overarching approach that covers the entire software development lifecycle, from coding to deployment and monitoring.
  • Continuous Integration (CI) is the practice of frequently merging code changes into a central repository, where they are checked by developer-written unit tests or automated tests to ensure the new code won’t negatively impact the existing code.
  • Continuous Delivery or Continuous Deployment (CD) involves the actual release of code by deploying it to production. With continuous delivery, you push your changes to the QA environment and a manual approval is required before they go to production. With continuous deployment, changes that pass the automated tests go straight to production with no manual step. Most companies want the manual intervention as the final check before the new code goes live.

The Evolution of Testing

Manual testing is very time intensive, but DevOps pipelines can take testing efficiency to the next level. As an example, Gaurav shared that manual testing might take 1,000 hours; with automated testing, that effort might be reduced to 100 hours. By integrating automated tests into CI/CD pipelines, teams can save significant time and resources. However, automation tests have their own challenges: because they require QA engineers to continuously monitor the screen, a large portion of engineer time still goes to testing. Virtual sessions can reduce this commitment even further.

Virtual Sessions: Testing in the Background

With a CI/CD tool like GitHub Actions, you can run the automation tests in a virtual session that handles the continuous monitoring for you. Engineers can schedule a job that will trigger the automation tests, read and execute them, and email a report. “Everything is automated end to end,” explains Gaurav. This means the engineers can have their 100 hours back to use for other important projects.
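
A sketch of such a scheduled, end-to-end job appears below. The cron time, the test script name, and the community mail action (dawidd6/action-send-mail) are assumptions for illustration; any reporting step could stand in for the final one.

```yaml
# .github/workflows/nightly-automation.yml (illustrative scheduled test run)
name: Nightly automation tests

on:
  schedule:
    - cron: "0 2 * * *"   # every day at 02:00 UTC; no human needs to be watching

jobs:
  automation:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Hypothetical test script that writes an HTML report
      - run: ./run-automation-tests.sh --report report.html
      - name: Email the report
        uses: dawidd6/action-send-mail@v3   # community action, assumed here
        with:
          server_address: smtp.example.com
          server_port: 587
          username: ${{ secrets.SMTP_USER }}
          password: ${{ secrets.SMTP_PASS }}
          subject: Nightly automation report
          to: qa-team@example.com
          from: ci-bot@example.com
          attachments: report.html
```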

GitHub Actions Pipeline in Action

GitHub Actions integrates with other tools to create a robust CI/CD pipeline. The process begins when a developer pushes code to GitHub. GitHub Actions, triggered by this push event, initiates a series of predefined steps outlined in a YAML file. YAML, which stands for “yet another markup language” or “YAML ain’t markup language” (depending on who you ask), is a human-readable data serialization format commonly used for configuration files in DevOps tools. In GitHub Actions, YAML files define the workflow, specifying what actions should be taken, when they should occur, and under what conditions. These steps can include running tests, building the application, and preparing it for deployment. GitHub Actions only defines the steps; it answers the question, “What do you want to do when some event happens?” But who will actually do the work? Enter the runner.
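
To make that concrete, here is a minimal sketch of a push-triggered workflow file; the file name, job name, and Python tooling are illustrative assumptions, not details from the talk.

```yaml
# .github/workflows/ci.yml (a minimal, illustrative workflow)
name: CI

on:
  push:                       # the push event triggers this workflow
    branches: [main]

jobs:
  build-and-test:
    runs-on: ubuntu-latest    # the runner that executes the steps (see below)
    steps:
      - uses: actions/checkout@v4          # fetch the repository code
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: pytest                        # run the test suite
```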

[Diagram: DevOps runner]

The runner is a crucial component of this pipeline: it executes the jobs defined in GitHub Actions and determines where and how those actions are performed. Gaurav cautioned that runners can be a significant cost factor. AWS EC2 instances, for example, are billed based on usage, but companies that don’t need the scale of AWS can save costs by setting up Windows 365 virtual machines, which have a fixed cost. Once the workflow steps run, the pipeline then interfaces with various data storage and processing tools, like Amazon S3 and Databricks, depending on the project’s needs. Additionally, the pipeline can trigger automation tests to ensure the new build works correctly with existing systems.
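
Switching runners is a one-line change in the workflow. The labels below are illustrative: ubuntu-latest uses a GitHub-hosted machine, while a self-hosted label routes the job to a machine you register yourself, such as the fixed-cost Windows VMs Gaurav described.

```yaml
jobs:
  hosted-job:
    runs-on: ubuntu-latest            # GitHub-hosted runner, billed per minute of usage
    steps:
      - run: echo "Runs on GitHub's infrastructure"

  self-hosted-job:
    runs-on: [self-hosted, windows]   # routed to a machine you register and manage
    steps:
      - run: echo "Runs on a fixed-cost VM you control"
```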

Security in DevOps

Gaurav emphasized the importance of integrating security checks into the pipeline, recommending tools like CodeQL for static code analysis. It can catch sensitive information in your code, such as an access token or password being printed or logged.

A Three-Pipeline Architecture

Putting this all together, Gaurav outlined a robust three-pipeline architecture that forms the backbone of their DevOps process:

[Diagram: Final three-pipeline architecture]

Pipeline 1: Code Push and Basic Checks

The first pipeline is triggered on each pull request and serves as the initial quality gate. It begins with static code analysis, utilizing tools like Pylint for Python or SQLFluff for SQL, to identify potential issues such as SQL injections. This step is crucial for catching common errors early in the development process. Following the static analysis, the pipeline runs unit tests to ensure that new functions and features work as intended. This pipeline is designed for speed, typically completing within seconds, to provide developers with quick feedback on their code changes.
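
A sketch of this first pipeline as a pull-request workflow follows; the tool choices match the talk, but the src/, sql/, and tests/ layout and the SQL dialect are assumptions.

```yaml
# .github/workflows/pr-checks.yml (illustrative pull-request quality gate)
name: PR checks

on:
  pull_request:     # runs on every pull request

jobs:
  quick-checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install pylint sqlfluff pytest
      - run: pylint src/                        # static analysis for Python
      - run: sqlfluff lint sql/ --dialect ansi  # static analysis for SQL
      - run: pytest tests/                      # fast unit tests for quick feedback
```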

Pipeline 2: Weekly Security Scans

The second pipeline is scheduled to run weekly and focuses on more intensive security checks using tools like CodeQL. Gaurav emphasized the importance of separating this pipeline from the main workflow due to its resource-intensive nature. CodeQL scans can take between 3 to 7 minutes to complete, which would significantly slow down the regular development process if run on every code push. By scheduling these scans weekly, the team can thoroughly analyze the codebase for security vulnerabilities without impacting daily development velocity. This approach allows developers to address any identified security issues collectively each week.
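
Separating the heavy scan into its own schedule is just a different trigger. Here is a sketch, assuming a Python codebase and GitHub’s CodeQL actions; the cron schedule is illustrative.

```yaml
# .github/workflows/codeql-weekly.yml (illustrative weekly security scan)
name: Weekly CodeQL scan

on:
  schedule:
    - cron: "0 6 * * 1"   # Mondays at 06:00 UTC, off the critical path

jobs:
  analyze:
    runs-on: ubuntu-latest
    permissions:
      security-events: write   # needed to upload scan results
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
        with:
          languages: python
      - uses: github/codeql-action/analyze@v3
```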

Pipeline 3: Automation and Regression Testing

The final pipeline is dedicated to comprehensive automation and regression testing. This pipeline is triggered after the code has passed the initial checks and been merged into a target branch. It runs a suite of about 200 tests covering various features and typically takes 2 to 3 hours to complete. The purpose of this pipeline is to ensure that new code changes don't introduce regressions or break existing functionality. By running these extensive tests separately from the main development pipeline, the team can maintain a balance between thorough testing and development speed. This pipeline serves as the final quality assurance step before code is considered ready for production deployment.
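
A sketch of the trigger for this post-merge pipeline is below; the branch name, directory layout, and timeout are assumptions.

```yaml
# .github/workflows/regression.yml (illustrative post-merge regression run)
name: Regression suite

on:
  push:
    branches: [main]      # fires after a PR is merged into the target branch

jobs:
  regression:
    runs-on: ubuntu-latest
    timeout-minutes: 240  # allow the 2-3 hour suite to finish
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: pytest tests/regression/   # the ~200-test suite (hypothetical layout)
```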

This three-pipeline architecture has been successfully implemented across more than 90 project repositories in Gaurav's organization, providing a robust framework for maintaining code quality, security, and functionality throughout the development lifecycle.

DevOps pipelines offer numerous benefits, from increased efficiency to improved security. By adopting these practices and tools, development teams can streamline their processes and deliver higher quality software more rapidly. 

For junior developers, data analysts, and data scientists, understanding the DevOps pipeline helps you become an effective collaborator across teams, from coding to deployment. Knowledge of these processes allows you to write more maintainable code, catch errors early, and contribute to a more efficient development cycle. DevOps is also a path you can pursue as you grow in your career.

Remember, the key to successful DevOps implementation is continuous improvement. Start with what works for your team and iterate as you grow.


Gaurav Mittal is a seasoned IT Manager with 15+ years of leadership experience, adept at guiding teams in developing and deploying cutting-edge technology solutions. Specializing in strategic IT planning, budget management, and project execution, he excels in AWS Cloud, security protocols, and container technologies. Gaurav is skilled in Java, Python, Node.js, and CI/CD pipelines, with a robust background in database management (Aurora, Redshift, DynamoDB). His achievements include substantial cost savings through innovative solutions and enhancing operational efficiency. Gaurav is recognized for his leadership, problem-solving abilities, and commitment to delivering exceptional IT services aligned with organizational goals.

Topics: Analytics + Data Science, Technology Insights, Web Development, Software Engineering