DevOps Big Picture (On-Premises)

Mahdi Mallaki · Published in ITNEXT · 5 min read · Jun 18, 2023

An overview of DevOps best practices and tools for on-premises environments

Introduction

Recently, I conducted an internet search to find an overview of DevOps, aiming to assess the existing tools and practices within our company. Despite the abundance of DevOps tools and practices, I could not locate a comprehensive overview. Therefore, I embarked on a study of the DevOps ecosystem and best practices to create a holistic view, allowing us to evaluate our current state and plan for a better future.

CI (Continuous Integration)

The CI section of this diagram encompasses the following components:

  • Code Repository: In the diagram, I utilized GitLab as the source control and code repository due to its user-friendly interface for repository management. It allows for hierarchical creation of groups and subgroups, providing effective control over team structures.
  • Build Tool: GitLab was also employed as the build tool in the diagram. It offers a wide range of features for writing pipelines as code and supports templating.
  • Automated Tests: While there are numerous end-to-end test frameworks available, Cypress is currently the most popular one in the community. For automated security tests, you can also use GitLab, which ships with a comprehensive toolset you can take advantage of.
  • Artifact Repository: For storing Docker images or Helm charts, I integrated Harbor as the artifact repository. Although there are cloud-based options, a self-hosted tool like Harbor is necessary in air-gapped environments. You can refer to my other article that compares Harbor with Nexus for storing Docker images:
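Taken together, the CI components above can be sketched as a single GitLab pipeline. The following `.gitlab-ci.yml` is a minimal, hypothetical example, not a drop-in config: the Harbor registry address (`harbor.example.com/myteam`), the image name, and the Cypress setup are all assumptions.

```yaml
stages:
  - build
  - test
  - publish

build-image:
  stage: build
  image: docker:24
  services: [docker:24-dind]
  script:
    # Build the application image, tagged with the commit SHA
    - docker build -t harbor.example.com/myteam/myapp:$CI_COMMIT_SHORT_SHA .

e2e-tests:
  stage: test
  image: cypress/included:13.6.0
  script:
    # Run the end-to-end suite headlessly
    - cypress run

push-to-harbor:
  stage: publish
  image: docker:24
  services: [docker:24-dind]
  script:
    # Authenticate against Harbor and push the built artifact
    - docker login harbor.example.com -u "$HARBOR_USER" -p "$HARBOR_PASSWORD"
    - docker push harbor.example.com/myteam/myapp:$CI_COMMIT_SHORT_SHA
```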

CD (Continuous Delivery)

I separated the CD repository from the source code repository because we need multiple environments for multiple clients. However, if you don’t have multiple environments for each of your products, you can combine them into a single repository.

Infrastructure as Code: To provision infrastructure (VMs) and the platform (Kubernetes), it is essential to use a tool like Terraform, which makes provisioning straightforward. While there are other options such as Ansible or Puppet, those tools are geared toward configuration management rather than declarative infrastructure provisioning. I highly recommend using Terraform along with GitLab for storing your IaC state.
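As a sketch of this setup, here is a minimal, hypothetical Terraform configuration that keeps its state in GitLab’s built-in Terraform state backend. The project URL, state name, and the `example_vm` resource are placeholders; the actual provider and attributes depend on your on-premises virtualization platform.

```hcl
terraform {
  # GitLab exposes a managed Terraform state backend over HTTP
  backend "http" {
    address = "https://gitlab.example.com/api/v4/projects/42/terraform/state/production"
  }
}

# Declarative description of a VM (hypothetical provider/resource)
resource "example_vm" "app_server" {
  name   = "app-server-01"
  cpus   = 4
  memory = 8192
}
```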

Deployment Service: I have employed GitLab as a deployment service to store environment configuration files for each application. You can create a Git repository within GitLab, store your configuration files, and define a pipeline to deploy a Helm chart to the Kubernetes cluster. Although there are other options such as Spinnaker, I find it quite complex, with many features that may not be necessary for your use case.
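A deployment pipeline of this kind can be as small as one job. The following hypothetical GitLab CI job assumes a chart published to Harbor as an OCI artifact, per-environment values files in the repository, and cluster credentials provided to the runner; names and versions are placeholders.

```yaml
deploy-production:
  stage: deploy
  image: alpine/helm:3.14.0
  script:
    # Install or upgrade the release using the environment's values file
    - helm upgrade --install myapp oci://harbor.example.com/myteam/myapp
        --version 1.2.3
        --namespace production
        --values environments/production/values.yaml
  environment: production
  when: manual  # require an explicit approval before touching production
```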

CM (Continuous Monitoring)

The CM (Continuous Monitoring) section consists of the following components and relationships:

Metric Server: In the diagram, I employed Prometheus as the metric server to collect and store metrics from applications, platforms, and infrastructure.
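As an illustration, a minimal `prometheus.yml` scrape configuration for applications and infrastructure nodes might look like this (job names and targets are placeholders):

```yaml
global:
  scrape_interval: 30s  # how often Prometheus pulls metrics

scrape_configs:
  # Application metrics exposed on /metrics
  - job_name: myapp
    static_configs:
      - targets: ['myapp.example.com:8080']
  # Node-level infrastructure metrics via node_exporter
  - job_name: node
    static_configs:
      - targets: ['node1.example.com:9100', 'node2.example.com:9100']
```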

Logs Server: I utilized the ELK stack (Elasticsearch + Logstash + Kibana), which is highly popular in the community, for collecting and storing logs. It provides extensive capabilities for building analytics dashboards based on the collected logs.
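Logs typically reach the ELK stack through a lightweight shipper from the Beats family. As one hedged example, a minimal hypothetical `filebeat.yml` that tails application log files and sends them to Elasticsearch could look like this (paths and hosts are placeholders):

```yaml
filebeat.inputs:
  # Tail the application's log files
  - type: filestream
    paths:
      - /var/log/myapp/*.log

output.elasticsearch:
  # Index events directly into Elasticsearch
  hosts: ["elasticsearch.example.com:9200"]
```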

Tracing Server: For the tracing server, I opted for Jaeger. While Zipkin exists as an alternative, I personally recommend Jaeger because it is a newer project with a larger community. If you’d like to learn more about how to send traces from an application to Jaeger, you can check out my other post on the topic:
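For a quick on-premises trial, Jaeger ships an all-in-one image that bundles the collector, storage, and UI in one container. A minimal Docker Compose fragment, as a sketch (the image tag is an assumption; pin whatever version you validate):

```yaml
services:
  jaeger:
    image: jaegertracing/all-in-one:1.54
    ports:
      - "16686:16686"  # Jaeger web UI
      - "4317:4317"    # OTLP gRPC endpoint for application traces
```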

Infrastructure Monitoring: There are numerous tools available for infrastructure monitoring, each with its own set of advantages and disadvantages. However, I selected Zabbix because it is an open-source project with comprehensive monitoring abilities. It is an agent-based tool, although there are also agent-less alternatives. Some companies opt for SolarWinds as an alternative.

Auto-Scaler: The Keda project is specifically designed for auto-scaling pods based on different metrics in Kubernetes. It supports various types of applications and collects metrics from them to facilitate auto-scaling. Additionally, there are other tools available for auto-scaling infrastructure and platform resources (such as VM count or Kubernetes worker nodes) based on collected metrics by Prometheus.
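To make this concrete, a KEDA `ScaledObject` can scale a Deployment on a Prometheus query. The following manifest is a hypothetical sketch: the Deployment name, Prometheus address, query, and threshold are all assumptions to adapt to your workload.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-scaler
spec:
  scaleTargetRef:
    name: myapp          # the Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(rate(http_requests_total{app="myapp"}[2m]))
        threshold: "100"  # target roughly 100 req/s per replica
```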

Alert Manager: The alert manager tool should be capable of collecting and deduplicating alerts from different systems. Alertmanager, a tool developed by the Prometheus open-source team, can receive alerts from various monitoring tools such as Prometheus, Zabbix, and Elasticsearch. It is capable of grouping, deduplicating, and filtering these alerts based on predefined rules and configurations. Moreover, it supports various notification mechanisms to deliver alerts to the support team, including email, PagerDuty, Slack, and other custom integrations.
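The grouping and routing described above map directly onto Alertmanager’s configuration file. A minimal, hypothetical `alertmanager.yml` routing everything to a Slack channel (the webhook URL and channel name are placeholders):

```yaml
route:
  group_by: ['alertname', 'cluster']  # group related alerts to deduplicate noise
  group_wait: 30s                     # wait to batch alerts that fire together
  receiver: support-team

receivers:
  - name: support-team
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE_ME
        channel: '#alerts'
```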

Conclusion

The overall picture looks like this:

These systems work together to ensure the reliability and resiliency of the production environment. The combination of CI + CD + CM promotes better collaboration among different teams. If you are following an agile methodology, the CI part can be handled by the development team, the CD part by the operations team, and the CM part by the monitoring team. These teams collaborate to ensure the service’s reliability.

GitHub

To facilitate this process, I have shared the .xml source file and an open-source diagram of this overview in the following GitHub repository:

Feedback

If you have any feedback or suggestions for improving my code, please leave a comment on this post or send me a message on my LinkedIn. I would greatly appreciate your contributions to help make this article better. If you enjoyed this post, be sure to follow me to stay updated on my latest articles.

Buy me a coffee at https://buymeacoffee.com/mlkmhd
