DevOps Big Picture (On-Premises)
An overview of DevOps best practices and tools for on-premises environments
Introduction
Recently, I conducted an internet search to find an overview of DevOps, aiming to assess the existing tools and practices within our company. Despite the abundance of DevOps tools and practices, I could not locate a comprehensive overview. Therefore, I embarked on a study of the DevOps ecosystem and best practices to create a holistic view, allowing us to evaluate our current state and plan for a better future.
CI (Continuous Integration)
The CI section of this diagram encompasses the following components:
- Code Repository: In the diagram, I utilized GitLab as the source control and code repository due to its user-friendly interface for repository management. It allows hierarchical creation of groups and subgroups, providing effective control over team structures.
- Build Tool: GitLab was also employed as the build tool in the diagram. It offers a wide range of features for writing pipelines as code and supports templating.
- Automated Test: While there are numerous end-to-end test frameworks available, Cypress currently holds the position as the most popular one in the community. For automated security tests, you can also utilize GitLab, which offers a comprehensive toolset you can make use of.
- Artifact Repository: For storing Docker images or Helm charts, I integrated Harbor as the artifact repository. Although there are cloud-based options, a tool like Harbor is necessary in air-gapped environments. You can refer to my other article that compares Harbor with Nexus for storing Docker images:
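To make the CI stages concrete, here is a minimal sketch of a `.gitlab-ci.yml` that builds an image, runs Cypress end-to-end tests, and pushes the image to Harbor. The registry URL, image name, and credential variables are placeholders, not from a real project:

```yaml
# .gitlab-ci.yml — hypothetical example; harbor.example.com, myteam/myapp,
# and the HARBOR_* variables are placeholders.
stages:
  - build
  - test
  - publish

build-image:
  stage: build
  image: docker:24
  services:
    - docker:24-dind                     # Docker-in-Docker for image builds
  script:
    - docker build -t harbor.example.com/myteam/myapp:$CI_COMMIT_SHORT_SHA .

e2e-tests:
  stage: test
  image: cypress/included:13.6.0         # image bundling the Cypress runner
  script:
    - cypress run --browser chrome

publish-image:
  stage: publish
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker login harbor.example.com -u "$HARBOR_USER" -p "$HARBOR_PASSWORD"
    - docker push harbor.example.com/myteam/myapp:$CI_COMMIT_SHORT_SHA
```

In a real setup, the credentials would come from GitLab CI/CD variables and the pipeline would typically be split across templates shared by multiple projects.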
CD (Continuous Delivery)
I separated the CD repository from the source code repository due to the requirement of multiple environments for multiple clients. However, if you don’t have multiple environments for each of your products, you can combine them into a single repository.
Infrastructure as Code: To provision infrastructure (VMs) and the platform (Kubernetes), it is essential to use a tool like Terraform, which makes provisioning straightforward. While other options such as Ansible or Puppet exist, those tools are oriented toward configuration management rather than infrastructure provisioning. I highly recommend utilizing Terraform along with GitLab for storing the state of your IaC.
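As a rough sketch, GitLab's managed Terraform state can be wired up through Terraform's generic HTTP backend; the GitLab host, project ID, and state name below are placeholders:

```hcl
# main.tf — minimal sketch, not a complete configuration.
terraform {
  # GitLab exposes a managed Terraform state store via the HTTP backend;
  # the concrete address is supplied at init time.
  backend "http" {}
}

# Example init (values are placeholders):
#   terraform init \
#     -backend-config="address=https://gitlab.example.com/api/v4/projects/<PROJECT_ID>/terraform/state/production" \
#     -backend-config="username=<GITLAB_USER>" \
#     -backend-config="password=<GITLAB_ACCESS_TOKEN>"
```

Storing state in GitLab keeps it versioned, locked, and access-controlled alongside the IaC repository itself.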
Deployment Service: I have employed GitLab as a deployment service to store environment configuration files for each application. You can create a Git repository within GitLab, store your configuration files, and define a pipeline that deploys a Helm chart to the Kubernetes cluster. Although there are other options, such as Spinnaker, I find it quite complex, with numerous features that may not be necessary for your use case.
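A minimal sketch of such a deployment pipeline, assuming the Helm chart and per-environment values files live in the CD repository (the chart path, release name, and namespace are placeholders):

```yaml
# .gitlab-ci.yml in the CD repository — hypothetical example.
stages:
  - deploy

deploy-staging:
  stage: deploy
  image: alpine/helm:3.14.0              # image providing the helm CLI
  script:
    # Install or upgrade the release using the staging values file.
    - helm upgrade --install myapp ./charts/myapp --namespace staging --values environments/staging/values.yaml
  environment:
    name: staging
```

One pipeline job per environment keeps each client's configuration isolated while reusing the same chart.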
CM (Continuous Monitoring)
The CM (Continuous Monitoring) section consists of the following components and relationships:
Metric Server: In the diagram, I employed Prometheus as the metric server to collect and store metrics from applications, platforms, and infrastructure.
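For illustration, a minimal Prometheus scrape configuration covering both infrastructure and application targets might look like this (all targets and job names are placeholders):

```yaml
# prometheus.yml — minimal sketch; IPs and service names are placeholders.
global:
  scrape_interval: 30s

scrape_configs:
  - job_name: "node"                  # infrastructure metrics via node_exporter
    static_configs:
      - targets: ["10.0.0.11:9100", "10.0.0.12:9100"]
  - job_name: "myapp"                 # application metrics endpoint
    metrics_path: /metrics
    static_configs:
      - targets: ["myapp.staging.svc:8080"]
```

In a Kubernetes cluster you would normally replace the static targets with service discovery (`kubernetes_sd_configs`), but static targets keep the sketch readable.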
Logs Server: I utilized the ELK stack (Elasticsearch + Logstash + Kibana), which is highly popular in the community, for collecting and storing logs. It provides extensive capabilities for building analytics dashboards based on the collected logs.
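As a sketch of the middle of that stack, a minimal Logstash pipeline that receives logs from Beats agents, parses them, and writes them to Elasticsearch could look like this (the host, port, and index pattern are placeholders, and the grok pattern assumes a simple `timestamp level message` log format):

```
# logstash.conf — hypothetical example.
input {
  beats {
    port => 5044        # receive logs shipped by Filebeat agents
  }
}
filter {
  grok {
    # Assumes lines like: 2024-01-15T10:00:00Z INFO request served
    match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch.example.com:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"   # one index per day
  }
}
```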
Tracing Server: For the tracing server, I opted for Jaeger. While an alternative, Zipkin, exists, I personally recommend Jaeger because it is a newer project with a larger community. If you'd like to learn more about how to send traces from an application to Jaeger, you can check out my other post on the topic:
Infrastructure Monitoring: There are numerous tools available for infrastructure monitoring, each with its own set of advantages and disadvantages. I selected Zabbix because it is an open-source project with comprehensive monitoring capabilities. It is an agent-based tool, although agentless alternatives also exist; some companies opt for SolarWinds instead.
Auto-Scaler: The KEDA project is specifically designed for auto-scaling pods in Kubernetes based on different metrics. It supports various types of event sources and collects metrics from them to drive auto-scaling. Additionally, there are other tools for auto-scaling infrastructure and platform resources (such as VM count or Kubernetes worker nodes) based on metrics collected by Prometheus.
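To sketch how KEDA ties into Prometheus, a `ScaledObject` that scales a deployment on request rate could look like this (the deployment name, Prometheus address, query, and threshold are placeholders):

```yaml
# KEDA ScaledObject — hypothetical example.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-scaler
  namespace: staging
spec:
  scaleTargetRef:
    name: myapp                        # the Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(rate(http_requests_total{app="myapp"}[2m]))
        threshold: "100"               # add replicas above ~100 req/s
```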
Alert Manager: The alert-manager tool should be capable of collecting and deduplicating alerts from different systems. Alertmanager, a tool developed by the Prometheus open-source team, can receive alerts from various monitoring tools such as Prometheus, Zabbix, and Elasticsearch. It is capable of grouping, deduplicating, and filtering these alerts based on predefined rules and configurations. Moreover, it supports various notification mechanisms to deliver alerts to the support team, including email, PagerDuty, Slack, and other custom integrations.
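A minimal Alertmanager configuration showing grouping and severity-based routing might look like this (the Slack webhook, channel, and PagerDuty key are placeholders):

```yaml
# alertmanager.yml — hypothetical example.
route:
  group_by: ["alertname", "cluster"]   # group related alerts together
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: team-slack                 # default receiver
  routes:
    - matchers: ['severity="critical"']
      receiver: team-pagerduty         # page someone only for critical alerts

receivers:
  - name: team-slack
    slack_configs:
      - api_url: https://hooks.slack.com/services/PLACEHOLDER
        channel: "#alerts"
  - name: team-pagerduty
    pagerduty_configs:
      - service_key: PLACEHOLDER
```

The routing tree is what gives Alertmanager its deduplication and filtering behavior: one noisy incident becomes a single grouped notification per receiver.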
Conclusion
The overall picture looks like this:
These systems work together to ensure the reliability and resiliency of the production environment. The combination of CI + CD + CM promotes better collaboration among different teams. If you are following an agile methodology, the CI part can be handled by the development team, the CD part by the operations team, and the CM part by the monitoring team. These teams collaborate to ensure the service’s reliability.
GitHub
To facilitate this process, I have shared the .xml file and an open-source diagram of this overview in the following GitHub repository:
Feedback
If you have any feedback or suggestions for improving my code, please leave a comment on this post or send me a message on my LinkedIn. I would greatly appreciate your contributions to help make this article better. If you enjoyed this post, be sure to follow me to stay updated on my latest articles.