DevOps and Deployment Guide: Modern Infrastructure Management
DevOps practices bridge development and operations, enabling faster, more reliable software delivery. This guide
covers essential DevOps concepts from CI/CD pipelines to container orchestration, helping you build resilient,
scalable infrastructure for modern applications.
1. CI/CD Pipeline Design
Pipeline Fundamentals
Design efficient CI/CD pipelines that automate testing, building, and deployment processes:
CI Best Practices
- Commit early and often
- Run tests automatically on every commit
- Fail fast with quick feedback
- Keep builds under 10 minutes
- Use branch protection rules
- Implement quality gates
CD Best Practices
- Automate deployment to staging
- Use blue-green deployments
- Implement rollback strategies
- Monitor deployment metrics
- Use feature flags
- Practice canary releases
Understanding CI/CD Pipeline Structure
A CI/CD pipeline automates the software delivery process from code commit to production deployment. Let's break
down each component and understand how they work together to create a robust automated workflow.
Pipeline Stages Explained
1. Trigger Events
Define when the pipeline should run (push to main, pull requests, scheduled runs)
2. Build & Test
Install dependencies, run linters, execute unit/integration tests, generate coverage reports
3. Security Checks
Scan for vulnerabilities, check dependencies, validate security configurations
4. Build Artifacts
Create deployable artifacts (Docker images, compiled binaries, packages)
5. Deploy
Deploy to staging/production environments with rollback capabilities
GitHub Actions Workflow Implementation
Here's a comprehensive GitHub Actions workflow with detailed explanations for each section. This example
demonstrates a Node.js application pipeline with multiple environments and security checks.
# .github/workflows/ci-cd.yml
# This workflow demonstrates a complete CI/CD pipeline for a Node.js application
name: CI/CD Pipeline

# TRIGGER CONFIGURATION
# Define when this workflow should run
on:
  push:
    branches: [ main, develop ]  # Run on pushes to main and develop branches
  pull_request:
    branches: [ main ]           # Run on pull requests targeting main

# ENVIRONMENT VARIABLES
# These are available to all jobs in the workflow
env:
  NODE_VERSION: '18'                   # Node.js version to use
  REGISTRY: ghcr.io                    # GitHub Container Registry
  IMAGE_NAME: ${{ github.repository }} # Use repo name as image name

jobs:
  # JOB 1: TESTING AND QUALITY CHECKS
  # This job runs our tests and quality checks in parallel for faster feedback
  test:
    runs-on: ubuntu-latest  # Use latest Ubuntu runner
    steps:
      - uses: actions/checkout@v4  # Download source code

      # SETUP NODE.JS ENVIRONMENT
      # Cache npm dependencies for faster subsequent runs
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'  # Automatically cache npm dependencies

      # INSTALL DEPENDENCIES
      # Use 'npm ci' for faster, reproducible builds in CI
      - name: Install dependencies
        run: npm ci

      # CODE QUALITY CHECKS
      # Run linting to catch code style and potential issues
      - name: Run linting
        run: npm run lint

      # AUTOMATED TESTING
      # Run different types of tests to ensure code quality
      - name: Run unit tests
        run: npm run test:unit

      - name: Run integration tests
        run: npm run test:integration

      # COVERAGE REPORTING
      # Generate and upload test coverage reports
      - name: Generate coverage report
        run: npm run test:coverage

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3

  # JOB 2: SECURITY SCANNING
  # Separate job for security checks - can run in parallel with tests
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # DEPENDENCY VULNERABILITY SCANNING
      # Check for known vulnerabilities in dependencies
      - name: Run security audit
        run: npm audit --audit-level moderate

      # THIRD-PARTY SECURITY SCANNING
      # Use Snyk for comprehensive vulnerability detection
      - name: Run dependency scan
        uses: snyk/actions/node@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}  # Store sensitive tokens in secrets

  # JOB 3: BUILD AND CONTAINERIZATION
  # Only runs if tests and security checks pass
  build:
    needs: [test, security-scan]  # Wait for previous jobs to complete successfully
    runs-on: ubuntu-latest
    outputs:
      # Make build outputs available to deployment jobs
      image-tag: ${{ steps.meta.outputs.tags }}
      image-digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4

      # DOCKER SETUP
      # Configure Docker Buildx for advanced build features
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      # CONTAINER REGISTRY AUTHENTICATION
      # Login to GitHub Container Registry using built-in token
      - name: Login to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}          # GitHub username
          password: ${{ secrets.GITHUB_TOKEN }}  # Automatically provided token

      # DOCKER IMAGE METADATA
      # Generate tags and labels for the Docker image.
      # Tags: branch name, PR number, commit SHA, and 'latest' on the default
      # branch. (Comments cannot go inside the block scalar - YAML would treat
      # them as part of the tag list.)
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix={{branch}}-
            type=raw,value=latest,enable={{is_default_branch}}

      # BUILD AND PUSH CONTAINER IMAGE
      # Multi-platform build with caching for efficiency
      - name: Build and push
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: linux/amd64,linux/arm64  # Support multiple architectures
          push: true                          # Push to registry
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha          # Use GitHub Actions cache
          cache-to: type=gha,mode=max   # Save cache for next build

  # JOB 4: STAGING DEPLOYMENT
  # Deploy to staging environment for testing
  deploy-staging:
    if: github.ref == 'refs/heads/develop'  # Only deploy develop branch to staging
    needs: build
    runs-on: ubuntu-latest
    environment: staging  # Use GitHub environment for approvals/secrets
    steps:
      - name: Deploy to staging
        run: |
          echo "Deploying ${{ needs.build.outputs.image-tag }} to staging"
          # Here you would add your actual deployment script
          # Examples: kubectl apply, helm upgrade, docker-compose up, etc.

  # JOB 5: PRODUCTION DEPLOYMENT
  # Deploy to production environment (main branch only)
  deploy-production:
    if: github.ref == 'refs/heads/main'  # Only deploy main branch to production
    needs: build
    runs-on: ubuntu-latest
    environment: production  # Production environment with protection rules
    steps:
      - name: Deploy to production
        run: |
          echo "Deploying ${{ needs.build.outputs.image-tag }} to production"
          # Add your production deployment script here
          # Consider blue-green or canary deployment strategies
Pipeline Benefits
- Fast Feedback: Parallel jobs provide quick results
- Quality Gates: Deployment only happens after tests pass
- Security First: Automated vulnerability scanning
- Multi-Environment: Separate staging and production deployments
- Efficient Builds: Caching reduces build times
- Rollback Ready: Tagged images enable easy rollbacks
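Because every build pushes an immutable image tag, a rollback is just redeploying an earlier tag. A minimal sketch assuming a Kubernetes deployment; the names (myapp-deployment, ghcr.io/username/myapp, the example tag) are illustrative:

```shell
# Check which image is currently deployed
kubectl -n myapp-production get deployment myapp-deployment \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

# Point the deployment at a previously built, known-good tag
kubectl -n myapp-production set image deployment/myapp-deployment \
  myapp=ghcr.io/username/myapp:develop-abc1234

# Or simply undo the most recent rollout and watch it complete
kubectl -n myapp-production rollout undo deployment/myapp-deployment
kubectl -n myapp-production rollout status deployment/myapp-deployment
```

The same idea applies to Helm (`helm rollback`) or docker-compose (pinning an earlier tag in the compose file).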
2. Containerization with Docker
Dockerfile Best Practices
Create efficient, secure Docker images with these optimization techniques:
# Multi-stage Dockerfile for Node.js application
FROM node:18-alpine AS base
WORKDIR /app
COPY package*.json ./
# Install all dependencies (including dev dependencies)
RUN npm ci

# Development stage
FROM base AS development
ENV NODE_ENV=development
COPY . .
EXPOSE 3000
CMD ["npm", "run", "dev"]

# Build stage
FROM base AS build
COPY . .
# Run tests and build
RUN npm run test && npm run build

# Production stage
FROM node:18-alpine AS production
WORKDIR /app

# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nextjs -u 1001

# Install only production dependencies
COPY package*.json ./
RUN npm ci --only=production && npm cache clean --force

# Copy built application
COPY --from=build --chown=nextjs:nodejs /app/dist ./dist
COPY --from=build --chown=nextjs:nodejs /app/public ./public

# Install dumb-init so signals (SIGTERM, etc.) are forwarded to the Node process
RUN apk add --no-cache dumb-init

# Use non-root user
USER nextjs

# Add health check (wget ships with Alpine's busybox; curl does not)
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget -qO- http://localhost:3000/health || exit 1

EXPOSE 3000
ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "dist/server.js"]

# .dockerignore
node_modules
npm-debug.log
Dockerfile
.dockerignore
.git
.gitignore
README.md
.env
.nyc_output
coverage
.cache
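With a multi-stage Dockerfile like the one above, the `--target` flag selects which stage to build, so a single file serves both development and production. The image names below are illustrative:

```shell
# Build only the development stage for local work
docker build --target development -t myapp:dev .

# Build the final production stage (also the default, since it is last)
docker build --target production -t myapp:prod .

# Run the development image with the source mounted for live reloading
docker run --rm -p 3000:3000 -v "$(pwd)":/app myapp:dev
```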
Understanding Docker Compose for Development
Docker Compose solves the challenge of managing multiple interconnected services in development. Instead of
manually starting databases, cache servers, and your application separately, Docker Compose orchestrates
everything with a single command.
Why Docker Compose?
Without Compose
- Install PostgreSQL locally
- Configure Redis server
- Set up Nginx manually
- Manage different versions
- Complex environment setup
With Compose
- A single docker-compose up command
- Isolated environments
- Version-controlled setup
- Team consistency
- Easy cleanup and reset
# docker-compose.yml
# This file defines a complete development environment with multiple services

# COMPOSE FILE VERSION
# Version 3.8 supports the latest Docker features and networking
version: '3.8'

# SERVICES DEFINITION
# Each service represents a different component of your application stack
services:
  # APPLICATION SERVICE
  # This is your main application container
  app:
    # BUILD CONFIGURATION
    # Build from local Dockerfile with specific target stage
    build:
      context: .           # Use current directory as build context
      target: development  # Use 'development' stage from multi-stage Dockerfile

    # NETWORK PORTS
    # Map host port 3000 to container port 3000
    ports:
      - "3000:3000"  # Format: "host_port:container_port"

    # VOLUME MOUNTS
    # Enable live code reloading during development
    volumes:
      - .:/app            # Mount current directory to /app in container
      - /app/node_modules # Prevent overriding node_modules in container

    # ENVIRONMENT VARIABLES
    # Configure the application runtime environment
    environment:
      - NODE_ENV=development                              # Set Node.js environment
      - DATABASE_URL=postgresql://user:pass@db:5432/myapp # Database connection
      - REDIS_URL=redis://redis:6379                      # Redis connection (using service name)

    # SERVICE DEPENDENCIES
    # Wait for database to be healthy before starting app
    depends_on:
      db:
        condition: service_healthy  # Wait for health check to pass
      redis:
        condition: service_started  # Just wait for service to start

    # NETWORK CONFIGURATION
    # Connect to custom network for service communication
    networks:
      - app-network

  # DATABASE SERVICE
  # PostgreSQL database for persistent data storage
  db:
    # USE OFFICIAL IMAGE
    # Alpine version is smaller and more secure
    image: postgres:15-alpine

    # DATABASE CONFIGURATION
    # Set up initial database and user credentials
    environment:
      POSTGRES_DB: myapp      # Initial database name
      POSTGRES_USER: user     # Database username
      POSTGRES_PASSWORD: pass # Database password (use secrets in production!)

    # PERSISTENT DATA STORAGE
    # Mount volumes for data persistence and initialization
    volumes:
      - postgres_data:/var/lib/postgresql/data                  # Persist database data
      - ./scripts/init.sql:/docker-entrypoint-initdb.d/init.sql # Run init script

    # PORT EXPOSURE
    # Expose database port for external tools (pgAdmin, etc.)
    ports:
      - "5432:5432"

    # HEALTH CHECK CONFIGURATION
    # Ensure database is ready before dependent services start
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d myapp"] # Check if DB accepts connections
      interval: 10s  # Check every 10 seconds
      timeout: 5s    # Timeout after 5 seconds
      retries: 5     # Try 5 times before marking as unhealthy

    # NETWORK CONFIGURATION
    networks:
      - app-network

  # REDIS CACHE SERVICE
  # In-memory data store for caching and session storage
  redis:
    # OFFICIAL REDIS IMAGE
    image: redis:7-alpine

    # PORT EXPOSURE
    ports:
      - "6379:6379"  # Standard Redis port

    # DATA PERSISTENCE
    # Persist Redis data between container restarts
    volumes:
      - redis_data:/data  # Mount named volume for data persistence

    # NETWORK CONFIGURATION
    networks:
      - app-network

  # REVERSE PROXY SERVICE
  # Nginx for serving static files and proxying requests
  nginx:
    # OFFICIAL NGINX IMAGE
    image: nginx:alpine

    # PORT EXPOSURE
    # Standard HTTP and HTTPS ports
    ports:
      - "80:80"    # HTTP traffic
      - "443:443"  # HTTPS traffic (if SSL is configured)

    # CONFIGURATION FILES
    # Mount custom nginx configuration and SSL certificates
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro  # Custom nginx config (read-only)
      - ./ssl:/etc/nginx/ssl:ro                # SSL certificates (read-only)

    # SERVICE DEPENDENCIES
    # Nginx needs the app service to be running to proxy requests
    depends_on:
      - app

    # NETWORK CONFIGURATION
    networks:
      - app-network

# NAMED VOLUMES
# Persistent storage that survives container recreation
volumes:
  postgres_data:  # Database files persist here
  redis_data:     # Redis snapshots and AOF files persist here

# CUSTOM NETWORKS
# Isolated network for secure service communication
networks:
  app-network:
    driver: bridge  # Use bridge network driver for container communication
Key Concepts Explained
- Service Names as Hostnames: Inside containers, you can reach the 'db' service using the hostname 'db'
- Volume Mounts: Code changes on your host immediately reflect inside containers
- Health Checks: Ensure services start in the correct order and are actually ready
- Named Volumes: Data survives when you recreate containers with 'docker-compose down/up'
Common Commands
# Start all services in background
docker-compose up -d
# View logs from all services
docker-compose logs -f
# Stop and remove all containers
docker-compose down
# Rebuild and restart services
docker-compose up --build
# Execute command in running service
docker-compose exec app npm test
3. Kubernetes Orchestration
Understanding Kubernetes Resource Types
Kubernetes uses different resource types to manage applications in production. Each resource serves a specific
purpose in the application lifecycle. Let's understand what each resource does and why you need it.
Kubernetes Resource Hierarchy
- Namespace: isolation boundary
- ConfigMap/Secret: configuration data
- Deployment: application pods
- Service: internal networking
- Ingress: external access
# k8s/namespace.yaml
# NAMESPACE: Creates an isolated environment for your application
# Think of it as a folder that groups related resources together
apiVersion: v1
kind: Namespace
metadata:
  name: myapp-production  # Name of the isolated environment
  labels:
    env: production       # Label to identify environment type
---
# CONFIGMAP: Stores non-sensitive configuration data
# Application settings that can change between environments
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  namespace: myapp-production  # Must match the namespace above
data:
  # Key-value pairs that become environment variables in your app
  NODE_ENV: "production"              # Application environment
  API_URL: "https://api.example.com"  # External API endpoint
  LOG_LEVEL: "info"                   # Logging verbosity level
---
# SECRET: Stores sensitive data (passwords, API keys, certificates)
# Similar to ConfigMap but values are base64 encoded and can be encrypted at rest
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secrets
  namespace: myapp-production
type: Opaque  # Generic type for arbitrary user data
stringData:   # stringData automatically encodes to base64
  DATABASE_URL: "postgresql://user:password@db-host:5432/myapp"
  JWT_SECRET: "your-jwt-secret-here"  # Never hardcode in production!
  API_KEY: "your-api-key-here"
---
# DEPLOYMENT: Manages a set of identical pods running your application
# Ensures desired number of replicas are always running
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deployment
  namespace: myapp-production
  labels:
    app: myapp  # Label to identify this app
spec:
  # REPLICA CONFIGURATION
  replicas: 3  # Run 3 copies of your app for high availability

  # SELECTOR: Tells deployment which pods it manages
  selector:
    matchLabels:
      app: myapp  # Must match the pod template labels below

  # POD TEMPLATE: Defines how each pod should be created
  template:
    metadata:
      labels:
        app: myapp  # Labels applied to each pod
    spec:
      containers:
        - name: myapp
          # CONTAINER IMAGE
          image: ghcr.io/username/myapp:latest  # Your application image

          # NETWORKING
          ports:
            - containerPort: 3000  # Port your app listens on inside container

          # ENVIRONMENT CONFIGURATION
          env:
            - name: PORT  # Direct environment variable
              value: "3000"
          envFrom:        # Load all key-value pairs from ConfigMap/Secret
            - configMapRef:
                name: myapp-config   # Reference to ConfigMap created above
            - secretRef:
                name: myapp-secrets  # Reference to Secret created above

          # RESOURCE LIMITS
          # Prevent one container from consuming all cluster resources
          resources:
            requests:          # Guaranteed resources (reserved)
              memory: "256Mi"  # 256 megabytes of RAM
              cpu: "250m"      # 0.25 CPU cores (250 millicores)
            limits:            # Maximum resources (hard limit)
              memory: "512Mi"  # Cannot use more than 512MB RAM
              cpu: "500m"      # Cannot use more than 0.5 CPU cores

          # HEALTH CHECKS
          # Kubernetes uses these to manage pod lifecycle
          livenessProbe:             # Is the container alive?
            httpGet:
              path: /health          # Your app should implement this endpoint
              port: 3000
            initialDelaySeconds: 30  # Wait 30s after container starts
            periodSeconds: 10        # Check every 10 seconds
          readinessProbe:            # Is the container ready to receive traffic?
            httpGet:
              path: /ready           # Endpoint that checks if app is ready
              port: 3000
            initialDelaySeconds: 5   # Check readiness after 5 seconds
            periodSeconds: 5         # Check every 5 seconds
---
# SERVICE: Provides stable network access to your pods
# Even if pods are recreated, the service IP stays the same
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
  namespace: myapp-production
spec:
  # SELECTOR: Which pods should receive traffic from this service
  selector:
    app: myapp  # Matches pods with label app=myapp

  # PORT CONFIGURATION
  ports:
    - protocol: TCP
      port: 80          # Port exposed by the service
      targetPort: 3000  # Port on the pods (matches containerPort)

  # SERVICE TYPE
  type: ClusterIP  # Only accessible within the cluster
---
# INGRESS: Exposes your service to the internet
# Routes external HTTP/HTTPS traffic to your service
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  namespace: myapp-production
  annotations:
    # INGRESS CONTROLLER CONFIGURATION
    kubernetes.io/ingress.class: "nginx"               # Use nginx ingress controller
    cert-manager.io/cluster-issuer: "letsencrypt-prod" # Automatic SSL certificates
    nginx.ingress.kubernetes.io/limit-rps: "100"       # Rate limiting: 100 requests/second
spec:
  # TLS/SSL CONFIGURATION
  tls:
    - hosts:
        - myapp.example.com   # Your domain name
      secretName: myapp-tls   # Kubernetes secret containing SSL certificate

  # ROUTING RULES
  rules:
    - host: myapp.example.com  # Domain that should route to your app
      http:
        paths:
          - path: /            # All paths under this domain
            pathType: Prefix   # Match all paths that start with "/"
            backend:
              service:
                name: myapp-service  # Route to the service we created above
                port:
                  number: 80         # Port on the service
---
# HORIZONTAL POD AUTOSCALER: Automatically scales pods based on metrics
# Increases/decreases replica count based on CPU and memory usage
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: myapp-production
spec:
  # TARGET DEPLOYMENT
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-deployment  # Scale the deployment we created above

  # SCALING LIMITS
  minReplicas: 3   # Never scale below 3 pods
  maxReplicas: 20  # Never scale above 20 pods

  # SCALING METRICS
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # Scale up if average CPU > 70%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80  # Scale up if average memory > 80%
Why Each Resource Matters
- Namespace: Isolates your app from others, like having a separate apartment in a building
- ConfigMap: Non-sensitive settings that vary between dev/staging/production
- Secret: Protected storage for passwords, API keys, and certificates
- Deployment: Ensures your app keeps running even if servers fail
- Service: Stable network address for your app, like a phone number that doesn't change
- Ingress: The front door that lets internet traffic reach your app
- HPA: Automatically adds more pods during traffic spikes, removes them when quiet
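The manifests above are applied and inspected with kubectl; a typical workflow might look like this (file and resource names follow the examples above):

```shell
# Apply every manifest in the k8s/ directory
kubectl apply -f k8s/

# Verify the rollout and the resources it created
kubectl -n myapp-production rollout status deployment/myapp-deployment
kubectl -n myapp-production get pods,svc,ingress,hpa

# Watch the autoscaler react as load changes
kubectl -n myapp-production get hpa myapp-hpa --watch
```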
4. Infrastructure as Code
Terraform Configuration
Manage cloud infrastructure declaratively with Terraform:
# terraform/main.tf
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket = "myapp-terraform-state"
    key    = "infrastructure/terraform.tfstate"
    region = "us-west-2"
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      Project     = var.project_name
      ManagedBy   = "terraform"
    }
  }
}

# VPC and networking
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.project_name}-vpc"
  }
}

resource "aws_subnet" "public" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project_name}-public-subnet-${count.index + 1}"
    Type = "public"
  }
}

resource "aws_subnet" "private" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.project_name}-private-subnet-${count.index + 1}"
    Type = "private"
  }
}

# EKS Cluster
resource "aws_eks_cluster" "main" {
  name     = "${var.project_name}-cluster"
  role_arn = aws_iam_role.eks_cluster_role.arn
  version  = var.kubernetes_version

  vpc_config {
    subnet_ids              = concat(aws_subnet.public[*].id, aws_subnet.private[*].id)
    endpoint_private_access = true
    endpoint_public_access  = var.enable_public_access
    public_access_cidrs     = var.public_access_cidrs
  }

  encryption_config {
    provider {
      key_arn = aws_kms_key.eks.arn
    }
    resources = ["secrets"]
  }

  depends_on = [
    aws_iam_role_policy_attachment.eks_cluster_policy,
    aws_iam_role_policy_attachment.eks_vpc_resource_controller,
  ]
}

# EKS Node Group
resource "aws_eks_node_group" "main" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "${var.project_name}-nodes"
  node_role_arn   = aws_iam_role.eks_node_role.arn
  subnet_ids      = aws_subnet.private[*].id
  instance_types  = var.node_instance_types

  scaling_config {
    desired_size = var.node_desired_size
    max_size     = var.node_max_size
    min_size     = var.node_min_size
  }

  update_config {
    max_unavailable = 1
  }

  # Ensure that IAM Role permissions are created before and deleted after EKS Node Group handling.
  depends_on = [
    aws_iam_role_policy_attachment.eks_worker_node_policy,
    aws_iam_role_policy_attachment.eks_cni_policy,
    aws_iam_role_policy_attachment.eks_container_registry_policy,
  ]
}
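A configuration like the one above is driven through Terraform's plan/apply cycle. A minimal workflow sketch; the `production.tfvars` file is an assumed variables file for the `var.*` references above:

```shell
# Download providers and configure the S3 state backend
terraform init

# Check formatting and validate the configuration
terraform fmt -check
terraform validate

# Preview changes against real infrastructure, then apply the saved plan
terraform plan -var-file=production.tfvars -out=tfplan
terraform apply tfplan

# Inspect the resulting state
terraform show
```

Saving the plan with `-out` and applying that exact file guarantees you apply precisely what was reviewed.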
Ansible Configuration Management
Automate server configuration with Ansible playbooks:
# ansible/site.yml
---
- name: Configure web servers
  hosts: web_servers
  become: yes

  vars:
    app_name: myapp
    app_user: "{{ app_name }}"
    app_dir: "/opt/{{ app_name }}"

  tasks:
    - name: Update system packages
      apt:
        update_cache: yes
        upgrade: yes

    - name: Install required packages
      apt:
        name:
          - docker.io
          - docker-compose
          - nginx
          - fail2ban
          - ufw
        state: present

    - name: Create application user
      user:
        name: "{{ app_user }}"
        system: yes
        shell: /bin/bash
        home: "{{ app_dir }}"
        create_home: yes

    - name: Configure firewall
      ufw:
        rule: allow
        port: "{{ item }}"
      loop:
        - "22"   # SSH
        - "80"   # HTTP
        - "443"  # HTTPS

    - name: Enable firewall
      ufw:
        state: enabled
        policy: deny
        direction: incoming

    - name: Configure fail2ban
      template:
        src: jail.local.j2
        dest: /etc/fail2ban/jail.local
        backup: yes
      notify:
        - restart fail2ban

    - name: Setup SSL certificates
      include_tasks: ssl_setup.yml
      when: ssl_enabled | default(false)

    - name: Deploy application
      include_tasks: deploy_app.yml

  handlers:
    - name: restart fail2ban
      systemd:
        name: fail2ban
        state: restarted
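Running the playbook above against an inventory might look like this; the inventory path and host name are assumptions, and ssl_enabled is the variable gating the SSL tasks in the playbook:

```shell
# Dry run first: show what would change without changing anything
ansible-playbook -i inventory/production site.yml --check --diff

# Apply for real, enabling the optional SSL setup tasks
ansible-playbook -i inventory/production site.yml -e ssl_enabled=true

# Limit a run to a single host when debugging
ansible-playbook -i inventory/production site.yml --limit web01
```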
5. Monitoring and Observability
The Three Pillars of Observability
Implement comprehensive monitoring with metrics, logs, and traces:
Metrics
- Response time and throughput
- Error rates and status codes
- Resource utilization (CPU, memory)
- Business metrics (conversions, users)
Logs
- Structured logging (JSON format)
- Centralized log aggregation
- Log correlation with traces
- Security and audit logs
Traces
- Distributed tracing
- Request flow visualization
- Performance bottleneck identification
- Service dependency mapping
Prometheus and Grafana Setup
Monitor your applications with the industry-standard Prometheus and Grafana stack:
# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'myapp'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['app:3000']

  - job_name: 'nginx'
    static_configs:
      - targets: ['nginx-exporter:9113']
// Application metrics endpoint (Node.js example)
const express = require('express');
const promClient = require('prom-client');

const app = express();

// Create a Registry to register the metrics
const register = new promClient.Registry();

// Add a default label which is added to all metrics
register.setDefaultLabels({
  app: 'myapp'
});

// Enable the collection of default metrics
promClient.collectDefaultMetrics({ register });

// Create custom metrics
const httpRequestsTotal = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
});

const httpRequestDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.1, 0.5, 1, 2, 5],
});

register.registerMetric(httpRequestsTotal);
register.registerMetric(httpRequestDuration);

// Middleware to collect metrics
app.use((req, res, next) => {
  const start = Date.now();

  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    const labels = {
      method: req.method,
      route: req.route?.path || req.path,
      status_code: res.statusCode,
    };

    httpRequestsTotal.inc(labels);
    httpRequestDuration.observe(labels, duration);
  });

  next();
});

// Metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  const metrics = await register.metrics();
  res.end(metrics);
});
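Once the app is listening, the exposition format can be checked by hand; Prometheus scrapes this same endpoint on the interval configured above:

```shell
# Fetch the first lines of the raw Prometheus exposition text
curl -s http://localhost:3000/metrics | head -n 20

# Filter for the custom counter defined above
curl -s http://localhost:3000/metrics | grep http_requests_total
```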