DevOps and Deployment Guide: Modern Infrastructure Management

DevOps practices bridge development and operations, enabling faster, more reliable software delivery. This guide covers essential DevOps concepts from CI/CD pipelines to container orchestration, helping you build resilient, scalable infrastructure for modern applications.

1. CI/CD Pipeline Design

Pipeline Fundamentals

Design efficient CI/CD pipelines that automate testing, building, and deployment processes:

CI Best Practices

  • Commit early and often
  • Run tests automatically on every commit
  • Fail fast with quick feedback
  • Keep builds under 10 minutes
  • Use branch protection rules
  • Implement quality gates

CD Best Practices

  • Automate deployment to staging
  • Use blue-green deployments
  • Implement rollback strategies
  • Monitor deployment metrics
  • Use feature flags
  • Practice canary releases

Understanding CI/CD Pipeline Structure

A CI/CD pipeline automates the software delivery process from code commit to production deployment. Let's break down each component and understand how they work together to create a robust automated workflow.

Pipeline Stages Explained

1. Trigger Events

Define when the pipeline should run (push to main, pull requests, scheduled runs)

2. Build & Test

Install dependencies, run linters, execute unit/integration tests, generate coverage reports

3. Security Checks

Scan for vulnerabilities, check dependencies, validate security configurations

4. Build Artifacts

Create deployable artifacts (Docker images, compiled binaries, packages)

5. Deploy

Deploy to staging/production environments with rollback capabilities

GitHub Actions Workflow Implementation

Here's a comprehensive GitHub Actions workflow with detailed explanations for each section. This example demonstrates a Node.js application pipeline with multiple environments and security checks.

# .github/workflows/ci-cd.yml
# This workflow demonstrates a complete CI/CD pipeline for a Node.js application

name: CI/CD Pipeline

# TRIGGER CONFIGURATION
# Define when this workflow should run
on:
  push:
    branches: [ main, develop ]  # Run on pushes to main and develop branches
  pull_request:
    branches: [ main ]          # Run on pull requests targeting main

# ENVIRONMENT VARIABLES
# These are available to all jobs in the workflow
env:
  NODE_VERSION: '18'           # Node.js version to use
  REGISTRY: ghcr.io           # GitHub Container Registry
  IMAGE_NAME: ${{ github.repository }}  # Use repo name as image name

jobs:
  # JOB 1: TESTING AND QUALITY CHECKS
  # This job runs our tests and quality checks in parallel for faster feedback
  test:
    runs-on: ubuntu-latest     # Use latest Ubuntu runner
    steps:
      - uses: actions/checkout@v4  # Download source code
      
      # SETUP NODE.JS ENVIRONMENT
      # Cache npm dependencies for faster subsequent runs
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'          # Automatically cache npm dependencies
      
      # INSTALL DEPENDENCIES
      # Use 'npm ci' for faster, reproducible builds in CI
      - name: Install dependencies
        run: npm ci
      
      # CODE QUALITY CHECKS
      # Run linting to catch code style and potential issues
      - name: Run linting
        run: npm run lint
      
      # AUTOMATED TESTING
      # Run different types of tests to ensure code quality
      - name: Run unit tests
        run: npm run test:unit
        
      - name: Run integration tests
        run: npm run test:integration
        
      # COVERAGE REPORTING
      # Generate and upload test coverage reports
      - name: Generate coverage report
        run: npm run test:coverage
        
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3

  # JOB 2: SECURITY SCANNING
  # Separate job for security checks - can run in parallel with tests
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      # DEPENDENCY VULNERABILITY SCANNING
      # Check for known vulnerabilities in dependencies
      - name: Run security audit
        run: npm audit --audit-level moderate
        
      # THIRD-PARTY SECURITY SCANNING
      # Use Snyk for comprehensive vulnerability detection
      - name: Run dependency scan
        uses: snyk/actions/node@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}  # Store sensitive tokens in secrets

  # JOB 3: BUILD AND CONTAINERIZATION
  # Only runs if tests and security checks pass
  build:
    needs: [test, security-scan]  # Wait for previous jobs to complete successfully
    runs-on: ubuntu-latest
    outputs:
      # Make build outputs available to deployment jobs
      image-tag: ${{ steps.meta.outputs.tags }}
      image-digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4
      
      # DOCKER SETUP
      # Configure Docker Buildx for advanced build features
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      
      # CONTAINER REGISTRY AUTHENTICATION
      # Login to GitHub Container Registry using built-in token
      - name: Login to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}      # GitHub username
          password: ${{ secrets.GITHUB_TOKEN }}  # Automatically provided token
      
      # DOCKER IMAGE METADATA
      # Generate tags and labels for the Docker image
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch           # Tag with branch name
            type=ref,event=pr               # Tag with PR number
            type=sha,prefix={{branch}}-     # Tag with commit SHA
            type=raw,value=latest,enable={{is_default_branch}}  # 'latest' for main branch
      
      # BUILD AND PUSH CONTAINER IMAGE
      # Multi-platform build with caching for efficiency
      - name: Build and push
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: linux/amd64,linux/arm64  # Support multiple architectures
          push: true                          # Push to registry
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha                # Use GitHub Actions cache
          cache-to: type=gha,mode=max         # Save cache for next build

  # JOB 4: STAGING DEPLOYMENT
  # Deploy to staging environment for testing
  deploy-staging:
    if: github.ref == 'refs/heads/develop'  # Only deploy develop branch to staging
    needs: build
    runs-on: ubuntu-latest
    environment: staging                    # Use GitHub environment for approvals/secrets
    steps:
      - name: Deploy to staging
        run: |
          echo "Deploying ${{ needs.build.outputs.image-tag }} to staging"
          # Here you would add your actual deployment script
          # Examples: kubectl apply, helm upgrade, docker-compose up, etc.

  # JOB 5: PRODUCTION DEPLOYMENT
  # Deploy to production environment (main branch only)
  deploy-production:
    if: github.ref == 'refs/heads/main'     # Only deploy main branch to production
    needs: build
    runs-on: ubuntu-latest
    environment: production                 # Production environment with protection rules
    steps:
      - name: Deploy to production
        run: |
          echo "Deploying ${{ needs.build.outputs.image-tag }} to production"
          # Add your production deployment script here
          # Consider blue-green or canary deployment strategies

Pipeline Benefits

  • Fast Feedback: Parallel jobs provide quick results
  • Quality Gates: Deployment only happens after tests pass
  • Security First: Automated vulnerability scanning
  • Multi-Environment: Separate staging and production deployments
  • Efficient Builds: Caching reduces build times
  • Rollback Ready: Tagged images enable easy rollbacks
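
One setting worth considering for a workflow like this is GitHub Actions' top-level concurrency key, which cancels superseded runs so two deployments of the same branch never race each other:

```yaml
# Top-level key in .github/workflows/ci-cd.yml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}  # One group per branch/PR
  cancel-in-progress: true                         # Cancel the older run
```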

2. Containerization with Docker

Dockerfile Best Practices

Create efficient, secure Docker images with these optimization techniques:

# Multi-stage Dockerfile for Node.js application
FROM node:18-alpine AS base
WORKDIR /app
COPY package*.json ./
# Install all dependencies (including dev dependencies)
RUN npm ci

# Development stage
FROM base AS development
ENV NODE_ENV=development
COPY . .
EXPOSE 3000
CMD ["npm", "run", "dev"]

# Build stage
FROM base AS build
COPY . .
# Run tests and build
RUN npm run test && npm run build

# Production stage
FROM node:18-alpine AS production
WORKDIR /app

# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nextjs -u 1001

# Install only production dependencies
# (--omit=dev replaces the deprecated --only=production flag)
COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force

# Copy built application
COPY --from=build --chown=nextjs:nodejs /app/dist ./dist
COPY --from=build --chown=nextjs:nodejs /app/public ./public

# Install dumb-init for proper signal handling as PID 1
RUN apk add --no-cache dumb-init

# Use non-root user
USER nextjs

# Add health check (Alpine ships busybox wget; curl is not installed by default)
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD wget -qO- http://localhost:3000/health || exit 1

EXPOSE 3000
ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "dist/server.js"]

# .dockerignore
node_modules
npm-debug.log
Dockerfile
.dockerignore
.git
.gitignore
README.md
.env
.nyc_output
coverage
.cache

Understanding Docker Compose for Development

Docker Compose solves the challenge of managing multiple interconnected services in development. Instead of manually starting databases, cache servers, and your application separately, Docker Compose orchestrates everything with a single command.

Why Docker Compose?

Without Compose

  • Install PostgreSQL locally
  • Configure Redis server
  • Set up Nginx manually
  • Manage different versions
  • Complex environment setup

With Compose

  • Single docker-compose up
  • Isolated environments
  • Version-controlled setup
  • Team consistency
  • Easy cleanup and reset

# docker-compose.yml
# This file defines a complete development environment with multiple services

# COMPOSE FILE VERSION
# The top-level version key is ignored by Compose v2 but kept for older tooling
version: '3.8'

# SERVICES DEFINITION
# Each service represents a different component of your application stack
services:

  # APPLICATION SERVICE
  # This is your main application container
  app:
    # BUILD CONFIGURATION
    # Build from local Dockerfile with specific target stage
    build:
      context: .              # Use current directory as build context
      target: development     # Use 'development' stage from multi-stage Dockerfile
    
    # NETWORK PORTS
    # Map host port 3000 to container port 3000
    ports:
      - "3000:3000"          # Format: "host_port:container_port"
    
    # VOLUME MOUNTS
    # Enable live code reloading during development
    volumes:
      - .:/app               # Mount current directory to /app in container
      - /app/node_modules    # Prevent overriding node_modules in container
    
    # ENVIRONMENT VARIABLES
    # Configure the application runtime environment
    environment:
      - NODE_ENV=development           # Set Node.js environment
      - DATABASE_URL=postgresql://user:pass@db:5432/myapp  # Database connection
      - REDIS_URL=redis://redis:6379   # Redis connection (using service name)
    
    # SERVICE DEPENDENCIES
    # Wait for database to be healthy before starting app
    depends_on:
      db:
        condition: service_healthy    # Wait for health check to pass
      redis:
        condition: service_started    # Just wait for service to start
    
    # NETWORK CONFIGURATION
    # Connect to custom network for service communication
    networks:
      - app-network

  # DATABASE SERVICE
  # PostgreSQL database for persistent data storage
  db:
    # USE OFFICIAL IMAGE
    # Alpine version is smaller and more secure
    image: postgres:15-alpine
    
    # DATABASE CONFIGURATION
    # Set up initial database and user credentials
    environment:
      POSTGRES_DB: myapp           # Initial database name
      POSTGRES_USER: user          # Database username
      POSTGRES_PASSWORD: pass     # Database password (use secrets in production!)
    
    # PERSISTENT DATA STORAGE
    # Mount volumes for data persistence and initialization
    volumes:
      - postgres_data:/var/lib/postgresql/data    # Persist database data
      - ./scripts/init.sql:/docker-entrypoint-initdb.d/init.sql  # Run init script
    
    # PORT EXPOSURE
    # Expose database port for external tools (pgAdmin, etc.)
    ports:
      - "5432:5432"
    
    # HEALTH CHECK CONFIGURATION
    # Ensure database is ready before dependent services start
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d myapp"]  # Check if DB accepts connections
      interval: 10s        # Check every 10 seconds
      timeout: 5s          # Timeout after 5 seconds
      retries: 5           # Try 5 times before marking as unhealthy
    
    # NETWORK CONFIGURATION
    networks:
      - app-network

  # REDIS CACHE SERVICE
  # In-memory data store for caching and session storage
  redis:
    # OFFICIAL REDIS IMAGE
    image: redis:7-alpine
    
    # PORT EXPOSURE
    ports:
      - "6379:6379"         # Standard Redis port
    
    # DATA PERSISTENCE
    # Persist Redis data between container restarts
    volumes:
      - redis_data:/data     # Mount named volume for data persistence
    
    # NETWORK CONFIGURATION
    networks:
      - app-network

  # REVERSE PROXY SERVICE
  # Nginx for serving static files and proxying requests
  nginx:
    # OFFICIAL NGINX IMAGE
    image: nginx:alpine
    
    # PORT EXPOSURE
    # Standard HTTP and HTTPS ports
    ports:
      - "80:80"             # HTTP traffic
      - "443:443"           # HTTPS traffic (if SSL is configured)
    
    # CONFIGURATION FILES
    # Mount custom nginx configuration and SSL certificates
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro        # Custom nginx config (read-only)
      - ./ssl:/etc/nginx/ssl:ro                       # SSL certificates (read-only)
    
    # SERVICE DEPENDENCIES
    # Nginx needs the app service to be running to proxy requests
    depends_on:
      - app
    
    # NETWORK CONFIGURATION
    networks:
      - app-network

# NAMED VOLUMES
# Persistent storage that survives container recreation
volumes:
  postgres_data:          # Database files persist here
  redis_data:            # Redis snapshots and AOF files persist here

# CUSTOM NETWORKS
# Isolated network for secure service communication
networks:
  app-network:
    driver: bridge        # Use bridge network driver for container communication

Key Concepts Explained

Service Names as Hostnames: Inside containers, you can reach 'db' service using hostname 'db'
Volume Mounts: Code changes on your host immediately reflect inside containers
Health Checks: Ensures services start in correct order and are actually ready
Named Volumes: Data survives when you recreate containers with 'docker-compose down/up'

Common Commands

# Start all services in background
docker-compose up -d

# View logs from all services
docker-compose logs -f

# Stop and remove all containers
docker-compose down

# Rebuild and restart services
docker-compose up --build

# Execute command in running service
docker-compose exec app npm test
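
Compose also merges a docker-compose.override.yml file automatically if one exists, which makes it a convenient place for personal development tweaks that should not be committed to the shared file. A hypothetical override (the port and variable are illustrative):

```yaml
# docker-compose.override.yml (merged automatically by docker-compose up)
services:
  app:
    ports:
      - "9229:9229"        # Expose the Node.js inspector for debugging
    environment:
      - LOG_LEVEL=debug    # More verbose logging locally
```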

3. Kubernetes Orchestration

Understanding Kubernetes Resource Types

Kubernetes uses different resource types to manage applications in production. Each resource serves a specific purpose in the application lifecycle. Let's understand what each resource does and why you need it.

Kubernetes Resource Hierarchy

  • Namespace: isolation boundary
  • ConfigMap/Secret: configuration data
  • Deployment: application pods
  • Service: internal networking
  • Ingress: external access

# k8s/namespace.yaml
# NAMESPACE: Creates an isolated environment for your application
# Think of it as a folder that groups related resources together
apiVersion: v1
kind: Namespace
metadata:
  name: myapp-production     # Name of the isolated environment
  labels:
    env: production          # Label to identify environment type

---
# CONFIGMAP: Stores non-sensitive configuration data
# Application settings that can change between environments
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  namespace: myapp-production  # Must match the namespace above
data:
  # Key-value pairs that become environment variables in your app
  NODE_ENV: "production"       # Application environment
  API_URL: "https://api.example.com"  # External API endpoint
  LOG_LEVEL: "info"           # Logging verbosity level

---
# SECRET: Stores sensitive data (passwords, API keys, certificates)
# Like ConfigMap, but values are base64 encoded; note that base64 is not
# encryption, so enable encryption at rest and RBAC for real protection
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secrets
  namespace: myapp-production
type: Opaque                  # Generic type for arbitrary user data
stringData:                   # stringData automatically encodes to base64
  DATABASE_URL: "postgresql://user:password@db-host:5432/myapp"
  JWT_SECRET: "your-jwt-secret-here"      # Never hardcode in production!
  API_KEY: "your-api-key-here"

---
# DEPLOYMENT: Manages a set of identical pods running your application
# Ensures desired number of replicas are always running
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deployment
  namespace: myapp-production
  labels:
    app: myapp               # Label to identify this app
spec:
  # REPLICA CONFIGURATION
  replicas: 3                # Run 3 copies of your app for high availability
  
  # SELECTOR: Tells deployment which pods it manages
  selector:
    matchLabels:
      app: myapp             # Must match the pod template labels below
  
  # POD TEMPLATE: Defines how each pod should be created
  template:
    metadata:
      labels:
        app: myapp           # Labels applied to each pod
    spec:
      containers:
      - name: myapp
        # CONTAINER IMAGE
        image: ghcr.io/username/myapp:latest  # Your application image
        
        # NETWORKING
        ports:
        - containerPort: 3000  # Port your app listens on inside container
        
        # ENVIRONMENT CONFIGURATION
        env:
        - name: PORT           # Direct environment variable
          value: "3000"
        envFrom:               # Load all key-value pairs from ConfigMap/Secret
        - configMapRef:
            name: myapp-config   # Reference to ConfigMap created above
        - secretRef:
            name: myapp-secrets  # Reference to Secret created above
        
        # RESOURCE LIMITS
        # Prevent one container from consuming all cluster resources
        resources:
          requests:            # Guaranteed resources (reserved)
            memory: "256Mi"    # 256 megabytes of RAM
            cpu: "250m"        # 0.25 CPU cores (250 millicores)
          limits:              # Maximum resources (hard limit)
            memory: "512Mi"    # Cannot use more than 512MB RAM
            cpu: "500m"        # Cannot use more than 0.5 CPU cores
        
        # HEALTH CHECKS
        # Kubernetes uses these to manage pod lifecycle
        livenessProbe:         # Is the container alive?
          httpGet:
            path: /health      # Your app should implement this endpoint
            port: 3000
          initialDelaySeconds: 30  # Wait 30s after container starts
          periodSeconds: 10        # Check every 10 seconds
        
        readinessProbe:        # Is the container ready to receive traffic?
          httpGet:
            path: /ready       # Endpoint that checks if app is ready
            port: 3000
          initialDelaySeconds: 5   # Check readiness after 5 seconds
          periodSeconds: 5         # Check every 5 seconds

---
# SERVICE: Provides stable network access to your pods
# Even if pods are recreated, the service IP stays the same
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
  namespace: myapp-production
spec:
  # SELECTOR: Which pods should receive traffic from this service
  selector:
    app: myapp               # Matches pods with label app=myapp
  
  # PORT CONFIGURATION
  ports:
  - protocol: TCP
    port: 80                 # Port exposed by the service
    targetPort: 3000         # Port on the pods (matches containerPort)
  
  # SERVICE TYPE
  type: ClusterIP            # Only accessible within the cluster

---
# INGRESS: Exposes your service to the internet
# Routes external HTTP/HTTPS traffic to your service
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  namespace: myapp-production
  annotations:
    # INGRESS CONTROLLER CONFIGURATION
    cert-manager.io/cluster-issuer: "letsencrypt-prod"  # Automatic SSL certificates
    nginx.ingress.kubernetes.io/limit-rps: "100"        # Rate limiting: 100 req/sec
spec:
  ingressClassName: nginx    # Replaces the deprecated kubernetes.io/ingress.class annotation
  # TLS/SSL CONFIGURATION
  tls:
  - hosts:
    - myapp.example.com      # Your domain name
    secretName: myapp-tls    # Kubernetes secret containing SSL certificate
  
  # ROUTING RULES
  rules:
  - host: myapp.example.com  # Domain that should route to your app
    http:
      paths:
      - path: /              # All paths under this domain
        pathType: Prefix     # Match all paths that start with "/"
        backend:
          service:
            name: myapp-service  # Route to the service we created above
            port:
              number: 80         # Port on the service

---
# HORIZONTAL POD AUTOSCALER: Automatically scales pods based on metrics
# Increases/decreases replica count based on CPU and memory usage
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: myapp-production
spec:
  # TARGET DEPLOYMENT
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-deployment   # Scale the deployment we created above
  
  # SCALING LIMITS
  minReplicas: 3             # Never scale below 3 pods
  maxReplicas: 20            # Never scale above 20 pods
  
  # SCALING METRICS
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # Scale up if average CPU > 70%
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80  # Scale up if average memory > 80%
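
The autoscaler above follows the rule documented for the HPA: desiredReplicas = ceil(currentReplicas x currentMetric / target), clamped to the min/max bounds. A small sketch of that arithmetic (the function is illustrative; the real controller also applies stabilization windows and tolerances):

```javascript
// Simplified HPA scaling rule:
// desired = ceil(currentReplicas * currentUtilization / targetUtilization),
// clamped to [minReplicas, maxReplicas].
function desiredReplicas(current, currentUtil, targetUtil, min, max) {
  const desired = Math.ceil(current * (currentUtil / targetUtil));
  return Math.min(max, Math.max(min, desired));
}

// 3 pods averaging 140% of their CPU request, target 70% -> scale to 6
console.log(desiredReplicas(3, 140, 70, 3, 20)); // 6
```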

Why Each Resource Matters

Namespace: Isolates your app from others, like having a separate apartment in a building
ConfigMap: Non-sensitive settings that vary between dev/staging/production
Secret: Encrypted storage for passwords, API keys, and certificates
Deployment: Ensures your app keeps running even if servers fail
Service: Stable network address for your app, like a phone number that doesn't change
Ingress: The front door that lets internet traffic reach your app
HPA: Automatically adds more servers during traffic spikes, removes them when quiet

4. Infrastructure as Code

Terraform Configuration

Manage cloud infrastructure declaratively with Terraform:

# terraform/main.tf
terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  
  backend "s3" {
    bucket = "myapp-terraform-state"
    key    = "infrastructure/terraform.tfstate"
    region = "us-west-2"
  }
}

provider "aws" {
  region = var.aws_region
  
  default_tags {
    tags = {
      Environment = var.environment
      Project     = var.project_name
      ManagedBy   = "terraform"
    }
  }
}

# VPC and networking
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true
  
  tags = {
    Name = "${var.project_name}-vpc"
  }
}

resource "aws_subnet" "public" {
  count = length(var.availability_zones)
  
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true
  
  tags = {
    Name = "${var.project_name}-public-subnet-${count.index + 1}"
    Type = "public"
  }
}

resource "aws_subnet" "private" {
  count = length(var.availability_zones)
  
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
  availability_zone = var.availability_zones[count.index]
  
  tags = {
    Name = "${var.project_name}-private-subnet-${count.index + 1}"
    Type = "private"
  }
}

# EKS Cluster
resource "aws_eks_cluster" "main" {
  name     = "${var.project_name}-cluster"
  role_arn = aws_iam_role.eks_cluster_role.arn
  version  = var.kubernetes_version

  vpc_config {
    subnet_ids              = concat(aws_subnet.public[*].id, aws_subnet.private[*].id)
    endpoint_private_access = true
    endpoint_public_access  = var.enable_public_access
    public_access_cidrs     = var.public_access_cidrs
  }

  encryption_config {
    provider {
      key_arn = aws_kms_key.eks.arn
    }
    resources = ["secrets"]
  }

  depends_on = [
    aws_iam_role_policy_attachment.eks_cluster_policy,
    aws_iam_role_policy_attachment.eks_vpc_resource_controller,
  ]
}

# EKS Node Group
resource "aws_eks_node_group" "main" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "${var.project_name}-nodes"
  node_role_arn   = aws_iam_role.eks_node_role.arn
  subnet_ids      = aws_subnet.private[*].id
  instance_types  = var.node_instance_types

  scaling_config {
    desired_size = var.node_desired_size
    max_size     = var.node_max_size
    min_size     = var.node_min_size
  }

  update_config {
    max_unavailable = 1
  }

  # Ensure that IAM Role permissions are created before and deleted after EKS Node Group handling.
  depends_on = [
    aws_iam_role_policy_attachment.eks_worker_node_policy,
    aws_iam_role_policy_attachment.eks_cni_policy,
    aws_iam_role_policy_attachment.eks_container_registry_policy,
  ]
}
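
The subnet resources above derive their CIDR blocks with Terraform's cidrsubnet() function. As a rough JavaScript sketch of the IPv4 arithmetic it performs (illustrative only; Terraform's implementation also handles IPv6 and validates bounds):

```javascript
// Approximate Terraform's cidrsubnet(prefix, newbits, netnum) for IPv4:
// extend the prefix length by newbits and select the netnum-th subnet.
function cidrsubnet(prefix, newbits, netnum) {
  const [base, lenStr] = prefix.split('/');
  const len = parseInt(lenStr, 10);
  // Pack the dotted quad into a 32-bit unsigned integer
  const ip = base.split('.')
    .reduce((acc, o) => ((acc << 8) >>> 0) + parseInt(o, 10), 0);
  // Offset by netnum subnets of the new, smaller size
  const shifted = (ip + (netnum << (32 - len - newbits))) >>> 0;
  const octets = [24, 16, 8, 0].map(s => (shifted >>> s) & 0xff);
  return `${octets.join('.')}/${len + newbits}`;
}

// With vpc_cidr = "10.0.0.0/16", matching the calls in the configuration:
console.log(cidrsubnet('10.0.0.0/16', 8, 0));  // 10.0.0.0/24  (public subnet 1)
console.log(cidrsubnet('10.0.0.0/16', 8, 10)); // 10.0.10.0/24 (private subnet 1)
```

This is why the private subnets use `count.index + 10`: it keeps their /24 ranges clear of the public ones.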

Ansible Configuration Management

Automate server configuration with Ansible playbooks:

# ansible/site.yml
---
- name: Configure web servers
  hosts: web_servers
  become: yes
  vars:
    app_name: myapp
    app_user: "{{ app_name }}"
    app_dir: "/opt/{{ app_name }}"
    
  tasks:
    - name: Update system packages
      apt:
        update_cache: yes
        upgrade: yes
      
    - name: Install required packages
      apt:
        name:
          - docker.io
          - docker-compose
          - nginx
          - fail2ban
          - ufw
        state: present
    
    - name: Create application user
      user:
        name: "{{ app_user }}"
        system: yes
        shell: /bin/bash
        home: "{{ app_dir }}"
        create_home: yes
    
    - name: Configure firewall
      ufw:
        rule: allow
        port: "{{ item }}"
      loop:
        - "22"   # SSH
        - "80"   # HTTP
        - "443"  # HTTPS
    
    - name: Enable firewall
      ufw:
        state: enabled
        policy: deny
        direction: incoming
    
    - name: Configure fail2ban
      template:
        src: jail.local.j2
        dest: /etc/fail2ban/jail.local
        backup: yes
      notify:
        - restart fail2ban
    
    - name: Setup SSL certificates
      include_tasks: ssl_setup.yml
      when: ssl_enabled | default(false)
    
    - name: Deploy application
      include_tasks: deploy_app.yml
      
  handlers:
    - name: restart fail2ban
      systemd:
        name: fail2ban
        state: restarted

5. Monitoring and Observability

The Three Pillars of Observability

Implement comprehensive monitoring with metrics, logs, and traces:

Metrics

  • Response time and throughput
  • Error rates and status codes
  • Resource utilization (CPU, memory)
  • Business metrics (conversions, users)

Logs

  • Structured logging (JSON format)
  • Centralized log aggregation
  • Log correlation with traces
  • Security and audit logs

Traces

  • Distributed tracing
  • Request flow visualization
  • Performance bottleneck identification
  • Service dependency mapping
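
A common way to satisfy structured logging and log-trace correlation is to emit one JSON object per line that carries a trace identifier. A minimal sketch (field names follow common conventions, not any specific logging library's API):

```javascript
// Minimal structured logger: one JSON object per line, with a trace_id
// so log lines can be joined to distributed traces in your backend.
function logEvent(level, message, fields = {}) {
  const entry = {
    timestamp: new Date().toISOString(),
    level,
    message,
    ...fields,
  };
  const line = JSON.stringify(entry);
  console.log(line);
  return line;
}

logEvent('info', 'order created', {
  trace_id: 'abc123',   // propagated from the incoming request headers
  user_id: 42,
  duration_ms: 87,
});
```

Because every line is valid JSON, aggregators like Loki or Elasticsearch can index the fields without fragile regex parsing.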

Prometheus and Grafana Setup

Monitor your applications with the industry-standard Prometheus and Grafana stack:

# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'myapp'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['app:3000']

  - job_name: 'nginx'
    static_configs:
      - targets: ['nginx-exporter:9113']

# Application metrics endpoint (Node.js example)
const express = require('express');
const promClient = require('prom-client');

const app = express();

// Create a Registry to register the metrics
const register = new promClient.Registry();

// Add a default label which is added to all metrics
register.setDefaultLabels({
  app: 'myapp'
});

// Enable the collection of default metrics
promClient.collectDefaultMetrics({ register });

// Create custom metrics
const httpRequestsTotal = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
});

const httpRequestDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.1, 0.5, 1, 2, 5],
});

register.registerMetric(httpRequestsTotal);
register.registerMetric(httpRequestDuration);

// Middleware to collect metrics
app.use((req, res, next) => {
  const start = Date.now();
  
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    const labels = {
      method: req.method,
      route: req.route?.path || req.path,
      status_code: res.statusCode,
    };
    
    httpRequestsTotal.inc(labels);
    httpRequestDuration.observe(labels, duration);
  });
  
  next();
});

// Metrics endpoint for Prometheus to scrape
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

app.listen(3000);
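
Once Prometheus scrapes these metrics, Grafana panels typically query them with PromQL such as the following (metric names match the counters and histogram defined above):

```promql
# Requests per second, by route, over the last 5 minutes
sum(rate(http_requests_total[5m])) by (route)

# 95th percentile request latency
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

# Error rate: share of 5xx responses
sum(rate(http_requests_total{status_code=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m]))
```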
