Infrastructure in a Box: Dev, Staging, and Production Deployment Pipeline

October 27, 2024
10 min read

Modern infrastructure requires consistent, repeatable deployments across development, staging, and production environments. Learn how to build a complete "Infrastructure in a Box" solution with automated promotion pipelines.

The Infrastructure Pipeline

Environment Progression

Developer Laptop (Local Dev)
          ↓
Development Environment (Shared Dev)
          ↓
Staging Environment (Pre-Production)
          ↓
Production Environment (Live)

Architecture Overview

Complete Stack

Infrastructure Layers:
  Compute:
    - Kubernetes clusters (dev/staging/prod)
    - VM infrastructure (VMware/KVM)
    - Bare metal servers
  Networking:
    - VLANs per environment
    - Load balancers
    - Firewalls
    - DNS
  Storage:
    - Persistent volumes
    - Object storage (S3/MinIO)
    - Databases
  Observability:
    - Prometheus + Grafana
    - ELK Stack
    - Distributed tracing
  Security:
    - Vault for secrets
    - Network policies
    - RBAC
    - Certificate management

Environment Specifications

Development Environment

Purpose: Rapid iteration and testing
Scale: Minimal resources

Characteristics:
  - Shared by development team
  - Frequent deployments (10+ per day)
  - Short-lived feature branches
  - Relaxed security policies
  - Mock external services

Infrastructure:
  Kubernetes:
    Nodes: 3 (small VMs)
    CPU: 4 cores per node
    Memory: 16GB per node
    Storage: 100GB per node
  Databases:
    Type: Containerized
    Persistence: Optional
    Backups: None
  Networking:
    VLAN: 10
    Subnet: 10.10.0.0/24
    Internet: Restricted

Cost: ~$500/month

Staging Environment

Purpose: Pre-production validation
Scale: Production-like

Characteristics:
  - Mirror of production
  - Automated testing
  - Performance testing
  - Integration testing
  - Security scanning

Infrastructure:
  Kubernetes:
    Nodes: 5 (medium VMs)
    CPU: 8 cores per node
    Memory: 32GB per node
    Storage: 500GB per node
  Databases:
    Type: Dedicated instances
    Persistence: Required
    Backups: Daily
  Networking:
    VLAN: 20
    Subnet: 10.20.0.0/24
    Internet: Controlled

Cost: ~$2,000/month

Production Environment

Purpose: Live customer-facing services
Scale: Full redundancy

Characteristics:
  - High availability
  - Auto-scaling
  - Disaster recovery
  - Strict security
  - Full monitoring

Infrastructure:
  Kubernetes:
    Nodes: 10+ (large VMs/bare metal)
    CPU: 16+ cores per node
    Memory: 64GB+ per node
    Storage: 1TB+ per node
  Databases:
    Type: HA clusters
    Persistence: Required
    Backups: Hourly + continuous replication
  Networking:
    VLAN: 30
    Subnet: 10.30.0.0/24
    Internet: Full access (firewalled)

Cost: ~$10,000+/month
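These per-environment limits can also be enforced inside the cluster at the namespace level. A minimal sketch for dev (the namespace name and aggregate quota values are illustrative, not from the specs above; only the pod count follows directly from 3 nodes × 50 pods/node):

```yaml
# Hypothetical ResourceQuota for the shared dev namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev-apps          # assumed namespace name
spec:
  hard:
    requests.cpu: "8"          # aggregate across the namespace (illustrative)
    requests.memory: 16Gi
    limits.cpu: "12"
    limits.memory: 32Gi
    pods: "150"                # 3 nodes x 50 pods/node from the dev spec
```

Production would get a larger quota in its own namespace; the point is that each environment's resource posture is declared, not assumed.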

Infrastructure as Code

Terraform Workspace Structure

# Directory structure
infrastructure/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   └── production/
│       ├── main.tf
│       ├── variables.tf
│       └── terraform.tfvars
├── modules/
│   ├── kubernetes/
│   ├── networking/
│   ├── storage/
│   └── monitoring/
└── shared/
    └── backend.tf

# environments/dev/main.tf
terraform {
  backend "s3" {
    bucket = "terraform-state"
    key    = "dev/terraform.tfstate"
    region = "us-east-1"
  }
}

module "kubernetes" {
  source = "../../modules/kubernetes"

  environment  = "dev"
  cluster_name = "dev-cluster"
  node_count   = 3
  node_size    = "small"
  node_cpu     = 4
  node_memory  = 16384
  network_cidr = "10.10.0.0/24"
  vlan_id      = 10
}

module "monitoring" {
  source = "../../modules/monitoring"

  environment    = "dev"
  retention_days = 7
  alert_channels = ["slack-dev"]
}

# environments/dev/terraform.tfvars
environment = "dev"
region      = "datacenter-1"
cost_center = "engineering"

# Resource limits for dev
max_cpu_per_pod    = 2
max_memory_per_pod = 4096
max_pods_per_node  = 50
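The staging and production roots reuse the same modules with only the inputs changed; that is what keeps the environments in parity. A sketch of what environments/staging/main.tf might look like, mirroring the dev root above with the values from the staging spec:

```hcl
# environments/staging/main.tf (illustrative; mirrors the dev root)
terraform {
  backend "s3" {
    bucket = "terraform-state"
    key    = "staging/terraform.tfstate"   # separate state per environment
    region = "us-east-1"
  }
}

module "kubernetes" {
  source = "../../modules/kubernetes"

  environment  = "staging"
  cluster_name = "staging-cluster"
  node_count   = 5
  node_size    = "medium"
  node_cpu     = 8
  node_memory  = 32768
  network_cidr = "10.20.0.0/24"
  vlan_id      = 20
}
```

Because only inputs differ, a diff between the two files shows exactly how staging deviates from dev.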

Environment-Specific Configurations

# modules/kubernetes/variables.tf
variable "environment" {
  description = "Environment name"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

variable "node_count" {
  description = "Number of Kubernetes nodes"
  type        = number
  default     = 3
}

# Environment-specific defaults
locals {
  env_config = {
    dev = {
      node_size       = "small"
      backup_enabled  = false
      ha_enabled      = false
      monitoring_tier = "basic"
    }
    staging = {
      node_size       = "medium"
      backup_enabled  = true
      ha_enabled      = true
      monitoring_tier = "standard"
    }
    production = {
      node_size       = "large"
      backup_enabled  = true
      ha_enabled      = true
      monitoring_tier = "premium"
    }
  }

  config = local.env_config[var.environment]
}
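Inside the module, `local.config` is then read wherever a value differs by environment. Illustrative only — `example_node_pool` is a placeholder resource type; substitute whatever your actual provider (vSphere, libvirt, a cloud provider) exposes:

```hcl
# Hypothetical consumption of local.config inside the module body
resource "example_node_pool" "this" {
  name       = "${var.environment}-pool"
  node_count = var.node_count
  node_size  = local.config.node_size

  backup_enabled = local.config.backup_enabled
  ha_enabled     = local.config.ha_enabled
}
```

The design choice here is that callers pass only `environment`; everything derivable from it lives in one lookup table rather than being repeated in three tfvars files.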

GitOps Deployment Pipeline

Repository Structure

gitops-infrastructure/
├── apps/
│   ├── dev/
│   │   ├── application-a/
│   │   ├── application-b/
│   │   └── kustomization.yaml
│   ├── staging/
│   │   ├── application-a/
│   │   ├── application-b/
│   │   └── kustomization.yaml
│   └── production/
│       ├── application-a/
│       ├── application-b/
│       └── kustomization.yaml
├── infrastructure/
│   ├── base/
│   │   ├── ingress/
│   │   ├── monitoring/
│   │   └── storage/
│   └── overlays/
│       ├── dev/
│       ├── staging/
│       └── production/
└── clusters/
    ├── dev-cluster.yaml
    ├── staging-cluster.yaml
    └── production-cluster.yaml
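The per-environment kustomization.yaml files in this tree simply aggregate the application directories. A sketch of apps/dev/kustomization.yaml under this layout (the application names are the placeholders from the tree above):

```yaml
# apps/dev/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - application-a
  - application-b
```

Adding an application to an environment is then a one-line change in that environment's file, which shows up cleanly in review.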

ArgoCD Application Definitions

# clusters/dev-cluster.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: dev-applications
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/company/gitops-infrastructure
    targetRevision: main
    path: apps/dev
  destination:
    server: https://dev-cluster.local
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
---
# Application with environment-specific config
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-app-dev
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/company/web-app
    targetRevision: develop
    path: k8s/overlays/dev
    kustomize:
      images:
        - company/web-app:dev-latest
  destination:
    server: https://dev-cluster.local
    namespace: web-app
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
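Both Applications above use the default project; to guarantee a dev Application can never target the production cluster, each environment can instead get its own ArgoCD AppProject with a restricted destination list. A hedged sketch, reusing the repository and cluster URLs from the examples above:

```yaml
# Hypothetical AppProject limiting dev applications to the dev cluster
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: dev
  namespace: argocd
spec:
  sourceRepos:
    - https://github.com/company/*        # assumed org-wide allowlist
  destinations:
    - server: https://dev-cluster.local   # only the dev cluster
      namespace: "*"
```

Applications then set `project: dev` instead of `project: default`, and ArgoCD rejects any sync to a cluster outside the list.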

Kustomize Overlays

# apps/base/web-app/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web-app
          image: company/web-app:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
---
# apps/overlays/dev/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
  - ../../base/web-app
replicas:
  - name: web-app
    count: 1
images:
  - name: company/web-app
    newTag: dev-latest
configMapGenerator:
  - name: web-app-config
    literals:
      - ENVIRONMENT=development
      - LOG_LEVEL=debug
      - ENABLE_DEBUG=true
---
# apps/overlays/staging/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
  - ../../base/web-app
replicas:
  - name: web-app
    count: 2
images:
  - name: company/web-app
    newTag: staging-v1.2.3
configMapGenerator:
  - name: web-app-config
    literals:
      - ENVIRONMENT=staging
      - LOG_LEVEL=info
      - ENABLE_DEBUG=false
---
# apps/overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
  - ../../base/web-app
replicas:
  - name: web-app
    count: 5
images:
  - name: company/web-app
    newTag: v1.2.3
configMapGenerator:
  - name: web-app-config
    literals:
      - ENVIRONMENT=production
      - LOG_LEVEL=warn
      - ENABLE_DEBUG=false
patches:
  - path: production-resources.yaml
  - path: production-hpa.yaml
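The production overlay references two patch files that are not shown. One plausible shape for production-resources.yaml is a strategic-merge patch that raises the base deployment's requests and limits (the values here are illustrative, roughly tracking the production configuration discussed later in this post):

```yaml
# apps/overlays/production/production-resources.yaml (illustrative)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  template:
    spec:
      containers:
        - name: web-app
          resources:
            requests:
              cpu: 1000m
              memory: 1Gi
            limits:
              cpu: 4000m
              memory: 4Gi
```

Kustomize merges this onto the base deployment by name, so the dev and staging overlays keep the smaller base values untouched.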

CI/CD Pipeline

GitLab CI Configuration

# .gitlab-ci.yml
stages:
  - build
  - test
  - deploy-dev
  - deploy-staging
  - deploy-production

variables:
  DOCKER_REGISTRY: registry.company.local
  APP_NAME: web-app

build:
  stage: build
  script:
    - docker build -t $DOCKER_REGISTRY/$APP_NAME:$CI_COMMIT_SHA .
    - docker push $DOCKER_REGISTRY/$APP_NAME:$CI_COMMIT_SHA
  only:
    - branches
    - tags

test:
  stage: test
  script:
    - docker run $DOCKER_REGISTRY/$APP_NAME:$CI_COMMIT_SHA npm test
    - docker run $DOCKER_REGISTRY/$APP_NAME:$CI_COMMIT_SHA npm run lint
  only:
    - branches
    - tags

deploy-dev:
  stage: deploy-dev
  script:
    # Update image tag in GitOps repo
    - git clone https://github.com/company/gitops-infrastructure
    - cd gitops-infrastructure
    - |
      cd apps/overlays/dev
      kustomize edit set image company/web-app:dev-$CI_COMMIT_SHA
    - git add .
    - git commit -m "Update dev to $CI_COMMIT_SHA"
    - git push
  only:
    - develop
  environment:
    name: development
    url: https://dev.company.local

deploy-staging:
  stage: deploy-staging
  script:
    - git clone https://github.com/company/gitops-infrastructure
    - cd gitops-infrastructure
    - |
      cd apps/overlays/staging
      kustomize edit set image company/web-app:staging-$CI_COMMIT_TAG
    - git add .
    - git commit -m "Update staging to $CI_COMMIT_TAG"
    - git push
  only:
    - tags
  when: manual
  environment:
    name: staging
    url: https://staging.company.local

deploy-production:
  stage: deploy-production
  before_script:
    # Require approval
    - echo "Deploying to production requires approval"
  script:
    - git clone https://github.com/company/gitops-infrastructure
    - cd gitops-infrastructure
    - |
      cd apps/overlays/production
      kustomize edit set image company/web-app:$CI_COMMIT_TAG
    - git add .
    - git commit -m "Update production to $CI_COMMIT_TAG"
    - git push
  only:
    - tags
  when: manual
  environment:
    name: production
    url: https://company.com

Promotion Strategy

Automated Promotion Flow

Development:
  Trigger: Push to develop branch
  Deployment: Automatic
  Testing: Unit tests, linting
  Approval: None required
  Rollback: Automatic on failure

Staging:
  Trigger: Git tag creation
  Deployment: Manual approval
  Testing:
    - Integration tests
    - Performance tests
    - Security scans
    - Smoke tests
  Approval: Tech lead
  Rollback: Manual

Production:
  Trigger: Staging validation passes
  Deployment: Manual approval
  Testing:
    - Canary deployment (10%)
    - Full deployment (100%)
  Approval: Engineering manager + Product owner
  Rollback: Automated on health check failure

Canary Deployment

# Production canary deployment
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: web-app
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  service:
    port: 8080
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 1m
    webhooks:
      - name: load-test
        url: http://flagger-loadtester/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://web-app-canary:8080/"

Environment Parity

Configuration Management

# Shared configuration (all environments)
shared_config:
  app_name: web-app
  port: 8080
  health_check_path: /health
  metrics_path: /metrics

# Environment-specific overrides
dev_config:
  replicas: 1
  cpu_request: 100m
  memory_request: 128Mi
  cpu_limit: 500m
  memory_limit: 512Mi
  log_level: debug
  enable_profiling: true
  database_url: postgresql://dev-db:5432/app

staging_config:
  replicas: 2
  cpu_request: 500m
  memory_request: 512Mi
  cpu_limit: 2000m
  memory_limit: 2Gi
  log_level: info
  enable_profiling: false
  database_url: postgresql://staging-db:5432/app

production_config:
  replicas: 5
  cpu_request: 1000m
  memory_request: 1Gi
  cpu_limit: 4000m
  memory_limit: 4Gi
  log_level: warn
  enable_profiling: false
  database_url: postgresql://prod-db:5432/app
  autoscaling:
    min_replicas: 5
    max_replicas: 20
    target_cpu: 70
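The production autoscaling values translate directly into a HorizontalPodAutoscaler; a manifest like this could also serve as the production-hpa.yaml patch referenced in the Kustomize overlay. A sketch, assuming the cluster supports the autoscaling/v2 API:

```yaml
# HorizontalPodAutoscaler matching production_config.autoscaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 5
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```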

Secrets Management

Vault Integration

# Vault secrets per environment
vault/
├── dev/
│   ├── database-credentials
│   ├── api-keys
│   └── certificates
├── staging/
│   ├── database-credentials
│   ├── api-keys
│   └── certificates
└── production/
    ├── database-credentials
    ├── api-keys
    └── certificates

# Kubernetes External Secrets
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: app-secrets
    creationPolicy: Owner
  data:
    - secretKey: database-password
      remoteRef:
        key: production/database-credentials
        property: password
    - secretKey: api-key
      remoteRef:
        key: production/api-keys
        property: external-api
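The ExternalSecret above references a SecretStore named vault-backend that is not shown. A sketch of what it might look like with Kubernetes auth against Vault — the server URL, KV mount path, Vault role, and service account name are all assumptions:

```yaml
# Hypothetical SecretStore backing the ExternalSecret above
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-backend
  namespace: production
spec:
  provider:
    vault:
      server: https://vault.company.local:8200   # assumed address
      path: secret                               # assumed KV mount
      version: v2
      auth:
        kubernetes:
          mountPath: kubernetes
          role: production-apps                  # assumed Vault role
          serviceAccountRef:
            name: external-secrets
```

Scoping one SecretStore per namespace (rather than a ClusterSecretStore) keeps each environment's Vault role from reading another environment's paths.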

Monitoring and Observability

Environment-Specific Dashboards

Grafana Dashboards:
  Development:
    - Application metrics
    - Error rates
    - Response times
    - Resource usage
  Staging:
    - All dev metrics
    - Load test results
    - Performance benchmarks
    - Cost tracking
  Production:
    - All staging metrics
    - SLA compliance
    - Business metrics
    - Capacity planning
    - Incident tracking

Alerting Strategy

Development:
  Alerts: Slack only
  Severity: Info
  On-call: None

Staging:
  Alerts: Slack + Email
  Severity: Warning
  On-call: Optional

Production:
  Alerts: PagerDuty + Slack + Email
  Severity: Critical
  On-call: Required (24/7)
  Escalation: 15 min → Manager → Director
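In a Prometheus-based stack, these tiers can be encoded as labels on the alert rules, which Alertmanager routing then maps to Slack, email, or PagerDuty. A hedged sketch — the metric name, threshold, and job label are illustrative, not taken from a real service:

```yaml
# Illustrative PrometheusRule; the severity label drives Alertmanager routing
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: web-app-alerts
  namespace: production
spec:
  groups:
    - name: web-app
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate(http_requests_total{job="web-app",code=~"5.."}[5m]))
              / sum(rate(http_requests_total{job="web-app"}[5m])) > 0.05
          for: 5m
          labels:
            severity: critical        # pages via PagerDuty in production
            environment: production
          annotations:
            summary: "web-app 5xx rate above 5% for 5 minutes"
```

The same rule deployed to dev would carry `severity: info` and match only the Slack route.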

Cost Management

Environment Cost Tracking

Monthly Infrastructure Costs:
  Development:
    Compute: $300
    Storage: $100
    Network: $50
    Monitoring: $50
    Total: ~$500
  Staging:
    Compute: $1,200
    Storage: $400
    Network: $200
    Monitoring: $200
    Total: ~$2,000
  Production:
    Compute: $6,000
    Storage: $2,000
    Network: $1,000
    Monitoring: $500
    DR/Backup: $500
    Total: ~$10,000

Cost Optimization:
  - Auto-shutdown dev/staging after hours
  - Spot instances for non-critical workloads
  - Reserved instances for production
  - Storage lifecycle policies
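The after-hours shutdown for dev and staging can be as simple as a scheduled scale-down. One possible sketch using a Kubernetes CronJob — the image, service account, and namespace are assumptions, and teams on cloud providers may prefer native scheduler or autoscaler features instead:

```yaml
# Hypothetical nightly scale-down of the dev environment
apiVersion: batch/v1
kind: CronJob
metadata:
  name: dev-shutdown
  namespace: kube-system
spec:
  schedule: "0 20 * * 1-5"             # 20:00 on weekdays
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler   # assumed SA with patch rights on deployments
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest   # assumed image
              command:
                - /bin/sh
                - -c
                - kubectl scale deployment --all --replicas=0 -n web-app
```

A matching morning job scales the deployments back up; with GitOps self-heal enabled, note that ArgoCD may need the replica field ignored or it will undo the scale-down.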

Best Practices

Infrastructure in a Box Principles

Consistency:
  - Same tools across all environments
  - Infrastructure as Code for everything
  - Automated testing at every stage
  - Version control for all configs

Security:
  - Least privilege access
  - Secrets in Vault, never in Git
  - Network segmentation
  - Regular security scans

Reliability:
  - Automated backups
  - Disaster recovery testing
  - Health checks and monitoring
  - Graceful degradation

Efficiency:
  - Resource right-sizing
  - Auto-scaling policies
  - Cost monitoring and alerts
  - Regular optimization reviews

Conclusion

Infrastructure in a Box provides a complete, repeatable deployment pipeline from development to production. By treating infrastructure as code and implementing GitOps workflows, teams can deploy confidently and consistently across all environments.

Key Benefits:

  • Consistent environments reduce "works on my machine" issues
  • Automated promotion reduces human error
  • GitOps provides audit trail and rollback capability
  • Environment parity ensures production-like testing
  • Infrastructure as Code enables rapid disaster recovery

Success Metrics:

  • Deployment frequency: 10+ per day (dev), 5+ per week (staging), daily (production)
  • Lead time: < 1 hour from commit to production
  • Change failure rate: < 5%
  • Mean time to recovery: < 15 minutes

References:

  • GitOps Principles
  • Terraform Best Practices
  • Kubernetes Production Patterns
  • The Twelve-Factor App
  • Site Reliability Engineering (Google)