
How to Configure Kubernetes Health Checks with Watt

Problem

You're deploying Watt applications to Kubernetes and need robust health checking that:

  • Prevents traffic from reaching unhealthy pods
  • Automatically restarts failed containers
  • Handles complex health dependencies (databases, external services)
  • Provides proper startup time for initialization
  • Integrates with Kubernetes orchestration patterns

When to use this solution:

  • Production Kubernetes deployments
  • Applications with external dependencies that need health validation
  • Services requiring zero-downtime deployments
  • Complex multi-service applications where service health interdependencies matter

Solution Overview

This guide shows you how to implement comprehensive Kubernetes health checks using Watt's built-in health endpoints. You'll learn to:

  1. Configure readiness and liveness probes properly
  2. Implement custom health checks for your application dependencies
  3. Set appropriate probe timing and thresholds
  4. Handle startup scenarios and graceful shutdowns

Understanding Kubernetes Health Probes

Kubernetes uses probes to determine application health:

  • Readiness Probe: Determines if the pod is ready to receive traffic. Failed readiness removes the pod from service endpoints.
  • Liveness Probe: Determines if the container should be restarted. Failed liveness triggers container restart by Kubernetes.
  • Startup Probe: Provides extra time for slow-starting containers. Disables readiness and liveness probes until startup succeeds.

Platformatic Health Check APIs

Platformatic provides a built-in API for implementing readiness and liveness through its metrics server. The metrics server is configured in your Watt configuration file and exposes health check endpoints:

  • The /ready endpoint indicates if the service is running and ready to accept traffic
  • The /status endpoint indicates if all services in the stack are reachable
  • Custom health checks can be added using the setCustomHealthCheck method available on the globalThis.platformatic object. The method receives a function that returns a boolean or an object with the following properties (see the sketch after this list):
    • status: a boolean indicating if the health check is successful
    • statusCode: an optional HTTP status code to return
    • body: an optional body to return
  • Custom readiness checks can be added using the setCustomReadinessCheck method available on the globalThis.platformatic object. The method receives a function that returns a boolean or an object with the following properties:
    • status: a boolean indicating if the readiness check is successful
    • statusCode: an optional HTTP status code to return
    • body: an optional body to return
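For example, a custom health check can return the full response shape instead of a bare boolean. A minimal sketch, assuming app.db is your own database client (it is not provided by Platformatic):

globalThis.platformatic.setCustomHealthCheck(async () => {
  try {
    // app.db is a hypothetical database client — substitute your own dependency check
    await app.db.query('SELECT 1')
    return true
  } catch (err) {
    return {
      status: false,
      statusCode: 503,
      body: { error: 'database unreachable' }
    }
  }
})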

Implementation

1. Service Implementation with Custom Health Checks

Create a Platformatic service that implements comprehensive health checks:

import fastify from 'fastify'

export function create () {
  const app = fastify({
    logger: true,
    hostname: process.env.PLT_SERVER_HOSTNAME
  })

  // Register custom health check with Platformatic
  globalThis.platformatic.setCustomHealthCheck(async () => {
    try {
      // Add your health checks here
      // For example:
      // await Promise.all([
      //   app.db?.query('SELECT 1'),
      //   fetch('https://external-service/health')
      // ])
      return true
    } catch (err) {
      app.log.error(err)
      return false
    }
  })

  // Register custom readiness check with Platformatic
  globalThis.platformatic.setCustomReadinessCheck(async () => {
    try {
      // Add your readiness checks here
      // For example:
      // await Promise.all([
      //   app.db?.query('SELECT 1'),
      //   fetch('https://external-service/health')
      // ])
      return true
    } catch (err) {
      app.log.error(err)
      return false
    }
  })

  return app
}

2. Kubernetes Configuration

Create a Kubernetes deployment configuration that defines the probes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-readiness-liveness
  labels:
    app: demo-readiness-liveness
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-readiness-liveness
  template:
    metadata:
      labels:
        app: demo-readiness-liveness
    spec:
      containers:
        - name: demo-readiness-liveness
          image: demo-readiness-liveness:latest
          ports:
            - containerPort: 3001
              name: service
            - containerPort: 9090
              name: metrics
          readinessProbe:
            httpGet:
              path: /ready
              port: 9090
            initialDelaySeconds: 30
            periodSeconds: 30
            failureThreshold: 1
          livenessProbe:
            httpGet:
              path: /status
              port: 9090
            initialDelaySeconds: 30
            periodSeconds: 30
            failureThreshold: 1
          resources:
            requests:
              memory: "256Mi"
              cpu: "500m"
            limits:
              memory: "512Mi"
              cpu: "1000m"

Key configuration points:

  • Readiness Probe: Checks /ready endpoint every 30 seconds
  • Liveness Probe: Checks /status endpoint every 30 seconds
  • Both probes:
    • initialDelaySeconds: 30: Wait 30 seconds before first probe
    • periodSeconds: 30: Check every 30 seconds
    • failureThreshold: 1: Fail after 1 unsuccessful attempt

Please note these values are for demonstration purposes. In a production environment, you should set these values based on your application's characteristics and requirements.
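As a worked example of what these numbers imply: with periodSeconds: 30 and failureThreshold: 1, a pod that becomes unhealthy is acted on within at most one probe period, roughly 30 seconds, but a single slow or transiently failing check is enough to pull it from service or restart it. Raising failureThreshold to 3 would tolerate up to 90 seconds of transient failure before Kubernetes intervenes.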

3. Environment Configuration

Ensure your service binds to the correct network interface in Kubernetes. Watt resolves the {PLT_SERVER_HOSTNAME} placeholder in its configuration from this environment variable, so setting it to "0.0.0.0" makes the health endpoints reachable by the kubelet:

env:
  - name: PLT_SERVER_HOSTNAME
    value: "0.0.0.0"

How It Works

  1. Startup: When the pod starts, Kubernetes waits initialDelaySeconds before beginning health checks.

  2. Readiness Check:

    • Kubernetes calls the /ready endpoint every periodSeconds
    • The Watt server checks that all services are up and running
    • If successful, the pod is marked as ready to receive traffic; if it fails failureThreshold times, the pod is marked as not ready
  3. Liveness Check:

    • Kubernetes calls the /status endpoint every periodSeconds
    • The Watt server checks that all services are ready and runs the custom health checks for each service
    • If successful, the container is considered healthy; if it fails failureThreshold times, Kubernetes restarts the container

Project Structure

You can see a full working example in https://github.com/platformatic/k8s-readiness-liveness.

The example project structure demonstrates a Watt application with health checks:

├── app
│   ├── watt.json                # Main Watt configuration
│   └── services
│       ├── main                 # Entry point service
│       │   └── platformatic.json
│       └── service-one          # Example service with custom health check
│           ├── platformatic.json
│           └── app.js
├── k8s
│   ├── deployment.yaml          # Kubernetes deployment with probes
│   └── service.yaml             # Kubernetes service configuration
└── Dockerfile                   # Container image build

The watt.json configuration exposes the metrics server on port 9090:

{
  "metrics": {
    "hostname": "{PLT_SERVER_HOSTNAME}",
    "port": 9090
  }
}

This configuration exposes the health check endpoints /ready and /status on port 9090, while the application services are served on port 3001.

You can follow the README.md in the example repository to run it.
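The Dockerfile is not reproduced here, but a minimal sketch for containerizing a Watt application might look like the following (assuming npm start runs wattpm start; adjust the Node.js version and build steps to your project):

FROM node:22-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
# Service traffic on 3001, metrics and health endpoints on 9090
EXPOSE 3001 9090
CMD ["npm", "start"]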

Verification and Testing

Test Health Endpoints Locally

1. Start your Watt application:

npm run dev

2. Test health endpoints:

# Test readiness endpoint
curl http://localhost:9090/ready

# Test liveness endpoint
curl http://localhost:9090/status

# Expected responses should be 200 OK with health status

Test in Kubernetes

1. Deploy to Kubernetes:

kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml

2. Monitor pod health:

# Check pod status
kubectl get pods -l app=demo-readiness-liveness

# Watch pod events
kubectl describe pod <pod-name>

# Check probe results
kubectl get events --field-selector reason=Unhealthy

3. Test probe behavior:

# Force a health check failure (if your app supports it)
kubectl exec <pod-name> -- curl -X POST http://localhost:9090/fail-health

# Watch Kubernetes response
kubectl get pods -w
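The /fail-health endpoint above is not provided by Platformatic. If you want to test probe behavior this way, here is a sketch of a hypothetical toggle you could add to the service from step 1; note that a route registered on the Fastify app is served on the service port (3001 in this example), not the metrics port:

// Hypothetical test-only route: flips a flag that the custom health check reads.
// Do not ship this to production unguarded.
let simulateFailure = false

app.post('/fail-health', async () => {
  simulateFailure = true
  return { failing: true }
})

globalThis.platformatic.setCustomHealthCheck(async () => {
  if (simulateFailure) return false
  // ...your normal dependency checks...
  return true
})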

Verify Probe Configuration

Check probe timing is appropriate:

# Get current probe configuration
kubectl get deployment demo-readiness-liveness -o yaml | grep -A 10 Probe

Monitor probe metrics:

# Check probe success/failure rates
kubectl top pods
kubectl describe pod <pod-name> | grep -A 5 "Liveness\|Readiness"

Production Configuration Best Practices

Probe Timing Guidelines

Startup-dependent applications:

readinessProbe:
  httpGet:
    path: /ready
    port: 9090
  initialDelaySeconds: 10    # Short delay for quick apps
  periodSeconds: 5           # Frequent checks during startup
  timeoutSeconds: 5          # Allow time for health check
  successThreshold: 1        # Single success to mark ready
  failureThreshold: 3        # Allow some startup failures

livenessProbe:
  httpGet:
    path: /status
    port: 9090
  initialDelaySeconds: 30    # Longer delay after initial startup
  periodSeconds: 30          # Less frequent checks when running
  timeoutSeconds: 10         # More time for complex checks
  failureThreshold: 3        # Avoid restart on transient issues

Database-dependent applications:

startupProbe:                # Use startup probe for slow initialization
  httpGet:
    path: /ready
    port: 9090
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 30       # Up to 5 minutes for startup

readinessProbe:
  httpGet:
    path: /ready
    port: 9090
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 1        # Quick removal from service if unhealthy

livenessProbe:
  httpGet:
    path: /status
    port: 9090
  initialDelaySeconds: 0     # No extra delay needed; liveness only starts once the startup probe succeeds
  periodSeconds: 20
  timeoutSeconds: 10
  failureThreshold: 3

Troubleshooting

Pod Failing Readiness Checks

Problem: Pods remain in "Not Ready" state

Solutions:

# Check health endpoint directly
kubectl exec <pod-name> -- curl http://localhost:9090/ready

# Review application logs
kubectl logs <pod-name>

# Check probe configuration
kubectl describe pod <pod-name> | grep -A 10 Readiness

# Common fixes:
# - Increase initialDelaySeconds if app needs more startup time
# - Check that health dependencies are available
# - Verify metrics server is configured and running on correct port

Pod Continuously Restarting

Problem: Liveness probes causing restart loops

Solutions:

# Check restart count and reason
kubectl get pods -l app=your-app

# Review pod events
kubectl describe pod <pod-name>

# Check liveness endpoint
kubectl exec <pod-name> -- curl http://localhost:9090/status

# Common fixes:
# - Increase timeoutSeconds for slow health checks
# - Increase failureThreshold to avoid restarts on transient issues
# - Review custom health check logic for potential failures
# - Check if app is properly handling SIGTERM for graceful shutdown

Health Checks Always Failing

Problem: Health endpoints return 500/404 errors

Solutions:

# Verify metrics server configuration
kubectl exec <pod-name> -- netstat -ln | grep 9090

# Check Watt configuration
kubectl exec <pod-name> -- cat watt.json | grep -A 5 metrics

# Test endpoints manually
kubectl exec <pod-name> -- curl -v http://localhost:9090/ready

# Common fixes:
# - Ensure metrics.hostname is set to "0.0.0.0" not "127.0.0.1"
# - Verify metrics.port matches probe configuration
# - Check that custom health check functions don't throw exceptions
# - Ensure all services in Watt application are starting correctly

Slow Startup Times

Problem: Pods take too long to become ready

Solutions:

# Analyze startup time
kubectl logs <pod-name> --timestamps

# Check resource limits
kubectl describe pod <pod-name> | grep -A 5 Limits

# Profile health check performance
kubectl exec <pod-name> -- time curl http://localhost:9090/ready

# Common fixes:
# - Use startup probes for applications with long initialization
# - Optimize custom health check logic
# - Increase CPU/memory resources if resource-constrained
# - Remove expensive operations from readiness checks

Advanced Patterns

Multi-Service Health Dependencies

For complex applications with service interdependencies:

// Implement cascading health checks
globalThis.platformatic.setCustomHealthCheck(async () => {
  try {
    // Check primary service health
    const serviceHealth = await checkServiceHealth()

    // Check critical dependencies
    const dbHealth = await checkDatabaseConnection()
    const cacheHealth = await checkCacheConnection()

    // Check non-critical dependencies (don't fail health check)
    const externalServiceHealth = await checkExternalServices().catch(() => false)

    if (serviceHealth && dbHealth && cacheHealth) {
      return {
        status: true,
        body: {
          service: 'healthy',
          database: dbHealth,
          cache: cacheHealth,
          external: externalServiceHealth
        }
      }
    }

    return { status: false }
  } catch (error) {
    return {
      status: false,
      statusCode: 503,
      body: { error: error.message }
    }
  }
})
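Note the design choice here: because /status drives the liveness probe, failing it on a non-critical external dependency would trigger container restarts that cannot fix the external outage. That is why checkExternalServices() only affects the reported body, while the database and cache checks gate the status itself.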

Graceful Shutdown Handling

// Handle graceful shutdown for zero-downtime deployments
process.on('SIGTERM', async () => {
  console.log('Received SIGTERM, starting graceful shutdown')

  // Stop accepting new requests
  globalThis.platformatic.setCustomReadinessCheck(() => false)

  // Allow existing requests to complete
  await new Promise(resolve => setTimeout(resolve, 5000))

  // Clean up resources
  await cleanupConnections()

  process.exit(0)
})
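For this handler to work as intended, the pod must be given enough time to drain before Kubernetes force-kills it. A minimal sketch of the corresponding setting in the Deployment's pod template (the Kubernetes default is 30 seconds; the value must exceed the 5-second drain delay above plus cleanup time):

# Under template.spec in the Deployment
spec:
  terminationGracePeriodSeconds: 30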
