How to Configure Kubernetes Health Checks with Watt
Problem
You're deploying Watt applications to Kubernetes and need robust health checking that:
- Prevents traffic from reaching unhealthy pods
- Automatically restarts failed containers
- Handles complex health dependencies (databases, external services)
- Provides proper startup time for initialization
- Integrates with Kubernetes orchestration patterns
When to use this solution:
- Production Kubernetes deployments
- Applications with external dependencies that need health validation
- Services requiring zero-downtime deployments
- Complex multi-service applications where service health interdependencies matter
Solution Overview
This guide shows you how to implement comprehensive Kubernetes health checks using Watt's built-in health endpoints. You'll learn to:
- Configure readiness and liveness probes properly
- Implement custom health checks for your application dependencies
- Set appropriate probe timing and thresholds
- Handle startup scenarios and graceful shutdowns
Understanding Kubernetes Health Probes
Kubernetes uses probes to determine application health:
- Readiness Probe: Determines if the pod is ready to receive traffic. Failed readiness removes the pod from service endpoints.
- Liveness Probe: Determines if the container should be restarted. Failed liveness triggers container restart by Kubernetes.
- Startup Probe: Provides extra time for slow-starting containers. Disables readiness and liveness probes until startup succeeds.
Platformatic Health Check APIs
Platformatic provides a built-in API for implementing readiness and liveness through its metrics server. The metrics server is configured in your Watt configuration file and exposes health check endpoints:
- The `/ready` endpoint indicates whether the service is running and ready to accept traffic.
- The `/status` endpoint indicates whether all services in the stack are reachable.
- Custom health checks can be added using the `setCustomHealthCheck` method available on the `globalThis.platformatic` object. The method receives a function that returns a boolean or an object with the following properties:
  - `status`: a boolean indicating if the health check is successful
  - `statusCode`: an optional HTTP status code to return
  - `body`: an optional body to return
- Custom readiness checks can be added using the `setCustomReadinessCheck` method, also available on the `globalThis.platformatic` object. It receives a function with the same return contract: a boolean, or an object with `status`, `statusCode`, and `body` properties.
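For example, a custom check can use the object form to report details alongside the result. The following is a minimal sketch; `pingDatabase` is a hypothetical helper standing in for an application-specific connectivity check:

globalThis.platformatic.setCustomHealthCheck(async () => {
  try {
    // pingDatabase is a hypothetical helper: replace it with your own check
    await pingDatabase()
    return { status: true, body: { database: 'reachable' } }
  } catch (err) {
    // Report the failure with an explicit HTTP status code
    return { status: false, statusCode: 503, body: { database: 'unreachable' } }
  }
})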
Implementation
1. Service Implementation with Custom Health Checks
Create a Platformatic service that implements comprehensive health checks:
import fastify from 'fastify'
export function create () {
const app = fastify({
logger: true,
hostname: process.env.PLT_SERVER_HOSTNAME
})
// Register custom health check with Platformatic
globalThis.platformatic.setCustomHealthCheck(async () => {
try {
// Add your health checks here
// For example:
// await Promise.all([
// app.db?.query('SELECT 1'),
// fetch('https://external-service/health')
// ])
return true
} catch (err) {
app.log.error(err)
return false
}
})
// Register custom readiness check with Platformatic
globalThis.platformatic.setCustomReadinessCheck(async () => {
try {
// Add your readiness checks here
// For example:
// await Promise.all([
// app.db?.query('SELECT 1'),
// fetch('https://external-service/health')
// ])
return true
} catch (err) {
app.log.error(err)
return false
}
})
return app
}
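When you fill in these placeholder checks, keep them fast and bounded: a hung dependency should fail the probe rather than block it past the probe's `timeoutSeconds`. A minimal sketch, assuming Node.js 18+ (for the global `fetch`) and a placeholder `https://external-service/health` URL:

// Bound the dependency call so a hung service fails the check quickly
async function checkExternalService () {
  const res = await fetch('https://external-service/health', {
    signal: AbortSignal.timeout(2000) // abort the request after 2 seconds
  })
  return res.ok
}

A timed-out or failed request rejects, lands in the surrounding catch block, and fails the probe instead of hanging it.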
2. Kubernetes Configuration
Create a Kubernetes deployment configuration that defines the probes:
apiVersion: apps/v1
kind: Deployment
metadata:
name: demo-readiness-liveness
labels:
app: demo-readiness-liveness
spec:
replicas: 1
selector:
matchLabels:
app: demo-readiness-liveness
template:
metadata:
labels:
app: demo-readiness-liveness
spec:
containers:
- name: demo-readiness-liveness
image: demo-readiness-liveness:latest
ports:
- containerPort: 3001
name: service
- containerPort: 9090
name: metrics
readinessProbe:
httpGet:
path: /ready
port: 9090
initialDelaySeconds: 30
periodSeconds: 30
failureThreshold: 1
livenessProbe:
httpGet:
path: /status
port: 9090
initialDelaySeconds: 30
periodSeconds: 30
failureThreshold: 1
resources:
requests:
memory: "256Mi"
cpu: "500m"
limits:
memory: "512Mi"
cpu: "1000m"
Key configuration points:
- Readiness Probe: checks the `/ready` endpoint every 30 seconds
- Liveness Probe: checks the `/status` endpoint every 30 seconds
- Both probes use:
  - `initialDelaySeconds: 30`: wait 30 seconds before the first probe
  - `periodSeconds: 30`: check every 30 seconds
  - `failureThreshold: 1`: fail after a single unsuccessful attempt
Please note these values are for demonstration purposes. In a production environment, you should set these values based on your application's characteristics and requirements.
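The deployment is typically paired with a Service so that traffic is routed only to pods that pass their readiness probe. The example repository ships a k8s/service.yaml; a minimal sketch might look like this (the port mapping is an assumption based on the container ports above):

apiVersion: v1
kind: Service
metadata:
  name: demo-readiness-liveness
spec:
  selector:
    app: demo-readiness-liveness
  ports:
    - name: service
      port: 80
      targetPort: 3001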
3. Environment Configuration
Ensure your service binds to all network interfaces in Kubernetes by adding this environment variable to the container spec in your deployment:
env:
- name: PLT_SERVER_HOSTNAME
value: "0.0.0.0"
How It Works
1. Startup: when the pod starts, Kubernetes waits `initialDelaySeconds` before beginning health checks.
2. Readiness Check:
   - Kubernetes calls the `/ready` endpoint every `periodSeconds`
   - The `watt` server checks that all the services are up and running
   - If successful, the pod is marked as ready to receive traffic; if it fails `failureThreshold` times, the pod is marked as not ready and removed from service endpoints
3. Liveness Check:
   - Kubernetes calls the `/status` endpoint every `periodSeconds`
   - The `watt` server checks that all the services are ready and performs the custom health check for each service
   - If successful, the container is considered healthy; if it fails `failureThreshold` times, Kubernetes restarts the container
Project Structure
You can see a full working example at https://github.com/platformatic/k8s-readiness-liveness.
The example project structure demonstrates a Watt application with health checks:
├── app
│ ├── watt.json # Main Watt configuration
│ └── services
│ ├── main # Entry point service
│ │ └── platformatic.json
│ └── service-one # Example service with custom health check
│ ├── platformatic.json
│ └── app.js
├── k8s
│ ├── deployment.yaml # Kubernetes deployment with probes
│ └── service.yaml # Kubernetes service configuration
└── Dockerfile # Container image build
The `watt.json` configuration exposes the metrics server on port 9090:
{
"metrics": {
"hostname": "{PLT_SERVER_HOSTNAME}",
"port": 9090
}
}
This configuration exposes the health check endpoints at `/ready` and `/status` on port 9090, while the service endpoints are served on port 3001. Watt resolves the `{PLT_SERVER_HOSTNAME}` placeholder from the environment variable of the same name.
You can follow the `README.md` in demo/k8s-readiness-liveness to run the example.
Verification and Testing
Test Health Endpoints Locally
1. Start your Watt application:
npm run dev
2. Test health endpoints:
# Test readiness endpoint
curl http://localhost:9090/ready
# Test liveness endpoint
curl http://localhost:9090/status
# Expected responses should be 200 OK with health status
Test in Kubernetes
1. Deploy to Kubernetes:
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
2. Monitor pod health:
# Check pod status
kubectl get pods -l app=demo-readiness-liveness
# Watch pod events
kubectl describe pod <pod-name>
# Check probe results
kubectl get events --field-selector reason=Unhealthy
3. Test probe behavior:
# Force a health check failure (if your app supports it)
kubectl exec <pod-name> -- curl -X POST http://localhost:9090/fail-health
# Watch Kubernetes response
kubectl get pods -w
Verify Probe Configuration
Check probe timing is appropriate:
# Get current probe configuration
kubectl get deployment demo-readiness-liveness -o yaml | grep -A 10 Probe
Monitor probe metrics:
# Check pod resource usage (resource pressure is a common cause of probe timeouts)
kubectl top pods
# Inspect current probe configuration and status
kubectl describe pod <pod-name> | grep -A 5 "Liveness\|Readiness"
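If the grep output is hard to read, you can pull the probe spec directly with a JSONPath query:

# Print the readiness probe of the first container as JSON
kubectl get deployment demo-readiness-liveness \
  -o jsonpath='{.spec.template.spec.containers[0].readinessProbe}'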
Production Configuration Best Practices
Probe Timing Guidelines
Startup-dependent applications:
readinessProbe:
httpGet:
path: /ready
port: 9090
initialDelaySeconds: 10 # Short delay for quick apps
periodSeconds: 5 # Frequent checks during startup
timeoutSeconds: 5 # Allow time for health check
successThreshold: 1 # Single success to mark ready
failureThreshold: 3 # Allow some startup failures
livenessProbe:
httpGet:
path: /status
port: 9090
initialDelaySeconds: 30 # Longer delay after initial startup
periodSeconds: 30 # Less frequent checks when running
timeoutSeconds: 10 # More time for complex checks
failureThreshold: 3 # Avoid restart on transient issues
Database-dependent applications:
startupProbe: # Use startup probe for slow initialization
httpGet:
path: /ready
port: 9090
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 30 # Up to 5 minutes for startup
readinessProbe:
httpGet:
path: /ready
port: 9090
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 1 # Quick removal from service if unhealthy
livenessProbe:
httpGet:
path: /status
port: 9090
initialDelaySeconds: 0 # Disabled until startup probe succeeds
periodSeconds: 20
timeoutSeconds: 10
failureThreshold: 3
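With this configuration the startup probe allows up to `failureThreshold` × `periodSeconds` = 30 × 10s = 300 seconds for initialization. Readiness and liveness probing only begin once the startup probe succeeds, which is why the liveness probe can safely use `initialDelaySeconds: 0`.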
Troubleshooting
Pod Failing Readiness Checks
Problem: Pods remain in "Not Ready" state
Solutions:
# Check health endpoint directly
kubectl exec <pod-name> -- curl http://localhost:9090/ready
# Review application logs
kubectl logs <pod-name>
# Check probe configuration
kubectl describe pod <pod-name> | grep -A 10 Readiness
# Common fixes:
# - Increase initialDelaySeconds if app needs more startup time
# - Check that health dependencies are available
# - Verify metrics server is configured and running on correct port
Pod Continuously Restarting
Problem: Liveness probes causing restart loops
Solutions:
# Check restart count and reason
kubectl get pods -l app=your-app
# Review pod events
kubectl describe pod <pod-name>
# Check liveness endpoint
kubectl exec <pod-name> -- curl http://localhost:9090/status
# Common fixes:
# - Increase timeoutSeconds for slow health checks
# - Increase failureThreshold to avoid restarts on transient issues
# - Review custom health check logic for potential failures
# - Check if app is properly handling SIGTERM for graceful shutdown
Health Checks Always Failing
Problem: Health endpoints return 500/404 errors
Solutions:
# Verify metrics server configuration
kubectl exec <pod-name> -- netstat -ln | grep 9090
# Check Watt configuration
kubectl exec <pod-name> -- cat watt.json | grep -A 5 metrics
# Test endpoints manually
kubectl exec <pod-name> -- curl -v http://localhost:9090/ready
# Common fixes:
# - Ensure metrics.hostname is set to "0.0.0.0" not "127.0.0.1"
# - Verify metrics.port matches probe configuration
# - Check that custom health check functions don't throw exceptions
# - Ensure all services in Watt application are starting correctly
Slow Startup Times
Problem: Pods take too long to become ready
Solutions:
# Analyze startup time
kubectl logs <pod-name> --timestamps
# Check resource limits
kubectl describe pod <pod-name> | grep -A 5 Limits
# Profile health check performance
kubectl exec <pod-name> -- time curl http://localhost:9090/ready
# Common fixes:
# - Use startup probes for applications with long initialization
# - Optimize custom health check logic
# - Increase CPU/memory resources if resource-constrained
# - Remove expensive operations from readiness checks
Advanced Patterns
Multi-Service Health Dependencies
For complex applications with service interdependencies:
// Implement cascading health checks (the check* functions below are
// application-specific helpers you implement yourself)
globalThis.platformatic.setCustomHealthCheck(async () => {
try {
// Check primary service health
const serviceHealth = await checkServiceHealth()
// Check critical dependencies
const dbHealth = await checkDatabaseConnection()
const cacheHealth = await checkCacheConnection()
// Check non-critical dependencies (don't fail health check)
const externalServiceHealth = await checkExternalServices().catch(() => false)
if (serviceHealth && dbHealth && cacheHealth) {
return {
status: true,
body: {
service: 'healthy',
database: dbHealth,
cache: cacheHealth,
external: externalServiceHealth
}
}
}
return { status: false }
} catch (error) {
return {
status: false,
statusCode: 503,
body: { error: error.message }
}
}
})
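Note the design choice here: non-critical dependencies are reported in the response body for observability, but they are excluded from the pass/fail decision, so a flaky external service never takes the pod out of rotation.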
Graceful Shutdown Handling
// Handle graceful shutdown for zero-downtime deployments
process.on('SIGTERM', async () => {
console.log('Received SIGTERM, starting graceful shutdown')
// Stop accepting new requests
globalThis.platformatic.setCustomReadinessCheck(() => false)
// Allow existing requests to complete
await new Promise(resolve => setTimeout(resolve, 5000))
  // Clean up resources (cleanupConnections is an application-specific helper)
  await cleanupConnections()
process.exit(0)
})
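Kubernetes sends SIGTERM when it terminates a pod and waits up to `terminationGracePeriodSeconds` (30 seconds by default) before sending SIGKILL, so keep the drain window comfortably below that limit. Also remember that the pod only stops receiving traffic after the next readiness probe fails, so size the wait with your `periodSeconds` in mind.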
Next Steps
Now that you have robust Kubernetes health checks:
- Set up monitoring and alerting - Track health check metrics
- Configure autoscaling - Scale based on health and load
- Implement circuit breakers - Handle dependency failures gracefully
- Set up distributed tracing - Debug complex health check failures
References
- Kubernetes Pod Lifecycle: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
- Configure Liveness, Readiness and Startup Probes: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
- Container Probes: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes
- Example Application: https://github.com/platformatic/k8s-readiness-liveness (complete working example)