Version: Next

Vertical Scaler

Overview

The Vertical Scaler is an automatic resource allocation algorithm that dynamically adjusts the number of workers for each application based on its Event Loop Utilization (ELU) metrics. It distributes workers across multiple applications while respecting system constraints such as the total worker limit and available memory.

How It Works

Health Metrics

The algorithm uses two primary health metrics:

Event Loop Utilization (ELU)

ELU measures how busy the Node.js event loop is:

0.0 = Event loop is completely idle
1.0 = Event loop is fully saturated

ELU values are collected continuously from all workers and averaged over a configurable time window to smooth out temporary spikes and make stable scaling decisions.
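The sketch below shows how an ELU value in the 0.0 to 1.0 range can be obtained. It samples the current thread's event loop with `performance.eventLoopUtilization()` from `node:perf_hooks` and reports the utilization for the interval since the previous sample. It is not the scaler's collection code, and `sampleElu` is a hypothetical name. From a parent thread, Node.js exposes the same method on a `Worker` instance as `worker.performance.eventLoopUtilization()`.

```typescript
import { performance } from "node:perf_hooks";

// Cumulative reading taken at the previous sample (hypothetical sampler state).
let previous = performance.eventLoopUtilization();

// Returns the ELU of the interval since the last call:
// 0.0 = the loop was idle the whole time, 1.0 = it was busy the whole time.
function sampleElu(): number {
  const current = performance.eventLoopUtilization();
  // Given two readings (newer first), the API returns the delta between them.
  const { utilization } = performance.eventLoopUtilization(current, previous);
  previous = current;
  // A zero-length interval can yield NaN; treat it as idle.
  return Number.isFinite(utilization) ? utilization : 0;
}
```

In practice, a `setInterval` would call this once per second and push each value into the rolling time window.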

Memory Usage

The algorithm tracks heap memory usage (heapUsed and heapTotal) for each worker. When making scaling decisions, it considers:

Total memory limit: A configured maximum total memory (maxTotalMemory), defaulting to 90% of the system's total memory
Available memory: Calculated as maxTotalMemory - currently used memory
Average heap usage: The average memory consumed by workers of each application

The system memory information is obtained from cgroup files when running in containerized environments (Docker, Kubernetes), or from the operating system otherwise. This ensures that new workers are only started when there's sufficient memory available to accommodate them based on the application's average heap usage.
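Here is a minimal sketch of the memory check. It assumes the container limit comes from the cgroup v2 file `/sys/fs/cgroup/memory.max` or the cgroup v1 file `/sys/fs/cgroup/memory/memory.limit_in_bytes`, and falls back to `os.totalmem()` when neither applies. The scaler's actual lookup may check other paths. `getMemoryLimit` and `canStartWorker` are hypothetical names, and the sketch treats available memory that exactly equals the average heap as sufficient.

```typescript
import { readFileSync } from "node:fs";
import * as os from "node:os";

// Assumed limit sources: cgroup v2 first, then cgroup v1.
const CGROUP_LIMIT_FILES = [
  "/sys/fs/cgroup/memory.max",
  "/sys/fs/cgroup/memory/memory.limit_in_bytes",
];

// Memory limit for this process, in bytes.
function getMemoryLimit(): number {
  const hostTotal = os.totalmem();
  for (const file of CGROUP_LIMIT_FILES) {
    try {
      const limit = Number(readFileSync(file, "utf8").trim());
      // "max" (v2) parses to NaN, and v1 reports "unlimited" as a huge number,
      // so only accept a real limit below the host total.
      if (Number.isFinite(limit) && limit > 0 && limit < hostTotal) return limit;
    } catch {
      // File absent: not in a container, or a different cgroup layout.
    }
  }
  return hostTotal;
}

// A new worker is allowed only if the remaining budget covers the app's average heap.
function canStartWorker(maxTotalMemory: number, usedMemory: number, avgHeapUsed: number): boolean {
  const available = maxTotalMemory - usedMemory;
  return available >= avgHeapUsed;
}

// Assumed default: 90% of the detected limit.
const maxTotalMemory = Math.floor(getMemoryLimit() * 0.9);
```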

Time Windows

The algorithm uses different time windows for scale-up and scale-down decisions:

Scale-up time window (timeWindowSec): A shorter window (default: 10 seconds) for detecting high utilization and scaling up quickly
Scale-down time window (scaleDownTimeWindowSec): A longer window (default: 60 seconds) for detecting sustained low utilization before scaling down, preventing premature worker removal
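The sketch below shows why the longer scale-down window prevents premature removal. It averages timestamped ELU samples over a trailing window. The data is invented for illustration: 50 seconds at ELU 0.9 followed by 10 quiet seconds at 0.05. The 10-second window already reads 0.05, but the 60-second average stays near 0.76. That is well above the 0.2 scale-down threshold, so no scale-down would be triggered. `windowAverage` is a hypothetical helper.

```typescript
interface EluSample {
  timestamp: number; // milliseconds
  elu: number;       // 0.0 to 1.0
}

// Mean ELU of samples in the trailing window (now - windowSec, now].
// Returns undefined when the window contains no samples.
function windowAverage(samples: EluSample[], now: number, windowSec: number): number | undefined {
  const cutoff = now - windowSec * 1000;
  let sum = 0;
  let count = 0;
  for (const s of samples) {
    if (s.timestamp > cutoff && s.timestamp <= now) {
      sum += s.elu;
      count++;
    }
  }
  return count > 0 ? sum / count : undefined;
}

// One sample per second: 50 busy seconds, then 10 quiet ones (illustrative data).
const samples: EluSample[] = [];
for (let i = 1; i <= 60; i++) {
  samples.push({ timestamp: i * 1000, elu: i <= 50 ? 0.9 : 0.05 });
}

const now = 60_000;
const shortAvg = windowAverage(samples, now, 10); // ≈ 0.05 (timeWindowSec = 10)
const longAvg = windowAverage(samples, now, 60);  // ≈ 0.758 (scaleDownTimeWindowSec = 60)
```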

Scaling Logic

The algorithm operates in two modes:

Reactive Mode: Triggers immediately when any worker's ELU exceeds the scale-up threshold
Periodic Mode: Runs at regular intervals (default: every 60 seconds) regardless of metrics
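The two trigger modes could be wired roughly as follows. The sketch assumes a hypothetical `runCycle` callback that performs the analysis and decisions described below, including the cooldown check. `ScalingTriggers` and `onEluSample` are illustrative names, and the sketch reads "exceeds" as strictly greater than `scaleUpELU`.

```typescript
type CycleReason = "reactive" | "periodic";

class ScalingTriggers {
  private timer: ReturnType<typeof setInterval> | undefined;
  private readonly runCycle: (reason: CycleReason) => void;
  private readonly scaleUpELU: number;
  private readonly scaleIntervalSec: number;

  constructor(runCycle: (reason: CycleReason) => void, scaleUpELU = 0.8, scaleIntervalSec = 60) {
    this.runCycle = runCycle;
    this.scaleUpELU = scaleUpELU;
    this.scaleIntervalSec = scaleIntervalSec;
  }

  // Periodic mode: run a cycle every scaleIntervalSec seconds regardless of metrics.
  start(): void {
    this.timer = setInterval(() => this.runCycle("periodic"), this.scaleIntervalSec * 1000);
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
  }

  // Reactive mode: called for each worker ELU sample; fires as soon as one exceeds the threshold.
  onEluSample(elu: number): void {
    if (elu > this.scaleUpELU) this.runCycle("reactive");
  }
}
```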

Both modes analyze all applications and generate scaling recommendations:

1. Metric Collection

Collects ELU and heap memory metrics from all active workers every second
Only collects metrics from workers that have been running for at least the grace period (default: 30 seconds)
Maintains a rolling time window of metrics for both scale-up (default: 10 seconds) and scale-down (default: 60 seconds) decisions
Calculates average ELU and heap usage per application across all its workers using the appropriate time window
Checks available memory by calculating maxTotalMemory - currently used memory
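The grace-period filter and per-application averaging might look like this. `WorkerSample`, `aggregateByApp`, and the field names are hypothetical. The sketch also assumes that each sample's `elu` has already been averaged over the relevant time window.

```typescript
interface WorkerSample {
  app: string;
  startedAt: number; // ms timestamp when the worker started
  elu: number;       // worker ELU, already averaged over the relevant window
  heapUsed: number;  // bytes
}

interface AppMetrics {
  elu: number;            // mean ELU across sampled workers
  heapUsed: number;       // mean heapUsed across sampled workers (bytes)
  sampledWorkers: number; // workers past the grace period
}

// Drops workers still inside the grace period, then averages per application.
function aggregateByApp(samples: WorkerSample[], now: number, gracePeriodMs = 30_000): Map<string, AppMetrics> {
  const totals = new Map<string, { elu: number; heap: number; n: number }>();
  for (const s of samples) {
    if (now - s.startedAt < gracePeriodMs) continue; // too new: metrics not yet representative
    const t = totals.get(s.app) ?? { elu: 0, heap: 0, n: 0 };
    t.elu += s.elu;
    t.heap += s.heapUsed;
    t.n += 1;
    totals.set(s.app, t);
  }
  const result = new Map<string, AppMetrics>();
  for (const [app, t] of totals) {
    result.set(app, { elu: t.elu / t.n, heapUsed: t.heap / t.n, sampledWorkers: t.n });
  }
  return result;
}
```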

2. Application Prioritization

Applications are prioritized based on:

Primary: ELU value (lower ELU = higher priority for scaling down)
Secondary: Worker count (more workers = higher priority for scaling down when ELU is equal)
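The prioritization rule maps directly onto a sort comparator. This sketch uses a hypothetical `AppStat` shape. Sorting with the comparator places the strongest scale-down candidates first.

```typescript
interface AppStat {
  id: string;
  elu: number;
  workers: number;
}

// Lower ELU sorts first; on equal ELU, the app with more workers sorts first.
function scaleDownPriority(a: AppStat, b: AppStat): number {
  if (a.elu !== b.elu) return a.elu - b.elu;
  return b.workers - a.workers;
}

const ordered: AppStat[] = [
  { id: "A", elu: 0.5, workers: 2 },
  { id: "B", elu: 0.1, workers: 3 },
  { id: "C", elu: 0.1, workers: 5 },
].sort(scaleDownPriority);
// ordered ids: C, B, A
```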

3. Scaling Decisions

The algorithm makes decisions in this order:

Scale Down (Low Utilization)

Any application with ELU below the scale-down threshold (averaged over the longer scaleDownTimeWindowSec window) is reduced by 1 worker
Applications must have more workers than their configured minimum (default: 1 worker)
Multiple applications can scale down in the same cycle
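Here is a minimal sketch of the scale-down pass. It assumes each application's `elu` is its average over `scaleDownTimeWindowSec` and that `minWorkers` has already been resolved per application. `scaleDownDecisions` is a hypothetical name. Applied to the state in Example 4, it reduces App A and App B by one worker each and leaves App C alone.

```typescript
interface AppState {
  id: string;
  elu: number;        // averaged over scaleDownTimeWindowSec
  workers: number;
  minWorkers: number; // per-app limit, default 1
}

// Every app below the threshold and above its minimum loses exactly one worker.
// Returns a map of app id -> new worker count.
function scaleDownDecisions(apps: AppState[], scaleDownELU = 0.2): Map<string, number> {
  const targets = new Map<string, number>();
  for (const app of apps) {
    if (app.elu < scaleDownELU && app.workers > app.minWorkers) {
      targets.set(app.id, app.workers - 1);
    }
  }
  return targets;
}
```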

Scale Up (High Utilization)

Among applications that haven't been scaled in this cycle, find the best candidate for scaling up
The candidate selection prioritizes:
- Primary: Highest ELU value (apps with higher load get priority)
- Secondary: Fewest workers (smaller apps get priority when ELU is equal)
The selected application receives 1 additional worker if:
- Its ELU is at or above the scale-up threshold (averaged over the shorter timeWindowSec window)
- It hasn't reached its configured maximum workers
- Total workers across all apps is below maxTotalWorkers
- There is sufficient available system memory (based on the application's average heap usage)
Only one application scales up per cycle
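The scale-up pass can be sketched as follows. It follows the rules above literally: only the single best candidate is checked, so if that application fails any condition, nothing scales up in that cycle. `pickScaleUp`, `Candidate`, and `Limits` are hypothetical. Treating available memory that exactly equals the average heap as sufficient is an assumption.

```typescript
interface Candidate {
  id: string;
  elu: number;         // averaged over timeWindowSec
  workers: number;
  maxWorkers: number;
  avgHeapUsed: number; // bytes
}

interface Limits {
  scaleUpELU: number;
  maxTotalWorkers: number;
  totalWorkers: number;    // current workers across all apps
  availableMemory: number; // maxTotalMemory - currently used memory (bytes)
}

// Returns the id of the app that gets one more worker, or undefined.
function pickScaleUp(apps: Candidate[], scaledThisCycle: Set<string>, limits: Limits): string | undefined {
  const candidates = apps.filter((a) => !scaledThisCycle.has(a.id));
  if (candidates.length === 0) return undefined;

  // Highest ELU first; on equal ELU, fewer workers first.
  candidates.sort((a, b) => (a.elu !== b.elu ? b.elu - a.elu : a.workers - b.workers));
  const best = candidates[0];

  const eligible =
    best.elu >= limits.scaleUpELU &&
    best.workers < best.maxWorkers &&
    limits.totalWorkers < limits.maxTotalWorkers &&
    limits.availableMemory >= best.avgHeapUsed;
  return eligible ? best.id : undefined;
}

const MB = 1024 ** 2;
const GB = 1024 ** 3;

// State from Example 1: App A is selected.
const picked = pickScaleUp(
  [
    { id: "A", elu: 0.85, workers: 2, maxWorkers: 10, avgHeapUsed: 500 * MB },
    { id: "B", elu: 0.3, workers: 1, maxWorkers: 10, avgHeapUsed: 300 * MB },
  ],
  new Set<string>(),
  { scaleUpELU: 0.8, maxTotalWorkers: 10, totalWorkers: 3, availableMemory: 4 * GB },
); // "A"
```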

Important Note on Scaling Limits: Unlike previous versions, the current algorithm does not reallocate workers between applications. If the maximum worker limit (maxTotalWorkers) is reached or there is insufficient memory, scaling up does not occur, even if other applications have low utilization. Configure appropriate min/max worker limits for each application so that critical applications can scale when needed.

Cooldown Period

After each scaling operation, the algorithm enters a cooldown period (cooldownSec, default: 20 seconds) to prevent rapid oscillations. No scaling decisions are executed during cooldown, even if a reactive or periodic trigger occurs.
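A cooldown gate can be as small as a timestamp comparison. In this sketch, which uses a hypothetical `Cooldown` class with times in milliseconds, a cycle would call `canScale` before applying any decision and `recordScaling` after one is applied.

```typescript
class Cooldown {
  private lastScaledAt = -Infinity; // no scaling has happened yet
  private readonly cooldownMs: number;

  constructor(cooldownSec = 20) {
    this.cooldownMs = cooldownSec * 1000;
  }

  // True once at least cooldownSec seconds have passed since the last scaling operation.
  canScale(now: number): boolean {
    return now - this.lastScaledAt >= this.cooldownMs;
  }

  recordScaling(now: number): void {
    this.lastScaledAt = now;
  }
}
```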

Configuration

Vertical scaler parameters

| Parameter | Description | Default |
| --- | --- | --- |
| maxTotalWorkers | Maximum total workers across all applications | `os.availableParallelism()` |
| maxTotalMemory | Maximum total memory that can be used by all workers (bytes) | 90% of system total memory |
| minWorkers | Minimum workers for each application | 1 |
| maxWorkers | Maximum workers for each application | `maxTotalWorkers` |
| scaleUpELU | ELU threshold to trigger scaling up (0-1) | 0.8 |
| scaleDownELU | ELU threshold to trigger scaling down (0-1) | 0.2 |
| timeWindowSec | Time window for averaging ELU metrics for scale-up decisions (seconds) | 10 |
| scaleDownTimeWindowSec | Time window for averaging ELU metrics for scale-down decisions (seconds) | 60 |
| cooldownSec | Cooldown period between scaling operations (seconds) | 20 |
| gracePeriod | Delay after worker startup before collecting metrics (milliseconds) | 30000 |
| scaleIntervalSec | Interval for periodic scaling checks (seconds) | 60 |
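The defaults in the table can be expressed as a typed config resolver. This is a sketch, not the scaler's configuration loader. `VerticalScalerConfig` and `resolveConfig` are hypothetical, and `os.availableParallelism()` requires Node.js 18.14 or later. For simplicity, `maxTotalMemory` here is computed from `os.totalmem()` rather than from a cgroup limit.

```typescript
import * as os from "node:os";

interface VerticalScalerConfig {
  maxTotalWorkers: number;
  maxTotalMemory: number;         // bytes
  minWorkers: number;
  maxWorkers: number;
  scaleUpELU: number;             // 0-1
  scaleDownELU: number;           // 0-1
  timeWindowSec: number;
  scaleDownTimeWindowSec: number;
  cooldownSec: number;
  gracePeriod: number;            // milliseconds
  scaleIntervalSec: number;
}

// Fills every missing field with the default listed in the parameter table.
function resolveConfig(user: Partial<VerticalScalerConfig> = {}): VerticalScalerConfig {
  const maxTotalWorkers = user.maxTotalWorkers ?? os.availableParallelism();
  return {
    maxTotalWorkers,
    maxTotalMemory: user.maxTotalMemory ?? Math.floor(os.totalmem() * 0.9),
    minWorkers: user.minWorkers ?? 1,
    maxWorkers: user.maxWorkers ?? maxTotalWorkers, // defaults to maxTotalWorkers
    scaleUpELU: user.scaleUpELU ?? 0.8,
    scaleDownELU: user.scaleDownELU ?? 0.2,
    timeWindowSec: user.timeWindowSec ?? 10,
    scaleDownTimeWindowSec: user.scaleDownTimeWindowSec ?? 60,
    cooldownSec: user.cooldownSec ?? 20,
    gracePeriod: user.gracePeriod ?? 30_000,
    scaleIntervalSec: user.scaleIntervalSec ?? 60,
  };
}
```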

Per-Application Configuration

Individual applications can override global limits using the applications parameter:

| Parameter | Description | Default |
| --- | --- | --- |
| minWorkers | Minimum workers for this application | 1 |
| maxWorkers | Maximum workers for this application | Global maxWorkers |

Example:

```json
{
  "maxTotalWorkers": 10,
  "applications": {
    "api-service": {
      "minWorkers": 2,
      "maxWorkers": 6
    },
    "background-worker": {
      "minWorkers": 1,
      "maxWorkers": 4
    }
  }
}
```
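Resolving an application's effective limits from a config shaped like this example could look like the following. `effectiveLimits` and the `other-service` id are hypothetical. Falling back to a global `minWorkers` before the default of 1 is an assumption based on the parameter tables.

```typescript
interface AppLimits {
  minWorkers?: number;
  maxWorkers?: number;
}

interface ScalerConfig {
  maxTotalWorkers: number;
  minWorkers?: number;
  maxWorkers?: number;
  applications?: Record<string, AppLimits>;
}

// Per-app values win; otherwise fall back to the global values, then the defaults.
function effectiveLimits(config: ScalerConfig, appId: string): { minWorkers: number; maxWorkers: number } {
  const app: AppLimits = config.applications?.[appId] ?? {};
  return {
    minWorkers: app.minWorkers ?? config.minWorkers ?? 1,
    maxWorkers: app.maxWorkers ?? config.maxWorkers ?? config.maxTotalWorkers,
  };
}

const config: ScalerConfig = {
  maxTotalWorkers: 10,
  applications: {
    "api-service": { minWorkers: 2, maxWorkers: 6 },
    "background-worker": { minWorkers: 1, maxWorkers: 4 },
  },
};

effectiveLimits(config, "api-service");   // { minWorkers: 2, maxWorkers: 6 }
effectiveLimits(config, "other-service"); // { minWorkers: 1, maxWorkers: 10 }
```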

Behavior Examples

Example 1: Scale Up (Under Limit, Sufficient Memory)

Initial State:

App A: 2 workers, ELU = 0.85, avg heap = 500MB
App B: 1 worker, ELU = 0.3, avg heap = 300MB
Total: 3 workers, Max: 10
Available memory: 4GB

Analysis:

App A exceeds scale-up threshold (0.85 > 0.8)
Under max worker limit (3 < 10)
Sufficient memory available (4GB > 500MB needed for new worker)

Decision: Scale up App A to 3 workers

Result:

App A: 3 workers
App B: 1 worker

Example 2: At Worker Limit - No Scaling

Initial State:

App A: 2 workers, ELU = 0.9, avg heap = 600MB
App B: 2 workers, ELU = 0.3, avg heap = 400MB
Total: 4 workers, Max: 4
Available memory: 2GB

Analysis:

App A needs scaling (ELU = 0.9 > 0.8)
At max worker limit (4 = 4)
App B is not below the scale-down threshold (0.3 > 0.2), so it is not scaled down

Decision: No scaling (at max worker limit)

Result:

App A: 2 workers (unchanged)
App B: 2 workers (unchanged)

Example 3: Scale Down Only

Initial State:

App A: 2 workers, ELU = 0.5
App B: 3 workers, ELU = 0.1
Total: 5 workers, Max: 10

Decision: Scale down App B to 2 workers (ELU below threshold)

Result:

App A: 2 workers (unchanged)
App B: 2 workers

Example 4: Multiple Scale Downs

Initial State:

App A: 3 workers, ELU = 0.15
App B: 2 workers, ELU = 0.18
App C: 2 workers, ELU = 0.6

Decision: Scale down both App A and App B (both ELUs are below 0.2 and both apps are above minWorkers)

Result:

App A: 2 workers
App B: 1 worker
App C: 2 workers

Example 5: No Action (Insufficient Memory)

Initial State:

App A: 2 workers, ELU = 0.85, avg heap = 1.5GB
App B: 1 worker, ELU = 0.3, avg heap = 500MB
Total: 3 workers, Max: 10
maxTotalMemory: 10GB
Currently used memory: 9GB
Available memory: 1GB (10GB - 9GB)

Analysis:

App A needs scaling (ELU = 0.85 > 0.8)
Under max worker limit (3 < 10)
Insufficient memory for new worker (1GB available < 1.5GB needed for new worker)
App B is not scaled down (its ELU of 0.3 is above the 0.2 threshold, and it is already at minWorkers = 1)

Decision: No scaling (insufficient memory for a new App A worker)
