Node.js runs on a single thread. One process handles one request at a time through an event loop, and no matter how fast that loop is, a single process has a ceiling. For production traffic beyond a few hundred concurrent users, that ceiling becomes a wall.
Horizontal scaling is the standard solution. You run multiple instances of your application and distribute incoming requests across them. Done correctly, scaling a Node.js application horizontally is straightforward. Done incorrectly, it produces subtle bugs that only appear in production: sessions that vanish, jobs that run twice, and caches that serve stale data.
This guide covers everything you need to scale a Node.js application horizontally — from the architectural prerequisites to auto-scaling configuration on Out Plane.
What Is Horizontal Scaling?
Horizontal scaling means running multiple instances of your application and distributing traffic across them. It is distinct from vertical scaling, which means upgrading to a larger machine with more CPU and RAM.
Both approaches increase capacity, but they have different tradeoffs:
| Approach | Method | Ceiling | Cost Model |
|---|---|---|---|
| Vertical | Bigger machine | Hardware limits | Expensive per unit |
| Horizontal | More instances | Near-unlimited | Proportional to load |
Vertical scaling is simpler — no application changes required. But it has a hard ceiling. A 64-core machine is still one machine. If it goes down, your application goes down.
Horizontal scaling eliminates that ceiling and that single point of failure. You can run 2 instances, 20, or 200. Traffic distributes across all of them. If one instance fails, others continue serving requests.
Why Node.js Specifically Benefits from Horizontal Scaling
Node.js is built around an event loop running in a single process. This design is efficient for I/O-bound workloads — database queries, API calls, file reads — because the event loop can handle thousands of concurrent I/O operations without blocking.
The limitation appears with CPU-bound work. When a request requires heavy computation — image processing, cryptography, complex data transformations — it blocks the event loop for that entire duration. Every other request waits.
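To see the effect, here is a minimal sketch (the busy-wait stands in for real CPU-bound work such as image resizing):

```js
// A tight loop stands in for CPU-bound work (image processing, crypto).
// While it runs, the event loop cannot service any other request or timer.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait: the event loop is stuck here
  }
}

// This timer is scheduled for 10ms, but it cannot fire until the
// synchronous work finishes, so it actually fires after ~100ms.
const start = Date.now();
setTimeout(() => {
  console.log(`timer fired after ${Date.now() - start}ms`);
}, 10);
blockFor(100);
```

Every request that arrives during those 100ms waits, no matter how trivial it is.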
Even without CPU-bound work, a single Node.js process running on a 2 vCPU machine leaves one CPU core idle. The process cannot utilize more than one core natively.
Horizontal scaling for Node.js means each instance gets its own event loop, its own CPU core, and its own memory. Traffic distributes across all of them. The application scales linearly: two instances handle roughly twice the throughput of one.
Prerequisites for Horizontal Scaling
You cannot add instances to an application and expect it to work correctly unless that application is designed for it. The prerequisite is a stateless architecture.
Your App Must Be Stateless
A stateless application stores no data in memory that other instances need to access. Every instance is interchangeable. Any instance can handle any request without requiring prior context from the same instance.
If your application stores anything in process memory that another instance would need to access, it is not ready for horizontal scaling.
Concrete examples of state that breaks horizontal scaling:
- In-memory sessions: `req.session` backed by `express-session`'s default MemoryStore. One instance creates the session; a different instance handles the next request and cannot find it.
- Local file storage: Uploading a file to the local filesystem means only the instance that received the upload has access to that file.
- In-process caches: A cache built with a plain JavaScript object or `Map` on each instance starts empty and diverges immediately. Cache invalidation becomes impossible.
- In-memory rate limiting: A rate limiter counting requests in process memory resets on each instance. A user can make 100 requests per minute to each instance independently.
The Stateless Checklist
Before configuring multiple instances, verify your application passes these checks:
- Sessions stored externally — database, Redis, or JWT
- File uploads sent to object storage (S3, R2, or similar), not the local filesystem
- Caches are external (Redis) or genuinely cache-agnostic (stale reads acceptable)
- No process-level singletons that accumulate state over time
- Rate limiting backed by a database or external store
- A health check endpoint exists at `/health` or `/healthz`
- Cron jobs configured to run on a single instance, not all instances simultaneously
If any item fails, fix it before scaling horizontally. Adding instances to a stateful application creates bugs, not capacity.
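As a sketch of the health check item, a minimal liveness handler (the route name and wiring are illustrative; keep the handler dependency-free so it stays fast):

```js
// Minimal liveness check: no database calls, no external lookups.
// The platform only needs to know the process can accept requests.
function healthCheck(req, res) {
  res.status(200).json({ status: "ok", uptime: process.uptime() });
}

// Wiring it into an Express app (illustrative):
// app.get("/healthz", healthCheck);
```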
Session Management Strategies
Sessions are the most common source of horizontal scaling problems in Node.js applications. There are three reliable strategies.
JWT Tokens (Stateless Sessions)
JSON Web Tokens move session state into the token itself. The server signs the token with a secret key; the client stores the token and sends it with each request. Any instance can verify the token's signature and read the session data without consulting a central store.
```js
const express = require("express");
const jwt = require("jsonwebtoken");

const app = express();
app.use(express.json());

const JWT_SECRET = process.env.JWT_SECRET;

// Issue a token on login
app.post("/auth/login", async (req, res) => {
  const user = await authenticateUser(req.body.email, req.body.password);
  if (!user) {
    return res.status(401).json({ error: "Invalid credentials" });
  }
  const token = jwt.sign(
    { userId: user.id, email: user.email },
    JWT_SECRET,
    { expiresIn: "24h" }
  );
  res.json({ token });
});

// Verify the token on protected routes
function authenticate(req, res, next) {
  const token = req.headers.authorization?.replace("Bearer ", "");
  if (!token) {
    return res.status(401).json({ error: "No token provided" });
  }
  try {
    req.user = jwt.verify(token, JWT_SECRET);
    next();
  } catch {
    return res.status(401).json({ error: "Invalid token" });
  }
}

app.get("/api/profile", authenticate, (req, res) => {
  res.json({ userId: req.user.userId, email: req.user.email });
});
```

JWT works well for APIs where clients are mobile apps or single-page applications. The tradeoff is that tokens cannot be revoked before expiry without a revocation list, which reintroduces server-side state.
Database Sessions
Store sessions in PostgreSQL using express-session with a persistent store adapter. All instances share the same database, so any instance can read any session.
```bash
npm install express-session connect-pg-simple
```

```js
const session = require("express-session");
const pgSession = require("connect-pg-simple")(session);
const { Pool } = require("pg");

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
});

app.use(
  session({
    store: new pgSession({
      pool,
      tableName: "user_sessions",
      createTableIfMissing: true,
    }),
    secret: process.env.SESSION_SECRET,
    resave: false,
    saveUninitialized: false,
    cookie: {
      secure: true,
      httpOnly: true,
      maxAge: 24 * 60 * 60 * 1000, // 24 hours
    },
  })
);
```

Run the session table migration once before deploying:
```sql
CREATE TABLE IF NOT EXISTS user_sessions (
  sid VARCHAR NOT NULL COLLATE "default" PRIMARY KEY,
  sess JSON NOT NULL,
  expire TIMESTAMP(6) NOT NULL
);

CREATE INDEX IF NOT EXISTS IDX_session_expire ON user_sessions (expire);
```

Database sessions work without additional infrastructure. If you already have PostgreSQL, you have everything you need.
External Cache (Redis)
For high-traffic applications where session reads happen on every request, Redis provides sub-millisecond lookup times that database reads cannot match at scale.
```bash
npm install express-session connect-redis redis
```

```js
const { createClient } = require("redis");
const RedisStore = require("connect-redis").default;

const redisClient = createClient({
  url: process.env.REDIS_URL,
});
await redisClient.connect();

app.use(
  session({
    store: new RedisStore({ client: redisClient }),
    secret: process.env.SESSION_SECRET,
    resave: false,
    saveUninitialized: false,
    cookie: { secure: true, httpOnly: true },
  })
);
```

Redis is the right choice when your session data is large, read frequently, or when session lookup latency is a performance bottleneck.
Handling WebSockets at Scale
WebSockets complicate horizontal scaling because they are inherently stateful. A WebSocket connection is a persistent TCP connection to a specific instance. When a client connects, it stays connected to that one instance until the connection closes.
This creates a problem: if client A and client B connect to different instances, they cannot receive messages from each other through the WebSocket layer unless the instances can communicate.
Option 1: Sticky Sessions
A load balancer with sticky sessions routes all requests from a given client to the same instance. This preserves WebSocket connections but undermines the load distribution benefits of horizontal scaling. It also means that when an instance restarts, every client pinned to it loses its connection and must rebuild state against a different instance.
Option 2: Redis Pub/Sub Adapter (Recommended)
The better approach is to use the Socket.io Redis adapter. When a server on instance A emits to a room or user, it publishes the event to Redis. The Redis adapter on all other instances receives that publication and emits to their locally connected clients.
```bash
npm install socket.io @socket.io/redis-adapter redis
```

```js
const { createServer } = require("http");
const { Server } = require("socket.io");
const { createClient } = require("redis");
const { createAdapter } = require("@socket.io/redis-adapter");

const httpServer = createServer(app);
const io = new Server(httpServer);

const pubClient = createClient({ url: process.env.REDIS_URL });
const subClient = pubClient.duplicate();
await Promise.all([pubClient.connect(), subClient.connect()]);

io.adapter(createAdapter(pubClient, subClient));

io.on("connection", (socket) => {
  socket.on("join-room", (roomId) => {
    socket.join(roomId);
  });

  socket.on("message", (roomId, data) => {
    // This emit reaches all clients in the room,
    // regardless of which instance they're connected to
    io.to(roomId).emit("message", data);
  });
});

httpServer.listen(process.env.PORT || 8080);
```

With the Redis adapter, every instance becomes aware of every connected client and every room. Scaling WebSocket applications horizontally requires no changes to your application logic beyond this adapter configuration.
Background Jobs and Queues
Background jobs introduce another scaling hazard. If your application uses setTimeout or setInterval to run scheduled work, every instance runs that work simultaneously when you scale horizontally.
Consider a job that sends a daily email digest. With one instance, it runs once per day. With five instances, it runs five times per day — one per instance. Each user receives five emails.
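The hazardous pattern looks like this (a sketch; `sendDailyDigest` is a hypothetical function):

```js
// Anti-pattern: this schedule starts in every instance's process.
// With five instances, sendDailyDigest fires five times per day.
function scheduleDigest(sendDailyDigest) {
  const ONE_DAY_MS = 24 * 60 * 60 * 1000;
  return setInterval(sendDailyDigest, ONE_DAY_MS);
}
```

The fix is not to guard the timer with per-instance flags; it is to move the work to an external queue.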
Use an External Job Queue
Move all background work to an external queue. Workers on each instance pull jobs from the queue. Because a job is only delivered to one worker, it runs exactly once regardless of instance count.
BullMQ (requires Redis):
```bash
npm install bullmq
```

```js
const { Queue, Worker } = require("bullmq");

const connection = { url: process.env.REDIS_URL };

// Add jobs to the queue (any instance can do this)
const emailQueue = new Queue("emails", { connection });
await emailQueue.add("send-digest", {
  userId: user.id,
  email: user.email,
});

// Workers pull from the queue (each job runs on exactly one worker)
const worker = new Worker(
  "emails",
  async (job) => {
    if (job.name === "send-digest") {
      await sendDigestEmail(job.data.userId, job.data.email);
    }
  },
  { connection }
);
```

pg-boss (PostgreSQL-backed, no Redis required):
```bash
npm install pg-boss
```

```js
const PgBoss = require("pg-boss");

const boss = new PgBoss(process.env.DATABASE_URL);
await boss.start();

// Schedule a recurring job — only runs on one instance
await boss.schedule("send-daily-digest", "0 8 * * *");

await boss.work("send-daily-digest", async (jobs) => {
  for (const job of jobs) {
    await sendDailyDigest(job.data);
  }
});
```

pg-boss uses PostgreSQL advisory locks to ensure jobs run on exactly one instance. If you already have PostgreSQL and want to avoid adding Redis, pg-boss provides a complete queue solution without additional infrastructure.
Configuring Auto-Scaling on Out Plane
Once your application is stateless and ready for multiple instances, configuring auto-scaling on Out Plane takes under five minutes.
Navigate to your application in console.outplane.com and open the Scaling settings. You will see two controls:
- Minimum instances: The floor — the number of instances always running. Set to `1` for applications that need to be always-on with no cold starts. Set to `0` to enable scale-to-zero for applications with intermittent traffic.
- Maximum instances: The ceiling — the maximum number of instances Out Plane will start during a traffic spike.
Out Plane scales based on incoming request volume automatically. You set the bounds; the platform handles the rest. There are no HPA manifests to configure, no custom metrics to expose.
Choosing your max instance count:
Start by estimating requests per second at peak load. Run a load test to find your application's per-instance throughput ceiling (more on this below). Divide peak RPS by per-instance capacity to get the lowest safe value for your maximum instance count, then add 20–30% headroom.
For example: peak load is 500 requests per second, and load testing shows one instance handles 60 RPS before CPU saturates. You need at least 9 instances at peak. Set max to 12 for headroom.
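That arithmetic can be captured in a small helper (a sketch; the 25% default headroom is an assumption — tune it to your risk tolerance):

```js
// Estimate the max-instance setting from load test results.
// peakRps: expected peak requests/second; perInstanceRps: measured ceiling.
function requiredInstances(peakRps, perInstanceRps, headroom = 0.25) {
  const atPeak = Math.ceil(peakRps / perInstanceRps); // instances needed at peak
  return Math.ceil(atPeak * (1 + headroom));          // plus headroom
}

// 500 RPS peak, 60 RPS per instance → 9 at peak, 12 with headroom
```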
Instance sizing:
Each instance is an isolated container running your application. Out Plane instance types range from op-20 (0.5 vCPU, 512MB RAM) to op-94 (32 vCPU, 64GB RAM). For most Node.js applications, start with op-22 (1 vCPU, 1GB RAM) or op-24 (2 vCPU, 2GB RAM) and adjust based on metrics.
Smaller instances with higher max counts often provide better cost efficiency than large instances. Each instance runs your application independently, and per-second billing means you pay only for active instances during a traffic spike. Instances that scale down stop incurring cost immediately.
Monitor CPU and memory per instance in the Metrics view. If CPU consistently reaches 80%+ before traffic peaks, scale up instance size. If memory is the constraint, profile your application for memory leaks before increasing RAM.
Database Connection Management
Every instance you add opens its own pool of database connections. This creates a scaling constraint that catches many teams off guard: PostgreSQL has a maximum connection limit, and it is shared across all application instances.
The default PostgreSQL max_connections is 100 (some managed providers set it higher). With 10 instances each holding a pool of 20 connections, you reach 200 connections — exceeding the default limit. New connections fail. Your application returns 500 errors.
Calculate your pool size per instance before scaling:
```
pool_size_per_instance = max_connections / max_instances
```
If your database allows 200 connections and you might run 10 instances, each instance should pool no more than 20 connections.
```js
const { Pool } = require("pg");

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: parseInt(process.env.DB_POOL_SIZE || "10"), // per instance
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

// Always release connections after use
app.get("/api/users", async (req, res) => {
  const client = await pool.connect();
  try {
    const result = await client.query("SELECT id, email FROM users LIMIT 50");
    res.json(result.rows);
  } finally {
    client.release(); // Critical — forgetting this exhausts the pool
  }
});
```

For applications that scale to 20+ instances, consider a connection pooler like PgBouncer or Supavisor. These tools accept thousands of application connections but maintain a smaller pool against the actual database, allowing much higher instance counts without exhausting `max_connections`.
Set DB_POOL_SIZE as an environment variable so you can adjust it without redeploying.
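For reference, a minimal PgBouncer configuration sketch in transaction-pooling mode (hostnames and numbers are illustrative, not recommendations for your workload):

```ini
; pgbouncer.ini — illustrative values
[databases]
; application instances connect to PgBouncer as if it were the database
appdb = host=10.0.0.5 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction      ; share server connections between transactions
max_client_conn = 1000       ; connections accepted from app instances
default_pool_size = 20       ; actual connections held to PostgreSQL
```

Note that transaction pooling is incompatible with session-level features such as named prepared statements in some drivers; check your driver's documentation before enabling it.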
Load Testing Your Scaled Application
Before sending production traffic to a horizontally scaled configuration, verify it behaves correctly under load. Load testing serves two purposes: finding the per-instance capacity ceiling and catching stateful bugs that only manifest when requests hit different instances.
Tools:
- autocannon — fast, runs from Node.js, good for quick benchmarks
- Artillery — YAML-based scenarios, good for complex user flows
- k6 — JavaScript-based, good for programmable test scenarios
A minimal autocannon benchmark:
```bash
npm install -g autocannon
autocannon -c 100 -d 30 http://localhost:8080/api/users
```

This runs 100 concurrent connections for 30 seconds. The output shows requests per second, latency percentiles, and error rates.
What to look for during load testing:
- Session consistency: Log in with a test user and make 1000 requests across the session. Verify authentication remains valid on every request. If sessions expire randomly, your session store is not shared correctly.
- CPU per instance: Check the Metrics view in Out Plane during the test. Note the CPU percentage where response times start increasing. That is your instance capacity ceiling.
- Database connections: Check active connections on your database during the test. Verify you are not approaching `max_connections`.
- Error rate: Any non-zero error rate during the test needs investigation before production traffic.
- Scale-up behavior: Watch Out Plane start new instances as load increases. Verify new instances begin receiving traffic within an acceptable time window.
Repeat the load test with two instances, then four. Throughput should scale roughly linearly. If it does not, there is likely a shared bottleneck — database connections, an external API rate limit, or remaining shared state.
Common Pitfalls
In-memory rate limiting. Libraries like express-rate-limit default to an in-memory store. With five instances, each instance maintains independent counters. A user can send 100 requests per minute to each instance. Switch to a Redis store or a database-backed store before enabling multiple instances.
```bash
npm install rate-limit-redis
```

```js
const rateLimit = require("express-rate-limit");
const RedisStore = require("rate-limit-redis");

app.use(
  rateLimit({
    windowMs: 60 * 1000,
    max: 100,
    store: new RedisStore({
      sendCommand: (...args) => redisClient.sendCommand(args),
    }),
  })
);
```

File system writes. Any file written to the local filesystem is only accessible on the instance that wrote it. User avatars, export files, and temporary uploads must go to object storage. If your application writes to `./uploads`, it will not work with more than one instance.
Cron jobs on every instance. Most job schedulers run on every process by default. node-cron, node-schedule, and similar libraries start their schedules in every instance. A nightly cleanup job runs N times with N instances. Use pg-boss or BullMQ, or implement leader election to designate one instance as the scheduler.
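If you do need a scheduler inside the app, one common approach (sketched here under assumptions; not from the libraries above) is leader election with a PostgreSQL advisory lock: every instance tries to take the same lock, and only the holder runs the schedule.

```js
// Hypothetical leader election via a PostgreSQL session advisory lock.
// Exactly one connected instance can hold a given lock id at a time.
async function tryBecomeLeader(pgClient, lockId = 42) {
  const { rows } = await pgClient.query(
    "SELECT pg_try_advisory_lock($1) AS acquired",
    [lockId]
  );
  return rows[0].acquired; // true on the leader, false everywhere else
}

// Usage sketch: only the leader starts cron schedules.
// if (await tryBecomeLeader(client)) startCronJobs();
```

The lock is tied to the database session, so the leader must hold its connection open, and the remaining instances should retry periodically so a new leader is elected if the current one dies.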
Assuming request order. Load balancers distribute requests across instances without guaranteeing order. A POST request that creates a resource and a subsequent GET request that reads it may hit different instances. This is fine if your database is the source of truth, but it breaks if you cache the created resource in memory on the writing instance.
Logging to local files. Logs written to files are isolated per instance. Use stdout and stderr for all log output. Out Plane captures all stdout logs and makes them visible in the HTTP Logs and application Logs views.
The Architecture at Scale
A horizontally scaled Node.js application running on Out Plane follows this pattern:
```
Users
  |
  v
Load Balancer (Out Plane)
  |
  +---> Instance 1 (Node.js process) ──┐
  |                                    |
  +---> Instance 2 (Node.js process) ──+──> PostgreSQL
  |                                    |
  +---> Instance N (Node.js process) ──+──> Redis
                                       |
                                       └──> Object Storage (files)
```
Each instance is identical and stateless. The load balancer distributes incoming HTTP requests. State lives in PostgreSQL, Redis, and object storage — all external to the application processes.
Out Plane handles the load balancer and instance orchestration. You provide the application container. The platform manages starting new instances when traffic rises and shutting them down when traffic falls.
This architecture scales Node.js applications from a single instance handling moderate traffic to dozens of instances handling millions of requests per hour. The application code does not change between one instance and twenty. Only the instance count changes.
Summary
Horizontal scaling for Node.js applications follows a clear sequence:
1. Make your application stateless. Move sessions to PostgreSQL or Redis. Move files to object storage. Move background jobs to BullMQ or pg-boss. Move rate limiters to a shared store.
2. Handle WebSockets correctly. Use the Socket.io Redis adapter if your application uses WebSockets. It ensures events reach clients on any instance.
3. Configure database connection pooling per instance. Calculate `max_connections / max_instances` and set that as your pool size. Use a connection pooler if you need to run more than 10–15 instances.
4. Load test before enabling auto-scaling. Find your per-instance capacity ceiling. Verify no stateful bugs surface when requests hit different instances.
5. Set min/max instance counts on Out Plane. The platform handles scaling automatically within those bounds. Per-second billing means cost matches actual load, not your peak capacity reservation.
A stateless application running on Out Plane with auto-scaling configured handles production traffic spikes without manual intervention. New instances start within seconds. Instances that are no longer needed shut down and stop incurring cost.
For deployment setup, see how to deploy an Express.js application or the NestJS deployment guide. If you are evaluating whether a monolith or microservices architecture fits your scaling needs, the microservices vs monolith comparison covers the tradeoffs in detail.
Ready to scale your Node.js application? Configure auto-scaling and deploy at console.outplane.com.