
Scale to Zero: The Complete Guide to Pay-Nothing-When-Idle Infrastructure

Daniel Brooks · 8 min read

Traditional hosting charges the same whether your application serves 10,000 requests per hour or zero. A development environment running at 3am on a Tuesday costs the same as one handling a product launch. Scale to zero changes this fundamental equation: when your application has no traffic, it scales down to zero instances and you pay nothing. When traffic arrives, instances spin up automatically and serve requests.

This guide explains how scale to zero works at the infrastructure level, when it makes sense to use it, when to avoid it, and how to configure it effectively for different workload types.

What Is Scale to Zero?

Scale to zero is an infrastructure pattern where an application automatically reduces its running instances to zero when there is no incoming traffic, and automatically starts new instances when requests arrive.

When min instances is set to 0, the platform terminates all running containers after a period of inactivity. The application occupies no CPU, no memory, and no compute resources. On a per-second billing model, it costs exactly nothing. When a new request comes in, the platform detects it, starts a container from the existing image, waits for the health check to pass, and routes the request to the new instance.

This is meaningfully different from two patterns it's often confused with.

Serverless functions (AWS Lambda, Cloudflare Workers) also scale to zero, but they run individual functions rather than full applications. Even where a serverless platform accepts container images, you're still writing handlers for its function runtime, not running a long-lived Express, Django, or Rails server. The execution model is different: functions run for milliseconds to seconds and terminate. Scale-to-zero containers run your full application stack.

Sleeping dynos (Heroku's free tier behavior) are a different mechanism. Heroku would sleep a dyno after 30 minutes of inactivity, and the next request would wait 10 to 30 seconds for it to wake. This was unpredictable and not configurable. Scale to zero is an intentional, configurable setting — not a cost-cutting workaround imposed by the platform.

The key distinction: scale to zero runs your full Docker container with your full application. You get the same environment in a scaled-down state as you do at full traffic. Nothing changes about how your application behaves, only whether instances are running.

How Scale to Zero Works

Understanding the lifecycle helps you reason about when it's appropriate.

Idle detection and scale-down. The platform monitors incoming traffic for each application. After a configurable period of inactivity — typically 5 to 15 minutes of zero requests — it terminates running instances. The container image stays cached. Environment variables, networking configuration, and everything else remains intact. Only the running processes stop.

Cold start sequence. When a request reaches an application at zero instances, the platform queues the request and starts the scale-up process:

  1. The platform detects an incoming request with no available instances
  2. It pulls the cached container image (or builds if needed)
  3. The container starts and the application process initializes
  4. The health check endpoint confirms the application is ready
  5. The queued request is routed to the new instance

This process takes seconds, not milliseconds. The exact duration depends on your image size, application startup time, and health check configuration. A lightweight Node.js Express app starting from a cached alpine image might be ready in 2 to 4 seconds. A Java Spring Boot application initializing its full context might take 8 to 15 seconds.

Warm period. After the cold start, the instance stays running. Subsequent requests within that window hit a warm instance with no delay. The idle timer resets on every request. An application receiving even occasional traffic — a request every few minutes — may rarely cold start at all.

Scale-up under load. As traffic increases beyond what a single instance handles, the platform adds more instances. This follows normal auto-scaling behavior governed by your max instance configuration. The difference from traditional auto-scaling is the lower bound: instances can go all the way to zero rather than staying at a minimum of one.

Scale to Zero vs. Traditional Hosting vs. Serverless

| Feature | Traditional Hosting | Scale to Zero | Serverless Functions |
|---|---|---|---|
| Minimum cost | Fixed monthly | $0 when idle | $0 when idle |
| Cold start | None | Seconds | Milliseconds to seconds |
| Application type | Any | Any container | Functions only |
| State | Persistent | Ephemeral | Stateless |
| Scaling | Manual or HPA | Automatic | Automatic |
| Runtime limit | None | None | 5 to 15 minutes |
| Docker support | Yes | Yes | Limited |
| Language support | Any | Any | Platform-specific runtimes |
| Cold start predictability | N/A | Consistent | Variable |

Traditional hosting wins on latency consistency. Serverless wins on cold start speed for lightweight functions. Scale to zero occupies the middle: full application runtime with zero idle cost, at the expense of cold start latency you can predict and optimize.

When Scale to Zero Makes Sense

The right use cases share a common trait: traffic is predictable, bursty, or infrequent enough that idle time represents most of the billing period.

Development and Staging Environments

This is the clearest win. Development environments exist to test code changes. They're not serving real users. They sit idle overnight, on weekends, and most of the workday while engineers are writing code rather than testing it.

A development environment that runs actively for 8 hours per day saves approximately 67% on compute costs compared to an always-on instance. A staging environment that sees activity only during CI/CD pipeline runs or QA sessions might be active 2 to 4 hours per day, saving 83 to 92%.

Scale to zero is particularly effective for preview deployments. When a team creates a deployment for each pull request, those preview environments might see traffic once or twice during review and then sit idle. Paying full price for dozens of idle preview environments adds up quickly. With scale to zero, each preview environment costs nothing until someone opens it.

Internal Tools

Admin dashboards, internal reporting tools, and management APIs share a usage pattern: they're accessed during business hours and ignored otherwise. An admin dashboard used 9am to 6pm Monday through Friday is idle for roughly 73% of the week.

Internal tools also tend to have users with higher tolerance for a brief wait. An admin user loading a dashboard and waiting 3 to 4 seconds for a cold start is a minor inconvenience, not a user experience failure. The same delay on a public-facing product page is unacceptable.

Reporting tools that run scheduled jobs or generate reports on request are ideal candidates. The tool wakes up, does its work, and the next user might not come along for hours.

Side Projects and MVPs

Validating an idea means building something people can use, not optimizing infrastructure costs. Scale-to-zero hosting lets you deploy a real application at effectively zero cost during the pre-traction phase.

A side project that gets 50 visitors per day, clustered in a few hours of activity, might cost a few cents per month with per-second billing and scale to zero. The same application on a fixed $10 or $20 per month plan costs the same whether it gets 50 visitors or 50,000.

This changes the economics of experimentation. You can run ten small projects simultaneously, paying only for the ones that actually get used.

Webhook Receivers and Background Processors

Services that respond to external events — payment webhooks, GitHub hooks, notification processors — often have highly unpredictable traffic. They might receive 100 requests in a minute after a deployment, then nothing for hours.

A webhook receiver at zero instances starts up when the webhook fires, processes it, and returns to zero. The billing reflects actual processing time, not calendar time.
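A minimal webhook receiver can be sketched with Python's standard library. This is an illustrative sketch, not platform code: the `/webhook` path, the signature-verification step, and the 204 response are assumptions about a typical receiver, not part of the article.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    """Sketch of a webhook receiver. On a scale-to-zero platform this
    process only runs while events are arriving."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)  # raw webhook body
        # ... verify the sender's signature and enqueue/process `payload` here ...
        self.send_response(204)  # acknowledge quickly so the sender doesn't retry
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the example quiet

if __name__ == "__main__":
    HTTPServer(("", 8080), WebhookHandler).serve_forever()
```

Responding before doing heavy work matters here: most webhook senders retry on slow or failed deliveries, so acknowledge fast and process asynchronously.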

When Not to Use Scale to Zero

Scale to zero is not appropriate for every workload. Be clear about the tradeoffs.

Production web applications with real user traffic. A 3 to 8 second cold start on a landing page or web application will frustrate users and hurt conversion rates. If your production application has steady traffic throughout the day, keep a minimum of 1 instance running. The cost of one always-on instance is far less than the cost of poor user experience.

WebSocket applications. WebSocket connections require a persistent, long-lived connection between client and server. An application that has scaled to zero cannot maintain a WebSocket connection — there's nothing to connect to. Applications using WebSockets, Server-Sent Events, or long-polling need a persistent minimum instance count.

Applications with in-memory state. If your application stores session data, cache, or shared state in process memory, scaling to zero destroys that state. Requests after a cold start begin with a fresh process. Applications that require in-memory state need either persistent instances or external state management (Redis, a database) before scale to zero is viable.

Latency-sensitive services. Any service where sub-100ms response times are a requirement — high-frequency APIs, real-time data feeds, latency-sensitive financial operations — cannot tolerate cold starts. Keep these services running.

Background workers that run continuously. A queue consumer that processes jobs continuously has no idle periods. Scaling it to zero would interrupt job processing and potentially lose in-flight work. Keep continuous workers at a minimum of 1 instance.

Configuring Scale to Zero on Out Plane

Setting up scale to zero takes under two minutes.

  1. Go to console.outplane.com and select your application.
  2. Navigate to the scaling settings for your application.
  3. Set Min Scale to 0. This tells the platform it can terminate all instances when the application is idle.
  4. Set Max Scale based on your expected peak traffic. Start conservative — 2 to 5 instances handles most low-to-moderate traffic workloads. Adjust based on observed behavior.
  5. Deploy your configuration. The change takes effect on the next deployment or scaling event.

Your application now scales to zero when idle. Out Plane's per-second billing means you pay nothing during those idle periods — not a fraction of a cent, not a minimum charge. Zero.

When configuring Max Scale, consider what happens during a cold start under load. If 100 requests arrive simultaneously to an application at zero instances, the platform will start multiple instances in parallel. Your max instance count is the ceiling on that expansion.
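The relationship between burst size and Max Scale can be modeled in a few lines. This is an illustrative model, not Out Plane's actual scheduler; the per-instance concurrency figure is an assumption you'd measure for your own application.

```python
import math

def instances_needed(requests: int, per_instance_concurrency: int, max_scale: int) -> int:
    """Illustrative model of parallel scale-up from zero: the number of
    instances started to absorb a burst, capped by Max Scale."""
    needed = math.ceil(requests / per_instance_concurrency)
    return min(max(needed, 1), max_scale)
```

With 100 simultaneous requests and an assumed 25 concurrent requests per instance, `instances_needed(100, 25, 5)` starts 4 instances in parallel, while a Max Scale of 3 caps the same burst at 3, queueing the overflow.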

Optimizing Cold Start Times

Cold starts are the cost of scale to zero. The goal is to make them short and predictable.

Use small base images. Alpine-based images are 5 to 50MB. Ubuntu-based images are 100 to 400MB. Smaller images pull faster and start faster. Use node:alpine, python:alpine, golang:alpine as your base and install only what your application requires.

Minimize application startup time. Profile how long your application takes to go from container start to first successful health check. Common culprits: database connection pooling that waits for multiple connections on startup, loading large ML models or configuration files, running database migrations on every startup.

Separate initialization from startup. If your application runs database migrations on every start, it adds seconds to every cold start. Run migrations as a separate deployment step. If your application loads a large configuration file, load it lazily when first needed rather than at startup.
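The lazy-loading pattern above can be sketched in Python with `functools.lru_cache`: the file is read on first use rather than at process start, so it adds nothing to the cold start. The `config.json` path is illustrative.

```python
import json
from functools import lru_cache

@lru_cache(maxsize=1)
def get_config(path: str = "config.json") -> dict:
    """Load configuration on first access, then serve the cached copy.
    Startup pays nothing; the first request that needs config pays once."""
    with open(path) as f:
        return json.load(f)
```

The same idea applies to any expensive startup work: defer it behind a cached accessor so the health check can pass before the work happens.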

Define a fast health check endpoint. Your health check endpoint determines when the platform considers an instance ready. A health check that queries the database on every call adds latency to cold starts. A health check that returns 200 OK immediately makes the instance available faster, and you can have a separate liveness check for deeper validation.
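A fast readiness check next to a deeper liveness check can be sketched framework-agnostically as a WSGI app. The `/healthz` and `/livez` paths are assumed names, not a platform requirement.

```python
def app(environ, start_response):
    """Minimal WSGI app: /healthz returns immediately so the platform marks
    the instance ready fast; /livez is where deeper validation would live."""
    path = environ.get("PATH_INFO", "/")
    if path == "/healthz":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]  # no database call: readiness stays fast on cold start
    if path == "/livez":
        # deeper liveness validation (e.g. a database ping) would go here
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"alive"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

Pointing the platform's readiness probe at the cheap endpoint shaves the probe interval off every cold start.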

Pre-warm with scheduled pings. For applications where cold starts happen at predictable times — an internal tool used every morning, a report that runs at 9am — a scheduled lightweight request 30 to 60 seconds before expected usage keeps the instance warm. A simple cron job hitting the health check endpoint accomplishes this without any code changes.
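A pre-warm ping needs nothing beyond the standard library. This is a sketch: the `/healthz` path is an assumed health check endpoint, and the script would be invoked by whatever scheduler you already have (cron, CI, etc.).

```python
import urllib.request

def prewarm_url(base_url: str, path: str = "/healthz") -> str:
    """Build the health check URL to ping; path is an assumption."""
    return base_url.rstrip("/") + path

def prewarm(base_url: str, timeout: float = 10.0) -> int:
    """Hit the health check so the platform cold-starts the instance now,
    before real users arrive. Any 2xx status means the instance is warm."""
    with urllib.request.urlopen(prewarm_url(base_url), timeout=timeout) as resp:
        return resp.status
```

Scheduled a minute before the morning's first expected user, this turns a visible cold start into an invisible one.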

Framework-specific considerations:

  • Next.js: The Next.js startup time is dominated by route compilation. Pre-build your routes and avoid heavy getServerSideProps operations that block the first render.
  • FastAPI: FastAPI starts quickly. Watch for Pydantic model compilation and heavy import chains. Consider using startup event handlers only for operations that cannot be deferred.
  • Express: Node.js starts fast. The main overhead is usually connecting to external services (databases, Redis). Use lazy initialization: connect when first needed, not at module load time.
  • Spring Boot: Java has the slowest cold starts of common web frameworks. Use Spring Boot's lazy initialization feature (spring.main.lazy-initialization=true) and GraalVM native images for production workloads where cold starts matter.

Cost Savings Calculation

Per-second billing combined with scale to zero makes cost predictable based on actual usage. Here's what real usage patterns look like in practice.

Development environment (8 hours active per day):

  • Hours active: 8 out of 24 (33% utilization)
  • Hours idle: 16 out of 24 (67% idle)
  • Savings vs. always-on: approximately 67%
  • A $50/month always-on instance costs roughly $16.50/month with scale to zero

Internal admin tool (business hours only, Monday through Friday):

  • Hours active: 45 out of 168 per week (27% utilization)
  • Hours idle: 123 out of 168 (73% idle)
  • Savings vs. always-on: approximately 73%

Internal reporting tool (2 hours active per day):

  • Hours active: 2 out of 24 (8% utilization)
  • Hours idle: 22 out of 24 (92% idle)
  • Savings vs. always-on: approximately 92%
  • A $50/month always-on instance costs roughly $4/month with scale to zero

Side project in early traction (1 hour active per day):

  • Hours active: 1 out of 24 (4% utilization)
  • Hours idle: 23 out of 24 (96% idle)
  • Savings vs. always-on: approximately 96%

A development environment that runs 8 hours per day instead of 24 saves approximately 67% on compute costs compared to always-on hosting. For a team running 10 development environments, that's the equivalent of 6 or 7 full instances saved every month.

The compounding effect across multiple environments is significant. A team with 5 developers, each with a dev environment and a staging environment, runs 10 non-production deployments. Scale to zero on all 10 might cost as much as 3 always-on instances.
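The calculations above reduce to one formula: with per-second billing and scale to zero, cost is proportional to active hours. A back-of-the-envelope sketch:

```python
def monthly_cost(always_on_price: float, active_hours_per_day: float) -> float:
    """Scale-to-zero cost as the always-on price scaled by utilization."""
    return round(always_on_price * active_hours_per_day / 24, 2)

def savings_pct(active_hours_per_day: float) -> int:
    """Percentage saved versus an always-on instance."""
    return round((1 - active_hours_per_day / 24) * 100)
```

`monthly_cost(50, 8)` and `monthly_cost(50, 2)` reproduce the dev-environment and reporting-tool figures above, and `savings_pct` gives the 67% and 92% savings directly.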

Scale to Zero with Databases

Database connectivity requires specific consideration when running scale-to-zero applications.

Databases themselves do not scale to zero. Out Plane's managed PostgreSQL instances run continuously. Your database is always available, which is the correct behavior — you don't want to wait for a database cold start in addition to an application cold start. Only your application instances scale to zero. The database connection is reestablished when the new instance starts.

Connection pooling behavior changes with scale to zero. Traditional connection pooling assumes a long-running application process that maintains a pool of open connections. With scale to zero, your application process terminates, closing all connections. When a new instance starts, it establishes new connections. For applications with rapid scale-up (many instances starting simultaneously), this can create a connection spike.

Configure your connection pool to open a small number of connections on startup rather than eagerly filling the pool. In most frameworks:

  • Prisma: Set connection_limit in your DATABASE_URL to a low value (2 to 5 per instance)
  • SQLAlchemy: Set pool_size=2 and max_overflow=3 for development instances
  • pg (Node.js): Set max: 5 in your pool configuration for non-production workloads
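For the SQLAlchemy case, the settings above can be collected in one place. This is a sketch: `pool_settings` is a hypothetical helper, its keys mirror SQLAlchemy's `create_engine` arguments, and the values follow the guidance above rather than any platform default.

```python
import os

def pool_settings(production: bool = False) -> dict:
    """Hypothetical helper: SQLAlchemy-style pool settings sized for
    scale-to-zero instances, where many processes may start at once."""
    return {
        "pool_size": 5 if production else 2,       # connections opened per instance
        "max_overflow": 10 if production else 3,   # extra connections under load
        "pool_pre_ping": True,  # validate connections on checkout after a cold start
    }

url = os.environ.get("DATABASE_URL", "postgresql://localhost/app")
# engine = create_engine(url, **pool_settings())  # with SQLAlchemy installed
```

Keeping `pool_size` low per instance means ten instances cold-starting simultaneously open 20 connections, not 100.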

Use the DATABASE_URL environment variable. Out Plane injects the database connection string as DATABASE_URL for managed PostgreSQL instances. Your application reads this at startup on every cold start. You never hardcode connection strings or credentials.

The pattern is clean: your application starts, reads DATABASE_URL, opens a minimal connection pool, passes the health check, and serves requests. The database was waiting the whole time.

Summary

Scale to zero is the most cost-efficient hosting model for workloads with substantial idle periods, whether those periods are regular or unpredictable. The economics are straightforward: you pay for the compute time your application uses, not the calendar time it exists.

The right use cases are clear:

  • Development and staging environments — idle most of the time, tolerate cold starts, significant savings
  • Internal tools — business-hours usage, users accept brief delays, 70 to 90% savings
  • Side projects and MVPs — pre-traction traffic, pay only for actual visitors
  • Webhook receivers and event processors — idle between events, scale up on demand

The wrong use cases are equally clear:

  • Public-facing production apps with real users — cold start latency degrades user experience
  • WebSocket or persistent connection applications — require running instances
  • Latency-sensitive APIs — cannot absorb cold start delay
  • Continuous background workers — have no idle periods

Configure it by setting Min Scale to 0 in your application's scaling settings on console.outplane.com. Optimize cold starts by reducing image size and application startup time. Use connection pooling that handles connection creation at cold start without overwhelming your database.

For most teams, scale to zero belongs on every non-production environment by default. The savings compound across environments and pay for production capacity where always-on instances are genuinely needed.


Tags

scaling
infrastructure
cost-optimization
serverless
auto-scaling
devops

Start deploying in minutes

Connect your GitHub repository and deploy your first application today. $20 free credit. No credit card required.