Node.js Production Best Practices
Node.js production applications require specific handling to run reliably. The default development setup, running node app.js in a terminal, falls apart when the process crashes, the machine reboots, or traffic outgrows what a single process on one core can serve. This guide covers the essential patterns that separate a hobby project from something you can actually deploy and maintain.
Process Management with PM2
PM2 is a widely used process manager for Node.js in production. It monitors your process, restarts it on crashes, aggregates logs, and can run multiple instances behind its built-in load balancer.
Install it globally first:
npm install -g pm2
The global install makes PM2 available as a system command. Once installed, you can launch any Node.js script through it rather than running node directly. This single change gives you crash recovery, log management, and process listing out of the box.
The most important flag is -i max, which spreads your app across all available CPU cores:
pm2 start app.js -i max
That single command launches one worker process per core, and PM2’s built-in load balancer (built on Node’s cluster module) distributes incoming connections among them. If one worker crashes, the others keep serving while PM2 restarts it.
For anything beyond a quick test, use an ecosystem.config.js file instead of command-line flags. This gives you reproducible configuration, environment switching, and fine-tuned restart behavior:
module.exports = {
  apps: [{
    name: 'my-api',
    script: './dist/index.js',
    instances: 'max',
    exec_mode: 'cluster',
    env: {
      NODE_ENV: 'development',
    },
    env_production: {
      NODE_ENV: 'production',
      PORT: 3000,
    },
    wait_ready: true,
    kill_timeout: 5000,
    max_memory_restart: '1G',
    autorestart: true,
    max_restarts: 10,
    min_uptime: '10s',
    restart_delay: 4000,
  }]
};
A few options here deserve attention. exec_mode: 'cluster' is required when you set instances: 'max' in a config file, since PM2 defaults to fork mode (a single process) and cluster mode is what enables the load balancer. wait_ready: true tells PM2 not to treat a worker as online until the app sends a “ready” message, which matters for graceful startup sequencing and zero-downtime reloads. kill_timeout is how long, in milliseconds, PM2 waits after signalling a worker to stop before it sends SIGKILL. max_memory_restart automatically restarts workers that exceed the configured memory threshold, which catches memory leaks before they take down your server.
Start with production environment variables using:
pm2 start ecosystem.config.js --env production
Beyond startup commands, the PM2 CLI gives you visibility and control over every running process. You can inspect process status, stream logs in real time, restart workers, and monitor resource usage without leaving the terminal.
Useful PM2 commands to know:
pm2 list # see all processes
pm2 logs my-api # stream logs
pm2 restart my-api # restart
pm2 stop my-api # stop without removing
pm2 delete my-api # remove from PM2 registry
pm2 monit # real-time CPU/memory dashboard
pm2 describe my-api # details on a specific process
Graceful Shutdown
When a production server needs to restart (deploying a new version, scaling down, or receiving a termination signal from Kubernetes), you want zero dropped requests. Graceful shutdown handles this by stopping new connections while letting in-flight requests finish.
The pattern works for any HTTP framework:
const server = app.listen(3000);

process.on('SIGTERM', () => {
  console.log('SIGTERM received, starting graceful shutdown');

  server.close(async () => {
    // Close database connections first
    await db.close();
    // Then caches, queues, etc.
    await cache.quit();
    console.log('All connections closed');
    process.exit(0);
  });

  // Safety net: force exit after 10s
  setTimeout(() => {
    console.error('Forced shutdown after timeout');
    process.exit(1);
  }, 10000);
});

process.on('SIGINT', () => {
  // Ctrl+C in dev triggers the same flow
  process.emit('SIGTERM');
});
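A few things to understand here. SIGTERM is the signal Kubernetes and Docker send when they want your container to stop, which makes it the standard termination signal in production. PM2 sends SIGINT by default when it stops or reloads a process, and the SIGINT handler above routes that into the same flow. server.close() stops accepting new connections and lets in-flight requests finish; its callback runs only once every connection has ended, so on Node.js versions before 19 idle keep-alive connections can delay it. The explicit process.exit(0) is important: Node.js will not exit on its own while open handles (database connections, sockets, timers) keep the event loop alive. The timeout prevents the process from hanging forever if cleanup stalls. Under PM2, keep that timeout shorter than kill_timeout (5000 ms in the config above), because PM2 sends SIGKILL once kill_timeout expires and the 10-second safety net would never get the chance to run.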
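One gap in the handler above is that a second signal would start the shutdown path again while the first is still running. A minimal sketch of a reusable helper that guards against that follows. The names createShutdownHandler, cleanups, exit, and log are hypothetical and not part of Node.js or PM2; exit is injectable only so the logic can be exercised without ending the process.

```javascript
// Hypothetical helper: names and options are illustrative, not a Node.js or PM2 API.
function createShutdownHandler({
  server,
  cleanups = [],
  timeoutMs = 10000,
  exit = (code) => process.exit(code),
  log = console,
}) {
  let shuttingDown = false;

  return function shutdown(signal) {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    log.info(`${signal} received, starting graceful shutdown`);

    // Safety net: force exit if cleanup stalls. unref() stops this timer
    // from keeping the event loop alive by itself.
    const forceTimer = setTimeout(() => {
      log.error('Forced shutdown after timeout');
      exit(1);
    }, timeoutMs);
    forceTimer.unref();

    server.close(async (err) => {
      if (err) log.error('Error while closing server:', err.message);
      try {
        // Run cleanup steps in order (database first, then caches, queues, ...).
        for (const cleanup of cleanups) await cleanup();
        clearTimeout(forceTimer);
        exit(err ? 1 : 0);
      } catch (cleanupErr) {
        log.error('Cleanup failed:', cleanupErr.message);
        clearTimeout(forceTimer);
        exit(1);
      }
    });
  };
}

// Usage (db and cache are the same placeholders as in the example above):
// const shutdown = createShutdownHandler({
//   server,
//   cleanups: [() => db.close(), () => cache.quit()],
//   timeoutMs: 4000,
// });
// process.on('SIGTERM', () => shutdown('SIGTERM'));
// process.on('SIGINT', () => shutdown('SIGINT'));
```

Registering one function for both signals also avoids re-emitting SIGTERM by hand.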
If you use PM2 with wait_ready: true, your app needs to signal when it’s fully initialized:
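const app = express();
// ... setup routes, connect to DB, etc.
app.listen(3000, () => {
  console.log('Server ready');
  // process.send exists only when an IPC channel is present (as under PM2),
  // so guard the call to keep `node app.js` working in development.
  if (process.send) {
    process.send('ready'); // PM2 waits for this before marking online
  }
});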
PM2 won’t route traffic to your app until it receives this signal.
Structured Logging
console.log works fine in development, but in production you need log levels, timestamps, and structured data instead of just strings. Two popular choices are Winston and Pino.
Winston is the most widely used:
const winston = require('winston');

const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  defaultMeta: { service: 'my-api' },
  transports: [
    new winston.transports.Console({
      format: process.env.NODE_ENV === 'development'
        ? winston.format.combine(winston.format.colorize(), winston.format.simple())
        : winston.format.json()
    }),
    new winston.transports.File({ filename: 'error.log', level: 'error' }),
    new winston.transports.File({ filename: 'combined.log' }),
  ],
});

module.exports = logger;
Winston’s JSON output is machine-readable, making it easy to ingest into log aggregators like Elasticsearch or Datadog. The timestamp and error stack trace formatters ensure you can correlate events across services when debugging incidents.
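Winston’s default (npm) log levels, from most to least severe, are: error (0), warn (1), info (2), http (3), verbose (4), debug (5), silly (6). Setting level to 'info' emits info and everything more severe. Always include structured fields on your log entries so you can filter and search them: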
logger.info('Request processed', {
  requestId: req.id,
  method: req.method,
  path: req.path,
  statusCode: res.statusCode,
  duration: Date.now() - start,
  userId: req.user?.id,
});
Including context fields like request IDs and user identifiers turns log entries from raw strings into searchable events. Without structured metadata, finding the log line for a specific failed request means scrolling through thousands of entries manually.
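A sketch of how those fields might be collected in one place, written as Express-style middleware. It assumes a logger with an info(message, fields) method like the Winston logger above. The requestLogger name and the x-request-id header fallback are illustrative choices, not part of Express or Winston.

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical middleware factory; `logger` is any object with info(message, fields).
function requestLogger(logger) {
  return function logRequest(req, res, next) {
    const start = process.hrtime.bigint();
    // Reuse an upstream request ID if a proxy supplied one, otherwise create one.
    req.id = (req.headers && req.headers['x-request-id']) || randomUUID();

    // 'finish' fires once the response has been handed off to the OS.
    res.once('finish', () => {
      const durationMs = Number(process.hrtime.bigint() - start) / 1e6;
      logger.info('Request processed', {
        requestId: req.id,
        method: req.method,
        path: req.path || req.url,
        statusCode: res.statusCode,
        durationMs: Math.round(durationMs * 100) / 100,
        userId: req.user ? req.user.id : undefined,
      });
    });

    next();
  };
}

// Usage (assuming an Express app and the logger module from above):
// app.use(requestLogger(logger));
```

Logging on the response's finish event means the status code and duration are known when the entry is written.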
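Pino is a faster alternative. Its maintainers’ benchmarks show it running several times faster than alternatives such as Winston, though the exact figure depends on the workload. Fastify uses it as its built-in logger: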
const pino = require('pino');

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  formatters: { level: (label) => ({ level: label }) },
  timestamp: pino.stdTimeFunctions.isoTime,
});
By default Pino does no in-process formatting or routing: it writes newline-delimited JSON straight to stdout and leaves shipping and pretty-printing to a separate process or a worker-thread transport, which keeps the overhead on the main thread low. For apps handling thousands of requests per second, the performance difference between Pino and Winston can be noticeable in CPU profiles.
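To make “newline-delimited JSON” concrete, here is a minimal sketch of a logger that emits one JSON object per line and filters by level. It is not Pino’s implementation. createNdjsonLogger and its write option are hypothetical, and only the numeric level values mirror Pino’s defaults.

```javascript
// Pino's default numeric levels (higher is more severe).
const LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60 };

// Hypothetical factory; `write` is injectable so output can be redirected.
function createNdjsonLogger({
  level = 'info',
  write = (line) => process.stdout.write(line),
} = {}) {
  const threshold = LEVELS[level];
  if (threshold === undefined) throw new Error(`Unknown log level: ${level}`);

  const logger = {};
  for (const [name, value] of Object.entries(LEVELS)) {
    logger[name] = (msg, fields = {}) => {
      if (value < threshold) return; // below the configured level: drop
      const entry = { level: name, time: new Date().toISOString(), ...fields, msg };
      write(JSON.stringify(entry) + '\n'); // one JSON object per line
    };
  }
  return logger;
}

// Usage:
// const log = createNdjsonLogger({ level: 'info' });
// log.info('Request processed', { requestId: 'abc', statusCode: 200 });
```

Because every line is a complete JSON document, a log shipper can parse the stream line by line without knowing anything about the application.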
Environment variables and configuration
Never hardcode configuration. Use environment variables, validated at startup with a schema library like Zod.
const { z } = require('zod');

const envSchema = z.object({
  NODE_ENV: z.enum(['development', 'test', 'production']).default('production'),
  PORT: z.coerce.number().min(1024).max(65535).default(3000),
  DATABASE_URL: z.string().url(),
  REDIS_URL: z.string().url().optional(),
  LOG_LEVEL: z.enum(['error', 'warn', 'info', 'http', 'debug']).default('info'),
  API_KEY: z.string().min(32),
});

const parsed = envSchema.safeParse(process.env);
if (!parsed.success) {
  console.error('Invalid environment variables:', parsed.error.flatten());
  process.exit(1);
}

module.exports = parsed.data;
This fails fast with clear error messages if required variables are missing or malformed. That approach is much better than mysterious runtime errors three minutes after deployment.
In production, pull secrets from a secrets manager (AWS Secrets Manager, HashiCorp Vault, Kubernetes Secrets) rather than baking them into environment files. Use dotenv locally, but never commit .env files.
Security Hardening
A few middleware pieces prevent common attack vectors.
Helmet sets sensible HTTP security headers:
const helmet = require('helmet');
app.use(helmet());
Helmet adds multiple headers with one line: X-Content-Type-Options, X-Frame-Options, Strict-Transport-Security, and others. These headers tell the browser to enforce security policies that mitigate common XSS, clickjacking, and MIME-sniffing attacks.
CORS should not be left wide open in production. List the allowed origins explicitly rather than using origin: '*':
const cors = require('cors');

app.use(cors({
  origin: process.env.ALLOWED_ORIGINS?.split(',') || [],
  credentials: true,
}));
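The CORS configuration restricts which origins may read responses from your API in a browser and whether credentials (cookies, authorization headers) may be sent with those requests. A permissive setup, such as reflecting every request’s origin while allowing credentials, lets any website your users visit make authenticated requests to your backend and read the responses.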
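One detail the configuration above glosses over: split(',') keeps surrounding whitespace, so a value like "https://a.example, https://b.example" yields an entry that never matches. A sketch of a stricter allow-list follows, assuming the cors package’s function form of the origin option, whose callback takes (error, allow). parseAllowedOrigins and makeOriginCheck are hypothetical helper names.

```javascript
// Turn "https://a.example, https://b.example" into a Set of exact origins.
function parseAllowedOrigins(raw) {
  return new Set(
    (raw || '')
      .split(',')
      .map((origin) => origin.trim())
      .filter(Boolean)
  );
}

// Build a function in the (origin, callback) shape accepted by cors({ origin }).
function makeOriginCheck(allowed) {
  return function checkOrigin(origin, callback) {
    // Requests without an Origin header (curl, server-to-server calls) are not
    // cross-origin browser requests, so CORS does not apply to them.
    if (!origin) return callback(null, true);
    callback(null, allowed.has(origin));
  };
}

// Usage (assuming Express and the cors package):
// const allowed = parseAllowedOrigins(process.env.ALLOWED_ORIGINS);
// app.use(cors({ origin: makeOriginCheck(allowed), credentials: true }));
```

Exact string matching is deliberate here: prefix or substring checks on origins are a common source of bypasses.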
Rate limiting prevents abuse:
const rateLimit = require('express-rate-limit');

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 100,
  standardHeaders: true,
  legacyHeaders: false,
  message: 'Too many requests',
});

app.use('/api', limiter);
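The rate limiter counts requests in a fixed window: 100 requests per 15 minutes per IP address, with the counter resetting when the window expires. Once the limit is exceeded, subsequent requests receive a 429 response. This protects your API from brute-force attacks, scrapers, and accidental request loops. If the app sits behind a reverse proxy or load balancer, configure Express’s trust proxy setting so that req.ip is the client’s address rather than the proxy’s.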
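express-rate-limit’s default in-memory store counts requests per fixed time window, and that bookkeeping is simple enough to sketch. This is an illustration of the idea, not the library’s implementation. createFixedWindowLimiter and its now option are hypothetical, and a deployment with several workers or servers would need a shared store (Redis, for example) because an in-process Map is private to each worker.

```javascript
// Hypothetical fixed-window counter; `now` is injectable so time can be controlled.
function createFixedWindowLimiter({ windowMs, max, now = Date.now }) {
  // key -> { count, resetAt }. Idle keys are never evicted in this sketch;
  // a real store expires them so the Map cannot grow without bound.
  const windows = new Map();

  return function hit(key) {
    const t = now();
    let entry = windows.get(key);
    if (!entry || t >= entry.resetAt) {
      // First request from this key, or the previous window has expired.
      entry = { count: 0, resetAt: t + windowMs };
      windows.set(key, entry);
    }
    entry.count += 1;
    return {
      allowed: entry.count <= max,
      remaining: Math.max(0, max - entry.count),
      resetAt: entry.resetAt,
    };
  };
}

// Usage:
// const hit = createFixedWindowLimiter({ windowMs: 15 * 60 * 1000, max: 100 });
// const { allowed } = hit(req.ip);
// if (!allowed) return res.status(429).send('Too many requests');
```

A fixed window allows a burst of up to twice the limit across a window boundary, which is the main trade-off against a sliding window.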
Input validation with Zod catches bad data before it reaches your business logic:
const createUserSchema = z.object({
  email: z.string().email(),
  password: z.string().min(8).regex(/[A-Z]/).regex(/[0-9]/),
  age: z.number().int().min(13).max(120).optional(),
});

const result = createUserSchema.safeParse(req.body);
if (!result.success) {
  return res.status(400).json({ errors: result.error.flatten() });
}
Keep dependencies updated. Run npm audit regularly and use npx npm-check-updates to find outdated packages.
Health Checks
Kubernetes and load balancers need endpoints to determine whether your app is alive and ready to receive traffic.
// Liveness: is the process alive?
app.get('/healthz/live', (req, res) => {
  res.status(200).json({ status: 'ok' });
});

// Readiness: can it accept traffic?
app.get('/healthz/ready', async (req, res) => {
  try {
    await db.query('SELECT 1');
    await cache.ping();
    res.status(200).json({ status: 'ok', db: 'ok', cache: 'ok' });
  } catch (err) {
    res.status(503).json({ status: 'error', reason: err.message });
  }
});
Liveness probes should be cheap and fast, since they are checked frequently. Readiness probes verify external dependencies (database, cache) and should return 503 when those dependencies are unavailable.
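A dependency that hangs rather than fails would leave the readiness handler above waiting indefinitely. A sketch of one way to bound each check with a timeout follows. checkDependencies, withTimeout, and the 2000 ms default are assumptions for illustration.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// `checks` maps a name to a function returning a promise, for example
// { db: () => db.query('SELECT 1'), cache: () => cache.ping() }.
async function checkDependencies(checks, timeoutMs = 2000) {
  const result = { status: 'ok' };
  await Promise.all(
    Object.entries(checks).map(async ([name, check]) => {
      try {
        // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
        await withTimeout(Promise.resolve().then(() => check()), timeoutMs, name);
        result[name] = 'ok';
      } catch (err) {
        result[name] = 'error';
        result.status = 'error';
      }
    })
  );
  return result;
}

// Usage inside the readiness route (db and cache as in the example above):
// const result = await checkDependencies({
//   db: () => db.query('SELECT 1'),
//   cache: () => cache.ping(),
// });
// res.status(result.status === 'ok' ? 200 : 503).json(result);
```

Running the checks in parallel keeps the probe's worst-case latency at one timeout rather than the sum of all of them.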
Handling unhandled errors
Two event handlers must be at the top of your entry point, before anything else runs:
process.on('unhandledRejection', (reason, promise) => {
  console.error('Unhandled Rejection:', reason);
  process.exit(1);
});

process.on('uncaughtException', (err) => {
  console.error('Uncaught Exception:', err);
  process.exit(1);
});
Unhandled rejections and uncaught exceptions should crash your process. Silent failures hide bugs. PM2 restarts the process automatically, and your monitoring should catch the error log.
Detecting memory leaks
Node.js runs your JavaScript on a single thread, so one process uses only one CPU core. In cluster mode with PM2, each worker is a separate process with its own memory space. A memory leak in one worker doesn’t directly crash the others, but the leaking worker will eventually exhaust its heap and crash, or hit max_memory_restart and be restarted.
Common causes: unbounded arrays or caches, event listeners that accumulate, streams not properly drained. Use the --inspect flag during development and Chrome DevTools to profile heap usage:
node --inspect app.js
# Open chrome://inspect in Chrome
Chrome DevTools connects to your running Node.js process and provides heap allocation timelines, comparison views, and dominator trees. These tools help you identify which objects are holding onto memory and why they are not being garbage collected.
In production, the heapdump module lets you take snapshots on demand:
const heapdump = require('heapdump');

process.on('SIGUSR2', () => {
  heapdump.writeSnapshot('./heap-' + Date.now() + '.heapsnapshot');
});
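Writing a heap snapshot to disk allows offline analysis with the same Chrome DevTools, even if the leak only appears after days of uptime. You can trigger snapshots with a Unix signal (kill -SIGUSR2 <pid>) without restarting the process. Taking a snapshot pauses the process while the file is written and can need about twice the heap’s size in memory, so trigger it deliberately rather than on a schedule.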
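Node.js has also shipped a built-in alternative since v11.13, v8.writeHeapSnapshot(), so the same signal-triggered snapshot can be sketched without a native add-on. The helper names and the SIGUSR2 default below are illustrative. writeHeapSnapshot is synchronous and blocks the event loop while it runs.

```javascript
const v8 = require('node:v8');
const path = require('node:path');

// Build a timestamped snapshot path inside `dir`.
function snapshotPath(dir, now = Date.now()) {
  return path.join(dir, `heap-${now}.heapsnapshot`);
}

// Hypothetical helper: registers a signal handler that writes a snapshot
// on demand. Call it once at startup.
function installHeapSnapshotSignal({ dir = process.cwd(), signal = 'SIGUSR2' } = {}) {
  process.on(signal, () => {
    // Synchronous: the process is paused until the file has been written.
    const file = v8.writeHeapSnapshot(snapshotPath(dir));
    console.log(`Heap snapshot written to ${file}`);
  });
}

// Usage:
// installHeapSnapshotSignal({ dir: '/var/tmp' });
// then, from a shell: kill -SIGUSR2 <pid>
```

The resulting .heapsnapshot file loads in the Memory tab of Chrome DevTools like one produced by heapdump.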
Use PM2’s max_memory_restart as a safety net. Workers exceeding the threshold get restarted automatically.
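Of the common causes listed earlier in this section, the unbounded cache is often the simplest to fix at the source. A minimal sketch of a size-capped cache with least-recently-used eviction follows, relying on the fact that a Map iterates its keys in insertion order. BoundedCache is a hypothetical class, and a maintained LRU library may be a better fit for a real service.

```javascript
// Hypothetical size-capped cache with least-recently-used eviction.
class BoundedCache {
  constructor(maxEntries) {
    if (!Number.isInteger(maxEntries) || maxEntries < 1) {
      throw new RangeError('maxEntries must be a positive integer');
    }
    this.maxEntries = maxEntries;
    this.map = new Map(); // insertion order doubles as recency order
  }

  get size() {
    return this.map.size;
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert so this key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.map.keys().next().value;
      this.map.delete(oldest);
    }
  }
}

// Usage:
// const cache = new BoundedCache(1000);
// cache.set(userId, profile);
// const hit = cache.get(userId);
```

With a cap in place, memory use is bounded by maxEntries times the size of the largest value rather than by uptime.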
Keep startup work visible
Production issues often begin at boot time, not after the app has been running for hours. If your service needs to read secrets, connect to databases, warm caches, or build expensive indexes, make that startup path obvious and measurable. A short boot log with clear steps is easier to troubleshoot than a silent process that simply never becomes ready. That also helps deployment systems decide whether a release is healthy or still stuck in initialization.
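One way to get that kind of boot log is to run startup as a list of named steps and time each one. A sketch under that assumption follows. runStartupSteps and the step names in the usage comment are hypothetical.

```javascript
// Run named startup steps in order, logging how long each one takes.
// `steps` is an array of [name, function] pairs; functions may be async.
async function runStartupSteps(steps, log = console.log) {
  const timings = [];
  for (const [name, run] of steps) {
    const started = process.hrtime.bigint();
    try {
      await run();
    } catch (err) {
      log(`boot: ${name} FAILED: ${err.message}`);
      throw err; // let the caller exit non-zero so the deploy is marked failed
    }
    const ms = Number(process.hrtime.bigint() - started) / 1e6;
    timings.push({ name, ms });
    log(`boot: ${name} ok (${ms.toFixed(1)} ms)`);
  }
  return timings;
}

// Usage (loadSecrets, connectDb, and warmCache are placeholders):
// runStartupSteps([
//   ['load secrets', loadSecrets],
//   ['connect database', connectDb],
//   ['warm cache', warmCache],
// ]).catch(() => process.exit(1));
```

If the process never becomes ready, the last "ok" line in the log shows which step it is stuck after.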
Prefer small operational habits
Many Node.js incidents come from neglecting basic routines rather than from exotic bugs. Rotate logs, check disk space, look at memory growth, and review dependency updates on a schedule. Those tasks are easy to postpone, but they protect the service from avoidable failures. A few boring habits make the production system calmer, and calmer systems are much easier to support when something does go wrong.
Make recovery part of the plan
An operational checklist is more useful when it includes the next step after a failure, not just the failure itself. Know which logs you will check, which process you will restart, and which metrics you will inspect first. That sort of preparation keeps incidents from turning into guesswork. It also makes it easier for teammates to help because the response path is already written down in plain language.
See Also
- /tutorials/javascript-fundamentals/js-error-handling/: Error handling patterns in JavaScript
- /tutorials/javascript-fundamentals/js-async-callbacks-promises/: Async patterns and Promise best practices
- /reference/node-modules/childprocess/: Spawning child processes from Node.js