Error Handling in Production
Why Error Handling Matters in Production
Production Node.js applications face a hostile environment. Networks drop, databases time out, external APIs misbehave, and users submit unexpected input. If you do not handle errors explicitly, an uncaught exception terminates the process with a stack trace and a non-zero exit code, while an error that is caught and then ignored can leave your application running in an inconsistent state.
Unlike development, where a stack trace is enough, production demands structured responses, proper logging, and graceful degradation. The goal is to fail safely and give your operators enough information to diagnose and recover.
- Unhandled exceptions crash the entire process and drop in-flight requests
- Since Node.js 15, unhandled promise rejections are treated like uncaught exceptions by default and crash the process
- Silent failures hide bugs from developers while users experience broken features
Sync vs Async Error Flows
Node.js handles synchronous and asynchronous errors differently. You need to handle both correctly.
Synchronous Errors
Synchronous code inside a try/catch block works as you would expect. The catch block receives the error synchronously and execution continues normally afterward.
function parsePort(raw) {
if (typeof raw !== 'number') {
throw new TypeError(`Expected number, got ${typeof raw}`);
}
if (raw < 1 || raw > 65535) {
throw new RangeError(`Port must be between 1 and 65535, got ${raw}`);
}
return raw;
}
try {
parsePort('8080');
} catch (err) {
console.error(`Failed to parse port: ${err.message}`);
// TypeError: Expected number, got string
}
The try/catch catches both TypeError and RangeError; in fact it catches any thrown value, whether or not it extends Error. After the catch block runs, execution continues normally and the process does not exit.
Async Errors: Callback Pattern
The original Node.js callback convention passes errors as the first argument to the callback function. Check it before processing the result.
const fs = require('fs');
fs.readFile('/etc/hosts', 'utf8', (err, data) => {
if (err) {
console.error(`Read failed: ${err.code} — ${err.message}`);
return;
}
console.log(`Read ${data.length} characters`);
});
The callback convention means you must always check the error first. Forgetting this check is the single most common source of silent failures in Node.js code.
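If you would rather not repeat the error-first check at every call site, Node's built-in util.promisify can wrap a callback-style function so that the error argument becomes a promise rejection. The sketch below is only an illustration: loadSetting is a hypothetical callback-style function, and util.promisify is the only real API involved.

```javascript
const { promisify } = require('node:util');

// Hypothetical callback-style function following the error-first convention
function loadSetting(name, callback) {
  setImmediate(() => {
    if (name !== 'port') {
      callback(new Error(`Unknown setting: ${name}`));
      return;
    }
    callback(null, 8080);
  });
}

// promisify maps callback(err) to a rejection and callback(null, value) to a resolution
const loadSettingAsync = promisify(loadSetting);

async function main() {
  const port = await loadSettingAsync('port');
  console.log(`port = ${port}`);
  try {
    await loadSettingAsync('missing');
  } catch (err) {
    console.error(`Lookup failed: ${err.message}`);
  }
}

main();
```

The wrapped function can no longer be called without some form of error handling being visible at the call site, which removes the "forgot to check err" failure mode.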
Async Errors: Promises
Promises shift error handling to .catch() chains. A rejection never reaches a surrounding try/catch unless you await the promise inside the try block.
const { readFile } = require('fs').promises;
readFile('/etc/hosts', 'utf8')
.then(data => {
console.log(`Read ${data.length} characters`);
return JSON.parse(data); // intentionally throws
})
.catch(err => {
console.error(`Operation failed: ${err.message}`);
// SyntaxError: /etc/hosts is not valid JSON (exact message depends on the file and Node.js version)
});
.catch() at the end of a promise chain catches any rejection that occurred anywhere above it.
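Position matters: a .catch() handles only rejections from the steps before it, and the value it returns resolves the chain, so later .then() steps still run. A minimal sketch, using a hypothetical loadCount function:

```javascript
// A .catch() in the middle of a chain handles the rejection and lets the chain continue
function loadCount(source) {
  return Promise.resolve(source)
    .then(text => JSON.parse(text))   // may throw SyntaxError
    .catch(() => ({ count: 0 }))      // recover with a default value
    .then(obj => obj.count + 1);      // runs after success or after recovery
}

loadCount('{"count": 41}').then(n => console.log(n)); // 42
loadCount('not json').then(n => console.log(n));      // 1
```

Use a mid-chain .catch() only when you have a sensible fallback. Otherwise leave the rejection alone so the final .catch() reports it.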
Async Errors: async/await
async/await lets you write asynchronous code that reads like synchronous code. Behind the scenes, rejected promises still exist — you must await them inside a try/catch.
const { readFile } = require('fs').promises;
async function readConfig() {
try {
const data = await readFile('config.json', 'utf8');
return JSON.parse(data);
} catch (err) {
if (err.code === 'ENOENT') {
console.error('config.json not found, using defaults');
return {};
}
throw err; // re-throw unexpected errors
}
}
The try/catch catches both the rejection from readFile and the exception that JSON.parse throws on malformed input. Re-throwing preserves the error for callers who need to handle it further up the stack.
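A common pitfall is returning a promise from inside a try block without awaiting it. The sketch below uses hypothetical function names to contrast the two forms: only the awaited version gives the catch block a chance to run.

```javascript
async function failing() {
  throw new Error('boom');
}

// Returning the promise without await: the rejection settles after the
// try block has already exited, so the catch never sees it.
async function withoutAwait() {
  try {
    return failing();
  } catch (err) {
    return 'recovered';
  }
}

// Awaiting inside the try block: the rejection is thrown at the await
// and the catch block handles it.
async function withAwait() {
  try {
    return await failing();
  } catch (err) {
    return 'recovered';
  }
}

withAwait().then(value => console.log(value));                        // recovered
withoutAwait().catch(err => console.log(`escaped: ${err.message}`));  // escaped: boom
```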
The Error Class Hierarchy
JavaScript’s built-in Error object is bare. It holds a name, a message, a stack trace, and (since ES2022) an optional cause, but nothing that describes your domain. Production applications need richer error types so handlers can distinguish between different failure modes.
Creating Custom Error Classes
Extend Error to create domain-specific error types. Give each class a name, a meaningful status code, and a way to carry extra context.
class AppError extends Error {
constructor(message, statusCode = 500, code) {
super(message);
this.name = this.constructor.name;
this.statusCode = statusCode;
this.code = code;
Error.captureStackTrace(this, this.constructor);
}
}
class NotFoundError extends AppError {
constructor(resource) {
super(`${resource} not found`, 404, 'NOT_FOUND');
}
}
class ValidationError extends AppError {
constructor(field, message) {
super(message, 400, 'VALIDATION_ERROR');
this.field = field;
}
}
throw new NotFoundError('User');
// Uncaught NotFoundError: User not found (with statusCode: 404 and code: 'NOT_FOUND' attached)
Every custom error gets a name (used by some error-handling libraries), a statusCode (useful for Express handlers), and a machine-readable code for programmatic detection.
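When a domain error is raised because something lower-level failed, it helps to keep the original error attached. The standard cause option (supported by Node.js 16.9 and later) does this. The following is a sketch under assumptions: DatabaseError, queryDriver, and findUser are hypothetical names, and the driver failure is simulated.

```javascript
// Hypothetical domain error that keeps the low-level failure as `cause`
class DatabaseError extends Error {
  constructor(message, options) {
    super(message, options); // { cause } is stored on this.cause
    this.name = 'DatabaseError';
    this.statusCode = 503;
    this.code = 'DB_UNAVAILABLE';
  }
}

// Simulated driver call that fails the way a refused TCP connection would
function queryDriver() {
  const err = new Error('connect ECONNREFUSED 127.0.0.1:5432');
  err.code = 'ECONNREFUSED';
  throw err;
}

function findUser(id) {
  try {
    return queryDriver();
  } catch (err) {
    throw new DatabaseError(`Could not load user ${id}`, { cause: err });
  }
}

try {
  findUser(42);
} catch (err) {
  console.error(`${err.name}: ${err.message}`);  // DatabaseError: Could not load user 42
  console.error(`caused by: ${err.cause.code}`); // caused by: ECONNREFUSED
}
```

Handlers can branch on the domain-level code while logs still carry the underlying failure.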
Built-in Error Subclasses
Node.js and JavaScript provide several subclasses worth knowing. TypeError, RangeError, and SyntaxError are built in. Use them when they match the actual failure mode.
function divide(a, b) {
if (typeof a !== 'number' || typeof b !== 'number') {
throw new TypeError('Arguments must be numbers');
}
if (b === 0) {
throw new RangeError('Division by zero');
}
return a / b;
}
try {
divide(10, 0);
} catch (err) {
console.log(err instanceof RangeError); // true
console.log(err instanceof Error); // true
console.log(err.message); // Division by zero
}
instanceof checks work against the built-in error subclasses, so you can distinguish between error types without relying on string matching.
Aggregating Multiple Errors
Sometimes a single operation can fail in multiple ways simultaneously — for example, validating a form with many fields. AggregateError holds an array of errors and treats them as a unit.
function validateUser(input) {
const errors = [];
if (!input.email || !input.email.includes('@')) {
errors.push(new Error('Invalid email address'));
}
if (!input.password || input.password.length < 8) {
errors.push(new Error('Password must be at least 8 characters'));
}
if (input.age !== undefined && (input.age < 0 || input.age > 150)) {
errors.push(new RangeError('Age must be between 0 and 150'));
}
if (errors.length > 0) {
throw new AggregateError(errors, 'Validation failed');
}
return input;
}
try {
validateUser({ email: 'notanemail', password: 'short' });
} catch (err) {
if (err instanceof AggregateError) {
console.log(`Failed with ${err.errors.length} errors:`);
err.errors.forEach((e, i) => console.log(` ${i + 1}. ${e.message}`));
}
}
The errors property of an AggregateError is a plain array, so you can report each failure individually to the user rather than just saying “validation failed.”
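To return those individual failures from an API, you might flatten the AggregateError into a plain object first. A minimal sketch, where toValidationResponse and the shape of the payload are assumptions rather than an established convention:

```javascript
// Convert an AggregateError into a plain object suitable for a JSON response body
function toValidationResponse(err) {
  return {
    error: err.message,
    details: err.errors.map(e => ({ type: e.name, message: e.message })),
  };
}

const failure = new AggregateError(
  [new Error('Invalid email address'), new RangeError('Age must be between 0 and 150')],
  'Validation failed'
);

console.log(JSON.stringify(toValidationResponse(failure), null, 2));
```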
Express Error Middleware
Express has a special middleware signature for error handlers. Unlike regular middleware with (req, res, next), error handlers receive (err, req, res, next) and run only when an error is passed via next(err) or a handler throws synchronously.
Defining an Error Handler
Register error handlers after all regular routes. Express identifies error handlers by arity — four parameters means error handler.
const express = require('express');
const app = express();
// Stub — replace with your actual database call
async function getUserById(id) {
const user = await db.query('SELECT * FROM users WHERE id = $1', [id]);
return user.rows[0] || null;
}
app.get('/users/:id', async (req, res, next) => {
try {
const user = await getUserById(req.params.id);
if (!user) return res.status(404).json({ error: 'User not found' });
res.json(user);
} catch (err) {
next(err); // pass to error handler
}
});
app.use((err, req, res, next) => {
console.error(`[${new Date().toISOString()}] ${err.name}: ${err.message}`);
const statusCode = err.statusCode || err.status || 500;
res.status(statusCode).json({
error: err.message,
code: err.code,
});
});
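Express routes to a four-parameter middleware only when next(err) is called or when a handler throws synchronously. In Express 4, a rejected promise from an async handler is not forwarded automatically, which is why the route above wraps its body in try/catch and calls next(err); Express 5 forwards such rejections itself. Successful res.json() responses skip error handlers entirely.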
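Writing try/catch in every async route gets repetitive. One common approach is a small wrapper that forwards any rejection to next. The sketch below assumes Express 4 semantics. asyncHandler and getReport are hypothetical names, and plain objects stand in for req and res so the example runs without Express.

```javascript
// Wrap an async route handler so a rejected promise is forwarded to next(err)
function asyncHandler(handler) {
  return (req, res, next) => {
    Promise.resolve()
      .then(() => handler(req, res, next)) // also captures synchronous throws
      .catch(next);
  };
}

// Hypothetical handler that fails; with Express it would be registered as
// app.get('/reports/:id', getReport)
const getReport = asyncHandler(async (req, res) => {
  throw new Error(`Report ${req.params.id} is unavailable`);
});

// Stand-ins for Express's req, res and next
getReport({ params: { id: '7' } }, {}, (err) => {
  console.error(`Forwarded to error middleware: ${err.message}`);
});
```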
Distinguishing Operational vs Programmer Errors
Not all errors are equal. Operational errors are expected failures — invalid input, network timeouts, missing resources. Programmer errors are bugs — calling a function with the wrong type, referencing an undefined variable.
- Operational errors should be handled gracefully and reported to the user as structured responses
- Programmer errors should crash the process and trigger an alert, because the application is now in an undefined state
function isOperationalError(err) {
if (err instanceof AppError) return true;
// Known operational error codes
const operationalCodes = new Set([
'ENOENT', 'EADDRINUSE', 'ECONNREFUSED', 'ETIMEDOUT', 'ENOTFOUND',
]);
return operationalCodes.has(err.code); // always a boolean, even when err.code is undefined
}
app.use((err, req, res, next) => {
if (isOperationalError(err)) {
res.status(err.statusCode || 500).json({
error: err.message,
});
} else {
console.error('PROGRAMMER ERROR — process will exit:', err);
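// Send a generic 500 first, then exit once the response has been flushed
res.on('finish', () => process.exit(1));
res.status(500).json({ error: 'Internal server error' });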
}
});
This distinction is the most important design decision in any error-handling strategy. Do not treat programmer errors as user-facing problems — they indicate your code is wrong.
Filtering Error Detail by Environment
Never expose internal error details like stack traces or file paths to end users in production. These details are useful during development but become attack surface in production.
app.use((err, req, res, next) => {
const isDev = process.env.NODE_ENV === 'development';
res.status(err.statusCode || 500).json({
error: err.message,
...(isDev && {
stack: err.stack,
originalError: err, // useful during debugging
}),
});
});
In development you get the full stack trace. In production the user sees only the error message, keeping internal implementation details hidden.
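Even the message of an unexpected error can describe internals, such as a hostname or a table name. One possible refinement, which goes a step beyond the handler above, is to replace the message of any 5xx error with a generic string outside development. buildErrorBody is a hypothetical helper:

```javascript
// Build the status code and JSON body for an error response.
// Messages of 5xx errors are hidden unless env is 'development'.
function buildErrorBody(err, env = process.env.NODE_ENV) {
  const statusCode = err.statusCode || 500;
  const isDev = env === 'development';
  const exposeMessage = isDev || statusCode < 500;
  return {
    statusCode,
    body: {
      error: exposeMessage ? err.message : 'Internal server error',
      ...(isDev && { stack: err.stack }),
    },
  };
}

const dbFailure = new Error('connect ECONNREFUSED 10.0.0.5:5432');
console.log(buildErrorBody(dbFailure, 'production').body);
// { error: 'Internal server error' }
```

Inside an Express error handler you would call res.status(statusCode).json(body) with the returned values and log the full error separately.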
Unhandled Rejections and Uncaught Exceptions
Node.js gives you one shot to handle an unhandled rejection or uncaught exception before the process exits. Writing handlers for these is not optional in production — it is how you get alerts and graceful shutdowns instead of silent deaths.
Listening for unhandledRejection
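When a promise rejection goes unhandled, Node.js emits this event. If you do not listen for it, Node.js 15 and later raise the rejection as an uncaught exception and the process exits with a non-zero code; older versions only printed a warning. If you do listen, you control what happens.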
process.on('unhandledRejection', (reason, promise) => {
console.error(`[${new Date().toISOString()}] UNHANDLED REJECTION:`);
console.error('Reason:', reason);
console.error('Promise:', promise);
// In production: send alert, log to monitoring service
// Then exit — continuing risks inconsistent state
process.exit(1);
});
Node.js emits unhandledRejection whatever the rejection value is, including undefined or a plain string. Do not assume reason is an Error; log both reason and promise.
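Because reason can be any value, logging code that reads reason.message or reason.stack may itself fail. A small normalizing helper avoids that. toError is a hypothetical name and the message format is an assumption:

```javascript
const { inspect } = require('node:util');

// Turn any rejection reason into an Error so logging code can rely on .message and .stack
function toError(reason) {
  if (reason instanceof Error) return reason;
  const err = new Error(`Non-Error rejection: ${inspect(reason)}`);
  err.originalReason = reason;
  return err;
}

console.log(toError(undefined).message);  // Non-Error rejection: undefined
console.log(toError('timeout').message);  // Non-Error rejection: 'timeout'
```

In the handler above you would call toError(reason) first and log the result.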
Listening for uncaughtException
An exception that nothing catches triggers uncaughtException, whether it was thrown by synchronous code or inside a timer or event-emitter callback. Handle it to log the error, perform cleanup, and exit with a non-zero code.
process.on('uncaughtException', (err) => {
console.error(`[${new Date().toISOString()}] UNCAUGHT EXCEPTION:`);
console.error(err.name, err.message);
console.error(err.stack);
// Give loggers a moment to flush before exiting
setTimeout(() => {
process.exit(1);
}, 100);
});
The Problem with Recovery
After an uncaughtException, your process is in an undefined state. Any open handles (files, sockets, database connections) may be corrupted. The safest choice is always to exit and let a process manager restart.
let isShuttingDown = false;
process.on('uncaughtException', (err) => {
console.error('uncaughtException received — exiting', err.message);
if (!isShuttingDown) {
isShuttingDown = true;
setTimeout(() => process.exit(1), 100);
}
});
process.on('unhandledRejection', (reason) => {
console.error('unhandledRejection received — exiting', reason);
if (!isShuttingDown) {
isShuttingDown = true;
setTimeout(() => process.exit(1), 100);
}
});
The isShuttingDown guard prevents multiple exit attempts if cleanup code itself throws.
Structured Logging for Error Context
A stack trace alone is not enough for production debugging. You need structured logs that tie errors to request IDs, user sessions, and the exact state of your application at the time of failure.
Using pino
Pino is a fast structured logger that outputs JSON by default, making it ideal for log aggregation systems.
const pino = require('pino');
const logger = pino({
level: process.env.LOG_LEVEL || 'info',
formatters: {
level: (label) => ({ level: label }),
},
});
module.exports = logger;
Attaching Request Context
Every incoming request should receive a unique ID. Propagate that ID through your entire call stack so error logs and access logs can be correlated.
const { v4: uuidv4 } = require('uuid');
const logger = require('./logger');
function requestContext(req, res, next) {
const requestId = req.headers['x-request-id'] || uuidv4();
req.requestId = requestId;
res.setHeader('x-request-id', requestId);
// Create a child logger with the request ID baked in
req.log = logger.child({ requestId });
next();
}
app.use(requestContext); // register before any routes
// Usage in any route handler
app.get('/users/:id', (req, res) => {
req.log.info({ userId: req.params.id }, 'Fetching user');
// All logs from this request include the requestId field
});
With requestId in every log line, you can grep your log aggregator for a specific request and see every operation that touched it.
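Passing req (or req.log) into every function is one way to propagate the ID. Node's AsyncLocalStorage offers another: code deep in the call stack can look the ID up without receiving it as a parameter. This is a sketch of that alternative, not the approach used above. loadUser and currentRequestId are hypothetical names, and a plain object stands in for the Express request.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');
const { randomUUID } = require('node:crypto');

const requestStore = new AsyncLocalStorage();

// Express-style middleware: everything triggered by next() runs inside the store
function requestContext(req, res, next) {
  const requestId = req.headers['x-request-id'] || randomUUID();
  requestStore.run({ requestId }, next);
}

// Any function can read the ID without receiving req as a parameter
function currentRequestId() {
  const store = requestStore.getStore();
  return store ? store.requestId : undefined;
}

// Hypothetical data-access function, several calls away from the route handler
async function loadUser(id) {
  await new Promise(resolve => setImmediate(resolve)); // simulate async I/O
  console.log(`[${currentRequestId()}] loading user ${id}`);
  return { id };
}

// Simulated request so the sketch runs without Express
requestContext({ headers: { 'x-request-id': 'req-123' } }, {}, () => {
  loadUser(42); // logs "[req-123] loading user 42"
});
```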
Redacting Sensitive Data
Loggers must never output passwords, tokens, API keys, or personally identifiable information. Configure redaction rules to strip these fields automatically before logging.
const logger = pino({
level: 'info',
redact: {
paths: [
'req.headers.authorization',
'req.headers.cookie',
'req.body.password',
'req.body.token',
'user.ssn',
'user.password',
],
censor: '[REDACTED]',
},
});
// req.log.info({ req: { headers: { authorization: 'Bearer secret' } } });
// Output: { req: { headers: { authorization: '[REDACTED]' } } }
Pino applies redaction while it serializes the log object, before the line is written, so the original values never reach the log destination.
Graceful Shutdown
When your process receives a termination signal, it should stop accepting new connections, finish in-flight requests, close database connections, flush logs, and exit cleanly. Skipping this step causes connection resets and data loss.
Handling SIGTERM vs SIGINT
Docker, Kubernetes, and most process managers send SIGTERM to request a graceful stop. SIGINT (Ctrl+C) is for interactive stops. Handle both identically.
const http = require('http');
const app = require('./app'); // Express app
const server = http.createServer(app);
async function shutdown(signal) {
console.log(`\n${signal} received — starting graceful shutdown`);
// Stop accepting new connections
server.close(async () => {
console.log('HTTP server closed');
// Close database connections
await db.pool.end();
console.log('Database connections closed');
console.log('Shutdown complete');
process.exit(0);
});
// Force exit after timeout — something is stuck
setTimeout(() => {
console.error('Shutdown timeout — forcing exit');
process.exit(1);
}, 10_000);
}
process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
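Docker waits 10 seconds by default after sending SIGTERM before it sends SIGKILL; Kubernetes waits 30 seconds by default. Set your forced exit shorter than the grace period of your platform, otherwise the process is killed before your own timeout fires. The 10-second timeout above fits the Kubernetes default; under docker stop, lower it or raise the grace period with the -t option.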
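Signals can arrive more than once (for example, an impatient second Ctrl+C), and one failing cleanup step should not prevent the others from running. The sketch below shows one way to structure that. createShutdown is a hypothetical helper, and the steps are stand-ins for server.close(), pool.end(), and a log flush.

```javascript
// Build a shutdown function that runs cleanup steps in order, at most once.
// A failing step is recorded and the remaining steps still run.
function createShutdown(steps) {
  let pending = null;
  return function shutdown() {
    if (!pending) {
      pending = (async () => {
        const failures = [];
        for (const [name, step] of steps) {
          try {
            await step();
          } catch (err) {
            failures.push({ name, message: err.message });
          }
        }
        return failures;
      })();
    }
    return pending; // repeated signals share the same run
  };
}

// Hypothetical steps standing in for real cleanup work
const shutdown = createShutdown([
  ['http', async () => console.log('HTTP server closed')],
  ['database', async () => { throw new Error('pool already ended'); }],
  ['logs', async () => console.log('Logs flushed')],
]);

shutdown().then(failures => console.log(failures));
shutdown(); // second call does not run the steps again
```

In a real service the signal handlers would call shutdown() and then exit with code 0 or 1 depending on whether any failures were recorded.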
Closing HTTP Servers
The server.close() method stops accepting new connections but lets existing ones finish. Call it before closing database connections to avoid in-flight requests failing mid-shutdown.
function gracefulClose(server, db) {
return new Promise((resolve) => {
server.close(async () => {
// All in-flight requests have finished by the time this callback runs
await db.end();
resolve();
});
// Fallback
setTimeout(() => {
console.warn('Graceful close timeout — forcing');
resolve();
}, 8000);
});
}
Closing Database Connections
Databases must receive a close signal to flush buffers and release connections. For PostgreSQL, call pool.end(). For MongoDB with Mongoose, call mongoose.connection.close().
// PostgreSQL with pg
const { Pool } = require('pg');
const db = new Pool({ connectionString: process.env.DATABASE_URL });
async function shutdown() {
console.log('Closing database pool...');
await db.end();
console.log('Database pool closed');
}
// MongoDB with Mongoose
const mongoose = require('mongoose');
async function shutdownMongoose() {
await mongoose.connection.close();
}
Always close database connections during shutdown. For connection pools, pool.end() waits for active queries to finish before releasing connections.
Health Checks and Error Recovery Patterns
Production systems need health endpoints that report whether the application is healthy, whether its dependencies are reachable, and whether it should remain in a load balancer rotation.
Liveness vs Readiness Probes
Kubernetes distinguishes between liveness probes (is the process alive?) and readiness probes (can it accept traffic?). Liveness failures trigger a restart. Readiness failures remove the instance from the load balancer.
const app = require('express')();
// Simple liveness check — is the process alive?
app.get('/health/live', (req, res) => {
res.json({ status: 'ok' });
});
// Readiness check — can the app handle traffic?
app.get('/health/ready', async (req, res) => {
const checks = await Promise.allSettled([
db.query('SELECT 1'), // database reachable?
cache.ping(), // cache reachable?
]);
const results = checks.map((c, i) => ({
dependency: ['database', 'cache'][i],
status: c.status === 'fulfilled' ? 'ok' : 'unavailable',
}));
const healthy = results.every(r => r.status === 'ok');
res.status(healthy ? 200 : 503).json({
status: healthy ? 'ready' : 'not ready',
checks: results,
});
});
Liveness should be cheap and fast — no database calls. Readiness should check real dependencies and return 503 when dependencies are unavailable.
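A readiness check that waits forever on a hung dependency is as bad as no check, because the probe itself times out. One way to bound each check is to race it against a timer. In this sketch, withTimeout and readiness are hypothetical names and the two dependency checks are simulated.

```javascript
// Reject if the given promise does not settle within ms milliseconds
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Simulated dependency checks: one answers quickly, one never answers
const checks = {
  database: () => new Promise(resolve => setTimeout(resolve, 10)),
  cache: () => new Promise(() => {}),
};

async function readiness(timeoutMs = 50) {
  const names = Object.keys(checks);
  const settled = await Promise.allSettled(
    names.map(name => withTimeout(checks[name](), timeoutMs, name))
  );
  return names.map((name, i) => ({
    dependency: name,
    status: settled[i].status === 'fulfilled' ? 'ok' : 'unavailable',
  }));
}

readiness().then(results => console.log(results));
```

Keep the per-check timeout well below the probe timeout configured in your orchestrator so the endpoint can still answer with a 503.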
Retry Logic with Exponential Backoff
Transient failures (network hiccups, brief overloads) often resolve themselves if you retry. Exponential backoff increases the delay between retries so that a struggling service gets room to recover instead of being hit at a constant rate.
async function fetchWithRetry(url, options = {}) {
const {
retries = 3,
baseDelayMs = 100,
maxDelayMs = 5000,
} = options;
for (let attempt = 0; attempt < retries; attempt++) {
try {
const res = await fetch(url);
if (!res.ok) throw new Error(`HTTP ${res.status}`);
return await res.json();
} catch (err) {
if (attempt === retries - 1) throw err;
// Calculate delay: base × 2^attempt, capped at maxDelayMs
const jitter = Math.random() * 50; // small random offset to avoid thundering herd
const delay = Math.min(baseDelayMs * Math.pow(2, attempt), maxDelayMs) + jitter;
console.warn(`Attempt ${attempt + 1} failed: ${err.message}. Retrying in ${Math.round(delay)}ms...`);
await new Promise(resolve => setTimeout(resolve, delay));
}
}
}
// fetchWithRetry('https://api.example.com/data');
// Makes up to 3 attempts, waiting ~100ms and ~200ms (+ jitter) between them
Add jitter (small random offset) even when using exponential backoff. Without it, many clients retrying on the same schedule create a thundering herd that overwhelms the recovering service.
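To see what the schedule looks like, the delay calculation can be pulled out into a pure function. The second function below is a different jitter strategy from the additive offset used above: "full jitter" picks a random delay between zero and the capped value. Both function names are hypothetical.

```javascript
// Delay before the retry that follows a failed attempt (attempt is zero-based)
function backoffDelay(attempt, baseDelayMs = 100, maxDelayMs = 5000) {
  return Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
}

// "Full jitter" variant: a random delay between 0 and the capped value.
// The random source is injectable so the function can be tested deterministically.
function backoffDelayWithJitter(attempt, baseDelayMs = 100, maxDelayMs = 5000, random = Math.random) {
  return Math.floor(random() * backoffDelay(attempt, baseDelayMs, maxDelayMs));
}

const schedule = Array.from({ length: 8 }, (_, attempt) => backoffDelay(attempt));
console.log(schedule.join(', ')); // 100, 200, 400, 800, 1600, 3200, 5000, 5000
```

With the defaults, the cap takes effect from the seventh attempt onward, since 100 × 2^6 = 6400 exceeds 5000.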
Circuit Breaker Pattern
When a downstream service is failing, continuing to send requests wastes resources and can cascade failures. A circuit breaker tracks failure rates and opens the circuit to fast-fail requests when a threshold is exceeded.
class CircuitBreaker {
constructor(options = {}) {
this.failureThreshold = options.failureThreshold || 5;
this.resetTimeoutMs = options.resetTimeoutMs || 30_000;
this.state = 'CLOSED';
this.failures = 0;
this.nextAttempt = Date.now();
}
async call(fn) {
if (this.state === 'OPEN') {
if (Date.now() > this.nextAttempt) {
this.state = 'HALF_OPEN';
console.log('Circuit breaker: entering HALF_OPEN state');
} else {
throw new Error('Circuit breaker is OPEN — failing fast');
}
}
try {
const result = await fn();
this.onSuccess();
return result;
} catch (err) {
this.onFailure();
throw err;
}
}
onSuccess() {
this.failures = 0;
if (this.state === 'HALF_OPEN') {
this.state = 'CLOSED';
console.log('Circuit breaker: CLOSED after successful HALF_OPEN call');
}
}
onFailure() {
this.failures++;
console.warn(`Circuit breaker failure ${this.failures}/${this.failureThreshold}`);
if (this.failures >= this.failureThreshold) {
this.state = 'OPEN';
this.nextAttempt = Date.now() + this.resetTimeoutMs;
console.warn(`Circuit breaker: OPEN — will retry after ${this.resetTimeoutMs}ms`);
}
}
}
// Stub — replace with your actual service call
async function fetchUserFromService(userId) {
const res = await fetch(`https://api.example.com/users/${userId}`);
if (!res.ok) throw new Error(`HTTP ${res.status}`);
return res.json();
}
// Usage
const breaker = new CircuitBreaker({ failureThreshold: 3, resetTimeoutMs: 10_000 });
app.get('/users/:id', async (req, res, next) => {
try {
const user = await breaker.call(() => fetchUserFromService(req.params.id));
res.json(user);
} catch (err) {
res.status(503).json({ error: 'Service temporarily unavailable' });
}
});
The circuit breaker cycles through three states: Closed (normal operation), Open (fast-fail because too many recent errors), and Half-Open (probe to see if the downstream service has recovered). This pattern prevents cascading failures under sustained load.
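State transitions that depend on wall-clock time are awkward to test. One option is to inject the clock. The sketch below is a condensed variant of the class above, not a drop-in replacement: TestableBreaker and its now option are hypothetical, and it reopens immediately on a failed HALF_OPEN probe.

```javascript
// Condensed breaker with an injectable clock (now) so tests can control time
class TestableBreaker {
  constructor({ failureThreshold = 2, resetTimeoutMs = 1000, now = Date.now } = {}) {
    Object.assign(this, { failureThreshold, resetTimeoutMs, now });
    this.state = 'CLOSED';
    this.failures = 0;
    this.nextAttempt = 0;
  }

  async call(fn) {
    if (this.state === 'OPEN') {
      if (this.now() <= this.nextAttempt) throw new Error('Circuit breaker is OPEN');
      this.state = 'HALF_OPEN';
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.state = 'CLOSED';
      return result;
    } catch (err) {
      this.failures++;
      if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN';
        this.nextAttempt = this.now() + this.resetTimeoutMs;
      }
      throw err;
    }
  }
}

async function demo() {
  let time = 0;
  const breaker = new TestableBreaker({ now: () => time });
  const fail = async () => { throw new Error('downstream failed'); };
  const states = [];

  await breaker.call(fail).catch(() => {});
  await breaker.call(fail).catch(() => {});
  states.push(breaker.state);            // OPEN after two failures

  time = 1001;                           // advance the fake clock past the reset timeout
  await breaker.call(async () => 'ok');  // probe succeeds in HALF_OPEN
  states.push(breaker.state);            // CLOSED again
  return states;
}

demo().then(states => console.log(states.join(' -> '))); // OPEN -> CLOSED
```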
Summary
Good error handling in production Node.js comes down to a handful of consistent practices. Use custom error classes to distinguish failure modes. Handle all async errors with try/catch or .catch(). Write Express error middleware to format responses and hide internal details. Log structured errors with request context. And implement graceful shutdown so termination signals do not become silent data loss.
Errors will always happen. What separates production-ready applications from fragile ones is having a plan for each category of failure — and testing that plan under realistic conditions.
See Also
- Input Validation and Sanitization — Validate and sanitize user input before it reaches your business logic