Input Validation and Sanitization in Node.js
Before you start
You should be comfortable with Express basics and understand how HTTP requests carry user data through query strings, body payloads, and headers. This tutorial covers validation libraries (Zod, Joi, Yup, express-validator), sanitization techniques for HTML and SQL contexts, and practical patterns for keeping user input safe at every layer of a Node.js application.
Why input validation matters
Every piece of data that enters your application from the outside is potentially dangerous. Users can accidentally submit bad data (typos, wrong formats), and attackers can deliberately submit malicious payloads. Without validation, your app might crash, store corrupted data, or, worse, become vulnerable to attacks.
Think of input validation like a bouncer at a club. They check your ID, make sure you’re on the list, and turn away anyone who doesn’t meet the rules. Your app needs the same bouncer for every piece of user input.
There are two related concepts here:
- Validation checks if input meets your rules (is this a valid email?)
- Sanitization cleans input to make it safe (remove dangerous characters)
You need both. Validation keeps bad data out. Sanitization makes sure even “valid” data won’t cause harm when you use it.
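The difference is easier to see side by side. The following is a minimal sketch in plain JavaScript; `isValidUsername` and `sanitizeDisplayName` are hypothetical helpers written for this illustration, and the rules they apply are assumptions rather than recommendations.

```javascript
// Hypothetical helpers for illustration only; a real app would normally
// rely on a validation library such as the ones covered in this tutorial.

// Validation: answers yes or no and never changes the input.
function isValidUsername(input) {
  return typeof input === "string" && /^[a-zA-Z0-9_]{3,20}$/.test(input);
}

// Sanitization: returns a cleaned copy of the input.
// Here: control characters become spaces, whitespace runs collapse to a
// single space, and leading/trailing whitespace is removed.
function sanitizeDisplayName(input) {
  return String(input)
    .replace(/[\u0000-\u001f\u007f]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

console.log(isValidUsername("alice_01")); // true
console.log(isValidUsername("<script>")); // false
console.log(sanitizeDisplayName("  Ada\t\nLovelace "));
```

Note that the validator rejects input outright, while the sanitizer always returns something usable. Which one you reach for depends on whether the field has a strict format or is free text.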
Schema validation with Zod
Zod is a library that lets you define the exact shape of data your app accepts. It is designed around TypeScript but works in plain JavaScript too.
First, install it:
npm install zod
Zod’s approach is to define the shape of valid data once and let the library handle both runtime checks and TypeScript type generation. You describe what you expect: a string with a minimum length, a properly formatted email, an optional numeric field. Zod enforces those rules every time data enters your application. Here’s how you define a simple user schema:
import * as z from "zod";
// Define what valid user data looks like
const UserSchema = z.object({
  username: z.string().min(3).max(20),
  email: z.string().email(),
  age: z.number().min(0).optional(),
});
// Try to validate some input
const result = UserSchema.safeParse(someInput);
// Check if it passed
if (result.success) {
  console.log(result.data); // Valid data with correct types
} else {
  console.log(result.error.issues); // Array of validation errors
}
The key methods are parse() (throws on error) and safeParse() (returns an object you can check). Prefer safeParse in request handlers: a failed parse() throws, and if nothing catches the exception the client gets a generic 500. When validation fails, safeParse returns a structured error you can inspect programmatically, which makes it a good fit for API endpoints that need to report specific field failures back to clients.
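To report field failures back to a client, the issues array usually needs regrouping by field. The sketch below shows one way to do that. `toFieldErrors` is a hypothetical helper, and it assumes only that each issue carries a `path` array and a `message` string, as the entries in `result.error.issues` do. The sample issues are written by hand for the demo and are not real library output.

```javascript
// Hypothetical helper: groups validation issues by field so an API can
// return them in a 400 response. Assumes each issue has a `path` array
// and a `message` string.
function toFieldErrors(issues) {
  // A prototype-less object avoids surprises if a path is "__proto__".
  const fieldErrors = Object.create(null);
  for (const issue of issues) {
    // Nested paths such as ["address", "zip"] become "address.zip";
    // an empty path means the error applies to the whole payload.
    const field = issue.path.length > 0 ? issue.path.join(".") : "_root";
    (fieldErrors[field] ??= []).push(issue.message);
  }
  return fieldErrors;
}

// Hand-written sample issues for the demo.
const sampleIssues = [
  { path: ["username"], message: "Too short" },
  { path: ["address", "zip"], message: "Required" },
  { path: ["username"], message: "Invalid characters" },
];

console.log(JSON.stringify(toFieldErrors(sampleIssues), null, 2));
```

In an Express handler, the result could be sent with something like `res.status(400).json({ errors: toFieldErrors(result.error.issues) })`.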
Zod can also infer TypeScript types from your schema automatically, keeping your runtime validation and compile-time types in sync without duplicate definitions:
type User = z.infer<typeof UserSchema>; // { username: string; email: string; age?: number }
Joi for JavaScript projects
Joi is another popular choice, especially if you’re not using TypeScript. It has a similar purpose but a different style. Joi’s API is chain-based and reads left to right, which many teams find intuitive for defining validation rules without needing to learn TypeScript generics.
npm install joi
Joi returns both the validated value and any error in a single object, which you can destructure to keep your validation logic compact. The validate method checks the input against the schema and gives you either cleaned data or an error with details. By default it stops at the first failing rule; pass { abortEarly: false } in the options argument if you want every failure reported:
import Joi from "joi";
const schema = Joi.object({
  name: Joi.string().min(1).max(100).required(),
  age: Joi.number().integer().min(0).optional(),
  email: Joi.string().email(),
});
// Validate
const { error, value } = schema.validate(userInput);
if (error) {
  console.log(error.details); // List of validation errors
}
One thing to watch: Joi tries to convert input types by default. A string "25" might become a number 25. Pass { convert: false } if you need strict type checking.
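Conversion matters because query strings and form fields always arrive as strings. The sketch below does not use Joi; `parseAge` and its `convert` option are hypothetical names chosen to mirror the behaviour described above, so treat it as an illustration of the trade-off rather than of Joi's internals.

```javascript
// Hypothetical helper, not part of Joi: contrasts strict and converting
// handling of a numeric field.
function parseAge(value, { convert = true } = {}) {
  if (typeof value === "number") {
    return Number.isInteger(value) && value >= 0
      ? { value }
      : { error: "age must be a non-negative integer" };
  }
  // With conversion on, accept strings made only of digits, such as "25".
  if (convert && typeof value === "string" && /^\d+$/.test(value)) {
    return { value: Number(value) };
  }
  return { error: "age must be a number" };
}

console.log(parseAge("25"));                     // { value: 25 }
console.log(parseAge("25", { convert: false })); // { error: 'age must be a number' }
```

Converting is convenient for query strings. Strict checking is usually the better fit for JSON bodies, where a string in a numeric field more likely signals a client bug.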
Yup for form-like validation
Yup looks a lot like Joi but integrates nicely with form libraries. It’s a good choice if you’re building forms. Yup’s schema definitions use named imports that read clearly in code, and its async validation support makes it a natural fit for form libraries like Formik that need to validate against server endpoints.
npm install yup
Yup schemas are immutable: each method call returns a new schema instance, so you can compose and extend schemas without mutating the original. This matters when you have a base schema that several forms share but each form needs a few extra fields. You can clone the base and add to it without worrying about side effects leaking into other validation paths:
import { object, string, number } from "yup";
const schema = object({
  username: string().required().min(3),
  email: string().email().required(),
  age: number().min(0).integer().optional(),
});
// Sync or async validation
schema.validateSync(data); // throws on error
await schema.validate(data); // returns Promise, rejects on error
Use validateSync when every rule in the schema is synchronous, and the async validate method when a rule needs to hit a database or external service, for example to confirm uniqueness. validateSync cannot run asynchronous tests. Both signal failure with an error (validateSync throws, validate rejects its Promise), so wrap them in try/catch or use .catch() when you need to handle errors gracefully.
Yup schemas can also carry transforms, which let you clean or modify data during validation:
const schema = string()
  .trim() // remove whitespace from ends
  .lowercase() // convert to lowercase
  .email();
Transforms run in the order you chain them, so .trim().lowercase().email() first strips whitespace, then lowercases, then checks for valid email format. This pipeline pattern means you get clean data without a separate sanitization pass.
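Why the order matters can be checked without Yup. The following plain-JavaScript sketch mimics the transform-then-validate pipeline; `transforms`, `looksLikeEmail`, and `cleanEmail` are hypothetical names, and the email pattern is deliberately simplistic, so real format checks should stay with the library.

```javascript
// Plain-JavaScript sketch of a transform-then-validate pipeline.
// The steps and the simple email pattern are illustrative assumptions.
const transforms = [
  (s) => s.trim(),        // remove surrounding whitespace first
  (s) => s.toLowerCase(), // then lowercase
];

const looksLikeEmail = (s) => /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(s);

function cleanEmail(input) {
  if (typeof input !== "string") {
    return { ok: false, error: "email must be a string" };
  }
  // Apply each transform in order, feeding the result into the next step.
  const value = transforms.reduce((current, step) => step(current), input);
  return looksLikeEmail(value)
    ? { ok: true, value }
    : { ok: false, error: "email is not valid" };
}

// Validating the raw input fails because of the surrounding spaces...
console.log(looksLikeEmail("  Ada@Example.COM ")); // false
// ...while validating after the transforms succeeds.
console.log(cleanEmail("  Ada@Example.COM "));
// { ok: true, value: 'ada@example.com' }
```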
express-validator for Express routes
If you’re using Express, this library integrates directly with your route handlers. It lets you define validation rules right in your route. Unlike standalone validators, express-validator attaches errors to the request object so your route handler can inspect them in one place without try/catch blocks around every validation call.
npm install express-validator
Each validator function builds a chain that runs when the request arrives. By default a chain runs all of its validators even after one fails; add .bail() to a chain if you want it to stop at its first failure. Errors from every field are collected, so checking validationResult after the middleware array completes gives you a complete picture of what went wrong in a single response:
import express from "express";
import { body, validationResult } from "express-validator";
const app = express();
app.use(express.json());
app.post("/users", [
  body("username").isLength({ min: 3 }).isAlphanumeric(),
  body("email").isEmail().normalizeEmail(),
  body("age").optional().isInt({ min: 0 }),
], (req, res) => {
  const errors = validationResult(req);
  if (!errors.isEmpty()) {
    return res.status(400).json({ errors: errors.array() });
  }
  // req.body is now validated and sanitized
  const { username, email, age } = req.body;
  res.json({ message: "User created successfully" });
});
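The validators run in order, and sanitizers modify values before the next validator sees them. normalizeEmail() cleans up the email address, and isInt() checks that age is a valid integer. isInt() only checks the value; chain .toInt() after it if you also want the value converted to a number.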
Preventing XSS attacks
XSS (cross-site scripting) happens when untrusted data gets executed as JavaScript in a browser. This occurs when you take user input and put it directly into HTML without encoding.
The simplest rule: never use .innerHTML with raw user input.
// UNSAFE
element.innerHTML = userInput;
// SAFE - use textContent instead
element.textContent = userInput;
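textContent covers the browser side. When Node.js builds an HTML string on the server, the equivalent step is escaping the characters that HTML treats as markup. Below is a minimal sketch; `escapeHtml` is a hypothetical helper, and many template engines already do this by default, so check yours before hand-rolling it.

```javascript
// Characters that have special meaning in HTML text and quoted attributes.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Hypothetical helper: encodes a value for use inside element content or
// a quoted attribute value.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

const comment = '<img src=x onerror="alert(1)">';
const html = `<p>${escapeHtml(comment)}</p>`;
console.log(html);
// <p>&lt;img src=x onerror=&quot;alert(1)&quot;&gt;</p>
```

This encoding is only appropriate for element content and quoted attribute values. It is not sufficient inside script blocks, style blocks, or URLs, which need their own handling.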
If you need to render HTML from user input, you must sanitize it first. Browsers parse HTML aggressively, and an attacker only needs one unescaped angle bracket to inject a script tag into your page. The safest approach is to use a library like DOMPurify that understands the DOM and can distinguish safe formatting tags from dangerous executable elements.
npm install dompurify
DOMPurify works by parsing the input as HTML, walking the resulting DOM tree, and removing any elements or attributes not on your allowlist. This is far more reliable than regex-based sanitizers, which attackers can often bypass with creative encoding or malformed markup:
import DOMPurify from "dompurify";
const clean = DOMPurify.sanitize(dirtyHtml, {
  ALLOWED_TAGS: ["b", "i", "em", "strong", "a"],
  ALLOWED_ATTR: ["href", "target", "rel"],
});
DOMPurify strips out dangerous tags and attributes while keeping safe formatting tags. The allowlist approach means you explicitly decide what markup survives. Everything else gets removed. This default-deny posture is the safest way to handle user-supplied HTML. In the browser DOMPurify uses the page's own DOM; in Node.js it needs a DOM implementation such as jsdom supplied to it before sanitize will work.
SQL injection prevention
SQL injection happens when attackers insert malicious SQL commands into your queries through user input. The fix is simple: never concatenate user input into SQL strings.
// UNSAFE - don't do this
const query = "SELECT * FROM users WHERE email = '" + userEmail + "'";
db.query(query);
// SAFE - use parameterized queries
const result = await db.query(
  "SELECT * FROM users WHERE email = $1",
  [userEmail]
);
Different database drivers use different placeholders:
- pg (PostgreSQL): $1, $2
- mysql2: ?
- better-sqlite3: ?
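Placeholders stand in for single values, so a list of values needs one placeholder per item. The sketch below generates the placeholder text for a pg-style query. `buildEmailLookup` is a hypothetical helper and the table and column names are reused from the example above. Only the `$1, $2, ...` markers are built dynamically; the values never enter the SQL string.

```javascript
// Hypothetical helper: builds a pg-style query with one numbered
// placeholder per value. The values are returned separately so the driver
// can send them as parameters.
function buildEmailLookup(emails) {
  if (!Array.isArray(emails) || emails.length === 0) {
    throw new Error("emails must be a non-empty array");
  }
  // Produces "$1, $2, ..." with as many entries as there are values.
  const placeholders = emails.map((_, i) => `$${i + 1}`).join(", ");
  return {
    text: `SELECT * FROM users WHERE email IN (${placeholders})`,
    values: emails,
  };
}

const query = buildEmailLookup(["a@example.com", "x' OR '1'='1"]);
console.log(query.text);
// SELECT * FROM users WHERE email IN ($1, $2)
```

The result can be passed on as `db.query(query.text, query.values)`. The injection attempt in the second value stays a plain string parameter and never becomes part of the SQL.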
Best practices summary
Here’s what you should do:
- Validate early: check input at the entry point of your application
- Use schema validation libraries: Zod, Joi, Yup, and express-validator handle the heavy lifting
- Sanitize for the output context: HTML encoding for web pages, parameterized queries for databases
- Keep dependencies updated: security libraries like DOMPurify get patches for new bypass techniques
- Validate on the server: client-side validation is convenient but can be bypassed
Remember: validation and sanitization are two different things. Validation decides what to accept. Sanitization makes sure accepted data is safe to use. You need both.
Validate at the boundary
The safest place to validate input is right where it enters your app. That might be a route handler, a form submission, or a queue consumer. Once data passes that boundary, every later layer can assume the shape is already correct, which keeps the rest of the code simpler and easier to test. Boundary checks also make error messages easier to explain to users.
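For a route handler, a boundary check can take the form of a small middleware that runs before any business logic. The sketch below is library-free and is meant only to show the shape. `validateBody`, `userRules`, and the rule format are hypothetical, and in practice the rules would come from one of the schema libraries covered earlier.

```javascript
// Hypothetical Express-style middleware factory. `rules` maps a field name
// to a function that returns an error message, or null when the value is fine.
function validateBody(rules) {
  return (req, res, next) => {
    const body = req.body ?? {};
    const errors = [];
    for (const [field, check] of Object.entries(rules)) {
      const message = check(body[field]);
      if (message) errors.push({ field, message });
    }
    if (errors.length > 0) {
      // Reject at the boundary, listing every failing field.
      return res.status(400).json({ errors });
    }
    // Only well-formed requests reach the later layers.
    next();
  };
}

// Illustrative rules; the checks are intentionally simple.
const userRules = {
  username: (v) =>
    typeof v === "string" && v.length >= 3
      ? null
      : "username must be at least 3 characters",
  email: (v) =>
    typeof v === "string" && v.includes("@")
      ? null
      : "email must contain @",
};

const validateUser = validateBody(userRules);
```

It would be mounted ahead of the handler, for example `app.post("/users", validateUser, createUser)`, where `createUser` is whatever handler the route already has.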
Normalize before storage
Normalization should happen before data reaches your database or rendering layer. Trim whitespace, lower-case fields when case does not matter, and strip characters the field should never contain, such as control characters. Doing this up front keeps records consistent and reduces the number of special cases you need to handle later. It also makes search, sorting, and deduplication much less awkward.
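A normalizer for a user record might look like the sketch below. `normalizeUser` is a hypothetical helper that assumes the input has already passed validation, so both fields are strings. Lowercasing the whole email address is also an assumption: it suits most applications, but check it against how your mail provider treats case.

```javascript
// Hypothetical normalizer, run once just before a record is stored.
// Assumes validation has already confirmed both fields are strings.
function normalizeUser(input) {
  return {
    // NFC makes visually identical Unicode strings compare equal.
    username: input.username.normalize("NFC").trim(),
    // Trim and lowercase so the same address always has the same form.
    email: input.email.trim().toLowerCase(),
  };
}

const first = normalizeUser({ username: " Ada ", email: " Ada@Example.COM" });
const second = normalizeUser({ username: "Ada", email: "ada@example.com " });

// Both sign-ups now produce the same stored email, so a uniqueness check
// can catch the duplicate.
console.log(first.email === second.email); // true
```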
Match the output context
Validation tells you whether the input is acceptable. Safety depends on where that data goes next. HTML, SQL, logs, and shell commands all need different handling, so the final output step matters just as much as the initial check. If you keep the context in mind, the app will be much less likely to turn safe data into a problem.
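Logs are a context that is easy to overlook. A value that is harmless in a database row can forge extra log entries if it contains line breaks. The sketch below shows one possible treatment; `sanitizeForLog` and its default length limit are hypothetical choices for illustration.

```javascript
// Hypothetical helper: makes a value safe to embed in a single log line.
// Line breaks are escaped so user input cannot start a new log entry, and
// long values are truncated.
function sanitizeForLog(value, maxLength = 200) {
  const text = String(value)
    .replace(/\r/g, "\\r")
    .replace(/\n/g, "\\n");
  return text.length > maxLength ? `${text.slice(0, maxLength)}...` : text;
}

const attempted = "alice\n2024-01-01 INFO admin logged in";
console.log(`login failed for user="${sanitizeForLog(attempted)}"`);
// login failed for user="alice\n2024-01-01 INFO admin logged in"
```

The printed line contains a literal backslash followed by `n`, so the forged entry stays on the same line as the real one. The same input would need different treatment elsewhere: HTML encoding on a page, or a query parameter in SQL.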
Validate early, reject clearly
If input is bad, say so at the boundary. Clear validation errors are easier to fix than silent failures or strange downstream bugs. Make messages specific enough that users or API clients can act on them, and keep the rules close to the route or form they protect.
Clean data before it spreads
Normalization is most useful when it happens before data enters storage, search, or rendering. A trimmed string, a lowercased email, or a numeric coercion can prevent small mismatches from turning into bigger support issues later. Once the data is cleaned, the rest of the app can assume a steadier shape.
Return errors users can act on
Validation is most helpful when the failure message tells the user what to change. “Invalid input” is true, but it is not useful. Point to the field, explain the rule, and keep the wording plain so the next step is obvious. That small bit of care makes bad input much easier to fix.
Sanitize by context
The same value can be safe in one place and risky in another. Text for a page, data for a query, and a log message all need different treatment. When you sanitize by context, you reduce the chance that data slips into the wrong place with the wrong meaning.
Next steps
- Apply these patterns to every route handler that accepts user input. Start with the endpoints that handle form submissions and API payloads
- Add integration tests that exercise your validation layer with both valid and malicious input to confirm rejection happens at the boundary
- Read the OWASP Input Validation Cheat Sheet for a comprehensive treatment of validation strategies across different data types
See also
- Express Basics: build your first Express server
- Middleware Patterns: chain middleware to handle cross-cutting concerns
- REST API Design: design APIs that scale
- XSS Prevention: deep dive into cross-site scripting defense