
Input Sanitization and Validation in JavaScript

Every application that accepts user input is a potential attack surface. Whether it is a comment form, a search box, or an API endpoint, your code needs to distinguish between legitimate data and malicious payloads. Input sanitization and validation are the two layers that make that distinction possible, and together they form the most practical defense against injection attacks in JavaScript applications.

Validation vs Sanitization

These two terms get used interchangeably, but they solve different problems.

Input validation checks whether data matches expected format, type, or range. It sits at the gate, rejecting what does not fit your criteria before your code even processes it. Validation is strict by nature: if the data does not pass the rules, it is out.

Input sanitization transforms potentially dangerous input into something safe. Rather than rejecting bad data outright, sanitization strips or neutralizes dangerous characters so the input can still be used without causing harm.

You need both. Validation alone leaves gaps: a correctly formatted string can still contain malicious content. Sanitization alone is incomplete: you might let garbage through that breaks your application. Together they form a defense-in-depth strategy.

// Validation: reject invalid email format
const email = req.body.email;
if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
  return res.status(400).send('Invalid email');
}

// Sanitization: normalize the valid email
const safeEmail = email.toLowerCase().trim();

Common attack vectors

Understanding what you’re defending against shapes how you build the defenses.

Cross-Site Scripting (XSS)

XSS happens when user input finds its way into a web page without proper escaping, allowing attacker-controlled scripts to execute in other users’ browsers.

Three main varieties:

  • Reflected XSS: user input echoed in a server response without encoding, like search results
  • Stored XSS: malicious input saved to a database and served to other users
  • DOM-based XSS: client-side JavaScript reads input and writes it to the DOM unsafely

// Vulnerable: innerHTML parses the input as markup, so injected handlers can run
element.innerHTML = `<p>${userInput}</p>`;

// Safe: textContent inserts the input as plain text, never as markup
element.textContent = userInput;

Using textContent is safe because the browser treats its value as plain text rather than markup. For cases where you do need to insert HTML, run the input through a sanitizer like DOMPurify first. Never pass raw user input to innerHTML.

SQL Injection

When user input reaches a database query through string concatenation, attackers can break out of the intended query structure. This is especially dangerous in Node.js backends that construct queries by interpolating strings from request bodies:

// Dangerous: string concatenation allows injection
db.query(`SELECT * FROM users WHERE name = '${name}'`);
// Input: "'; DROP TABLE users; --" becomes catastrophic

// Safe: parameterized query separates code from data
db.query('SELECT * FROM users WHERE name = $1', [name]);

Parameterized queries keep the SQL structure separate from the data values. The database driver sends the query template and the values in separate protocol messages, so no amount of special characters in the input can change the query’s meaning. Every major Node.js database driver supports this pattern.
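
As a rough sketch of that separation, the helper below keeps the SQL text constant and lets user input travel only in the values array. The names findUserByName and fakeDb are hypothetical, and db is assumed to expose a node-postgres-style query(text, values) method; other drivers use different placeholder syntax, such as ?.

```javascript
// Hypothetical helper: `db` is assumed to expose query(text, values),
// as node-postgres-style clients do.
function findUserByName(db, name) {
  // The SQL text is a constant; user input only appears in the values array.
  const text = 'SELECT * FROM users WHERE name = $1';
  return db.query(text, [name]);
}

// Stand-in for a real client: it records what would be sent to the database.
const fakeDb = {
  calls: [],
  query(text, values) {
    this.calls.push({ text, values });
    return { rows: [] };
  },
};

findUserByName(fakeDb, "'; DROP TABLE users; --");

// The query text is unchanged; the payload is carried as plain data.
console.log(fakeDb.calls[0]);
```

With a real driver, query would return a promise, and the caller would await the result in the same way.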

Command Injection

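Passing unsanitized user input to code-execution APIs is dangerous in two ways. child_process.exec hands its string to a shell, so attackers can run arbitrary system commands. eval and the Function constructor interpret their string as JavaScript, so attackers can run arbitrary code inside your process. All of these APIs treat their string arguments as code, making them fundamentally dangerous with untrusted input: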

// Never do this
eval(`console.log("${userInput}")`);

// Never this either
child_process.exec(`echo ${userInput}`);
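
One safer alternative, sketched here under the assumption of a POSIX system with an echo executable on the PATH, is to skip the shell entirely. Node's execFileSync runs a program directly and passes each array element as a literal argument. The echoSafely name is hypothetical.

```javascript
const { execFileSync } = require('node:child_process');

// Hypothetical helper: runs `echo` with the input as a single argument.
// execFileSync does not spawn a shell by default, so shell metacharacters
// in the input are never interpreted.
function echoSafely(userInput) {
  return execFileSync('echo', [userInput], { encoding: 'utf8' });
}

// The whole string reaches echo as one literal argument; the text after
// the semicolon is printed, not executed.
console.log(echoSafely('hello; echo injected'));
```

The same idea applies to any external program: keep the executable name fixed in your code and let user input appear only as array elements, ideally after validating it against an allowlist.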

Browser-Side Defenses

Using textContent Instead of innerHTML

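The simplest XSS defense in browser JavaScript is choosing the right DOM property. textContent treats its value as plain text. innerHTML parses it as HTML. Browsers do not run <script> elements inserted this way, but event-handler attributes such as <img src=x onerror=...> still execute, so innerHTML remains an XSS sink.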

If you must use innerHTML, sanitize the input first.

DOMPurify

For applications that need to accept structured HTML (think rich text editors), DOMPurify is the standard browser-side sanitizer. It parses HTML and strips anything dangerous while preserving safe markup.

import DOMPurify from 'dompurify';

const dirty = '<script>alert("xss")</script><p>Hello <b>world</b></p>';
const clean = DOMPurify.sanitize(dirty);
// clean === '<p>Hello <b>world</b></p>'

// Lock down to a minimal set of tags and attributes
const strictClean = DOMPurify.sanitize(dirty, {
  ALLOWED_TAGS: ['a', 'b', 'i', 'p', 'em', 'strong'], // 'a' so the allowed href attribute has a tag to live on
  ALLOWED_ATTR: ['href']
});

DOMPurify operates on the actual DOM, not on strings, which protects against mutation-based bypasses that trick string-based sanitizers.

Content Security Policy

CSP is an HTTP response header that tells the browser what resources it is allowed to load and from where. It is not a substitute for input validation, but it limits what injected scripts can do even if XSS slips through.

Content-Security-Policy: default-src 'self'; script-src 'self'; object-src 'none'

Breaking it down:

  • default-src 'self' restricts all resources to your own origin by default
  • script-src 'self' allows scripts only from your own origin
  • object-src 'none' blocks Flash, Java, and other plugin content entirely
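
To keep a policy like this readable as it grows, one option is to assemble the header value from a plain object. This is only a sketch; buildCsp is a hypothetical helper, not part of any library.

```javascript
// Hypothetical helper: turns a directives object into a CSP header value.
function buildCsp(directives) {
  return Object.entries(directives)
    .map(([name, sources]) => `${name} ${sources.join(' ')}`)
    .join('; ');
}

const policy = buildCsp({
  'default-src': ["'self'"],
  'script-src': ["'self'"],
  'object-src': ["'none'"],
});

console.log(policy);
// default-src 'self'; script-src 'self'; object-src 'none'
```

In an Express app, the resulting string could be sent with res.setHeader('Content-Security-Policy', policy).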

For inline scripts, use a nonce: a random token the server generates per request. The nonce must be unpredictable, so generate it with a cryptographically secure random function. Here is how you set it in the header:

Content-Security-Policy: script-src 'self' 'nonce-abc123'

Every inline <script> tag that should run must carry a nonce attribute matching the value in the header; the browser blocks inline scripts without it. On the server side, you generate a fresh nonce for each request and inject it into both the CSP header and the rendered HTML. Here is how that looks in an Express application:

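import crypto from 'node:crypto';

// Generate a unique nonce per request and advertise it in the CSP header
app.use((req, res, next) => {
  res.locals.nonce = crypto.randomBytes(16).toString('hex');
  res.setHeader(
    'Content-Security-Policy',
    `script-src 'self' 'nonce-${res.locals.nonce}'`
  );
  next();
});

app.get('/', (req, res) => {
  // The template adds the same value to each inline <script nonce="...">
  res.render('index', { nonce: res.locals.nonce });
});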
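
Stripped of the framework, the nonce has to appear in two places: the header and the script tag. The following sketch shows both using only Node's crypto module. The helper names createNonce, cspHeaderFor, and inlineScriptTag are hypothetical.

```javascript
const crypto = require('node:crypto');

// Hypothetical helpers showing the two places a per-request nonce must appear.
function createNonce() {
  // 16 random bytes from a CSPRNG, hex-encoded to 32 characters.
  return crypto.randomBytes(16).toString('hex');
}

function cspHeaderFor(nonce) {
  return `script-src 'self' 'nonce-${nonce}'`;
}

function inlineScriptTag(nonce, code) {
  // `code` must be trusted, server-authored script, never user input.
  return `<script nonce="${nonce}">${code}</script>`;
}

const nonce = createNonce();
console.log(cspHeaderFor(nonce));
console.log(inlineScriptTag(nonce, 'console.log("trusted inline script")'));
```

Because the value changes on every request, an attacker who injects a script tag cannot guess the nonce in advance, and the browser refuses to run the injected script.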

Context-Aware Escaping

Escaping rules change depending on where you’re inserting data. The same character requires different handling in HTML, JavaScript strings, URLs, and CSS.

Context            | Characters to Escape | Method
HTML body          | <, >, &, ", '        | textContent or innerText
HTML attribute     | ", ', <, >           | Always quote attributes
JavaScript string  | \, ', ", `           | JSON encode or escape manually
URL parameter      | Special chars        | encodeURIComponent()

The browser’s native textContent handles the HTML body context automatically. For URL parameters, encodeURIComponent() is the standard choice. For JavaScript string contexts, JSON-encode the value rather than concatenating it into script source.
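
When you build output as strings, for example in server-rendered HTML, there is no textContent to lean on. The sketch below shows one escaping function per context. The helper names escapeHtml, toScriptLiteral, and searchUrl are hypothetical, and toScriptLiteral assumes a JSON-serializable value.

```javascript
// Hypothetical helpers, one per output context.

// HTML body or quoted attribute: replace the five significant characters.
function escapeHtml(value) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(value).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Inline <script> context: JSON-encode, then escape "<" so the value
// cannot close the surrounding script element.
function toScriptLiteral(value) {
  return JSON.stringify(value).replace(/</g, '\\u003c');
}

// URL parameter context: percent-encode the value only, not the whole URL.
function searchUrl(term) {
  return `/search?q=${encodeURIComponent(term)}`;
}

console.log(escapeHtml('<b>Tom & "Jerry"</b>'));
console.log(toScriptLiteral('</script><script>alert(1)'));
console.log(searchUrl('a&b=c d'));
```

Each function is only correct for its own context: HTML-escaped text is not safe inside a script block, and a percent-encoded value is not safe as raw HTML.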

Server-side validation in Node.js

Client-side validation improves UX but means nothing to an attacker who can call your API directly. Every endpoint must validate on the server.
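
To make that concrete, here is a minimal dependency-free sketch of the kind of check a server endpoint might run before touching the data. The validateUser function, its field names, and its limits are illustrative assumptions; a schema library does the same job more thoroughly.

```javascript
// Hypothetical dependency-free validator; field names and limits are
// illustrative assumptions, not a prescribed schema.
function validateUser(body) {
  const errors = [];
  const input = body !== null && typeof body === 'object' ? body : {};

  if (typeof input.username !== 'string' || !/^[a-zA-Z0-9_]{3,30}$/.test(input.username)) {
    errors.push('username must be 3-30 letters, digits, or underscores');
  }
  if (typeof input.email !== 'string' || !/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(input.email)) {
    errors.push('email must look like an email address');
  }
  if (input.age !== undefined && !(Number.isInteger(input.age) && input.age > 0)) {
    errors.push('age must be a positive integer when present');
  }

  if (errors.length > 0) {
    return { success: false, errors };
  }
  // Return only the known fields so unexpected properties are dropped.
  const data = { username: input.username, email: input.email };
  if (input.age !== undefined) data.age = input.age;
  return { success: true, data };
}

console.log(validateUser({ username: 'ada_l', email: 'ada@example.com', admin: true }));
console.log(validateUser({ username: '<script>', email: 'nope' }));
```

Returning a fresh object with only the expected fields means an extra property such as admin never reaches the rest of the application.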

Zod

Zod has become a go-to choice for Node.js validation because it combines schema definition with TypeScript type inference. You define the shape once and get both runtime validation and compile-time types.

import { z } from 'zod';

const UserSchema = z.object({
  username: z.string()
    .min(3)
    .max(30)
    .regex(/^[a-zA-Z0-9_]+$/),
  email: z.string().email(),
  age: z.number().int().positive().optional()
});

const result = UserSchema.safeParse(req.body);

if (!result.success) {
  return res.status(400).json({
    errors: result.error.issues
  });
}

// result.data is typed as { username: string; email: string; age?: number }
const user = result.data;

safeParse returns an object with either data (on success) or error (on failure). It never throws. This makes error handling straightforward. Zod also supports .refine() and .transform() for custom validation logic and data reshaping, which keeps non-trivial rules expressible without moving them outside the schema.

express-validator

For Express middleware chains, express-validator offers a fluent, chainable API. It integrates directly with Express’s middleware pattern, letting you declare validation rules alongside your route handlers:

import { body, validationResult } from 'express-validator';

app.post('/register', [
  body('email')
    .isEmail()
    .normalizeEmail(),
  body('password')
    .isLength({ min: 8 })
    .matches(/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)/),
  body('age')
    .optional()
    .isInt({ min: 0, max: 150 })
], (req, res) => {
  const errors = validationResult(req);
  if (!errors.isEmpty()) {
    return res.status(400).json({ errors: errors.array() });
  }
  // proceed
});

Building a complete validation pipeline

A realistic form handler combines schema validation with HTML sanitization and safe database writes. Each layer addresses a different class of threat: the schema validates shape and types, the sanitizer strips executable markup, and the parameterized query prevents database escape. Used together, even if one layer misses something, the next layer is there to catch it:

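import createDOMPurify from 'dompurify';
import { JSDOM } from 'jsdom';
import { z } from 'zod';

// DOMPurify needs a DOM to work with; in Node.js, give it a jsdom window
const DOMPurify = createDOMPurify(new JSDOM('').window);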

const CommentSchema = z.object({
  username: z.string().min(3).max(30).regex(/^[a-zA-Z0-9_]+$/),
  email: z.string().email(),
  content: z.string().min(1).max(1000)
});

app.post('/comments', async (req, res) => {
  // 1. Validate the raw shape with Zod
  const parsed = CommentSchema.safeParse(req.body);
  if (!parsed.success) {
    return res.status(400).json({ error: parsed.error.flatten() });
  }

  // 2. Sanitize HTML content with DOMPurify
  const sanitizedContent = DOMPurify.sanitize(parsed.data.content, {
    ALLOWED_TAGS: ['p', 'b', 'i', 'em', 'strong', 'a'],
    ALLOWED_ATTR: ['href']
  });

  // 3. Write using parameterized query
  await db.query(
    'INSERT INTO comments (username, email, content) VALUES ($1, $2, $3)',
    [parsed.data.username, parsed.data.email, sanitizedContent]
  );

  res.status(201).json({ message: 'Comment saved' });
});

This flow does three things in order: it validates the structure and types with Zod, sanitizes any HTML markup with DOMPurify, and inserts using a parameterized query so database input cannot break out of the query structure.

Key Principles

Whitelist over blacklist. Trying to block known dangerous patterns is a losing game. New bypasses appear constantly. Define exactly what is allowed and reject everything else.

// Blacklist approach — easy to bypass
const bad = input.replace(/<script/gi, '');

// Whitelist approach — strict and predictable
const safe = input.replace(/[^a-zA-Z0-9 .,!?-]/g, '');
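
To see why the blacklist is easy to bypass, consider a payload that nests one tag inside another. This sketch runs both filters on such an input; the payload string is a made-up example.

```javascript
// Hypothetical payload that hides "<script" inside another "<script".
const payload = '<scr<scriptipt>alert(1)</scr<scriptipt>';

// Blacklist: a single pass removes the inner "<script" and leaves the
// outer pieces to join into a working tag.
const blacklistFiltered = payload.replace(/<script/gi, '');

// Whitelist: everything outside the allowed character set is dropped.
const whitelistFiltered = payload.replace(/[^a-zA-Z0-9 .,!?-]/g, '');

console.log(blacklistFiltered); // <script>alert(1)</script>
console.log(whitelistFiltered.includes('<')); // false
```

The blacklist filter reassembles the very string it was meant to remove, while the whitelist filter cannot emit a character it was never told to allow.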

Never trust client-side validation. Browser JavaScript is fully inspectable and modifiable. Use it to improve form UX, but always re-validate on the server.

Sanitize late, validate early. Validate as soon as data enters your system. Sanitize right before the data leaves, when you know exactly where it is going.

Conclusion

Input validation and sanitization are complementary defenses. Validation rejects bad data at the boundary using strict rules. Sanitization neutralizes dangerous characters so input can still be used safely. Together they protect against XSS, SQL injection, and command injection, three of the most common attack vectors targeting user input.

The practical stack is straightforward: Zod or express-validator for server-side schema validation, DOMPurify for HTML sanitization in the browser, parameterized queries for database writes, and CSP as a safety net that limits the damage any successful injection can cause.

Start with validation. Every endpoint should know what shape of data it expects and reject anything that does not match. Add sanitization for any context where HTML or rich content is involved. Set a strict CSP so that even if something slips through, the browser blocks the damage.
