String.prototype.isWellFormed()

isWellFormed()

Returns boolean · Added in ES2024 · Updated June 7, 2026 · String Methods

javascript · string · unicode · utf-16 · surrogates · es2024

What isWellFormed does

String.prototype.isWellFormed() answers one question: is this string free of lone UTF-16 surrogates? It returns true if the string is well-formed, false otherwise, and it never throws when called on a string.

const strings = [
  "ab\uD800",            // lone high surrogate at the end
  "ab\uD800c",           // lone high surrogate in the middle
  "\uDFFFab",            // lone low surrogate at the start
  "c\uDFFFab",           // lone low surrogate in the middle
  "abc",                 // plain ASCII
  "ab\uD83D\uDE04c",     // surrogate pair (U+D83D U+DE04) for U+1F604
];

for (const s of strings) {
  console.log(s.isWellFormed());
}
// false
// false
// false
// false
// true
// true

That last case matters. Strings that look “complicated” because they contain emoji or other non-BMP characters are still well-formed, because code points above the Basic Multilingual Plane (U+1F604 and similar) are encoded as a proper surrogate pair (U+D83D U+DE04), not as lone halves.

Syntax

isWellFormed()

Parameter	Type	Description
(none)		The method takes no arguments.

Returns: boolean — true if the string contains no lone surrogates, false otherwise.

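Throws: Never when called on a string, which is the point of the API: you can branch on validity without a try/catch. Like other String.prototype methods, it can throw a TypeError only when invoked via call() or apply() on a value that cannot be converted to a string, such as null or undefined.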

Why well-formedness matters

JavaScript strings are sequences of UTF-16 code units. Characters in the Basic Multilingual Plane (U+0000–U+FFFF) fit in a single 16-bit code unit. Code points above that, which include most emoji and the less common CJK ideographs, are encoded as a surrogate pair: a high surrogate in the range U+D800–U+DBFF followed by a low surrogate in the range U+DC00–U+DFFF.
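
The pair arithmetic is small enough to write out. What follows is a minimal sketch of the standard UTF-16 encoding step, using U+1F604 as the worked example; toSurrogatePair is a hypothetical helper name, not a built-in.

```javascript
// Hypothetical helper: split a supplementary code point (above U+FFFF)
// into its UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("expected a code point in U+10000..U+10FFFF");
  }
  const offset = codePoint - 0x10000;     // 20 significant bits
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1F604);
console.log(high.toString(16), low.toString(16));            // d83d de04
console.log(String.fromCharCode(high, low) === "\u{1F604}"); // true
```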

A string is well-formed when every high surrogate is immediately followed by a low surrogate, and every low surrogate is immediately preceded by a high surrogate. Anything else is ill-formed: the string contains at least one lone surrogate.
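
That definition translates almost directly into a loop over code units. The sketch below is only an illustration of the rule, not how any engine implements the method; hasOnlyPairedSurrogates is a hypothetical name.

```javascript
// Illustrative re-implementation of the definition above.
// hasOnlyPairedSurrogates is a hypothetical name, not a built-in.
function hasOnlyPairedSurrogates(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // High surrogate: the next code unit must be a low surrogate.
      const next = str.charCodeAt(i + 1); // NaN past the end of the string
      if (!(next >= 0xDC00 && next <= 0xDFFF)) return false;
      i++; // skip the low half of the valid pair
    } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
      // Low surrogate that was not consumed by a preceding high surrogate.
      return false;
    }
  }
  return true;
}

console.log(hasOnlyPairedSurrogates("ab\uD83D\uDE04c")); // true
console.log(hasOnlyPairedSurrogates("ab\uD800c"));       // false
console.log(hasOnlyPairedSurrogates("\uDE04\uD83D"));    // false (halves in the wrong order)
```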

Lone surrogates show up in real code more often than you might expect:

String.fromCharCode(0xD800) produces a lone high surrogate. So does String.fromCodePoint(0xD800): it rejects only values that are not valid code points, and surrogates are valid code points.
Slicing a surrogate pair in half: "\uD83D\uDE04".slice(0, 1) gives you "\uD83D".
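Parsing escape sequences from external data: JSON.parse('"\\uD800"') returns a string holding a lone high surrogate. A non-fatal TextDecoder is not a source, since it replaces invalid UTF-8 byte sequences with U+FFFD rather than producing surrogates.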
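Older APIs or binary protocols that hand you raw 16-bit values as a string, for example truncated or corrupted UTF-16 data passed through String.fromCharCode. (A Latin-1 decode alone cannot produce surrogates, because every resulting code unit is below U+0100.)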
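
Of these, the slicing case is the easiest to avoid at the source. The following is a minimal sketch of a truncation helper that backs up one code unit when the cut would land inside a pair. truncateSafely is a hypothetical name, and the helper assumes its input is already well-formed; it does not repair lone surrogates that are already present.

```javascript
// Hypothetical helper: cut a string to at most maxUnits UTF-16 code units
// without leaving a dangling high surrogate at the end.
function truncateSafely(str, maxUnits) {
  if (str.length <= maxUnits) return str;
  let end = maxUnits;
  const last = str.charCodeAt(end - 1);
  const next = str.charCodeAt(end);
  // If the cut falls between a high and a low surrogate, back up one unit.
  if (last >= 0xD800 && last <= 0xDBFF && next >= 0xDC00 && next <= 0xDFFF) {
    end -= 1;
  }
  return str.slice(0, end);
}

const text = "ab\uD83D\uDE04cd";
console.log(text.slice(0, 3).length);                      // 3, ends in a lone "\uD83D"
console.log(truncateSafely(text, 3));                      // ab
console.log(truncateSafely(text, 4) === "ab\uD83D\uDE04"); // true
```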

A lone surrogate isn’t a crash on its own, but it is a problem for anything that expects valid UTF-16: encodeURI throws URIError, TextEncoder silently replaces it with U+FFFD, JSON.stringify emits it as a \uXXXX escape (since ES2019) instead of the raw code unit, and rendering can be inconsistent.

Guarding encodeURI

encodeURI and encodeURIComponent throw a URIError (reported as “URI malformed” in V8-based runtimes) when handed a lone surrogate. isWellFormed() gives you a clean way to check first:

const url = "https://example.com/search?q=\uD800";

if (url.isWellFormed()) {
  console.log(encodeURI(url));
} else {
  console.warn("Refusing to encode a string with lone surrogates.");
}
// Refusing to encode a string with lone surrogates.

Without isWellFormed, the same code would need a try/catch around the encodeURI call, which is the exact pattern the new method is meant to replace.
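
For comparison, the older pattern looks roughly like this. It is a sketch rather than a recommendation; encodeOrNull is a hypothetical helper name.

```javascript
// The pre-ES2024 pattern: attempt the encoding and treat URIError as "ill-formed".
// encodeOrNull is a hypothetical helper name.
function encodeOrNull(value) {
  try {
    return encodeURIComponent(value);
  } catch (err) {
    if (err instanceof URIError) return null; // lone surrogate in the input
    throw err;                                // anything else is unexpected
  }
}

console.log(encodeOrNull("q=\uD83D\uDE04")); // q%3D%F0%9F%98%84
console.log(encodeOrNull("q=\uD800"));       // null
```

The try/catch version uses an exception for ordinary control flow and has to filter for URIError explicitly, which is the noise the up-front check removes.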

Coercion

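Like other String.prototype methods, isWellFormed coerces this to a string first, so you can call it on any value except null and undefined, which throw a TypeError before any coercion happens:

String.prototype.isWellFormed.call(123);       // true  (123 -> "123", all ASCII)
String.prototype.isWellFormed.call(true);      // true  (true -> "true", all ASCII)
String.prototype.isWellFormed.call("\uD800");  // false (lone high surrogate)
String.prototype.isWellFormed.call(null);      // TypeError (null and undefined cannot be coerced)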

In practice you almost always call it as a method, but the coercion behavior is useful to know when you want to validate a value without first checking that it’s already a string.

Gotchas

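It scans the whole string. There is no “is this one character well-formed” form. If you need to find where a lone surrogate lives, walk the string with String.prototype.codePointAt(), advancing two code units past each code point above U+FFFF, and look for indices where the returned code point is itself a surrogate (U+D800–U+DFFF). Comparing the string’s .length to Array.from(str).length does not help: the two differ whenever the string contains valid surrogate pairs, and a lone surrogate counts as one element in both.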
Companion method: toWellFormed(). Also added in ES2024, this returns a new string with every lone surrogate replaced by U+FFFD (the Unicode replacement character �). Use isWellFormed() to reject, toWellFormed() to sanitize.
Not the same as normalize(). String.prototype.normalize() handles Unicode normalization forms (NFC, NFD, NFKC, NFKD). It does not check, fix, or even care about lone surrogates.
A regex is slower and easier to get wrong. The native method can scan the engine’s internal string representation directly in a single pass. A regex like /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/ (written without the u flag) checks the same condition but is typically slower on long strings and notoriously fiddly to write correctly the first time.
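
Locating lone surrogates with codePointAt() can be sketched as follows. findLoneSurrogates is a hypothetical helper name; it returns code-unit indices, the same indexing that charAt() and slice() use.

```javascript
// Hypothetical helper: return the code-unit indices of every lone surrogate.
function findLoneSurrogates(str) {
  const indices = [];
  let i = 0;
  while (i < str.length) {
    const cp = str.codePointAt(i);
    if (cp >= 0xD800 && cp <= 0xDFFF) {
      // When walking this way, a bare surrogate is returned only if it is unpaired.
      indices.push(i);
    }
    // Step over both halves of a valid pair so the low half is not revisited.
    i += cp > 0xFFFF ? 2 : 1;
  }
  return indices;
}

console.log(findLoneSurrogates("ab\uD800c\uDFFF")); // [ 2, 4 ]
console.log(findLoneSurrogates("ab\uD83D\uDE04c")); // []
```

Stepping by two after a supplementary code point matters: calling codePointAt() on the index of the low half of a valid pair returns that low surrogate on its own, which would otherwise be reported as a false positive.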

Browser and runtime support

Available in all current engines:

Chrome and Edge 111+ (March 2023)
Firefox 119+ (October 2023)
Safari 16.4+ (March 2023)
Node.js 20.0+ (April 2023)

For older runtimes, core-js ships a well-formed-unicode-strings module, and the es-shims/string.prototype.iswellformed package on npm provides a spec-compliant polyfill.
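
If a full polyfill is more than a project needs, a small feature-detecting wrapper is another option. This is a sketch under stated assumptions, not a spec-compliant polyfill: isWellFormedFallback and isWellFormed are hypothetical names, the wrapper expects a string argument, and it does not patch String.prototype. The fallback relies on Unicode-mode regular expressions and \p{…} property escapes, so the target runtime needs to support those.

```javascript
// Fallback check: in Unicode mode a valid pair is matched as one code point,
// so \p{Surrogate} can only match a lone surrogate.
function isWellFormedFallback(str) {
  return !/\p{Surrogate}/u.test(str);
}

// Prefer the native method when the runtime has it.
const isWellFormed =
  typeof String.prototype.isWellFormed === "function"
    ? (str) => str.isWellFormed()
    : isWellFormedFallback;

console.log(isWellFormed("ab\uD83D\uDE04c")); // true
console.log(isWellFormed("ab\uD800c"));       // false
```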