String.prototype.isWellFormed()
What isWellFormed does
String.prototype.isWellFormed() answers one question: does this string contain any lone UTF-16 surrogates? It returns true if the string is well-formed and false otherwise, and when called on a string it never throws.
const strings = [
"ab\uD800", // lone high surrogate at the end
"ab\uD800c", // lone high surrogate in the middle
"\uDFFFab", // lone low surrogate at the start
"c\uDFFFab", // lone low surrogate in the middle
"abc", // plain ASCII
"ab\uD83D\uDE04c", // surrogate pair (U+D83D U+DE04) for U+1F604
];
for (const s of strings) {
  console.log(s.isWellFormed());
}
// false
// false
// false
// false
// true
// true
That last case matters. Strings that look “complicated” because they contain emoji or other non-BMP characters are still well-formed, because code points above the Basic Multilingual Plane (U+1F604 and similar) are encoded as a proper surrogate pair (U+D83D U+DE04), not as lone halves.
Syntax
isWellFormed()
| Parameter | Type | Description |
|---|---|---|
| (none) | | The method takes no arguments. |
Returns: boolean — true if the string contains no lone surrogates, false otherwise.
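Throws: Never when called on a string. The whole point of the API is to let you branch on validity without a try/catch. (As with other String.prototype methods, invoking it through call or apply with a null or undefined this value throws a TypeError.)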
Why well-formedness matters
JavaScript strings are sequences of UTF-16 code units. Characters in the Basic Multilingual Plane (U+0000–U+FFFF) fit in a single 16-bit code unit. Code points above that, which include most emoji and some less common CJK characters, are encoded as a surrogate pair: a high surrogate in the range U+D800–U+DBFF followed by a low surrogate in the range U+DC00–U+DFFF.
A string is well-formed when every high surrogate is immediately followed by a low surrogate, and every low surrogate is immediately preceded by a high surrogate. Any surrogate outside such a pair is a lone surrogate, and a string that contains one is ill-formed.
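The definition above can be checked by hand. The following is a minimal sketch for illustration only, not a description of how engines implement the method; isWellFormedManual is a hypothetical name.

```javascript
// Hypothetical helper: a manual check that mirrors the definition above.
function isWellFormedManual(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: the next code unit must be a low surrogate.
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) return false;
      i++; // Skip the low half of the valid pair.
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate that was not consumed as part of a pair.
      return false;
    }
  }
  return true;
}

console.log(isWellFormedManual("abc")); // true
console.log(isWellFormedManual("ab\uD83D\uDE04c")); // true
console.log(isWellFormedManual("ab\uD800c")); // false
console.log(isWellFormedManual("\uDFFFab")); // false
```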
Lone surrogates show up in real code more often than you might expect:
- String.fromCharCode(0xD800) produces a lone high surrogate, and so does String.fromCodePoint(0xD800): neither method rejects surrogate values.
- Slicing a surrogate pair in half: "\uD83D\uDE04".slice(0, 1) gives you "\uD83D".
- Parsing JSON that contains an unpaired escape: JSON.parse('"\\uD800"') returns a lone surrogate. (A TextDecoder, by contrast, replaces invalid input with U+FFFD rather than producing lone surrogates.)
- Older APIs or binary protocols that hand you raw 16-bit code units without checking that surrogates are paired.
A lone surrogate isn’t a crash on its own, but it is a problem for anything that expects valid UTF-16: encodeURI throws URIError, JSON.stringify has to emit it as a \uDXXX escape (since ES2019) instead of the raw character, TextEncoder silently replaces it with U+FFFD, and rendering can be inconsistent.
Guarding encodeURI
encodeURI and encodeURIComponent throw a URIError (reported as "URI malformed" in V8) when handed a lone surrogate. isWellFormed() gives you a clean way to check first:
const url = "https://example.com/search?q=\uD800";
if (url.isWellFormed()) {
  console.log(encodeURI(url));
} else {
  console.warn("Refusing to encode a string with lone surrogates.");
}
// Refusing to encode a string with lone surrogates.
Without isWellFormed, the same code would need a try/catch around the encodeURI call, which is the exact pattern the new method is meant to replace.
Coercion
Like other String.prototype methods, isWellFormed coerces its this value to a string first, so you can call it on any value except null and undefined, which throw a TypeError:
String.prototype.isWellFormed.call(123); // true (123 -> "123", all ASCII)
String.prototype.isWellFormed.call(null); // throws TypeError (null and undefined are not coerced)
String.prototype.isWellFormed.call("\uD800"); // false (lone high surrogate)
In practice you almost always call it as a method, but the coercion behavior is useful to know when you want to validate a value without first checking that it’s already a string.
Gotchas
- It scans the whole string. There is no “is this one character well-formed” form. If you need to find where a lone surrogate lives, walk the string by code point with String.prototype.codePointAt(), advancing two code units past each valid pair, and look for indices where the returned code point is itself a surrogate. Comparing the string’s .length to Array.from(str).length does not help: the two differ whenever the string contains a valid surrogate pair, not when it contains a lone one.
- Companion method: toWellFormed(). Also added in ES2024, this returns a new string with every lone surrogate replaced by U+FFFD (the Unicode replacement character �). Use isWellFormed() to reject, toWellFormed() to sanitize.
- Not the same as normalize(). String.prototype.normalize() handles Unicode normalization forms (NFC, NFD, NFKC, NFKD). It does not check, fix, or even care about lone surrogates.
- A regex is slower and easier to get wrong. The native method can work directly on the engine’s internal string representation. A regex like /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/ checks the same condition but is typically slower on long strings and notoriously fiddly to write correctly the first time.
Browser and runtime support
Available in all current engines:
- Chrome and Edge 111+ (March 2023)
- Firefox 119+ (October 2023)
- Safari 16.4+ (March 2023)
- Node.js 20.0+ (April 2023)
For older runtimes, core-js ships a well-formed-unicode-strings module, and the es-shims/string.prototype.iswellformed package on npm provides a spec-compliant polyfill.
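If pulling in a polyfill is not an option, a feature-detected wrapper is one possibility. This is a rough sketch, not a spec-compliant replacement: the names are hypothetical, it does not patch String.prototype, and the fallback regex deliberately avoids lookbehind because some older engines cannot parse it.

```javascript
// Matches a high surrogate not followed by a low one, or a low surrogate
// not preceded by a high one (written without lookbehind).
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]/;

// Hypothetical fallback for runtimes without the native method.
const isWellFormedFallback = (str) => !LONE_SURROGATE.test(str);

// Prefer the native method when it exists.
const isWellFormedCompat =
  typeof String.prototype.isWellFormed === "function"
    ? (str) => str.isWellFormed()
    : isWellFormedFallback;

console.log(isWellFormedCompat("ab\uD83D\uDE04c")); // true
console.log(isWellFormedCompat("ab\uD800c")); // false
```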
See also
- String.prototype.normalize(): the closest “Unicode hygiene” sibling on String.prototype. It handles normalization forms, which is a different concern from encoding validity.
- String.prototype.codePointAt(): reads a full code point starting at an index. Use it when you need to locate a lone surrogate inside a string.
- encodeURI(): the function that throws on ill-formed strings. The isWellFormed guard pattern in this article’s example pairs naturally with it.