What is a homoglyph attack?

A homoglyph attack registers a domain using characters that look identical to your brand's name but are technically different — for example a Cyrillic "а" in place of a Latin "a" — so the lookalike reads as your domain to the human eye while resolving to an attacker's server.

What is Punycode and the xn-- prefix?

Punycode is the ASCII-compatible encoding that IDNA uses to represent Unicode (non-ASCII) domain labels in the DNS. A Unicode label is encoded to an ASCII string beginning with `xn--`, which is what actually gets registered and resolved. Browsers can fall back to showing the `xn--` form when a label looks suspicious, which exposes the spoof.

Do browsers block IDN homograph attacks?

Only partially. Modern browsers show the `xn--` Punycode form when a label mixes scripts or trips confusable rules, but single-script spoofs (such as an all-Cyrillic name) can still render as Unicode, and email clients, chat apps, and link previews are far weaker.

How do you detect homoglyph lookalike domains?

You compute a "skeleton" for each string using the Unicode Consortium's confusables data (UTS #39), which maps confusable characters to a canonical form. Two strings that reduce to the same skeleton are visual confusables, which flags lookalikes programmatically rather than by guesswork.

Homoglyph & IDN Lookalike Domains: The Unicode Attack Explained

A domain that looks identical to yours but is built from different characters is a Unicode spoof — here is how it works and how to catch it.

A domain that looks like yours but isn't

Open your brand's name in a browser tab and it looks exactly right. Every letter is in place. But the domain resolving behind it belongs to someone else, because one of those "letters" is a different character that merely looks identical. This is the homoglyph attack, and it defeats the one defense most people rely on: their own eyes.

The attack works because the characters humans read and the characters computers resolve are not the same thing: people recognize glyphs, while resolvers compare code points. Everything else in this article follows from that gap.

What a homoglyph actually is

A homoglyph is a character that looks like another character but has a different code point. Some are within plain ASCII; most of the dangerous ones come from other Unicode scripts.

The Cyrillic small letter "а" (U+0430) is visually identical to the Latin "a" (U+0061).
The digit "0" stands in for the letter "O"; the digit "1" for lowercase "l" or uppercase "I".
The letter pair "rn" rendered together reads as "m" at a glance — so modern.com and modem.com blur.
Greek, Armenian, and Cherokee scripts contribute dozens more Latin lookalikes.

Each substitution produces a string that is byte-for-byte different from your real domain while being pixel-for-pixel close enough to pass. Typosquatting (a fat-fingered misspelling) and combosquatting (yourbrand-login.com) are related tricks, but they are distinct — a typo is a different word, a homoglyph is the same word in a different alphabet.
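To make "byte-for-byte different" concrete, here is a minimal Python sketch that compares a genuine name with a spoof. The domain apple.com and the helper name describe are illustrative assumptions; only the standard library is used.

```python
import unicodedata

real = "apple.com"
# Hypothetical spoof: the first letter is CYRILLIC SMALL LETTER A (U+0430).
spoof = "\u0430pple.com"


def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# The two strings render alike but do not compare equal.
print(real == spoof)

# The first code point of each string tells them apart.
print(describe(real)[0])
print(describe(spoof)[0])

# The UTF-8 bytes differ too: the Cyrillic letter takes two bytes.
print(real.encode("utf-8"))
print(spoof.encode("utf-8"))
```

Printing the code points, rather than the rendered string, is the habit that matters: any check that relies on how the text looks can be fooled, while a check on code points cannot.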

How IDNs and Punycode make it possible

The DNS was built for ASCII. Internationalized Domain Names (IDNs) let people register domains in their own scripts — Arabic, Chinese, Cyrillic — by encoding the Unicode label into an ASCII-safe form called Punycode, marked with the xn-- prefix.

So a domain a user sees as Unicode is actually registered and resolved as something like xn--80ak6aa92e.com. That string is the real registration; the Unicode display is a rendering of it.

The classic IDN homograph attack exploits this. An attacker registers an all-Cyrillic spelling of a well-known brand — visually indistinguishable from the Latin original — which encodes to an innocuous-looking xn-- label nobody recognizes. To a victim, the address bar reads as the genuine brand. To the resolver, it is an entirely separate domain pointing wherever the attacker wants.

Displayed:   аррӏе.com      (Cyrillic characters)
Registered:  xn--80ak6aa92e.com
Looks like:  apple.com      (Latin characters)
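The mapping above can be reproduced with Python's built-in punycode codec. This is a sketch under assumptions: the helper names to_a_label and to_u_label are made up for illustration, and the code applies raw Punycode (RFC 3492) only, skipping the mapping and validation steps that a full IDNA implementation performs.

```python
def to_a_label(u_label: str) -> str:
    """Encode one Unicode label to its xn-- form (raw Punycode, no IDNA checks)."""
    if u_label.isascii():
        return u_label
    return "xn--" + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode an xn-- label back to Unicode; other labels pass through."""
    if not a_label.lower().startswith("xn--"):
        return a_label
    return a_label[4:].encode("ascii").decode("punycode")


# The all-Cyrillic label from the example, written as escapes so the
# code points are unambiguous: U+0430 U+0440 U+0440 U+04CF U+0435.
spoof_label = "\u0430\u0440\u0440\u04cf\u0435"

print(to_a_label(spoof_label))   # xn--80ak6aa92e
print(to_a_label("apple"))       # apple (already ASCII, unchanged)
print(to_u_label("xn--80ak6aa92e") == spoof_label)  # True
```

Converting every domain to its xn-- form before logging or comparing it is a cheap way to make sure tooling sees what the resolver sees.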

What browsers protect — and what they don't

This is the part that gets oversimplified. Browser vendors did respond.

Chrome, Firefox, and Safari now apply heuristics: if a domain label mixes scripts (Latin + Cyrillic in one word) or matches known risky confusable patterns, the browser refuses to render the Unicode and shows the raw xn-- Punycode instead — which looks obviously wrong and breaks the illusion.

But the protection is uneven:

Single-script spoofs slip through. A label that is entirely Cyrillic doesn't "mix scripts," so some configurations still render it as Unicode.
Coverage varies by script and browser version. The confusable rules are inconsistent across the long tail of Unicode.
Everything outside the browser is weaker. Email clients, chat apps, SMS, and link-preview cards frequently render the Unicode glyphs with no Punycode fallback at all — and phishing arrives by email, not by someone typing your domain.
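A rough sketch shows why a mixed-script check alone misses whole-script spoofs. Python's standard library does not expose the Unicode Script property, so this example approximates it with the first word of each character's Unicode name; that shortcut, and the names scripts_in and is_mixed_script, are assumptions for illustration. Production checks would use real script data, for example from ICU.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Approximate the scripts in a label from each letter's Unicode name.

    Assumption: the first word of the name ('LATIN', 'CYRILLIC', ...)
    stands in for the Script property, which the stdlib does not provide.
    """
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def is_mixed_script(label: str) -> bool:
    """True when the letters in a label come from more than one script."""
    return len(scripts_in(label)) > 1


mixed = "p\u0430ypal"                        # Latin with one Cyrillic letter
whole = "\u0430\u0440\u0440\u04cf\u0435"     # entirely Cyrillic

print(scripts_in(mixed), is_mixed_script(mixed))
print(scripts_in(whole), is_mixed_script(whole))
```

The mixed label is flagged, but the all-Cyrillic label passes the check cleanly, which is the gap that whole-script confusable rules and skeleton matching are meant to close.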

The lookalike can be visually identical and still sail past a browser's defenses; the spoof only becomes obvious when you stop trusting the rendering and compare the underlying characters — which is exactly what skeleton matching does.

Treating "modern browsers handle this" as a closed case is how impersonation domains keep landing in inboxes.

How detection actually works: the skeleton algorithm

You cannot eyeball this at scale, and you shouldn't try. The Unicode Consortium publishes confusables data in UTS #39 along with a defined procedure called the skeleton algorithm.

The idea is simple and robust:

Normalize the string to NFD, then look up each character's prototype in the UTS #39 confusables mapping (Cyrillic "а" maps to Latin "a"; the digit "0" maps to the letter "O"; the letter "m" maps to the two-character sequence "rn", which is why modem and modern end up with the same skeleton).
Replace each character with its prototype and normalize to NFD again to produce the skeleton string.
Compute the skeleton for the suspect domain and for your real brand name.
If two strings produce the same skeleton, they are confusables — visually equivalent regardless of which scripts or code points they use.
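The procedure above can be sketched in a few lines of Python. This is a simplified illustration, not a full UTS #39 implementation: the PROTOTYPES table is a tiny hand-picked subset modeled on entries in the Unicode confusables.txt file, and a real detector would load the complete file and follow the current specification in full.

```python
import unicodedata

# Illustrative subset only; a real implementation loads all of confusables.txt.
PROTOTYPES = {
    "\u0430": "a",   # CYRILLIC SMALL LETTER A
    "\u0440": "p",   # CYRILLIC SMALL LETTER ER
    "\u04cf": "l",   # CYRILLIC SMALL LETTER PALOCHKA
    "\u0435": "e",   # CYRILLIC SMALL LETTER IE
    "0": "O",        # digit zero maps to capital O
    "1": "l",        # digit one maps to small l
    "I": "l",        # capital I maps to small l
    "m": "rn",       # the data maps m to the sequence rn, not the reverse
}


def skeleton(text: str) -> str:
    """Simplified skeleton: NFD, map each character to its prototype, NFD."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(PROTOTYPES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def confusable(a: str, b: str) -> bool:
    """Two strings are confusable when their skeletons are identical."""
    return skeleton(a) == skeleton(b)


spoof = "\u0430\u0440\u0440\u04cf\u0435"   # all-Cyrillic lookalike of "apple"
print(confusable(spoof, "apple"))     # True
print(confusable("modem", "modern"))  # True, both reduce to "rnodern"
print(confusable("apple", "apply"))   # False
```

A skeleton is only meaningful for comparison with another skeleton; it is not a cleaned-up or "corrected" domain and should never be displayed or resolved.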

This turns an unbounded visual-similarity problem into an exact string comparison. A domain built from Cyrillic characters and your genuine Latin domain reduce to the same skeleton, and the match exposes the impersonation no matter how convincing the rendering is.

This is precisely the technique Brandfence uses. It computes confusable skeletons across your brand's name, then resolves each candidate's DNS to see which lookalikes are actually live and mail-capable, because a registered-but-dark domain and one with active MX records are very different threats. It then scores the candidates and routes confirmed impersonations to human-reviewed takedown.

What brand owners should do

Defensive registration helps but cannot win on its own — the confusable space is effectively infinite, so you can never pre-register every variant. The durable posture is detect, then disrupt.

Register the highest-value variants defensively (the most plausible single-substitution spoofs of your primary domain), accepting that this is triage, not coverage.
Monitor Certificate Transparency and DNS for newly registered confusables — a lookalike provisioning a TLS certificate is preparing to look legitimate.
Prioritize by liveness and mail capability, not by registration count. A confusable domain with active mail records is an imminent phishing channel.
Take down the malicious ones with evidence packaged for the registrar or host, and lean on your trademark: a UDRP complaint is far stronger when the lookalike infringes a registered mark.
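Prioritizing by liveness and mail capability can be expressed as a small scoring function. The sketch below is hypothetical throughout: the Candidate fields, the weights, and the example domains are assumptions chosen for illustration, and the DNS and certificate lookups that would populate the fields are left out so the example stays offline.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One confusable domain plus facts gathered by earlier lookups."""
    domain: str
    resolves: bool   # has an A or AAAA record
    has_cert: bool   # a TLS certificate was seen in CT logs
    has_mx: bool     # has MX records, so it can send and receive mail


def priority(c: Candidate) -> int:
    """Higher means more urgent. The weights are illustrative assumptions."""
    score = 0
    if c.resolves:
        score += 2
    if c.has_cert:
        score += 3
    if c.has_mx:
        score += 5   # mail capability is weighted as the strongest signal
    return score


def triage(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates from most to least urgent."""
    return sorted(candidates, key=priority, reverse=True)


queue = triage([
    Candidate("dark-lookalike.example", False, False, False),
    Candidate("web-lookalike.example", True, True, False),
    Candidate("mail-lookalike.example", True, False, True),
])
for c in queue:
    print(priority(c), c.domain)
```

Under these weights the mail-capable domain is reviewed first and the registered-but-dark domain last, regardless of how many dark registrations exist.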

Defense checklist

Compute UTS #39 confusable skeletons for your brand and alert on any registration that matches.
Watch CT logs and zone-file/DNS feeds for new confusable and mixed-script registrations.
Resolve candidates to flag which are live, which serve content, and which have MX records.
Don't rely on the browser address bar — assume email, chat, and previews render the spoof cleanly.
Pre-register a short list of the most dangerous single-character variants.
Keep trademark registrations current to enable UDRP and strengthen takedown demands.
Require human sign-off on every takedown notice; package evidence before you send.
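For the pre-registration item, a short script can enumerate single-character variants to review. This is a sketch under assumptions: the HOMOGLYPHS table covers only five well-known Cyrillic lookalikes, the function names are invented for illustration, and the xn-- conversion uses raw Punycode without IDNA validation. Whether a registry will accept a given variant depends on its own IDN policy.

```python
# Illustrative subset of Latin-to-Cyrillic lookalikes; not exhaustive.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
}


def single_substitutions(label: str) -> list[str]:
    """Return every variant of label with exactly one character swapped."""
    variants = []
    for i, ch in enumerate(label):
        lookalike = HOMOGLYPHS.get(ch)
        if lookalike is not None:
            variants.append(label[:i] + lookalike + label[i + 1:])
    return variants


def to_a_label(u_label: str) -> str:
    """Encode a label to its xn-- form (raw Punycode, no IDNA checks)."""
    if u_label.isascii():
        return u_label
    return "xn--" + u_label.encode("punycode").decode("ascii")


brand = "apple"  # hypothetical brand label
for variant in single_substitutions(brand):
    print(variant, to_a_label(variant))
```

Each variant differs from the brand by one code point, so the list grows only linearly with the length of the name, which keeps the shortlist small enough to review by hand.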

Catch the lookalikes you can't see

Brandfence computes confusable skeletons across your brand's name, resolves which lookalikes are live and mail-capable, scores them, and routes confirmed impersonations to evidence-backed, human-reviewed takedown. Get a free brand exposure report.
