David Papp Blog

Is Your Browser Telling You Everything? How Web Domains Aren’t Always What You Think

Developer Xudong Zheng created a web page to educate people about a new security threat online. The danger was rooted in the weaknesses of several browsers, including Chrome, Firefox, and Opera.

He found that web addresses could have one appearance while being registered as something completely different.

To show the potential danger, he set up a dummy site. The URL https://www.xn--80ak6aa92e.com/ appears in several browsers as www.apple.com. How did he do it? Keep reading to find out.

Back in the early 90’s, only ASCII characters were used in domain names. ASCII, short for American Standard Code for Information Interchange, assigns a number code to each standard English character. If you were working in certain industries or were in college during the 80’s and 90’s, you may remember using it to share document text between computers and countries.


An International Solution

ASCII was helpful for sharing research, but there were problems. Other systems of numerical code were being used worldwide at the same time. Also, ASCII didn’t account for international characters.
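To make the limitation concrete, here is a small Python sketch. The helper name fits_in_ascii is made up for this illustration; it simply asks Python whether a piece of text can be written with ASCII codes alone.

```python
def fits_in_ascii(text):
    """Return True if every character in text has an ASCII code (0-127)."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# Each standard English character maps to a small number code.
apple_codes = [ord(ch) for ch in "apple"]
print(apple_codes)

# Plain English text fits; a name with an international character does not.
print(fits_in_ascii("apple"))
print(fits_in_ascii("gic\u0103"))  # "gică", with the ă written as an escape
```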

Specialists at Xerox and Apple began working on the problem in the late 80’s, but it wasn’t until 1991 that the solution was published by the Unicode Consortium. The code they developed was designed to help people share information and software globally.

The system started out with 16-bit codes, enough for just over 65,000 characters, and has since grown past that limit to cover well over 100,000 characters from languages worldwide. Need to use an ancient language? You can find Unicode for Ancient Egyptian Hieroglyphs. Are you more into millennial symbols? No problem: you can also find 77 of the latest emojis.

However, to be used in the Domain Name System, these international characters must be converted to ASCII characters. That’s where Punycode comes in.

Punycode and Look-alikes

Punycode is the encoding used by Internationalized Domain Names in Applications (IDNA) to convert names containing non-ASCII characters into the limited 128 characters of ASCII text. It takes out non-English characters like “ñ” and those with umlauts. Then it adds a dash followed by a chain of lowercase letters and digits that encodes the characters it removed, and the whole name gets the prefix “xn--” so that software can recognize it. Say you wanted to register the domain for your fan site for a Romanian footballer named Gică. You could pop “Gică” into any of a number of free online converters and get a name that looks like this: xn--gic-cpa.
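You don't need an online converter to try this. As a rough sketch, Python's built-in "punycode" and "idna" codecs can do the same conversion. Note that the "idna" codec follows the older IDNA 2003 rules, so treat this as an illustration rather than what a registrar or browser necessarily does.

```python
# "gică" in lowercase, with the ă written as an escape so the source stays ASCII.
label = "gic\u0103"

# The raw Punycode encoding: ASCII letters first, then a dash, then the encoded rest.
raw_punycode = label.encode("punycode").decode("ascii")

# The IDNA codec applies Punycode and adds the "xn--" prefix used in the DNS.
ace_label = label.encode("idna").decode("ascii")

# Decoding goes the other way and recovers the original spelling.
round_trip = ace_label.encode("ascii").decode("idna")

print(raw_punycode)
print(ace_label)
print(round_trip == label)
```

The ASCII letters "gic" survive unchanged at the front, and the part after the dash records which character was removed and where it belongs.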

The real trouble comes when people combine Punycode with homoglyphs. Homoglyphs already pose a risk to security. These are characters that look alike but are actually different. For example, an uppercase i (I) looks like a lowercase L (l) in most address bars. Anyone could set up a dummy site by making a substitution.
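A computer is not fooled the way the eye is. The short Python sketch below compares a genuine name with a look-alike that swaps in an uppercase i; the two example strings are chosen purely for illustration.

```python
real = "apple"
fake = "appIe"  # uppercase I (code 73) where the lowercase l (code 108) should be

# List every position where the two names differ, even if they look the same on screen.
differences = [
    (index, a, b)
    for index, (a, b) in enumerate(zip(real, fake))
    if a != b
]

print(real == fake)
print(differences)
print(ord("l"), ord("I"))
```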

Punycode creates new possibilities for homoglyphs. Browsers like Chrome and Firefox often display names registered in Punycode in their decoded form, showing the international characters instead of the “xn--” text. How did Zheng’s dummy site on https://www.xn--80ak6aa92e.com/ appear in several browsers as www.apple.com? He registered a name spelled with Cyrillic letters, such as the lowercase “а”, which looked just like the ASCII versions in the browser.
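You can check what is hiding inside that address yourself. The sketch below decodes the label with Python's built-in "punycode" codec and asks the standard unicodedata module for the official name of each character. The helper names decode_label and describe are invented for this example.

```python
import unicodedata

ACE_PREFIX = "xn--"


def decode_label(label):
    """Decode one domain label; labels without the xn-- prefix are returned unchanged."""
    if not label.lower().startswith(ACE_PREFIX):
        return label
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def describe(text):
    """Return one line per character: its code point and its Unicode name."""
    return [
        "U+{:04X} {}".format(ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]


decoded = decode_label("xn--80ak6aa92e")
report = describe(decoded)
for line in report:
    print(line)

# The decoded label is five characters long, but it is not the ASCII word "apple".
is_ascii_apple = decoded == "apple"
print(is_ascii_apple)
```

Every line of the report names a Cyrillic letter, which is why a string comparison against the ASCII word "apple" fails even though the two can look the same in an address bar.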

Zheng, who exposed the problem in January of this year, reports that it has now been fixed with the recently released Chrome 58.