BLOG

What is it?

Punycode is a system for converting text that can’t be written in ASCII (American Standard Code for Information Interchange), such as Ancient Greek, into ASCII characters. The phrase ΓΝΩΘΙΣΕΑΥΤΟΝ (“know yourself”), once converted into ASCII characters, looks like this: xn--mxadglfwep7amk6b.

This conversion system allows Internationalized Domain Names (IDNs), which include non-ASCII characters, to be represented using only the Roman letters A to Z, the digits 0 to 9 and the hyphen (-) character.

Punycode is useful because the world-wide Domain Name System (DNS), which turns readable server names into computer-friendly network numbers, can only recognise a limited subset of ASCII characters in domain names.
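To make that restriction concrete, here is a minimal Python sketch of a "letters, digits and hyphen" check for a single domain label. The function name `is_ldh_label` is made up for this illustration, and the rule is deliberately simplified (it ignores length limits, for example), so treat it as a rough picture rather than a complete validator.

```python
import re

# Letters, digits and hyphen only, with no leading or trailing hyphen.
# Simplified assumption: real host name rules also limit label length.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Return True if the label uses only the ASCII subset allowed in host names."""
    return LDH_LABEL.fullmatch(label) is not None


print(is_ldh_label("example"))         # True
print(is_ldh_label("münchen"))         # False: ü is outside ASCII
print(is_ldh_label("xn--mnchen-3ya"))  # True: the Punycode form is plain ASCII
```

The last two lines show why a conversion step is needed: the readable name fails the check, while its Punycode form passes.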

Some of the letters in the Roman alphabet are the same shape as letters in the Greek, Cyrillic and other alphabets. Examples include the letters I, E, A, Y, T, O and N.

Our experts explain

We all know to check for the little green padlock when browsing websites, because it lets us know that the site has TLS encryption and no one will be able to eavesdrop on any data we submit, particularly when making purchases or doing banking. However, a malicious site that can imitate a legitimate URL and display that padlock leaves us with very few ways to tell if we are being tricked by an imposter. Attackers who trick people into loading the fake page could more easily obtain personal information because the site appears to be trustworthy.

An Imperfect Industry Standard

Many years ago, the Internet Corporation for Assigned Names and Numbers (ICANN) allowed non-ASCII (Unicode) characters to be included in web domains. It didn’t take long to realise that this decision was going to cause problems. Certain Unicode characters from different languages can be confused with one another, since they look the same when displayed in a browser. This could be used as a tool by cyber criminals to spoof URLs and target unsuspecting victims.

To counteract the issue, ‘Punycode’ was standardised by the Internet Engineering Task Force (IETF), in RFC 3492, as a way of specifying actual domain registrations by representing Unicode within the limited character subset of ASCII used for internet host names. The idea was that browsers would first read the Punycode URL and then transform it into displayable Unicode characters inside the browser.

However, just as with Unicode, Punycode could also hide phishing attempts using characters found in different languages. To combat this, web browser vendors introduced filters that render a URL as Punycode, instead of Unicode, if it contains characters from different languages.

Everyone thought this would stop URL substitution. However, a security researcher called Xudong Zheng recently managed to find a glitch in the matrix.

Punycode Problems

By default, many web browsers use Punycode encoding to represent Unicode characters in the URL to defend against homograph phishing attacks (where the website address looks legitimate, but is not, because one or more characters have been deceptively replaced with lookalike Unicode characters). For example, the Chinese domain “短.co” is represented in Punycode as “xn--s7y.co” and the German city of “München” becomes the Punycode “xn--mnchen-3ya” because the letter ü is not part of ASCII.

Note: You can convert text on a site like Punycoder to see how other names are converted.
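If you would rather try the conversion locally, the following is a minimal sketch using the `punycode` codec that ships with Python. The helper names `to_ace`, `from_ace` and `convert_host` are invented for this example. It also assumes the input is already lower case, and it skips the normalisation steps a full IDN implementation would apply, so it illustrates the encoding step only.

```python
ACE_PREFIX = "xn--"


def to_ace(label: str) -> str:
    """Encode one domain label; ASCII-only labels are returned unchanged."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace(label: str) -> str:
    """Decode one 'xn--' label back to Unicode; other labels are returned unchanged."""
    if not label.startswith(ACE_PREFIX):
        return label
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def convert_host(host: str, convert) -> str:
    """Apply a per-label conversion to every dot-separated label of a host name."""
    return ".".join(convert(label) for label in host.split("."))


print(convert_host("münchen.de", to_ace))           # xn--mnchen-3ya.de
print(convert_host("短.co", to_ace))                 # xn--s7y.co
print(convert_host("xn--mnchen-3ya.de", from_ace))  # münchen.de
```

Each label is converted on its own because the prefix applies per label, not to the host name as a whole.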

According to Zheng, the loophole means that if someone chooses a domain name where all characters are from a single foreign language character set, then browsers will render it in that language, rather than in Punycode format. This is dangerous when all of the characters selected from the foreign character set resemble the characters of the targeted domain, as they will appear to be identical when rendered in browsers.

There are quite a few Unicode characters in alphabets such as Greek, Cyrillic, and Armenian which look almost identical to Latin letters at a glance, but are treated very differently by computers when resolving web addresses. For example, Cyrillic “а” (U+0430) and Latin “a” (U+0061) are treated as two different characters by browsers, but both are displayed as “a” in the browser address bar.
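A quick way to see the difference for yourself is to ask for each character's code point and official Unicode name. This short sketch uses Python's standard `unicodedata` module; the variable names are just for illustration.

```python
import unicodedata

latin_a = "a"          # the ordinary ASCII letter
cyrillic_a = "\u0430"  # renders like the Latin letter in most fonts

for ch in (latin_a, cyrillic_a):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+0061  LATIN SMALL LETTER A
# U+0430  CYRILLIC SMALL LETTER A

# The two characters look alike on screen but never compare equal.
print(latin_a == cyrillic_a)  # False
```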

Zheng registered the domain “xn--80ak6aa92e.com”, a Cyrillic domain name. Because he used the Cyrillic “а” rather than the ASCII “a”, along with similar Cyrillic lookalikes for the other letters, some browser defenses failed and displayed the URL as “аррӏе.com” when converted back from Punycode to Cyrillic text.

Note: The ‘xn--’ prefix is known as an ‘ASCII Compatible Encoding’ (ACE) prefix. It tells the browser that the rest of the label uses ‘Punycode’ encoding to denote Unicode characters.
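To check what is really hiding behind that label, you can strip the prefix and decode the remainder. The sketch below uses Python's built-in `punycode` codec and lists every decoded character with its code point and Unicode name; it is an illustration, not a full IDN decoder.

```python
import unicodedata

label = "xn--80ak6aa92e"

# Drop the "xn--" prefix, then decode what is left as Punycode.
decoded = label[len("xn--"):].encode("ascii").decode("punycode")

for ch in decoded:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The result looks like the Latin word on screen, yet shares no characters with it.
print(decoded == "apple")  # False
```

Every character listed is Cyrillic, which is why a check that only looks for mixed alphabets does not flag the name.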

Apple Safari, Microsoft Edge and Internet Explorer don’t fall for the trick domain, and simply display it as plain old xn--80ak6aa92e.com (provided your system settings don’t include any Cyrillic languages).

Figure 1: Image sourced from Naked Security

Whilst Google Chrome, Firefox and Opera won’t routinely decode Punycode URLs if there is a combination of multiple alphabets or languages (as those text strings are extremely unlikely in real life and therefore suspicious), they will auto-convert Punycode URLs that contain all their characters in the same language, like this:

Figure 2: Image sourced from Naked Security

These browsers are therefore vulnerable to a Punycode phishing attack, as the user will think they are on the legitimate apple.com website.
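The loophole can be pictured with a small Python sketch of a "one alphabet only" display rule. This is not the code any browser actually runs. The function names are hypothetical, and guessing the script from the first word of a character's Unicode name is a crude heuristic chosen to keep the example short.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Rough script guess: the first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def shown_as_unicode(label: str) -> bool:
    """Simplified rule: decode for display only when a label sticks to one script."""
    return len(scripts_in(label)) <= 1


mixed = "\u0430pple"                             # Cyrillic а followed by Latin "pple"
all_cyrillic = "\u0430\u0440\u0440\u04cf\u0435"  # the label behind xn--80ak6aa92e

print(shown_as_unicode(mixed))         # False: two scripts, so it stays as Punycode
print(shown_as_unicode(all_cyrillic))  # True: one script, so the lookalike is displayed
```

The mixed label is caught, but the all-Cyrillic one passes the rule even though it is the more convincing imitation.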

Zheng reported this issue to the affected browser vendors on 20th January 2017.

Try it yourself

Test it on your own browser. Copy and paste xn--80ak6aa92e.com into the address bar of your browser and press Enter.

If your web browser displays “apple.com” with a security certificate in the address bar, but you did not end up on Apple’s website, then your browser is vulnerable to a homograph attack. If an attacker had cleverly replicated Apple’s website, instead of displaying the “hey there” message, would you have noticed that you were not on the official Apple website?

Google has already patched the vulnerability with the release of Chrome Stable 58, released in April 2017. Firefox programmers, in contrast, are extremely reluctant to implement any kind of protection, because “the Mozilla Foundation’s desire is to avoid favouritism, and to treat all languages equally, this sort of protection is culturally insensitive and technically undesirable.” They believe that the responsibility for preventing “confusables” lies with the registrars of each top-level domain.

Until such time as Mozilla provide a patch for Firefox, millions of Internet users are at risk of this sneaky type of phishing attack. Our experts at FraudWatch International recommend that users disable Punycode support in their web browsers, which will provide temporary protection against this attack and will identify any related phishing domains.
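For anyone who wants to screen suspicious links themselves, here is a rough Python sketch of a lookalike check. Everything in it is an assumption made for illustration: the `LOOKALIKES` table is a tiny hand-picked sample rather than a complete list of confusable characters, `WATCHLIST` is a hypothetical set of names worth protecting, and the function names are invented.

```python
ACE_PREFIX = "xn--"

# Hypothetical, hand-picked sample of Cyrillic letters and the Latin letters they resemble.
LOOKALIKES = {
    "\u0430": "a",  # а
    "\u0441": "c",  # с
    "\u0435": "e",  # е
    "\u043e": "o",  # о
    "\u0440": "p",  # р
    "\u0445": "x",  # х
    "\u0443": "y",  # у
    "\u04cf": "l",  # ӏ
}

WATCHLIST = {"apple.com"}  # hypothetical list of genuine names to protect


def decode_host(host: str) -> str:
    """Decode any 'xn--' labels in a host name back to Unicode."""
    labels = []
    for label in host.lower().split("."):
        if label.startswith(ACE_PREFIX):
            label = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        labels.append(label)
    return ".".join(labels)


def skeleton(host: str) -> str:
    """Replace each known lookalike with the Latin letter it imitates."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in host)


def looks_like_spoof(host: str) -> bool:
    """True when a host is not on the watchlist but its skeleton is."""
    decoded = decode_host(host)
    return decoded not in WATCHLIST and skeleton(decoded) in WATCHLIST


for host in ("apple.com", "xn--80ak6aa92e.com", "example.com"):
    print(host, looks_like_spoof(host))
# apple.com False
# xn--80ak6aa92e.com True
# example.com False
```

A real tool would need a far larger table of confusable characters, but the principle is the same: reduce each name to its lookalike "skeleton" and compare that against the names you care about.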

Stay tuned next week as we learn the steps involved in preventing and protecting yourself from online Punycode phishing attacks, including setting Firefox to display Punycode names, as well as security awareness tips to avoid becoming a victim.