It is no wonder that web developers have overlooked some of its nuances. Attackers, on the other hand, can exploit those same features of Unicode for their own purposes.
Unicode-related bugs share one property: they can turn up in any application that processes text entered by the user. Such vulnerabilities exist in web applications as well as in native applications on Android and iOS. One of the most famous was the iOS bug of 2015, when a few Unicode characters in a text message crashed the operating system. Last year a similar bug was found in iOS 11.3; it is known as the “black dot”. A similar failure occurred in the WhatsApp application for Android when the user tapped an emoji.
Unicode is a character encoding standard that includes the characters of nearly all written languages in the world. It came into use once it became clear that having a separate character encoding for every language was a problem and that the characters needed to be brought together. An encoding is a representation of digits, letters and other characters in computer memory, in a form the computer understands. There are many encodings, for example ISO-8859-1, but over time they became inconvenient to use. First, displaying characters of different languages correctly required switching between different encodings. Second, the same numeric value can stand for different letters in different languages. For example, one and the same binary value is the letter “Я” in cp1251 but the German Eszett in ISO-8859-1. With the advent of Unicode the situation improved: the letters and symbols of all the world's languages now live in one huge table. Unicode is the standard that assigns each symbol a numeric value, and Unicode encodings were developed to represent those numbers, the most common of which are UTF-8 and UTF-16.
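The clash between legacy encodings can be reproduced in a few lines of Python. This is only an illustrative sketch: the byte value 0xDF is an assumption chosen for the demonstration (it happens to map to “Я” in cp1251 and to the Eszett in ISO-8859-1), and the variable names are made up for this example.

```python
# One byte, two meanings: legacy single-byte encodings reuse the same values.
raw = bytes([0xDF])  # assumed example value, chosen for illustration

as_cp1251 = raw.decode("cp1251")      # Cyrillic capital letter Ya
as_latin1 = raw.decode("iso-8859-1")  # German Eszett

print(as_cp1251, as_latin1)

# Unicode gives each of these characters its own code point...
print(f"U+{ord(as_cp1251):04X}", f"U+{ord(as_latin1):04X}")

# ...and an encoding such as UTF-8 or UTF-16 turns that number into bytes.
for ch in (as_cp1251, as_latin1):
    print(ch, ch.encode("utf-8").hex(), ch.encode("utf-16-be").hex())
```

The same byte decodes to two unrelated letters, while in Unicode the two letters have distinct code points regardless of how they are later serialized.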
This material is provided for informational purposes only. Do not break the law!
What is it?
Along with its ease of use, Unicode has opened up new opportunities for criminals. Many of you know or have heard about DNS spoofing. The same approach works with Unicode, only in this case the original characters are replaced with identical or very similar ones from other languages.
For example, the Latin "a" in an address can be replaced with an "а" that is Russian, and visually the two look identical. The problem is not new: users could be misled even with plain ASCII. For example, when writing an address, attackers replaced a lowercase "l" with an uppercase "I", which, depending on the font used, looks no different.
This technique is called a homograph attack, by analogy with homographs: words that are spelled the same but pronounced differently. A similar story happened with PayPal. These spoofing methods are aimed solely at users: if you type the address on the keyboard it is hard to get it wrong, but people like to open links sent by e-mail or in some other way, and the user may not notice which URL actually opens after clicking the link.
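A rough way to see what the eye cannot is to ask Python for the Unicode name of every character. The sketch below is not a complete defense: `describe` and `scripts_used` are hypothetical helpers written for this example, the spoofed string is invented, and guessing the script from the first word of the character name is a simplifying assumption.

```python
import unicodedata

def describe(text):
    """Return (character, code point, Unicode name) for every character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]

def scripts_used(text):
    """Rough script guess: the first word of each letter's Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text if ch.isalpha()}

latin = "a"          # U+0061
cyrillic = "\u0430"  # looks the same in most fonts

print(latin == cyrillic)          # the strings differ even if the glyphs do not
print(describe(latin + cyrillic))

spoofed = "p\u0430ypal"           # hypothetical example with one Cyrillic letter
print(scripts_used("paypal"))
print(scripts_used(spoofed))      # mixed scripts are a warning sign
```

Note that this kind of check does nothing against the older ASCII trick of swapping "l" for "I", since both letters belong to the same script.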
The second method is somewhat similar to the previous one: the use of Punycode. The point is that names in DNS A records may contain only English letters, digits and the hyphen. If symbols from another language are needed, for example a domain name in Russian such as пример.рф, the name has to be encoded with Punycode. In the same way, a look-alike of pentestit.ru can be registered as pеntеstit.ru, where Punycode encodes the Russian letters "е". This is what criminals exploit, luring visitors into following links to a malicious website whose domain has no obvious typos, because each replaced letter is the one most similar to the original. It also works in the opposite direction, when the Russian letter "о" is replaced by the Latin one, or even by the Greek omicron, which is very similar to the original.
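What such a domain looks like on the wire can be checked with Python's built-in `idna` codec, which applies Punycode label by label. Treat this as a sketch: the codec follows the older IDNA 2003 rules, and `is_punycode` is a hypothetical helper introduced only for this example.

```python
# Python's built-in "idna" codec applies Punycode to each label of a domain.
russian_domain = "пример.рф"
print(russian_domain.encode("idna"))

genuine = "pentestit.ru"
lookalike = "p\u0435nt\u0435stit.ru"  # both "e" letters are Cyrillic U+0435

encoded = lookalike.encode("idna")
print(encoded)                 # an xn-- label reveals the non-ASCII letters
print(genuine.encode("idna"))  # pure ASCII passes through unchanged

def is_punycode(domain_bytes):
    """True if any label of the encoded domain carries the xn-- prefix."""
    return any(label.startswith(b"xn--") for label in domain_bytes.split(b"."))

print(is_punycode(encoded), is_punycode(genuine.encode("idna")))
```

Showing the encoded form instead of the decoded one is one way a client can make a look-alike domain stand out, at the cost of making legitimate non-English domains harder to read.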
Turned on its head
A common trick used by attackers is “flipped” text. Unicode supports all languages, and some of them are written not from left to right but the other way around. For this there is a special character, U+202E: right-to-left-override, which simply reverses the text that follows it. Attackers actively use it, for example, to turn gpj.exe into exe.jpg in a file name. The user, seeing the .jpg extension, launches a file with malicious code inside.
This can be done in the following way:
- Create a file with the .exe extension
- Find and copy the U+202E symbol
- Rename the file and insert the symbol at the beginning of the file name. As a result, the file gpj.exe will appear as exe.jpg
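The steps above can be sketched in Python without touching the file system. The reversed slice only approximates what a bidi-aware renderer shows for this simple case, and `has_bidi_controls` is a hypothetical check written for this example, not part of any particular product.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE

real_name = RLO + "gpj.exe"  # the name as it is actually stored

# A bidi-aware renderer draws the characters after RLO in reverse order.
# Reversing the string approximates that for this simple example only.
shown_name = real_name[1:][::-1]

print(repr(real_name))             # the escape sequence exposes the hidden character
print(shown_name)                  # what the user is likely to see
print(real_name.endswith(".exe"))  # the real extension has not changed

# Explicit directional formatting characters (embeddings, overrides, isolates).
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")

def has_bidi_controls(name):
    """Flag names that contain directional formatting characters."""
    return any(ch in BIDI_CONTROLS for ch in name)

print(has_bidi_controls(real_name), has_bidi_controls("photo.jpg"))
```

Rejecting or visibly escaping such characters in file names is a simple countermeasure, since ordinary file names rarely need them.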
In addition to social engineering, features of Unicode are used to bypass protection against hacker attacks, for example a WAF.
At the level of applications or operating systems, bugs show up in incorrectly built algorithms related to conversion: bad normalization, overlong UTF-8, deletion and swallowing of characters, incorrect character conversion, and so on. All of this leads to a wide range of attacks, from XSS to remote code execution.
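Two of these conversion problems are easy to demonstrate. The sketch below assumes a deliberately naive filter (`naive_filter` is hypothetical and stands in for no real WAF) that inspects the input before the application normalizes it, and then shows how a strict decoder treats an overlong UTF-8 sequence.

```python
import unicodedata

def naive_filter(text):
    """Hypothetical check that only looks for a literal ASCII tag."""
    return "<script" not in text.lower()

payload = "\uff1cscript\uff1e"  # fullwidth "<" and ">" around the word script

passed_filter = naive_filter(payload)
after_normalization = unicodedata.normalize("NFKC", payload)

print(passed_filter)        # the filter sees nothing suspicious
print(after_normalization)  # normalization folds the fullwidth forms to ASCII

# Overlong UTF-8: 0xC0 0xAF is a two-byte spelling of "/" that the standard forbids.
overlong_slash = b"\xc0\xaf"
try:
    overlong_slash.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True

print(rejected)  # a strict decoder refuses the sequence
```

The ordering is the point here: if normalization or decoding happens after the security check, the check is looking at different text than the application eventually uses.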
In general, Unicode does not limit an attacker's imagination; if anything, it encourages it. Many of the attacks above are often combined, pairing a filter bypass with an attack on a specific target. Combining business with pleasure, so to speak. Moreover, the standard does not stand still, and who knows what new extensions will bring: some earlier ones were later removed over security concerns.
So, as you can see, problems with Unicode remain problem number one and the cause of a wide variety of attacks. But the root of the evil is the same everywhere: misunderstanding or ignoring the standard. Of course, even the best-known vendors are guilty of this, but that is no reason to relax. On the contrary, it is worth thinking about the scale of the problem. You have already seen that Unicode is quite tricky, and you can expect to be caught out if you slack off and do not consult the standard. By the way, the standard is updated regularly, so do not rely on old books or articles: outdated information is worse than none. I hope this article has not left you indifferent to the problem.
Source: https://defcon.ru/ and https://xakep.ru/