Introduction

Historically, supporting languages other than English has not been a central concern for developers, for various reasons. As the need grew to offer a friendlier interface to users outside the English-speaking world, different vendors came up with different solutions. The result was annoying incompatibilities, even within the relatively small set of characters needed to support all Western European languages. In the CJK world, too, different standards appeared for each language and country, and each of those supported only that one Asian language plus English. This made it really hard for someone using, say, a Japanese system to type or display German.

With the widespread adoption of Unicode, a good part of the problems brought about by competing character encoding standards went away, but the situation is still not perfect. In UTF-8, Latin-based scripts are encoded compactly, with one byte per ASCII character and two bytes for most accented letters, while the other scripts were left with the higher code points. It can therefore still make sense, for example, to encode Japanese in Shift JIS (S-JIS) when storage space or network bandwidth matters, as that encoding represents the most frequently used characters in two bytes each, where UTF-8 needs three.
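
The size difference can be checked directly. The following is a minimal Python sketch, not part of the project itself: the helper name encoded_sizes and the sample strings are made up for illustration, and it relies only on the utf-8 and shift_jis codecs that ship with Python.

```python
# Compare how many bytes the same text takes under different encodings.
# encoded_sizes and the sample strings are illustrative assumptions.

def encoded_sizes(text, encodings=("utf-8", "shift_jis")):
    """Return {encoding: byte length}, or None where the text cannot be encoded."""
    sizes = {}
    for name in encodings:
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            # The encoding has no representation for some character in the text.
            sizes[name] = None
    return sizes

samples = {
    "English": "Hello, world",
    "German": "Grüße aus München",
    "Japanese": "日本語のテキスト",
}

for label, text in samples.items():
    print(label, encoded_sizes(text))
```

The Japanese sample takes 24 bytes in UTF-8 and 16 in Shift JIS, while the German sample cannot be encoded in Shift JIS at all, which is the kind of incompatibility described in the first paragraph.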

Another issue that arises from using Unicode as a common encoding is that it does not encode Chinese, Japanese and Korean characters separately (a design decision known as Han unification). The encoding itself is therefore oblivious to which language a character belongs to. As a result, text written in one language may be displayed with glyphs that actually belong to a different language. There are ways to work around this problem, however, when the underlying system knows which language is supposed to be displayed and chooses an appropriate font for that language.
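
To make this concrete, here is a small Python sketch under stated assumptions. The character 直 is a commonly cited case whose preferred shape differs between Chinese and Japanese typography, yet it has a single code point. The FONT_FOR_LANG table and the pick_font helper are hypothetical: they only illustrate the idea that language information has to travel alongside the text, and the font families listed are examples rather than anything this project prescribes.

```python
import unicodedata

# Hypothetical mapping from language tag to font family; illustrative only.
FONT_FOR_LANG = {
    "ja": "Noto Sans CJK JP",
    "zh-Hans": "Noto Sans CJK SC",
    "zh-Hant": "Noto Sans CJK TC",
    "ko": "Noto Sans CJK KR",
}

def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def pick_font(lang_tag, default="Noto Sans"):
    """Choose a font from the language tag, since the text alone cannot tell."""
    return FONT_FOR_LANG.get(lang_tag, default)

ch = "直"
for lang in ("ja", "zh-Hans", "ko"):
    # The code point is identical in every case; only the chosen font differs.
    print(describe(ch), "->", pick_font(lang))
```

Every line of output shows the same code point, so nothing in the encoded text distinguishes the languages. The font choice depends entirely on the language tag supplied from outside.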

Finally, there is the problem of language input. For most languages, the letters on the keyboard correspond exactly to the characters being entered. That is not true for more complicated scripts, which need an additional helper system known as an "input method engine" (IME). There are many different IMEs, each with its own advantages, disadvantages and fandom, so a minimal set of them needs to be supported to make everyone happy.

