Funtoo Multilingual Project

   Summary
Our goal is to bring Grade A support for using Funtoo Linux in different languages without hassles. The user should only need to change a few centralized settings, and it should “just work”.

Welcome to the Funtoo Multilingual project! If you'd like to join our effort to bring Grade A support for multiple languages to Funtoo, or if you just want it to work better in your native language, come chat with us on our Discord channel and join the pack!

Introduction

Historically, support for any language other than English has not been a central concern of developers, for various reasons. As the need to offer a friendlier interface for users outside the English-speaking world grew, different vendors came up with different solutions, resulting in annoying incompatibilities, even within the relatively small set of characters needed to support all Western European languages. Even in the CJK world, different standards appeared for each language and country, and each of those would support only that one Asian language plus English. This made it really hard for someone using, say, a Japanese system to type or display German.

With the widespread adoption of Unicode, a good part of the problems brought about by competing character encoding standards went away, but the situation is still not perfect. In UTF-8, the basic Latin letters take a single byte and accented Latin letters take two, but scripts such as the CJK ones were left with higher code points that take three bytes or more per character. It can therefore still make sense, for example, to encode Japanese using Shift JIS (S-JIS) when storage space or network bandwidth is at a premium, as that encoding represents the most frequently used characters in two bytes where UTF-8 needs three.
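
To make the size difference concrete, here is a minimal sketch that measures the same strings in both encodings. The sample strings and the helper name encoded_sizes are our own illustration, not part of any Funtoo tool.

```python
# Compare how many bytes the same text needs in UTF-8 and in Shift JIS.
# The samples and the helper name are illustrative assumptions.
samples = {
    "ascii": "moji",
    "latin": "ç",
    "japanese": "文字化け",
}


def encoded_sizes(text):
    """Return (utf8_bytes, shift_jis_bytes); the second value is None
    when Shift JIS has no way to represent the text at all."""
    utf8_size = len(text.encode("utf-8"))
    try:
        sjis_size = len(text.encode("shift_jis"))
    except UnicodeEncodeError:
        sjis_size = None
    return utf8_size, sjis_size


for label, text in samples.items():
    print(label, text, encoded_sizes(text))
```

The Japanese sample needs 12 bytes in UTF-8 but only 8 in Shift JIS, while the “ç” cannot be encoded in Shift JIS at all, which is the kind of incompatibility the legacy per-language encodings suffered from.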

Another issue that arises from the use of Unicode as a common encoding is that it does not encode Chinese, Japanese and Korean characters separately (a design decision known as Han unification). The encoding itself is therefore oblivious to which language a character belongs to. This leads to text written in one language being displayed with glyphs that actually belong to another. However, there are ways to work around this problem when the underlying system knows which language is supposed to be displayed and chooses a good-quality font designed for that language.
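
The workaround can be sketched roughly as follows. The character is identified by one code point regardless of language, so the font has to be picked from a language tag supplied by the system. The mapping table, the font_for helper and the choice of font families are assumptions made for illustration only.

```python
import unicodedata

# Hypothetical mapping from a language tag to a font family.
# A real system would take this from its font configuration.
FONT_BY_LANGUAGE = {
    "ja": "Noto Sans CJK JP",
    "ko": "Noto Sans CJK KR",
    "zh-CN": "Noto Sans CJK SC",
    "zh-TW": "Noto Sans CJK TC",
}


def font_for(language_tag, default="Noto Sans"):
    """Pick a font from the language tag, because the code point alone
    carries no information about the language."""
    return FONT_BY_LANGUAGE.get(language_tag, default)


char = "\u76f4"  # 直, traditionally drawn differently in Chinese and Japanese
print(hex(ord(char)), unicodedata.name(char))

# Same code point, different font depending on the declared language.
for tag in ("ja", "zh-CN"):
    print(tag, font_for(tag))
```

The point of the sketch is only that the language has to come from outside the text, for example from the locale or from document metadata.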

Finally, there is the problem of language input. For most languages, the letters on the keyboard correspond exactly to the characters being typed. That is not true for more complicated scripts, which need an additional helper system known as an “input method engine” (IME). There are multiple IMEs, each with its advantages, disadvantages and loyal following, so a minimal set of them needs to be supported to keep everyone happy.

CJK Project

As part of the larger project of making Funtoo multilingual, we have a sub-project that deals specifically with languages that need an input method engine for input and fonts with good coverage of their large character sets. Traditionally, this group has been referred to as CJK, which stands for Chinese/Japanese/Korean. These languages are notorious for needing additional settings, such as environment variables and services like the IME itself, before their users get a usable system. In any major distribution today, this represents a major hassle these users need to go through to be able to do mundane activities, such as writing an email or a blog post.

System level support

As non-English speakers, we all have our share of frustration with computers that don't speak our language properly. Why do I see a “�” when there should be a “ç”? Or even worse, “•¶Žš‰»‚¯” instead of “文字化け”? Well, if you understand Japanese, you know that one of them actually means the other, but I digress. If you use your computer in a Western European language, you will probably say that this kind of issue has gone away with the adoption of Unicode, but CJK users still see it quite often.
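
Both kinds of garbage come from decoding bytes with the wrong encoding, and are easy to reproduce. The sketch below assumes the Japanese text was written as Shift JIS and read as Windows-1252, and that the “ç” was written as Latin-1 and read as UTF-8. Those pairings are our guess at a typical cause, not the only possible one.

```python
original = "文字化け"

# Bytes written by a program that uses Shift JIS...
raw = original.encode("shift_jis")
# ...and read back by a program that assumes Windows-1252.
garbled = raw.decode("cp1252")

# The damage is reversible as long as no byte was lost along the way.
repaired = garbled.encode("cp1252").decode("shift_jis")

# A Latin-1 "ç" is not valid UTF-8, so a tolerant decoder substitutes
# the replacement character U+FFFD for it.
replacement = "ç".encode("latin-1").decode("utf-8", errors="replace")

print(garbled)
print(repaired)
print(replacement)
```

The first case can be undone by reversing the two steps. The second cannot, because the original byte is already gone once the replacement character has been stored.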

When something like this happens, the user starts searching for solutions on the internet and has to wade through many forums and tutorials before finally getting the system to work in their language. CJK users, in particular, are treated as second-class citizens by every major distribution.

Look, for example, at the tutorials for getting Arch Linux or Gentoo to work with Simplified Chinese. Even our own wiki currently has (and needs) one of those. That's not the Funtoo way.

Ideally, the Funtoo user should be doing something like this:

root # epro language Chinese
root # emerge -NDauv @world
root # reboot

and voilà, they should reboot into a system fully configured and with full support for Chinese. No tutorial, no environment variables to set, it should just work.

There's a lot to be considered, discussed and thought through before we arrive at the right combination of packages, USE flags, environment variables and services that a simple command sequence like that would trigger. There is also interaction with the mix-ins. For example, if the user selected KDE and not GNOME, then their IME should probably default to Fcitx, otherwise IBus. If both were selected, or if for whatever reason they want to have two or more IME front-ends, then we should give them an eselect module that lets them choose which IME they want for which desktop environment.
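
The selection rule described above could look roughly like this. It is only a sketch of the decision logic: the function names, the mix-in names passed in and the return convention are hypothetical, and the environment variables are the ones commonly set for Fcitx and IBus rather than anything Funtoo has settled on.

```python
def default_ime(desktops):
    """Return the default IME for the selected desktop mix-ins, or None
    when the choice should be left to the user (hypothetical helper)."""
    selected = {name.lower() for name in desktops}
    if {"kde", "gnome"} <= selected:
        # Both desktops selected: defer to the proposed eselect module.
        return None
    if "kde" in selected:
        return "fcitx"
    return "ibus"


def ime_environment(ime):
    """Environment variables commonly exported so that GTK, Qt and X11
    applications talk to the given IME (assumed, not a Funtoo policy)."""
    return {
        "GTK_IM_MODULE": ime,
        "QT_IM_MODULE": ime,
        "XMODIFIERS": f"@im={ime}",
    }


for mixins in (["kde"], ["gnome"], ["kde", "gnome"]):
    ime = default_ime(mixins)
    print(mixins, ime, ime_environment(ime) if ime else "user must choose")
```

Returning None for the ambiguous case keeps the policy in one place and leaves the actual per-desktop choice to the eselect module.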

Translation

This is an activity that should eventually become its own project, coupled with a Documentation Project. Ideally, this project would be based on a Git repository holding all the translation memories, glossaries and dictionaries, sorted into separate language projects.

That work needs to be done using a proper tool, which will provide some quality assurance and consistency in the translations, as well as make the translator's work faster and easier.

Currently, there is no such software in Funtoo's tree, or even in Gentoo's for that matter. In fact, few distributions offer any good computer-assisted translation (CAT) tool, even though they offer many different applications specialized in translating .po files. The best desktop open source CAT tool is a Java-based program called OmegaT, which should be coupled with another package called Okapi Filters, making it compatible with dozens of different formats, including .po and even MediaWiki. There are also web-based solutions and even machine translation engines.