Difference between revisions of "Funtoo:Multilingual"

From Funtoo
m (Punctuation)
 
(12 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{Project
{{Project
|summary=Our goal is to bring Grade A support for using Funtoo Linux in different languages without hassles. The user should just need to change a few centralized settings and it should ”just work”.
|summary=Our goal is to bring Grade A support for using Funtoo Linux in different languages without hassles. The user should just need to change a few centralized settings, and it should ”just work”.
|Keywords=dictionary translation cjk input-method i18n l10n LINGUAS locale localization
|Keywords=dictionary translation cjk input-method i18n l10n LINGUAS locale localization
|Project Category=General
|Project Category=General
|leads=adbosco
|leads=adbosco
|members=Madman10k, Omasanori
|members=Madman10k, Omasanori
|subpages=https://www.funtoo.org/Funtoo:CJK
|related pages=Package:IBus, Funtoo:CJK, Fonts, [[FLOP:Foreign_language_support|FLOP]], Funtoo Linux Localization
|related pages=https://www.funtoo.org/Package:IBus
|translate=yes
|translate=yes
}}
}}
Welcome to the Funtoo Multiligual project!  If you'd like to join our effort to bring Grade A support for multiple languages on Funtoo, or if you just want it to work better in your native language, come chat with us on our [https://discord.gg/JxbquMAn Discord channel] and join the pack!
= Introduction =
= Introduction =
Historically, support any language other than English has not been a central concern of developers, for various reasons.  As the need to offer a more friendly interface for users outside the English speaking world, different vendors came up with different solutions, resulting in annoying incompatibilities, even within a relatively small set of characters that would give support to all Western European languages. Even in the CJK world, different standards appeared for each given language and country. And those would support only that Asian language and English. It made it really hard for someone using, say a Japanese system, to type or display German.
Historically, support for any language other than English has not been a central concern of developers, for various reasons.  As the need to offer a more friendly interface for users outside the English-speaking world grew, different vendors came up with different solutions, resulting in annoying incompatibilities, even within a relatively small set of characters that would give support to all Western European languages. Even in the CJK world, different standards appeared for each given language and country. In addition, those would only support that Asian language and English. It made it really hard for someone using, say a Japanese system, to type or display German.


With the widespread adoption of Unicode,a good part of the problems brought about by different standards of character encoding went away, but it's still not perfect. The Latin based scripts can be conveniently encoded mostly in 8 bits with UTF-8, but the other scripts were left with the higher code points, so that it still makes sense, for example, to encode Japanese using S-JIS if one needs to save storage space or network bandwidth, as that system can encode the most frequently used characters using considerably less bytes than Unicode would.
With the widespread adoption of Unicode, a good part of the problems brought about by different standards of character encoding went away, but it's still not perfect. The Latin based scripts can be conveniently encoded mostly in 8 bits with UTF-8, but the other scripts were left with the higher code points. As a result, it still makes sense, for example, to encode Japanese using S-JIS if one needs to save storage space or network bandwidth, as that system can encode the most frequently used characters using considerably fewer bytes than Unicode would.


Another issue that arises from the use of Unicode as a common encoding system is that it doesn't encode separately Chinese, Japanese and Korean characters. Therefore, the encoding itself is oblivious to what language that character belongs to. This leads to the problem of having a text written in a given language being displayed with some characters that actually belong to a different language. However, there are ways to work around this problem when the underlying system knows what language is supposed to be displayed and choose the correct quality font for that language.
Another issue that arises from the use of Unicode as a common encoding system is that it doesn't encode separately Chinese, Japanese and Korean characters. Therefore, the encoding itself is oblivious to what language that character belongs to. This leads to the problem of having a text written in a given language being displayed with some characters that actually belong to a different language. However, there are ways to work around this problem when the underlying system knows what language is supposed to be displayed and choose the correct quality font for that language.
 
Finally, there is the problem of language input. For most languages, the letters on the keyboard will correspond exactly to the characters being inputted. That is not true for more complicated scripts. Those need an additional helper system known as an ”input method engine” (IME). There are multiple different IMEs, each with its advantages, disadvantages and their respective fandoms, so that a minimal number of them needs to be supported to make everyone happy.


Finally, there is the problem of language input. For most languages the letters on the keyboard will correspond exactly to the characters being inputted.  That is not true for more complicated scripts.  Those need an additional helper system known as an ”input method engine” (IME). There are multiple different IMEs, each with its advantages and disadvantages and their respective fandoms, so that a minimal number of them needs to be supported to make everyone happy.
== CJK Project ==
== CJK Project ==
As part of the larger project of making Funtoo multilingual, we have a [[Funtoo:CJK|sub-project]] that deals specifically with concerns related to those languages that need an input method engine for input and fonts with a good coverage of their large character sets. Traditionally, this has been referred to as CJK, which stands for Chinese/Japanese/Korean, which are notorious for their need of additional settings, such as environment variables and services, like the IME itself, until they can get a usable system.  In any major distribution today, this represents a major hassle these user need to go through to be able to do mundane activities, such as writing an email or a blog post.
As part of the larger project of making Funtoo multilingual, we have a [[Funtoo:CJK|sub-project]] that deals specifically with concerns related to those languages that need an input method engine for input and fonts with a good coverage of their large character sets. Traditionally, this has been referred to as CJK, which stands for Chinese/Japanese/Korean, which are notorious for their need of additional settings, such as environment variables and services, like the IME itself, until they can get a usable system.  In any major distribution today, this represents a major hassle these users need to go through to be able to do mundane activities, such as writing an email or a blog post.
== i18n-kit ==
 
While some packages related to languages, such as serif/sans-serif fonts with support for European languages, and maybe the IME front ends themselves clearly belong in the desktop-kit, or maybe in the gnome-kit (app-i18n/ibus) or the kde-kit (fcitx), the majority of fonts specialized on a given language (most of the noto fonts, for example), spellcheck dictionaries (but not the engines themselves, like hunspell, aspell, etc.), unbundled translations when they exist, like in the case of app-office/libreoffice, for example, should move or be added to the i18n-kit.
== System level support ==
As non-English speakers, we all have our share of frustration with computers that don't speak our language properly. Why do I see a “�” when there should be a ”ç”?  Or even worse, “•¶Žš‰»‚¯” instead of “文字化け”? Well, if you understand Japanese, you one of them actually means the other, but I digress. If you use your computer in a Western European language, you will probably say that this kind of issue has gone away with the adoption of Unicode, but CJK users still see them quite often.
 
When something like this happens, the user starts to search for solutions on the internet, and needs to go to many forums and tutorials until they finally get their system to work in their language. CJK users, in particular, are treated as second class citizens by every major distribution.
 
Look at, for example, the tutorial to make your [https://wiki.archlinux.org/title/Localization/Simplified_Chinese Arch Linux to work with Simplified Chinese] and your [https://wiki.gentoo.org/wiki/How_to_read_and_write_in_Chinese Gentoo]. Even [[Package:IBus|our own wiki]] currently has (and needs) one of those. That's not the Funtoo way.
 
Ideally, the Funtoo user should be doing something like this:
{{console|body=
# epro language Chinese
# emerge -NDauv @world
# reboot
}}
and voilà, they should reboot into a fully configured system with full support for Chinese. No tutorial to follow, no environment variables to set, it should just work.
 
There's a lot to be considered, discussed and thought through before we arrive at the right combination of packages, use flags, environment variables and services a simple set like that would trigger. There is also interaction with the mix-ins. For example, if the user selected KDE and not Gnome, then their IME should probably default to Fcitx, otherwise IBus. If both were selected, or if for whatever reason they want to have 2 or more IME front-ends, then we should give them an eselect module, which would enable them to choose which IME they want to which desktop environment.
 
== Translation ==
This is an activity that eventually should become its own project, coupled with a Documentation Project. Ideally, this project would be based on a Git repository, where all the translation memories, glossaries, dictionaries, sorted out in separate language projects.


Generally speaking, I believe that if a users chooses not to use the i18n-kit, they should have support at least for the major European languages, such as English, Spanish and French, and if they need support for ”more complicated” (e.g. Chinese) or ”minor” languages (e.g. Welsh), then they will find them in the i18n-kit.
That work needs to be done using a proper [https://en.wikipedia.org/wiki/Computer-assisted_translation|CAT tool], which will ensure some quality assurance and consistence in the translations, as well as make it faster and easier for the translator to work.


Anything packages related to translation (e.g. CAT tools) or language learning are also good candidates to go into the i18n-kit.
Currently, there is no such software on Funtoo's or even on Gentoo's tree for that matter. In fact, few distributions will offer any good CAT tool, while at the same time offering many different applications specialized on the translation of .po files. The best desktop open source CAT tool is a Java-based software called OmegaT, which should be coupled with another package called Okapi Filters, making it compatible with tens of different formats, including .po and even media-wiki. There are also web based solutions and even machine translation engines.
{{ProjectFooter}}
{{ProjectFooter}}

Latest revision as of 21:53, January 20, 2024

   Summary
Our goal is to bring Grade A support for using Funtoo Linux in different languages without hassles. The user should just need to change a few centralized settings, and it should “just work”.
   People
Lead: adbosco. Members: Madman10k, Omasanori.

Welcome to the Funtoo Multilingual project! If you'd like to join our effort to bring Grade A support for multiple languages on Funtoo, or if you just want it to work better in your native language, come chat with us on our Discord channel and join the pack!

Introduction

Historically, support for any language other than English has not been a central concern of developers, for various reasons. As the need to offer a friendlier interface for users outside the English-speaking world grew, different vendors came up with different solutions, resulting in annoying incompatibilities, even within the relatively small set of characters needed to cover all Western European languages. Even in the CJK world, different standards appeared for each language and country. In addition, each of those would only support that one Asian language and English. This made it really hard for someone using, say, a Japanese system to type or display German.

With the widespread adoption of Unicode, many of the problems caused by competing character encoding standards went away, but the situation is still not perfect. In UTF-8, ASCII characters take a single byte and most other Latin-script letters take two, while the CJK scripts were left with higher code points and usually take three bytes per character. As a result, it can still make sense, for example, to encode Japanese using Shift JIS (S-JIS) if one needs to save storage space or network bandwidth, as that encoding represents the most frequently used characters in two bytes each.
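
The size difference can be checked directly. The following is a minimal sketch using Python's built-in codecs; the sample strings and the helper name `encoded_sizes` are chosen here purely for illustration.

```python
# Compare how many bytes the same text needs in UTF-8 and in Shift JIS.
samples = {
    "ASCII": "Funtoo",
    "accented Latin": "façade",
    "Japanese": "文字化け",
}


def encoded_sizes(text):
    """Return (UTF-8 byte count, Shift JIS byte count or None if unencodable)."""
    utf8_len = len(text.encode("utf-8"))
    try:
        sjis_len = len(text.encode("shift_jis"))
    except UnicodeEncodeError:
        # Shift JIS cannot represent every character that Unicode can.
        sjis_len = None
    return utf8_len, sjis_len


for label, text in samples.items():
    print(label, text, encoded_sizes(text))
```

For the four-character Japanese sample, UTF-8 needs 12 bytes and Shift JIS needs 8, which is the saving the paragraph above refers to.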

Another issue that arises from the use of Unicode as a common encoding system is that it does not encode Chinese, Japanese and Korean characters separately (a design decision known as Han unification). Therefore, the encoding itself is oblivious to which language a character belongs to. This leads to text written in one language being displayed with glyphs that actually belong to a different language. However, there are ways to work around this problem when the underlying system knows which language is supposed to be displayed and chooses a good-quality font designed for that language.
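
As a rough illustration of that workaround, the sketch below shows one code point shared by all three languages and a language-to-font lookup. The Noto CJK family names are real fonts, but the mapping, the fallback font and the function name `font_for` are assumptions made for this example, not an existing Funtoo mechanism.

```python
import unicodedata

# Hypothetical mapping from language tag to a font designed for that language.
FONT_BY_LANG = {
    "ja": "Noto Sans CJK JP",
    "zh-CN": "Noto Sans CJK SC",
    "zh-TW": "Noto Sans CJK TC",
    "ko": "Noto Sans CJK KR",
}


def font_for(lang, default="Noto Sans"):
    """Pick a font for a language tag, trying the full tag and then its base language."""
    if lang in FONT_BY_LANG:
        return FONT_BY_LANG[lang]
    base = lang.split("-")[0]
    return FONT_BY_LANG.get(base, default)


# The same code point is used whether the text is Chinese, Japanese or Korean,
# so only the language tag can tell the renderer which glyph style to use.
char = "直"
print(f"U+{ord(char):04X}", unicodedata.name(char))
for lang in ("ja", "zh-CN", "ko", "de"):
    print(lang, "->", font_for(lang))
```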

Finally, there is the problem of language input. For most languages, the letters on the keyboard correspond exactly to the characters being input. That is not true for more complicated scripts, which need an additional helper system known as an “input method engine” (IME). There are several different IMEs, each with its own advantages, disadvantages and fan base, so a minimum number of them needs to be supported to make everyone happy.

CJK Project

As part of the larger project of making Funtoo multilingual, we have a sub-project that deals specifically with languages that need an input method engine for input and fonts with good coverage of their large character sets. Traditionally, this group has been referred to as CJK, which stands for Chinese/Japanese/Korean. These languages are notorious for needing additional settings, such as environment variables and services like the IME itself, before the system becomes usable. In any major distribution today, this is a major hassle that these users need to go through just to do mundane activities, such as writing an email or a blog post.

System level support

As non-English speakers, we all have our share of frustration with computers that don't speak our language properly. Why do I see a “�” when there should be a “ç”? Or even worse, “•¶Žš‰»‚¯” instead of “文字化け”? Well, if you understand Japanese, you know that one of them actually means the other, but I digress. If you use your computer in a Western European language, you will probably say that this kind of issue has gone away with the adoption of Unicode, but CJK users still see it quite often.
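
Both symptoms come from decoding bytes with the wrong encoding. The sketch below reproduces them with Python's built-in codecs. It assumes Shift JIS text being read as Windows-1252, and a Latin-1 byte being read as UTF-8, which is one plausible way to arrive at the strings quoted above.

```python
text = "文字化け"

# Bytes written by a program that uses Shift JIS...
raw = text.encode("shift_jis")

# ...and read back by a program that assumes Windows-1252.
garbled = raw.decode("cp1252")
print(garbled)

# The reverse round trip recovers the text, because no byte was lost.
restored = garbled.encode("cp1252").decode("shift_jis")
print(restored)

# A Latin-1 "ç" is the single byte 0xE7, which is not valid UTF-8 on its own,
# so a lenient UTF-8 decoder substitutes the replacement character U+FFFD.
replaced = "ç".encode("latin-1").decode("utf-8", errors="replace")
print(replaced)
```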

When something like this happens, the user starts searching for solutions on the internet, and has to go through many forums and tutorials before finally getting the system to work in their language. CJK users, in particular, are treated as second-class citizens by every major distribution.

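Look at, for example, the tutorials for making Arch Linux work with Simplified Chinese (https://wiki.archlinux.org/title/Localization/Simplified_Chinese) and for reading and writing Chinese on Gentoo (https://wiki.gentoo.org/wiki/How_to_read_and_write_in_Chinese). Even our own wiki currently has (and needs) one of those, on the Package:IBus page. That's not the Funtoo way.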

Ideally, the Funtoo user should be doing something like this:

root # epro language Chinese
root # emerge -NDauv @world
root # reboot

and voilà, they should reboot into a fully configured system with full support for Chinese. No tutorial to follow and no environment variables to set; it should just work.

There's a lot to be considered, discussed and thought through before we arrive at the right combination of packages, USE flags, environment variables and services that a simple setting like that would trigger. There is also interaction with the mix-ins. For example, if the user selected KDE and not GNOME, then their IME should probably default to Fcitx, otherwise IBus. If both were selected, or if for whatever reason they want to have two or more IME front ends, then we should give them an eselect module, which would enable them to choose which IME they want for which desktop environment.
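
The default-selection rule described above could be sketched as follows. This is only an illustration of the decision logic, not existing Funtoo code: the function name `default_ime`, the mix-in names and the use of `None` to mean "let the user choose via eselect" are all assumptions.

```python
def default_ime(desktops):
    """Suggest an IME front end from the selected desktop mix-ins.

    Returns "fcitx", "ibus", or None when the choice is ambiguous and
    should be left to the user (for example through an eselect module).
    """
    selected = {name.lower() for name in desktops}
    has_kde = "kde" in selected
    has_gnome = "gnome" in selected

    if has_kde and has_gnome:
        # Both desktops are installed: no single default is obviously right.
        return None
    if has_kde:
        return "fcitx"
    # GNOME only, or any other desktop: fall back to IBus.
    return "ibus"


for mixins in (["kde"], ["gnome"], ["kde", "gnome"], ["xfce"]):
    print(mixins, "->", default_ime(mixins))
```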

Translation

This is an activity that eventually should become its own project, coupled with a Documentation Project. Ideally, this project would be based on a Git repository, where all the translation memories, glossaries and dictionaries would be kept, sorted into separate language projects.

That work needs to be done using a proper computer-assisted translation (CAT) tool, which will provide some quality assurance and consistency in the translations, as well as make the translator's work faster and easier.

Currently, there is no such software in Funtoo's tree, or even in Gentoo's for that matter. In fact, few distributions offer any good CAT tool, while at the same time offering many different applications specialized in the translation of .po files. The best desktop open source CAT tool is a Java-based application called OmegaT, which should be coupled with another package called Okapi Filters, making it compatible with dozens of different formats, including .po and even MediaWiki. There are also web-based solutions and even machine translation engines.