Fedora 28 and GNOME 3.28: New Features for Eastern Europe

This time it is not a fake, an edit, a patch, or a custom build from COPR: these are real screenshots of the unmodified downstream Fedora 28, which is planned to be released on May 1 this year. Here is how the default calendar widget in GNOME Shell looks in Greek, Polish, and Ukrainian:

For those who can’t speak those languages: the major change here is that the month names are displayed in the correct grammatical form, both in dates and standalone. This is a new feature, or rather a new bugfix, in GNOME 3.28, which was released on March 14 and pushed to Fedora 28 (prerelease) stable updates today. The series of bugfixes in GNOME was preceded by a similar bugfix in glibc 2.27, released earlier this year.

What Is Eastern Europe

This term must be explained because it is ambiguous. Usually when we say eastern Europe we mean the eastern end of our continent (as opposed to western, northern, southern, and, last but not least, central). But in this context I mean the eastern half of Europe (as opposed to western, and nothing else). I often strongly emphasize that this feature is not just for Slavic languages but also for other language groups of our region: Baltic, Greek, partially also Finnish, and even some western languages like Catalan or Scottish Gaelic.

More Applications

Of course, dates are now displayed correctly in all applications, not just GNOME Shell. In most of them this happened automagically. A few of them, however, needed some minor updates to make sure that the month names are displayed in the genitive case only where needed, not everywhere. Here is an example of correct month name display in GNOME Calendar, this time in Croatian:

Please note the difference between the nominative name for March (ožujak) and its correct genitive case as used in a date (ožujka; literally: of March).

Western European Languages

English does not have any unsupported features but, while at it, I examined the date displays in some other western European languages and found that a few features were not supported. For example, some Romance languages (Spanish, Portuguese, etc.) also use the genitive case of both the month name and the year number, but they construct it simply by adding the preposition de in front. Simple as it is, this feature was not supported until now; it has been added in GNOME 3.28. Here is a screenshot of the same calendar widget in Spanish:

Please note the correct header saying diciembre de 2017 as opposed to the incorrect diciembre 2017 which is displayed by the older versions.

More Languages

The genitive case of month names is currently supported in the Fedora 28 prerelease in only 7 languages: Belarusian, Croatian, Greek, Lithuanian, Polish, Russian, and Ukrainian. But support for more languages is on the way: Catalan and Czech have been added to GLib and they are already used if the latest GNOME is run on older systems. Support for these languages has also been pushed to glibc upstream and will eventually reach Fedora 28, but it has not yet as of today. However, it has already reached Fedora Rawhide. While we have the chance, let’s take a look at a screenshot of GNOME in Fedora Rawhide in Catalan:

Please note the correct Catalan preposition of genitive case: de març (of March) vs. d’abril (of April).
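The choice between the two forms of the preposition can be sketched in a few lines of Python. This is only an illustration of the grammar rule: the function name and the vowel list are my own assumptions, and as far as I understand the real locale data stores complete ready-made forms instead of computing them.

```python
# Letters treated as vowels for this sketch (an assumption, not a full rule).
VOWELS = "aeiouàèéíòóú"

def catalan_of_month(month_name):
    """Prefix a Catalan month name with the right form of the preposition."""
    if month_name[0].lower() in VOWELS:
        return "d’" + month_name   # elided form before a vowel
    return "de " + month_name      # full form before a consonant

for name in ("març", "abril", "agost"):
    print(catalan_of_month(name))
```

A rule like this looks simple for month names, but storing the finished forms as data avoids surprises with exceptions.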


I’d like to thank all the people from the Fedora and GNOME communities and from the outer world who supported me in this challenge: Piotr Drąg, Mike Fabian, Zack Weinberg, Carlos O’Donell, Masha Leonova, Ihar Hrachyshka, Dmitry Levin, Igor Gnatenko, Charalampos Stratakis, Robert Buj, Philip Withnall, and more.

PS. If some date formats in these screenshots are incorrect please approach the respective translation teams.

Some Bugs Are Really Funny

People learn from errors. Therefore bugs should be made public, not hidden.

GLib is a utility library originally developed for GNOME but also used by other projects. One of the many functions it provides is g_date_set_parse(). It is really smart and simple. It accepts a string to parse but, unlike many date parsing functions, it does not require a date format to be passed. Instead it tries to find numbers and month names in the parsed string and figure out what date they can represent. Of course, month names are recognized according to the current locale.

Let’s see how it works for Polish. Here is the list of month names:

Month #  Full name    Abbreviated name
1.       Styczeń      Sty
2.       Luty         Lut
3.       Marzec       Mar
4.       Kwiecień     Kwi
5.       Maj          Maj
6.       Czerwiec     Cze
7.       Lipiec       Lip
8.       Sierpień     Sie
9.       Wrzesień     Wrz
10.      Październik  Paź
11.      Listopad     Lis
12.      Grudzień     Gru

The loop implementing the algorithm iterates over all months and checks if the string being parsed contains a full or abbreviated month name as a substring. The first month whose name is found as a substring of the parsed string is taken as the result. Let’s see what happens when a string containing the 9th month, September, which in Polish is wrzesień, is parsed by this algorithm:

Iteration #  Full name  Abbreviated name  Does the string wrzesień contain it?
1.           Styczeń    Sty               No
2.           Luty       Lut               No
3.           Marzec     Mar               No
4.           Kwiecień   Kwi               No
5.           Maj        Maj               No
6.           Czerwiec   Cze               No
7.           Lipiec     Lip               No
8.           Sierpień   Sie               Yes: wrze[sie]ń!

So, as a result, the string wrzesień (September) is recognized as sierpień (August).
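The behaviour described above can be reproduced with a short Python sketch. It is not the GLib source code, only a model of the loop as explained in this post; both function names are hypothetical. The second function shows one possible mitigation (trying all full names before any abbreviation), which is my own assumption and not necessarily how GLib will fix it.

```python
# Polish month names as (full, abbreviated), lowercased for matching.
POLISH_MONTHS = [
    ("styczeń", "sty"), ("luty", "lut"), ("marzec", "mar"),
    ("kwiecień", "kwi"), ("maj", "maj"), ("czerwiec", "cze"),
    ("lipiec", "lip"), ("sierpień", "sie"), ("wrzesień", "wrz"),
    ("październik", "paź"), ("listopad", "lis"), ("grudzień", "gru"),
]

def first_substring_match(text):
    """Model of the buggy loop: the first month whose full or
    abbreviated name occurs anywhere in the text wins."""
    haystack = text.lower()
    for number, (full, abbreviated) in enumerate(POLISH_MONTHS, start=1):
        if full in haystack or abbreviated in haystack:
            return number
    return None

def full_names_first(text):
    """One possible mitigation: look for every full name first and
    fall back to abbreviations only if no full name was found."""
    haystack = text.lower()
    for number, (full, _) in enumerate(POLISH_MONTHS, start=1):
        if full in haystack:
            return number
    for number, (_, abbreviated) in enumerate(POLISH_MONTHS, start=1):
        if abbreviated in haystack:
            return number
    return None

print(first_substring_match("wrzesień"))  # 8: "sie" is found inside "wrzesień"
print(full_names_first("wrzesień"))       # 9
```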

Is this severe at all?

To be honest, not really. The bug seems to have been around for 20 years now and nobody has complained so far. Parsing dates is not really useful. There are many good reasons why it may not work in localized texts, like incomplete or incorrect translations, varying orthographic rules, Unicode character updates, etc. Probably no real applications actually use this.

Nevertheless, the problem has been reported to GNOME Bugzilla and will be worked on.

Internationalization FAD, Pune 2017

For the second time in a short period I participated in an important Fedora event. On November 20–22, 2017, an Internationalization FAD was organized by a group of Fedora contributors from Red Hat Pune. FAD stands for Fedora Activity Day; it is a mini-conference. It differs from large conferences like Flock in that it is attended by a small number of people and is focused on one subject.

Day #0

November 19, 2017

Actually I should write Day #-1 (November 18) and Day #0 because my travel, as well as that of some other attendees, lasted more than 24 hours. Due to the time zone difference and all the mess it’s difficult to define when a day ended and when it began. In general, the travel went smoothly and without any problem except one: I spent 1.5 hours in a huge queue at the immigration desk at the Mumbai Airport. Somewhere far behind me there was Mike Fabian and even further behind him there were Takao Fujiwara, Akira Tagoh, and Peng Wu, who arrived a little later than me.

The long queue in Mumbai Airport

I really don’t know why it took so long. Probably because several large jumbo jets with many foreign tourists arrived in a short time. The immigration officers worked rather fast and without unnecessary delays. However, we all met and left Mumbai only after 4 AM local time and we reached our hotel in Pune before 8 AM. A big shout-out to Sundeep Anand and Parag Nemade who, despite the night and the weekend, were in contact with us online all the time, giving us advice and making sure that we were OK.

Our first day in India had to be spent resting after the journey. The hotel turned out to be very comfortable. Parag organized our time perfectly: first he let us take as long a rest as we wanted and then in the afternoon he took us for a visit to the Red Hat office. That was my first Red Hat office visit ever so everything was impressive to me. A brand new office building, some places still being finished, everything in perfect order.

Day #1

November 20, 2017

The actual first day of the FAD was for presentations. It started with an official opening and self-introductions.

Opening and self-introductions: Jens Petersen, Pooja Yadav, and Pravin Satpute.

Next everyone had an opportunity to present their current work. It turns out that each of us works on tasks which are personally familiar. Takao Fujiwara, Akira Tagoh and Peng Wu work on rendering (the Pango library) and input (IBus) of text in East Asian languages. Unfortunately, I know almost nothing about these languages so I don’t understand much of their work, except obvious things like that it’s more complex than in European languages and needed by their speakers. On the other hand, I spoke about my current work on formatting dates in inflected languages. Each time I talk about it to people from abroad I have a feeling that the audience doesn’t know what I’m talking about. I guess that time it was the same.

My talk about formatting dates. Photo credits: Jens Petersen.

Inflection is an original feature of the Proto-Indo-European language which has disappeared totally or almost totally in most of the contemporary Indo-European languages. However, it still exists in Slavic and Baltic languages, and also in Greek, Sanskrit and several more. But this diversity of the discussed topics only means that the term “Internationalization” is very broad and includes features local to some groups of languages. There is a place for both inflected languages and logographic scripts and more phenomena than you can think of.

There were also topics more familiar to me, discussed by Mike Fabian, with whom I have been working directly since July this year on the maintenance of locale data in the glibc project, and by Jens Petersen, who works on improving the localization support in Fedora (separation of translation packages from the main software packages, installing them depending on the languages chosen by the administrator, etc.).

It’s nice that Mike Fabian, Takao Fujiwara and others work on better support (input and display) of emoji in Fedora.

The Transtats project is getting more and more interesting. While at it, I learned that Sundeep Anand is not working on it alone. The FAD was attended by several people from the Quality Assurance team of Red Hat who support him. Those people also actively test other projects, like IBus and East Asian fonts.

Day #2

November 21, 2017

Working on our projects. Photo credits: Jens Petersen.

The second and the third day were meant for joint work on our projects. Most of the time I spent working with Mike Fabian. Despite my initial plans we worked neither on my project of formatting dates in inflected languages nor on the automatic import of locale data from CLDR into glibc. Mike says that my work is basically complete and we can’t add anything more; we can only wait for more positive reviews. Instead we worked on fixing the collation order in Latvian and Polish, and the nearest plans include more languages, like Czech and Upper Sorbian. It’s really hard and dirty work. Most languages have established rules for the collation order of the letters of their own alphabets, but what should we do if there are foreign letters? Language scientists are free to say “this is unlikely to happen” or “we don’t define how to handle this” but we developers must be able to handle every Unicode string.

Moreover, some languages have really unusual collation rules. Usually the rules say that we should compare the letters starting from the beginning and moving towards the end. The first difference between letters determines the collation order. If the letters differ in diacritical marks only, some languages treat them as different letters and some as the same letter. But in French there is, or rather there was, a rule saying that if two words differ in diacritical marks only then for the collation order we must compare the diacritics… counting from the end of the word! This rule is so weird that it has finally been dropped from most French variants, but it is still in use in Canadian French. How to deal with this? Mike has managed to fix it.
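To make the backward rule less abstract, here is a minimal Python sketch of it. It is a toy model and not the glibc implementation: the function names are hypothetical, and using the code point of the combining mark as its weight is a simplifying assumption which happens to be good enough for the classic cote/côte/coté/côté example.

```python
import unicodedata

def split_letters(word):
    """Return (base letters, accent weights), one weight per letter.
    The weight is 0 for no accent, else the combining mark's code point."""
    bases, accents = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            if accents:
                accents[-1] = ord(ch)  # attach the mark to the last letter
        else:
            bases.append(ch.lower())
            accents.append(0)
    return tuple(bases), tuple(accents)

def forward_key(word):
    """Usual rule: base letters first, then accents from the beginning."""
    bases, accents = split_letters(word)
    return bases, accents

def backward_key(word):
    """Backward rule: base letters first, then accents from the end."""
    bases, accents = split_letters(word)
    return bases, accents[::-1]

words = ["côté", "cote", "coté", "côte"]
print(sorted(words, key=forward_key))   # ['cote', 'coté', 'côte', 'côté']
print(sorted(words, key=backward_key))  # ['cote', 'côte', 'coté', 'côté']
```

All four words have the same base letters, so only the accents decide the order, and the direction in which they are compared changes the result.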

We talked with Jens about Fedora bug 1401096. While installing Fedora Workstation you can select the user interface language, but the localization packages are not installed because they are missing from the installation disk. They must be downloaded from the net. This problem does not occur with the network installation, which by definition downloads the packages. I think that we need a way to mark in the package management system that some packages are required and should be installed later, as soon as the network becomes available. It was important for me to understand the problem because in the past I contributed to gnome-software (and I still hope to contribute in the future) and I think this is a task for that project, or rather for PackageKit, which powers it.

Another unplanned task which Mike, Pravin Satpute and I had together was adding the Filipino language to Fedora. Actually all we had to do was to coordinate some tasks because most of them had been finished already or had to wait until at least one application translation is ready.

After this hard working day we spent the evening bowling and having BBQ at Amanora Mall. We also celebrated Takao Fujiwara’s birthday.

Day #3

November 22, 2017

The last day of the FAD was similar to the second one: we were working on our projects. I also continued the previous day’s work with Mike. Besides this, I filed some more suggestions for changes in glibc.

In the afternoon there was also the Fedora 27 Release Party. How was it? More people working in the same office came, and a large cake with a beautiful printed image was put on the table.

I have a feeling that the Release Party was dominated by us, the FAD attendees. The organizers asked the oldest of us, that is Mike, Jens Petersen and myself, to cut the cake. It was really yummy!

That was, unfortunately, my last (so far!) day in India. I warmly thank the organizers for all their help: above all Satyabrata Maitra, but also Parag Nemade, Pravin Satpute and Sundeep Anand. I really regret that I couldn’t stay longer.

Day #4

November 23, 2017

Most of that day I spent traveling, which went without any problem at all. See you online or in real life! नमस्ते!

Linux Autumn 2017


Linux Autumn is an annual Polish conference dedicated to free software and GNU/Linux. This year was its 15th edition and this time it was held at the Muflon Leisure Center in Ustroń.

In short: the conference was interesting but my participation was limited due to a virus¹ attack.

Day #1: September 22

Not much was planned for that day because the attendees were only arriving. The event started at 4 PM and the first speaker was Igor Gnatenko from Red Hat. He talked about the dependencies between packages, especially about the new kinds of dependencies added in RPM 4.14. I was a little late to this talk but thanks to YouTube I know what it was like and I must admit that it was interesting. I like the idea of a talk which focuses on a small subject, does not require advanced skills to understand, and at the same time provides important information to the attendees. It is well worth mentioning here as it was the only talk in English.

The second speaker was myself. I talked about preparing an application for internationalization and avoiding typical errors. How it was – you should judge on your own. Unfortunately, this talk and all others were in Polish and English translations do not exist so I don’t provide links here.

My talk about preparing an application for internationalization. Photo: Igor Gnatenko.

In the evening there was a dinner and long conversation about professional and non-professional subjects.

Day #2: September 23

In the morning I woke up with a sore throat and I knew that the conference was actually over for me. Luckily, I had given my talk the previous day when I still felt well. Despite this I pulled myself together and attended all the talks. I’d like to mention the two most interesting ones in my opinion. The first was Maciej Nabożny’s talk about his libdinemic project. In his talk he covered many subjects like cryptography and certificates, but first of all Maciej comprehensibly explained how blockchain works and how it powers bitcoin. The second talk was by Dariusz Puchalak about OpenSSH, Ansible and other network tools. Usually I’m less interested in administrative stuff than in programming but Dariusz’s talk was really lively and impressed me. I recommend his talks to everyone; he is a really great speaker.

Piotr Kliczewski from Red Hat talks about oVirt

Day #3: September 24

So this was really the end for me. In the night I had a fever; shortly after breakfast I packed my things, said goodbye and went back home. I wish I could recommend watching the videos on YouTube; unfortunately they are mostly in Polish. Please come next year: the more foreign speakers and attendees we have, the more likely we are to switch to English.

PS. Regarding the virus, as usually happens, the next day I felt much better and two days later I was quite well.

¹ Virus: a biological structure similar to but unrelated to computer viruses. They attack the cells of living organisms and are totally safe for computers.

How Polish Plurals in MATE Went Broken

On March 13, 2017 the new version 1.18 of MATE Desktop was released. One of the last-minute changes in the project was pulling the most recent translations from Transifex. Usually this is a good thing but for the Polish language it turned out to be a little disaster because the plural rules were (incorrectly) changed.

Plural rules

Foreign readers deserve an explanation here. Polish plural rules (as well as those of several other Slavic languages) are a little more complex than the English ones. There are three forms required:

  • 1 – singular – that’s obvious and similar to English and other Indo-European languages.
  • 2, 3, 4, and anything ending with 2, 3, 4 except 12, 13, 14 (for example: 22, 23, 24, 32, 33, 34 and so on). This group is sometimes referred to as few in some internationalization toolkits.
  • everything else (5 and greater except the numbers mentioned above). This group is sometimes referred to as many.

Plural support in the gettext package is good and complete. All we need is to write the correct rules in the header of a *.po file. This task should be done once and the rules can be reused for every translation into the same language because grammar rules don’t change often; we can safely assume that they never change. Usually for Polish translations we use this formula:

"Plural-Forms: nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);\n"

This expression is neither simple nor complex. Just sufficient to describe what the language needs.
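For readers who prefer code to C-like ternaries, the same formula can be rewritten in Python. This is only a sketch for experimenting with the rule: the function name is hypothetical and the word forms of plik (file) are my own example, not taken from any MATE translation.

```python
def polish_plural_index(n):
    """Python rewrite of the usual Polish Plural-Forms expression."""
    if n == 1:
        return 0  # singular
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1  # "few": 2-4, 22-24, 32-34, ... but not 12-14
    return 2      # "many": everything else

# The three forms of the word "plik" (file).
FORMS = ("plik", "pliki", "plików")

for n in (1, 2, 5, 12, 14, 22, 25, 112):
    print(n, FORMS[polish_plural_index(n)])
```

Running it shows for example 1 plik, 2 pliki, 5 plików, 14 plików and 22 pliki.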

Here comes the disaster

On March 13 the commit synchronizing translations from Transifex changed the plural rules for the Polish language. The new formula is:

"Plural-Forms: nplurals=4; plural=(n==1 ? 0 : (n%10>=2 && n%10<=4) && (n%100<12 || n%100>=14) ? 1 : n!=1 && (n%10>=0 && n%10<=1) || (n%10>=5 && n%10<=9) || (n%100>=12 && n%100<=14) ? 2 : 3);\n"

Now this is complex, isn’t it? What’s wrong with this expression:

  • it states that Polish language needs 4 forms to support plurals which is not true;
  • it is unnecessarily complex: if the expression states that n==1 belongs to the group 0 there is no need to make sure that n!=1 in the further part;
  • the complexity leads to one actual bug: the second group includes all numbers which end with 2, 3, 4 (correct), except 12 and 13 (incorrect, 14 must be excluded as well);
  • the result 3 is unreachable which is correct but confusing for translators.
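The points above can be checked mechanically. The sketch below rewrites both expressions in Python (the function names are hypothetical) and compares them for all numbers below 1000; it only models the two formulas quoted in this post and is not code from MATE, gettext or Transifex.

```python
def usual_index(n):
    """The formula usually used for Polish (3 forms)."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2

def pulled_index(n):
    """The formula pulled from Transifex (declares 4 forms)."""
    if n == 1:
        return 0
    if (2 <= n % 10 <= 4) and (n % 100 < 12 or n % 100 >= 14):
        return 1
    if ((n != 1 and 0 <= n % 10 <= 1)
            or (5 <= n % 10 <= 9)
            or (12 <= n % 100 <= 14)):
        return 2
    return 3

numbers = range(1000)
mismatches = [n for n in numbers if usual_index(n) != pulled_index(n)]
results = {pulled_index(n) for n in numbers}

print(mismatches)  # [14, 114, 214, 314, 414, 514, 614, 714, 814, 914]
print(results)     # {0, 1, 2}: the result 3 never appears
```

So the only numbers handled differently are those ending with 14, and the fourth form is never selected for any integer in this range.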

As MATE Desktop is a large project consisting of multiple applications (like Caja file manager, Pluma text editor etc.) the same happened to every single application of the project.

Difficult to fix

The bug was reported upstream immediately. The MATE project maintainers responded that the bug came from Transifex: it is pointless to fix it in the MATE source code repository because the next pull will overwrite the fix.

Unfortunately, it is not so easy to file a ticket in Transifex. It has neither Bugzilla nor any other ticket system. However, some people managed to contact the Transifex team. They responded that they had pulled the plural rules from CLDR, which lists 4 plural forms for the Polish language, although they admitted that assigning the number 14 to the few plural group was their fault and fixed this. As the MATE project continues pulling translations from Transifex, more and more of their applications will start handling the number 14 correctly. Some of the applications have been updated recently; the update is a part of the 1.19 development release.

What CLDR says

Let’s look at what the CLDR database says about the Polish plural rules. Indeed, it lists 4 groups and there is a mysterious v parameter which has something to do with fractions because the sample expressions display fractional forms. But as gettext supports integer values only we should drop the fractional cases totally.

The documentation of that v parameter is difficult to find but as soon as you find it you can read that it means “number of visible fraction digits in n, with trailing zeros”. In this sentence, n is the number controlling the plural form itself.
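Here is a small Python sketch of what v means in practice. The number is passed as text because trailing zeros matter. The function names are hypothetical, and the category rules are written down as I understand them from the CLDR chart for Polish, so please treat them as an assumption and check the chart before relying on them.

```python
def plural_operands(number_text):
    """Return (i, v): the integer part and the number of visible
    fraction digits, counting trailing zeros."""
    integer_part, _, fraction_part = number_text.partition(".")
    return int(integer_part), len(fraction_part)

def cldr_polish_category(number_text):
    """Polish plural category as I understand the CLDR rules."""
    i, v = plural_operands(number_text)
    if v != 0:
        return "other"  # any visible fraction digits, even "1.0"
    if i == 1:
        return "one"
    if 2 <= i % 10 <= 4 and not (12 <= i % 100 <= 14):
        return "few"
    return "many"

for text in ("1", "1.0", "1.50", "3", "14", "22"):
    print(text, plural_operands(text), cldr_polish_category(text))
```

With integers only, v is always 0, so the fourth category is never selected. That is why three forms are enough for gettext.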

Other languages

CLDR provides additional forms for fractions for other languages as well: Czech, Manx, Russian, Slovak, Ukrainian. For some other languages (Bosnian, Croatian, Filipino, Macedonian, Serbian, Lower and Upper Sorbian) the rules seem to be even more complex: fractional values belong to multiple integer groups.

This should be a warning for other languages that their rules might have been broken in Transifex as well. However, further investigation of the MATE Desktop source code does not reveal any recent changes in the plural rules of other languages.


It seems that pulling plural rules from CLDR automatically is not a good idea.

Translators and language coordinators: please make sure that your plural rules are correct.

Transifex and other translation platforms: please don’t pull the translation rules from CLDR without a thorough analysis. Better ask the language communities and reuse the existing rules.

CLDR: please simplify your plural expressions and make the documentation of fractions support easier to access.

glibc 2.26: New and Updated Locales

On August 2, 2017 glibc (the GNU C Library) version 2.26 was released. Among other things, many issues related to supported locales have been addressed, most of them shortly before the release. Let’s see what has changed.

New locales

Compared to the previous version, this release introduces the support of 6 new languages: Aguaruna, Bislama, Fiji Hindi, Samoan, Tok Pisin, and Tongan as well as 2 new variants: South Azerbaijani for Iran, and Maithili for Nepal.

Aguaruna is a language spoken by about 38,000–45,000 indigenous people in Peru. Bislama is an official language of Vanuatu although spoken by only about 10,000 people. Fiji Hindi is a language descended from, although different from, Hindi. It is spoken by about 300,000 citizens of Fiji, which makes about ⅓ of its total population, and is one of the official languages of the country. It is written using both the Latin and the Devanagari script. This release introduces the Latin script only but Devanagari is also being considered for introduction in the future. Tok Pisin is one of the official languages of Papua New Guinea. Although it has only 120,000 native speakers, which makes 1.7% of the total population, it is the most widely used language of the country. No wonder, since Papua New Guinea features about 850 native languages.

South Azerbaijani is a variant of Azerbaijani language spoken by about 13 million people (16% of total population) in Iran and Maithili is spoken by about 3 million people (11.5% of total population) in Nepal. Both have been previously represented by their variants for Azerbaijan and India, respectively. Now their users may enjoy more granularity.


Bugs in alphabetic sorting in Hungarian and Malayalam have been fixed. Many other fixes have been made in date and time elements, mostly in month names. Typos in full names, abbreviated names, or both have been fixed in, among others, Arabic (many variants), Belarusian, Breton, Friulian, Hindi, Kannada, Konkani, Malayalam, Marathi, Mongolian, Northern Sami, Serbian (Latin only), Spanish (Peru and Uruguay), Uzbek, Yoruba, and Zulu. In total, 55 languages have been updated to the content of CLDR version 31. Weekday names have been updated in Arabic, Chechen, and Kashmiri; Saudi Arabian users had them displayed in English so far. Translated yes and no strings have been added or fixed in many languages.

Incorrectly appended trailing spaces have been removed in several locales, usually from weekday names. They mainly include languages of India but also Albanian (where the issue was first spotted), Haitian, Maltese, and more. This change will polish date formatting in these locales.

Unicode 10.0

This version also introduces full support for Unicode 10.0. The changes are mainly focused on new emoji characters.

It’s worth mentioning that full Unicode 10.0 support was added to glibc only 2 days after its official release by the Unicode Consortium.