Category Archives: glibc

Fedora 28: Updates for Czech, Catalan, Greek, and Lithuanian Users

Continuing from my previous article, I’d like to write about the more recent updates to date formats in glibc. These updates will be included in the Fedora 28 final release. On March 29, glibc 2.27-8 was released in the f28 branch. Together with the unreleased version 2.27-7, it provides the correct date formats in Czech, Catalan, Greek, and Lithuanian.

Unfortunately, these updates were not included in the recently released Fedora 28 Beta ISO image, so all Fedora 28 users must update their systems first.

Czech

Bugzilla link: https://sourceware.org/bugzilla/show_bug.cgi?id=22963.

These changes are the most controversial. When I asked my Czech friends whether the genitive form of a month name is obligatory in a Czech date, I received varying answers. Is April 10 in Czech 10. dubna or 10. duben? Because of these doubts, the changes for Czech were not included in the initial glibc 2.27 release (February 1). But since the Czech translator has added the genitive forms of the month names to glib2 (which aims to provide the same feature on systems that do not support genitive month names), I decided that there was no reason to wait any longer.

So, this is a short message for Czech users: if you see a date formatted incorrectly in Czech because a month name should be nominative rather than genitive, then the date format specifier in the application's translation must be changed from "%B" to "%OB" as soon as possible. I am sorry about the confusion, but other inflected languages require the genitive case here. The "%OB" format specifier was introduced to support the cases where the nominative form is required.
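To make the choice between the two specifiers concrete, here is a minimal Python sketch that models the behaviour described above. It does not call glibc at all: the MONTH_FORMS table and the expand() helper are hypothetical names invented for this illustration, and only April is filled in, using the two Czech forms quoted earlier.

```python
# Toy model of "%B" (genitive, used inside a full date) versus
# "%OB" (nominative, used when the month name stands alone).
# This does not call glibc; MONTH_FORMS and expand() are illustrative only.
MONTH_FORMS = {
    4: {"genitive": "dubna", "nominative": "duben"},  # April in Czech
}


def expand(fmt: str, day: int, month: int) -> str:
    """Expand %e, %B and %OB in fmt for the given day and month."""
    forms = MONTH_FORMS[month]
    out = fmt.replace("%OB", forms["nominative"])
    out = out.replace("%B", forms["genitive"])
    # The real %e pads single-digit days with a space; omitted for brevity.
    return out.replace("%e", str(day))


print(expand("%e. %B", 10, 4))  # full date: genitive month name
print(expand("%OB", 10, 4))     # standalone month: nominative form
```

A translation that shows only the month (for example a calendar header) would use the second form, while anything that prints a day number next to the month would keep "%B".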

By the way, Serbian and Slovak will probably have the same problem, but so far no changes have been introduced for these languages. It would be good to make some decisions before glibc 2.28 is released, which is planned for August 1 this year, and preferably not at the last minute: one month or more in advance would be recommended.

Catalan

Bugzilla link: https://sourceware.org/bugzilla/show_bug.cgi?id=22848.

We are in April which is a good time to discuss the Catalan language because April in Catalan is abril and the date April 10 is 10 d’abril. The next month will be May (Catalan: maig) and the date May 10 will be 10 de maig.

As I wrote in the previous article, this update had already landed in Fedora Rawhide, and now it has also been included in the Fedora 28 repository. However, this is not the only change. It turns out that in Catalan the preposition de (or d’ if the following noun begins with a vowel) must also be added before the abbreviated month names, so there is not only d’abril but also d’abr.
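The preposition rule described above is simple enough to sketch in a few lines of Python. This is only an illustration of the rule as stated here (d’ before a vowel, de otherwise); the vowel set and the with_preposition() helper are assumptions made for the example, and locale data would normally store the finished strings rather than compute them.

```python
# Vowels (including accented ones) that trigger the elided form d’.
# This set is an assumption made for the sketch.
VOWELS = "aeiouàèéíïòóúü"


def with_preposition(month_name: str) -> str:
    """Prefix a Catalan month name with "de " or the elided "d’"."""
    if month_name[0].lower() in VOWELS:
        return "d’" + month_name
    return "de " + month_name


for name in ("abril", "abr", "maig"):
    print(with_preposition(name))
```

The same function handles full and abbreviated names, because only the first letter matters.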

This causes a problem: the ls command line utility, which displays file modification timestamps (ls -l), limits abbreviated month names to 5 letters. Let’s see how it looks in Catalan if we use the correct genitive case:

$ LANG=ca_ES.utf8 ls -l
total 0
-rw-rw-r--. 1 rl rl 0  1 de ge 00:00 20180101.test
-rw-rw-r--. 1 rl rl 0  2 de fe 00:00 20180202.test
-rw-rw-r--. 1 rl rl 0  3 de ma 00:00 20180303.test
-rw-rw-r--. 1 rl rl 0 14 d’abr  2018 20180414.test
-rw-rw-r--. 1 rl rl 0  5 de ma  2018 20180505.test
-rw-rw-r--. 1 rl rl 0  6 de ju  2018 20180606.test
-rw-rw-r--. 1 rl rl 0  7 de ju  2018 20180707.test
-rw-rw-r--. 1 rl rl 0  8 d’ag.  2018 20180808.test
-rw-rw-r--. 1 rl rl 0  9 de se  2018 20180909.test
-rw-rw-r--. 1 rl rl 0 10 d’oct  2018 20181010.test
-rw-rw-r--. 1 rl rl 0 11 de no  2018 20181111.test
-rw-rw-r--. 1 rl rl 0 12 de de  2018 20181212.test

March and May are both displayed as de ma, and June and July as de ju. I have already filed a request for enhancement against the coreutils project and the fix has been added upstream. We are waiting for the coreutils 8.30 release, which I suspect will arrive within a month. Will it make it to Fedora 28 before the final release?
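The collisions in the ls output above follow directly from cutting each name to 5 characters. The Python sketch below reproduces them. The list of abbreviated names is an assumption chosen to be consistent with the truncated forms shown above, not a copy of the glibc locale source, and find_collisions() is a hypothetical helper.

```python
from collections import defaultdict

# Assumed Catalan abbreviated month names with the preposition, January to
# December. They are chosen to match the truncated ls output and are not
# copied from the glibc locale source.
ABBREVIATED = [
    "de gen.", "de febr.", "de març", "d’abr.", "de maig", "de juny",
    "de jul.", "d’ag.", "de set.", "d’oct.", "de nov.", "de des.",
]

MAX_WIDTH = 5  # the limit ls applied to abbreviated month names


def find_collisions(names, width=MAX_WIDTH):
    """Map each truncated form to the month numbers that share it."""
    seen = defaultdict(list)
    for number, name in enumerate(names, start=1):
        seen[name[:width]].append(number)
    return {short: months for short, months in seen.items() if len(months) > 1}


print(find_collisions(ABBREVIATED))
# {'de ma': [3, 5], 'de ju': [6, 7]}
```

Any fixed width that is shorter than the longest name plus its preposition risks this kind of ambiguity, which is why the fix had to go into ls itself rather than into the locale data.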

Greek

Bugzilla link: https://sourceware.org/bugzilla/show_bug.cgi?id=22937.

These changes are not revolutionary but still interesting. Greek is an inflected language (like the Slavic languages and Latin), and the differences between the genitive and nominative cases are visible even in the abbreviated forms of some month names. For example, the month May in Greek is Μάιος in the nominative case and Μαΐου in the genitive case; the abbreviated forms are Μάι and Μαΐ, respectively. From now on this difference is correctly supported in Linux.

The change is also visible in the output of the ls -l command:

$ LANG=el_GR.utf8 ls -l
σύνολο 0
-rw-rw-r--. 1 rl rl 0 Ιαν   1 00:00 20180101.test
-rw-rw-r--. 1 rl rl 0 Φεβ   2 00:00 20180202.test
-rw-rw-r--. 1 rl rl 0 Μαρ   3 00:00 20180303.test
-rw-rw-r--. 1 rl rl 0 Απρ  14  2018 20180414.test
-rw-rw-r--. 1 rl rl 0 Μαΐ   5  2018 20180505.test
-rw-rw-r--. 1 rl rl 0 Ιουν  6  2018 20180606.test
-rw-rw-r--. 1 rl rl 0 Ιουλ  7  2018 20180707.test
-rw-rw-r--. 1 rl rl 0 Αυγ   8  2018 20180808.test
-rw-rw-r--. 1 rl rl 0 Σεπ   9  2018 20180909.test
-rw-rw-r--. 1 rl rl 0 Οκτ  10  2018 20181010.test
-rw-rw-r--. 1 rl rl 0 Νοε  11  2018 20181111.test
-rw-rw-r--. 1 rl rl 0 Δεκ  12  2018 20181212.test

Lithuanian

Bugzilla link: https://sourceware.org/bugzilla/show_bug.cgi?id=22932.

These changes are minor. The Lithuanian translator simply asked glibc to use the same abbreviated month names that he used in glib2 and that are also provided by CLDR, so, for example, the abbreviated name of April will now be displayed as bal. rather than Bal.

This change could already be visible in the ls -l output. Unfortunately, for now only the numerical date formats are used:

$ LANG=lt_LT.utf8 ls -l
viso 0
-rw-rw-r--. 1 rl rl 0 2018-01-01 00:00 20180101.test
-rw-rw-r--. 1 rl rl 0 2018-02-02 00:00 20180202.test
-rw-rw-r--. 1 rl rl 0 2018-03-03 00:00 20180303.test
-rw-rw-r--. 1 rl rl 0 2018-04-14 20180414.test
-rw-rw-r--. 1 rl rl 0 2018-05-05 20180505.test
-rw-rw-r--. 1 rl rl 0 2018-06-06 20180606.test
-rw-rw-r--. 1 rl rl 0 2018-07-07 20180707.test
-rw-rw-r--. 1 rl rl 0 2018-08-08 20180808.test
-rw-rw-r--. 1 rl rl 0 2018-09-09 20180909.test
-rw-rw-r--. 1 rl rl 0 2018-10-10 20181010.test
-rw-rw-r--. 1 rl rl 0 2018-11-11 20181111.test
-rw-rw-r--. 1 rl rl 0 2018-12-12 20181212.test

Is this only because the Lithuanian translators did not like the abbreviated month names and decided that the ls command line utility should display only numbers? If that was the reason, then you can restore the text format now. I can’t speak Lithuanian, but I would suggest this form:

$ LANG=lt_LT.utf8 ls -l --time-style +"%b %e"
viso 0
-rw-rw-r--. 1 rl rl 0 saus.  1 20180101.test
-rw-rw-r--. 1 rl rl 0 vas.   2 20180202.test
-rw-rw-r--. 1 rl rl 0 kov.   3 20180303.test
-rw-rw-r--. 1 rl rl 0 bal.  14 20180414.test
-rw-rw-r--. 1 rl rl 0 geg.   5 20180505.test
-rw-rw-r--. 1 rl rl 0 birž.  6 20180606.test
-rw-rw-r--. 1 rl rl 0 liep.  7 20180707.test
-rw-rw-r--. 1 rl rl 0 rugp.  8 20180808.test
-rw-rw-r--. 1 rl rl 0 rugs.  9 20180909.test
-rw-rw-r--. 1 rl rl 0 spal. 10 20181010.test
-rw-rw-r--. 1 rl rl 0 lapkr 11 20181111.test
-rw-rw-r--. 1 rl rl 0 gruod 12 20181212.test

For now the dots after lapkr and gruod do not fit but, as I wrote above while discussing Catalan, the problem has already been fixed upstream and sooner or later the update will land in Fedora.

Summary

With Catalan and Czech support added, we now have 9 languages that display dates correctly using the required genitive case (together with the previously supported Belarusian, Croatian, Greek, Lithuanian, Polish, Russian, and Ukrainian). Belarusian and Russian are not the only ones that require different genitive and nominative forms of abbreviated month names; the same is required in Catalan (because of the de or d’ preposition) and in Greek.

As before, if you see any errors in date formats in the screenshots in this article that translators can fix, such as missing punctuation marks or an incorrect day/month order, please contact the translators of the respective applications.

Internationalization FAD, Pune 2017

For the second time in a short period I participated in an important Fedora event. On November 20–22, 2017, an Internationalization FAD was organized by a group of Fedora contributors from Red Hat Pune. FAD stands for Fedora Activity Day; it is a mini-conference. It differs from large conferences like Flock in that it is attended by a small number of people and focuses on one subject.

Day #0

November 19, 2017

Actually I should write Day #-1 (November 18) and Day #0, because my journey, like that of some other attendees, lasted more than 24 hours. Due to the time zone difference and all the mess, it’s difficult to say when one day ended and the next began. In general, the journey went smoothly and without any problems except one: I spent 1.5 hours in a huge queue for the immigration desk at Mumbai Airport. Somewhere far behind me was Mike Fabian, and even further behind him were Takao Fujiwara, Akira Tagoh, and Peng Wu, who arrived a little later than me.

The long queue in Mumbai Airport

I really don’t know why it took so long. Probably because several jumbo jets with many foreign tourists arrived within a short time. The immigration officers worked rather fast and without unnecessary delays. However, we all met up and left Mumbai only after 4 AM local time, and we reached our hotel in Pune before 8 AM. A big shout-out to Sundeep Anand and Parag Nemade who, despite the night and the weekend, stayed in touch with us online the whole time, giving us advice and making sure that we were OK.

Our first day in India had to be spent resting after the journey. The hotel turned out to be very comfortable. Parag organized our time perfectly: first he let us rest for as long as we wanted, and then in the afternoon he took us to visit the Red Hat office. That was my first Red Hat office visit ever, so everything was impressive to me. A brand new office building, some areas still being finished, everything in perfect order.

Day #1

November 20, 2017

The actual first day of the FAD was for presentations. It started with an official opening and self-introductions.

Opening and self-introductions: Jens Petersen, Pooja Yadav, and Pravin Satpute.

Next, everyone had an opportunity to present their current work. It turns out that each of us works on tasks that are personally familiar. Takao Fujiwara, Akira Tagoh, and Peng Wu work on rendering (the Pango library) and input (IBus) of text in East Asian languages. Unfortunately, I know almost nothing about these languages, so I don’t understand much of their work, except for the obvious things: that it is more complex than in European languages and that their speakers need it. On the other hand, I spoke about my current work on formatting dates in inflected languages. Each time I talk about it to people from other countries, I have a feeling that the audience doesn’t know what I’m talking about. I guess it was the same this time.

My talk about formatting dates. Photo credits: Jens Petersen.

Inflection is an original feature of the Proto-Indo-European language which has disappeared totally or almost totally in most contemporary Indo-European languages. However, it still exists in Slavic and Baltic languages, as well as in Greek, Sanskrit, and several more. This diversity of topics only shows that the term “Internationalization” is very broad: it includes features local to particular groups of languages. There is room for both inflected languages and logographic scripts, and for more phenomena than you can think of.

Topics more familiar to me were discussed as well, by Mike Fabian, with whom I have been working directly since July this year on the maintenance of locale data in the glibc project, and by Jens Petersen, who works on improving localization support in Fedora (separating translation packages from the main software packages, installing them depending on the languages chosen by the administrator, etc.).

It’s nice that Mike Fabian, Takao Fujiwara, and others are working on better support (input and display) for emoji in Fedora.

The Transtats project is getting more and more interesting. While on the subject, I learned that Sundeep Anand is not working on it alone. The FAD was attended by several people from the Red Hat Quality Assurance team who support him. They also actively test other projects, like IBus and East Asian fonts.

Day #2

November 21, 2017

Working on our projects. Photo credits: Jens Petersen.

The second and third days were meant for working together on our projects. I spent most of the time working with Mike Fabian. Contrary to my initial plans, we worked neither on my project of formatting dates in inflected languages nor on the automatic import of locale data from CLDR into glibc. Mike says that my work is basically complete and we can’t add anything more; we can only wait for more positive reviews. Instead, we worked on fixing the collation order in Latvian and Polish; the near-term plans include more languages, like Czech and Upper Sorbian.

It’s really hard and dirty work. Most languages have established rules for the collation order of the letters of their own alphabets, but what should we do if there are foreign letters? Linguists are free to say “this is unlikely to happen” or “we don’t define how to handle this”, but we developers must be able to handle every Unicode string.

Moreover, some languages have really unusual collation rules. Usually the rules say that we should compare the letters from the beginning of the word towards the end, and the first difference between letters determines the collation order. If letters differ only in their diacritical marks, some languages treat them as different letters and some as the same letter. But in French there is, or rather there was, a rule saying that if two words differ only in diacritical marks, then the diacritics are compared… starting from the end of the word! This rule is so weird that it has finally been dropped from most French variants, but it is still in use in Canadian French. How to deal with this? Mike has managed to fix it.
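The French rule is easier to see in a small experiment. The Python sketch below is a deliberately simplified two-level comparison: base letters first, then diacritics, with the diacritics optionally compared from the end of the word. It is not how glibc implements collation. The collation_key() helper is a hypothetical name, and ordering the accents by their code points is an assumption that happens to work for the acute and circumflex used in this example.

```python
import unicodedata


def collation_key(word: str, backward_accents: bool = False):
    """Build a two-level sort key: base letters first, then accents."""
    bases = []
    accents = []
    # Compose first so that every letter with its marks is one character.
    for ch in unicodedata.normalize("NFC", word):
        decomposed = unicodedata.normalize("NFD", ch)
        bases.append(decomposed[0].lower())
        accents.append(decomposed[1:])  # combining marks, "" if none
    if backward_accents:
        accents.reverse()  # compare diacritics from the end of the word
    return (bases, accents)


words = ["côté", "cote", "côte", "coté"]

# Forward accent comparison, as in most languages.
print(sorted(words, key=collation_key))
# ['cote', 'coté', 'côte', 'côté']

# Backward accent comparison, as in Canadian French.
print(sorted(words, key=lambda w: collation_key(w, backward_accents=True)))
# ['cote', 'côte', 'coté', 'côté']
```

All four words share the same base letters, so only the accent level decides the order; reversing that level swaps coté and côte, which is exactly the difference between the two conventions.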

We talked with Jens about Fedora bug 1401096. While installing Fedora Workstation you can select the user interface language, but the localization packages are not installed because they are missing from the installation disk. They must be downloaded from the net. This problem does not occur with the network installation, which by definition downloads the packages. I think that we need a way to mark in the package management system that some packages are required and should be installed later, as soon as the network becomes available. It was crucial for me to understand the problem because in the past I contributed to gnome-software (and I still hope to contribute in the future), and I think this is a task for that project, or rather for PackageKit, which powers it.

Another unplanned task that Mike, Pravin Satpute, and I had together was adding the Filipino language to Fedora. Actually, all we had to do was coordinate some tasks, because most of them had already been finished or must wait until at least one application translation is ready.

After this day of hard work we spent the evening bowling and having BBQ at Amanora Mall. We also celebrated Takao Fujiwara’s birthday.

Day #3

November 22, 2017

The last day of the FAD was similar to the second one: we were working on our projects. I also continued the previous day’s work with Mike. Besides this, I filed some more suggestions for changes in glibc.

In the afternoon there was also the Fedora 27 Release Party. How was it? More people working in the same office came along, and a large cake with a beautiful printed image was put on the table.

I have a feeling that the Release Party was dominated by us, the FAD attendees. The organizers asked the oldest of us, that is, Mike, Jens Petersen, and myself, to cut the cake. It was really yummy!

That was, unfortunately, my last (so far!) day in India. I warmly thank the organizers for all their help: most of all Satyabrata Maitra, but also Parag Nemade, Pravin Satpute, and Sundeep Anand. I really regret that I couldn’t stay longer.

Day #4

November 23, 2017

I spent most of that day traveling, which went without any problems at all. See you online or in real life! नमस्ते!

glibc 2.26: New and Updated Locales

On August 2, 2017, glibc (the GNU C Library) version 2.26 was released. Among other things, many issues related to supported locales were addressed, most of them shortly before the release. Let’s see what has changed.

New locales

Compared to the previous version, this release introduces support for 6 new languages: Aguaruna, Bislama, Fiji Hindi, Samoan, Tok Pisin, and Tongan, as well as 2 new variants: South Azerbaijani for Iran, and Maithili for Nepal.

Aguaruna is a language spoken by about 38,000–45,000 indigenous people in Peru. Bislama is an official language of Vanuatu, although it is spoken by only about 10,000 people. Fiji Hindi is a language descended from, although different from, Hindi. It is spoken by about 300,000 citizens of Fiji, which is about ⅓ of the total population, and it is one of the official languages of the country. It is written in both the Latin and the Devanagari scripts. This release introduces the Latin script only, but adding Devanagari in the future is also being considered. Tok Pisin is one of the official languages of Papua New Guinea. Although it has only 120,000 native speakers, which is 1.7% of the total population, it is the most widely used language of the country. No wonder, since Papua New Guinea has about 850 native languages.

South Azerbaijani is a variant of the Azerbaijani language spoken by about 13 million people (16% of the total population) in Iran, and Maithili is spoken by about 3 million people (11.5% of the total population) in Nepal. Both were previously represented only by their variants for Azerbaijan and India, respectively. Now their users may enjoy more granularity.

Updates

Bugs in alphabetic sorting in Hungarian and Malayalam have been fixed. Many other fixes have been made to date and time elements, mostly month names. Typos in full names, abbreviated names, or both have been fixed in, among others, Arabic (many variants), Belarusian, Breton, Friulian, Hindi, Kannada, Konkani, Malayalam, Marathi, Mongolian, Northern Sami, Serbian (Latin only), Spanish (Peru and Uruguay), Uzbek, Yoruba, and Zulu. In total, 55 languages have been updated to the content of CLDR version 31. Weekday names have been updated in Arabic, Chechen, and Kashmiri; Saudi Arabian users had them displayed in English until now. Translated yes and no strings have been added or fixed in many languages.

Incorrectly appended trailing spaces have been removed in several locales, usually from weekday names. The affected locales are mainly languages of India, but also Albanian (where the issue was first spotted), Haitian, Maltese, and more. This change will polish date formatting in these locales.

Unicode 10.0

This version also introduces full support for Unicode 10.0. The changes mainly concern new emoji characters.

It’s worth mentioning that the full Unicode 10.0 support has been added to glibc only 2 days after its official release by the Unicode Consortium.