glibc 2.27: New and Updated Locales

See also: glibc 2.26: New and Updated Locales.

glibc 2.27 was released on February 1, 2018 (or February 2, depending on your time zone). This is a much-belated report on the changes in locale support.

Collation

A major rework of alphabetic sorting, based on the ISO 14651:2016 standard (a publicly available version of which can be downloaded), has been started. It was finished only after the glibc 2.27 release, but the work in progress had already fixed the collation rules for many languages, including Mandarin Chinese (Taiwan), Croatian, Czech, Estonian, Canadian French, Icelandic, Latvian, Lithuanian, Polish, Turkish, and Upper Sorbian. Much of this work was completed or at least started during the Internationalization FAD and was therefore sponsored by the Fedora Project. Big thanks to Mike Fabian for his great contribution!

Correct Date Formats

Another major change that must be mentioned here is the introduction of date formats using the correct grammatical forms in inflected languages. This feature needs a separate article, which will be written later. In short: from now on the glibc functions nl_langinfo() and strftime() can support not just two forms of month names (full and abbreviated) but four: each of them both for months as used in dates, which often means the genitive case in inflected languages, and for months as used standalone, which often means the nominative case. For example, in Polish the month May is maj, but to express a date it is obligatory to use the genitive case: 29 maja. The feature is optional, which means that languages which don’t need it will not see any change.
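To make the two kinds of forms concrete, here is a minimal Python sketch. It does not call glibc at all: the table MONTHS_PL and the functions format_month() and format_date() are hypothetical names invented for this illustration, and they only model the choice between the standalone form and the form used in dates.

```python
# Illustrative model only: a hand-written table, not glibc's locale data.
MONTHS_PL = {
    4: {"standalone": "kwiecień", "in_date": "kwietnia"},
    5: {"standalone": "maj", "in_date": "maja"},
    9: {"standalone": "wrzesień", "in_date": "września"},
}

def format_month(month: int) -> str:
    """Month used on its own, e.g. in a calendar header (nominative)."""
    return MONTHS_PL[month]["standalone"]

def format_date(day: int, month: int) -> str:
    """Month used inside a date, which in Polish requires the genitive."""
    return f"{day} {MONTHS_PL[month]['in_date']}"

print(format_month(5))     # maj
print(format_date(29, 5))  # 29 maja
```

A language that does not inflect month names would simply store the same string under both keys, which is why the feature can stay invisible there.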

Introducing a software feature does not change anything until locale data that uses it is provided. The Polish locale data was updated first, shortly followed by Ukrainian, and then Russian, Greek, Belarusian, Lithuanian, and finally Croatian. For the last 11 years the Ukrainian locale data had been using the alternative digits feature to provide month names in the genitive case. This solution has been recognized as a dirty hack and removed; it also seems that it was not widely known and therefore not widely used.

The change appeared in the upstream repository only 10 days before the final release, so there was not enough time to add more languages. The next release will include updated locale data for Catalan, Czech, and a few other languages.

New Locales

As with every release, this one adds new locales. There are 6 new languages: Kabyle, Karbi, Mauritian Creole (Morisyen), Miskito, Shan, and Yau (also called Uruwa), as well as 3 new variants: Bhojpuri for Nepal, English for the Seychelles, and Valencian (a dialect of Catalan).

Kabyle is spoken by about 5 million people in Algeria, which makes it the third most spoken language of the country. Karbi is a minority language spoken by about 400,000 people in north-eastern India and north-eastern Bangladesh. Morisyen is the most spoken language of Mauritius (about 1 million speakers). Miskito is a native language spoken by about 150,000 people in Nicaragua and Honduras. Shan is spoken by more than 3 million people in Myanmar, which makes it the second most spoken language of the country. Yau is the smallest language added in this release, spoken by about 1,700 people in Papua New Guinea.

Bhojpuri is the third most spoken language of Nepal (6% of the total population). It is also spoken in India and as such was already supported by glibc. Valencian Catalan (ca_ES@valencia) is spoken by about 2.3 million people in Valencia, a community in Spain. It has been supported by some Linux distributions as a downstream patch for many years. From now on it is officially in glibc. English needs no introduction: of course, it has been present in the computer industry since forever. It is also an official language of the Seychelles, along with French and Seychellois Creole.

Lots of Minor Fixes

There are also many other minor bug fixes in this release. The localized messages for yes and no, and the single-letter answers, have been updated in many locales. Chinese, Japanese, and Korean now accept full-width Y and N characters as valid answers. Some redundant data has been removed; for example, all monetary data for all locales of India is now dynamically copied from Hindi. If bugs are detected or changes are introduced in the future, it will be enough to change only one file. More updates cover monetary and numerical formats; less used data like phone number formats, address data, and ISBN numbers has also been updated in many locales.

Finally, most of the Unicode sequences (like <Uxxxx>, where each x is a hexadecimal digit) in the source code of the locale data have been replaced with ASCII characters wherever possible. Nowadays nobody remembers why these sequences were required, but plain ASCII turned out to work perfectly. Of course, characters from outside the basic ASCII range still remain encoded as Unicode sequences.
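The kind of rewrite described above can be sketched in a few lines of Python. This is not the script used upstream. The function name simplify() is hypothetical, and restricting the replacement to ASCII letters and digits is a conservative assumption of this sketch, chosen so that characters which may have a special meaning in the locale source syntax are left alone.

```python
import re

# Matches one <Uxxxx> sequence with exactly four hexadecimal digits.
UCS_SEQUENCE = re.compile(r"<U([0-9A-Fa-f]{4})>")

def simplify(text: str) -> str:
    """Replace <Uxxxx> with the literal character for ASCII letters and digits."""
    def replace(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        char = chr(code_point)
        if code_point < 0x80 and char.isalnum():
            return char
        return match.group(0)  # keep everything else encoded
    return UCS_SEQUENCE.sub(replace, text)

print(simplify("<U006D><U0061><U006A>"))                                      # maj
print(simplify("<U0077><U0072><U007A><U0065><U015B><U006E><U0069><U0061>"))  # wrze<U015B>nia
```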

Fedora 28: Updates for Czech, Catalan, Greek, and Lithuanian Users

Continuing my previous article, I’d like to write about the more recent updates to date formats in glibc. These updates will be included in the Fedora 28 final release. On March 29 a new version, glibc 2.27-8, was released in the f28 branch. Together with the unreleased version 2.27-7, it brings correct date formats for Czech, Catalan, Greek, and Lithuanian.

Unfortunately, these updates have not been included in the recently released Fedora 28 Beta ISO image so all Fedora 28 users must update their systems first.

Czech

Bugzilla link: https://sourceware.org/bugzilla/show_bug.cgi?id=22963.

These changes are the most controversial. When I asked my Czech friends whether the genitive form of a month name in a date is obligatory in Czech, I got various answers. Is April 10 in Czech 10. dubna or 10. duben? Because of these doubts the changes for Czech were not included in the initial glibc 2.27 release (February 1). But since the Czech translator has added the genitive forms of the month names to glib2 (whose aim is to provide the same features on systems which do not support genitive forms of month names), I decided that there was no reason to wait any longer.

So, this is a short message for Czech users: if you can see a date formatted incorrectly in Czech language because a month name should be nominative rather than genitive, then you must change the date format specifier from "%B" to "%OB" in the translation of an application as soon as possible. I am sorry about the confusion but other inflected languages require a genitive case here. The "%OB" format specifier has been introduced in order to support the cases where a nominative form is required.

By the way: the same problem will probably appear in Serbian and Slovak, but so far no changes have been introduced in these languages. It would be good to make some decisions before glibc 2.28 is released, which is planned for August 1 this year, and preferably not at the last minute; one month or more in advance would be recommended.

Catalan

Bugzilla link: https://sourceware.org/bugzilla/show_bug.cgi?id=22848.

We are in April which is a good time to discuss the Catalan language because April in Catalan is abril and the date April 10 is 10 d’abril. The next month will be May (Catalan: maig) and the date May 10 will be 10 de maig.

As I wrote in the previous article, this update had already landed in Fedora Rawhide, but now it has also been included in the Fedora 28 repository. However, this is not the only change. It turns out that in Catalan the de preposition (or d’ if the following noun begins with a vowel) must also be added before the abbreviated month names, so there is not only d’abril but also d’abr.

This causes a problem: the ls command line utility, which displays file modification timestamps (ls -l), limits abbreviated month names to 5 letters. Let’s see how it looks in Catalan if we use the correct genitive case:

$ LANG=ca_ES.utf8 ls -l
total 0
-rw-rw-r--. 1 rl rl 0  1 de ge 00:00 20180101.test
-rw-rw-r--. 1 rl rl 0  2 de fe 00:00 20180202.test
-rw-rw-r--. 1 rl rl 0  3 de ma 00:00 20180303.test
-rw-rw-r--. 1 rl rl 0 14 d’abr  2018 20180414.test
-rw-rw-r--. 1 rl rl 0  5 de ma  2018 20180505.test
-rw-rw-r--. 1 rl rl 0  6 de ju  2018 20180606.test
-rw-rw-r--. 1 rl rl 0  7 de ju  2018 20180707.test
-rw-rw-r--. 1 rl rl 0  8 d’ag.  2018 20180808.test
-rw-rw-r--. 1 rl rl 0  9 de se  2018 20180909.test
-rw-rw-r--. 1 rl rl 0 10 d’oct  2018 20181010.test
-rw-rw-r--. 1 rl rl 0 11 de no  2018 20181111.test
-rw-rw-r--. 1 rl rl 0 12 de de  2018 20181212.test

March and May are both displayed as de ma and June and July as de ju. I have already filed the request for enhancement against the coreutils project and it has been added upstream – we are waiting for the coreutils 8.30 release which I suspect will be in a month. Will it make it to Fedora 28 before the final release?
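The collision is easy to reproduce outside ls. The following Python sketch truncates a list of abbreviated month names to a given width and reports which months become indistinguishable. The list ABBREVIATED is an assumption written for this illustration (it is consistent with the output above but not copied from the glibc locale file), and collisions() is a hypothetical helper, not part of coreutils.

```python
from collections import defaultdict

# Assumed abbreviated forms with the preposition; illustrative only.
ABBREVIATED = ["de gen.", "de febr.", "de març", "d’abr.", "de maig", "de juny",
               "de jul.", "d’ag.", "de set.", "d’oct.", "de nov.", "de des."]

def collisions(names, width):
    """Group month numbers by their name truncated to `width` characters."""
    groups = defaultdict(list)
    for number, name in enumerate(names, start=1):
        groups[name[:width]].append(number)
    # Keep only the truncated names shared by more than one month.
    return {short: months for short, months in groups.items() if len(months) > 1}

print(collisions(ABBREVIATED, 5))  # {'de ma': [3, 5], 'de ju': [6, 7]}
```

With this list, any width large enough to hold the full abbreviations makes the result empty, which is the effect a higher limit is meant to achieve.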

Greek

Bugzilla link: https://sourceware.org/bugzilla/show_bug.cgi?id=22937.

These changes are not revolutionary but still interesting. Greek is an inflected language (like the Slavic languages and Latin), and the differences between the genitive and nominative cases are visible even in the abbreviated forms of some month names. For example, the month May in Greek is Μάιος in the nominative case and Μαΐου in the genitive case; the abbreviated forms are Μάι and Μαΐ, respectively. From now on this difference is correctly supported in Linux.

The change is also visible in the output of the ls -l command:

$ LANG=el_GR.utf8 ls -l
σύνολο 0
-rw-rw-r--. 1 rl rl 0 Ιαν   1 00:00 20180101.test
-rw-rw-r--. 1 rl rl 0 Φεβ   2 00:00 20180202.test
-rw-rw-r--. 1 rl rl 0 Μαρ   3 00:00 20180303.test
-rw-rw-r--. 1 rl rl 0 Απρ  14  2018 20180414.test
-rw-rw-r--. 1 rl rl 0 Μαΐ   5  2018 20180505.test
-rw-rw-r--. 1 rl rl 0 Ιουν  6  2018 20180606.test
-rw-rw-r--. 1 rl rl 0 Ιουλ  7  2018 20180707.test
-rw-rw-r--. 1 rl rl 0 Αυγ   8  2018 20180808.test
-rw-rw-r--. 1 rl rl 0 Σεπ   9  2018 20180909.test
-rw-rw-r--. 1 rl rl 0 Οκτ  10  2018 20181010.test
-rw-rw-r--. 1 rl rl 0 Νοε  11  2018 20181111.test
-rw-rw-r--. 1 rl rl 0 Δεκ  12  2018 20181212.test

Lithuanian

Bugzilla link: https://sourceware.org/bugzilla/show_bug.cgi?id=22932.

These changes are minor. The Lithuanian translator simply asked glibc to use the same abbreviated month names as he used in glib2, which are also provided by CLDR. So, for example, the abbreviated name of April will now be displayed as bal. rather than Bal.

This change could be already visible in the ls -l output – unfortunately, for now only the numerical date formats are used:

$ LANG=lt_LT.utf8 ls -l
viso 0
-rw-rw-r--. 1 rl rl 0 2018-01-01 00:00 20180101.test
-rw-rw-r--. 1 rl rl 0 2018-02-02 00:00 20180202.test
-rw-rw-r--. 1 rl rl 0 2018-03-03 00:00 20180303.test
-rw-rw-r--. 1 rl rl 0 2018-04-14 20180414.test
-rw-rw-r--. 1 rl rl 0 2018-05-05 20180505.test
-rw-rw-r--. 1 rl rl 0 2018-06-06 20180606.test
-rw-rw-r--. 1 rl rl 0 2018-07-07 20180707.test
-rw-rw-r--. 1 rl rl 0 2018-08-08 20180808.test
-rw-rw-r--. 1 rl rl 0 2018-09-09 20180909.test
-rw-rw-r--. 1 rl rl 0 2018-10-10 20181010.test
-rw-rw-r--. 1 rl rl 0 2018-11-11 20181111.test
-rw-rw-r--. 1 rl rl 0 2018-12-12 20181212.test

Is this only because the Lithuanian translators did not like the abbreviated month names and decided that the ls command line utility should display only numbers? If that was the reason, then the text format can be restored now. I can’t speak Lithuanian, but I would suggest this form:

$ LANG=lt_LT.utf8 ls -l --time-style +"%b %e"
viso 0
-rw-rw-r--. 1 rl rl 0 saus.  1 20180101.test
-rw-rw-r--. 1 rl rl 0 vas.   2 20180202.test
-rw-rw-r--. 1 rl rl 0 kov.   3 20180303.test
-rw-rw-r--. 1 rl rl 0 bal.  14 20180414.test
-rw-rw-r--. 1 rl rl 0 geg.   5 20180505.test
-rw-rw-r--. 1 rl rl 0 birž.  6 20180606.test
-rw-rw-r--. 1 rl rl 0 liep.  7 20180707.test
-rw-rw-r--. 1 rl rl 0 rugp.  8 20180808.test
-rw-rw-r--. 1 rl rl 0 rugs.  9 20180909.test
-rw-rw-r--. 1 rl rl 0 spal. 10 20181010.test
-rw-rw-r--. 1 rl rl 0 lapkr 11 20181111.test
-rw-rw-r--. 1 rl rl 0 gruod 12 20181212.test

For now the dots after lapkr and gruod do not fit but, as I wrote above while discussing Catalan, the problem has already been fixed upstream and sooner or later the update will land in Fedora.

Summary

After adding Catalan and Czech support we now have 9 languages which display dates correctly using the required genitive case (together with the previously supported Belarusian, Croatian, Greek, Lithuanian, Polish, Russian, and Ukrainian). Belarusian and Russian are not the only languages that require different genitive and nominative forms of the abbreviated month names; the same is required in Catalan (because of the de or d’ preposition) and in Greek.

Same as previously, if you see in the screenshots in this article any errors in date formats which can be fixed by translators, like missing punctuation marks or incorrect day/month order then please contact the translators of the respective applications.

Fedora 28 and GNOME 3.28: New Features for Eastern Europe

This time these are not fake, edited, or patched images, nor a custom build from COPR, but real screenshots of the unmodified downstream Fedora 28, which is planned for release on May 1 this year. Here is how the default calendar widget in GNOME Shell looks in Greek, Polish, and Ukrainian:

For those who can’t speak those languages: the major change here is that the month names are displayed in the correct grammatical form, both in dates and standalone. This is a new feature, or rather a new bugfix, in GNOME 3.28, which was released on March 14 and pushed to Fedora 28 (prerelease) stable updates today. The series of bugfixes in GNOME was preceded by a similar bugfix in glibc 2.27, released earlier this year.

What Is Eastern Europe

This term must be explained because it is ambiguous. Usually when we say eastern Europe we mean the eastern end of our continent (as opposed to western, northern, southern, and, last but not least, central). But in this context I mean the eastern half of Europe (as opposed to western, and nothing else). I often strongly emphasize that this feature is not just for Slavic languages but also for other language groups of our region: Baltic, Greek, partially also Finnish, and even some western languages like Catalan or Scottish Gaelic.

More Applications

Of course, dates are now displayed correctly in all applications, not just GNOME Shell. In most of them this happened automagically. A few of them, however, needed some minor updates to make sure that the month names are displayed in the genitive case only where needed, not everywhere. Here is an example of correct month names in GNOME Calendar, this time in Croatian:

Please note the difference between the nominative name for March (ožujak) and its correct genitive case as used in date (ožujka; literally: of March).

Western European Languages

English does not have any unsupported features but, while at it, I examined the date displays in some other western European languages and found a few features that were not supported. For example, some Romance languages (Spanish, Portuguese, etc.) also use the genitive case of both the month name and the year number, but they construct it simply by adding the de preposition before it. This feature, although so simple, was not supported until now; it has been added in GNOME 3.28. Here is a screenshot of the same calendar widget in Spanish:

Please note the correct header saying diciembre de 2017, as opposed to the incorrect diciembre 2017 displayed by older versions.

More Languages

The genitive case of month names is currently supported in the Fedora 28 prerelease in only 7 languages: Belarusian, Croatian, Greek, Lithuanian, Polish, Russian, and Ukrainian. But support for more languages is on the way: Catalan and Czech have been added to GLib and they are already used if the latest GNOME is run on older systems. Support for these languages has also been pushed to glibc upstream and will eventually reach Fedora 28, but as of today it has not. However, it has already reached Fedora Rawhide. Since we have this chance, let’s take a look at a screenshot of GNOME in Fedora Rawhide in Catalan:

Please note the correct Catalan preposition of genitive case: de març (of March) vs. d’abril (of April).

Thanks

I’d like to thank all the people from Fedora and GNOME communities and from the outer world who supported me in this challenge: Piotr Drąg, Mike Fabian, Zack Weinberg, Carlos O’Donell, Masha Leonova, Ihar Hrachyshka, Dmitry Levin, Igor Gnatenko, Charalampos Stratakis, Robert Buj, Philip Withnall, and more.

PS. If some date formats in these screenshots are incorrect please approach the respective translation teams.

Some Bugs Are Really Funny

People learn from errors. Therefore bugs should be made public, not hidden.

GLib is a utility library originally developed for GNOME but also used by other projects. One of the many functions it provides is g_date_set_parse(). It is really smart and simple. It accepts a string to parse but, unlike many date parsing functions, it does not require a date format to be passed. Instead it tries to find numbers and month names in the parsed string and figure out what date they can represent. Of course, month names are recognized according to the current locale.

Let’s see how it works for Polish. Here is the list of month names:

Month #  Full name    Abbreviated name
1.       Styczeń      Sty
2.       Luty         Lut
3.       Marzec       Mar
4.       Kwiecień     Kwi
5.       Maj          Maj
6.       Czerwiec     Cze
7.       Lipiec       Lip
8.       Sierpień     Sie
9.       Wrzesień     Wrz
10.      Październik  Paź
11.      Listopad     Lis
12.      Grudzień     Gru

The loop implementing the algorithm iterates over all months and checks if the string being parsed contains a full or abbreviated month name as a substring. The first month which is found as a substring of the parsed string is recognized as a result. Let’s see what happens when a string containing the 9th month, September, which in Polish is wrzesień, is parsed by this algorithm:

Iteration #  Full name  Abbreviated name  Does the string wrzesień contain it?
1.           Styczeń    Sty               No
2.           Luty       Lut               No
3.           Marzec     Mar               No
4.           Kwiecień   Kwi               No
5.           Maj        Maj               No
6.           Czerwiec   Cze               No
7.           Lipiec     Lip               No
8.           Sierpień   Sie               Yes: wrzesień!

So, as a result, the string wrzesień (September) is recognized as sierpień (August).
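The behavior described above can be reproduced with a short Python sketch. It is a model of the described loop, not GLib's actual C code. The names MONTHS, first_match() and longest_match() are hypothetical, and the comparison is made case-insensitive here as an assumption. The second function shows one possible remedy (prefer the longest matching name), which is not necessarily the fix chosen by the GLib developers.

```python
# Polish month names (full, abbreviated), lowercased for comparison.
MONTHS = [("styczeń", "sty"), ("luty", "lut"), ("marzec", "mar"),
          ("kwiecień", "kwi"), ("maj", "maj"), ("czerwiec", "cze"),
          ("lipiec", "lip"), ("sierpień", "sie"), ("wrzesień", "wrz"),
          ("październik", "paź"), ("listopad", "lis"), ("grudzień", "gru")]

def first_match(text):
    """Return the number of the first month whose name occurs in the text."""
    text = text.casefold()
    for number, (full, abbreviated) in enumerate(MONTHS, start=1):
        if full in text or abbreviated in text:
            return number
    return None

def longest_match(text):
    """Alternative: prefer the month whose matching name is the longest."""
    text = text.casefold()
    best_number, best_length = None, 0
    for number, names in enumerate(MONTHS, start=1):
        for name in names:
            if name in text and len(name) > best_length:
                best_number, best_length = number, len(name)
    return best_number

print(first_match("wrzesień"))    # 8 (August, the bug)
print(longest_match("wrzesień"))  # 9 (September)
```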

Is this severe at all?

To be honest, not really. The bug seems to have been around for 20 years now and nobody has complained so far. Parsing dates is not really useful. There are many good reasons why it may not work in localized texts, like incomplete or incorrect translations, varying orthographic rules, Unicode characters updates, etc. Probably no real applications actually use this.

Nevertheless, the problem has been reported to GNOME Bugzilla and will be worked on.

Internationalization FAD, Pune 2017

For the second time in a short period I participated in an important Fedora event. On November 20–22, 2017, an Internationalization FAD was organized by a group of Fedora contributors from Red Hat Pune. FAD stands for Fedora Activity Day; it is a mini-conference. It differs from large conferences like Flock in that it is attended by a small number of people and is focused on one subject.

Day #0

November 19, 2017

Actually I should write Day #-1 (November 18) and Day #0, because my travel, as well as that of some other attendees, lasted more than 24 hours. Due to the time zone difference and all the mess it’s difficult to say when one day ended and the next began. In general, the travel went smoothly and without any problem except one: I spent 1.5 hours in a huge queue to the immigration desk at Mumbai Airport. Somewhere far behind me was Mike Fabian, and even further behind him were Takao Fujiwara, Akira Tagoh, and Peng Wu, who arrived a little later than me.

The long queue in Mumbai Airport

I really don’t know why it took so long. Probably because several large jumbo jets with many foreign tourists arrived within a short time. The immigration officers worked rather fast and without unnecessary delays. However, we all met and left Mumbai only after 4 AM local time, and we reached our hotel in Pune before 8 AM. Big shouts to Sundeep Anand and Parag Nemade who, despite the night and the weekend, stayed in contact with us online all the time, giving us advice and making sure that we were OK.

Our first day in India had to be spent resting after the journey. The hotel turned out to be very comfortable. Parag organized our time perfectly: first he let us rest as long as we wanted, and then in the afternoon he took us to visit the Red Hat office. That was my first Red Hat office visit ever, so everything was impressive to me. A brand new office building, some places still being finished, everything in perfect order.

Day #1

November 20, 2017

The actual first day of the FAD was for presentations. It started with an official opening and self-introductions.

Opening and self-introductions: Jens Petersen, Pooja Yadav, and Pravin Satpute.

Next, everyone had an opportunity to present their current work. It turns out that each of us works on tasks that are personally familiar to us. Takao Fujiwara, Akira Tagoh and Peng Wu work on the rendering (Pango library) and input (IBus) of text in East Asian languages. Unfortunately, I know almost nothing about these languages, so I don’t understand much of their work, except for obvious things like that it’s more complex than in European languages and needed by their speakers. On the other hand, I spoke about my current work on formatting dates in inflected languages. Each time I talk about it to foreign people I have a feeling that the audience doesn’t know what I’m talking about. I guess that time it was the same.

My talk about formatting dates. Photo credits: Jens Petersen.

Inflection is an original feature of the Proto-Indo-European language which has disappeared totally or almost totally in most contemporary Indo-European languages. However, it still exists in Slavic and Baltic languages, and also in Greek, Sanskrit and several more. But this diversity of the discussed topics only means that the term “Internationalization” is very broad: it includes features local to some groups of languages. There is a place for both inflected languages and logographic scripts, and for more phenomena than you can think of.

Topics more familiar to me were discussed as well, by Mike Fabian, with whom I have been working directly since July this year on the maintenance of locale data in the glibc project, and by Jens Petersen, who works on improving localization support in Fedora (separating translation packages from the main software packages, installing them depending on the languages chosen by the administrator, etc.).

It’s nice that Mike Fabian, Takao Fujiwara and others work on a better support (input and displays) of emojis in Fedora.

The Transtats project is getting more and more interesting. While at it, I learned that Sundeep Anand is not working on it alone. The FAD was attended by several people from the Quality Assurance team of Red Hat who support him. Those people also actively test other projects, like IBus and East Asian fonts.

Day #2

November 21, 2017

Working on our projects. Photo credits: Jens Petersen.

The second and third days were meant for joint work on our projects. I spent most of the time working with Mike Fabian. Contrary to my initial plans, we worked neither on my project of formatting dates in inflected languages nor on the automatic import of locale data from CLDR to glibc. Mike says that my work is basically complete and we can’t add anything more; we can only wait for more positive reviews. Instead, we worked on fixing the collation order in Latvian and Polish; the nearest plans include more languages, like Czech and Upper Sorbian.

It’s really hard and dirty work. Most languages have established rules for the collation order of the letters of their own alphabets, but what should we do if there are foreign letters? Language scientists are free to say “this is unlikely to happen” or “we don’t define how to handle this”, but we developers must be able to handle every Unicode string.

Moreover, some languages have really unusual collation rules. Usually the rules say that we should compare the letters starting from the beginning and moving towards the end. If there is a difference between letters, it determines the collation order. If the letters differ in diacritical marks only, some languages treat them as different letters and some as the same letter. But in French there is, or rather there was, a rule saying that if two words differ in diacritical marks only, then for the collation order we must compare the diacritics… counting from the end of the word! This rule is so weird that it has finally been dropped from most French variants, but it is still in use in Canadian French. How to deal with this? Mike has managed to fix it.
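The backward rule for diacritics is easier to see on an example. The Python sketch below is a simplified model, not the ISO 14651 or glibc algorithm: it compares base letters first and then a per-letter flag saying only whether the letter carries any diacritic, either from the beginning or from the end of the word. The names split_accents(), forward_key() and backward_key() are hypothetical.

```python
import unicodedata

def split_accents(word):
    """Return (base letters, per-letter accent flags) for a word."""
    base, accents = [], []
    for char in unicodedata.normalize("NFC", word):
        decomposed = unicodedata.normalize("NFD", char)
        base.append(decomposed[0])                        # letter without marks
        accents.append(1 if len(decomposed) > 1 else 0)   # 1 = has a diacritic
    return "".join(base), tuple(accents)

def forward_key(word):
    """Diacritics compared from the beginning of the word."""
    base, accents = split_accents(word)
    return base, accents

def backward_key(word):
    """Diacritics compared from the end of the word (the French rule)."""
    base, accents = split_accents(word)
    return base, accents[::-1]

words = ["côté", "coté", "côte", "cote"]
print(sorted(words, key=forward_key))   # ['cote', 'coté', 'côte', 'côté']
print(sorted(words, key=backward_key))  # ['cote', 'côte', 'coté', 'côté']
```

The two orders differ only in the middle pair: with the backward rule the accent on the final letter of coté outweighs the accent on the second letter of côte.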

Jens and I talked about Fedora bug 1401096. While installing Fedora Workstation you can select the user interface language, but the localization packages are not installed because they are missing from the installation disk. They must be downloaded from the net. This problem does not occur with the network installation, which by definition downloads the packages. I think that we need a way to mark in the package management system that some packages are required and should be installed in the future, as soon as the network becomes available. It was important for me to understand the problem because in the past I contributed to gnome-software (and I still hope to contribute in the future), and I think this is a task for that project, or rather for PackageKit, which powers it.

Another unplanned task which we had together with Mike and Pravin Satpute was adding the Filipino language to Fedora. Actually all we had to do was to coordinate some tasks because most of them had been finished already or must wait until at least one application translation is ready.

After this hard working day we spent the evening bowling and having BBQ at Amanora Mall. We also celebrated Takao Fujiwara’s birthday.

Day #3

November 22, 2017

The last day of the FAD was similar to the second one: we were working on our projects. I also continued the previous day’s work with Mike. Besides this, I filed some more suggestions for changes in glibc.

In the afternoon there was also the Fedora 27 Release Party. How was it? More people working in the same office came, and a large cake with a beautiful printed image was put on the table.

I have a feeling that the Release Party was dominated by us, the FAD attendees. The organizers asked the oldest of us, that means Mike, Jens Petersen and myself to cut the cake. It was really yummy!

That was, unfortunately, my last (so far!) day in India. I warmly thank the organizers for all their help, mostly I thank Satyabrata Maitra but also Parag Nemade, Pravin Satpute and Sundeep Anand. I really regret that I couldn’t stay longer.

Day #4

November 23, 2017

Most of that day I spent traveling which went absolutely without any problem. See you online or in real life! नमस्ते!

Linux Autumn 2017

Linux Autumn is an annual Polish conference dedicated to the free software and GNU/Linux. This year it was its 15th edition and this time it was held in Muflon Leisure Center in Ustroń.

In short: the conference was interesting but my participation was limited due to a virus¹ attack.

Day #1: September 22

Not much was planned for that day because the attendees were only arriving. The event started at 4 PM and the first speaker was Igor Gnatenko from Red Hat. He talked about dependencies between packages, especially about the new kinds of dependencies added in RPM 4.14. I was a little late to this talk, but thanks to YouTube I know what it was like and I must admit that it was interesting. I like the idea of a talk which focuses on a small subject that does not require advanced skills to understand and at the same time provides important information to the attendees. It is well worth mentioning here as it was the only talk in English.

The second speaker was myself. I talked about preparing an application for internationalization and avoiding typical errors. How it was – you should judge on your own. Unfortunately, this talk and all others were in Polish and English translations do not exist so I don’t provide links here.

My talk about preparing an application for internationalization. Photo: Igor Gnatenko.

In the evening there was a dinner and long conversation about professional and non-professional subjects.

Day #2: September 23

In the morning I woke up with a sore throat and I knew that the conference was actually over for me. Luckily, I had given my talk the previous day, when I still felt good. Despite this I pulled myself together and attended all the talks. I’d like to mention the two I found most interesting. The first was Maciej Nabożny’s talk about his libdinemic project. His talk covered many subjects, like cryptography and certificates, but first of all Maciej comprehensibly explained how blockchain works and how it powers bitcoin. The second talk was by Dariusz Puchalak, about OpenSSH, Ansible and other network tools. Usually I’m less interested in administrative stuff than in programming, but Dariusz’ talk was really zestful and impressed me. I recommend his talks to everyone; he is a really great speaker.

Piotr Kliczewski from Red Hat talks about oVirt

Day #3: September 24

So this was really the end for me. In the night I had a fever, and shortly after breakfast I packed my things, said goodbye and went back home. I wish I could recommend watching the videos on YouTube; unfortunately, they are mostly in Polish. Please come next year: the more foreign speakers and attendees we have, the more likely we are to switch to English.

PS. Regarding the virus: as usually happens, the next day I felt much better and two days later I was quite well.


¹ Virus: a biological structure similar to, but unrelated to, computer viruses. Viruses attack the cells of living organisms and are totally safe for computers.

Linux Autumn: Memories of the Last Year

Linux Autumn 2017 is coming; it starts in just 3 days. While waiting, let’s take a look at one presentation from last year.

Flock 2017

Flock to Fedora, the annual conference of Fedora users and contributors, was held this year from August 29 to September 1 in Hyannis, MA, a tourist resort on the Atlantic coast. As I was privileged to participate in this event, here is my report on it.

Day #0

August 28, 2017

Usually this is a day of arrivals and hotel check-ins. Sometimes unofficial pre-party events are organized, like hanging out in a bar together. This time there was additionally a 7 PM training session for the people who had volunteered to run the A/V equipment. Unfortunately, I was unable to participate although I had volunteered. All I could do was express my regret for being late on the discussion channel, as I was still traveling from the airport to Hyannis. Several people traveling on the same bus responded “same here.” As a result, the organizers had no choice but to repeat the training the next day at 8:00 AM.

Day #1

August 29, 2017

The day started with the short outstanding A/V training, and then the main conference began. Each day started with a short intro by Brian Exelbierd, who announced the events planned for the day and which tickets to use to attend them. Then there was a keynote by Matthew Miller. As usual, this was a summary of Fedora popularity statistics. Matthew emphasized that the statistics may be incomplete because Fedora respects its users’ privacy: it does not register their IDs and does not track their activity. The only source of information is download server traffic. We don’t know how many people share the same IP address. There is no world map of users, so we don’t know their geographical distribution. Countries where Internet access is expensive or simply unavailable may be underrepresented in the statistics. Besides that, only GNOME and KDE check for updates automatically; other desktop environments do not have this feature. There are no statistics about Fedora spins and editions.

Matthew Miller and Fedora statistics

Next, all speakers had an opportunity to advertise their sessions by giving a pitch of their talks and workshops. The aim was to attract potential attendees. A great idea in my opinion: not everyone is attracted by the title and summary alone, and some people might even have skipped reading them.

A long queue of the speakers willing to give a pitch of their talks

Lunch was deliberately scheduled for 2 hours; the organizers’ aim was to create an opportunity for unofficial and spontaneous conversations between attendees. A really great idea: how many times in the past have I spent conference time on long lobby discussions rather than on official talks!

I attended Owen Taylor’s talk “How to make your application into a Flatpak.” I must admit that although Flatpak is a very promising application distribution technology, it is still mostly unknown to me. As it reminds me (I’m not sure if correctly, though) of the distribution methods of OS X and Android, I asked Owen what the difference between them is. Owen emphasized that Flatpak offers much better security. By default an application has no special permissions; they are granted only when needed. Besides that, I also attended Adam Williamson’s talk about automated test systems in Bodhi and another short talk about IoT by Peter Robinson.

Finally, there was the session which I liked most. Dan Horák and Sinny Kumari talked about their experiences in debugging programs being ported to alternative architectures. According to Dan, the biggest problem is endianness: porting from the popular little-endian architectures to big-endian ones. Other obstacles are different sizes of popular types, especially char, and different definitions of size_t: signed or unsigned. Sinny talked about the difficulties of debugging hybrid programs written in Python but using shared libraries written in C. Debuggers so far cannot step into the native functions. Sometimes the fix for a bug can be as short as a single line, but it is hidden very deep in the code, which makes finding it really time-consuming. Unfortunately, this session was attended by only a few people. A day later I talked about it to Paul Frields, who commented “That’s why we call them alternative architectures.” But it had its advantage: the session turned into a free conversation in which we all shared our own experiences. I was very curious about the big-endian architectures. I had never worked with them, so I asked whether Fedora actually supports any of them. Dan gave examples: IBM System z mainframes and legacy PowerPC. The session time was over, the camera went off, and we kept on chatting.
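For readers who, like me, have never worked with a big-endian machine, the endianness problem Dan described can be illustrated with a few lines of Python. This is only my own minimal sketch, not something shown during the session; the value 0x01020304 is an arbitrary example.

```python
import struct
import sys

value = 0x01020304

little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first

print(sys.byteorder)   # byte order of the machine running this script
print(little.hex())    # 04030201
print(big.hex())       # 01020304

# A porting bug in miniature: bytes written with the little-endian
# convention and read back as big-endian yield a different number.
misread = struct.unpack(">I", little)[0]
print(hex(misread))    # 0x4030201
```

Code that writes integers to a file or a socket in the machine's native byte order works fine as long as both sides are little-endian, and breaks in exactly this way after porting.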

As a result I skipped the “Build Your Own Fedorator” workshop led by Sanqui, which was held in parallel with the Game Night. When I finally arrived I saw assembled and working Fedorators. But I didn’t lose much: Sanqui and Nick Bebout, who was also my roommate, briefly explained to me what the Fedorator is and helped me to test it. It is a device powered by a Raspberry Pi computer with a dedicated touch screen and a single USB port, all in a trapezoid-shaped 3D-printed box. The software runs on Raspbian and has only one application: it writes any selected Fedora boot image (yes, any spin, any architecture, etc.) to a USB flash drive. Very handy for fairs, shows and conferences, where one can generate working images and give them to the attendees.

Working Fedorator

Another event held in parallel with the Game Night was the International Candy Swap, this time officially organized by Justin Flory. I brought Mieszanka Krakowska (Krakow’s Assorted Jellies, which could be read as a memory of last year’s Flock), but the tastiest in my opinion, and not just mine, was the bakllava brought by Jona Azizaj. Also very interesting was the freeze-dried official astronaut ice cream produced by NASA, brought by Suzanne Hillman.

Balkan sweets: bakllava (middle), llokume (above), accompanied by Polish Mieszanka Krakowska (right)

Day #2

August 30, 2017

As always, it started with an intro by Brian and some more talk pitches.

Immediately after this, a workshop about Fedora Websites led by Andrea Masala and Robert Mayr started. It’s rather strange that I attended this workshop because I’m mostly not interested in web development, but I had lots of fun with this new kind of activity. I learned that the Fedora websites use the Genshi framework to handle localization and that the translations are provided by the Zanata service. The authors had put much effort into ensuring that translators are unable to break HTML elements such as links. I was really surprised to learn that the web pages use Python. During the workshop attendees had an opportunity to fix one or more issues reported in the Pagure repository. I was asked to remove a link to the Moldovan Fedora community because its website no longer existed. While at it, I checked whether the links to all European local communities worked (don’t worry, there were only a few of them) and discovered that the Romanian page did not work either, so I removed both of them.

After lunch, from 2 PM to 4:30 PM, Stephen Smoogen hosted two joint sessions about EPEL (Extra Packages for Enterprise Linux). EPEL has its 10th anniversary this year. It has provided additional packages for Red Hat’s commercial distributions since RHEL 4.0. Stephen explained why EPEL exists: people use it to build large, massive things, which he compared to building bridges. Those people don’t care about the newest versions of their software. They accept using orphaned, unsupported packages because they need them for the existing scripts in their projects. EPEL provides the packages they need: security tools, statistics tools (e.g., the R language), alternative web servers (e.g., nginx), monitoring tools, and configuration management.

Stephen Smoogen presents his unrealistic expectations of EPEL growth

The second part, or more precisely the second session about EPEL, turned into a discussion about what EPEL users need for the future. Stephen announced that EPEL repositories will have the same structure as Fedora repositories have now: release + updates (tested) + updates-testing (for the testers). He also announced that RHEL 5.11 will be released, although the original plan was to make 5.10 the last release of the 5.x series.

The day ended with the Wackenhammer’s Clockwork Arcade and Carousel evening event. There were arcade games, an amusement-park-style carousel, a food truck and drinks.

Games at Wackenhammer’s Clockwork Arcade

Day #3

August 31, 2017

The organizers had the reasonable idea of starting each day a little later than the previous one. This day started at 9 AM. The first speaker was Michael McGrath, who talked about what Red Hat wants from Fedora. In short, users have varied demands. Some of them want software updates to be delivered faster, some want them slower, and some have mixed expectations: some software faster, some slower. Fedora, as Red Hat sees it, is not supposed to be a place that keeps things as they are. Users may expect fast changes.

At 2 PM a track about globalization, internationalization and localization began. The first speakers were Parag Nemade, Jens Petersen and Pravin Satpute. They summarized the known aims: delivering translations and settings such as date formatting, currency, alphabetic sorting, paper size etc., which are contained in the glibc locale data. They also talked about the changes in input methods.

Progress of input methods

The recent improvements introduced in Fedora include langpacks installed with weak dependencies and locale data in glibc split into subpackages (there is no need to install, or to place in ISO images, the locale data for all languages because they take too much disk space). Currently there are about 80 langpacks in Fedora and the list is not yet complete. Fedora 25 introduced IBus Emoji typing and full Unicode 9.0 support. We had a short discussion about how users should configure their preferred language packages. My suggestion is not to place this functionality in GNOME Initial Setup or GNOME Software because they are part of GNOME, which is not the only desktop environment in Fedora. It seems better to me to move it to initial-setup or maybe to PackageKit, which attempts to be a universal solution for all desktop environments and even all distributions. Pravin Satpute summarized his work on fonts and asked for more input from local communities.

Pravin Satpute talks about fonts

The next speaker was Sundeep Anand, who introduced his project Transtats. It is very promising: its aim is to organize communication between developers, translators, and packagers, and to facilitate distribution of the packages. I will write more about it shortly below.

The last session of this series was held by Pravin Satpute, Alex Eng and Jean-Baptiste Holcroft. Pravin summarized the current progress of localization. Translation sprints gather more than 50 contributors each. I18N test days gather more local communities than any other such events. Alex Eng talked about the obstacles facing Zanata and its future plans. As the last speaker, Jean-Baptiste Holcroft really rocked the place with his short but interesting presentation, vividly illustrating what we had achieved so far and where we are going next.

Jean-Baptiste Holcroft…

Further plans of the localization team, especially the Transtats project, provoked a long discussion which attracted Brian Exelbierd. The discussion was so long that we had to move to another room because the next session was just about to begin. Transtats seems to be the tool of the future: it connects multiple services (Zanata, git, etc.). It will prevent translators from translating old strings which have already been removed by the developers, will automate the workflow from developers to translators and on to packagers, and will help enforce string freezes.

Day #4

September 1, 2017

This day was meant as a summary. Speakers had a chance to talk about what they had achieved during the conference. It was a really short day because at 11 AM we were supposed to check out of our rooms. However, the hotel staff were really nice and allowed us to hang around a little longer to finish the talks and say goodbye to our friends. This did not change much, though. The bus to the airport and the airplane back home would not wait; 3 PM was the latest we could leave Hyannis.

Some (but not all!) of my friends

I am thankful to the organizers for inviting me, and I send my greetings to the old and new friends whom I saw during the conference. See you all next year or maybe even sooner!

How Polish Plurals in MATE Got Broken

On March 13, 2017 the new version 1.18 of MATE Desktop was released. One of the last-minute changes in the project was pulling the most recent translations from Transifex. Usually this is a good thing, but for the Polish language it turned out to be a small disaster because the plural rules were (incorrectly) changed.

Plural rules

Foreign readers deserve an explanation here. Polish plural rules (as well as those of several other Slavic languages) are a little more complex than the English ones. Three forms are required:

  • 1 – singular – that’s obvious and similar to English and other Indo-European languages.
  • 2, 3, 4, and anything ending with 2, 3, 4 except 12, 13, 14 (for example: 22, 23, 24, 32, 33, 34 and so on). This group is sometimes referred to as few in some internationalization toolkits.
  • everything else (0, and 5 and greater except the numbers mentioned above). This group is sometimes referred to as many.

Plural support in the gettext package is good and complete. All we need to do is write the correct rules in the header of a *.po file. This task needs to be done only once and the rules can be reused for every translation into the same language; grammar rules change so rarely that we can safely assume they never change. Usually for Polish translations we use this formula:

"Plural-Forms: nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);\n"

This expression is neither simple nor complex. Just sufficient to describe what the language needs.
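To see the formula at work, here is a minimal Python sketch which translates the C expression above by hand. The function name polish_plural_index is made up for this illustration and is not part of gettext, which evaluates the expression from the header on its own.

```python
def polish_plural_index(n: int) -> int:
    """Hand translation of the usual Polish Plural-Forms expression.

    Returns 0 (singular), 1 ("few") or 2 ("many").
    The function name is hypothetical, chosen for this example only.
    """
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


# Polish forms of the word "file": singular, few, many
FORMS = ("plik", "pliki", "plików")

for n in (0, 1, 2, 5, 12, 14, 22, 112):
    print(n, FORMS[polish_plural_index(n)])
```

The loop prints, among others, "1 plik", "2 pliki", "5 plików", "14 plików" and "22 pliki", which is exactly what a Polish speaker expects.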

Here comes the disaster

On March 13 the commit synchronizing translations from Transifex changed the plural rules for the Polish language. The new formula is:

"Plural-Forms: nplurals=4; plural=(n==1 ? 0 : (n%10>=2 && n%10<=4) && (n%100<12 || n%100>=14) ? 1 : n!=1 && (n%10>=0 && n%10<=1) || (n%10>=5 && n%10<=9) || (n%100>=12 && n%100<=14) ? 2 : 3);\n"

Now this is complex, isn’t it? What’s wrong with this expression:

  • it states that Polish language needs 4 forms to support plurals which is not true;
  • it is unnecessarily complex: if the expression states that n==1 belongs to the group 0 there is no need to make sure that n!=1 in the further part;
  • the complexity leads to one actual bug: the second group includes all numbers which end with 2, 3, 4 (correct), except 12 and 13 (incorrect, 14 must be excluded as well);
  • the result 3 is unreachable which is correct but confusing for translators.
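The points above can be checked mechanically. The following Python sketch is my own hand translation of both formulas (the function names are hypothetical); it lists the numbers for which the two formulas disagree and collects every result the new formula can produce.

```python
def plural_original(n: int) -> int:
    """The formula traditionally used for Polish (nplurals=3)."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


def plural_from_transifex(n: int) -> int:
    """The formula pulled from Transifex (nplurals=4), translated term by term."""
    if n == 1:
        return 0
    if (2 <= n % 10 <= 4) and (n % 100 < 12 or n % 100 >= 14):
        return 1
    if ((n != 1 and 0 <= n % 10 <= 1)
            or (5 <= n % 10 <= 9)
            or (12 <= n % 100 <= 14)):
        return 2
    return 3


mismatches = [n for n in range(200)
              if plural_original(n) != plural_from_transifex(n)]
results = {plural_from_transifex(n) for n in range(10000)}

print(mismatches)       # prints [14, 114]
print(sorted(results))  # prints [0, 1, 2]
```

The only disagreements are the numbers ending with 14, and the result 3 never appears for any integer in the tested range.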

As MATE Desktop is a large project consisting of multiple applications (like Caja file manager, Pluma text editor etc.) the same happened to every single application of the project.

Difficult to fix

The bug was reported upstream immediately. The MATE project maintainers responded that the bug came from Transifex: it is pointless to fix it in the MATE source code repository because the next pull will overwrite the fix.

Unfortunately, it is not so easy to file a ticket with Transifex. It has neither Bugzilla nor any other ticket system. However, some people managed to contact the Transifex team. They responded that they had pulled the plural rules from CLDR, which lists 4 plural forms for the Polish language, although they admitted that assigning the number 14 to the few plural group was their fault and fixed this. As the MATE project continues pulling translations from Transifex, more and more of its applications will start handling the number 14 correctly. Some of the applications have been updated recently; the update is part of the 1.19 development release.

What CLDR says

Let’s look at what the CLDR database says about the Polish plural rules. Indeed, it lists 4 groups, and there is a mysterious v parameter which has something to do with fractions because the sample expressions display fractional forms. But as gettext supports integer values only, we should drop the fractional cases entirely.

The documentation of the v parameter is difficult to find, but once you find it you can read that it means the number of visible fraction digits in n, with trailing zeros. In this sentence, n is the number controlling the plural form itself.
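To make that definition less mysterious, here is a tiny Python sketch of how v could be computed. It is only an illustration of the quoted definition: the function name is made up, and it assumes the number is given as text with a dot as the decimal separator, because trailing zeros are lost as soon as the text is converted to a numeric type.

```python
def visible_fraction_digits(number_text: str) -> int:
    """Count the digits after the decimal point, trailing zeros included."""
    _, _, fraction = number_text.partition(".")
    return len(fraction)


for text in ("1", "14", "1.0", "1.50"):
    print(text, visible_fraction_digits(text))
```

For "1" and "14" the result is 0, for "1.0" it is 1 and for "1.50" it is 2. Since gettext passes integers only, v is always 0 there, which is why the fractional group can never be selected.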

Other languages

CLDR provides additional forms for fractions for other languages as well: Czech, Manx, Russian, Slovak, Ukrainian. For some other languages (Bosnian, Croatian, Filipino, Macedonian, Serbian, Lower and Upper Sorbian) the rules seem to be even more complex: fractional values belong to multiple integer groups.

This should be a warning for other languages that their rules might have been broken in Transifex as well. However, further investigation of the MATE Desktop source code did not reveal any recent changes in the plural rules of other languages.

Conclusions

It seems that pulling plural rules from CLDR automatically is not a good idea.

Translators and language coordinators: please make sure that your plural rules are correct.

Transifex and other translation platforms: please don’t pull the translation rules from CLDR without a thorough analysis. Better ask the language communities and reuse the existing rules.

CLDR: please simplify your plural expressions and make the documentation of fractions support easier to access.

glibc 2.26: New and Updated Locales

On August 2, 2017 glibc (the GNU C Library) version 2.26 was released. Among other things, many issues related to supported locales were addressed, most of them shortly before the release. Let’s see what has changed.

New locales

Compared to the previous version, this release introduces support for 6 new languages: Aguaruna, Bislama, Fiji Hindi, Samoan, Tok Pisin, and Tongan, as well as 2 new variants: South Azerbaijani for Iran and Maithili for Nepal.

Aguaruna is a language spoken by about 38,000–45,000 indigenous people in Peru. Bislama is an official language of Vanuatu although spoken by only about 10,000 people. Fiji Hindi is a language descended from, although different from, Hindi. It is spoken by about 300,000 citizens of Fiji, which makes about ⅓ of its total population, and is one of the official languages of the country. It is written using both the Latin and the Devanagari scripts. This release introduces the Latin script only, but Devanagari is being considered for introduction in the future. Tok Pisin is one of the official languages of Papua New Guinea. Although it has only 120,000 native speakers, which makes 1.7% of the total population, it is the most widely used language of the country. No wonder, since Papua New Guinea features about 850 native languages.

South Azerbaijani is a variant of the Azerbaijani language spoken by about 13 million people (16% of the total population) in Iran, and Maithili is spoken by about 3 million people (11.5% of the total population) in Nepal. Both were previously represented by their variants for Azerbaijan and India, respectively. Now their users may enjoy more granularity.

Updates

Bugs in alphabetic sorting in Hungarian and Malayalam have been fixed. Many other fixes have been introduced in date and time elements, mostly in month names. Typos in full names, abbreviated names, or both have been fixed in, among others, Arabic (many variants), Belarusian, Breton, Friulian, Hindi, Kannada, Konkani, Malayalam, Marathi, Mongolian, Northern Sami, Serbian (Latin only), Spanish (Peru and Uruguay), Uzbek, Yoruba, and Zulu; a total of 55 languages have been updated to the content of CLDR version 31. Weekday names have been updated in Arabic, Chechen, and Kashmiri (Saudi Arabian users had them displayed in English so far). Yes and no translated strings have been added or fixed in many languages.

Incorrectly appended trailing spaces have been removed in several locales, usually from weekday names. The affected locales are mainly languages of India, but also Albanian (where the issue was first spotted), Haitian, Maltese, and more. This change will polish date formatting in these locales.

Unicode 10.0

This version also introduces full support for Unicode 10.0. The changes are mainly focused on new emoji characters.

It’s worth mentioning that the full Unicode 10.0 support has been added to glibc only 2 days after its official release by the Unicode Consortium.