This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Bug 10871 (genitive months names) summary and plans


This is a summary of the current state of my work on bug 10871.

Let's define the success criteria.  There is only one main criterion:

    Provide the ability to display month names in two different
    forms, depending on the context, according to the language rules.

Additional success criteria:

1. Maintain compatibility with existing solutions (BSD family, including
   OS X) and proposed future standards (POSIX).
2. Don't break any locale (language) which does not require this feature.
3. Don't break any existing application which displays the dates correctly.
4. Fix all existing applications which need a fix without any change
   in the application code.

Possible implementations have been discussed here: [1].

More details:

Although it is possible to meet the main criterion (display two different
forms of a month name) without ensuring compatibility with other systems,
I insist on ensuring it.  That means:

- implement the "%OB" format specifier for strftime() and the
  corresponding ALTMON_x constants for nl_langinfo(); these would
  generate the month names in the nominative form;
- existing "%B" and MON_x (which generate the nominative form now) would
  generate the month names in the genitive form in those languages which
  need it, and would keep providing the nominative form for those
  languages which don't need a genitive form.
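
From the application side, the split between the two specifiers could
look like the following minimal sketch.  It assumes a C library whose
strftime() already accepts "%OB" (as the BSD family does);
format_month() is a hypothetical helper name used only for
illustration, not an existing API.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: write the name of month MON (0..11) into BUF.
   STANDALONE != 0 selects "%OB" (nominative, e.g. a calendar header);
   STANDALONE == 0 selects "%B" (the form used inside a full date).
   Returns the number of bytes written, or 0 on error.  */
size_t
format_month (char *buf, size_t size, int mon, int standalone)
{
  struct tm tm = { 0 };
  const char *fmt = standalone ? "%OB" : "%B";

  if (mon < 0 || mon > 11)
    return 0;
  tm.tm_mon = mon;
  tm.tm_mday = 1;   /* keep the broken-down time valid */
  return strftime (buf, size, fmt, &tm);
}
```

In a locale that does not distinguish the two forms (the C locale,
for instance) both calls should produce the same string; only locales
that provide genitive names would show a difference.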

If we chose a different solution then glibc would remain forever
incompatible with BSD [2] [3] and OS X [4] [5] and even with a future
POSIX release [6].  It would cause problems for multiplatform libraries
(e.g., glib2 [7]) which would have to deal with different and mutually
contradictory conventions on the supported platforms.

Additional criterion 2 is satisfied by default: if a locale does not
need genitive month names then they should not be provided in the
locale data.  The implementation will not create them magically.

I'm afraid that criteria 3 and 4 cannot both be met: usually we can't
tell from inside glibc which form is needed.  Originally I thought
about implementing a smart algorithm trying to guess which form is
needed but it would fail (provide incorrect results) for the date
command-line utility, which seems to me the most basic test.

In order to minimize the fallout I have provided ABI versioning:
existing binaries compiled against an older version of glibc would
keep working as if there was no change.  The only applications
that will break are those which both:

- are recompiled from source against the newer version without any
  change, and
- use month names standalone (e.g., calendars).

Also please note that the *majority* of applications displaying dates
are already broken and would be fixed out of the box, without any change
in their source code.  And please note that by "broken" I don't mean
"not starting" or "crashing", I only mean that the month names may
have incorrect grammar.  At the moment all those other applications
already generate incorrect month names and it seems to me that many
people are just used to it.

To be honest, I don't like my implementation of backward compatibility.
IMHO the source code is hard to read and kinda dirty; I especially
dislike my trick of exporting one function under two different aliases:

strong_alias (__strftime_l_compat, __strftime_l_compat2)
compat_symbol (libc, __strftime_l_compat2, __strftime_l, GLIBC_2_3);
compat_symbol (libc, __strftime_l_compat, strftime_l, GLIBC_2_3);

An identifier with "2" at the end is dirty indeed.  Any comments
and suggestions on how to implement it properly, without introducing
another function whose only purpose would be to call the original
function, are welcome.

Now I'm afraid you guys hesitate to allow the "%OB" format specifier
to be added at all.  What else can I do to convince you to take this
step and allow the change?  I believe you will have to do it sooner
or later, and even if you choose another solution you will have to
re-adopt this one.  Or convince the rest of the world to use yours. :-)

I wondered how to handle the locales which don't need multiple
versions or don't yet deliver them.  My initial idea was that in
these cases strftime("%OB") should generate the same output as
strftime("%B") but nl_langinfo(ALTMON_x) should return NULL or an
empty string.  Then programmers should check whether the result of
nl_langinfo(ALTMON_x) is NULL or empty and use nl_langinfo(MON_x)
instead, for example:

char *month = nl_langinfo(ALTMON_1 + n);
if (!month || !*month)
    month = nl_langinfo(MON_1 + n);

This would also give them a chance to determine whether genitive month
names are provided or not.  Should it be implemented another way?
Should nl_langinfo() do the same substitution internally?  Should
we require locale data to always provide both versions of month names?

Note that this is the way nl_langinfo() reports an error: if you
define an ALTMON_x constant and pass it to the currently existing
nl_langinfo(), it will return an empty string.
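
The fallback described above can be wrapped in a helper which also
compiles on systems whose <langinfo.h> does not define ALTMON_1 at
all.  This is only a sketch: standalone_month_name() is a hypothetical
name, and the code assumes that the ALTMON_x and MON_x items are
numbered consecutively, as the snippet above already does.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE   /* assumption: needed for ALTMON_x where it is an extension */
#endif
#include <langinfo.h>
#include <stddef.h>

/* Hypothetical helper: return the standalone (nominative) name of
   month N (0..11), falling back to MON_x when the alternative name
   is unavailable.  Returns NULL if N is out of range.  */
const char *
standalone_month_name (int n)
{
  if (n < 0 || n > 11)
    return NULL;
#ifdef ALTMON_1
  {
    const char *alt = nl_langinfo (ALTMON_1 + n);
    if (alt != NULL && *alt != '\0')
      return alt;
  }
#endif
  /* No alternative name: use the regular one.  */
  return nl_langinfo (MON_1 + n);
}
```

If nl_langinfo() performed this substitution internally, or if all
locale data always provided both arrays, such a wrapper would reduce
to a plain nl_langinfo(ALTMON_1 + n) call.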

However, after analyzing how FreeBSD supports it I discovered that
all their locale data contain all month names in both the date-format
and the standalone form, even if they are the same.  There is no
fallback algorithm of the kind: retrieve nl_langinfo(ALTMON_x), then
if it's NULL or empty retrieve nl_langinfo(MON_x).  The result of
nl_langinfo(ALTMON_x) is guaranteed to be final.  If it's NULL or
empty then probably it should be so.  This means that we should
change *all* locale data; the minimal change is to copy the mon array
into the altmon array.  Since the locale data are part of glibc it is
possible to do it in a consistent way.
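
The "copy mon into altmon" idea can be illustrated in C as below.
This is only a sketch of the data-level fallback: struct month_names
and fill_altmon() are hypothetical and do not correspond to glibc's
internal locale structures or to the locale source file format.

```c
#include <stddef.h>

#define MONTHS 12

/* Hypothetical container for the two arrays of month names.  */
struct month_names
{
  const char *mon[MONTHS];     /* form used inside a full date */
  const char *altmon[MONTHS];  /* standalone (nominative) form */
};

/* Make altmon complete: wherever a locale did not define an
   alternative name, reuse the regular one.  */
void
fill_altmon (struct month_names *names)
{
  for (size_t i = 0; i < MONTHS; i++)
    if (names->altmon[i] == NULL || names->altmon[i][0] == '\0')
      names->altmon[i] = names->mon[i];
}
```

Done once when the locale data are prepared, this means a caller never
needs to test the ALTMON_x result and fall back on its own.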

After reading more about some Western European (Romance) languages
which require "de" before a month name, or "d’" if the month name begins
with a vowel, I'm also tempted to provide abbreviated alternative month
names ("d’oct.", "de nov." and so on).  But so far I have never heard
anybody ask for this and I'm not sure it will ever be useful at all.
Unless anybody asks for it now I suggest not implementing this at
the moment.

I'm not sure it is necessary to define the enum values as __ALTMON_*
(rather than ALTMON_*) and then (conditionally? #if __USE_GNU?)
#define them as ALTMON_*, as suggested in [8]. Do you guys think it is?

Documentation: I'm aware it is missing and definitely should be
provided.  As I said before, the only reason why I have not yet
provided it is that I'm not yet sure you guys would accept the change
in general.

We are approaching the freeze period but still have some weeks
to introduce the change.  I find it really necessary for many
languages from my part of the world.  I'd like to ask for more
support from more experienced glibc developers.  I appreciate
your work so far and I'm happy to share my experience, as far
as it could be helpful.

Regards,

Rafal


[1] https://sourceware.org/bugzilla/show_bug.cgi?id=10871#c7
[2] https://www.freebsd.org/cgi/man.cgi?query=strftime&sektion=3
[3] https://www.freebsd.org/cgi/man.cgi?query=nl_langinfo&sektion=3
[4] https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man3/strftime.3.html
[5] https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man3/nl_langinfo.3.html
[6] http://austingroupbugs.net/view.php?id=258
[7] https://bugzilla.gnome.org/show_bug.cgi?id=749206
[8] https://sourceware.org/ml/libc-alpha/2016-10/msg00303.html

