14641 – LC_NAME: deprecate locale category

Status:	REOPENED

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	localedata
Version:	unspecified

Importance:	P2 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

Reported:	2012-09-28 11:33 UTC by Philip Withnall
Modified:	2019-01-02 11:59 UTC
CC List:	6 users

Flags:	fweimer: security-

Description Philip Withnall 2012-09-28 11:33:42 UTC

It’s useful that glibc’s locale data collects together different locales’ ways of formatting names as name_fmt (lh.2xlibre.net/values/name_fmt/). However, this is difficult to use, and requires programs to implement their own parser for the field descriptors. This causes duplication of code and means all the programs have to be kept up-to-date with any changes to the set of allowed field descriptors.

Would it be possible to add a function similar to strftime() which will parse a *name* format string and substitute values for its field descriptors?

Comment 1 keld@keldix.com 2012-09-28 12:05:59 UTC

On Fri, Sep 28, 2012 at 11:33:42AM +0000, bugzilla at tecnocode dot co.uk wrote:
> 
> http://sourceware.org/bugzilla/show_bug.cgi?id=14641
> 
>              Bug #: 14641
>            Summary: Add a strftime()-like function for formatting human
>                     names
>            Product: glibc
>            Version: unspecified
>             Status: NEW
>           Severity: enhancement
>           Priority: P2
>          Component: localedata
>         AssignedTo: unassigned@sourceware.org
>         ReportedBy: bugzilla@tecnocode.co.uk
>                 CC: libc-locales@sources.redhat.com
>     Classification: Unclassified
> 
> 
> It's useful that glibc's locale data collects together different locales' ways
> of formatting names as name_fmt (lh.2xlibre.net/values/name_fmt/). However,
> this is difficult to use, and requires programs to implement their own parser
> for the field descriptors. This causes duplication of code and means all the
> programs have to be kept up-to-date with any changes to the set of allowed
> field descriptors.
> 
> Would it be possible to add a function similar to strftime() which will parse a
> *name* format string and substitute values for its field descriptors?

Yes, this is possible. I think it would be better to have one standardized way than several
homegrown ones. Could we find out which APIs are currently available in different
implementations?

Comment 2 Rich Felker 2012-09-28 12:21:06 UTC

I think this feature is fundamentally misguided. There is no universal way to represent a "broken down" human name; the whole "first name last name" or "family name" concept is rooted in particular cultures and does not even apply to others. I don't think glibc has any business perpetuating the concept that it's reasonable for applications to store names like this.

Properly written applications should simply be storing a single "name" field as one string, possibly tagged with a subrange of the string to be used as a primary sort key. Other useful tagging would be rules specific to the named person's culture for how to transform their name into a salutation, etc. But this is specific to the culture of the person being named, not the locale.
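
As a rough illustration of the storage model described in this comment, the following Python sketch keeps each name as one string and optionally tags a subrange of it as the primary sort key. The `StoredName` class, the `tag` helper, and the sample data are assumptions made for this example; nothing like them exists in glibc.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StoredName:
    """A name kept as one opaque string (hypothetical type for illustration)."""

    full: str
    # Optional (start, end) slice of `full` to use as the primary sort key.
    sort_range: Optional[Tuple[int, int]] = None

    def sort_key(self) -> str:
        # Fall back to the whole string when no subrange is tagged,
        # e.g. for a person who has a single name.
        if self.sort_range is None:
            return self.full.casefold()
        start, end = self.sort_range
        return self.full[start:end].casefold()


def tag(full: str, key: str) -> StoredName:
    """Hypothetical helper: tag the first occurrence of `key` inside `full`."""
    start = full.index(key)
    return StoredName(full, (start, start + len(key)))


names = [
    tag("Johann Sebastian Bach", "Bach"),
    StoredName("Sukarno"),      # single name, no subrange needed
    tag("Mao Zedong", "Mao"),   # family name written first
]
ordered = [n.full for n in sorted(names, key=StoredName.sort_key)]
```

The display form is never reassembled from components here, so no given/family structure is imposed on the data; only the sort key needs per-name tagging.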

Comment 3 Simon McVittie 2013-11-06 12:52:02 UTC

(In reply to Rich Felker from comment #2)
> There is no universal way
> to represent a "broken down" human name; the whole "first name last name" or
> "family name" concept is rooted in particular cultures and does not even
> apply to others.

I agree that whenever more information is available, it's much better to use it, but applications don't always have that luxury.

The current motivation for this feature request is Folks, a library to aggregate contact/address-book information from various sources, much of it modelled on vCard syntax. If we know the formatted name (vCard's "FN") we do use it in preference to the structured name (vCard's "N", which might look like "N:Bach;Johann;Sebastian;Herr;"), but if all we know is the structured name, we need to display it somehow, and the user's locale seems a more reasonable guess than "they're probably American".
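
The fallback described in this comment can be sketched in Python as follows. This is not code from Folks: `parse_n` and `display_name` are hypothetical helpers, the parser ignores vCard escaping and property parameters beyond the name, and the fixed given-name-first order in the fallback is exactly the kind of guess the thread is debating, used here only as a placeholder.

```python
# Component order of the vCard "N" property value.
N_COMPONENTS = ("family", "given", "additional", "prefixes", "suffixes")


def parse_n(line):
    """Split a line such as 'N:Bach;Johann;Sebastian;Herr;' into components.

    Simplified: backslash escapes and comma-separated multi-values are
    not interpreted.
    """
    name, _, value = line.partition(":")
    if name.split(";")[0].upper() != "N":
        raise ValueError("not an N property")
    parts = value.split(";")
    # Pad missing trailing components with empty strings.
    parts += [""] * (len(N_COMPONENTS) - len(parts))
    return dict(zip(N_COMPONENTS, parts))


def display_name(fn, n_line):
    """Prefer the formatted name (FN); otherwise guess from N."""
    if fn:
        return fn
    n = parse_n(n_line)
    # Placeholder ordering; a real application would pick this from a
    # setting or from data about the contact.
    order = ("prefixes", "given", "additional", "family", "suffixes")
    return " ".join(n[key] for key in order if n[key])
```

For example, `display_name("J. S. Bach", "N:Bach;Johann;Sebastian;Herr;")` returns the FN unchanged, while passing `None` as the FN falls back to joining the non-empty N components.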

Comment 4 Rich Felker 2013-11-06 15:09:41 UTC

On Wed, Nov 06, 2013 at 12:52:02PM +0000, simon.mcvittie at collabora dot co.uk wrote:
> The current motivation for this feature request is Folks, a library to
> aggregate contact/address-book information from various sources, much of it
> modelled on vCard syntax. If we know the formatted name (vCard's "FN") we do
> use it in preference to the structured name (vCard's "N", which might look like
> "N:Bach;Johann;Sebastian;Herr;"), but if all we know is the structured name, we
> need to display it somehow, and the user's locale seems a more reasonable guess
> than "they're probably American".

I can't see the justification for having a library's behavior depend
on the user's locale, especially when the library's purpose is this
kind of information processing. Name formatting logic really belongs
in your library, with the options either as a setting (possibly with
an ability to request a reasonable default based on the locale) or
based on heuristics using the rest of the contact's info (like
country).

BTW I think this is getting mildly off-topic for the bug tracker, but
the reason I posted it here is that I think it's a justification for
why this functionality does not belong in libc -- the idea that
using the user's locale rather than a locale associated with the
individual name is the wrong thing to be doing, and that in library
code working with this kind of data, it's the third-party library's
job to do this hard work of getting it right.

Comment 5 Simon McVittie 2013-11-06 16:08:37 UTC

OK, so is your position that _NL_NAME_NAME_FMT should never be used, and if we need similar functionality, we should invent our own?

Comment 6 Rich Felker 2013-11-06 16:30:28 UTC

On Wed, Nov 06, 2013 at 04:08:37PM +0000, simon.mcvittie at collabora dot co.uk wrote:
> OK, so is your position that _NL_NAME_NAME_FMT should never be used, and if we
> need similar functionality, we should invent our own?

My opinion on this is not authoritative for glibc, but yes, my
position is that this locale property should be considered deprecated
and that new features using it should not be added. My reasoning is
that treating name formatting as a property of the user's locale is
fundamentally wrong. The way you format a person's name is a function
of _their_ cultural conventions, not your cultural conventions. Since
libc's locale system does not and cannot know the conventions
associated with the name being formatted, it cannot help you get the
correct results.

In some sense, _NL_NAME_NAME_FMT is less of an offense because it
might help programs know the right formatting (or a right default to
try) for new names introduced by the user. If a program takes the
format string and does its own formatting, it can also accept other
non-default formats. But if a program requests that the libc do the
formatting based on the current locale, there is no way to handle
non-default formats. In this sense, I would object less to a function
like strftime for names that took the name format string as an
explicit argument, rather than using the current locale's format
string. This is certainly an option that could be proposed and
discussed.
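
A minimal sketch of that variant, written in Python for brevity: the format string is an explicit argument and the current locale is never consulted. The function `format_name` is hypothetical and not a glibc API. It handles only a subset of behaviour, assuming the name_fmt conventions described in locale(5): each `%x` descriptor is looked up by its letter (for instance `f` family name, `g` first given name, `m` additional given names, `d` salutation), and `%t` emits a space unless the preceding descriptor produced an empty string.

```python
def format_name(fmt, fields):
    """Expand a name_fmt-style string using an explicit field mapping.

    `fields` maps a descriptor letter (e.g. "g") to its value; missing
    letters expand to the empty string.
    """
    out = []
    prev_empty = True  # whether the last field descriptor expanded to ""
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch != "%" or i + 1 >= len(fmt):
            out.append(ch)  # literal character (or a trailing lone "%")
            i += 1
            continue
        code = fmt[i + 1]
        i += 2
        if code == "t":
            # Separator: a space only if the previous field was non-empty.
            if not prev_empty:
                out.append(" ")
        else:
            value = fields.get(code, "")
            out.append(value)
            prev_empty = value == ""
    return "".join(out)


record = {"d": "Herr", "g": "Johann", "m": "Sebastian", "f": "Bach"}
given_first = format_name("%d%t%g%t%m%t%f", record)
family_first = format_name("%f%t%g", record)
```

Because the caller supplies the format, a program can take the locale's string as a default and still accept a per-contact override, which is the flexibility this comment asks for.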

Comment 7 Ondrej Bilka 2013-11-06 20:33:09 UTC

On Wed, Nov 06, 2013 at 04:08:37PM +0000, simon.mcvittie at collabora dot co.uk wrote:
> --- Comment #5 from Simon McVittie <simon.mcvittie at collabora dot co.uk> ---
> OK, so is your position that _NL_NAME_NAME_FMT should never be used, and if we
> need similar functionality, we should invent our own?
>
Providing good support for formatting names is out of scope of libc. You
would want to handle name declension, detect swapped first names (by
looking them up in a name database), and recognize sex based on first
name, all of which should be handled by a separate library.

I did not do a Google query, but somebody has probably already written a
library that does this.

Comment 8 keld@keldix.com 2013-11-06 23:57:48 UTC

On Wed, Nov 06, 2013 at 04:30:28PM +0000, bugdal at aerifal dot cx wrote:
> http://sourceware.org/bugzilla/show_bug.cgi?id=14641
> 
> --- Comment #6 from Rich Felker <bugdal at aerifal dot cx> ---
> On Wed, Nov 06, 2013 at 04:08:37PM +0000, simon.mcvittie at collabora dot co.uk
> wrote:
> > OK, so is your position that _NL_NAME_NAME_FMT should never be used, and if we
> > need similar functionality, we should invent our own?
> 
> My opinion on this is not authoritative for glibc, but yes, my
> position is that this locale property should be considered deprecated
> and that new features using it should not be added. My reasoning is
> that treating name formatting as a property of the user's locale is
> fundamentally wrong. The way you format a person's name is a function
> of _their_ cultural conventions, not your cultural conventions. Since
> libc's locale system does not and cannot know the conventions
> associated with the name being formatted, it cannot help you get the
> correct results.
> 
> In some sense, _NL_NAME_NAME_FMT is less of an offense because it
> might help programs know the right formatting (or a right default to
> try) for new names introduced by the user. If a program takes the
> format string and does its own formatting, it can also accept other
> non-default formats. But if a program requests that the libc do the
> formatting based on the current locale, there is no way to handle
> non-default formats. In this sense, I would object less to a function
> like strftime for names that took the name format string as an
> explicit argument, rather than using the current locale's format
> string. This is certainly an option that could be proposed and
> discussed.

Well, the intention with this specification is to be able to format an address according
to the local conventions for the specific language and territory.
In most cases the information is also usable as a user's set of locale preferences,
but you are right that an address is mostly useful in the format of the 
local conventions for that address.

The intended use is then to switch to the locale of the address in question,
for eg formatting of an address for a postal letter. 

To find the correct locale for a given address is not straightforward.
You would often have a country associated with the address and then you could
find a locale related to that country. 

The information does relate to an i18n problem and does a much better job than the
often seen US formatting of addresses. I do not think it should be deprecated.

best regards
keld

Comment 9 Rich Felker 2013-11-07 02:26:39 UTC

On Wed, Nov 06, 2013 at 11:57:48PM +0000, keld at keldix dot com wrote:
> The intended use is then to switch to the locale of the address in question,
> for eg formatting of an address for a postal letter. 

This is not the way locales are supposed to be used. You don't just
keep switching them around at runtime. In your specific example of
formatting a letter, it's wrong, because you want the address
formatted according to the cultural conventions of the place in which
it's being sent, but the name written the way the recipient's name is
supposed to be written.

> To find the correct locale for a given address is not straightforward.
> You would often have a country associated with the address and then you could
> find a locale related to that country. 

But that has nothing to do with how the name should be formatted, only
with how the address should be formatted. Also, depending on your
locale, the matter of formatting an address can depend on the
conventions of the recipient's country or the sender's. In any case
this logic is all way outside the scope of libc locale.

Comment 10 keld@keldix.com 2013-11-07 12:29:55 UTC

On Thu, Nov 07, 2013 at 02:26:39AM +0000, bugdal at aerifal dot cx wrote:
> http://sourceware.org/bugzilla/show_bug.cgi?id=14641
> 
> --- Comment #9 from Rich Felker <bugdal at aerifal dot cx> ---
> On Wed, Nov 06, 2013 at 11:57:48PM +0000, keld at keldix dot com wrote:
> > The intended use is then to switch to the locale of the address in question,
> > for eg formatting of an address for a postal letter. 
> 
> This is not the way locales are supposed to be used. You don't just
> keep switching them around at runtime. In your specific example of
> formatting a letter, it's wrong, because you want the address
> formatted according to the cultural conventions of the place in which
> it's being sent, but the name written the way the recipient's name is
> supposed to be written.

If there are different users then it is only natural to switch to each user's
locale, eg when printing a name, or printing an address.
When printing a name in an address, one should follow the IPU standard
for this. This standard has several options. Either French or the local
language of the receiving country. And then there may be
more than one convention in a country, eg with multiple official languages.

A recipient may want a name to be written in different ways. Eg in the Chinese, Indian, Arabic
or the Latin script. Also dependent on script the family name could be placed first or last.
Even with the Latin script, sometimes the family name is written first.
Or the family name is written in all capitals.

> > To find the correct locale for a given address is not straightforward.
> > You would often have a country associated with the address and then you could
> > find a locale related to that country. 
> 
> But that has nothing to do with how the name should be formatted, only
> with how the address should be formatted. Also, depending on your
> locale, the matter of formatting an address can depend on the
> conventions of the recipient's country or the sender's. In any case
> this logic is all way outside the scope of libc locale.

Yes it is mostly how the address should be formatted. But it could also
be used for the name formatting, eg in a letter, where you have written the text
in the language of the receiver, and you also want to format the name in that language.

I believe this is in scope of libc, meaning that this is to make an application
culturally adaptable. It is just a more advanced use than the normal i18n,
because we want to accommodate different users' cultural conventions.

Still, it can be used just for one set of user preferences,
eg in my country, Denmark, if I would send out a letter to many
people, they would almost all be Danish, and then a few Swedes and Norwegians
and possibly German addressees, who share the same cultural conventions
wrt naming and addresses. 

best regards
keld

Comment 11 Rich Felker 2013-11-07 15:00:40 UTC

On Thu, Nov 07, 2013 at 12:29:55PM +0000, keld at keldix dot com wrote:
> If there are different users then it is only natural to switch to each user's
> locale, eg when printing a name, or printing an address.

No. Locale names (and whether they even exist) are
implementation-defined. A correct application cannot use locales by
name, but can only use the user's configured locale or the C/POSIX
locale. Applications which assume the existence of particular locale
names are not portable, and even if you only cared about them working
on GNU/Linux systems, many such desktop systems only have one locale
installed (the user's own locale).

Even if you could assume the names and existence of locales, their
definitions may vary slightly between systems, which means the
interpretation of your data would not be portable. The key here is
that name "formatting" is not just presentation, it's actually
interpretation.

> I believe this is in scope of libc, meaning that this is to make an application
> culturally adaptable. It is just a more advanced use than the normal i18n,
> because we want to accommodate different users' cultural conventions.

No, the cultural conventions in question are not cultural conventions
of any users of the system. The data you're working with has been
encoded (I would go so far as to say "corrupted") in a way that's
dependent on the cultural conventions of the person whom it names (or
sometimes not even that, but the cultural conventions imposed on that
person by virtue of where they're living and their legal status
there). The problem is decoding it to the person's actual name.

Comment 12 keld@keldix.com 2013-11-07 19:02:45 UTC

On Thu, Nov 07, 2013 at 03:00:40PM +0000, bugdal at aerifal dot cx wrote:
> http://sourceware.org/bugzilla/show_bug.cgi?id=14641
> 
> --- Comment #11 from Rich Felker <bugdal at aerifal dot cx> ---
> On Thu, Nov 07, 2013 at 12:29:55PM +0000, keld at keldix dot com wrote:
> > If there are different users then it is only natural to switch to each user's
> > locale, eg when printing a name, or printing an address.
> 
> No. Locale names (and whether they even exist) are
> implementation-defined. A correct application cannot use locales by
> name, but can only use the user's configured locale or the C/POSIX
> locale. Applications which assume the existence of particular locale
> names are not portable, and even if you only cared about them working
> on GNU/Linux systems, many such desktop systems only have one locale
> installed (the user's own locale).

If you have an application that depends on more locales, then you need to have those
locales installed. That is the case with everything, you do need to have
the appropriate software installed to solve your job.

Anyway there are standards for locale names, that libc should honour,
such as ISO 15897. Given that we are talking libc, it is reasonable
to assume that the locales of libc can be present, and that naming of locales
of libc can be used.


> Even if you could assume the names and existence of locales, their
> definitions may vary slightly between systems, which means the
> interpretation of your data would not be portable. The key here is
> that name "formatting" is not just presentation, it's actually
> interpretation.

As long as the locales conform to standards, the results generated
should be culturally acceptable, even if the locale data differ slightly.

> > I believe this is in scope of libc, meaning that this is to make an application
> > culturally adaptable. It is just a more advanced use than the normal i18n,
> > because we want to accommodate different users' cultural conventions.
> 
> No, the cultural conventions in question are not cultural conventions
> of any users of the system. The data you're working with has been
> encoded (I would go so far as to say "corrupted") in a way that's
> dependent on the cultural conventions of the person whom it names (or
> sometimes not even that, but the cultural conventions imposed on that
> person by virtue of where they're living and their legal status
> there). The problem is decoding it to the person's actual name.

Your interpretation is not in line with POSIX or ISO i18n model
(TR 11017) nor ISO C. And it is not in line with IPU recommendations. 

best regards
keld

Comment 13 Florian Weimer 2014-06-17 04:17:33 UTC

Closing per previous discussion.

Comment 14 Philip Withnall 2014-06-17 07:32:46 UTC

(In reply to Florian Weimer from comment #13)
> Closing per previous discussion.

As per comments #5 and #6, I think the proper solution is to deprecate name_fmt, since it seems to be fundamentally incorrect to have it in libc, and keeping it un-deprecated just encourages people to use it incorrectly or unsuccessfully.

Comment 15 Florian Weimer 2014-06-17 07:48:21 UTC

(In reply to Philip Withnall from comment #14)
> (In reply to Florian Weimer from comment #13)
> > Closing per previous discussion.
> 
> As per comments #5 and #6, I think the proper solution is to deprecate
> name_fmt, since it seems to be fundamentally incorrect to have it in libc,
> and keeping it un-deprecated just encourages people to use it incorrectly or
> unsuccessfully.

Fair enough.  That would extend to LC_NAME, LC_ADDRESS, LC_TELEPHONE, I suppose.  What about LC_MEASUREMENT and LC_IDENTIFICATION?

Related identifiers include _NL_NAME_*, _NL_ADDRESS_*, _NL_TELEPHONE_*, plus the _NL_NUM_* identifiers.  Anything else?

Comment 16 keld@keldix.com 2014-06-17 16:54:56 UTC

On Tue, Jun 17, 2014 at 07:48:21AM +0000, fweimer at redhat dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=14641
> 
> --- Comment #15 from Florian Weimer <fweimer at redhat dot com> ---
> (In reply to Philip Withnall from comment #14)
> > (In reply to Florian Weimer from comment #13)
> > > Closing per previous discussion.
> > 
> > As per comments #5 and #6, I think the proper solution is to deprecate
> > name_fmt, since it seems to be fundamentally incorrect to have it in libc,
> > and keeping it un-deprecated just encourages people to use it incorrectly or
> > unsuccessfully.
> 
> Fair enough.  That would extend to LC_NAME, LC_ADDRESS, LC_TELEPHONE, I
> suppose.  What about LC_MEASUREMENT and LC_IDENTIFICATION?
> 
> Related identifiers include _NL_NAME_*, _NL_ADDRESS_*, _NL_TELEPHONE_*, plus
> the _NL_NUM_* identifiers.  Anything else?

Hmm, what is the reason for this? Name layout and the other issues are
cultural conventions that should be adaptable in software.

Best regards
Keld

Comment 17 Rich Felker 2014-06-17 17:08:15 UTC

There are two different arguments for deprecation of name_fmt that can be made based on comments in this pr:

1. Imposing a structure of first name, middle name, last name, etc. on names is itself a cultural convention that's far from universal. Providing a culture-specific way to format these poorly-thought-out name components into a combined string does not solve the problem of making a program compatible with diverse cultural conventions since such storage is already wrongly imposing particular conventions.

2. Locales should deal with the cultural conventions of the user's cultural environment, not the conventions associated with a particular piece of data.

Of these, #1 only applies directly to name_fmt. If similar arguments apply to other items, that may be a good argument for their deprecation, but that would be a separate discussion. Argument #2 on the other hand applies to a much broader class of items, but I think it's also less clear-cut that it's correct. With the existence of uselocale/locale_t and de-facto conventions for locale names, one could argue that it's reasonable to use the locale system for dealing with data records where each record has associated with it a cultural context in which it's to be interpreted. I think this is probably a bad design (for example, many systems omit installation of all locales except the user's, which would affect what data they can process, and having a full locale installation is a lot more costly than just having external data on country-code-specific telephone number formatting, etc.) but I can see where some people would prefer it.
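
The practical concern about missing locales can be illustrated with a small probe. This is only a sketch in Python: `locale_available` is a hypothetical helper, the locale names in `record_locales` are assumed examples, and `LC_CTYPE` stands in for the glibc-specific categories because Python's `locale` module does not expose `LC_NAME`. Note that `setlocale` changes process-global state, so the probe saves and restores the previous setting and should not be run concurrently with other locale-dependent code.

```python
import locale


def locale_available(name: str) -> bool:
    """Return True if setlocale() accepts `name` on this system."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the old setting


# Hypothetical per-record locale tags; which ones exist depends on the system.
record_locales = ["C", "de_DE.UTF-8", "ja_JP.UTF-8"]
usable = [name for name in record_locales if locale_available(name)]
```

On a system with only the user's locale installed, `usable` may contain little more than `"C"`, so records tagged with other locales could not be processed through the locale system at all.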

Comment 18 Philip Withnall 2014-06-19 22:47:13 UTC

(In reply to Florian Weimer from comment #15)
> (In reply to Philip Withnall from comment #14)
> > (In reply to Florian Weimer from comment #13)
> > > Closing per previous discussion.
> > 
> > As per comments #5 and #6, I think the proper solution is to deprecate
> > name_fmt, since it seems to be fundamentally incorrect to have it in libc,
> > and keeping it un-deprecated just encourages people to use it incorrectly or
> > unsuccessfully.
> 
> Fair enough.  That would extend to LC_NAME, LC_ADDRESS, LC_TELEPHONE, I
> suppose.  What about LC_MEASUREMENT and LC_IDENTIFICATION?

LC_MEASUREMENT and LC_IDENTIFICATION are in active use, and seem reasonably well-defined and useful (for example, knowing which temperature units to use in the current locale). LC_TELEPHONE also seems useful, giving international call codes in and out of the current country.

I guess I’d say the cutoff is whether the formatting depends on the origin of the data. For phone numbers it doesn’t (calling codes are internationally defined), but for names and addresses it does.

> Related identifiers include _NL_NAME_*, _NL_ADDRESS_*, _NL_TELEPHONE_*, plus
> the _NL_NUM_* identifiers.  Anything else?

I can’t find anything else.

Comment 19 keld@keldix.com 2014-06-20 05:48:35 UTC

On Tue, Jun 17, 2014 at 05:08:15PM +0000, bugdal at aerifal dot cx wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=14641
> 
> --- Comment #17 from Rich Felker <bugdal at aerifal dot cx> ---
> There are two different arguments for deprecation of name_fmt that can be made
> based on comments in this pr:
> 
> 1. Imposing a structure of first name, middle name, last name, etc. on names is
> itself a cultural convention that's far from universal. Providing a
> culture-specific way to format these poorly-thought-out name components into a
> combined string does not solve the problem of making a program compatible with
> diverse cultural conventions since such storage is already wrongly imposing
> particular conventions.

Well, the LC_NAME category was designed based on research on very many countries and cultures.
Do you have examples where this does not work?

We also have some areas which do not always work culturally, for example 
dates, where we do not cover the Islamic calendar. Era calendars
are not covered either.

> 2. Locales should deal with the cultural conventions of the user's cultural
> environment, not the conventions associated with a particular piece of data.

Well, then we should have a look on who is actually the user.
In the case of names you could regard the receiver of a letter as the final user.

The LC_NAME (and LC_ADDRESS) category addresses a real need for providing
culturally acceptable software. Really many cultures use the Family name,
first name convention, which is offending to me for my name. The world is rather
split in two halves on the order of given name and family name.

The same with the other categories. They can be used to format culturally
correct data for really many cultures. But they are not fully complete
to cover all of the circumstances in the world. I would say that LC_TIME
is worse off than LC_NAME in the percentage of cases in the world it covers.

> Of these, #1 only applies directly to name_fmt. If similar arguments apply to
> other items, that may be a good argument for their deprecation, but that would
> be a separate discussion. Argument #2 on the other hand applies to a much
> broader class of items, but I think it's also less clear-cut that it's correct.
> With the existence of uselocale/locale_t and de-facto conventions for locale
> names, one could argue that it's reasonable to use the locale system for
> dealing with data records where each record has associated with it a cultural
> context in which it's to be interpreted. I think this is probably a bad design
> (for example, many systems omit installation of all locales except the user's,
> which would affect what data they can process, and having a full locale
> installation is a lot more costly than just having external data on
> country-code-specific telephone number formatting, etc.) but I can see where
> some people would prefer it.

The locale data does occupy quite some space. But in these days
disk space is pretty cheap, except for the telephones. If we only install the
locale data itself, and not all the message catalogues, then the extra cost is reasonable.
glibc is not installed normally on android anyway. 

I don't know how many apps are using the LC_NAME category, but I estimate it is not
many. I could see it handy in an address book, or telephone directory. And also in a word
processing environment, for writing letters. I think what we have in LC_NAME is the
best information in the market, and thus it gives glibc an edge.
For the users that use the data it is really good that it is supported. 

Best regards
Keld

Comment 20 Rich Felker 2014-06-20 06:16:42 UTC

On Fri, Jun 20, 2014 at 05:48:35AM +0000, keld at keldix dot com wrote:
> The LC_NAME (and LC_ADDRESS) category addresses a real need for providing
> culturally acceptable software. Really many cultures use the Family name,
> first name convention, which is offending to me for my name. The world is
> rather
> split in two halves on the order of given name and family name.

That's the naive and mistaken view that resulted in the creation of
LC_NAME/name_fmt to begin with. In reality there are a lot more
conventions than just these two, but the prevalence of information
systems that require names to be stored as a first/given name and a
last/family name seems to have made their existence much less visible.
Here are some other cultural conventions:

- Only one name. Attempting to force these into a given/family pattern
  is offensive and results in things like being assigned a "first
  name" of "FNU" and having your given name treated as a family name.

- Multiple names, but all given. Attempting to force these into a
  given/family pattern is almost always offensive because it will
  result in things like "Mr./Ms. Second-name", presenting a name
  backwards as "Second-name First-name", etc.

- Multiple family names. I don't even know how this is typically
  handled.

Wikipedia has some detailed coverage of the issues:

http://en.wikipedia.org/wiki/Family_name
http://en.wikipedia.org/wiki/Mononymous_person

Comment 21 keld@keldix.com 2014-06-20 11:04:18 UTC

On Fri, Jun 20, 2014 at 06:16:42AM +0000, bugdal at aerifal dot cx wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=14641
> 
> --- Comment #20 from Rich Felker <bugdal at aerifal dot cx> ---
> On Fri, Jun 20, 2014 at 05:48:35AM +0000, keld at keldix dot com wrote:
> > The LC_NAME (and LC_ADDRESS) category addresses a real need for providing
> > culturally acceptable software. Really many cultures use the Family name,
> > first name convention, which is offending to me for my name. The world is
> > rather
> > split in two halves on the order of given name and family name.
> 
> That's the naive and mistaken view that resulted in the creation of
> LC_NAME/name_fmt to begin with. In reality there are a lot more
> conventions than just these two, but the prevalence of information
> systems that require names to be stored as a first/given name and a
> last/family name seems to have made their existence much less visible.
> Here are some other cultural conventions:
> 
> - Only one name. Attempting to force these into a given/family pattern
>   is offensive and results in things like being assigned a "first
>   name" of "FNU" and having your given name treated as a family name.
> 
> - Multiple names, but all given. Attempting to force these into a
>   given/family pattern is almost always offensive because it will
>   result in things like "Mr./Ms. Second-name", presenting a name
>   backwards as "Second-name First-name", etc.
> 
> - Multiple family names. I don't even know how this is typically
>   handled.
> 
> Wikipedia has some detailed coverage of the issues:
> 
> http://en.wikipedia.org/wiki/Family_name
> http://en.wikipedia.org/wiki/Mononymous_person

I think all of these examples can be covered with the existing LC_NAME spec.
And they were all known at the time of specification.

Best regards
keld

Comment 22 Philip Withnall 2014-06-20 13:01:22 UTC

(In reply to keld@keldix.com from comment #21)
> I think all of these examples can be covered with the existing LC_NAME spec.
> And they were all known at the time of specification.

Have you got a link to that and the associated discussion? The best I can find is
    https://sourceware.org/git/?p=glibc.git;a=commit;f=locale/langinfo.h;h=4b10dd6c1959577f57850ca427a94fe22b9f3299
which is not very informative.

Comment 23 keld@keldix.com 2014-06-21 18:35:58 UTC

On Fri, Jun 20, 2014 at 01:01:22PM +0000, bugzilla at tecnocode dot co.uk wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=14641
> 
> --- Comment #22 from Philip Withnall <bugzilla at tecnocode dot co.uk> ---
> (In reply to keld@keldix.com from comment #21)
> > I think all of these examples can be covered with the existing LC_NAME spec.
> > And they were all known at the time of specification.
> 
> Have you got a link to that and the associated discussion? The best I can find
> is
>    
> https://sourceware.org/git/?p=glibc.git;a=commit;f=locale/langinfo.h;h=4b10dd6c1959577f57850ca427a94fe22b9f3299
> which is not very informative.

This was discussed in ISO SC22/WG20. The main input paper was:
http://www.cicc.or.jp/english/hyoujyunka/databook/databook.pdf
Some of it was discussed orally in WG20 meetings; I do not think there is
a specific recording of that discussion.

Best regards
Keld

Comment 24 Florian Weimer 2014-06-23 07:44:34 UTC

(In reply to keld@keldix.com from comment #21)
> I think all of these examples can be covered with the existing LC_NAME spec.
> And they were all known at the time of specification.

I think the discrimination of married vs unmarried women in name formatting is now considered obsolete and perhaps even slightly offensive.

In any case, proper name formatting is not something related very strongly to culture anymore, but to individual persons and the relationships among them.  Locales are more or less country-based, so they are a poor way to select name formatting rules.

Even telephone number formatting isn't as straightforward as it may seem.  In Germany, there are three major ways of formatting phone numbers, and it seems that de_DE.UTF-8 uses none of them (it's difficult to tell because the formatting codes are undocumented).

Comment 25 keld@keldix.com 2014-06-23 13:02:24 UTC

On Mon, Jun 23, 2014 at 07:44:34AM +0000, fweimer at redhat dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=14641
> 
> --- Comment #24 from Florian Weimer <fweimer at redhat dot com> ---
> (In reply to keld@keldix.com from comment #21)
> > I think all of these examples can be covered with the existing LC_NAME spec.
> > And they were all known at the time of specification.
> 
> I think the discrimination of married vs unmarried women in name formatting is
> now considered obsolete and perhaps even slightly offensive.

Then you can make them the same (Ms).

> In any case, proper name formatting is not something related very strongly to
> culture anymore, but to individual persons and the relationships among them. 
> Locales are more or less country-based, so they are a poor way to select name
> formatting rules.

The locales are culturally oriented. I cannot speak for every culture, but
proper name formatting still seems relevant in many cultures.
What we have is not perfect, but it can be used to provide more culturally
acceptable results. I think there is room for improvement, and
I invite suggestions. Do you have any?

> Even telephone number formatting isn't as straightforward as it may seem.  In
> Germany, there are three major ways of formatting phone numbers, and it seems
> that de_DE.UTF-8 uses neither of them (it's difficult to tell because the
> formatting codes are undocumented).

The idea was at least to get rid of the USA way, which is not culturally acceptable
in many countries. After that, any of several formats would be more or less OK. In my
country there are also several common formats, plus an official recommendation.

Best regards
keld

Comment 26 Marko Myllynen 2014-06-23 15:42:46 UTC

(In reply to Florian Weimer from comment #24)
> 
> Even telephone number formatting isn't as straightforward as it may seem. 
> In Germany, there are three major ways of formatting phone numbers, and it
> seems that de_DE.UTF-8 uses neither of them (it's difficult to tell because
> the formatting codes are undocumented).

For documentation of the codes please see for example http://www.open-std.org/JTC1/sc22/WG20/docs/n972-14652ft.pdf or http://man7.org/linux/man-pages/man5/locale.5.html.
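
As a rough illustration of how those codes work for telephone numbers, here is a minimal Python sketch of expanding a tel_int_fmt-style string. It assumes the descriptor meanings given in locale(5) (%c country code, %a area code without nationwide prefix, %l local number, %e extension, %t a space only if the previous descriptor was non-empty). The function name format_tel, the keyword names, and both format strings are hypothetical; they are not the actual de_DE data.

```python
import re

# Descriptor letters mapped to hypothetical keyword names for this sketch.
TEL_FIELDS = {
    "c": "country",    # country code
    "a": "area",       # area code without nationwide prefix
    "l": "local",      # local number within the area code
    "e": "extension",  # extension to the local number
}

def format_tel(fmt: str, **parts: str) -> str:
    """Expand a small subset of tel_int_fmt descriptors, left to right."""
    last = ""  # value produced by the most recent non-%t descriptor

    def repl(match: re.Match) -> str:
        nonlocal last
        code = match.group(1)
        if code == "t":
            # %t: emit a space only if the previous descriptor was non-empty
            return " " if last else ""
        # Unknown descriptors and missing parts expand to the empty string.
        last = parts.get(TEL_FIELDS.get(code, ""), "")
        return last

    return re.sub(r"%(.)", repl, fmt)

# Hypothetical format strings and made-up number parts.
print(format_tel("+%c %a %l", country="49", area="30", local="1234567"))
print(format_tel("+%c %a%t%l", country="49", local="1234567"))
```

With these made-up inputs the first call prints "+49 30 1234567". The second shows %t suppressing the separator when the area code is missing, giving "+49 1234567". A single format string per locale still cannot express several competing national conventions, which is the limitation raised in comment 24.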

Comment 27 keld@keldix.com 2014-06-23 21:07:31 UTC

On Mon, Jun 23, 2014 at 01:02:24PM +0000, keld at keldix dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=14641
> 
> --- Comment #25 from keld at keldix dot com <keld at keldix dot com> ---
> On Mon, Jun 23, 2014 at 07:44:34AM +0000, fweimer at redhat dot com wrote:
> > https://sourceware.org/bugzilla/show_bug.cgi?id=14641
> > 
> > --- Comment #24 from Florian Weimer <fweimer at redhat dot com> ---
> > (In reply to keld@keldix.com from comment #21)
> > > I think all of these examples can be covered with the existing LC_NAME spec.
> > > And they were all known at the time of specification.
> > 
> > I think the discrimination of married vs unmarried women in name formatting is
> > now considered obsolete and perhaps even slightly offensive.
> 
> Then you can make them the same (Ms).

One could actually introduce a new keyword for women unmarried+married.
This is a convention found in many cultures.

Best regards
keld

Comment 28 Rich Felker 2014-06-23 21:19:57 UTC

> One could actually introduce a new keyword for women unmarried+married.
> This is a convention found in many cultures.

This would be a very bad change from my perspective. The entire aim of the locale system should be avoiding offending users by presenting information in a way that's culturally inappropriate. While in many cultures there is such a historical distinction in titles, it's generally not necessary to use such titles at all, and there will be a segment of members of the given culture who are offended by it, consider it backwards, misogynist, etc. like Florian mentioned. The locale system should not be reinforcing or giving preference to conservative elements of the cultures it's modelling. It should be neutral and acceptable to as diverse a group of people within the culture as possible.

On a related issue, even storing people's gender or sex in your data is a bad idea unless it's absolutely essential. What do you do when the person's gender is ambiguous (particularly a problem in information systems where an employee, rather than the person being identified, enters their information into the system), or when the gender on their legal documents does not match the gender they identify as? Many systems nowadays seem to ask users to choose their title rather than asking them for gender, which seems like a thinly-veiled way of asking for gender, but even that has problems. For example you risk non-native speakers of the language not understanding what title means or what the choices are, then getting offended later when they're called by a gender-inappropriate title they (accidentally) selected.

Anyway perhaps this is all tangential, but my point is that the locale system should be deprecating all of these things rather than reinforcing them.

Comment 29 keld@keldix.com 2014-06-23 22:12:53 UTC

On Mon, Jun 23, 2014 at 09:19:57PM +0000, bugdal at aerifal dot cx wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=14641
> 
> --- Comment #28 from Rich Felker <bugdal at aerifal dot cx> ---
> > One could actually introduce a new keyword for women unmarried+married.
> > This is a convention found in many cultures.
> 
> This would be a very bad change from my perspective. The entire aim of the
> locale system should be avoiding offending users by presenting information in a
> way that's culturally inappropriate. While in many cultures there is such a
> historical distinction in titles, it's generally not necessary to use such
> titles at all, and there will be a segment of members of the given culture who
> are offended by it, consider it backwards, misogynist, etc. like Florian
> mentioned. The locale system should not be reinforcing or giving preference to
> conservative elements of the cultures it's modelling. It should be neutral and
> acceptable to as diverse a group of people within the culture as possible.
> 
> On a related issue, even storing people's gender or sex in your data is a bad
> idea unless it's absolutely essential. What do you do when the person's gender
> is ambiguous (particularly a problem in information systems where an employee,
> rather than the person being identified, enters their information into the
> system), or when the gender on their legal documents does not match the gender
> they identify as? Many systems nowadays seem to ask users to choose their title
> rather than asking them for gender, which seems like a thinly-veiled way of
> asking for gender, but even that has problems. For example you risk non-native
> speakers of the language not understanding what title means or what the choices
> are, then getting offended later when they're called by a gender-inappropriate
> title they (accidentally) selected.
> 
> Anyway perhaps this is all tangential, but my point is that the locale system
> should be deprecating all of these things rather than reinforcing them.

Well, my take is that we are not political with the locales. We just record
the cultural conventions that are in use. Then we leave it to implementers
and users to do what they want to do, in a way that is culturally acceptable.

I myself do not normally use titles in my work, for example in my ISO
work, but many of my colleagues do. I then provide for their use, and
allow them to do what is natural to them. I also recognize that in a culture,
there may be several levels of politeness. An invitation to a formal
anniversary, or a legal letter, may call for a different level of politeness than an SMS.

In some way the locales are conservative, preserving the different cultures
of the world in the digital society. In some other way the locales are
liberating, allowing a culturally acceptable appearance of applications in
most of the cultures of the world. At least the locales should enable us to
get rid of English-oriented conventions, which are not acceptable in many
cultures of the world.

Best regards
Keld

Comment 30 Florian Weimer 2014-06-24 07:37:57 UTC

(In reply to keld@keldix.com from comment #29)
> Well, my take is that we are not political with the locales. We just record
> what cultural things that are in use. Then we leave it to implementers
> and users to do what they want to do, in a way that is culturally acceptable.

It's impossible to have apolitical locales.  We don't have one for Palestine as far as I can see, and our data clearly takes sides in the disputes about Macedonia and Taiwan (which is probably unavoidable one way or the other).

The lack of a way to encode royal and noble ranks in names could be considered a political statement as well.

We should move this to libc-alpha if there is anything more to discuss, but it seems unlikely we will reach agreement, so there is probably no point.

Comment 31 Mike Frysinger 2016-02-19 07:02:22 UTC

stripping down the locale data files sounds great to me

should we go as far as making the default return "" ?  nl_langinfo indicates that is the "not valid" return value which is what we want to indicate.

Comment 32 Marko Myllynen 2016-02-19 15:40:49 UTC

Is the discussion now about deprecating the name_fmt keyword only or the whole LC_NAME category?

In any case, if we deprecate name_fmt then the whole LC_NAME becomes useless for certain locales, for example fi_FI:

LC_NAME
name_fmt    "<U0025><U0064><U0025><U0074><U0025><U0067><U0025><U0074>/
<U0025><U006D><U0025><U0074><U0025><U0066>"
% Finnish equivalents for Mr/Mrs/Miss/Ms are herra/rouva/rouva/neiti
% but they are practically never used, thus we don't define them here.
END LC_NAME
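
For readers unfamiliar with the <UXXXX> notation, the fi_FI value above can be decoded and expanded roughly as follows. This is a minimal Python sketch, not glibc code. It assumes that the trailing "/" is the locale file's escape character marking a line continuation, and that %t behaves as described in locale(5) (a space only if the preceding descriptor was non-empty). The helpers decode_symbolic and expand_name_fmt are hypothetical, only %d, %g, %m, %f and %t are handled, and the sample names are made up.

```python
import re

def decode_symbolic(text: str) -> str:
    """Turn <UXXXX> symbolic names into characters, joining '/'-continued lines."""
    joined = text.replace("/\n", "")
    return re.sub(r"<U([0-9A-Fa-f]{4,8})>",
                  lambda m: chr(int(m.group(1), 16)), joined)

def expand_name_fmt(fmt: str, fields: dict) -> str:
    """Expand a subset of name_fmt descriptors.

    Keys of `fields` are descriptor letters: d = salutation, g = first given
    name, m = additional given names, f = family name. Missing keys and
    unhandled descriptors expand to the empty string.
    """
    out = []
    last = ""  # value produced by the most recent non-%t descriptor
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            code = fmt[i + 1]
            if code == "t":
                # %t: a space only if the preceding descriptor was non-empty
                if last:
                    out.append(" ")
            else:
                last = fields.get(code, "")
                out.append(last)
            i += 2
        else:
            out.append(fmt[i])  # literal character, copied unchanged
            i += 1
    return "".join(out)

raw = ("<U0025><U0064><U0025><U0074><U0025><U0067><U0025><U0074>/\n"
       "<U0025><U006D><U0025><U0074><U0025><U0066>")
fmt = decode_symbolic(raw)
print(fmt)
print(repr(expand_name_fmt(fmt, {"g": "Aino", "f": "Virtanen"})))
print(repr(expand_name_fmt(fmt, {"g": "Sukarno"})))
```

The decoded format is "%d%t%g%t%m%t%f". With a given and a family name the sketch yields 'Aino Virtanen'. With a single name it yields 'Sukarno ' including a trailing space, because under this reading %t only looks at the preceding field, so a caller would still have to trim the result.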

Thanks.

Comment 33 Mike Frysinger 2016-02-19 16:15:02 UTC

(In reply to Marko Myllynen from comment #32)

my takeaway from this bug is that none of the fields should be used, plus it's pretty inconsistent whether locales even define them, or whether culturally the locale/language even lines up well with the narrowly defined fields we've added (either having fewer or none, or having more).

Comment 34 keld@keldix.com 2016-02-19 18:30:25 UTC

As you may expect, I would like to retain the category.
I see this item used regularly, not in programs
but in web forms and in normal text.

I am travelling now, and I am asked for this a number of times,
e.g. for the flights and at the hotel.
So IMHO it is a category that is needed, and it is good
that it is supported in glibc, and that we have data
for it in many places.

Best regards
Keld

On Fri, Feb 19, 2016 at 04:15:02PM +0000, vapier at gentoo dot org wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=14641
> 
> --- Comment #33 from Mike Frysinger <vapier at gentoo dot org> ---
> (In reply to Marko Myllynen from comment #32)
> 
> my take away from this bug is that none of the fields should be used, plus it's
> pretty inconsistent whether locales even define them, or that culturally the
> locale/language even lines up well with the narrowly defined fields we've added
> (either having fewer or none, or having more).
> 
> -- 
> You are receiving this mail because:
> You are on the CC list for the bug.

Comment 35 Mike Frysinger 2016-02-19 19:00:45 UTC

sorry, but i don't find that to be a compelling use case.  locale categories/data in glibc should pass a high bar, especially when they are non-standard.  this data does not pass that bar imo since the use case it attempts to satisfy is not portable across locales -- which is the entire point of locales in the first place.

Comment 36 keld@keldix.com 2016-02-19 23:26:37 UTC

On Fri, Feb 19, 2016 at 01:30:45PM +0000, vapier at gentoo dot org wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=14641
> 
> --- Comment #35 from Mike Frysinger <vapier at gentoo dot org> ---
> sorry, but i don't find that to be a compelling use case.  locale
> categories/data in glibc should pass a high bar, especially when they are
> non-standard.  this data does not pass that bar imo since the use case it
> attempts to satisfy is not portable across locales -- which is the entire point
> of locales in the first place.

Well, LC_NAME is a standard ISO 30112 category.

Best regards
Keld

Comment 37 Mike Frysinger 2016-02-20 04:50:29 UTC

(In reply to keld@keldix.com from comment #36)

i was looking at POSIX:
  http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html

ISO 30112 does not appear to be public, and i'm not about to waste $200 on private standards.  `man 5 locale` is about the only public reference i can find.

i don't see this limited bit of info being useful beyond toy programs, and even those are limited to specific locales.  if people are serious about localizing, they're better off with a real library.  having these fields in glibc misleads them into thinking they're more useful than they actually are.  better to cut them off sooner and drive them to fuller solutions.

Comment 38 keld@keldix.com 2016-02-22 08:10:38 UTC

On Sat, Feb 20, 2016 at 04:50:29AM +0000, vapier at gentoo dot org wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=14641
> 
> --- Comment #37 from Mike Frysinger <vapier at gentoo dot org> ---
> (In reply to keld@keldix.com from comment #36)
> 
> i was looking at POSIX:
>   http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html
> 
> ISO 30112 does not appear to be public, and i'm not about to waste $200 on
> private standards.  `man 5 locale` is about the only public reference i can
> find.

Well, the ISO C standard is not public either, and we do try to follow it.

There are copies of working drafts of ISO 14652 and ISO 30112 available:
http://www.open-std.org/JTC1/SC22/WG20/docs/n972-14652ft.pdf
http://www.open-std.org/JTC1/SC35/WG5/docs/30112d10.pdf
Both specs have explicit reference to glibc, and some specific cooperation with glibc.

> i don't see this limited bit of info being useful beyond toy programs, and even
> those are limited to specific locales.  if people are serious about localizing,
> they're better off with a real library.  having these fields in glibc misleads
> them into thinking they're more useful than they actually are.  better to cut
> them off sooner and drive them to fuller solutions.

As I wrote, this information is what is commonly requested, e.g. for flights and hotels.

What improvements do you have in mind?

Best regards
Keld

Comment 39 Mike Frysinger 2016-02-22 09:05:34 UTC

(In reply to keld@keldix.com from comment #38)

the standard itself already admits defeat:
   The specification below should be regarded as a starting point for this problem.

i doubt people writing booking systems are relying on the locale data provided by glibc.  instead they're using databases they found online or are provided by another company.

i don't think this level of detail is appropriate anymore in any standards body or C library.  these processes are glacial at best.  having focused libs that are actively developed/updated is the common process nowadays.  the CLDR as an example does two full releases a year (with point fixes and betas on top of that).  by comparison, ISO 30112 has like one release every 5+ years ?

Comment 40 Pander 2018-12-20 08:38:37 UTC

Also, more and more countries officially allow a third gender (https://en.wikipedia.org/wiki/Third_gender), and this influences salutations such as https://en.wikipedia.org/wiki/Mx_(title). At the same time, specific salutations that indicate whether a woman is married are becoming archaic (https://en.wikipedia.org/wiki/Salutation#Dutch).

So it would be better to migrate the content of LC_NAME to https://salsa.debian.org/iso-codes-team/iso-codes and, after that has gone live in e.g. Debian, Ubuntu, Fedora, etc., remove it from glibc.

Comment 41 keld@keldix.com 2019-01-02 11:59:42 UTC

If you don't believe in the work we do here, then go join CLDR - if you are allowed to participate!!
And let us do our work.

Your remark on this being a beginning indicates that you do not grasp the nature of i18n:
there are always things to do.

keld


On Mon, Feb 22, 2016 at 09:05:34AM +0000, vapier at gentoo dot org wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=14641
> 
> --- Comment #39 from Mike Frysinger <vapier at gentoo dot org> ---
> (In reply to keld@keldix.com from comment #38)
> 
> the standard itself already admits defeat:
>    The specification below should be regarded as a starting point for this
> problem.
> 
> i doubt people writing booking systems are relying on the locale data provided
> by glibc.  instead they're using databases they found online or are provided by
> another company.
> 
> i don't think this level of detail is appropriate anymore in any standards body
> or C library.  these processes are glacial at best.  having focused libs that
> are actively developed/updated is the common process nowadays.  the CLDR as an
> example does two full releases a year (with point fixes and betas on top of
> that).  by comparison, ISO 3122 has like one release every 5+ years ?