This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Re: glibc: standard date/time format patch [drifting OT]


Eduardo Pérez Ureta <eperez@it.uc3m.es> writes:

> On 2002-08-17 09:10:24 -0400, Owen Taylor wrote:
> > Russ Allbery <rra@stanford.edu> writes:
> > > Eduardo Pérez Ureta <eperez@it.uc3m.es> writes:
> > > > Sure, POSIX says so. But POSIX should follow International Standards
> > > > instead of American Standards.
> > > 
> > > > I don't want my system following Standards that only apply to America.
> > > 
> > > > POSIX should be corrected about that.
> > > 
> > > The appropriate time to ask vendors to change is after POSIX has been
> > > changed, I think.  It's better to comply with POSIX than to diverge from
> > > it even if the divergence seems to make more logical sense.
> > > 
> > > When you take up this issue with the POSIX working group, you'll discover
> > > that they had good reasons for standardizing the date format that they
> > > did, and also have some ideas about how to transition to more standard
> > > international dates (basically by having users use locales more than they
> > > do now).
> > 
> > And there are, in fact, numerous other ways that the C locale 
> > isn't fully functional; the two most obvious being:
> > 
> >  - The character set is ASCII. (So, many programs won't be
> >    able to display 'Eduardo Pérez Ureta'! :-)
> 
> You are right. But, as the main pango programmer, you know that
> there's a confusion between the terminal charset and the string encoding
> charset. The program has to know in what charset the characters come and
> in what charset it should output them. If you have the fonts,
> why can't you see accented Latin characters or Japanese text in the C
> locale on an xterm?

I guess what you are saying is that the terminal encoding can be
different from the LC_CTYPE that the *terminal* process is running
under.

(Please don't think this is a question of fonts; it so happens that
you can pick a single byte encoding for xterm by selecting the 
right XLFD, but this is completely an implementation detail and
makes no sense with any less brain-dead font system.)

Yes, this is occasionally useful ... the most common case is when you
telnet/ssh from one machine to a different machine that is running
with a different LC_CTYPE.

So, we can have:

 Encoding for the terminal display != LC_CTYPE of terminal process

But that doesn't mean we can have:

 Encoding for the terminal display != LC_CTYPE of application

If the application has some text to display: say, 'Eduardo Pérez Ureta',
then it has to know:

 A) What encoding the text is in.
 B) What encoding it needs to be converted to for terminal display.

A) is an application detail. For B), the application needs to look
at LC_CTYPE. In the 'C' locale, the answer for this is 'ASCII',
so the application simply has no way of displaying the 'é'.

>  - strcol() uses a nonsensical ordering.

strcoll(3) [ two l's, sorry for the typo ] is supposed to compare two
strings in a 'linguistically' (*) meaningful way. For the C locale,
strcoll() is identical to strcmp(), which gives an ordering which is not
remotely close to the right ordering for *any* locality.

Regards,
                                        Owen


(*) Not really the right term, what I mean is 'according to the
    conventions of lexicographers, telephone book editors, etc,
    in a particular locality.'

