This is the mail archive of the libc-hacker@sourceware.cygnus.com mailing list for the glibc project.

Note that libc-hacker is a closed list. You may look at the archives of this list, but subscription and posting are not open.



fnmatch change


I just checked in a change to fnmatch which changes its behaviour with
respect to ranges.

Currently you are in for a surprise if you do something like

	rm [a-c]*

if the locale is not C.  In some locales this will also remove the
files beginning with uppercase characters.
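
To make the case concrete, here is a minimal sketch of the same
pattern handed to fnmatch() directly.  The helper name
matches_lower_range is made up for illustration; only fnmatch() and
FNM_NOMATCH come from <fnmatch.h>.

```c
#include <fnmatch.h>

/* Hypothetical helper: nonzero if NAME is matched by the shell
   pattern "[a-c]*" under the current locale.  fnmatch() returns 0
   on a match and FNM_NOMATCH otherwise.  */
int matches_lower_range(const char *name)
{
    return fnmatch("[a-c]*", name, 0) == 0;
}
```

A program that never calls setlocale() runs in the C locale, where
"apple" matches and "Apple" does not.  What happens after
setlocale(LC_ALL, "") depends on the locale and the libc version.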

The problem was that strcoll() was used.  This seemed correct since
the standard says that the collation sequence order decides what falls
in a range.  But the collation sequence order is not the same thing as
the collation order.
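
As a rough sketch of what a strcoll()-based range test looks like
(this is not the actual glibc code, and the name in_range_strcoll is
made up for illustration):

```c
#include <string.h>

/* Hypothetical helper: nonzero if C lies between LO and HI when
   single characters are compared with strcoll(), i.e. by the
   collation order of the current LC_COLLATE locale.  */
int in_range_strcoll(char lo, char hi, char c)
{
    char l[2] = { lo, '\0' };
    char h[2] = { hi, '\0' };
    char s[2] = { c, '\0' };

    return strcoll(l, s) <= 0 && strcoll(s, h) <= 0;
}
```

In the C locale strcoll() compares like strcmp(), so 'B' is outside
the range a-c.  In a locale whose collation rules sort 'B' between
'a' and 'c' the same call can return nonzero, which is the surprise
described above.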

I talked with the original designer of the POSIX i18n interfaces two
weeks ago and he explained it.  They realized at that time that the
collation order is not suitable.  Therefore they are using collation
sequence order.  The problem is that the standard never defines it.

From talking with him I learned that the collation sequence order is
the order in which the collation definitions appear in the locale
source file.  It's a nice way out and I have implemented this now.

One problem remains: the locale definitions must now be corrected.
Currently the definitions look like this:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<a>	<A>;<NONE>;<SMALL>;IGNORE
<A>	<A>;<NONE>;<CAPITAL>;IGNORE
<-a>	<A>;<NONE>;<-a>;IGNORE
<a'>	<A>;<ACUTE>;<SMALL>;IGNORE
<A'>	<A>;<ACUTE>;<CAPITAL>;IGNORE
<a!>	<A>;<GRAVE>;<SMALL>;IGNORE
<A!>	<A>;<GRAVE>;<CAPITAL>;IGNORE
<a!!>	<A>;<DOUBLE-GRAVE>;<SMALL>;IGNORE
<A!!>	<A>;<DOUBLE-GRAVE>;<CAPITAL>;IGNORE
<a(>	<A>;<BREVE>;<SMALL>;IGNORE
<A(>	<A>;<BREVE>;<CAPITAL>;IGNORE
<a('>	<A>;<BREVE+ACUTE>;<SMALL>;IGNORE
<A('>	<A>;<BREVE+ACUTE>;<CAPITAL>;IGNORE
<a(!>	<A>;<BREVE+GRAVE>;<SMALL>;IGNORE
<A(!>	<A>;<BREVE+GRAVE>;<CAPITAL>;IGNORE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

What has to happen is that upper- and lower-case characters must be
defined in separate blocks.  This has no effect on the collation
order, but it ensures that [a-b] does not match A or B since the lines
with A or B in the locale source file are not between the lines with
the definitions for a and b.
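
The effect of the two layouts can be sketched with a toy model in
which each string lists characters in the order their definitions
appear in a locale source file.  The tables and the helper in_range
are made up for illustration and are not glibc internals.

```c
#include <string.h>

/* Toy definition orders: interleaved like the old locale files,
   and with lower and upper case in separate blocks.  */
static const char interleaved[] = "aAbBcC";
static const char separated[]   = "abcABC";

/* Hypothetical helper: nonzero if C falls in [LO-HI] when ranges
   follow the position of each character in ORDER.  */
int in_range(const char *order, char lo, char hi, char c)
{
    const char *l, *h, *s;

    /* strchr() would also find the terminating NUL, so reject it.  */
    if (lo == '\0' || hi == '\0' || c == '\0')
        return 0;

    l = strchr(order, lo);
    h = strchr(order, hi);
    s = strchr(order, c);
    return l != NULL && h != NULL && s != NULL && l <= s && s <= h;
}
```

With the interleaved table, in_range(interleaved, 'a', 'b', 'A') is
nonzero because A is defined between a and b.  With the separated
table neither A nor B falls inside [a-b].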

I'm a bit reluctant to spend much time on the old locale descriptions.
Instead, in a few moments I'll check in an ISO 14651 collation
description which already takes this into account.  With the rewrite
of the locale data to the new format we can also switch over to using
this data.

-- 
---------------.      drepper at gnu.org  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------
