This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Inline function definitions for isdigit and isxdigit?


On 09/15/2016 08:02 PM, Joseph Myers wrote:
On Thu, 15 Sep 2016, Florian Weimer wrote:

Can we provide inline definitions for isdigit and isxdigit?

POSIX says (for the digit character class):

“
In a locale definition file, only the digits <zero>, <one>, <two>, <three>,
<four>, <five>, <six>, <seven>, <eight>, and <nine> shall be specified, and in
contiguous ascending sequence by numerical value. The digits <zero> to <nine>
of the portable character set are automatically included in this class.
”

This means it's fixed to '0' .. '9' for our purposes (our locales must be
ASCII-transparent at least as far as the digits are concerned).

localedef should then disallow charmap files that aren't ASCII-transparent
for digits, if it doesn't already.

Right. There is a warning already (“character map `%s' is not ASCII compatible, locale not ISO C compliant”), but the triggering condition is not quite clear to me.

I'm basing my assertion above on the fact that we want isdigit ('0'), ..., isdigit ('9') to all be true, as specified in C99 and C11. POSIX allows only ten decimal digits, and it does not appear to be possible for a symbolic name such as <zero> to stand for more than one encoded character sequence, so the set of decimal digits is clearly fixed.
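If the set of decimal digits really is fixed to '0' .. '9', an inline definition needs no locale table lookup. The following is only a minimal sketch of that idea, not glibc code: the names sketch_isdigit and sketch_isdigit_self_check are hypothetical, and the sketch assumes the caller passes EOF or a value representable as unsigned char, as the <ctype.h> functions require.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical name, for illustration only; not a glibc interface.
   C guarantees that '0' .. '9' are contiguous and ascending, so a
   single unsigned comparison covers the whole range.  EOF and other
   negative values wrap around to large unsigned values and are
   rejected.  */
static inline int
sketch_isdigit (int c)
{
  return (unsigned int) c - '0' <= 9u;
}

/* Compare the sketch against an explicit range test for EOF-like -1
   and every unsigned char value.  */
static inline void
sketch_isdigit_self_check (void)
{
  for (int c = -1; c <= UCHAR_MAX; c++)
    assert (sketch_isdigit (c) == (c >= '0' && c <= '9'));
}
```

Whether a real inline definition should be a macro, an inline function, or a compiler builtin is a separate design question that this sketch does not try to answer.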

For xdigit, one can have more than two sequences of 'A' .. 'F' letters:

“
In a locale definition file, only the characters defined for the class digit
shall be specified, in contiguous ascending sequence by numerical value,
followed by one or more sets of six characters representing the hexadecimal
digits 10 to 15 inclusive, with each set in ascending order (for example, <A>,
<B>, <C>, <D>, <E>, <F>, <a>, <b>, <c>, <d>, <e>, <f>).
”

But I wonder how useful this is in practice.  One might be tempted to define

It seems perfectly valid in accordance with POSIX, and we support users
defining locales, so can't restrict functions to what's valid only for the
locales shipped with glibc (whereas support for alternative charmaps is
implementation-defined, so we can limit what we allow in charmap files).

For isxdigit, C99 and C11 state definitively that only '0' … '9', 'a' … 'f' and 'A' … 'F' are hexadecimal digits. POSIX, however, allows more symbolic names in the xdigit character class. Making such a locale C99/C11 compliant still requires much hand-waving, because the standard lists only 22 hexadecimal digits. One could perhaps argue that the additional digits introduced by a locale are alternative representations of the six letters.
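For comparison, this is roughly what a locale-independent isxdigit restricted to the 22 characters from C99/C11 could look like. It is a sketch under stated assumptions, not a proposal for the actual implementation: the names sketch_isxdigit and sketch_count_xdigits are hypothetical, and the range tests assume that 'a' .. 'f' and 'A' .. 'F' are each contiguous in the execution character set (true for ASCII, but not guaranteed by the C standard). A locale that adds further sets of six xdigit characters, as POSIX permits, would not be covered.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical name, for illustration only; not a glibc interface.
   Accepts exactly the hexadecimal digits listed by C99/C11.
   Assumption: the letters a-f and A-F are contiguous, as in ASCII.  */
static inline int
sketch_isxdigit (int c)
{
  return (c >= '0' && c <= '9')
         || (c >= 'a' && c <= 'f')
         || (c >= 'A' && c <= 'F');
}

/* Count how many values in the <ctype.h> argument domain (EOF-like -1
   and every unsigned char value) the sketch accepts.  */
static inline int
sketch_count_xdigits (void)
{
  int count = 0;
  for (int c = -1; c <= UCHAR_MAX; c++)
    if (sketch_isxdigit (c))
      count++;
  return count;
}
```

On an ASCII-based system the count is 22 (ten decimal digits plus two sets of six letters), which matches the number of hexadecimal digits the C standard lists.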

We do have users of locales which are not ISO-C-compliant because they are not completely ASCII-transparent. Such locales are rather iffy from a security perspective because string escaping becomes ambiguous if multi-byte character sequences can contain bytes which need escaping. But I'm not sure if we have to extend this possibility to isdigit and isxdigit.
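The escaping problem can be made concrete with a small sketch. The charset is an assumption chosen for illustration (the message does not name one): in Shift_JIS, the character 表 is encoded as the bytes 0x95 0x5C, and 0x5C is also the byte an ASCII-minded escaper treats as a backslash. The function names naive_escape and naive_escape_demo are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.  Doubles every 0x5C
   byte without regard to multi-byte character boundaries.  DST must
   have room for 2 * N bytes.  Returns the number of bytes written.  */
static inline size_t
naive_escape (const unsigned char *src, size_t n, unsigned char *dst)
{
  size_t out = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (src[i] == 0x5c)
        dst[out++] = 0x5c;      /* Insert the escaping backslash.  */
      dst[out++] = src[i];
    }
  return out;
}

/* Escape the two-byte Shift_JIS sequence 0x95 0x5C.  The trail byte
   is mistaken for a backslash, so the output is 0x95 0x5C 0x5C: a
   charset-aware reader sees the original character followed by a
   stray 0x5C byte that was never in the input.  */
static inline void
naive_escape_demo (void)
{
  const unsigned char in[] = { 0x95, 0x5c };
  unsigned char out[4];
  size_t len = naive_escape (in, sizeof in, out);
  assert (len == 3);
  assert (out[0] == 0x95 && out[1] == 0x5c && out[2] == 0x5c);
}
```

A byte-wise escaper and a charset-aware parser thus disagree about where the escape characters are, which is the ambiguity described above.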

Florian

