This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: strxfrm and man
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Ulrich Drepper <drepper at redhat dot com>, Nikita Shulga <shulga at jscc dot ru>
- Cc: libc-alpha at sourceware dot org
- Date: Wed, 8 Nov 2006 23:03:18 +0100
- Subject: Re: strxfrm and man
- References: <58606DA8-DE21-4E12-A71B-7410C0044D20@jscc.ru>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Wed, Nov 08, 2006 at 11:36:01AM +0300, Nikita Shulga wrote:
> The strxfrm man page says that the return value is the number of bytes
> required to store the transformed string, excluding the terminating
> '\0' character. But this is not always true: if the third argument is
> less than the number of bytes required to store the result and the
> locale is, for example, "en_US.utf8", then the return value is the
> length of the transformed string including the terminating character.
> If the locale is C or POSIX, it behaves as described in the man page.
> For example, strxfrm(NULL,"a",0) != strxfrm(buf,"a",10) in the
> "en_US.utf8" locale, but the return values are equal if the locale is "C".
>
> Do you think this is OK, or should a bug report be filed in the glibc bugzilla?
This is caused by an optimization in glibc's strxfrm that removes
a trailing \1, but only when the third argument is big enough to hold
the whole result.
The current glibc behavior looks like a bug to me: neither the ISO C99 nor
the POSIX wording seems to allow returning different values depending on
the third argument passed (as long as the source string and the locale
are the same).
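The inconsistency can be checked with a small helper along these lines.
This is only a sketch: the function name strxfrm_return_is_consistent and
the 64-byte buffer are assumptions made for illustration, and the
"en_US.utf8" locale may not be installed on a given system.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper, not part of glibc.
   Returns 1 if strxfrm reports the same length with a zero-sized and
   with a sufficiently large destination, 0 if the two results differ,
   and -1 if the locale is unavailable or the buffer is too small.
   Note: it changes LC_COLLATE and does not restore it.  */
int strxfrm_return_is_consistent(const char *locale_name, const char *s)
{
    char buf[64];          /* assumed big enough for short test strings */
    size_t len_query;
    size_t len_store;

    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;

    /* A null destination is permitted when the third argument is 0.  */
    len_query = strxfrm(NULL, s, 0);
    len_store = strxfrm(buf, s, sizeof buf);

    if (len_store >= sizeof buf)
        return -1;         /* result did not fit; comparison is moot */

    return len_query == len_store;
}
```

Calling strxfrm_return_is_consistent("C", "a") should return 1; with
"en_US.utf8" an unpatched glibc is expected to return 0, per the report
quoted above.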
The following patch should fix it by checking the length of what the
last rule added, instead of checking whether the last char before
'\0' is '\1'.
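To illustrate the difference between the two checks, here is a toy model.
It is not the glibc code: emit_passes, model_old and model_new are
hypothetical names, and the two "weights" 'A' and 'B' stand in for
whatever the collation tables would produce. Only the final trimming
conditions are taken from the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the pass loop: pass 0 emits two weight
   bytes followed by a \1 separator; the last pass (a `position' rule
   with no non-ignored characters) emits only the terminating \0.
   Bytes are stored only while they fit in n, but always counted.  */
static size_t emit_passes(char *dest, size_t n, size_t *last_needed)
{
    static const char pass0[] = { 'A', 'B' };
    size_t needed = 0;
    size_t i;

    *last_needed = needed;
    for (i = 0; i < sizeof pass0; ++i) {
        if (needed < n)
            dest[needed] = pass0[i];
        ++needed;
    }
    if (needed < n)
        dest[needed] = '\1';
    ++needed;

    *last_needed = needed;      /* value at the start of the last pass */
    if (needed < n)
        dest[needed] = '\0';
    ++needed;

    return needed;              /* count includes the terminator */
}

/* Old check: looks at dest, so it can only trim when needed <= n.  */
size_t model_old(char *dest, size_t n)
{
    size_t last_needed;
    size_t needed = emit_passes(dest, n, &last_needed);

    if (needed <= n && needed > 2 && dest[needed - 2] == '\1') {
        --needed;
        dest[needed - 1] = '\0';
    }
    return needed - 1;
}

/* New check: compares counters, so the result does not depend on n.  */
size_t model_new(char *dest, size_t n)
{
    size_t last_needed;
    size_t needed = emit_passes(dest, n, &last_needed);

    if (needed > 2 && needed == last_needed + 1) {
        if (--needed < n)
            dest[needed - 1] = '\0';
    }
    return needed - 1;
}
```

In this model, model_old(NULL, 0) yields 3 while model_old(buf, 10)
yields 2; model_new yields 2 in both cases.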
2006-11-08 Jakub Jelinek <jakub@redhat.com>
* string/strxfrm_l.c (STRXFRM): Do the trailing \1 removal
optimization even if needed > n.
--- libc/string/strxfrm_l.c.jj 2005-10-15 22:49:18.000000000 +0200
+++ libc/string/strxfrm_l.c 2006-11-08 22:18:38.000000000 +0100
@@ -1,4 +1,5 @@
-/* Copyright (C) 1995,96,97,2002, 2004, 2005 Free Software Foundation, Inc.
+/* Copyright (C) 1995, 1996, 1997, 2002, 2004, 2005, 2006
+ Free Software Foundation, Inc.
This file is part of the GNU C Library.
Written by Ulrich Drepper <drepper@gnu.org>, 1995.
@@ -95,7 +96,7 @@ STRXFRM (STRING_TYPE *dest, const STRING
const USTRING_TYPE *extra;
const int32_t *indirect;
uint_fast32_t pass;
- size_t needed;
+ size_t needed, last_needed;
const USTRING_TYPE *usrc;
size_t srclen = STRLEN (src);
int32_t *idxarr;
@@ -197,6 +198,7 @@ STRXFRM (STRING_TYPE *dest, const STRING
this is true for all of them. */
int position = rule & sort_position;
+ last_needed = needed;
if (position == 0)
{
for (idxcnt = 0; idxcnt < idxmax; ++idxcnt)
@@ -426,11 +428,11 @@ STRXFRM (STRING_TYPE *dest, const STRING
a `position' rule at the end and if no non-ignored character
is found the last \1 byte is immediately followed by a \0 byte
signalling this. We can avoid the \1 byte(s). */
- if (needed <= n && needed > 2 && dest[needed - 2] == L('\1'))
+ if (needed > 2 && needed == last_needed + 1)
{
/* Remove the \1 byte. */
- --needed;
- dest[needed - 1] = L('\0');
+ if (--needed < n)
+ dest[needed - 1] = L('\0');
}
/* Free the memory if needed. */
Jakub