This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



bug in intl/hash-string.h for non-ASCII msgids


Hi,

The hash_string() function, used by dcigettext() for looking up a message
translation in a .mo file, casts each 'char' directly to 'unsigned long int'
and therefore depends on the signedness of 'char'. For bytes >= 0x80 it
produces a different hash code on platforms where 'char' is signed (i386)
than on platforms where it is unsigned (PowerPC). But the GNU .mo file
format is meant to be platform independent.

The effect of the bug is that in .mo files created on some platforms and
used on other platforms, messages whose msgid has non-ASCII characters will
not be translated.
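
To make the divergence concrete, here is a minimal sketch (an illustration
only, not part of the patch). The helper name hash_bytes() and its
sign_extend flag are hypothetical; HASHWORDBITS is assumed to be 32, and the
folding step inside "if (g != 0)" is assumed to follow the usual ELF-style
hash, since the patch context does not show it.

```c
/* Assumption: hash-string.h uses a 32-bit hash word.  */
#define HASHWORDBITS 32

/* Hypothetical helper mirroring the loop in hash_string(), with the
   per-byte conversion made explicit.  SIGN_EXTEND != 0 mimics the old
   code on a 'char is signed' platform; SIGN_EXTEND == 0 mimics a
   'char is unsigned' platform (and the patched code everywhere).  */
static unsigned long int
hash_bytes (const char *str, int sign_extend)
{
  unsigned long int hval = 0, g;

  while (*str != '\0')
    {
      hval <<= 4;
      if (sign_extend)
        /* A byte >= 0x80 becomes negative, then a huge unsigned value.  */
        hval += (unsigned long int) (signed char) *str++;
      else
        /* A byte >= 0x80 stays in the range 0x80..0xff.  */
        hval += (unsigned char) *str++;
      g = hval & ((unsigned long int) 0xf << (HASHWORDBITS - 4));
      if (g != 0)
        {
          /* Assumed ELF-style folding of the top four bits.  */
          hval ^= g >> (HASHWORDBITS - 8);
          hval ^= g;
        }
    }
  return hval;
}
```

For a pure ASCII msgid such as "abc", both variants return the same value.
For a msgid containing UTF-8 bytes such as "\xc3\xa9", the two variants
return different values, so a lookup computed one way misses an entry that
was stored the other way.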

There are not many such .mo files since xgettext-0.11.x (rolled out for
nearly two years) gives an error when it sees a non-ASCII msgid, and I didn't
get many requests for this feature in that time.

In GNU gettext 0.12.2 I will change the hash function so that it works as if
'char' were unsigned; this is consistent with the _dl_elf_hash function in
glibc and the elfHash function in Qt.

Here is a patch to make glibc work the same way. The patch has the effect
that non-ASCII msgids will stop working for .mo files which were generated
on 'char is signed' platforms; but these .mo files have to be regenerated
anyway in order to work across all platforms.

The alternative to this patch is to bump the version number of the .mo files
and use the appropriate hash function depending on the .mo file's version
number; but I think the complexity is not worth it, given the small use
of non-ASCII msgids up to now.


2003-10-19  Bruno Haible  <bruno@clisp.org>

	* intl/hash-string.h (hash_string): Zero-extend each char from the
	string; the old code did a sign-extend on some platforms.

--- glibc-20030425/intl/hash-string.h.bak	2002-12-16 12:45:53.000000000 +0100
+++ glibc-20030425/intl/hash-string.h	2003-10-19 14:32:07.000000000 +0200
@@ -1,5 +1,5 @@
 /* Implements a string hashing function.
-   Copyright (C) 1995, 1997, 1998, 2000 Free Software Foundation, Inc.
+   Copyright (C) 1995, 1997, 1998, 2000, 2003 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -48,7 +48,7 @@
   while (*str != '\0')
     {
       hval <<= 4;
-      hval += (unsigned long int) *str++;
+      hval += (unsigned char) *str++;
       g = hval & ((unsigned long int) 0xf << (HASHWORDBITS - 4));
       if (g != 0)
 	{

