This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
speeding up localedef
- From: Bruno Haible <haible at ilog dot fr>
- To: libc-alpha at sources dot redhat dot com
- Date: Thu, 22 Nov 2001 13:43:34 +0100 (CET)
- Subject: speeding up localedef
Hi,
Here is a patch that speeds up localedef on an UTF-8 locale by a
factor of 2.5. On my machine, it reduces the creation time for such a
locale from 19.9 sec to 7.9 sec.
For details, experimental and theoretical, see
http://www.clisp.org/haible/hashfunc.html
2001-11-19 Bruno Haible <bruno@clisp.org>
* locale/programs/simple-hash.c (insert_entry_2): Resize at 75%, not
90%.
(compute_hashval): Shift by 9 bits, not by 4 bits. This drastically
improves the quality of the hash function, especially for short
strings.
--- glibc-20011110/locale/programs/simple-hash.c.bak Tue Jul 10 22:59:07 2001
+++ glibc-20011110/locale/programs/simple-hash.c Mon Nov 19 02:36:51 2001
@@ -1,5 +1,5 @@
/* Implement simple hashing table with string based keys.
- Copyright (C) 1994, 1995, 1996, 1997, 2000 Free Software Foundation, Inc.
+ Copyright (C) 1994-1997, 2000, 2001 Free Software Foundation, Inc.
This file is part of the GNU C Library.
Written by Ulrich Drepper <drepper@gnu.ai.mit.edu>, October 1994.
@@ -163,9 +163,11 @@
}
++htab->filled;
- if (100 * htab->filled > 90 * htab->size)
+ if (100 * htab->filled > 75 * htab->size)
{
- /* Table is filled more than 90%. Resize the table. */
+ /* Table is filled more than 75%. Resize the table.
+ Experiments have shown that for best performance, this threshold
+ must lie between 40% and 85%. */
unsigned long int old_size = htab->size;
htab->size = next_prime (htab->size * 2);
@@ -346,22 +348,18 @@
size_t keylen;
{
size_t cnt;
- unsigned long int hval, g;
+ unsigned long int hval;
/* Compute the hash value for the given string. The algorithm
- is taken from [Aho,Sethi,Ullman]. */
+ is taken from [Aho,Sethi,Ullman], modified to reduce the number of
+ collisions for short strings with very varied bit patterns.
+ See http://www.clisp.org/haible/hashfunc.html. */
cnt = 0;
hval = keylen;
while (cnt < keylen)
{
- hval <<= 4;
+ hval = (hval << 9) | (hval >> (LONGBITS - 9));
hval += (unsigned long int) *(((char *) key) + cnt++);
- g = hval & ((unsigned long) 0xf << (LONGBITS - 4));
- if (g != 0)
- {
- hval ^= g >> (LONGBITS - 8);
- hval ^= g;
- }
}
return hval != 0 ? hval : ~((unsigned long) 0);
}