This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: DT_GNU_HASH latest patches

Hello, i am writing you this time about the first variant.after looking at the benchmark results i noted that there have been a reduction in the number of l1 cache misses; a reduction in l1 cache misses means a win of 12 cycles ; the difference between the latency of l2 cache and that of l1 cache 15-3.on the other hand replacing a division by a binary and & is a win of at least 25 cycles, so it think that avoiding tthe division in the common case may improve performance.
changing the size of the main hash table is not an ideal solution because it is difficult to choose a good power of 2 size and there is no gurantee that a power of 2 size will give short chains.this also will introduce a great variability in the size nbcukets, they may be oversized which may give a big waste of space.on the other hand the solution of a filter before the main hash table that is a power of 2 has several advantages:
1-it is small: on average its size is nsymbols bytes, which has more chance to fit in the l1 cache than 3*nsymols or 4*nsymbols bytes for tha current hash table.
2-because it is small oversizing or undersizeing will not have a big impact on the overall space utilization.
3-the benchmark of take3 shows an increase in l2 cache misses because of the enlarged size of the hash table; the reduced size of the filter hash table may do less l2 cache misses.
4-in the take3 hash table there is a waste of space because the symbol index word inside the hash table is useless. according to the charts of yesterday there is between 5% and 12% empty buckets inside each hash table.for each empty bucket the symbol index will have no role and will be wasted. with the filter hash table every added word will participate in reducing the probability to go to the second hash table.
when an access is done to the filter hash table and the bit is found to be set, there will be an access to the normal hash table.there is still a chance to avoid a third memory access if the bucket is found empty(between 5% and 12 %) so the average number of memeory accesses may not increase too much compared to take3.
I think that nbucket must not be a power of two, otherwise every access to the normal hash table will access a non empty bucket and third memory access to the hash chains will be done.
the following is a function that computes the nearest power of 2 to a given integer number.
static inline unsigned pow2_round(unsigned int input)
unsigned log2=32-__builtin_clz(input);
if ((1<<(log2-2))&(input))
return 1<<log2;
return 1<<(log2-1);
the corresponding size of the filter hash table would be 8*pow2_round(nsymbols) bits or pow2_round(nsymbols)/4 32 bits integers.
best regards.

Testez Windows Llive Mail Beta !

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]