This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 6/7] stdlib: Optimization qsort{_r} swap implementation
On 22/01/2018 15:15, Paul Eggert wrote:
> On 01/22/2018 02:55 AM, Adhemerval Zanella wrote:
>> On 22/01/2018 06:27, Paul Eggert wrote:
>>> Adhemerval Zanella wrote:
>>>> +static inline bool
>>>> +check_alignment (const void *base, size_t align)
>>>> +{
>>>> + return _STRING_ARCH_unaligned || ((uintptr_t)base % (align - 1)) == 0;
>>>> +}
>>> Surely the '(align - 1)' was supposed to be 'align'. Has this been tested on an architecture that does not allow unaligned access?
>> Yes, I checked on sparc64 machine. This test is similar to the Linux kernel one
>> at lib/sort.c.
>
> The Linux kernel lib/sort.c test is (((unsigned long)base & (align - 1)) == 0), which is correct. The test above uses '% (align - 1)' instead, which is clearly wrong; the "%" should be "&". As the tests evidently did not catch the error, they need to be improved to catch it.
Indeed, Alexander Monakov pointed out this issue is his message and I have fixed it
locally.
>
>> It might be that your approach is faster for other architectures which do not have
>> ifunc mempcpy, however I do not want to over-engineer this code since most real
>> word correspond to key sizes of 4 and 8.
>
> Thanks, that all makes sense.
>
> One other question. Would it improve performance to partially evaluate qsort for the case where the key size is that of a pointer, to allow the swap to be done inline with four insns? I would imagine that this is the most common case.
>
I noted that, at least for x86_64, calling a function pointer it slight faster
than embedded the test in a switch (as current msort). One option I have not
tested, and which will trade code side for performance; would parametrize
the qsort creation (as for the 7/7 patch in this set) to have qsort_uint32_t,
qsort_uint64_t, and qsort_generic for instance (which calls the swap inline).
So we will have something as:
void qsort (void *pbase, size_t total_elems, size_t size)
{
if (size == sizeof (uint32_t)
&& check_alignment (base, sizeof (uint32_t)))
return qsort_uint32_t (pbase, total_elems, size);
else if (size == sizeof (uint64_t)
&& check_alignment (base, sizeof (uint64_t)))
return qsort_uint64_t (pbase, total_elems, size);
return qsort_generic (pbase, total_elems, size);
}