This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 6/7] stdlib: Optimization qsort{_r} swap implementation
On 22/01/2018 16:29, Paul Eggert wrote:
> On 01/22/2018 09:48 AM, Adhemerval Zanella wrote:
>> One option I have not
>> tested, and which will trade code size for performance, would parametrize
>> the qsort creation (as for the 7/7 patch in this set) to have qsort_uint32_t,
>> qsort_uint64_t, and qsort_generic for instance (which calls the swap inline).
>>
>> So we will have something as:
>>
>> void qsort (void *pbase, size_t total_elems, size_t size)
>> {
>>   if (size == sizeof (uint32_t)
>>       && check_alignment (base, sizeof (uint32_t)))
>>     return qsort_uint32_t (pbase, total_elems, size);
>>   else if (size == sizeof (uint64_t)
>>            && check_alignment (base, sizeof (uint64_t)))
>>     return qsort_uint64_t (pbase, total_elems, size);
>>   return qsort_generic (pbase, total_elems, size);
>> }
>
> Yes, that's the option I was thinking of, except I was thinking that the first test should be "if (size == sizeof (void *) && check_alignment (base, alignof (void *))) return qsort_voidptr (pbase, total_elems, size);" because sorting arrays of pointers is the most common. (Also, check_alignment's argument should use alignof not sizeof.)
>
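The dispatch order discussed above can be sketched as follows. This is only an illustration under stated assumptions, not code from the patch set: `select_variant`, the `qsort_variant` enum, and the body of `check_alignment` are hypothetical (the thread never shows how `check_alignment` is implemented), and the sketch only picks which specialized sort would run rather than sorting anything.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the specialized sorts named in the thread.  */
enum qsort_variant
{
  QSORT_VOIDPTR,
  QSORT_UINT32,
  QSORT_UINT64,
  QSORT_GENERIC
};

/* Assumed implementation: true if BASE is a multiple of ALIGN,
   where ALIGN is a power of two.  */
static bool
check_alignment (const void *base, size_t align)
{
  return ((uintptr_t) base & (align - 1)) == 0;
}

/* Pick a variant from the element size and the array alignment.
   The pointer-sized case is tested first, and the alignment checks
   use alignof rather than sizeof, as suggested in the reply.  */
static enum qsort_variant
select_variant (const void *pbase, size_t size)
{
  if (size == sizeof (void *) && check_alignment (pbase, alignof (void *)))
    return QSORT_VOIDPTR;
  if (size == sizeof (uint32_t) && check_alignment (pbase, alignof (uint32_t)))
    return QSORT_UINT32;
  if (size == sizeof (uint64_t) && check_alignment (pbase, alignof (uint64_t)))
    return QSORT_UINT64;
  return QSORT_GENERIC;
}
```

On targets where sizeof (void *) equals sizeof (uint64_t), the first test already catches aligned 8-byte elements, so the uint64_t branch would only be reached on targets with 4-byte pointers.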
I added this implementation and the results are slightly better:
Results for member size 8

Sorted
 nmemb |      base |   patched | diff (%)
    32 |      1173 |      1282 |     9.29
  4096 |    325485 |    332451 |     2.14
 32768 |   3232255 |   3293842 |     1.91
524288 |  65645381 |  66182948 |     0.82

Repeated
 nmemb |      base |   patched | diff (%)
    32 |      2074 |      2034 |    -1.93
  4096 |    948339 |    913363 |    -3.69
 32768 |   8906214 |   8651378 |    -2.86
524288 | 173498547 | 166294093 |    -4.15

MostlySorted
 nmemb |      base |   patched | diff (%)
    32 |      2211 |      2147 |    -2.89
  4096 |    757543 |    739765 |    -2.35
 32768 |   7785343 |   7570811 |    -2.76
524288 | 133912169 | 129728791 |    -3.12

Unsorted
 nmemb |      base |   patched | diff (%)
    32 |      2219 |      2191 |    -1.26
  4096 |   1017790 |    989068 |    -2.82
 32768 |   9747216 |   9456092 |    -2.99
524288 | 191726744 | 185012121 |    -3.50

At the cost of a larger text size and slightly more code:
# Before
$ size stdlib/qsort.os
   text    data     bss     dec     hex filename
   2578       0       0    2578     a12 stdlib/qsort.os
# After
$ size stdlib/qsort.os
   text    data     bss     dec     hex filename
   6037       0       0    6037    1795 stdlib/qsort.os
I still prefer my version, which generates a shorter text segment and also
optimizes for uint32_t.
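
For comparison, the swap-based alternative can be sketched roughly as below: one sort body that picks a swap routine once, from the element size and alignment, instead of instantiating a separate sort per type. The patch itself is not shown in this message, so every name here (`swap_fn`, `swap_u32`, `swap_u64`, `swap_generic`, `select_swap`, `is_aligned`) is hypothetical and the details are assumptions.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical swap signature: exchange two elements of SIZE bytes.  */
typedef void (*swap_fn) (void *a, void *b, size_t size);

/* Fixed-size swaps; memcpy with a constant size is normally compiled
   to a single load and store.  */
static void
swap_u32 (void *a, void *b, size_t size)
{
  (void) size;
  uint32_t t;
  memcpy (&t, a, sizeof t);
  memcpy (a, b, sizeof t);
  memcpy (b, &t, sizeof t);
}

static void
swap_u64 (void *a, void *b, size_t size)
{
  (void) size;
  uint64_t t;
  memcpy (&t, a, sizeof t);
  memcpy (a, b, sizeof t);
  memcpy (b, &t, sizeof t);
}

/* Fallback: byte-by-byte swap for any element size.  */
static void
swap_generic (void *a, void *b, size_t size)
{
  unsigned char *p = a, *q = b;
  while (size-- > 0)
    {
      unsigned char t = *p;
      *p++ = *q;
      *q++ = t;
    }
}

/* Assumed helper: true if BASE is a multiple of ALIGN (a power of two).  */
static bool
is_aligned (const void *base, size_t align)
{
  return ((uintptr_t) base & (align - 1)) == 0;
}

/* Choose the swap routine once, before sorting starts.  */
static swap_fn
select_swap (const void *base, size_t size)
{
  if (size == sizeof (uint32_t) && is_aligned (base, alignof (uint32_t)))
    return swap_u32;
  if (size == sizeof (uint64_t) && is_aligned (base, alignof (uint64_t)))
    return swap_u64;
  return swap_generic;
}
```

In this shape only the small swap routines are duplicated, which is consistent with the smaller text size reported above, while the call through a function pointer is the price paid on every swap.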