This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 6/7] stdlib: Optimization qsort{_r} swap implementation



On 22/01/2018 16:29, Paul Eggert wrote:
> On 01/22/2018 09:48 AM, Adhemerval Zanella wrote:
>> One option I have not
>> tested, which trades code size for performance, would be to parametrize
>> the qsort creation (as in the 7/7 patch of this set) to have qsort_uint32_t,
>> qsort_uint64_t, and qsort_generic, for instance (which calls the swap inline).
>>
>> So we would have something like:
>>
>> void qsort (void *pbase, size_t total_elems, size_t size,
>>             __compar_fn_t cmp)
>> {
>>    if (size == sizeof (uint32_t)
>>      && check_alignment (pbase, sizeof (uint32_t)))
>>      qsort_uint32_t (pbase, total_elems, size, cmp);
>>    else if (size == sizeof (uint64_t)
>>      && check_alignment (pbase, sizeof (uint64_t)))
>>      qsort_uint64_t (pbase, total_elems, size, cmp);
>>    else
>>      qsort_generic (pbase, total_elems, size, cmp);
>> }
> 
> Yes, that's the option I was thinking of, except I was thinking that the first test should be "if (size == sizeof (void *) && check_alignment (base, alignof (void *))) return qsort_voidptr (pbase, total_elems, size);" because sorting arrays of pointers is the most common. (Also, check_alignment's argument should use alignof not sizeof.)
> 
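Paul's variant of the dispatch (pointer-sized elements tested first, alignment checked with `alignof`) can be sketched in plain C as follows. This is a hedged illustration, not glibc code: `qsort_sketch`, `check_alignment`, `DEFINE_SORT`, the `sort_*` functions and the comparators are hypothetical names, and a simple insertion sort stands in for the real quicksort to keep the example short. Each `DEFINE_SORT` expansion duplicates the whole sort body, which is where the extra text size of this approach comes from.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int (*cmp_fn) (const void *, const void *);

/* Hypothetical helper: nonzero if P is a multiple of ALIGN.  */
static int
check_alignment (const void *p, size_t align)
{
  return (uintptr_t) p % align == 0;
}

/* Stamp out one insertion sort per element type T.  The swap moves a
   whole T through a T-sized temporary; memcpy with a constant size
   avoids strict-aliasing problems and compiles to plain loads/stores.
   Insertion sort stands in for glibc's quicksort to keep this short.  */
#define DEFINE_SORT(NAME, T)                                              \
  static void                                                             \
  NAME (void *base, size_t n, cmp_fn cmp)                                 \
  {                                                                       \
    unsigned char *a = base;                                              \
    for (size_t i = 1; i < n; i++)                                        \
      for (size_t j = i;                                                  \
           j > 0 && cmp (a + (j - 1) * sizeof (T), a + j * sizeof (T)) > 0; \
           j--)                                                           \
        {                                                                 \
          T tmp;                                                          \
          memcpy (&tmp, a + (j - 1) * sizeof (T), sizeof (T));            \
          memcpy (a + (j - 1) * sizeof (T), a + j * sizeof (T), sizeof (T)); \
          memcpy (a + j * sizeof (T), &tmp, sizeof (T));                  \
        }                                                                 \
  }

DEFINE_SORT (sort_voidptr, void *)
DEFINE_SORT (sort_uint32, uint32_t)
DEFINE_SORT (sort_uint64, uint64_t)

/* Fallback for any other element size: swap byte by byte.  */
static void
sort_generic (void *base, size_t n, size_t size, cmp_fn cmp)
{
  unsigned char *a = base;
  for (size_t i = 1; i < n; i++)
    for (size_t j = i; j > 0 && cmp (a + (j - 1) * size, a + j * size) > 0; j--)
      {
        unsigned char *p = a + (j - 1) * size, *q = a + j * size;
        for (size_t k = 0; k < size; k++)
          {
            unsigned char t = p[k];
            p[k] = q[k];
            q[k] = t;
          }
      }
}

/* Dispatcher: pointer-sized elements first, since arrays of pointers
   are the most common case; alignment is checked with alignof.  */
void
qsort_sketch (void *base, size_t n, size_t size, cmp_fn cmp)
{
  if (size == sizeof (void *) && check_alignment (base, alignof (void *)))
    sort_voidptr (base, n, cmp);
  else if (size == sizeof (uint32_t)
           && check_alignment (base, alignof (uint32_t)))
    sort_uint32 (base, n, cmp);
  else if (size == sizeof (uint64_t)
           && check_alignment (base, alignof (uint64_t)))
    sort_uint64 (base, n, cmp);
  else
    sort_generic (base, n, size, cmp);
}

/* Example comparators (hypothetical, for the usage below).  */
static int
cmp_int (const void *a, const void *b)
{
  int x = *(const int *) a, y = *(const int *) b;
  return (x > y) - (x < y);
}

static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(const char *const *) a, *(const char *const *) b);
}

static int
cmp_first_byte (const void *a, const void *b)
{
  return *(const unsigned char *) a - *(const unsigned char *) b;
}
```

On a typical LP64 target an `int` array takes the `uint32_t` path, an array of `char *` takes the pointer path, and a 3-byte record falls back to `sort_generic`.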

I added the suggested implementation, and the results are slightly better (except for already sorted input):

Results for member size 8
  Sorted
     nmemb |      base |   patched | diff (%)
        32 |      1173 |      1282 |     9.29
      4096 |    325485 |    332451 |     2.14
     32768 |   3232255 |   3293842 |     1.91
    524288 |  65645381 |  66182948 |     0.82

  Repeated
     nmemb |      base |   patched | diff (%)
        32 |      2074 |      2034 |    -1.93
      4096 |    948339 |    913363 |    -3.69
     32768 |   8906214 |   8651378 |    -2.86
    524288 | 173498547 | 166294093 |    -4.15

  MostlySorted
     nmemb |      base |   patched | diff (%)
        32 |      2211 |      2147 |    -2.89
      4096 |    757543 |    739765 |    -2.35
     32768 |   7785343 |   7570811 |    -2.76
    524288 | 133912169 | 129728791 |    -3.12

  Unsorted
     nmemb |      base |   patched | diff (%)
        32 |      2219 |      2191 |    -1.26
      4096 |   1017790 |    989068 |    -2.82
     32768 |   9747216 |   9456092 |    -2.99
    524288 | 191726744 | 185012121 |    -3.50
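Recomputing a few rows suggests the diff column is the relative change of the patched number against the base number, in percent, so negative values mean the patched build measured lower. The unit of base and patched is not stated (presumably time or cycles). For example, (1282 - 1173) / 1173 is about 9.29%. A minimal helper for that calculation, where the name `pct_diff` is hypothetical:

```c
#include <assert.h>

/* Hypothetical helper: relative change of PATCHED versus BASE, in
   percent.  Negative results mean the patched build measured lower
   (in whatever unit the benchmark reports).  */
double
pct_diff (double base, double patched)
{
  return (patched - base) / base * 100.0;
}
```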

At the cost of a larger text segment (2578 to 6037 bytes, about 2.3 times as large) and slightly more code:

# Before
$ size stdlib/qsort.os
   text    data     bss     dec     hex filename
   2578       0       0    2578     a12 stdlib/qsort.os

# After
$ size stdlib/qsort.os
   text    data     bss     dec     hex filename
   6037       0       0    6037    1795 stdlib/qsort.os


I still prefer my version, which generates a shorter text segment and also
optimizes for uint32_t.
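The alternative preferred here keeps a single sort body and specializes only the swap. A minimal sketch of that general idea, assuming the swap width is chosen once per call from the element size and the base pointer's alignment. This is not the code from the patch: `enum swap_type`, `select_swap`, `swap_elems`, `sort_with_swap` and `cmp_u32` are hypothetical names, and insertion sort stands in for quicksort.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; not the glibc patch itself.  */
enum swap_type { SWAP_WORDS_64, SWAP_WORDS_32, SWAP_BYTES };

/* Pick the widest chunk that divides SIZE and matches BASE's alignment.
   If BASE is aligned and SIZE is a multiple of the chunk, every element
   base + i * size is aligned too, so one choice holds for the whole call.  */
static enum swap_type
select_swap (const void *base, size_t size)
{
  uintptr_t addr = (uintptr_t) base;
  if (size % sizeof (uint64_t) == 0 && addr % alignof (uint64_t) == 0)
    return SWAP_WORDS_64;
  if (size % sizeof (uint32_t) == 0 && addr % alignof (uint32_t) == 0)
    return SWAP_WORDS_32;
  return SWAP_BYTES;
}

/* Swap SIZE bytes between A and B using the selected chunk width.
   Constant-size memcpy calls become single loads and stores.  */
static void
swap_elems (void *a, void *b, size_t size, enum swap_type type)
{
  unsigned char *p = a, *q = b;
  switch (type)
    {
    case SWAP_WORDS_64:
      for (size_t i = 0; i < size; i += sizeof (uint64_t))
        {
          uint64_t t;
          memcpy (&t, p + i, sizeof t);
          memcpy (p + i, q + i, sizeof t);
          memcpy (q + i, &t, sizeof t);
        }
      break;
    case SWAP_WORDS_32:
      for (size_t i = 0; i < size; i += sizeof (uint32_t))
        {
          uint32_t t;
          memcpy (&t, p + i, sizeof t);
          memcpy (p + i, q + i, sizeof t);
          memcpy (q + i, &t, sizeof t);
        }
      break;
    case SWAP_BYTES:
      for (size_t i = 0; i < size; i++)
        {
          unsigned char t = p[i];
          p[i] = q[i];
          q[i] = t;
        }
      break;
    }
}

/* One sort body for every element size: the swap strategy is chosen
   once on entry, so only swap_elems has per-width code.  */
void
sort_with_swap (void *base, size_t n, size_t size,
                int (*cmp) (const void *, const void *))
{
  unsigned char *a = base;
  enum swap_type type = select_swap (base, size);
  for (size_t i = 1; i < n; i++)
    for (size_t j = i; j > 0 && cmp (a + (j - 1) * size, a + j * size) > 0; j--)
      swap_elems (a + (j - 1) * size, a + j * size, size, type);
}

/* Example comparator (hypothetical, for the usage below).  */
static int
cmp_u32 (const void *a, const void *b)
{
  uint32_t x = *(const uint32_t *) a, y = *(const uint32_t *) b;
  return (x > y) - (x < y);
}
```

Because only `swap_elems` carries per-width code, the text segment grows far less than with one sort instantiation per type; the price is a branch on the swap type at every swap.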

