This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug libc/19305] qsort() should return early if (nmemb <= 1)


https://sourceware.org/bugzilla/show_bug.cgi?id=19305

--- Comment #1 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, azanella/qsort-refactor has been created
        at  9241132d1668a61513c4b0856a51024d425940fd (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9241132d1668a61513c4b0856a51024d425940fd

commit 9241132d1668a61513c4b0856a51024d425940fd
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Tue Jan 16 14:24:53 2018 -0200

    stdlib: Remove undefined behavior from qsort implementation

    Internally qsort is implemented on top of __qsort_r by casting the
    function pointer to another type (__compar_fn_t tp __compar_d_fn_t)
    and passing a NULL extra argument.  Casting function pointer with
    different types for subsequent function call is undefined-behaviour
    (C11 6.3.2.3):

      "[8] A pointer to a function of one type may be converted to a pointer
      to a function of another type and back again; the result shall compare
      equal to the original pointer. If a converted pointer is used to call
      a function whose type is not compatible with the referenced type,
      the behavior is undefined."

    Also 'compatible' in this case also does not apply according to
    6.7.6.3 Function declarators (including prototypes):

      "[15] For two function types to be compatible, both shall specify
      compatible return types. (146) Moreover, the parameter type lists,
      if both are present, shall agree in the number of parameters and
      in use of the ellipsis terminator; corresponding parameters shall
      have compatible types. [...]"

    Although this works on all architectures glibc supports (mostly because
    it casts function pointers with similar calling conventions), I think
    it is worth to avoid it.  This patch fixes it by adding a common
    implementation (qsort_common.c) which redefines the function based
    on the required types.

    For x86_64 (i7-4790K, gcc 7.2.1) shows a slight better performance
    for qsort:

    Results for member size 4
      Sorted
      nmemb   |      base |   patched | diff
            32|      1304 |      1257 | -3.60
          4096|    330707 |    302235 | -8.61
         32768|   3300210 |   3020728 | -8.47
        524288|  65673289 |  59306436 | -9.69

      Repeated
      nmemb   |      base |   patched | diff
            32|      1885 |      1873 | -0.64
          4096|    951490 |    904864 | -4.90
         32768|   9272366 |   8542801 | -7.87
        524288| 183337854 | 168426795 | -8.13

      MostlySorted
      nmemb   |      base |   patched | diff
            32|      1836 |      1776 | -3.27
          4096|    758359 |    709937 | -6.39
         32768|   7199982 |   6855890 | -4.78
        524288| 139242170 | 129385161 | -7.08

      Unsorted
      nmemb   |      base |   patched | diff
            32|      2073 |      1941 | -6.37
          4096|   1058383 |    969021 | -8.44
         32768|  10310116 |   9462116 | -8.22
        524288| 202427388 | 186560908 | -7.84

    Results for member size 8
      Sorted
      nmemb   |      base |   patched | diff
            32|      1224 |      1205 | -1.55
          4096|    336100 |    325554 | -3.14
         32768|   3539890 |   3264125 | -7.79
        524288|  67268510 |  66107684 | -1.73

      Repeated
      nmemb   |      base |   patched | diff
            32|      2096 |      2118 | 1.05
          4096|   1015585 |    979114 | -3.59
         32768|   9871981 |   9028606 | -8.54
        524288| 189710172 | 174903867 | -7.80

      MostlySorted
      nmemb   |      base |   patched | diff
            32|      2318 |      2346 | 1.21
          4096|    805051 |    759158 | -5.70
         32768|   8346363 |   7810444 | -6.42
        524288| 143597264 | 135900146 | -5.36

      Unsorted
      nmemb   |      base |   patched | diff
            32|      2364 |      2301 | -2.66
          4096|   1076998 |   1014018 | -5.85
         32768|  10442153 |   9888078 | -5.31
        524288| 206235337 | 192479957 | -6.67

    Results for member size 32
      Sorted
      nmemb   |      base |   patched | diff
            32|      1214 |      1184 | -2.47
          4096|    332449 |    325865 | -1.98
         32768|   3313274 |   3331750 | 0.56
        524288|  70786673 |  69067176 | -2.43

      Repeated
      nmemb   |      base |   patched | diff
            32|      4913 |      4813 | -2.04
          4096|   1693735 |   1624137 | -4.11
         32768|  17054760 |  15896739 | -6.79
        524288| 332149265 | 316328778 | -4.76

      MostlySorted
      nmemb   |      base |   patched | diff
            32|      5490 |      5332 | -2.88
          4096|   1394312 |   1312703 | -5.85
         32768|  12743599 |  12360726 | -3.00
        524288| 240249011 | 231603294 | -3.60

      Unsorted
      nmemb   |      base |   patched | diff
            32|      6251 |      6047 | -3.26
          4096|   1959306 |   1695241 | -13.48
         32768|  17204840 |  16430388 | -4.50
        524288| 342716199 | 329496913 | -3.86

    Checked on x86_64-linux-gnu.

        * stdlib/qsort.c: Move common code to stdlib/qsort_common.c
        and parametrize the function definition based wether to use
        the '_r' variant.
        * stdlib/qsort_common.c: New file.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5c21b8d92bb8011a02901d61e6e16d7dd362b3ee

commit 5c21b8d92bb8011a02901d61e6e16d7dd362b3ee
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Tue Jan 16 11:19:15 2018 -0200

    stdlib: Optimization qsort{_r} swap implementation

    This patchs adds a optimized swap operation on qsort based in previous
    msort one.  Instead of byte operation, three variants are provided:

      1. Using uint32_t loads and stores.
      2. Using uint64_t loads and stores.
      3. Generic one with a temporary buffer and memcpy/mempcpy.

    The 1. and 2. option are selected only either if architecture defines
    _STRING_ARCH_unaligned or if base pointer is aligned to required type.
    This is due based on data for bench-qsort, usually programs calls
    qsort with array with multiple of machine word as element size.

    Benchmarking shows an increase performance:

    Results for member size 4
      Sorted
      nmemb   |      base |   patched | diff
            32|      1401 |      1958 | 39.76
          4096|    351333 |    368533 | 4.90
         32768|   3369386 |   3131712 | -7.05
        524288|  63192972 |  59807494 | -5.36

      MostlySorted
      nmemb   |      base |   patched | diff
            32|      2391 |      2061 | -13.80
          4096|   1124074 |    961816 | -14.43
         32768|  11196607 |   9410438 | -15.95
        524288| 215908169 | 185586732 | -14.04

      Unsorted
      nmemb   |      base |   patched | diff
            32|      4993 |      2021 | -59.52
          4096|   1113860 |    963126 | -13.53
         32768|  11251293 |   9518795 | -15.40
        524288| 217252237 | 185072278 | -14.81

    Results for member size 8
      Sorted
      nmemb   |      base |   patched | diff
            32|      1296 |      1267 | -2.24
          4096|    359418 |    334852 | -6.83
         32768|   3535229 |   3345157 | -5.38
        524288|  69847251 |  67029358 | -4.03

      MostlySorted
      nmemb   |      base |   patched | diff
            32|      2745 |      2340 | -14.75
          4096|   1222082 |   1014314 | -17.00
         32768|  12244800 |   9924706 | -18.95
        524288| 241557971 | 196898760 | -18.49

      Unsorted
      nmemb   |      base |   patched | diff
            32|      2972 |      2389 | -19.62
          4096|   1314861 |   1024052 | -22.12
         32768|  12397909 |  10120848 | -18.37
        524288| 241789262 | 193414824 | -20.01

    Results for member size 32
      Sorted
      nmemb   |      base |   patched | diff
            32|      1305 |      1287 | -1.38
          4096|    346332 |    347979 | 0.48
         32768|   3458244 |   3408058 | -1.45
        524288|  72793445 |  69973719 | -3.87

      MostlySorted
      nmemb   |      base |   patched | diff
            32|      5435 |      4890 | -10.03
          4096|   2032260 |   1688556 | -16.91
         32768|  19909035 |  16419992 | -17.52
        524288| 390339319 | 325921585 | -16.50

      Unsorted
      nmemb   |      base |   patched | diff
            32|      5833 |      5351 | -8.26
          4096|   2022531 |   1724961 | -14.71
         32768|  19842888 |  16588545 | -16.40
        524288| 388838382 | 324102703 | -16.65

    Checked on x86_64-linux-gnu.

        [BZ #19305].
        * stdlib/qsort.c (SWAP): Remove.
        (check_alignment, swap_u32, swap_u64, swap_generic,
        select_swap_func): New functions.
        (__qsort_r):

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e183819e852e55c21d278c459ff2a3f49c4f8ed0

commit e183819e852e55c21d278c459ff2a3f49c4f8ed0
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Tue Jan 16 10:49:43 2018 -0200

    stdlib: Remove use of mergesort on qsort

    This patch removes the mergesort optimization on qsort{_r} implementation
    and use the quicksort instead.  The mergesort implementation has some
    issues:

      - It is not as-safe since it uses malloc.

      - The malloc call adds arbitrary latency (might even worse if malloc
        is interposed).

      - The swap check rely on system information that might be virtualized
        (for instance vms with overcommit memory) which leads to potentially
        use of swap (if system advertise more memory than actually has).
        The real memory also have the downside of issuing syscalls where
        none is really expected (although only once per execution).

      - The mergesort is suboptimal on already sorted array (BZ#21719).

    The quicksort implementation is already optimized to use O(1) space, so
    it should not show issues related to malloc call or stack overflow.

    Using bench-qsort (i7-4790K, gcc 7.2.1) shows the performance difference
    between mergesort (base) and quicksort (patched):

    Results for member size 4
      Sorted
      nmemb   |      base |   patched | diff
            32|      1447 |      1401 | -3.18
          4096|    315978 |    351333 | 11.19
         32768|   2559093 |   3369386 | 31.66
        524288|  46228488 |  63192972 | 36.70

      MostlySorted
      nmemb   |      base |   patched | diff
            32|      1974 |      2391 | 21.12
          4096|    922332 |   1124074 | 21.87
         32768|   9268671 |  11196607 | 20.80
        524288| 186856297 | 215908169 | 15.55

      Unsorted
      nmemb   |      base |   patched | diff
            32|      1978 |      4993 | 152.43
          4096|    916413 |   1113860 | 21.55
         32768|   9270003 |  11251293 | 21.37
        524288| 187606088 | 217252237 | 15.80

    Results for member size 8
      Sorted
      nmemb   |      base |   patched | diff
            32|      1424 |      1296 | -8.99
          4096|    299105 |    359418 | 20.16
         32768|   2737859 |   3535229 | 29.12
        524288|  53082807 |  69847251 | 31.58

      MostlySorted
      nmemb   |      base |   patched | diff
            32|      2129 |      2745 | 28.93
          4096|    969465 |   1222082 | 26.06
         32768|   9605227 |  12244800 | 27.48
        524288| 193353927 | 241557971 | 24.93

      Unsorted
      nmemb   |      base |   patched | diff
            32|      2194 |      2972 | 35.46
          4096|    958610 |   1314861 | 37.16
         32768|   9664246 |  12397909 | 28.29
        524288| 193758429 | 241789262 | 24.79

    Results for member size 32
      Sorted
      nmemb   |      base |   patched | diff
            32|      4477 |      1305 | -70.85
          4096|   1109492 |    346332 | -68.78
         32768|  11075976 |   3458244 | -68.78
        524288| 230773658 |  72793445 | -68.46

      MostlySorted
      nmemb   |      base |   patched | diff
            32|      5905 |      5435 | -7.96
          4096|   2568895 |   2032260 | -20.89
         32768|  24936755 |  19909035 | -20.16
        524288| 526442900 | 390339319 | -25.85

      Unsorted
      nmemb   |      base |   patched | diff
            32|      6004 |      5833 | -2.85
          4096|   2437943 |   2022531 | -17.04
         32768|  24789971 |  19842888 | -19.96
        524288| 525898556 | 388838382 | -26.06

    An increase in latency, however some performance difference is due the fact
    mergesort uses a slight improved swap operation than quicksort (which
following
    patch addresses).  This change also renders the BZ #21719 fix unrequired
(since
    it is to fix the sorted input performance degradation for mergesort).

    Checked on x86_64-linux-gnu.

        [BZ #21719]
        * stdlib/Makefile (routines): Remove msort.
        (CFLAGS-msort.c): Remove rule.
        * stdlib/msort.c: Remove file.
        * stdlib/qsort.c (_quicksort): Rename to __qsort_r and add weak_alias
        to qsort_r.
        (qsort): New symbol.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a707334fdfcac4ce7b479be6ac8727c45893bdab

commit a707334fdfcac4ce7b479be6ac8727c45893bdab
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Mon Jan 15 09:20:30 2018 -0200

    stdlib: Add more qsort{_r} coverage

    This patch adds a qsort and qsort_t (which glibc current lacks
    coverage).  The test check with random input (created using support
    random) with different internal types (uint8_t, uint16_t, uint32_t,
    and uint64_t) and with different set of element numbers (from 0
    to 262144).

    Checked on x86_64-linux-gnu.

        * stdlib/tst-qsort3.c: New file.
        * stdlib/Makefile (tests): Add tst-qsort3.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=99ce2951fdd6a711334dc927f1ecfd9193fdd506

commit 99ce2951fdd6a711334dc927f1ecfd9193fdd506
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Thu Aug 31 20:24:03 2017 -0300

    benchtests: Add bench-qsort

    This patch adds a qsort benchmark.  I tried to model the benchmark taking
in
    consideration the possible input variation in both internal element size,
    element numbers, and internal state for 1. real word cases and 2. possible
    scenarios based on hardware characteristics.

    For 1. I tracked qsort usage (using a simple preload library to dump its
usage
    and a script to pos-process it) on both GCC bootstrap and Firefox.  GCC 8
    bootstrap build shows 51786641 call to qsort with the following
characterics:

    Key: number of elements:
    key=2 : 39.74
    key=3 : 19.23
    key=4 : 9.77
    key=1 : 8.44
    key=0 : 6.60
    key=5 : 4.40
    key=7 : 2.37
    key=6 : 2.25
    key=9 : 1.48
    key=8 : 0.97

    Key: element size in bytes:
    key=8 : 91.74
    key=32 : 3.62
    key=4 : 2.42
    key=40 : 1.20
    key=16 : 0.67
    key=24 : 0.30
    key=48 : 0.05
    key=56 : 0.00
    key=1 : 0.00

    Key: total size (number of elements * element size):
    key=16 : 35.98
    key=24 : 18.67
    key=32 : 9.79
    key=8 : 8.28
    key=0 : 6.60
    key=40 : 4.21
    key=64 : 3.15
    key=48 : 2.24
    key=56 : 2.15
    key=80 : 1.45

    So for GCC:

      - 80% of total qsort usage are done with 10 elements of less.
      - All usages are done element size of maximum of 56 bytes.
      - 90% of calls are done with array of maximum size of 80 bytes or less.

    The Firefox usage was done with 2 hours of usage, with first 10 minutes
activelly
    openning and closing different types of sites.  It resulted in 21042 calls
with
    following characteristics:

    Key: number of elements:
    key=7 : 24.40
    key=1 : 10.44
    key=3 : 6.33
    key=4 : 5.81
    key=2 : 5.46
    key=6 : 4.80
    key=17 : 4.54
    key=0 : 3.07
    key=5 : 3.05
    key=9 : 2.51
    key=12 : 2.06

    Key: element size in bytes:
    key=8 : 94.49
    key=28 : 4.40
    key=2 : 0.70
    key=16 : 0.19
    key=36 : 0.07
    key=12 : 0.07
    key=40 : 0.07
    key=24 : 0.03

    Key: total size (number of elements * element size):
    key=56 : 24.20
    key=8 : 10.27
    key=24 : 6.36
    key=32 : 5.86
    key=16 : 5.46
    key=48 : 4.80
    key=476 : 3.75
    key=0 : 3.07
    key=40 : 3.05
    key=72 : 2.50

    So for Firefox:

      - 72% of total qsort usage are done with 18 elements of less.
      - All usages are done element size of maximum of 40 bytes.
      - 70% of calls are done with array of maximum size of 476 bytes or less.

    For 2. I used the idea of a machine with 3 levels of cache with sizes
    32kb (L1), 256kb (L2), and 4096 (L3).

    It resulted in a benchmark with following traits:

      * It checks four types of input arrays: sorted, mostly sorted, unsorted,
and
        repeated.  For 'sorted' the array is already sorted, 'mostly sorted'
the
        array will have a certain number of random elements with random values
        (current ratio used is 20%), for 'unsorted' the array will contain
random
        elements from full range based on used type, and for 'repeated' the
array
        will have random elements with a certain number (current ratio is 20%)
of
        a repeated element distributed randomly.

      * Three elements sizes are checked: uint32_t, uint64_t, and an element
with
        32 bytes (but using the uint64_t comparison checks).  These element
sizes
        are used to 1. to avoid include the comparison function itself and/or
        memory copy in sort benchmark itself, and 2. because key of size_t are
the
        most used for both GCC and Firefox.

      * Five different element numbers: 64 (which cover mostly of used element
        sizes for both GCC and Firefox), 4096/8192 (which cover 32 KB of L1 for
        32 and 64 bits), 32768/65536 (L2 with 256 KB), and 24288/1048576 (L3
with
        4 MB).  The sizes are configurable by --nelem option.

    Checked on x86_64-linux-gnu

        * benchtests/Makefile (stdlib-benchset): Add qsort.
        * benchtests/bench-qsort.c: New file.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=84e29aafc42d4c01aaf90f76762e6dc94579b9b0

commit 84e29aafc42d4c01aaf90f76762e6dc94579b9b0
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Mon Dec 25 10:53:13 2017 -0200

    support: Add Mersenne Twister pseudo-random number generator

    This patch adds support routines for pseudo-random number generation
    based on Mersenne Twister.  The libstc++ version is used as based and
    both 32 and 64 bits are provided.  It is used on following qsort tests
    and benchmarks.

    I decided to use a Mersenne Twister (MT) instead of random_r internal
    implementation, which uses a linear feedback shift register approach
    with trinomials, because it:

      - is used extensivelly in other implementations (like c+11);
      - has a quite larger period (2^219937-1) than the type 4 variantion
        of random (2^63 - 1);
      - does not the RAND_MAX limitation.

    Checked on x86_64-linux-gnu.

        * support/Makefile (libsupport-routines): Add support_random.
        (tests): Add tst-support_random.
        * support/support_random.c: New file.
        * support/support_random.h: Likewise.
        * support/tst-support_random.c: Likewise.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=d23a77dae549e95722f62d87d30787387fabff08

commit d23a77dae549e95722f62d87d30787387fabff08
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Fri Dec 15 09:49:37 2017 -0200

    stdlib: Adjust tst-qsort{2} to libsupport

        * stdlib/tst-qsort.c: Use libsupport.
        * stdlib/tst-qsort2.c: Likewise.

-----------------------------------------------------------------------

-- 
You are receiving this mail because:
You are on the CC list for the bug.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]