This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug libc/21719] stdlib/msort : optimizing merge sort


https://sourceware.org/bugzilla/show_bug.cgi?id=21719

--- Comment #5 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, azanella/qsort-refactor has been created
        at  ce67791dececc810a701814eea472d086f624ab4 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ce67791dececc810a701814eea472d086f624ab4

commit ce67791dececc810a701814eea472d086f624ab4
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Tue Jan 16 14:24:53 2018 -0200

    stdlib: Remove undefined behavior from qsort implementation

    Internally qsort is implemented on top of __qsort_r by casting the
    function pointer to another type (__compar_fn_t tp __compar_d_fn_t)
    and passing a NULL extra argument.  Casting function pointer with
    different types for subsequent function call is undefined-behaviour
    (C11 6.3.2.3):

      "[8] A pointer to a function of one type may be converted to a pointer
      to a function of another type and back again; the result shall compare
      equal to the original pointer. If a converted pointer is used to call
      a function whose type is not compatible with the referenced type,
      the behavior is undefined."

    Also 'compatible' in this case also does not apply according to
    6.7.6.3 Function declarators (including prototypes):

      "[15] For two function types to be compatible, both shall specify
      compatible return types. (146) Moreover, the parameter type lists,
      if both are present, shall agree in the number of parameters and
      in use of the ellipsis terminator; corresponding parameters shall
      have compatible types. [...]"

    Although this works on all architectures glibc supports (mostly because
    it casts function pointers with similar calling conventions), I think
    it is worth to avoid it.  This patch fixes it by adding a common
    implementation (qsort_common.c) which redefines the function based
    on the required types.

    For x86_64 (i7-4790K, gcc 7.2.1) shows a slight better performance
    for qsort:

    Results for member size 4
      Sorted
      nmemb   |      base |   patched | diff
            32|      1304 |      1257 | -3.60
          4096|    330707 |    302235 | -8.61
         32768|   3300210 |   3020728 | -8.47
        524288|  65673289 |  59306436 | -9.69

      Repeated
      nmemb   |      base |   patched | diff
            32|      1885 |      1873 | -0.64
          4096|    951490 |    904864 | -4.90
         32768|   9272366 |   8542801 | -7.87
        524288| 183337854 | 168426795 | -8.13

      MostlySorted
      nmemb   |      base |   patched | diff
            32|      1836 |      1776 | -3.27
          4096|    758359 |    709937 | -6.39
         32768|   7199982 |   6855890 | -4.78
        524288| 139242170 | 129385161 | -7.08

      Unsorted
      nmemb   |      base |   patched | diff
            32|      2073 |      1941 | -6.37
          4096|   1058383 |    969021 | -8.44
         32768|  10310116 |   9462116 | -8.22
        524288| 202427388 | 186560908 | -7.84

    Results for member size 8
      Sorted
      nmemb   |      base |   patched | diff
            32|      1224 |      1205 | -1.55
          4096|    336100 |    325554 | -3.14
         32768|   3539890 |   3264125 | -7.79
        524288|  67268510 |  66107684 | -1.73

      Repeated
      nmemb   |      base |   patched | diff
            32|      2096 |      2118 | 1.05
          4096|   1015585 |    979114 | -3.59
         32768|   9871981 |   9028606 | -8.54
        524288| 189710172 | 174903867 | -7.80

      MostlySorted
      nmemb   |      base |   patched | diff
            32|      2318 |      2346 | 1.21
          4096|    805051 |    759158 | -5.70
         32768|   8346363 |   7810444 | -6.42
        524288| 143597264 | 135900146 | -5.36

      Unsorted
      nmemb   |      base |   patched | diff
            32|      2364 |      2301 | -2.66
          4096|   1076998 |   1014018 | -5.85
         32768|  10442153 |   9888078 | -5.31
        524288| 206235337 | 192479957 | -6.67

    Results for member size 32
      Sorted
      nmemb   |      base |   patched | diff
            32|      1214 |      1184 | -2.47
          4096|    332449 |    325865 | -1.98
         32768|   3313274 |   3331750 | 0.56
        524288|  70786673 |  69067176 | -2.43

      Repeated
      nmemb   |      base |   patched | diff
            32|      4913 |      4813 | -2.04
          4096|   1693735 |   1624137 | -4.11
         32768|  17054760 |  15896739 | -6.79
        524288| 332149265 | 316328778 | -4.76

      MostlySorted
      nmemb   |      base |   patched | diff
            32|      5490 |      5332 | -2.88
          4096|   1394312 |   1312703 | -5.85
         32768|  12743599 |  12360726 | -3.00
        524288| 240249011 | 231603294 | -3.60

      Unsorted
      nmemb   |      base |   patched | diff
            32|      6251 |      6047 | -3.26
          4096|   1959306 |   1695241 | -13.48
         32768|  17204840 |  16430388 | -4.50
        524288| 342716199 | 329496913 | -3.86

    Checked on x86_64-linux-gnu.

        * stdlib/qsort.c: Move common code to stdlib/qsort_common.c
        and parametrize the function definition based wether to use
        the '_r' variant.
        * stdlib/qsort_common.c: New file.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=fd3d9f81068303bb501cccddc80037d4df8c78b7

commit fd3d9f81068303bb501cccddc80037d4df8c78b7
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Tue Jan 16 11:19:15 2018 -0200

    stdlib: Optimization qsort{_r} swap implementation

    This patchs adds a optimized swap operation on qsort based in previous
    msort one.  Instead of byte operation, three variants are provided:

      1. Using uint32_t loads and stores.
      2. Using uint64_t loads and stores.
      3. Generic one with a temporary buffer and memcpy/mempcpy.

    The 1. and 2. option are selected only either if architecture defines
    _STRING_ARCH_unaligned or if base pointer is aligned to required type.
    This is due based on data for bench-qsort, usually programs calls
    qsort with array with multiple of machine word as element size.

    Benchmarking shows an increase performance:

    Results for member size 4
      Sorted
      nmemb   |      base |   patched | diff
            32|      1401 |      1958 | 39.76
          4096|    351333 |    368533 | 4.90
         32768|   3369386 |   3131712 | -7.05
        524288|  63192972 |  59807494 | -5.36

      MostlySorted
      nmemb   |      base |   patched | diff
            32|      2391 |      2061 | -13.80
          4096|   1124074 |    961816 | -14.43
         32768|  11196607 |   9410438 | -15.95
        524288| 215908169 | 185586732 | -14.04

      Unsorted
      nmemb   |      base |   patched | diff
            32|      4993 |      2021 | -59.52
          4096|   1113860 |    963126 | -13.53
         32768|  11251293 |   9518795 | -15.40
        524288| 217252237 | 185072278 | -14.81

    Results for member size 8
      Sorted
      nmemb   |      base |   patched | diff
            32|      1296 |      1267 | -2.24
          4096|    359418 |    334852 | -6.83
         32768|   3535229 |   3345157 | -5.38
        524288|  69847251 |  67029358 | -4.03

      MostlySorted
      nmemb   |      base |   patched | diff
            32|      2745 |      2340 | -14.75
          4096|   1222082 |   1014314 | -17.00
         32768|  12244800 |   9924706 | -18.95
        524288| 241557971 | 196898760 | -18.49

      Unsorted
      nmemb   |      base |   patched | diff
            32|      2972 |      2389 | -19.62
          4096|   1314861 |   1024052 | -22.12
         32768|  12397909 |  10120848 | -18.37
        524288| 241789262 | 193414824 | -20.01

    Results for member size 32
      Sorted
      nmemb   |      base |   patched | diff
            32|      1305 |      1287 | -1.38
          4096|    346332 |    347979 | 0.48
         32768|   3458244 |   3408058 | -1.45
        524288|  72793445 |  69973719 | -3.87

      MostlySorted
      nmemb   |      base |   patched | diff
            32|      5435 |      4890 | -10.03
          4096|   2032260 |   1688556 | -16.91
         32768|  19909035 |  16419992 | -17.52
        524288| 390339319 | 325921585 | -16.50

      Unsorted
      nmemb   |      base |   patched | diff
            32|      5833 |      5351 | -8.26
          4096|   2022531 |   1724961 | -14.71
         32768|  19842888 |  16588545 | -16.40
        524288| 388838382 | 324102703 | -16.65

    Checked on x86_64-linux-gnu.

        [BZ #19305].
        * stdlib/qsort.c (SWAP): Remove.
        (check_alignment, swap_u32, swap_u64, swap_generic,
        select_swap_func): New functions.
        (__qsort_r):

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=400d842ed9e6a13f616ff12fa7a2400c20495c3c

commit 400d842ed9e6a13f616ff12fa7a2400c20495c3c
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Tue Jan 16 10:49:43 2018 -0200

    stdlib: Remove use of mergesort on qsort

    This patch removes the mergesort optimization on qsort{_r} implementation
    and use the quicksort instead.  The mergesort implementation has some
    issues:

      - It is as-safe only for certain types sizes (if total size is less
        than 1 KB with large element sizes also forcing memory allocation)
        which contradicts the function documentation.  Although not required
        by the C standard, it is preferable and doable to have a O(1) space
        qsort implementation.

      - The malloc for certain element size and element number adds arbitrary
        latency (might even be worse if malloc is interposed).

      - To avoid trigger swap from memory allocation the implementation relies
        on system information that might be virtualized (for instance VMs with
        overcommit memory) which might leads to potentially use of swap even
        if system advertise more memory than actually has.  The check also have
        the downside of issuing syscalls where none is expected (although only
        once per execution).

      - The mergesort is suboptimal on already sorted array (BZ#21719).

    The quicksort implementation is already optimized to use constant extra
    space (due the limit of total number of elements from maximum VM size)
    and thus can be used to avoid the malloc usage issues.

    Using bench-qsort (i7-4790K, gcc 7.2.1) shows the performance difference
    between mergesort (base) and quicksort (patched):

    Results for member size 4
      Sorted
      nmemb   |      base |   patched | diff
            32|      1447 |      1401 | -3.18
          4096|    315978 |    351333 | 11.19
         32768|   2559093 |   3369386 | 31.66
        524288|  46228488 |  63192972 | 36.70

      MostlySorted
      nmemb   |      base |   patched | diff
            32|      1974 |      2391 | 21.12
          4096|    922332 |   1124074 | 21.87
         32768|   9268671 |  11196607 | 20.80
        524288| 186856297 | 215908169 | 15.55

      Unsorted
      nmemb   |      base |   patched | diff
            32|      1978 |      4993 | 152.43
          4096|    916413 |   1113860 | 21.55
         32768|   9270003 |  11251293 | 21.37
        524288| 187606088 | 217252237 | 15.80

    Results for member size 8
      Sorted
      nmemb   |      base |   patched | diff
            32|      1424 |      1296 | -8.99
          4096|    299105 |    359418 | 20.16
         32768|   2737859 |   3535229 | 29.12
        524288|  53082807 |  69847251 | 31.58

      MostlySorted
      nmemb   |      base |   patched | diff
            32|      2129 |      2745 | 28.93
          4096|    969465 |   1222082 | 26.06
         32768|   9605227 |  12244800 | 27.48
        524288| 193353927 | 241557971 | 24.93

      Unsorted
      nmemb   |      base |   patched | diff
            32|      2194 |      2972 | 35.46
          4096|    958610 |   1314861 | 37.16
         32768|   9664246 |  12397909 | 28.29
        524288| 193758429 | 241789262 | 24.79

    Results for member size 32
      Sorted
      nmemb   |      base |   patched | diff
            32|      4477 |      1305 | -70.85
          4096|   1109492 |    346332 | -68.78
         32768|  11075976 |   3458244 | -68.78
        524288| 230773658 |  72793445 | -68.46

      MostlySorted
      nmemb   |      base |   patched | diff
            32|      5905 |      5435 | -7.96
          4096|   2568895 |   2032260 | -20.89
         32768|  24936755 |  19909035 | -20.16
        524288| 526442900 | 390339319 | -25.85

      Unsorted
      nmemb   |      base |   patched | diff
            32|      6004 |      5833 | -2.85
          4096|   2437943 |   2022531 | -17.04
         32768|  24789971 |  19842888 | -19.96
        524288| 525898556 | 388838382 | -26.06

    An increase in latency, however some performance difference is due the fact
    mergesort uses a slight improved swap operation than quicksort (which
following
    patch addresses).  This change also renders the BZ #21719 fix unrequired
(since
    it is to fix the sorted input performance degradation for mergesort).  The
    manual is also updated to indicate the function is not async-cancel safe.

    Checked on x86_64-linux-gnu.

        [BZ #21719]
        * stdlib/Makefile (routines): Remove msort.
        (CFLAGS-msort.c): Remove rule.
        * stdlib/msort.c: Remove file.
        * stdlib/qsort.c (_quicksort): Rename to __qsort_r and add weak_alias
        to qsort_r.
        (qsort): New symbol.
        * manual/argp.texi: Remove qsort @acu* annotation.
        * manual/locale.texi: Likewise.
        * manual/search.texi: Likewise.

-----------------------------------------------------------------------

-- 
You are receiving this mail because:
You are on the CC list for the bug.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]