This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug libc/19305] qsort() should return early if (nmemb <= 1)
- From: "cvs-commit at gcc dot gnu.org" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Tue, 23 Jan 2018 18:47:08 +0000
- Subject: [Bug libc/19305] qsort() should return early if (nmemb <= 1)
- Auto-submitted: auto-generated
- References: <bug-19305-131@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=19305
--- Comment #3 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".
The branch, azanella/qsort-refactor has been created
at ce67791dececc810a701814eea472d086f624ab4 (commit)
- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ce67791dececc810a701814eea472d086f624ab4
commit ce67791dececc810a701814eea472d086f624ab4
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Tue Jan 16 14:24:53 2018 -0200
stdlib: Remove undefined behavior from qsort implementation
Internally qsort is implemented on top of __qsort_r by casting the
function pointer to another type (__compar_fn_t tp __compar_d_fn_t)
and passing a NULL extra argument. Casting function pointer with
different types for subsequent function call is undefined-behaviour
(C11 6.3.2.3):
"[8] A pointer to a function of one type may be converted to a pointer
to a function of another type and back again; the result shall compare
equal to the original pointer. If a converted pointer is used to call
a function whose type is not compatible with the referenced type,
the behavior is undefined."
Also 'compatible' in this case also does not apply according to
6.7.6.3 Function declarators (including prototypes):
"[15] For two function types to be compatible, both shall specify
compatible return types. (146) Moreover, the parameter type lists,
if both are present, shall agree in the number of parameters and
in use of the ellipsis terminator; corresponding parameters shall
have compatible types. [...]"
Although this works on all architectures glibc supports (mostly because
it casts function pointers with similar calling conventions), I think
it is worth to avoid it. This patch fixes it by adding a common
implementation (qsort_common.c) which redefines the function based
on the required types.
For x86_64 (i7-4790K, gcc 7.2.1) shows a slight better performance
for qsort:
Results for member size 4
Sorted
nmemb | base | patched | diff
32| 1304 | 1257 | -3.60
4096| 330707 | 302235 | -8.61
32768| 3300210 | 3020728 | -8.47
524288| 65673289 | 59306436 | -9.69
Repeated
nmemb | base | patched | diff
32| 1885 | 1873 | -0.64
4096| 951490 | 904864 | -4.90
32768| 9272366 | 8542801 | -7.87
524288| 183337854 | 168426795 | -8.13
MostlySorted
nmemb | base | patched | diff
32| 1836 | 1776 | -3.27
4096| 758359 | 709937 | -6.39
32768| 7199982 | 6855890 | -4.78
524288| 139242170 | 129385161 | -7.08
Unsorted
nmemb | base | patched | diff
32| 2073 | 1941 | -6.37
4096| 1058383 | 969021 | -8.44
32768| 10310116 | 9462116 | -8.22
524288| 202427388 | 186560908 | -7.84
Results for member size 8
Sorted
nmemb | base | patched | diff
32| 1224 | 1205 | -1.55
4096| 336100 | 325554 | -3.14
32768| 3539890 | 3264125 | -7.79
524288| 67268510 | 66107684 | -1.73
Repeated
nmemb | base | patched | diff
32| 2096 | 2118 | 1.05
4096| 1015585 | 979114 | -3.59
32768| 9871981 | 9028606 | -8.54
524288| 189710172 | 174903867 | -7.80
MostlySorted
nmemb | base | patched | diff
32| 2318 | 2346 | 1.21
4096| 805051 | 759158 | -5.70
32768| 8346363 | 7810444 | -6.42
524288| 143597264 | 135900146 | -5.36
Unsorted
nmemb | base | patched | diff
32| 2364 | 2301 | -2.66
4096| 1076998 | 1014018 | -5.85
32768| 10442153 | 9888078 | -5.31
524288| 206235337 | 192479957 | -6.67
Results for member size 32
Sorted
nmemb | base | patched | diff
32| 1214 | 1184 | -2.47
4096| 332449 | 325865 | -1.98
32768| 3313274 | 3331750 | 0.56
524288| 70786673 | 69067176 | -2.43
Repeated
nmemb | base | patched | diff
32| 4913 | 4813 | -2.04
4096| 1693735 | 1624137 | -4.11
32768| 17054760 | 15896739 | -6.79
524288| 332149265 | 316328778 | -4.76
MostlySorted
nmemb | base | patched | diff
32| 5490 | 5332 | -2.88
4096| 1394312 | 1312703 | -5.85
32768| 12743599 | 12360726 | -3.00
524288| 240249011 | 231603294 | -3.60
Unsorted
nmemb | base | patched | diff
32| 6251 | 6047 | -3.26
4096| 1959306 | 1695241 | -13.48
32768| 17204840 | 16430388 | -4.50
524288| 342716199 | 329496913 | -3.86
Checked on x86_64-linux-gnu.
* stdlib/qsort.c: Move common code to stdlib/qsort_common.c
and parametrize the function definition based wether to use
the '_r' variant.
* stdlib/qsort_common.c: New file.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=fd3d9f81068303bb501cccddc80037d4df8c78b7
commit fd3d9f81068303bb501cccddc80037d4df8c78b7
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Tue Jan 16 11:19:15 2018 -0200
stdlib: Optimization qsort{_r} swap implementation
This patchs adds a optimized swap operation on qsort based in previous
msort one. Instead of byte operation, three variants are provided:
1. Using uint32_t loads and stores.
2. Using uint64_t loads and stores.
3. Generic one with a temporary buffer and memcpy/mempcpy.
The 1. and 2. option are selected only either if architecture defines
_STRING_ARCH_unaligned or if base pointer is aligned to required type.
This is due based on data for bench-qsort, usually programs calls
qsort with array with multiple of machine word as element size.
Benchmarking shows an increase performance:
Results for member size 4
Sorted
nmemb | base | patched | diff
32| 1401 | 1958 | 39.76
4096| 351333 | 368533 | 4.90
32768| 3369386 | 3131712 | -7.05
524288| 63192972 | 59807494 | -5.36
MostlySorted
nmemb | base | patched | diff
32| 2391 | 2061 | -13.80
4096| 1124074 | 961816 | -14.43
32768| 11196607 | 9410438 | -15.95
524288| 215908169 | 185586732 | -14.04
Unsorted
nmemb | base | patched | diff
32| 4993 | 2021 | -59.52
4096| 1113860 | 963126 | -13.53
32768| 11251293 | 9518795 | -15.40
524288| 217252237 | 185072278 | -14.81
Results for member size 8
Sorted
nmemb | base | patched | diff
32| 1296 | 1267 | -2.24
4096| 359418 | 334852 | -6.83
32768| 3535229 | 3345157 | -5.38
524288| 69847251 | 67029358 | -4.03
MostlySorted
nmemb | base | patched | diff
32| 2745 | 2340 | -14.75
4096| 1222082 | 1014314 | -17.00
32768| 12244800 | 9924706 | -18.95
524288| 241557971 | 196898760 | -18.49
Unsorted
nmemb | base | patched | diff
32| 2972 | 2389 | -19.62
4096| 1314861 | 1024052 | -22.12
32768| 12397909 | 10120848 | -18.37
524288| 241789262 | 193414824 | -20.01
Results for member size 32
Sorted
nmemb | base | patched | diff
32| 1305 | 1287 | -1.38
4096| 346332 | 347979 | 0.48
32768| 3458244 | 3408058 | -1.45
524288| 72793445 | 69973719 | -3.87
MostlySorted
nmemb | base | patched | diff
32| 5435 | 4890 | -10.03
4096| 2032260 | 1688556 | -16.91
32768| 19909035 | 16419992 | -17.52
524288| 390339319 | 325921585 | -16.50
Unsorted
nmemb | base | patched | diff
32| 5833 | 5351 | -8.26
4096| 2022531 | 1724961 | -14.71
32768| 19842888 | 16588545 | -16.40
524288| 388838382 | 324102703 | -16.65
Checked on x86_64-linux-gnu.
[BZ #19305].
* stdlib/qsort.c (SWAP): Remove.
(check_alignment, swap_u32, swap_u64, swap_generic,
select_swap_func): New functions.
(__qsort_r):
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=400d842ed9e6a13f616ff12fa7a2400c20495c3c
commit 400d842ed9e6a13f616ff12fa7a2400c20495c3c
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Tue Jan 16 10:49:43 2018 -0200
stdlib: Remove use of mergesort on qsort
This patch removes the mergesort optimization on qsort{_r} implementation
and use the quicksort instead. The mergesort implementation has some
issues:
- It is as-safe only for certain types sizes (if total size is less
than 1 KB with large element sizes also forcing memory allocation)
which contradicts the function documentation. Although not required
by the C standard, it is preferable and doable to have a O(1) space
qsort implementation.
- The malloc for certain element size and element number adds arbitrary
latency (might even be worse if malloc is interposed).
- To avoid trigger swap from memory allocation the implementation relies
on system information that might be virtualized (for instance VMs with
overcommit memory) which might leads to potentially use of swap even
if system advertise more memory than actually has. The check also have
the downside of issuing syscalls where none is expected (although only
once per execution).
- The mergesort is suboptimal on already sorted array (BZ#21719).
The quicksort implementation is already optimized to use constant extra
space (due the limit of total number of elements from maximum VM size)
and thus can be used to avoid the malloc usage issues.
Using bench-qsort (i7-4790K, gcc 7.2.1) shows the performance difference
between mergesort (base) and quicksort (patched):
Results for member size 4
Sorted
nmemb | base | patched | diff
32| 1447 | 1401 | -3.18
4096| 315978 | 351333 | 11.19
32768| 2559093 | 3369386 | 31.66
524288| 46228488 | 63192972 | 36.70
MostlySorted
nmemb | base | patched | diff
32| 1974 | 2391 | 21.12
4096| 922332 | 1124074 | 21.87
32768| 9268671 | 11196607 | 20.80
524288| 186856297 | 215908169 | 15.55
Unsorted
nmemb | base | patched | diff
32| 1978 | 4993 | 152.43
4096| 916413 | 1113860 | 21.55
32768| 9270003 | 11251293 | 21.37
524288| 187606088 | 217252237 | 15.80
Results for member size 8
Sorted
nmemb | base | patched | diff
32| 1424 | 1296 | -8.99
4096| 299105 | 359418 | 20.16
32768| 2737859 | 3535229 | 29.12
524288| 53082807 | 69847251 | 31.58
MostlySorted
nmemb | base | patched | diff
32| 2129 | 2745 | 28.93
4096| 969465 | 1222082 | 26.06
32768| 9605227 | 12244800 | 27.48
524288| 193353927 | 241557971 | 24.93
Unsorted
nmemb | base | patched | diff
32| 2194 | 2972 | 35.46
4096| 958610 | 1314861 | 37.16
32768| 9664246 | 12397909 | 28.29
524288| 193758429 | 241789262 | 24.79
Results for member size 32
Sorted
nmemb | base | patched | diff
32| 4477 | 1305 | -70.85
4096| 1109492 | 346332 | -68.78
32768| 11075976 | 3458244 | -68.78
524288| 230773658 | 72793445 | -68.46
MostlySorted
nmemb | base | patched | diff
32| 5905 | 5435 | -7.96
4096| 2568895 | 2032260 | -20.89
32768| 24936755 | 19909035 | -20.16
524288| 526442900 | 390339319 | -25.85
Unsorted
nmemb | base | patched | diff
32| 6004 | 5833 | -2.85
4096| 2437943 | 2022531 | -17.04
32768| 24789971 | 19842888 | -19.96
524288| 525898556 | 388838382 | -26.06
An increase in latency, however some performance difference is due the fact
mergesort uses a slight improved swap operation than quicksort (which
following
patch addresses). This change also renders the BZ #21719 fix unrequired
(since
it is to fix the sorted input performance degradation for mergesort). The
manual is also updated to indicate the function is not async-cancel safe.
Checked on x86_64-linux-gnu.
[BZ #21719]
* stdlib/Makefile (routines): Remove msort.
(CFLAGS-msort.c): Remove rule.
* stdlib/msort.c: Remove file.
* stdlib/qsort.c (_quicksort): Rename to __qsort_r and add weak_alias
to qsort_r.
(qsort): New symbol.
* manual/argp.texi: Remove qsort @acu* annotation.
* manual/locale.texi: Likewise.
* manual/search.texi: Likewise.
-----------------------------------------------------------------------
--
You are receiving this mail because:
You are on the CC list for the bug.