This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 5/6][BZ #11588] x86_64: Remove assembly implementations for pthread_cond_*


On Tue, 2014-07-29 at 19:31 -0500, gratian.crisan@ni.com wrote: 
> From: Gratian Crisan <gratian.crisan@ni.com>
> 
> Switch x86_64 from using assembly implementations for pthread_cond_signal,
> pthread_cond_broadcast, pthread_cond_wait, and pthread_cond_timedwait to
> using the generic C implementation. Based on benchmark results (see below)
> the C implementation is comparable in performance, easier to maintain, less
> bug prone, and supports priority inheritance for associated mutexes.
> Note: the bench-pthread_cond output was edited to fit within 80 columns by
> removing some white space and the 'variance' column.
> 
> C implementation, quad core Intel(R) Xeon(R) CPU E5-1620 @3.60GHz, gcc 4.7.3
> pthread_cond_[test]     iter/threads   mean       min    max        std. dev
> ----------------------------------------------------------------------------
> signal (w/o waiters)    1000000/100    93.002     57     6519657    2679.6
> broadcast (w/o waiters) 1000000/100    96.6929    57     10231506   2996.06
> signal                  1000000/1      2833.97    532    92328      1348.39
> broadcast               1000000/1      3317.85    704    172804     1108.65
> signal/wait             100000/100     7726.83    3388   23269308   22286.5
> signal/timedwait        100000/100     8148.47    3888   23172368   18712.9
> broadcast/wait          100000/100     7895.33    3888   14886020   14894.2
> broadcast/timedwait     100000/100     8362.07    3924   18439204   19950.1
> 
> Assembly implementation, quad core, Intel(R) Xeon(R) CPU E5-1620 @ 3.60GHz
> pthread_cond_[test]     iter/threads   mean       min    max        std. dev
> ----------------------------------------------------------------------------
> signal (w/o waiters)    1000000/100    94.1301    57     69489528   8016.01
> broadcast (w/o waiters) 1000000/100    104.562    57     300175497  39393.4
> signal                  1000000/1      2868.11    510    157149     1363.98
> broadcast               1000000/1      3057.23    688    180376     1192.49
> signal/wait             100000/100     7676.12    3340   24017028   20393.1
> signal/timedwait        100000/100     8157.42    3856   28700448   22368
> broadcast/wait          100000/100     7871.86    3648   27913676   21203.7
> broadcast/timedwait     100000/100     8300.47    4188   27813444   24769.8
> 
> C implementation, dual core Intel(R) Atom(TM) CPU E3825 @ 1.33GHz, gcc 4.7.3
> pthread_cond_[test]     iter/threads   mean       min    max        std. dev
> ----------------------------------------------------------------------------
> signal (w/o waiters)    1000000/100    95.077     90     28960      33.3326
> broadcast (w/o waiters) 1000000/100    114.874    90     13820      78.6426
> signal                  1000000/1      6704.17    3510   49390      3537.21
> broadcast               1000000/1      6726.35    3850   55430      3297.21
> signal/wait             100000/100     16888.2    12240  6682020    15045.4
> signal/timedwait        100000/100     19246.6    13560  6874950    15969.5
> broadcast/wait          100000/100     17228.5    12390  6461480    14780.2
> broadcast/timedwait     100000/100     19414.5    13910  6656950    15681.8
> 
> Assembly implementation, dual core Intel(R) Atom(TM) CPU E3825 @ 1.33GHz
> pthread_cond_[test]     iter/threads   mean       min    max        std. dev
> ----------------------------------------------------------------------------
> signal (w/o waiters)    1000000/100    263.81     70     120171680  90138
> broadcast (w/o waiters) 1000000/100    264.213    70     160178010  91861.4
> signal                  1000000/1      15851.7    3800   13372770   13889
> broadcast               1000000/1      16095.2    5900   14940170   16346.7
> signal/wait             100000/100     33151      7930   252746080  475402
> signal/timedwait        100000/100     34921.1    10950  147023040  270191
> broadcast/wait          100000/100     33400.2    11810  247194720  455105
> broadcast/timedwait     100000/100     35022.1    13610  161552720  30328

It seems the assembly implementation (or at least the runs where you
used it) suffers from very large delays that look like outliers; the max
is several orders of magnitude higher.  This seems to be the case on the
Xeon too, to some extent.
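
To put a rough number on that, here is a minimal sketch that computes
max/mean for two of the Atom "signal (w/o waiters)" rows quoted above.
The struct, the variable names, and the helper max_over_mean() are
hypothetical and only for illustration; they are not part of
bench-pthread_cond.  The figures are copied from the tables as posted.

```c
#include <assert.h>

/* One row of the quoted benchmark output.  Hypothetical type for
   illustration; values are in whatever time unit the benchmark reports.  */
struct bench_row
{
  const char *name;
  double mean;
  double max;
};

/* Atom E3825, "signal (w/o waiters)", copied from the tables above.  */
static const struct bench_row atom_c_signal =
  { "C, signal (w/o waiters)", 95.077, 28960.0 };
static const struct bench_row atom_asm_signal =
  { "asm, signal (w/o waiters)", 263.81, 120171680.0 };

/* Ratio of the slowest sample to the mean.  A large ratio means a few
   slow samples sit far away from the typical case.  */
static double
max_over_mean (const struct bench_row *r)
{
  assert (r->mean > 0.0);
  return r->max / r->mean;
}
```

Calling max_over_mean (&atom_c_signal) and
max_over_mean (&atom_asm_signal) gives a ratio in the hundreds for the C
run and in the hundreds of thousands for the assembly run.  This only
restates the posted numbers; it says nothing about where the delays come
from.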

