This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



Re: pthread_mutex_unlock takes 15x longer than pthread_mutex_lock


I am really interested in this.


Have you looked at the code generated by the compiler for unlocking?
Is this slowdown reproducible across different platforms? Some
differential analysis might be interesting, to see whether the problem
exists on only one platform, which would suggest that it is really
hardware-dependent, as other emails have hypothesized.
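
As a concrete starting point for that cross-platform comparison, a harness
along these lines could be built with "gcc -O2 -pthread" (plus -lrt on older
glibc for clock_gettime) and run on each machine. It is a rough sketch I have
not run on the setup described below, and the clock_gettime() calls
themselves cost a few tens of nanoseconds, so the absolute numbers are only
indicative:

    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    #define ITERS 1000000

    /* Monotonic wall-clock time in nanoseconds. */
    static inline uint64_t now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
    }

    int main(void)
    {
        pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
        uint64_t lock_ns = 0, unlock_ns = 0;
        uint64_t t0, t1, t2;
        int i;

        for (i = 0; i < ITERS; i++) {
            t0 = now_ns();
            pthread_mutex_lock(&m);     /* single thread, so never contended */
            t1 = now_ns();
            pthread_mutex_unlock(&m);
            t2 = now_ns();
            lock_ns   += t1 - t0;
            unlock_ns += t2 - t1;
        }

        printf("avg lock:   %.1f ns\n", (double)lock_ns / ITERS);
        printf("avg unlock: %.1f ns\n", (double)unlock_ns / ITERS);
        return 0;
    }

If the ~15x gap shows up on several different machines (and not, say, only
inside the VM), that would argue against it being hardware-specific.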

Sorry to top post, but I thought that this didn't really fit anywhere cleanly.

Will

On Mon, Oct 2, 2017 at 4:28 PM, Akash Baid <akashbaid@gmail.com> wrote:
> Thanks for the insights!
>
> I tested with pthread_spin_lock and also with reducing the critical section
> to a single counter increment, but the results did not change at all. See
> the extracted results from a 10-second VTune snapshot below. I am starting
> to wonder whether the difference between the unlock and lock calls can
> somehow be explained by the interaction of the stack and the function calls
> into the dynamic library.
>
> By the way, I am using the default mutex, which is defined
> as PTHREAD_MUTEX_NORMAL, and this is what I use to initialize them:
>
>     pthread_spin_init(&m_spinLock, PTHREAD_PROCESS_PRIVATE);
>     pthread_mutex_init(&m_mutexLock, NULL);
>
>
> ---------------------------------------------------------
> Line    Source                               Total time
> ---------------------------------------------------------
> 379     pthread_spin_lock(&m_spinLock);      0.052s
> 380     i = 1000;
> 381     pthread_spin_unlock(&m_spinLock);    0.592s
> 382     i = 2000;
> 383     pthread_mutex_lock(&m_mutexLock);    0.037s
> 384     i = 3000;
> 385     pthread_mutex_unlock(&m_mutexLock);  0.615s
> 386     i = 4000;                            0.001s
>
>
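One way to separate the cost of the call into the shared library from the
cost of the atomic instruction itself would be a sketch like the one below.
It is untested, the inline __atomic_exchange_n is only a stand-in (the exact
locked instruction lll_unlock uses depends on the glibc version), and both
numbers include the overhead of the surrounding clock_gettime() calls. Still,
if the bare inline exchange also costs hundreds of nanoseconds, the
PLT / dynamic-library call path is not the explanation.

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    #define ITERS 1000000

    /* Monotonic wall-clock time in nanoseconds. */
    static inline uint64_t now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
    }

    int main(void)
    {
        volatile int word = 0;               /* stand-in for the lock word */
        uint64_t xchg_ns = 0, store_ns = 0;
        uint64_t t0, t1;
        int i;

        for (i = 0; i < ITERS; i++) {
            /* A lock-prefixed exchange, the same family of instruction
               that the unlock fast path executes. */
            t0 = now_ns();
            __atomic_exchange_n(&word, 0, __ATOMIC_SEQ_CST);
            t1 = now_ns();
            xchg_ns += t1 - t0;

            /* An ordinary store, for comparison. */
            t0 = now_ns();
            word = 1;
            t1 = now_ns();
            store_ns += t1 - t0;
        }

        printf("inline atomic exchange: %.1f ns\n", (double)xchg_ns / ITERS);
        printf("plain store:            %.1f ns\n", (double)store_ns / ITERS);
        return 0;
    }
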
> On Mon, Oct 2, 2017 at 3:07 AM, Ralph Loader <ralph1loader@gmail.com> wrote:
>
>> Hi,
>>
>> At least when there is no contention on the lock, the important part of
>> both the lock and unlock on x86 is an atomic operation to update the lock
>> state.
>>
>> The time that an atomic operation takes on x86 is highly variable. I
>> believe that this is because pending data in the processor's write buffers
>> is flushed into the memory hierarchy by the atomic operation.
>>
>> I am guessing that your application does a memcpy() or similar of the
>> packet before the unlock, and flushing that out of the write buffers is
>> causing the timing you observe.
>>
>> (Incidentally, I learnt about this stuff because I too observed very similar
>> behavior in a packet processing application.)
>>
>> One possible mitigation is to use a pthread spinlock instead of a pthread
>> mutex; the former is released (on x86) by a normal write, not an atomic op.
>> This will make the unlock faster (but possibly take longer for the unlock
>> to be observed by other threads).
>>
>> Cheers,
>> Ralph.
>>
>>
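Ralph's write-buffer explanation can also be probed directly. The sketch
below (my own, untested; the 1500-byte "packet" size is made up) times the
same lock-prefixed exchange with and without a preceding memcpy(), plus a
spinlock-style release store after the copy. If the hypothesis holds, only
the middle number should blow up.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    #define ITERS   1000000
    #define PKT_LEN 1500                     /* made-up "packet" size */

    static unsigned char src[PKT_LEN], dst[PKT_LEN];

    /* Monotonic wall-clock time in nanoseconds. */
    static inline uint64_t now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
    }

    int main(void)
    {
        volatile int lock_word = 1;
        uint64_t bare_ns = 0, copy_ns = 0, store_ns = 0;
        uint64_t t0, t1;
        int i;

        for (i = 0; i < ITERS; i++) {
            /* 1. Atomic exchange with (mostly) empty store buffers. */
            t0 = now_ns();
            __atomic_exchange_n(&lock_word, 0, __ATOMIC_SEQ_CST);
            t1 = now_ns();
            bare_ns += t1 - t0;

            /* 2. The same exchange right after copying a "packet"
                  (Ralph's hypothesis: the locked instruction must wait
                  for those buffered stores to drain). */
            memcpy(dst, src, PKT_LEN);
            t0 = now_ns();
            __atomic_exchange_n(&lock_word, 0, __ATOMIC_SEQ_CST);
            t1 = now_ns();
            copy_ns += t1 - t0;

            /* 3. Spinlock-style release after the copy: a plain store
                  with release ordering, no lock prefix on x86. */
            memcpy(dst, src, PKT_LEN);
            t0 = now_ns();
            __atomic_store_n(&lock_word, 0, __ATOMIC_RELEASE);
            t1 = now_ns();
            store_ns += t1 - t0;
        }

        printf("xchg, nothing pending:    %.1f ns\n", (double)bare_ns / ITERS);
        printf("xchg after memcpy:        %.1f ns\n", (double)copy_ns / ITERS);
        printf("release store, memcpy:    %.1f ns\n", (double)store_ns / ITERS);
        return 0;
    }
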
>> On Mon, Oct 2, 2017 at 9:20 AM, Akash Baid <akashbaid@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a packet processing application which takes and releases a
>>> pthread_mutex_lock for every packet (around 200k times a second). When
>>> profiling the performance through Intel VTune, I noticed that the unlock
>>> operation takes about 370ns compared to only 25ns for the lock operation.
>>> The numbers are averaged over millions of operations and are independent
>>> of the size of the critical section.
>>>
>>> I suspected that lock contention was playing a role here, and so I
>>> ensured that no other thread accesses this lock, but the numbers still
>>> hover around the same mark. Any ideas as to whether this is expected
>>> behavior and, if not, what might be going on? My understanding was that
>>> the unlock operation under no contention should be free of system calls
>>> and extremely fast. In fact, single-stepping through the instructions
>>> suggests that there aren't any system calls, but maybe the LOCK-prefixed
>>> instruction inside the low-level lock routine lll_unlock is taking long
>>> in the unlock case for some reason?
>>>
>>> The application uses the libpthread dynamic shared library, glibc
>>> version 2.13, on an x86_64 GNU/Linux VM on a machine with an Intel Xeon
>>> E5 2.5 GHz processor.
>>>
>>
>>
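
On the "no system calls when uncontended" point: the actual lll_lock /
lll_unlock code is glibc-version-specific assembly, but the classic
futex-based design it follows (described in Ulrich Drepper's "Futexes Are
Tricky" paper) looks roughly like the toy, Linux-only sketch below. The thing
to notice is that the uncontended unlock fast path is still one LOCK-prefixed
read-modify-write; the futex syscall is only reached when a waiter has marked
the lock contended. This is a schematic of that design, not glibc's code.

    #include <linux/futex.h>
    #include <pthread.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Toy mutex state: 0 = unlocked, 1 = locked, 2 = locked with waiters. */

    static long sys_futex(int *uaddr, int op, int val)
    {
        return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
    }

    /* Compare-and-swap that returns the old value of *p. */
    static int cas(int *p, int oldv, int newv)
    {
        int expected = oldv;
        __atomic_compare_exchange_n(p, &expected, newv, 0,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        return expected;
    }

    static void toy_lock(int *m)
    {
        int c = cas(m, 0, 1);            /* fast path: one LOCK cmpxchg */
        if (c == 0)
            return;                      /* uncontended: no syscall     */
        do {
            /* Mark the lock contended, then sleep until it is released. */
            if (c == 2 || cas(m, 1, 2) != 0)
                sys_futex(m, FUTEX_WAIT_PRIVATE, 2);
        } while ((c = cas(m, 0, 2)) != 0);
    }

    static void toy_unlock(int *m)
    {
        /* Fast path: one LOCK decrement; an old value of 1 means nobody
           is waiting, so no syscall is made. */
        if (__atomic_fetch_sub(m, 1, __ATOMIC_SEQ_CST) != 1) {
            __atomic_store_n(m, 0, __ATOMIC_SEQ_CST);
            sys_futex(m, FUTEX_WAKE_PRIVATE, 1);
        }
    }

    /* Quick self-check: two threads hammering the toy mutex. */
    static int  toy_m   = 0;
    static long counter = 0;

    static void *worker(void *arg)
    {
        int i;
        (void)arg;
        for (i = 0; i < 1000000; i++) {
            toy_lock(&toy_m);
            counter++;
            toy_unlock(&toy_m);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld (expected 2000000)\n", counter);
        return 0;
    }

Even in the uncontended case, that one locked instruction still has
full-barrier semantics on x86, which is where Ralph's write-buffer point
comes back in.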

