[PATCH v2 3/4] Move libio lock single-thread optimization to generic libc-lock (BZ #27842)

Mon May 16 16:17:11 GMT 2022

Hi Adhemerval,

> The main problem is _IO_enable_locks is a clunky interface because it requires
> flockfile to set _flags2 outside a lock region leading a possible racy issue 
> (BZ #27842).  Moving to lock itself it will pretty much:

It should be fine if we use a boolean instead of a flag. IIRC the IO structure was
externally exposed in Emacs, but if that is no longer the case then we could
change the structure safely.

A quick check of rand() shows the following results for various locks and optimizations
(time relative to the unlocked case, so higher is slower):

100% - rand() without locking
317% - standard lock (= current code)
108% - add SINGLE_THREAD_P optimization inside rand
112% - single-thread optimization in __libc_lock_lock
373% - standard recursive lock
129% - recursive lock with single thread optimization

Locks are expensive, so adding a single-thread optimization is important!
It looks like recursive locks remain expensive (~20% slower, 129% vs 108%) compared
to the specialized single-thread optimization, but normal locks might be fast enough
in most cases.
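
To make the single-thread fast path concrete, here is a minimal sketch of the idea.
It is not the glibc code: demo_lock_t, demo_lock, demo_unlock and the
demo_single_threaded flag are hypothetical names, the flag only stands in for
SINGLE_THREAD_P, and the contended path spins instead of using a futex.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for SINGLE_THREAD_P: the program is assumed to
   clear it before creating its first thread.  */
static bool demo_single_threaded = true;

typedef struct { atomic_int word; } demo_lock_t;   /* 0 = free, 1 = held */

static inline void
demo_lock (demo_lock_t *l)
{
  if (demo_single_threaded)
    {
      /* No other thread exists, so a relaxed store replaces the atomic
         read-modify-write.  The same word is still updated, so the lock
         state stays valid if a thread is created while it is held.  */
      atomic_store_explicit (&l->word, 1, memory_order_relaxed);
      return;
    }
  int expected = 0;
  while (!atomic_compare_exchange_weak_explicit (&l->word, &expected, 1,
                                                 memory_order_acquire,
                                                 memory_order_relaxed))
    {
      expected = 0;     /* A failed CAS overwrites expected; reset it.  */
      sched_yield ();   /* Simplification: a real lock would block.  */
    }
}

static inline void
demo_unlock (demo_lock_t *l)
{
  /* Unlocking a lock that is not held is a caller bug.  */
  assert (atomic_load_explicit (&l->word, memory_order_relaxed) == 1);
  if (demo_single_threaded)
    atomic_store_explicit (&l->word, 0, memory_order_relaxed);
  else
    atomic_store_explicit (&l->word, 0, memory_order_release);
}
```

The point of the sketch is only that the single-threaded path avoids the atomic
read-modify-write; how close it gets to the numbers above is not something it shows.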

> I think ideally we would like to model all internal locks to a futex-like
> so we can use the congestion optimization as described by Jens Gustedt
> paper [1] which would allows us to move the counter and the lock to
> same word.  I don't think we can improve recursive locks without a 64-bit
> futex operation. 

How much lock recursion exists in GLIBC? An 8-bit counter is likely sufficient...
With a 64-bit lock you could combine owner and count into a single value -
this may not help the multi-threaded case much, but could reduce the overhead
of the single-thread case compared to non-recursive locks.
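
As a rough illustration of combining owner and count, the sketch below packs both
into one 64-bit word. The layout (owner TID in the low 32 bits, an 8-bit count
above it) and all demo_* names are assumptions made up for this example, and it
only implements trylock/unlock without any waiter handling or blocking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: bits 0-31 = owner TID (0 means unowned),
   bits 32-39 = recursion count.  */
#define DEMO_COUNT_SHIFT 32
#define DEMO_COUNT_ONE   (UINT64_C (1) << DEMO_COUNT_SHIFT)
#define DEMO_COUNT_MAX   0xffu

typedef struct { _Atomic uint64_t word; } demo_rlock_t;

static inline uint32_t
demo_owner (uint64_t w)
{
  return (uint32_t) w;
}

static inline unsigned
demo_count (uint64_t w)
{
  return (unsigned) (w >> DEMO_COUNT_SHIFT) & DEMO_COUNT_MAX;
}

/* Returns 1 if the lock was taken (or re-taken by its owner), 0 if another
   thread holds it.  SELF must be nonzero.  */
static inline int
demo_rlock_trylock (demo_rlock_t *l, uint32_t self)
{
  uint64_t w = atomic_load_explicit (&l->word, memory_order_relaxed);
  if (demo_owner (w) == self)
    {
      assert (demo_count (w) < DEMO_COUNT_MAX);   /* 8-bit counter limit.  */
      /* In this sketch only the owner writes an owned word, so recursion
         costs one load and one store, with no atomic read-modify-write.  */
      atomic_store_explicit (&l->word, w + DEMO_COUNT_ONE,
                             memory_order_relaxed);
      return 1;
    }
  if (w != 0)
    return 0;
  return atomic_compare_exchange_strong_explicit (&l->word, &w,
                                                  DEMO_COUNT_ONE | self,
                                                  memory_order_acquire,
                                                  memory_order_relaxed);
}

static inline void
demo_rlock_unlock (demo_rlock_t *l, uint32_t self)
{
  uint64_t w = atomic_load_explicit (&l->word, memory_order_relaxed);
  assert (demo_owner (w) == self && demo_count (w) > 0);
  w -= DEMO_COUNT_ONE;
  if (demo_count (w) == 0)
    w = 0;              /* Last unlock also clears the owner.  */
  atomic_store_explicit (&l->word, w, memory_order_release);
}
```

Owner check and count update both work on the single value that was loaded, which
is the overhead reduction meant above; a real lock would also need a way to
record and wake waiters.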

Cheers,
Wilco