This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH RFC] introduce dl_iterate_phdr_parallel



On 02/08/2016 07:46, Gleb Natapov wrote:
> On Mon, Aug 01, 2016 at 05:23:47PM -0300, Adhemerval Zanella wrote:
>>>> Now, I see we have two options here:
>>>>
>>>> 1. Work on your proposal for dl_iterate_phdr_parallel and *document* that
>>>>    current dl_iterate_phdr serializes callback execution and programs that
>>>>    would like to scale its execution should move to dl_iterate_phdr_parallel.
>>>>    We would need to further document that the scalability relies on an
>>>>    internal array of 64 locks, and that depending on the number of threads
>>>>    and processors this limit can be hit.
>>> The 64 locks are an implementation detail. It could be changed to a lock
>>> per thread, where a writer has to lock the thread list plus all the
>>> per-thread locks.
>>
>> My point here is more to document the current behaviour and performance
>> assumptions than to actually discuss the choice of 64 for the number of
>> locks.
>>
> OK.
> 
>>>
>>>>
>>>> 2. Another option is to push a cleanup of the dl_iterate_phdr interface to
>>>>    *not* require callback serialization and to use a rdlock while accessing
>>>>    the maps list.  With the current rwlock implementation performance won't
>>>>    change, as you noticed; however, since we are reworking it into a more
>>>>    scalable one [1], the read-only path should be *much* better:
>>>>    __pthread_rwlock_rdlock_full should issue just an
>>>>    atomic_fetch_add_acquire for the uncontended read case.
>>> I saw the new rwlock implementation and tested it. It is much better than
>>> the current rwlock, but still almost twice as slow as multiple locks.
>>> Profiling shows that the exchange in unlock takes a lot of CPU. No wonder,
>>> since a contended locking operation is very expensive. IMO the ideal
>>> solution is an array of rwlocks.
>>
>> I would expect so: since the unlock phase of the new algorithm issues an
>> 'atomic_compare_exchange_weak_release' in a loop, it will potentially be
>> slower than just a 'lll_unlock' (which issues a lock decl on x86_64).
>>
>> Performance-wise, an array of rwlocks should be the best approach.  What I
>> am not comfortable with is adding another glibc-specific interface just to
>> work around a bad interface specification.
>>
> I proposed the new interface as an RFC to be on the safe side; I am not
> against fixing the existing one.  What are the consequences of modifying the
> current behaviour, though?  Old libgcc will stop working with a new glibc,
> but as far as I understand that can be fixed with symbol versioning (how?).
> What about other applications, if any?
> 
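The array-of-locks scheme discussed above can be sketched roughly as follows. All names here are illustrative, not actual glibc code: each reader hashes its thread to one slot and takes only that lock, while a writer (dlopen/dlclose updating the maps list) must take every slot.

```c
#include <pthread.h>
#include <stdint.h>

#define NLOCKS 64  /* the count discussed above; purely an implementation detail */

static pthread_mutex_t slot_locks[NLOCKS];

static void locks_init(void)
{
    for (int i = 0; i < NLOCKS; i++)
        pthread_mutex_init(&slot_locks[i], NULL);
}

/* Reader side: hash the calling thread to one slot, so uncontended
   readers rarely share a lock (or its cache line) with each other.  */
static pthread_mutex_t *reader_slot(void)
{
    uintptr_t h = (uintptr_t) pthread_self();  /* cheap per-thread hash */
    return &slot_locks[(h >> 4) % NLOCKS];
}

/* Writer side: take every slot in a fixed order, excluding all
   readers; release in reverse order.  */
static void writer_lock_all(void)
{
    for (int i = 0; i < NLOCKS; i++)
        pthread_mutex_lock(&slot_locks[i]);
}

static void writer_unlock_all(void)
{
    for (int i = NLOCKS - 1; i >= 0; i--)
        pthread_mutex_unlock(&slot_locks[i]);
}
```

The fixed locking order on the writer side avoids deadlock between concurrent writers; readers pay only one uncontended lock/unlock in the common case, which is why this scales better than a single contended rwlock.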

First we need to decide whether we should change the dl_iterate_phdr
interface to remove the requirement that the callback runs mutually
exclusively.  If the idea is to keep the current behaviour, I think we need
to make this behaviour explicit in the documentation and to check whether the
lock-array mechanism can be added without introducing a parallel version.

If we decide that serializing the callback calls is not meant to be an
interface requirement, we can add a new symbol version (2.25, for instance)
that does not have this behaviour while still providing an older one (2.2.5)
that keeps the locking.  The main problem is that building an old libstdc++
against a newer glibc might not work, so I am not sure this is the best
course of action.
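The versioning route would follow glibc's usual shlib-compat.h pattern. Very roughly, and with purely illustrative internal names (this fragment is a sketch, not compilable source, and 2.25 stands in for whichever release actually makes the change):

```c
/* Illustrative sketch only -- not actual glibc source.  Two internal
   implementations, one per ABI contract:

     __dl_iterate_phdr_nolock   -- new: callbacks may run concurrently
     __dl_iterate_phdr_locked   -- old: callbacks serialized

   New links bind to dl_iterate_phdr@@GLIBC_2.25 ...  */
versioned_symbol (libc, __dl_iterate_phdr_nolock, dl_iterate_phdr, GLIBC_2_25);

/* ... while binaries linked against an older glibc keep resolving
   dl_iterate_phdr@GLIBC_2.2.5 and thus keep the serialized behaviour.  */
compat_symbol (libc, __dl_iterate_phdr_locked, dl_iterate_phdr, GLIBC_2_2_5);
```

This keeps already-linked binaries (old libgcc included) on the serialized semantics, but it does not help the rebuild-old-libstdc++-against-new-glibc case mentioned above, since a fresh link binds to the default version.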

