This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH RFC] introduce dl_iterate_phdr_parallel


On Mon, Aug 01, 2016 at 10:19:55PM +0200, Torvald Riegel wrote:
> On Mon, 2016-08-01 at 23:07 +0300, Gleb Natapov wrote:
> > On Mon, Aug 01, 2016 at 09:46:35PM +0200, Torvald Riegel wrote:
> > > If we built something custom for this and are willing to make the
> > > wrlock / exclusive-access case much more costly, we can decrease this
> > > overhead.  This could be roughly similar to one lock per thread or a set
> > > of rwlocks as you mentioned, but with less space overhead.
> > > 
> > IMO space overhead is negligible. More efficient rwlock is, of course,
> > better and can be useful in many more places. If you have something to
> > test I am willing to do so, but if custom rwlock will take time to
> > materialize we may start with lock array and change it later. The lock
> > array is not part of the interface, but implementation detail. What I
> > would like to avoid is stalling the effort while waiting for something
> > better. Exception scalability is a very pressing issue for us.
> 
> I haven't looked at the workload in detail, and so can't say off-hand
> what would be best and how long it would take.  Is there a summary
> describing the synchronization that is necessary?  Once I know that, I
> can give a closer estimate.  It could be simple to do, so it might be
> worth having a look before trying the (rw?)lock-array approach.
> 
rwlock semantics are pretty much what is needed. There is a list of
loaded objects that changes rarely (only during dlopen/dlclose), but it
is read on every exception. Exceptions on different threads are
independent, so they should not interfere with each other. Having only
one lock (even an rwlock) means that all threads access the same atomic
variable for synchronisation, which has a lot of overhead on modern
CPUs. The lock-array approach allows multiple threads to proceed
completely independently, without any inter-core communication at all.
That is hard to beat. The new rwlock is substantially slower than the
lock array even on only 4 cores (although it is much better than the
current rwlock, since it avoids entering the kernel if there are only
readers).

> The POSIX rwlock's complexity is partly due to having to support the
> different modes (eg, reader preference), so a custom rwlock for a
> specific use case could be much simpler.  I would guess that it looks
> more like reference counting, or if the snapshots of the readers are
> suitable, it could also be seqlock based (and thus require no memory
> modification by readers at all).
> 
I evaluated the seqlock approach, but snapshotting the object list
requires a pretty deep copy (for the exception handling case), and the
data accessed varies for each user of dl_iterate_phdr (elements of the
list point into mmapped parts of .so files).
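
For reference, a generic seqlock reader looks roughly like the sketch
below. This is only an illustration with a hypothetical two-word
payload (struct snapshot, cur_base, cur_size); it assumes writers are
already serialized by some other lock, and it is not taken from glibc.
The point is that the reader has to copy everything it will use before
it can validate the sequence number, which is cheap for two words but
not for a list whose elements point into mapped files.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical payload: what a reader wants a consistent copy of. */
struct snapshot {
    unsigned long base;
    unsigned long size;
};

static atomic_uint seq;                          /* odd while a write is in progress */
static _Atomic unsigned long cur_base, cur_size; /* the protected data */

/* Writer: assumed to be serialized externally (single writer at a time). */
void seq_write(unsigned long base, unsigned long size)
{
    unsigned s = atomic_load_explicit(&seq, memory_order_relaxed);
    atomic_store_explicit(&seq, s + 1, memory_order_relaxed);  /* mark busy */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&cur_base, base, memory_order_relaxed);
    atomic_store_explicit(&cur_size, size, memory_order_relaxed);
    atomic_store_explicit(&seq, s + 2, memory_order_release);  /* publish */
}

/* Reader: writes no shared memory; copies the data and retries if a
   writer was active or the sequence number changed during the copy. */
struct snapshot seq_read(void)
{
    struct snapshot snap;
    unsigned s0, s1;
    do {
        s0 = atomic_load_explicit(&seq, memory_order_acquire);
        snap.base = atomic_load_explicit(&cur_base, memory_order_relaxed);
        snap.size = atomic_load_explicit(&cur_size, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s1 = atomic_load_explicit(&seq, memory_order_relaxed);
    } while ((s0 & 1) || s0 != s1);
    return snap;
}
```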

--
			Gleb.

