This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH RFC] introduce dl_iterate_phdr_parallel
- From: Torvald Riegel <triegel at redhat dot com>
- To: Florian Weimer <fweimer at redhat dot com>
- Cc: Gleb Natapov <gleb at scylladb dot com>, Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, libc-alpha at sourceware dot org
- Date: Wed, 03 Aug 2016 18:12:18 +0200
- Subject: Re: [PATCH RFC] introduce dl_iterate_phdr_parallel
- References: <20160725142326.GM1018@scylladb.com> <579A6F54.2080709@linaro.org> <20160731091642.GF2502@scylladb.com> <579F8FA8.9060009@linaro.org> <20160801184946.GL17903@scylladb.com> <1470080795.19224.101.camel@localhost.localdomain> <113b9545-292b-e089-c00c-072da711c7ec@redhat.com>
On Wed, 2016-08-03 at 12:53 +0200, Florian Weimer wrote:
> On 08/01/2016 09:46 PM, Torvald Riegel wrote:
> > The new rwlock is built so that it supports process-shared usage, which
> > means that we have to put everything into struct pthread_rwlock_t. This
> > will lead to contention if you rdlock it frequently from many threads.
> > There is potential for tuning there because we haven't looked closely at
> > adding back-off in the CAS loop (and if you tested on an arch without
> > direct HW support for fetch-add, the CAS loop used instead of that might
> > also be suboptimal).
>
> The rwlock doesn't eliminate the contention at the hardware level.
It does not, of course, as I wrote. But there are ways to decrease the
contention (e.g., by proper back-off), and these have not been added yet.
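
To illustrate what back-off in a CAS loop can look like, here is a minimal
sketch in C11. It is not the glibc rwlock code: the counter `readers`, the
helper `backoff_pause`, and the cap of 1024 spin iterations are all
hypothetical names and values chosen for the example.

```c
#include <stdatomic.h>

/* Hypothetical reader counter; NOT the layout of glibc's pthread_rwlock_t. */
static atomic_uint readers;

/* Spin for roughly n iterations without touching shared memory. */
static void backoff_pause(unsigned n)
{
    for (volatile unsigned i = 0; i < n; i++)
        ;
}

/* Register one reader using a CAS loop. After each failed CAS the thread
   waits, doubling the wait up to an (assumed) cap, so that contending
   threads hit the shared cache line less often.
   Returns the number of failed CAS attempts. */
static unsigned reader_enter(void)
{
    unsigned delay = 1, failures = 0;
    unsigned old = atomic_load_explicit(&readers, memory_order_relaxed);

    /* On failure, `old` is refreshed with the current value. */
    while (!atomic_compare_exchange_weak_explicit(&readers, &old, old + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        failures++;
        backoff_pause(delay);
        if (delay < 1024)
            delay *= 2;
    }
    return failures;
}

/* Unregister one reader. */
static void reader_exit(void)
{
    atomic_fetch_sub_explicit(&readers, 1, memory_order_release);
}
```

Suitable back-off parameters depend on the hardware and the workload, so the
numbers above would need tuning.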
> If that causes a performance issue, we could reuse Ingo Molnar's brlock
> approach: per-thread, readers acquire their own lock, writers acquire
> the locks of all threads. This is fairly efficient in the read case
> (and I suspect you can't get much better than that in a non-managed
> runtime), but the write case is obviously extremely costly. This could be
> the right trade-off here, though.
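
The per-thread scheme described in the quoted text can be sketched roughly as
follows. This is an illustration under simplifying assumptions, not the
kernel's brlock and not proposed glibc code: it uses a fixed number of slots
(`BR_SLOTS`) instead of one lock per thread, simple spinning instead of a
futex-based lock, and an assumed 64-byte cache line. All names are
hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define BR_SLOTS 4  /* hypothetical fixed number of reader slots */

/* One lock word per slot, aligned so that slots do not share a cache line
   (64 bytes is an assumption about the target CPU). */
struct br_slot {
    _Alignas(64) atomic_bool held;
};

struct brlock {
    struct br_slot slot[BR_SLOTS];
};

static void brlock_init(struct brlock *l)
{
    for (int i = 0; i < BR_SLOTS; i++)
        atomic_init(&l->slot[i].held, false);
}

/* Try to take one slot; returns true on success. */
static bool slot_trylock(struct br_slot *s)
{
    return !atomic_exchange_explicit(&s->held, true, memory_order_acquire);
}

static void slot_lock(struct br_slot *s)
{
    while (!slot_trylock(s))
        ;  /* spin; a real lock would block instead */
}

static void slot_unlock(struct br_slot *s)
{
    atomic_store_explicit(&s->held, false, memory_order_release);
}

/* Reader: touch only the caller's own slot, so readers in different
   slots do not contend with each other. */
static void brlock_rdlock(struct brlock *l, unsigned slot)
{
    slot_lock(&l->slot[slot % BR_SLOTS]);
}

static void brlock_rdunlock(struct brlock *l, unsigned slot)
{
    slot_unlock(&l->slot[slot % BR_SLOTS]);
}

/* Writer: take every slot in a fixed order (avoids deadlock between
   writers). This is the costly path mentioned in the text. */
static void brlock_wrlock(struct brlock *l)
{
    for (int i = 0; i < BR_SLOTS; i++)
        slot_lock(&l->slot[i]);
}

static void brlock_wrunlock(struct brlock *l)
{
    for (int i = BR_SLOTS - 1; i >= 0; i--)
        slot_unlock(&l->slot[i]);
}
```

The cost asymmetry is visible in the code: a reader performs one atomic
exchange on its own cache line, while a writer performs one per slot.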
You can do better, for example if all that the rdlock critical sections
do is snapshot data for which the underlying memory will not be
unmapped. That's why I asked about the details of the synchronization
problem :)
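
One well-known technique that fits the "snapshot" case is a sequence lock,
where readers never write to shared memory and simply retry if a writer was
active. Whether this is the approach meant above is an assumption; the sketch
below only shows the general idea. The struct `snap`, its fields `base` and
`size`, and the single-writer assumption are all hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical snapshot record. The data fields are atomics accessed with
   relaxed ordering so that concurrent reads and writes are not data races. */
struct snap {
    atomic_uint seq;        /* even: stable, odd: write in progress */
    atomic_uintptr_t base;
    atomic_uintptr_t size;
};

static void snap_init(struct snap *s)
{
    atomic_init(&s->seq, 0);
    atomic_init(&s->base, 0);
    atomic_init(&s->size, 0);
}

/* Writer side. Assumes writers are serialized externally (for example by
   a mutex that is not shown here). */
static void snap_write(struct snap *s, uintptr_t base, uintptr_t size)
{
    unsigned q = atomic_load_explicit(&s->seq, memory_order_relaxed);

    atomic_store_explicit(&s->seq, q + 1, memory_order_relaxed); /* odd */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->base, base, memory_order_relaxed);
    atomic_store_explicit(&s->size, size, memory_order_relaxed);
    atomic_store_explicit(&s->seq, q + 2, memory_order_release); /* even */
}

/* Reader side: copy the fields, then retry if a write overlapped.
   Readers only load, so they do not contend with one another. */
static void snap_read(struct snap *s, uintptr_t *base, uintptr_t *size)
{
    unsigned before, after;

    do {
        before = atomic_load_explicit(&s->seq, memory_order_acquire);
        *base = atomic_load_explicit(&s->base, memory_order_relaxed);
        *size = atomic_load_explicit(&s->size, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        after = atomic_load_explicit(&s->seq, memory_order_relaxed);
    } while ((before & 1) || before != after);
}
```

This only works when the snapshotted memory stays mapped while readers may
still be looking at it, which is the condition stated above.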