This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Unify pthread_once (bug 15215)


On Thu, 2014-04-10 at 17:38 -0400, Carlos O'Donell wrote:
> On 04/07/2014 08:37 AM, Torvald Riegel wrote:
> > On Fri, 2014-03-28 at 19:29 -0400, Carlos O'Donell wrote:
> >> David, Marcus, Joseph, Mike, Andreas, Steve, Chris,
> >>
> >> We would like to unify all C-based pthread_once implmentations
> >> per the plan in bug 15215 for glibc 2.20.
> >>
> >> Your machines are on the list of C-based pthread_once implementations.
> >>
> >> See this for the intial discussions on the unified pthread_once:
> >> https://sourceware.org/ml/libc-alpha/2013-05/msg00210.html
> >>
> >> The goal is to provide a single and correct C implementation of 
> >> pthread_once. Architectures can then build on that if they need more 
> >> optimal implementations, but I don't encourage that and I'd rather
> >> see deep discussions on how to make one unified solution where
> >> possible.
> >>
> >> I've also just reviewed Torvald's new pthread_once microbenchmark which
> >> you can use to compare your previous C implementation with the new
> >> standard C implementation (measures pthread_once latency). The primary
> >> use of this test is to help provide objective proof for or against the
> >> i386 and x86_64 assembly implementations.
> >>
> >> We are not presently converting any of the machines with custom
> >> implementations, but that will be a next step after testing with the
> >> help of the maintainers for sh, i386, x86_64, powerpc, s390 and alpha.
> >>
> >> If we don't hear any objections we will go forward with this change
> >> in one week and unify ia64, hppa, mips, tile, sparc, m68k, arm
> >> and aarch64 on a single pthread_once implementation based on sparc's C
> >> implementation.
> 
> This version looks good to me.
> 
> Please check it in after fixing the one nit where you needed double 
> space after a period.

Committed.

[...]

> * Send another notification to the maintainers about the change.
>   - This gives them another chance to look at the benchmark numbers.
> * Work with any arch maintainers to look at performance losses.

Please get in touch with me if you feel your arch has been affected in a
negative way.

> >>> +   When forking the process, some threads can be interrupted during the second
> >>> +   state; they won't be present in the forked child, so we need to restart
> >>> +   initialization in the child.  To distinguish an in-progress initialization
> >>> +   from an interrupted initialization (in which case we need to reclaim the
> >>> +   lock), we look at the fork generation that's part of the second state: We
> >>> +   can reclaim iff it differs from the current fork generation.
> >>> +   XXX: This algorithm has an ABA issue on the fork generation: If an
> >>> +   initialization is interrupted, we then fork 2^30 times (30b of once_control
> >>
> >> What's "30b?" 30 bits? Please spell it out.
> >>
> >>> +   are used for the fork generation), and try to initialize again, we can
> >>> +   deadlock because we can't distinguish the in-progress and interrupted cases
> >>> +   anymore.  */
> >>
> >> Would you mind filing a bug for this in the upstream bugzilla?
> > 
> > https://sourceware.org/bugzilla/show_bug.cgi?id=16816
> > 
> >> It's a distinct bug from this unification work, but a valid problem.
> >>
> >> Can this be fixed by detecting generation counter overflow in fork
> >> and failing the function call?
> > 
> > Yes, but this would prevent us from doing more than 2^30 fork calls.
> > That may not be a problem in practice -- but if so, then we won't hit
> > the ABA either :)
> 
> It's probably not a problem, because 2^30 forks of even a 1MB process
> is going to need 1 Petabyte or more of memory/swap, but still...

The forked processes don't need to be still running, right?  So the
overall time necessary to do the forks might be a more useful bound.

> A security issue is introduced here in that early corruption of the fork
> generation counter could lead to deadlock. We close that window slightly
> by doing a sanity check on the generation counter to detect overflow.
> It doesn't fix all cases, but it means you can't easily corrupt the gen
> counter early and then wait for the fork to deadlock. You now need to
> corrupt the fork generation counter after the check which is a smaller
> window.

I'm not sure this would really help.  The assert would terminate the
program, the deadlock would prevent forward progress.  Neither option is
always better than the other.  The deadlock on pthread_once might
actually be better.

There are other options to avoid the ABA issue, but all come with a bit
of complexity:
* Don't allow concurrent fork with pthread_once.  This is equivalent to
a reader/writer lock scheme with pthread_once being the readers, and
writers having priority.  The readers can either increment one global
var, or have per-thread flags that a fork scans.
* Block fork before overflow until no pthread_once is active. A
variation of the previous option.
* Rely on getting 64b mutexes eventually.  Those would be useful for the
condvars too.

> Either way I think an assert on overflow in fork.c is needed, but that's
> another fix that I expect you to submit after this one. Note that the
> implementation is in: nptl/sysdeps/unix/sysv/linux/fork.c, and the
> limit of 2^30 forks only applies to applications linked against libpthread
> which provides a strong definition of fork that overrides libc's weak
> definition (which does a lot less). In the dynamic case libpthread's
> version of fork is used because it is loaded first since it depends on libc
> (remember that weak/strong are not applied to dynamic libraries per ELF
> rules).

I'd wait with this until we have consensus what to do precisely (see
above).

> >>> +	    return 0;
> >>> +
> >>> +	  oldval = val;
> >>> +	  /* We try to set the state to in-progress and having the current
> >>> +	     fork generation.  We don't need atomic accesses for the fork
> >>> +	     generation because it's immutable in a particular process, and
> >>> +	     forked child processes start with a single thread that modified
> >>> +	     the generation.  */
> >>> +	  newval = __fork_generation | 1;
> >>
> >> OT: I wonder if Valgrind will report a benign race in accessing __fork_generation.
> > 
> > Perhaps.  I believe that eventually, lots of this and similar variables
> > should be atomic-typed and/or accessed with relaxed-memory-order atomic
> > loads.  This would clarify that we expect concurrent accesses and that
> > they don't constitute a data race.
> 
> I don't see how Valgrind would know this from the binary itself, but
> I guess this will just need to have per-glibc-version exceptions for
> Valgrind.

If Valgrind wants to be useful for any program that uses atomics to
synchronize (in contrast to just using pthread mutexes, for example), it
will have to somehow get aware of language-level synchronization (e.g.,
atomics).  IOW, I don't think this is a glibc-specific problem (and
would thus justify per-glibc-version exceptions).

> Your ChangeLog still needs to follow the normal format, including
> header line with date and name, blank line, and tab before text on
> lines thereafter.

Yeah, that was just a formatting issue when pasting to email.

> Don't forget to update NEWS

I don't think #15215 is fixed already.  (Hint: parts c) and d) need
review :)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]