This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug nptl/13065] New: Race condition in pthread barriers


http://sourceware.org/bugzilla/show_bug.cgi?id=13065

           Summary: Race condition in pthread barriers
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
        AssignedTo: drepper.fsp@gmail.com
        ReportedBy: bugdal@aerifal.cx


The glibc/NPTL implementation of pthread barriers has a race condition whereby
threads exiting the barrier may access memory belonging to the barrier after
one or more of the pthread_barrier_wait calls have returned. At this point, per
POSIX, the barrier is supposed to be in "the state it had as a result of the
most recent pthread_barrier_init() function that referenced it." In particular,
it's valid to call pthread_barrier_destroy on the barrier then re-initialize it
with a new value, or to free/unmap the memory it's located in. The latter
operation would especially make sense for a process-shared barrier where the
caller is done using the barrier but other processes may continue to use it.

See the attachment for a proof-of-concept that causes NPTL's
pthread_barrier_wait to crash. The usage is not quite "correct" (the barrier
should be destroyed before the memory is unmapped, and only one thread should
destroy it), but these issues could easily be fixed by throwing in a mutex; the
code is simply kept as simple as possible to demonstrate the bug.

Michael Burr proposed a solution
(http://stackoverflow.com/questions/5886614/how-can-barriers-be-destroyable-as-soon-as-pthread-barrier-wait-returns/5902671#5902671)
to this problem, which we have successfully incorporated into musl:

http://git.etalabs.net/cgi-bin/gitweb.cgi?p=musl;a=commitdiff;h=f16a3089be33a75ef8e75b2dd5ec3095996bbb87;hp=202911435b56fe007ca62fc6e573fa3ea238d337

However, it only works for non-process-shared barriers, as it requires all
waiters to be able to access the first waiter's address space. I am not aware
of any fix for process-shared barriers that does not involve allocating shared
resources at pthread_barrier_wait time, which could of course fail and leave
the caller with no way to recover... I suspect fixing this robustly may require
adding a FUTEX_BARRIER operation to the kernel that does not return success
until "val" processes all call FUTEX_BARRIER on the same futex address.

Note that, unfortunately, process-shared barriers are the area where this bug
has the greatest chance of hitting real-world applications, since a process is
likely to unmap the barrier soon after it's done using it.


