
Re: [PR18457] Don't require rtld lock to compute DTV addr for static TLS


On Jun  5, 2015, Torvald Riegel <triegel@redhat.com> wrote:

>> Is the l_tls_offset field the data you're talking about?

> Yes, in this case.

> (There may be other data that is initialized once inside the critical
> section and then accessed outside.  Accesses to those data can use
> nonatomic accesses if data like l_tls_offset makes them
> data-race-free.)

Ok, this means we can take them out of the analysis and focus on
l_tls_offset alone.  Let's do that, shall we?

Consider this:

#include <pthread.h>
#include <stdbool.h>

enum spin { NEGATIVE = -1, UNKNOWN = 0, POSITIVE = 1 };

/* The global lock that guards transitions out of UNKNOWN.  */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *init (void) {
  static enum spin electron;
  pthread_mutex_lock (&lock);
  electron = UNKNOWN;
  pthread_mutex_unlock (&lock);
  return &electron;
}

bool make_positive (void *x) {
  enum spin *p = x;
  pthread_mutex_lock (&lock);
  if (*p == UNKNOWN)
    *p = POSITIVE;              /* UNKNOWN -> POSITIVE, at most once.  */
  pthread_mutex_unlock (&lock);
  return *p == POSITIVE;
}

bool positive_p (void *x) {
  enum spin *p = x;
  /* As written, this takes the lock even when *p is already POSITIVE.  */
  if (*p != NEGATIVE) {
    pthread_mutex_lock (&lock);
    if (*p == UNKNOWN)
      *p = NEGATIVE;            /* UNKNOWN -> NEGATIVE, at most once.  */
    pthread_mutex_unlock (&lock);
  }

  return *p == POSITIVE;
}

The program has to call init to get the pointer, so that happens before
any uses of the pointer.  From then on, any threads may call
make_positive or positive_p, however many times they wish, passing them
the pointer returned by init.  init should not be called again.  There
is no other data associated with the electron or with the global lock
that guards it.  The enum type is opaque, not visible to
callers, and its alignment and size ensure it will always be loaded and
stored atomically.

This is supposed to implement a state machine in which the spin starts
as UNKNOWN, but once it is determined to be POSITIVE or NEGATIVE, it
won't ever change again.  make_positive will turn UNKNOWN spins into
POSITIVE, whereas positive_p will turn them into NEGATIVE.
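
For instance, a minimal driver following those rules might look like
this (the thread bodies and names are made up purely for illustration,
and the three functions above are assumed to be in scope):

#include <pthread.h>
#include <stdio.h>

static void *resolver (void *x) { make_positive (x); return 0; }
static void *reader (void *x) { printf ("%d\n", (int) positive_p (x)); return 0; }

int main (void) {
  void *x = init ();            /* called once, before any other use */
  pthread_t t1, t2;
  pthread_create (&t1, 0, resolver, x);
  pthread_create (&t2, 0, reader, x);
  pthread_join (t1, 0);
  pthread_join (t2, 0);
  return 0;
}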

All of this is analogous to the code we have in place right now.

The change I'm proposing is an optimization to positive_p, which changes
the following line:

  if (*p != NEGATIVE) {

to:

  if (*p == UNKNOWN) {

because we don't have to take the lock when *p has already advanced to
POSITIVE.
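
Spelled out, the optimized positive_p would read (same lock and types
as above):

bool positive_p (void *x) {
  enum spin *p = x;
  if (*p == UNKNOWN) {          /* POSITIVE and NEGATIVE now both skip the lock.  */
    pthread_mutex_lock (&lock);
    if (*p == UNKNOWN)
      *p = NEGATIVE;
    pthread_mutex_unlock (&lock);
  }

  return *p == POSITIVE;
}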

I have two questions for you about this proposed change:

Is it really that hard to see that the change is correct, safe, and a
strict performance improvement?

What is the *minimum* set of atomic loads and stores that we have to
introduce, replacing unadorned loads and stores, to make the whole thing
compliant with our standards, should this be GNU libc code?
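
(For concreteness, here is one possible shape of an answer, sketched as
a self-contained variant using C11 <stdatomic.h> rather than glibc's
internal atomic macros; the memory orders chosen below are illustrative
assumptions, not a claim of minimality:)

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

enum spin { NEGATIVE = -1, UNKNOWN = 0, POSITIVE = 1 };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static _Atomic enum spin electron;      /* UNKNOWN (0) initially */

bool positive_p_c11 (void) {
  /* The unlocked reads race with stores made inside other threads'
     critical sections, so they must at least be relaxed atomic loads;
     whether they need acquire MO depends on whether data published
     together with the state is later read without holding the lock.  */
  if (atomic_load_explicit (&electron, memory_order_relaxed) == UNKNOWN) {
    pthread_mutex_lock (&lock);
    if (atomic_load_explicit (&electron, memory_order_relaxed) == UNKNOWN)
      /* Also read outside the lock, so the store is atomic as well; it
         is kept relaxed here for illustration, and whether it needs
         release MO is part of the question above.  */
      atomic_store_explicit (&electron, NEGATIVE, memory_order_relaxed);
    pthread_mutex_unlock (&lock);
  }
  return atomic_load_explicit (&electron, memory_order_relaxed) == POSITIVE;
}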

>> > One thing to note is that acquire loads synchronize with release stores
>> > (or stronger) on the *same* memory location.  An acquire load does not
>> > synchronize with operations on a lock, unless the acquire load peeks
>> > into the lock and does an acquire load on the lock's state or such.
>> 
>> > Therefore, when you think about which effect an acquire load has,
>> > consider which release store you are actually thinking about.  An
>> > acquire operation does not have an effect on it's own, only in
>> > combination with other effects in the program.  This is also why we want
>> > to document which release store an acquire load is supposed to
>> > synchronize with.
>> 
>> > Thus, which release store are you thinking about in this case?
>> 
>> Nothing but l_tls_offset.

> I haven't seen a release-MO atomic store to l_tls_offset yet.

Of course, there isn't any ATM.  What I'm saying is that l_tls_offset is
the only memory location that's relevant in this situation.

The part of the dynamic loader that loads the module that defines the
TLS variable is init, the part of the dynamic loader that resolves a TLS
relocation in a way that requires the variable to be in static TLS is
make_positive, and the part of tls_get_addr that I'm modifying in the
patch is positive_p.  Any static TLS offset maps to POSITIVE,
FORCED_DYNAMIC maps to NEGATIVE, and NO_TLS_OFFSET maps to UNKNOWN.
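
(A rough sketch of that mapping, with placeholder values; the real
macros are per-architecture and live in glibc's internal headers, and
the helper below is made up purely for illustration:)

#include <stdbool.h>
#include <stddef.h>

#define NO_TLS_OFFSET              0     /* plays the role of UNKNOWN  */
#define FORCED_DYNAMIC_TLS_OFFSET (-1)   /* plays the role of NEGATIVE */

struct toy_link_map { ptrdiff_t l_tls_offset; };  /* stand-in for the real link_map */

/* Any other value is an actual static TLS offset, i.e. POSITIVE.  */
static bool tls_offset_is_static (const struct toy_link_map *map) {
  return map->l_tls_offset != NO_TLS_OFFSET
         && map->l_tls_offset != FORCED_DYNAMIC_TLS_OFFSET;
}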

> Nor a release fence.

The release fences are the unlock operations after each and every one of
the stores.
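
(Schematically, inside make_positive from the example above; the fence
is implicit in the unlock, not extra code:)

  pthread_mutex_lock (&lock);
  if (*p == UNKNOWN)
    *p = POSITIVE;
  /* The store is sequenced before the unlock, and the unlock acts as a
     release operation on the lock; whether that pairs with anything an
     unlocked reader of *p does is the point under discussion.  */
  pthread_mutex_unlock (&lock);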

>> >> Now, in order for any such access to take place, some relocation applied
>> >> by A must be seen by the observing thread, and if there isn't some
>> >> sequencing event that ensures the dlopen (or initial load) enclosing A
>> >> happens-before the use of the relocation, the whole thing is undefined;
>> >> otherwise, this sequencing event ought to be enough of a memory barrier
>> >> to guarantee the whole thing works.  It's just that the sequencing event
>> >> is not provided by the TLS machinery itself, but rather by the user, in
>> >> sequencing events after the dlopen, by the init code, in sequencing the
>> >> initial loading and relocation before any application code execution, or
>> >> by the thread library, sequencing any thread started by module
>> >> initializers after their relocation.

>> > If that's the situation in the static case

>> The paragraph quoted above applies to both cases.

> Huh?  Why would you then need the double-checked locking at all?

Because it's required to ensure no two threads take it upon themselves
to decide whether l_tls_offset will be static or dynamic.  The paragraph
above, about other data such as the TLS initializer block, applies to
both cases.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

