This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PR18457] Don't require rtld lock to compute DTV addr for static TLS

From: Alexandre Oliva <aoliva at redhat dot com>
To: Torvald Riegel <triegel at redhat dot com>
Cc: Andreas Schwab <schwab at linux-m68k dot org>, libc-alpha at sourceware dot org
Date: Tue, 09 Jun 2015 00:01:22 -0300
Subject: Re: [PR18457] Don't require rtld lock to compute DTV addr for static TLS
Authentication-results: sourceware.org; auth=none
References: <orvbf5ffyt dot fsf at livre dot home> <1433326788 dot 21461 dot 81 dot camel at triegel dot csb> <ora8whexn9 dot fsf at livre dot home> <1433344426 dot 21461 dot 202 dot camel at triegel dot csb> <orwpzkee1h dot fsf at livre dot home> <1433419889 dot 21461 dot 294 dot camel at triegel dot csb> <ork2vidb0s dot fsf at livre dot home> <1433500798 dot 21461 dot 332 dot camel at triegel dot csb> <orvbf2aw5r dot fsf at livre dot home> <1433709547 dot 21461 dot 410 dot camel at triegel dot csb>

On Jun  7, 2015, Torvald Riegel <triegel@redhat.com> wrote:

> OK.  init sets the value unconditionally, which seems either
> counterproductive, or init is called exactly once and that call happens
> before any use by another thread.

The latter, as I wrote.

> If the latter (and that seems to be
> the case as you indicate below):

And so you noticed.

> (1) the critical section is not required

Yes, it is.

> (2) we should use an atomic access for clarity of code

No opposition to that.

> OK.  Then this isn't really double-checked locking,

Well, you used this term for this pattern back when we discussed
lock-bypassing in stream orientation.  I'm trying hard to overcome my
different background and use your terminology, but this doesn't make it
any easier :-(

> If all one wants is
> single-memory-word consensus,

That's not all we want.  We also want to give the dynamic loader
priority in assigning a module to static TLS, while it's loading a set
of modules.  HW CAS atomic insns don't give us that AFAIK.

Also, we want to make sure we wait till the dynamic loader is done with
defining the TLS variable before we access it.  It might be the case
that some module's initializer recursively dlopens additional modules
(yuck), or start a thread that attempts to access the variable.  We want
to make those accesses wait for the loader to complete its job before
they get a chance to make the variable dynamic, that other modules being
loaded might want to use as static.

> The fact that there is no other data associated with your state machine
> must be pointed out in the comments.

There is all the initialization the dynamic loader performs when the
module that defines the variable is loaded.

> It's often not the case that something is indeed "freestanding", so if
> it is we want to briefly note why (using a comment in the code).

I don't think a random participant of an existing synchronization
pattern is the right place for this sort of comprehensive
synchronization documentation.  It's a cross-cutting concern, and as we
(I?) have learned from computational reflection and aspect-oriented
programming, there's no single right place in the code to document this;
it's a side document that the relevant pieces should reference.

>> The enum type is opaque, not visible to callers, and its alignment
>> and size ensure it will always be loaded and stored atomically.

> Careful here.  The alignment and size of the type may *allow* a compiler
> (and our atomic operations) to actually work and make access atomic.
> But the compiler is *not required* to do so if plain nonatomic accesses
> are used; it could load/store byte-wise, it could reload, or it could
> store speculative values.

*nod*, but not relevant:

- load/store of entire words byte-wise would quickly drive the compiler
  out of existence, or at least out of the marketplace for compilers of
  system libraries to be used in production

- reloads would not be a problem for the pattern used in the second
  version of the patch

- speculative stores that could affect this pattern are not permitted by
  the relevant standards AFAICT

> Unless you use atomic accesses, there's no guarantee that what looks
> like a single load or store in source code actually ends up as exactly
> one atomic, full-size load or store in the generated code.

Alignment, size, and compiler's interest in generating reasonable code
does.  But we both know you don't agree with that.

> Thus, if you'd add a comment on the type, you should say that it is
> compatible (or sufficient for) atomic accesses.

This is a very pervasive assumption in GNU libc.  The only reason I can
think of to add this to every situation in which it is relevant is to
make the source code tarballs get higher compression rates.

> OK.  Note that this was the clearest bit of your patch all way along.

Well, then, what are we waiting for, considering that this is *all* the
patch does?

> It is a performance improvement.  Using a CAS instead of a critical
> section would make it even faster for the first accesses.

But it wouldn't wait for the dynamic loader to complete the loading or
the relocation.

> If you want to keep the locks, I suggest mentioning the equivalent CAS
> anyway because it's conveys the intent more clearly in this case.

Since Carlos and Siddhesh took this over, I'll leave it for them too.

> Just to clarify on previous comments you make: And init() always
> happens-before make_positive or is_positive?

Yes.

> Hmm.  This still seems somewhat inconsistent.  You are arguing that you
> do not need an acquire load on the consensus implementation above,
> because l_tls_offset would stand on its own and program logic doesn't
> need ordering dependencies between this consensus and anything else.
> Yet you seem to think the release fence is necessary.  Do you think the
> release fence is necessary for something, and if so, what is it and
> where is the matching acquire?

We'd already determined the release fence was needed and taken care of
by other means.

> Also note that an unlock operation on a lock is *not* guaranteed to have
> the same effects as an atomic_thread_fence_release.

*nod*

> The hardware may or may not treat unlocks (ie,
> release stores on a *particular* memory location)

It is my understanding that, per POSIX, unlocks ought to release all
prior stores made by the thread, and locks must acquire all
previously-released stores by any thread.  I.e., they don't apply to
single memory locations.  Locks don't even know what particular memory
locations they guard, if any.  So I can't make sense of what distinction
you intend to point out above.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

Follow-Ups:
- Re: [PR18457] Don't require rtld lock to compute DTV addr for static TLS
  - From: Torvald Riegel

References:
- [PR18457] Don't require rtld lock to compute DTV addr for static TLS
  - From: Alexandre Oliva
- Re: [PR18457] Don't require rtld lock to compute DTV addr for static TLS
  - From: Torvald Riegel
- Re: [PR18457] Don't require rtld lock to compute DTV addr for static TLS
  - From: Alexandre Oliva
- Re: [PR18457] Don't require rtld lock to compute DTV addr for static TLS
  - From: Torvald Riegel
- Re: [PR18457] Don't require rtld lock to compute DTV addr for static TLS
  - From: Alexandre Oliva
- Re: [PR18457] Don't require rtld lock to compute DTV addr for static TLS
  - From: Torvald Riegel
- Re: [PR18457] Don't require rtld lock to compute DTV addr for static TLS
  - From: Alexandre Oliva
- Re: [PR18457] Don't require rtld lock to compute DTV addr for static TLS
  - From: Torvald Riegel
- Re: [PR18457] Don't require rtld lock to compute DTV addr for static TLS
  - From: Alexandre Oliva
- Re: [PR18457] Don't require rtld lock to compute DTV addr for static TLS
  - From: Torvald Riegel

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]