This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

From: Steven Munroe <munroesj at linux dot vnet dot ibm dot com>
To: "Carlos O'Donell" <carlos at redhat dot com>
Cc: munroesj at linux dot vnet dot ibm dot com, Rich Felker <dalias at libc dot org>, libc-alpha at sourceware dot org
Date: Sun, 05 Jul 2015 20:16:44 -0500
Subject: Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
Authentication-results: sourceware.org; auth=none
References: <55760314 dot 6070601 at linux dot vnet dot ibm dot com> <20150609163835 dot GI17573 at brightrain dot aerifal dot cx> <1435777940 dot 7125 dot 132 dot camel at oc7878010663> <5596C284 dot 9070108 at redhat dot com>
Reply-to: munroesj at linux dot vnet dot ibm dot com

On Fri, 2015-07-03 at 13:12 -0400, Carlos O'Donell wrote:
> On 07/01/2015 03:12 PM, Steven Munroe wrote:
> > If you think about the requirements for a while it becomes clear. As the
> > HWCAP cache would have to be defined and initialized in either libgcc or
> > libc, access will be non-local from any user library. So all the local
> > TLS access optimizations are disallowed. Adding the requirement to support
> > dlopen() libraries leaves the general dynamic TLS model as the ONLY
> > safe option.
> 
> That's not true anymore? Alan Modra added pseudo-TLS descriptors to POWER
> just recently[1], which means __tls_get_addr call is elided and the offset
> returned immediately via a linker stub for use with tp+offset. However,
> I agree that even Alan's great work here is still going to be several
> more instructions than a raw tp+offset access. However, it would be
> interesting to discuss with Alan if his changes are sufficiently good
> that the out-of-order execution hides the latency of these additional
> instructions and his methods are a sufficient win that you *can* use
> TLS variables?
> 
I did discuss this with Alan and he agreed that, with the given
requirements, the standard TLS mechanism is always slower than my
original TCB proposal.

Why would you think I had not talked to Alan?

> > Now there were a lot of suggestions to just force the HWCAP TLS
> > variables into initial exec or local exec TLS model with an attribute.
> > This would resolve to direct TLS offset in some special reserved TLS
> > space?
> 
> It does. Since libc.so is always seen by the linker it can always allocate
> static TLS space for that library when it computes the maximum size of
> static TLS space.
> 
> > How does that work with a library loaded with dlopen()? How does that
> > work with a library linked with one toolchain / GLIBC on Distro X and
> > run on a system with a different toolchain and GLIBC on Distro Y? With
> > different versions of GLIBC? Will HWCAP get the same TLS offset? Do we
> > end up with .text relocations that we are also trying to avoid?
> 
> (1) Interaction with dlopen?
> 
> The two variables in question are always in libc.so.6, and therefore are
> always loaded first by DT_NEEDED, and there is always static storage
> reserved for that library.
> 
> There are 2 scenarios which are problematic.
> 
> (a) A static application accessing NSS / ICONV / IDN must dynamically
>     load libc.so.6, and there must be enough reserve static TLS space
>     for the allocated IE TLS variables or the dynamic loader will abort
>     the load indicating that there is not enough space to load any more
>     static TLS using DSOs. This is solved today by providing surplus
>     static TLS storage space.
> 
> (b) Use of dlmopen to load multiple libc.so.6's. In this case you could
>     load libc.so.6 into alternate namespaces and eventually run out of
>     surplus static TLS. We have never seen this in common practice because
>     there are very few users of dlmopen, and to be honest the interface
>     is poorly documented and fraught with problems.
> 
> Therefore in the average case it will work to use static TLS, or
> IE TLS variables, in glibc. I consider the above
> cases to be outside the normal realm of user applications.
> 
> e.g.
> extern __thread int foo __attribute__((tls_model("initial-exec")));
> 
> (2) Distro to distro compatibility?
> 
> With my Red Hat on:
> 
> Let me start by saying you have absolutely no guarantee here at all
> provided by any distribution. As the Fedora and RHEL glibc maintainer
> your vendor is far outside the scope of support and such a scenario is
> never possible. You can wish it, but it's not true unless you remain
> very very low level and very very simple interfaces. That is to say
> that you have no guarantee that a library linked by a vendor with one
> toolchain in distro X will work in distro Y. If you need to do that
> then build in a container, chroot or VM with distro Y tools. No vendor
> I've ever talked to expects or even supports such a scenario.
> 
> With my hacker hat on:
> 
> Generally for simple features it just works as long as both distros
> have the same version of glibc. However, we're talking only about
> the glibc parts of the problem. Compatibility with other libraries
> is another issue.
> 
No! The version of GLIBC does not matter as long as the GLIBC supports
TLS (GLIBC-2.5?)

> (3) Different versions of glibc?
> 
> Sure it works, as long as all the versions have the same feature and
> are newer than the version in which you introduced the change. That's
> what backwards compatibility is for.
> 
> (4) Will HWCAP get the same TLS offset? 
> 
> That's up to the static linker. You don't care anymore though, the gcc
> builtin will reference the IE TLS variables like it would normally as
> part of the shared implementation, and that variable is resolved to glibc
> and normal library versioning happens. The program will now require that
> glibc or newer and you'll get proper error messages about that.
> 
> (5) Do we end up with .text relocations that we are also trying to avoid?
> 
> You should not. The offset is known at link time and inserted by the
> static linker.
> 
To avoid the text relocation I believe there is an extra GOT load of the
offset. If this is not true then Alan owes me an update to the ABI
document to explain how this would work, as the current Draft ELF2 ABI
update does not say this is supported.
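
For reference, the two TLS access models under discussion can be
written down as the sketch below. It assumes GCC or Clang (the
tls_model attribute and __thread are compiler extensions), and the
names hwcap_ie, hwcap_gd, has_feature_ie and has_feature_gd are
hypothetical, not anything in glibc or libgcc. The comments describe
the code a compiler typically generates for a -fPIC shared library; a
static link may relax both forms to something cheaper.

```c
#include <stdint.h>

/* Hypothetical per-thread copies of the capability bits, one per TLS model. */

/* Initial-exec: the variable's thread-pointer-relative offset is loaded
   from the GOT, then added to the thread pointer.  No text relocation,
   but one extra load compared with a fixed ABI-defined offset. */
__thread uint64_t hwcap_ie __attribute__((tls_model("initial-exec")));

/* Global-dynamic: the address is normally obtained through a call to
   __tls_get_addr (or a descriptor stub).  Safe for any dlopen()ed
   library, but the slowest of the options. */
__thread uint64_t hwcap_gd __attribute__((tls_model("global-dynamic")));

/* Test a capability mask against the initial-exec copy. */
int has_feature_ie(uint64_t mask)
{
    return (hwcap_ie & mask) != 0;
}

/* Test a capability mask against the global-dynamic copy. */
int has_feature_gd(uint64_t mask)
{
    return (hwcap_gd & mask) != 0;
}
```

Compiling this with -O2 -fPIC -S and comparing the two functions shows
the difference in instruction count for a given target.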

> > Again the TCB avoids all of this as it provides a fixed offset defined
> > by the ABI and does not require any up-calls or indirection. And also
> > will work in any library without induced hazards. This clearly works
> > across distros including previous versions of GLIBC as the words were
> > previously reserved by the ABI. Application libraries that need to run
> > on older distros can add a __builtin_cpu_init() to their library init or
> > if threaded to their thread create function.
> 
> You get a crash since previous glibc's don't fill in the data?
> And that crash gives you only some information to debug the problem,
> namely that you ran code for a processor you didn't support.
> 
There is NO crash. There never was a crash. There is no additional
security exposure. The only TCB fields that might be a security exposure
were already there, in every other platform.

The worst there can be is a fallback to the base implementation (the
bit is 0 when it should be 1).

As explained, the dword is already there and initialized to 0 when the
page is allocated. So the load will work NOW for any GLIBC since TLS was
implemented.

As implemented by Alan and me.
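
A minimal sketch of that fallback behaviour, in portable C. Everything
here is a model for illustration: struct tcb_model stands in for the
reserved TCB dword (it is not glibc's real TCB layout), and
FEATURE_FAST_PATH is a made-up bit, not a real HWCAP value.

```c
#include <stdint.h>

/* Hypothetical model of the reserved TCB dword.  On an older GLIBC the
   word is never written, so it keeps the 0 it had when the page was
   allocated. */
struct tcb_model {
    uint64_t hwcap;
};

/* Made-up feature bit, used only for this illustration. */
#define FEATURE_FAST_PATH (UINT64_C(1) << 7)

enum impl { IMPL_BASE = 0, IMPL_OPTIMIZED = 1 };

/* A zero word can only ever select the base implementation: the bit
   reads as 0 even on hardware where it should be 1. */
enum impl choose_impl(const struct tcb_model *tcb)
{
    return (tcb->hwcap & FEATURE_FAST_PATH) ? IMPL_OPTIMIZED : IMPL_BASE;
}
```

With an all-zero tcb_model, choose_impl() returns IMPL_BASE, so the
program runs the slower path instead of faulting.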


> I've suggested to Carlos that this is a problem with the use of the
> TCB. If one uses the TCB, one should add a dummy symbol that is versioned
> and tracks when you added the feature, and thus you can depend upon it,
> but not call it, and that way you get the right versioning. The same
> problem happened with stack canaries and it's still painfully annoying
> at the distribution level.

This is completely unnecessary. The load associated with
__builtin_cpu_supports() will work with any GLIBC that supports TLS, and
the worst that will happen is it will load zeros.
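
As a sketch of how a caller would use the builtin to pick an
implementation: the function names below (sum_base, sum_vsx,
cpu_has_vsx, sum_u32) are hypothetical, the "optimized" variant is only
a placeholder, and the preprocessor guard is an assumption about which
compilers provide the builtin on PowerPC. On any other target the
sketch simply reports the feature as absent.

```c
#include <stddef.h>
#include <stdint.h>

/* Base implementation, valid on every processor. */
static uint32_t sum_base(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Placeholder for a VSX-tuned variant; a real one would use vector
   code.  It returns the same result as sum_base. */
static uint32_t sum_vsx(const uint32_t *v, size_t n)
{
    return sum_base(v, n);
}

/* Returns nonzero only when the builtin reports VSX.  Assumption: the
   builtin is only used with GCC 6 or later on PowerPC; elsewhere this
   reports 0, which selects the base path. */
static int cpu_has_vsx(void)
{
#if defined(__powerpc__) && defined(__GNUC__) && !defined(__clang__) \
    && __GNUC__ >= 6
    return __builtin_cpu_supports("vsx");
#else
    return 0;
#endif
}

/* Dispatch: a zero capability word falls through to sum_base. */
uint32_t sum_u32(const uint32_t *v, size_t n)
{
    return cpu_has_vsx() ? sum_vsx(v, n) : sum_base(v, n);
}
```

Either branch gives the same answer; only the speed differs, which is
the whole extent of the "loads zeros" case.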

You have not convinced me that this is necessary.

You are trying to force me to use any number of techniques that
either don't actually work (on my ISA and ABI) or add unnecessary
overhead (exposure to pipeline hazards) for no added benefit.

The problems that are claimed either don't actually exist or are greatly
exaggerated.

I have explained all this in great detail. I really don't understand why
this is so hard to accept.

> It is true that you could use LD_PRELOAD to run __builtin_cpu_init()
> on older systems, but you need to *know* that, and use that. What
> provides this function? libgcc?
> 
We will provide a little init routine applications can use. This is not
hard.
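
Something along these lines would do, as a rough sketch only. It
assumes Linux with getauxval() from <sys/auxv.h> (GLIBC 2.16 or later),
and app_cpu_init, app_cpu_has and app_hwcap_cache are hypothetical
names. It caches the bits in an ordinary process-wide variable to stay
portable; the actual proposal stores them in the TCB instead.

```c
#include <sys/auxv.h>   /* getauxval, AT_HWCAP, AT_HWCAP2 */

/* Hypothetical cache of the kernel-provided capability words. */
struct hwcap_cache {
    unsigned long hwcap;
    unsigned long hwcap2;
};

static struct hwcap_cache app_hwcap_cache;

/* Read the capability words from the auxiliary vector once.  Call this
   from library init, or from the thread create path if a per-thread
   copy is kept. */
void app_cpu_init(void)
{
    app_hwcap_cache.hwcap = getauxval(AT_HWCAP);
#ifdef AT_HWCAP2
    app_hwcap_cache.hwcap2 = getauxval(AT_HWCAP2);
#endif
}

/* Nonzero if any bit in mask is set in the cached AT_HWCAP word. */
int app_cpu_has(unsigned long mask)
{
    return (app_hwcap_cache.hwcap & mask) != 0;
}
```

If app_cpu_init() is never called the cache stays zero and every
app_cpu_has() test reports the feature as absent.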

> Do you want to use the IBM Advance Toolchain for POWER to be able to 
> support this feature across all distributions at the same time by not
> requiring any particular glibc version and by doing the initialization
> out of band via __builtin_cpu_init() for older glibc? It will still result
> in a weird crash of the application if the user doesn't know any better.
> 
The Advance Toolchain provides its own newer GLIBC. This feature can be
delivered in any of the current AT versions within weeks after it goes
upstream.

The customer requirement for the single binary only requires that GLIBC
on the target system or from the AT is as new or newer than the GLIBC it
was linked to in the build.

So not a problem.

> It is certainly a benefit to using the TCB, that this kind of use case
> is supported. However, in doing so you adversely impact the distribution
> maintainers for the benefit of?
> 
I cannot think of any adverse impacts on any of the other platform
maintainers, on any of the distros.

This is all platform specific code. And a tiny amount at that.

Eventually distros will pick this up in the normal way. The normal
distro processes used for interim release updates apply.

> Cheers,
> Carlos.
> 
> [1] https://sourceware.org/ml/libc-alpha/2015-03/msg00580.html
>

Follow-Ups:
- Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
  - From: Rich Felker
- Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
  - From: Carlos O'Donell

References:
- Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
  - From: Steven Munroe
- Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB
  - From: Carlos O'Donell