This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: HWCAP is method to determine cpu features, not selection mechanism.
- From: Steven Munroe <munroesj at linux dot vnet dot ibm dot com>
- To: GLIBC Devel <libc-alpha at sourceware dot org>
- Cc: Szabolcs Nagy <szabolcs dot nagy at arm dot com>, Carlos Eduardo Seo <cseo at linux dot vnet dot ibm dot com>, Steve Munroe <sjmunroe at us dot ibm dot com>, Ondřej Bílka <neleai at seznam dot cz>, Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, pinskia at gmail dot com
- Date: Thu, 25 Jun 2015 10:58:46 -0500
- Subject: Re: HWCAP is method to determine cpu features, not selection mechanism.
- References: <55760314 dot 6070601 at linux dot vnet dot ibm dot com> <5576FC80 dot 1090806 at arm dot com> <1433862393 dot 21101 dot 9 dot camel at sjmunroe-ThinkPad-W500> <20150609154223 dot GA20028 at domone> <1433865684 dot 21101 dot 20 dot camel at sjmunroe-ThinkPad-W500> <20150610125047 dot GA10861 at domone>
- Reply-to: munroesj at linux dot vnet dot ibm dot com
On Wed, 2015-06-10 at 14:50 +0200, Ondřej Bílka wrote:
> On Tue, Jun 09, 2015 at 11:01:24AM -0500, Steven Munroe wrote:
> > On Tue, 2015-06-09 at 17:42 +0200, Ondřej Bílka wrote:
> > > On Tue, Jun 09, 2015 at 10:06:33AM -0500, Steven Munroe wrote:
> > > > On Tue, 2015-06-09 at 15:47 +0100, Szabolcs Nagy wrote:
> > > > >
> > > > > On 08/06/15 22:03, Carlos Eduardo Seo wrote:
> > > > > > The proposed patch adds a new feature for powerpc. In order to get
> > > > > > faster access to the HWCAP/HWCAP2 bits, we now store them in the TCB.
> > > > > > This enables users to write versioned code based on the HWCAP bits
> > > > > > without going through the overhead of reading them from the auxiliary
> > > > > > vector.
> > > >
> > > > > i assume this is for multi-versioning.
> > > >
> > > > The intent is for the compiler to implement the equivalent of
> > > > __builtin_cpu_supports("feature"). X86 has the cpuid instruction; POWER
> > > > is RISC, so we use the HWCAP. The trick is to access HWCAP[2]
> > > > efficiently, as getauxval and scanning the auxv are too slow for inline
> > > > optimizations.
> > > >
>Snip
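The quoted point is that querying HWCAP through getauxval() on every feature test is too slow for inline use. As a minimal user-space sketch, assuming glibc's getauxval() (available since glibc 2.16), a program can read AT_HWCAP and AT_HWCAP2 once at startup and keep them in static variables, so each later test is a load and a mask. The names cached_hwcap, hwcap_has, and so on are hypothetical; this is not the proposed TCB interface, which would keep the copy in a fixed TCB slot reached from the thread pointer instead of in a per-object variable.

```c
#include <stdbool.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP (glibc >= 2.16) */

#ifndef AT_HWCAP2
#define AT_HWCAP2 26    /* value from <elf.h>; older headers may lack it */
#endif

/* Hypothetical per-program cache of the two HWCAP words. */
static unsigned long cached_hwcap;
static unsigned long cached_hwcap2;

/* Runs before main(): one getauxval() call per word, never again. */
__attribute__((constructor))
static void hwcap_cache_init(void)
{
    cached_hwcap  = getauxval(AT_HWCAP);
    cached_hwcap2 = getauxval(AT_HWCAP2);  /* 0 if the kernel does not supply it */
}

/* True only if every bit in mask is set; no libc call on this path. */
static inline bool hwcap_has(unsigned long mask)
{
    return (cached_hwcap & mask) == mask;
}

static inline bool hwcap2_has(unsigned long mask)
{
    return (cached_hwcap2 & mask) == mask;
}
```

A static variable like this is still private to each object and, in position-independent code, reached relative to the TOC or GOT; a TCB slot gives every object the same fixed offset from the thread pointer, which is presumably what makes it attractive as a target for a compiler builtin.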
After all was said and done, much more was said than done...
Sorry, I have been on vacation and then catching up on my day job after
being on vacation.
But I think we need to reset the discussion and hopefully eliminate some
misconceptions:
1) This is not about the clever things that this community knows how to
do; it is about what the average Linux application developer is willing
to learn and use.
I have tried to get them to use CPU platform libraries (library search
based on AT_PLATFORM), the auxv and HWCAP directly, and IFUNC. They
will not do this.
They think this is all silly and too complicated. But we still want them
to exploit features of the latest processors while continuing to run on
existing processors in the field. Processor architectures evolve, and we
have to give them a simple mechanism for handling this that they will
actually use. __builtin_cpu_supports() seems to be something they will
use.
2) This is not about exposing a private GLIBC resource (the TCB) to the
compiler. The TCB and TLS are part of the platform ABI and must be known,
used, and understood by the compilers (GCC, LLVM, ...), binutils,
debuggers, etc., in addition to GLIBC:
Power Architecture 64-Bit ELF V2 ABI Specification, OpenPOWER ABI for
Linux Supplement: Section 3.7.2 TLS Runtime Handling
This and other useful documents are available from the OpenPOWER
Foundation: http://openpowerfoundation.org/
If you look, you will see that TCB slots have already been allocated to
support other PowerISA-specific features like Event-Based Branching,
Dynamic System Optimization, and Target Address Save. Recently we added
split-stack support for the Go language, which required a TCB slot. So
adding a doubleword slot to cache AT_HWCAP and AT_HWCAP2 is no big
deal.
So far this all fits nicely in a single 128-byte cache line. The TLS ABI
(which I defined back in 2004) reserved a full 4KB for the TCB and
extensions.
None of this was done lightly, and it was discussed extensively with the
appropriate developers in the corresponding projects. You all may not
have seen this because GLIBC is not directly involved except as the owner
of ./sysdeps/powerpc/nptl/tls.h.
The only reason we raised this discussion here is that we wanted to
publish a platform-specific API in
./sysdeps/unix/sysv/linux/powerpc/bits/ppc.h to make it easier for the
compilers to access it. And we felt it would be rude not to discuss
this with the community.
3) I would think that the platform maintainers would have the standing
to implement their own platform ABI? Perhaps the project maintainers
would like to weigh in on this?
4) I have asked Carlos Seo to develop some micro benchmarks to illuminate
the performance implications of the various alternatives to the direct
TCB access proposal. If necessary, we can provide detailed cycle-accurate
instruction pipeline timings.
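Item 4 mentions micro benchmarks. The sketch below is not Carlos Seo's benchmark; it is a hypothetical, minimal timing loop, assuming POSIX clock_gettime(CLOCK_MONOTONIC) and glibc's getauxval(), that compares calling getauxval(AT_HWCAP) on every query with reading a copy cached once. All function and variable names are illustrative.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime() and CLOCK_MONOTONIC */
#include <sys/auxv.h>            /* getauxval(), AT_HWCAP */
#include <time.h>

/* Hypothetical cached copy of AT_HWCAP; the caller fills it once. */
static unsigned long cached_hwcap;

/* Alternative 1: ask libc every time. */
__attribute__((noinline))
static unsigned long query_getauxval(void) { return getauxval(AT_HWCAP); }

/* Alternative 2: read the cached copy. */
__attribute__((noinline))
static unsigned long query_cached(void) { return cached_hwcap; }

/* Nanoseconds from a to b. */
static double elapsed_ns(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) * 1e9 + (double)(b.tv_nsec - a.tv_nsec);
}

/* Calls query() n times and returns the average cost per call in ns.
   The volatile sink keeps the compiler from discarding the loop. */
static double bench(unsigned long (*query)(void), long n)
{
    volatile unsigned long sink = 0;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < n; i++)
        sink ^= query();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    return elapsed_ns(t0, t1) / (double)n;
}
```

To use it, set cached_hwcap = getauxval(AT_HWCAP) once, then compare bench(query_getauxval, n) with bench(query_cached, n) for a large n. Both paths go through the same indirect call, so the difference roughly isolates the cost of the libc call versus a plain load; it says nothing about the TCB variant itself, which would have to be measured on POWER hardware.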