This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB


On Mon, Jun 29, 2015 at 01:37:05PM -0500, Steven Munroe wrote:
> Lets look at a real customer example. The customer wants to use the P8
> 128-bit add/sub but also wants to be able to unit test code on existing
> P7 machines. Which results in something like this:
> 
> static inline vui32_t
> vec_addcuq (vui32_t a, vui32_t b)
> {
>         vui32_t t;
> 
>                 if (__builtin_cpu_supports("PPC_FEATURE2_HAS_VSXâ))
>                 {
>                 
>                         __asm__(
>                             "vaddcuq %0,%1,%2;"
>                             : "=v" (t)
>                             : "v" (a),
>                               "v" (b)
>                             : );
>                 }
>                 else
>                         vui32_t c, c2, co;
>                         vui32_t z= {0,0,0,0};
>                         __asm__(
>                             "vaddcuw %3,%4,%5;\n"
>                             "\tvadduwm %0,%4,%5;\n"
>                             "\tvsldoi %1,%3,%6,4;\n"
>                             "\tvaddcuw %2,%0,%1;\n"
>                             "\tvadduwm %0,%0,%1;\n"
>                             "\tvor %3,%3,%2;\n"
>                             "\tvsldoi %1,%2,%6,4;\n"
>                             "\tvaddcuw %2,%0,%1;\n"
>                             "\tvadduwm %0,%0,%1;\n"
>                             "\tvor %3,%3,%2;\n"
>                             "\tvsldoi %1,%2,%6,4;\n"
>                             "\tvadduwm %0,%0,%1;\n"
>                             : "=&v" (t), /* 0 */
>                               "=&v" (c), /* 1 */
>                               "=&v" (c2), /* 2 */
>                               "=&v" (co) /* 3 */
>                             : "v" (a), /* 4 */
>                               "v" (b), /* 5 */
>                               "v" (z)  /* 6 */
>                             : );
>                         t = co;
>                 }
>         return (t);
> }
> 
> So it is clear to me that executing 14+ instruction to decide if I can
> optimize to use new single instruction optimization is not a good deal.
>
No, this is prime example that average programmers shouldn't use hwcap
as that results in moronic code like this.

When you poorly reinvent wheel you will get terrible performance like
fallback here. Gcc already has 128 ints so tell average programmers to
use them instead and don't touch features that they don't understand.

As gcc compiles addition into pair of addc, adde instructions a
performance gain is minimal while code is harder to maintain. Due to
pipelining a 128bit addition is just ~0.2 cycle slower than 64 bit one
on following example on power8.


int main()
{
  unsigned long i;
  __int128 u = 0;
//long u = 0;
  for (i = 0; i < 1000000000; i++)
    u += i * i;
  return u >> 35;
}

[neleai@gcc2-power8 ~]$ gcc uu.c -O3
[neleai@gcc2-power8 ~]$ time ./a.out 

real	0m0.957s
user	0m0.956s
sys	0m0.001s

[neleai@gcc2-power8 ~]$ vim uu.c 
[neleai@gcc2-power8 ~]$ gcc uu.c -O3
[neleai@gcc2-power8 ~]$ time ./a.out 

real	0m1.040s
user	0m1.039s
sys	0m0.001s


 
> One instruction (plus the __builtin_cpu_supports which should be and
> immediate,  branch conditional) is a better deal. Inlining so the
> compiler can do common sub-expression about larger blocks is an even
> better deal.
> 
That doesn't change fact that its mistake. A code above was bad as it
added check for single instruction that takes a cycle. When difference
between implementations is few cycles then each cycle matter (otherwise
you should just stick to generic one). Then a hwcap check itself causes
slowdown that matters and you should use ifunc to eliminate.

Or hope that its moved out of loop, but when its loop with 100
iterations a __builtin_cpu_supports time becomes imaterial.


> I just do not understand why there is so much resistance to this simple
> platform ABI specific request.



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]