This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [RFC] Fixing strcmp performance on power7 for unaligned loads.


On Wed, Aug 19, 2015 at 05:07:28PM -0300, Adhemerval Zanella wrote:
> Hi
> 
> Thanks for checking on that.  Comments below:
> 
> On 18-08-2015 18:18, Ondřej Bílka wrote:
> > Hi,
> > 
> > As I said before, benchmarks are useless unless someone reads them, so I
> > looked at the powerpc ones. I noticed that the power7 strcmp and strncmp
> > are about five times slower than memcmp in the unaligned case.
> > 
> > That's too much, so I could easily improve performance by 50% in that
> > case by implementing strcmp as a strnlen+memcmp loop, despite the overhead
> > of strnlen. Because of that overhead the loop is still a lot slower than
> > on aligned data, so it should be fixed in assembly by changing the
> > unaligned case to follow the pattern in the following C code.
> > 
> [...]
> > +
> > +# include "libc-internal.h"
> > +int __strcmp_power7b(const char *a, const char *b)
> > +{
> > +  size_t len;
> > +  int ret;
> > +  len = __strnlen_power7 (a, 64);
> > +  len = __strnlen_power7 (b, len);
> > +  if (len != 64)
> > +    {
> > +      return __memcmp_power7 (a, b, len + 1);
> > +    }
> > +  ret = __memcmp_power7 (a, b, 64);
> > +  if (ret)
> > +    return ret;
> > +
> > +  const char *a_old = a;
> > +  a = PTR_ALIGN_DOWN (a + 64, 64);
> > +  b += a - a_old;
> > +
> > +  while (1)
> > +    {
> > +       len = __strnlen_power7 (b, 64);
> > +       if (len != 64)
> > +         {
> > +           return __memcmp_power7 (a, b, len + 1);
> > +         }
> > +
> > +       ret = __memcmp_power7 (a, b, 64);
> > +       if (ret)
> > +         return ret;
> > +       a+=64;
> > +       b+=64;      
> > +    }
> > +}
> >  
> >  libc_ifunc (strcmp,
> >              (hwcap2 & PPC_FEATURE2_ARCH_2_07)
> > 
> 
> Indeed this seems a better strategy, although I am not convinced there is
> much to gain from aligning the 'a' source.  The strnlen does take the source
> alignment into consideration (aligned and unaligned will take the same path),
> and the memcmp implementation will take the unaligned path anyway (since
> although 'a' is aligned, 'b' won't be).
> 
You need that to be able to use memcmp at all: memcmp may read past the
end of the string, and with an unaligned 'a' that read can cross into an
unmapped page and segfault, which cannot happen in the aligned case. The
exception is a memcmp that guarantees it never faults when reading across
a page boundary, a guarantee that several implementations provide.
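
To make the alignment argument concrete, here is a minimal sketch of the
page-crossing check. The helper name read_crosses_page and the 4096-byte
page size used in the usage note are my assumptions for illustration, not
anything in glibc; the argument only needs the page size to be a multiple
of 64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true when reading n bytes starting at
   addr touches more than one page of size page_size.  */
static bool
read_crosses_page (uintptr_t addr, size_t n, size_t page_size)
{
  assert (n > 0 && page_size > 0);
  /* Compare the page index of the first and the last byte read.  */
  return (addr / page_size) != ((addr + n - 1) / page_size);
}
```

With a 4096-byte page, read_crosses_page (p, 64, 4096) is false for every
64-byte aligned p, while an unaligned start such as 4096 - 63 does cross.
That is why the loop may read a full 64-byte block from the aligned 'a'
even when the string ends inside that block.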


> Using a similar strategy as you did:
> 
> int __strcmp_power7c (const char *a, const char *b)
> { 
>   if (IS_ALIGN(a, 8) && IS_ALIGN(b, 8))
>     return __strcmp_power7 (a, b);
>    
>   while (1)
>     {
>        size_t len = __strnlen_power7 (b, 64);
>        if (len != 64)
>          {
>            return __memcmp_power7 (a, b, len + 1);
>          }
> 
>        int ret = __memcmp_power7 (a, b, 64);
>        if (ret)
>          return ret;
>        a+=64; 
>        b+=64;
>     }
> } 
> 

As for strncmp, I was tired when I wrote the previous mail, so the
corrected implementation follows. The bug was that I forgot to check for
the null terminator within the limit.

int __strncmp_power7b (const char *a, const char *b, size_t l)
{ 
  size_t len;
  int ret;
  if (l==0)
    return 0;
  l--;
  len = strnlen (a, l < 64 ? l : 64);
  len = strnlen (b, len);
  if (len != 64)
    { 
      return memcmp (a, b, len + 1);
    } 
  ret = memcmp (a, b, 64);
  if (ret) 
    return ret;
  
  const char *a_old = a;
  a = PTR_ALIGN_DOWN (a + 64, 64);
  b += a - a_old;
  l -= a - a_old;
  while (1)
    {  
       len = strnlen (b, l < 64 ? l : 64);
       if (len != 64)
         { 
           return memcmp (a, b, len + 1);
         }
       
       ret = memcmp (a, b, 64);
       if (ret) 
         return ret;
       a+=64;
       b+=64;
       l -= 64;
    }
}
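
For anyone who wants to sanity-check the pattern outside the glibc tree,
here is a portable sketch of the same strncmp loop using the standard
strnlen and memcmp. The names strncmp_blockwise, align_down_ptr and
sign_of are made up for this example. Note the assumption: a generic
memcmp gives no guarantee about reads past the string end, so this is
only safe to run on buffers padded by at least 64 bytes, as in a test.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 64

/* Round a pointer down to a multiple of align (a power of two).  */
static const char *
align_down_ptr (const char *p, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (const char *) ((uintptr_t) p & ~(uintptr_t) (align - 1));
}

/* Reduce a comparison result to -1, 0 or 1 so that results can be
   compared against the libc strncmp.  */
static int
sign_of (int v)
{
  return (v > 0) - (v < 0);
}

int
strncmp_blockwise (const char *a, const char *b, size_t n)
{
  size_t len;
  int ret;
  if (n == 0)
    return 0;
  n--;                          /* n is now the index of the last byte.  */
  /* First block: check both strings for a terminator.  */
  len = strnlen (a, n < BLOCK ? n : BLOCK);
  len = strnlen (b, len);
  if (len != BLOCK)
    return memcmp (a, b, len + 1);
  ret = memcmp (a, b, BLOCK);
  if (ret)
    return ret;

  /* Align 'a' so that each later 64-byte read stays in one block.  */
  const char *a_old = a;
  a = align_down_ptr (a + BLOCK, BLOCK);
  b += a - a_old;
  n -= (size_t) (a - a_old);
  for (;;)
    {
      /* Only 'b' needs a terminator check; a terminator in 'a' shows
         up as a difference in memcmp.  */
      len = strnlen (b, n < BLOCK ? n : BLOCK);
      if (len != BLOCK)
        return memcmp (a, b, len + 1);
      ret = memcmp (a, b, BLOCK);
      if (ret)
        return ret;
      a += BLOCK;
      b += BLOCK;
      n -= BLOCK;
    }
}
```

A check would compare sign_of (strncmp_blockwise (a, b, n)) with
sign_of (strncmp (a, b, n)) over a range of n and several misalignments
of a and b, since only the sign of the result is specified.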

