This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[RFC] Adding strncmp functionality to strcmp


Hi,

I am now looking how implement unaligned strncmp.

The first step is gather and analyze input data. To get them 
I used a simple program that you use with commands

gcc -fPIC shared strncmp.c -o strncmp.so
LD_PRELOAD=./strncmp.so bash 2> data # now run something to gather data

#include <stdio.h>
int strncmp(unsigned char *x,unsigned char *y,int n){
 fprintf(stderr, "n:%i ",n);
 int i;
 for (i=0;i<n;i++) {
   if (x[i]!=y[i]) {fprintf(stderr,"dif\n"); return x[i]-y[i];}
   if (!x[i]) {fprintf(stderr,"same\n"); return 0;}
 }
 fprintf(stderr,"size\n"); return 0;
}

Now when I looked at data I noticed that n is almost always less than
64, probability of mismatch is around 4/5 and ending at n is 1/5 which I
got from

21:15:14:~$ grep dif  data | wc -l
102495
21:15:18:~$ grep same data | wc -l
0
21:15:24:~$ grep size data | wc -l
25031

Now given this information I needed to decide how make hot case in
strncmp as fast as possible. A relevant part how this is done in strcmp
 checks first 64 bytes in roughtly following way:

strcmp(char *a, char *b)
{
  int or = (((unsigned long) a) | ((unsigned long) b)) & 4095;
  if (or <= 4096 - 64) /* Does not cross page boundary.  */
    {
      /* Function mask16 (a, b) return x such that for 0 <= i < 16
         x & (1 << i) == (a[i] != b[i] || a[i] == 0) << i */
      unsigned long m = mask16 (a, b); /* Check bytes a[0]..a[15].  */
      if (m)
        return a[ffs (m)] - b[ffs (m)];
      m = mask_next48 (a, b); /* Check bytes a[16]..a[63].  */
      if (m)
        return a[ffs (m)] - b[ffs (m)];
    }

Given these data a solution that adds least overhead is to use mask
which has first n bits 1 to filter out matches after n-1'th byte in
mask. We first check if there is match(4/5 of cases) then if n<64 and
as we checked 64 bytes there is no match. On x64 there is little problem
that shifts are done modulo 64 so we must check mask again wnen n>=64.

This could be done with following code.

strncmp(char *a, char *b, size_t n)
 if (!n)
    return 0;
  int or = (((unsigned long) a) | ((unsigned long) b)) & 4095;
  if (or <= 4096 - 64)
    {
      unsigned long xm = 1L << n;
      xm = xm ^ (xm - 1);
      unsigned long m = mask16 (a, b);
      if (m & xm)
        return a[ffs (m & xm)] - b[ffs (m & xm)];
      m = m | mask_next48 (a, b);
      if (m & xm)
        return a[ffs (m & xm)] - b[ffs (m & xm)];
      if (n <= 64)
        return 0;
      if (m) /* This is needed as x64 does 64bit shifts modulo 64.  */
        return a[ffs (m)] - b[ffs (m)];
    }

For larger strings there is a loop case. In strcmp I use a counter to
detect crossing a page. For strcmp I could get nearly identical
performance by storing in that counter minimum of crossing page and
exceeding n bytes.

A calculations to archieve this are bit involved so I would like
somebody to check that I did not ommit any case. These are at strcmp and
strncmp skeletons that I attached. 

>From code review perspective I do not know which is better, 
First  possibility is check first a c language skeletons, let gcc generate assembly
and send diff that adds actual detection. This has disadvantage of duplication 
between strcmp and strncmp.

Second one would be simulate c computatitons in assembly and add it to
strcmp-sse2-unaligned.S while ensuring that we do not accidentaly modify
additional register.

What do you prefer?

Attachment: strcmp_skeleton.c
Description: Text document

Attachment: strncmp_skeleton.c
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]