This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

about glibc qsort

I am writing again about glibc qsort algorithms and its performance. I
don't know exactly what to do to attract attention on this subject, but
speed of GNU qsort  could greatly improved. At least in my experiments,
i can get quite large improvements. One of the reasons that could make
qsort so slow is the use of memcpy function in the inner loop. this is
inefficient because memcpy is slow when copied buffers are of variable
lengths.One of the solutions i have experimented was to use 4 different
sort functions, each one implemented for a class of operands lengths:
1-sort function for operands of size 4.
2-1-sort function for operands of size 8.
3-sort function for operands of size 12,20,28,,8n+4.
4-sort function for operands of size 16,24,32,,8n.

for each of those functions i wrote my own memory copy function
optimized for size class and this made the algorithm run faster. Another
solution that made even larger speedup is to use indirect merge sorting.
indirect merge sorting is a lot faster than direct sorting for arrays
that fit into cache or for operands whose size is larger or equal than
cache lines. this is the case mainly because indirect sorting avoids
memory copying altogether. indirect sorting can easily be implemented
with the same interface of qsort and without violating standard
requirement that states that compare function must have operands inside
original array. Paul Eggert already replaces qsort by indirect merge
sort in ls.c:
To be able to do indirect sorting with the same interface, we must first
generate an array of pointers to original array elements then sort that
array of
pointers and finally rearrange elements in the original array according
to the sorted pointers array. When i tried this procedure on arrays of
size that
fits into L1 cache i got also a large speed is a
procedure written by Paul Eggert that does indirect merge sorting
maintaining the same interface as qsort :
rpl_qsort (void *base, size_t n, size_t width,
	   int (*cmp) (void const *, void const *))
  void *storage;
  size_t object_words = (width + sizeof (void *) - 1) / sizeof (void *);

  if (SIZE_MAX / sizeof (void *) / 3 * 2 < n + object_words
      || ! (storage = malloc ((object_words + n + n / 2) * sizeof (void
    qsort (base, n, width, cmp);
      /* Set up a vector of pointers to the array, for mpsort.  */
      void *tmp = storage;
      void **v = storage;
      size_t i;
      unsigned char *ip;
      v += object_words;
      for (i = 0, ip = base; i < n; i++, ip += width)
	v[i] = ip;

      /* Sort the pointers, to obtain the sorted permutation.  */
      printf("\n do rpl_qsort");
      mpsort ((void const **) v, n, (void const **) v + n, cmp);

      /* Rearrange the records to match the sorted permutation.  */
      if (width <= sizeof (void *))
	  /* Records are small, so just copy them twice to avoid
	     having to divide by width.  */
	  unsigned char *cv = (unsigned char *) v;
	  printf("\n do rpl_qsort");
	  for (i = 0, ip = cv; i < n; i++, ip += width)
	    memcpy (ip, v[i], width);
	  memcpy (base, v, ip - cv);
	  /* See Knuth vol. 3 (2nd ed.) exercise 5.2-10.  */
	  for (i = 0, ip = base; i < n; i++, ip += width)
	      unsigned char *kp = v[i];
	      if (kp != ip)
		  size_t j = i;
		  unsigned char *jp = ip;
		  memcpy (tmp, ip, width);

		      unsigned char *cbase = base;
		      size_t k = (kp - cbase) / width;
		      v[j] = jp;
		      memcpy (jp, kp, width);
		      j = k;
		      jp = kp;
		      kp = v[k];
		  while (kp != ip);

		  v[j] = jp;
		  memcpy (jp, tmp, width);

      free (storage);

The other way to optimize merge sort that is somewhat less intrusive was
posted three times on this mailing list but never got any comment. My
last posting was on:
this wassbasically an optimization that halve the extra space and extra
copying used in gnu msort_with_tmp function without violating standard
requirement that comparisons must be done inside original array. this is
different from the optimized version from :
that was first included in glibc but was undone later because it seems
it didn't respect the standard.
best regards.

Avez-vous essayé le nouveau Yahoo! Mail ? Plus rapide, plus efficace... simplement révolutionnaire ! Découvrez-le.
Lien :

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]