This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug libc/17941] Propose C Metaprogramming-based qsort as GNU extension


https://sourceware.org/bugzilla/show_bug.cgi?id=17941

--- Comment #2 from Daniel Santos <daniel.santos at pobox dot com> ---
(In reply to Carlos O'Donell from comment #1)
> Building glibc requires gcc 4.6, so you should be fine?

At this point, it doesn't matter what glibc is built with.

> Unless you're saying that the compiled program requires gcc 4.6 or newer to
> take advantage of this?

Yes. In its current form, the program being compiled actually needs gcc 4.8+ or
some such, but this isn't necessary. With the proper compiler-sniffing macros
(which belong mostly in the "metaprogramming library" header(s)), it should
build with any half-descent C compiler, both in optimized and unoptimized
builds. The difference is that performance will be very poor unless it's a 4.4+
compiler. I haven't actually run the tests using the full gambit of compilers
on this particular algorithm yet, but I have done it on a generic red-black
tree that uses these same techniques and it essentially is poor prior to 4.4
and then gets good around 4.6 and great at 4.7. The generic red-black tree even
builds and works on 3.6.2, but is huge and horrible.

> You could check for the compiler at build time in the headers and select a
> fallback.
> 
> > 2. Typically results in larger code size for the calling program. The
> > function qsort_template is just a psudo-template function and isn't built
> > until a piece of code that uses it is compiled.
> 
> How much larger?

Well I should start with some some definitions for clarity.

qsort_r        - the current glibc qsort_r implementation in stdlib/msort.c
_quicksort     - the legacy / fallback qsort_r implementation stdlib/qsort.c
qsort_template - my proposed pseudo-template functions

So for comparison, qsort_r in my system glibc is 824 bytes. I'm building
_quicksort in my benchmark program and that is 1214 bytes. There are some
slight differences in the CFLAGS used for my system glibc and my benchmarks
however:

system glibc:
    CFLAGS="-march=native -g3 -O2 -fno-strict-aliasing -fno-stack-protector"

my benchmarks:
    CFLAGS="-march=native -mtune=native -g3 -O2 -DNDEBUG -std=gnu11"

So that being said, here are a few values for the generated code of
qsort_template instantiations. I don't have this scripted (just picking this
out of an objdump):

Elem. Size  Align     Fn size in bytes
         1      1     721 (0x2d1)
         2      1     797 (0x31d)
         2      2     754 (0x2f2)
         4      1     753 (0x2f1)
         4      2     732 (0x2dc)
         4      4     732 (0x2dc)
         8      1     785 (0x311)
         8      8     731 (0x2db)
        16     16     881 (0x371)
        24      8    1140 (0x474)
        32     16    1178 (0x49a)
        32     32    1161 (0x489)
        40      8    1361 (0x551)
        64     32    1788 (0x6fc)
       128     32     977 (0x3d1)
       256      1    1361 (0x551)
       256      2    1245 (0x4dd)
       256      4    1105 (0x451)
       256      8     977 (0x3d1)
       256     16     977 (0x3d1)
       256     32     977 (0x3d1)
       512     32     977 (0x3d1)
      1024      1    1435 (0x59b)
      1024      2    1401 (0x579)
      1024      5    1141 (0x475)
      1024      8     981 (0x3d5)
      1024     16     981 (0x3d5)
      1024     32     981 (0x3d5)

Some of this is bloat caused by attributes always_inline and flatten I'm using
to force the use of values known at compile-time. Some of this can be mitigated
with a if statement that directly calls memcpy when the size is smaller and
calls a no_inline function (or simply one defined somewhere else without a
compiler built-in) when the size is too large. The if statement would be
compiled-out after optimizations. But all of that is sub-optimal - always
better when we can let the compiler decide when to inline and when not to. gcc
uses the optimizations flags as well as some tuning parameters (like
max-inline-insns-single, etc.) which are fed via CFLAGS to make these decisions
normally and we're only overriding them here as a mechanism to cause a sort of
template instantiation.

> 
> > 3. Requires a basic C Metaprogramming library somewhere to avoid duplication
> > of support functions (in the above project on github, this is currently
> > contained in compiler.h). This library is mostly macros to verify that
> > values are constant at compile-time, and add gcc attributes (always_inline,
> > etc.)
> 
> License? Size?

I do GPL3 whenever at all possible. Some of what I have in this project is GPL2
having been borrowed from the Linux kernel. However, much of it is my own
property as they were my commits, so I can reuse them anywhere I want. :) (not
that I intend to stop working on the kernel)

As to the size, I presume that you mean the compiled size. Being a
metaprogramming library, there is not yet any of it that will add to the size
of libc.so, although I can easily foresee the addition of utility functions
that would be built with glibc (usually tiny). This "metaprogramming library"
can be thought of as somewhat analogous to the boost library for C++, most of
which is headers containing C++ templates.


> At a high level I like the novel approach.
> 
> At a practical level the license of C metaprogramming library is a key point
> in the discussion. As is making glibc depend upon it. As are a lot of other
> questions like: runtime dependencies, build time dependencies, dependencies
> of the program being built with glibc headers that rely on this feature.

No runtime dependencies.
No built-time dependencies aside from the compiler.
Programs being built with glibc headers that use this (or similar) extensions
will only need to be using a recent gcc compiler *for good results.* If the
metaprogramming library (headers) are written correctly and tested with a good
enough array of compilers, the programs built using this should still compile
and run correctly on other compilers, but will perform poorly.  In fact, I
guess there's no reason why the qsort_template function can't simply be
replaced when a good gcc compiler isn't being used. Basically, a qsort_r
compatible compar function pointer would need to be added to struct qsort_def
and then just do a #ifdef

#ifndef _GLIBC_SOME_VALUE

static inline int
qsort_template (const struct qsort_def *def, void *const pbase,
                size_t n, void *arg)
{
  qsort_r(pbase, n, def->size, def->compar);
  return 0;
}

#else
/ * the qsort_template definition... */
#endif

This way, if the programmer wants their program to have good performance when
not built with a recent gcc compiler, they need only populate the struct's
compar member. The reason I'm using less instead of compar basically comes down
to

1.) no existing standard mandating a compar function,
2.) we don't care about equality
3.) less() is the standard in C++ libraries where equality is irrelevant, and
4.) the fact that current gcc fails to properly optimize the results of a
compare function (that returns -1, 0 or 1) resulting in one or two more
instructions than necessary (I have a gcc bug open for this somewhere). It
isn't a huge difference, but I got into the habit of using less whenever
equality doesn't matter.


> 
> I would strongly suggest posting to libc-alpha@sourceware.org before moving
> forward with any broader work to integrate with glibc. I would not want to
> see such good work go to waste.

I most certainly will.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]