This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug libc/17941] New: Propose C Metaprogramming-based qsort as GNU extension


https://sourceware.org/bugzilla/show_bug.cgi?id=17941

            Bug ID: 17941
           Summary: Propose C Metaprogramming-based qsort as GNU extension
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: libc
          Assignee: unassigned at sourceware dot org
          Reporter: daniel.santos at pobox dot com
                CC: drepper.fsp at gmail dot com

As part of my work on a paper on C Metaprogramming with gcc, I have implemented
a sqort/insertion sort algorithm based off of the legacy glibc implementation
(when msort is not used) who's performance far exceeds the current
implementation. For example, I did a series of performance tests using these
parameters:

Phenom CPU
gcc 4.9.2
16k bytes of data
element sizes between 1 and 1024 bytes
alignments between 1 and 32 bytes (where applicable)
key of unsigned integer of either 1, 2, 4 or 8 bytes (depending upon element
size, largest possible used)

The greatest performance improvement was for 1 byte elements with 219% increase
in speed and the worse was 1024 byte element size (both 2 and 8 bytes
alignments) of only 26.78% increase. The average of this set was an increase of
82.94%. I've made a chart and put it here for now:
http://loudmouth.javamonger.org/images/chart-16k-vs-msort.png.

The code can be found here: https://github.com/daniel-santos/cmeta. Here is a
break-down of the pros and cons if these techniques as I see it.

The basic mechanism of C Metaprogramming is that many data values are known at
the time a program is compiled. These techniques simply provide a way for gcc
to use such values as "compile-time constants" and create a sort function
specifically for sorting data of a specific element size, a specific element
alignment, and a specific compare function. Since -findirect-inline, pointers
to inline functions that are known to be constant at compile-time are able to
compile-out the pointer and the function call and just inline the code. This
inlining results in numerous additional optimization opportunities. Thus, at
compile-time qsort_template() causes the generation of a sort function that is
specific data type.

Pros:
Faster. Sometimes, much faster.
Adds no extra code to glibc library share librarys.

Cons:
1. Currently only works on recent GCC compilers (4.4+). It may be the case that
other compilers (icc, cl.exe) supports extensions that can enable
metaprogramming in C, but I have not ventured far enough into that. My hope is
that the compiler capabilities required will eventually become part of the
standard

2. Typically results in larger code size for the calling program. The function
qsort_template is just a psudo-template function and isn't built until a piece
of code that uses it is compiled.

3. Requires a basic C Metaprogramming library somewhere to avoid duplication of
support functions (in the above project on github, this is currently contained
in compiler.h). This library is mostly macros to verify that values are
constant at compile-time, and add gcc attributes (always_inline, etc.)

4. I have not yet discovered a mechanism to force GCC to instantiate a
"pseudo-template function" (as I call them) reliably and not inline everything
â this results in bloat. This is most pronounced when __builtin_memcopy is
expanded in multiple places.

The code its self needs more cleanup to be ready to integrate into glibc,
although the algorithm appears to now be quite efficient and error-free.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]