This is the mail archive of the cygwin-developers@cygwin.com mailing list for the Cygwin project.



__stdcall and regparm


As part of my fiddling about with a putative readv/writev
implementation, I checked the improvements gained by the __stdcall
and the regparm attributes.

In summary, __stdcall makes the DLL slower, and regparm (3) gives
little or no gain over using neither attribute (combined with
__stdcall it is the slowest of all); the fastest combination is to
avoid __stdcall and to use regparm (2) (and this seems to be
insensitive to the number of arguments passed to the function).
(NB: this was tested only on my machine, a Pentium III, with
gcc 2.95.3-5.)
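
For readers unfamiliar with the two attributes, here is a minimal
sketch of how they are spelled in GCC. The function names and the
CC_STDCALL / CC_REGPARM macros are hypothetical and are not taken from
the Cygwin sources (where the affected functions are C++ methods); the
macros expand to nothing on non-x86-32 targets, where these attributes
do not apply.

```c
/* stdcall and regparm are x86-32 calling-convention attributes.
   On any other target the (hypothetical) macros expand to nothing. */
#if defined(__GNUC__) && defined(__i386__)
#  define CC_STDCALL     __attribute__((stdcall))
#  define CC_REGPARM(n)  __attribute__((regparm(n)))
#else
#  define CC_STDCALL
#  define CC_REGPARM(n)
#endif

/* Neither attribute: arguments on the stack, caller pops them. */
int sum3_plain(int a, int b, int c)
{
    return a + b + c;
}

/* stdcall: arguments on the stack, callee pops them. */
int CC_STDCALL sum3_stdcall(int a, int b, int c)
{
    return a + b + c;
}

/* regparm(2): first two integer arguments in EAX and EDX,
   the third on the stack. */
int CC_REGPARM(2) sum3_regparm2(int a, int b, int c)
{
    return a + b + c;
}

/* regparm(3): all three integer arguments in EAX, EDX and ECX. */
int CC_REGPARM(3) sum3_regparm3(int a, int b, int c)
{
    return a + b + c;
}

/* Both attributes together, as in the last row of the timings. */
int CC_STDCALL CC_REGPARM(3) sum3_stdcall_regparm3(int a, int b, int c)
{
    return a + b + c;
}
```

All five variants compute the same result; only the way the arguments
reach the callee differs, which is what the timings compare.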

If this is true for other CPUs / compiler versions, it might be
worthwhile changing these settings throughout the DLL, unless
these declarations have been added for some other reason than
speed.

I tested this with the Cygwin DLL itself, changing the
declarations of the fhandler::read (and fhandler::readv) methods,
then testing the DLL with a program that reads 16 MB from /dev/zero
one byte at a time and writes it to /dev/null (again, one byte at
a time).
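
The test program itself is not included in this message. A minimal
sketch of such a byte-at-a-time copy loop might look like the
following; the function name copy_bytewise and its signature are
assumptions made for illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Copy `count` bytes from `src` to `dst` using one read(2) and one
   write(2) call per byte.  Returns the number of bytes copied, or -1
   if either file cannot be opened. */
long copy_bytewise(const char *src, const char *dst, long count)
{
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;

    int out = open(dst, O_WRONLY);
    if (out < 0) {
        close(in);
        return -1;
    }

    char byte;
    long copied = 0;
    while (copied < count) {
        if (read(in, &byte, 1) != 1)    /* one system call per byte */
            break;
        if (write(out, &byte, 1) != 1)  /* and one more to write it */
            break;
        copied++;
    }

    close(in);
    close(out);
    return copied;
}
```

Calling copy_bytewise("/dev/zero", "/dev/null", 16L * 1024 * 1024)
from main and running the program under time(1) would give a
comparable measurement, assuming 16 MB here means 16 * 2^20 bytes.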

The combinations I tested are as follows (fastest first):

regparm (2)                0m37.354s
__stdcall, regparm (2)     0m37.440s
regparm (1)                0m37.482s
regparm (2), regparm (3)   0m38.364s    (*)
regparm (3)                0m38.566s
neither                    0m38.654s
__stdcall                  0m38.848s
__stdcall, regparm (3)     0m39.409s    (**)

(*) This uses regparm (2) for fhandler::read and regparm (3) for
fhandler::readv, which has 3 arguments in my current
implementation.

(**) These are the current settings for the cygwin DLL.

My guess is that regparm (3) wrecks the optimization of the
calling function, since all three x86 temporary registers have to
be made available for the call.  Given this, the new gcc (3.2) might
do better here as it's got a different register allocator (as I
understand it).  If I can be bothered I'll do some tests on that
tomorrow.

NB: the difference in performance here of about 2 seconds between
slowest and fastest results amounts to about an eighth of a
microsecond per read(2) call; perhaps not immensely significant.
Compare that to a difference between the stock DLL and my current
readv/writev changes of something like half a microsecond per
read(2) call (simply due to the increased number of function
calls, since read(2) is forwarded to readv(2) and so forth).
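
The per-call figure can be checked with a little arithmetic. This is
a sketch that assumes 16 MB means 16 * 2^20 one-byte read(2) calls and
takes the slowest and fastest totals from the timings above; the
helper name is hypothetical.

```c
/* Difference between two wall-clock totals, spread over `calls`
   calls, expressed in microseconds per call. */
double per_call_microseconds(double slow_seconds, double fast_seconds,
                             long calls)
{
    return (slow_seconds - fast_seconds) * 1e6 / (double)calls;
}
```

With the figures above, per_call_microseconds(39.409, 37.354,
16L * 1024 * 1024) comes to roughly 0.12, i.e. about an eighth of a
microsecond.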

HTH,

// Conrad



