This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


i386 inline-asm string functions - some questions


For some time now, input operands like the following have been used
here and there in sysdeps/i386/i486/bits/string.h:

        "m" ( *(struct { char __x[0xfffffff]; } *)__s)

When I looked for the reasons behind this, I found some discussions
about it on the libc-alpha and gcc mailing lists.  As I understand
from them, there are two options: use the "m" argument(s) shown above,
or just add "memory" to the clobber list.  So, the question is: why
was the "m" way chosen?  I'm asking because I've found that the "m"
way leads GCC to produce rather suboptimal assembler, while the
"memory" code is OK.

Let me describe.  This is a typical inline-asm string function,
reduced to its skeleton:

extern inline
_s2(const char *a, const char *b)
{
    asm volatile (
        "/*%0%1%2%3*/"
        :"+&r"(a), "+&r"(b)
        :"m"(*(struct{__extension__ char __x[0xfffffff];}*)a),
         "m"(*(struct{__extension__ char __x[0xfffffff];}*)b)
        :"cc"
    );
}

It is, of course, just the essence of a typical string function; all
the real elements that aren't important for the demonstration are
omitted.  References to the asm operands are inserted inside the
comment; they will be useful later.  Now compile the following:

s2(const char *a, const char *b){return _s2(a,b);}

.globl s2
	.type	s2, @function
s2:
	pushl	%esi
	pushl	%ebx
	movl	12(%esp), %edx
	movl	16(%esp), %eax
	movl	%edx, %ebx
	movl	%eax, %esi
#APP
	/*%ebx%esi(%edx)(%eax)*/
#NO_APP
	popl	%ebx
	movl	%ecx, %eax
	popl	%esi
	ret

Obviously, the following instructions are redundant:

	pushl	%esi
	pushl	%ebx
	movl	%edx, %ebx
	movl	%eax, %esi
	popl	%ebx
	popl	%esi

And this is the "memory" variant:

extern inline
_s2(const char *a, const char *b)
{
    asm volatile (
        "/*%0%1*/"
        :"+&r"(a), "+&r"(b):
        :"cc", "memory"
    );
}

.globl s2
	.type	s2, @function
s2:
	movl	4(%esp), %edx
	movl	8(%esp), %eax
#APP
	/*%edx%eax*/
#NO_APP
	movl	%ecx, %eax
	ret

So there are no redundant instructions at all, only very good assembler.

Then the next question is: do I understand correctly that the problem
is the combination of the "earlyclobber" modifier on the asm operands
and the "m" inputs that refer to the same arguments?  For some reason
GCC seems to decide that "m" is tied to the argument itself rather
than to the memory the argument points to, so a separate copy of the
argument is needed because the corresponding output operand is early
clobbered?  The comment in the "m" variant shows (%edx)(%eax), but it
seems that GCC treats these as %edx%eax.  (But I may well be wrong; I
don't know these GCC internals.)
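
One way to check this guess would be a small test file along the
following lines.  It is only a sketch for experimentation: the names
mem_window, probe_* and wrap_* are made up here and do not come from
glibc, and variant B (no earlyclobber) is included purely to see
whether the extra copies disappear, not because dropping "&" is known
to be safe for the real string functions.

```c
#include <stddef.h>

/* Hypothetical name for the dummy type used in the "m" casts. */
struct mem_window { char x[0xfffffff]; };

/* Variant A: earlyclobber in/out registers plus dummy "m" inputs,
   as in the first example of the message. */
static inline const char *
probe_m_early(const char *a, const char *b)
{
    __asm__ __volatile__(
        "/* A: %0 %1 %2 %3 */"
        : "+&r"(a), "+&r"(b)
        : "m"(*(const struct mem_window *)a),
          "m"(*(const struct mem_window *)b)
        : "cc");
    return a;
}

/* Variant B: the same "m" inputs, but without the earlyclobber modifier. */
static inline const char *
probe_m_plain(const char *a, const char *b)
{
    __asm__ __volatile__(
        "/* B: %0 %1 %2 %3 */"
        : "+r"(a), "+r"(b)
        : "m"(*(const struct mem_window *)a),
          "m"(*(const struct mem_window *)b)
        : "cc");
    return a;
}

/* Variant C: no "m" inputs, a "memory" clobber instead. */
static inline const char *
probe_memory(const char *a, const char *b)
{
    __asm__ __volatile__(
        "/* C: %0 %1 */"
        : "+&r"(a), "+&r"(b)
        :
        : "cc", "memory");
    return a;
}

/* Non-inline wrappers so that each variant appears in the -S output. */
const char *wrap_a(const char *a, const char *b) { return probe_m_early(a, b); }
const char *wrap_b(const char *a, const char *b) { return probe_m_plain(a, b); }
const char *wrap_c(const char *a, const char *b) { return probe_memory(a, b); }
```

Compiling this with something like 'gcc -O2 -S' and comparing the
registers printed in the three comments should show whether the extra
register copies appear only in variant A.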

Well, this is a very simple example, but my investigation shows that
the situation is the same for any C code, simple or complex: some
extra registers are always used, some extra loads are emitted, etc.
So, if both variants are correct, it would be better to use the
"memory" one (as I understand it, there was a time when it was
actually used in sysdeps/i386/i486/bits/string.h?).  For example, this
is the output of 'size libc.so' for GLIBC-2.3.2 compiled with
-D__USE_STRING_INLINES:

   text	   data	    bss	    dec	    hex	filename
1108363	  11296	  10820	1130479	 113fef	libc.so

and this is the same, but with just one function, __strcmp_gg,
redone the "memory" way:

   text    data     bss     dec     hex filename
1107779   11296   10820 1129895  113da7 libc.so

The difference in text size is a little over 0.5k (584 bytes), and
there are dozens of such functions.  So, the third question is about
redoing all the inline-asm string functions that way (provided, of
course, that there are no drawbacks).
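
To make the proposal concrete, here is a minimal sketch of what a
string function written the "memory" way could look like.  It is not
the glibc code: sketch_strlen is a hypothetical name, the function is
a simple strlen-like scan chosen only for brevity, and the plain C
fallback is there just so the file also builds on non-x86 targets.

```c
#include <stddef.h>

/* Hypothetical helper, not a glibc function: a strlen-like routine whose
   asm statement relies on a "memory" clobber instead of a dummy "m"
   input operand to tell the compiler that the string is read. */
static inline size_t
sketch_strlen(const char *s)
{
#if defined(__i386__) || defined(__x86_64__)
    const char *p = s;
    size_t count = (size_t)-1;      /* effectively no limit on the scan */

    __asm__ __volatile__(
        "cld\n\t"                   /* scan towards higher addresses */
        "repne scasb"               /* stop after comparing the NUL byte */
        : "+D"(p), "+c"(count)
        : "a"(0)
        : "cc", "memory");

    return (size_t)(p - s) - 1;     /* p ends one past the terminator */
#else
    /* Portable fallback, assumed here only for non-x86 builds. */
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
#endif
}
```

A call such as sketch_strlen("abc") scans the three characters and the
terminator, so the pointer ends four bytes past the start and the
function returns 3.  The only operands are the two in/out registers
and the zero in the accumulator; no pointer is named twice, which is
the property the "memory" variant above relies on.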

