This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: 0.144

From: Roland McGrath <roland at redhat dot com>
To: elfutils-devel at lists dot fedorahosted dot org
Date: Wed, 06 Jan 2010 04:44:48 -0800
Subject: Re: 0.144
> > (/lib64/libc-2.11.so) _mm_unpacklo_epi8: tests/funcretval:
> > dwfl_module_return_value_location: (null)
> 
> Looks that comes from not supporting DW_AT_GNU_vector
> http://gcc.gnu.org/ml/gcc-patches/2002-04/msg01428.html
> http://sourceware.org/ml/gdb-patches/2002-04/msg00997.html

You devious Dutchman!  You made me go and implement the ppc/ppc64 Altivec
return value support, because that's the only place where DW_AT_GNU_vector
is a determinant for return value locations in gdb's code.

But actually you were citing an x86_64 case.  There nothing actually
depends on DW_AT_GNU_vector at all.  Instead what the ebl code is hitting
is an array_type with no byte_size, when it needs to know the array size to
decide what the rule is.

It turns out that an array_type with no size is a kosher enough thing.
You're expected to look for the number of elements (really the stride) and
multiply.  For the general case that can be some hairy logic.  So I added a
new function dwarf_aggregate_size that does this for you.  It should handle
everything except the truly hairy cases where some of the sizes involved
are not constants but given as expressions and such, which will probably
always be outside of this sort of C interface in libdw.

I made all the backend code use dwarf_aggregate_size for cases that might
be DW_TAG_array_type, where previously it was just getting DW_AT_byte_size.

So now I think the return value determination is actually correct for a ppc
or ppc64 Altivec vector type.  I haven't tested this.  It isn't what you
reported about.  I don't know that it's used anywhere in e.g. a libc
function's return value so as anyone would care any time soon.  Um, yay?
(AFAIK the only real use of this ebl call is for systemtap's $return.)

Now, the x86_64 is another strange set of blind alleys.

Take this trivial example:

	#include <mmintrin.h>
	#include <xmmintrin.h>
	#include <immintrin.h>

	int arr_64[2] = {};
	__m64 fn_64 (__m64 x) { return x; }

	unsigned long arr_256[4] = {};
	__m256i fn_256i (__m256i x) { return x; }

	unsigned long arr_128[2] = {};
	__m128i fn_128i (__m128i x) { return x; }

I compiled that with "gcc -g -c -mavx" on gcc-4.4.1-2.fc11.x86_64 and what
I get looks good.  Following fn_128i's return type, through a typedef, you
get to:

 [    c2]    array_type
             GNU_vector           
             type                 [    61]
             sibling              [    d3]
 [    cc]      subrange_type
               type                 [    49]
               upper_bound          1

cf:

 [    61]    base_type
             byte_size            8
             encoding             signed (5)
             name                 "long long int"

So it's an array with range [0,1], i.e. size 2, of 8-byte unsigned integers.
That's all we care to know, and the GNU_vector decoration doesn't actually
matter to the x86_64 ABI's return-value location determination.

There's alley number one, and not so blind: at the end we got a working
dwarf_aggregate_size function and we are happy to have it.

But now let's follow the original rabbit down its hole, in the dump from
glibc-debuginfo-2.11-2.x86_64.rpm's /usr/lib/debug/lib64/libc.so.6.debug:

 [1688c4]    typedef
             name                 "__m128i"
             decl_file            2
             decl_line            47
             type                 [1688cf]
 [1688cf]    array_type
             GNU_vector           
             type                 [168867]
 [1688d5]    subprogram
             external             
             name                 "_mm_unpacklo_epi8"
             decl_file            2
             decl_line            966
             prototyped           
             type                 [1688c4]
             inline               declared_inlined (3)
             artificial           
...

cf:

 [168867]    base_type
             byte_size            8
             encoding             signed (5)
             name                 "long long int"

That first fragment omits 1688d5's children, but I elided nothing else.
i.e., 1688cf has no children.  So it doesn't give any indication of the
number of elements in the array.  I think that might be the kosher thing to
do when indicating a type like that of "extern elt_type array[];".  But the
return type of a function cannot be an array of unknown size!

IMHO that seems like a GCC bug, unless there is some magic I am missing.
If there is, I can't think where it might be well-specified magic.

For the trivial example above, gcc-4.4.2-7.fc12.x86_64 with "-g -c -mavx"
produces the same sort of (correct) output I cited above.  The DWARF in
that libc.so.6.debug says that is the very same compiler that built it.  
So it must be tickled only by some stranger situation or different options
than I am using.

I think this deserves a GCC bug filed.  But doing it properly requires
reproducing the circumstances in the glibc rpm build so as to get the .i
file and command line to reproduce this same bad output.  Perhaps there is
an idle Dutchman who wants to do that legwork.  (The CU's where an
array_type with GNU_vector and no subrange_type child nor byte_size
attribute appears are sysdeps/x86_64/multiarch/str{str,casestr,cspn-c}.c
and a few others.  They are probably all the same case in the compiler.)

Alley number two, it's pitch black in the heart of the compiler and we're
lucky not to have been eaten by a grue.  Note that one on your map.
Someone has to go back there and put that tranquilized grue into bugzilla.

While making it use dwarf_aggregate_size, I made x86_64_return_value_location
just punt to a default case if that call fails, instead of giving an error
as it did before for a missing DW_AT_byte_size.  This was sort of incidental,
on the vague theory that if it does indicate an array of unknown size (in
some language where that's expressible as a return value type) then perhaps
that means the return array is always supplied by the caller, as is the
case for all large aggregates in the x86-64 ABI anyway.  But it also
conveniently silences the loss from this compiler problem.  (Or perhaps
that is anti-convenient, I don't know.)

So now instead of barfing, it gives a wrong answer:

	() _mm_unpacklo_epi8: return value location: {0x70, 0}

Because the DWARF doesn't say the size of the array, we decide it's a
"large" one and returned by reference like a large array or struct is.
Whereas for the trivial example above where we have full proper DWARF:

() fn_128i: return value location: {0x50, 0} {0x93, 0x8} {0x51, 0} {0x93, 0x8}

Since it knows the size of the array is 16 (by the 2*8 calculation above),
that is just within the "small aggregate" size limit.  So it's returned in
two registers.  This is the answer we were always expecting, back when we
thought we'd always have a DW_AT_byte_size for any aggregate.

And, shazam, we've turned the corner into alley number three!
I bet you didn't see this one coming!  Unless maybe you read:

      /* XXX
	 Must examine the fields in picayune ways to determine the
	 actual answer.  This will be right for small C structs
	 containing integer types and similarly simple cases.
      */

      goto intreg;

in the aggregate case of x86_64_return_value_location.  
Yes, you guessed it, this answer is also wrong!

http://www.x86-64.org/documentation/abi.pdf defines these rules, and it
says it was last revised May 11, 2009, so it should be pretty well
current.  Those rules include a hairy algorithm for considering the
types to derive which locations to use for arguments and return values.
This is what that XXX comment refers to and what we haven't implemented.
It's only relevant for aggregates, which includes the vector types
(they are expressed in DWARF as arrays, as seen above).

The algorithm is hairy, but not desperately so.  GDB does implement it,
and it's not real large.  The complexity of whipping it up for libebl is
probably not many times more work than implementing dwarf_aggregate_size
was.  We just never really needed it before, and probably don't quite
need it yet for $return.  It's on the list (which is to say, there is
still an XXX comment there, and this email in the archive, and no other
record of the subject), but I'm not doing it today.

That ABI algorithm is specified in terms of C types.  It mentions the FP
vector types by their C names, but doesn't say how those are indicated
in DWARF.  In fact, they are encoded as array of float and the like.
Though they are described separately, I think the ABI in fact treats
them the same in effect as it does the literal array types that their
DWARF descriptions resemble, so it doesn't matter.  But I'm not really
sure about that.  GDB's doesn't distinguish them (see amd64_classify in
amd64-tdep.c), but it has:
    /* FIXME: __m64 .  */
    /* FIXME: __float128, __m128.  */
One should walk through the ABI algorithm and see if it could ever
matter to treat __m128 as if it were float[4], which is what we get
today.  If it does matter, then GDB should be using DW_AT_GNU_vector
(which in its code is spelled "TYPE_VECTOR (type)") to distinguish.

We should figure out all that before we get around to trying to
implement a truly correct x86_64_return_value_location in libebl.
Draft me another Dutchman!

But, wait!  That's not the alley we were supposed to be in!  That was a
similar-looking detour, because our case is __m128i, not __m128!  Jump
over into alley number four, integer vector types.

So you heard about this type-directed ABI thingabob over there in alley
number three.  That method still rules the day over here.  But that
"Draft 0.99" spec (URL above) doesn't say anything about __m128i or
other integer vector types.  So what gives?

The ABI specifically mentions __int128, which is actually called
__int128_t in GCC.  Ironically, what the spec says is:

	For classification purposes __int128 is treated as if it were
	implemented as:
		typedef struct {
		    long low, high;
		} __int128;

But GCC actually emits:

 [   169]    base_type
             byte_size            16
             encoding             signed (5)
             name                 "__int128_t"

AFAICT both "a really wide integer type" and "an aggregate of integers"
mean exactly the same thing in the ABI algorithm.  The irony is that the
aggregate suggested for __int128 is actually more like what GCC emitted
(above) for __m128i (though an array_type, not a structure_type).

Now, remember this from above?

() fn_128i: return value location: {0x50, 0} {0x93, 0x8} {0x51, 0} {0x93, 0x8}

That says it's in the first two 8-byte integer registers.  This is
actually correct for __int128 or for an actual aggregate of integers.
But it's not right for __m128i!

The actual return value convention for __m128i (integer vector type), as
easily seen in unoptimized assembly from the trivial example above, is
to use the vector registers, i.e. just %xmm0 for a plain __m128i return
value (or sole argument).  I take this to be the real intent of the ABI,
and I think the spec is just behind the compiler on this.  I assume this
should extend to the general algorithm for arguments and for aggregates
in both arguments and return values, i.e. classifying the integer vector
types as SSE{,UP} rather than INTEGER (see the ABI spec 3.2.3).

So, the devious Dutchman strikes back!  In the cases without the buggy
behavior from alley number two, GCC describes __m128i in DWARF the same
as long long int[2]--except for having DW_AT_GNU_vector.  That's indeed
how we have to distinguish whether it's returned in two integer
registers or in one vector register.  That was pretty devious, taking me
through all those hoops to find out that you were actually right about
DW_AT_GNU_vector support being the issue, albeit far from the only one.
But GDB doesn't do that (or even have a FIXME comment about __m128i).

So to get out of alley number four, draft a Dutchman and have him talk
to the x86-64 ABI folks to fix or clarify the spec to match what appears
to be reality in the compiler, and then have him talk to the GDB folks
about whether the return value (e.g. $ value after "finish") for a
function returning __m128i is correct.  For the spec, to fit in with its
current wording, it makes sense to talk about __m128i by name as a C
type, put it in the tables alongside __m128 and __int128 and so on.  One
will always construe this to mean "integer vector types" in general so
as to apply it to other languages and derive their actual sensible
ABIs--and (equivalent to that) so as to apply it to an implied
language-agnostic set of DWARF descriptions that could qualify for the
same classification in the ABI.

While we're here, on the way in there I noticed this other little alley
right next to it and, well, just step over here for a minute.  Remember
that -mavx for the "trivial example" above?  What's that about, you say?
(I know you care.  Really, you do.)

The -mavx switch is necessary to enable __m256i.
This type is like __m128i, only moreso (as you would expect).

 [    cf]    typedef
             name                 "__m256i"
             decl_file            4
             decl_line            43
             type                 [    da]
 [    da]    array_type
             GNU_vector           
             type                 [    5d]
             sibling              [    eb]
 [    e4]      subrange_type
               type                 [    45]
               upper_bound          3

cf:

 [    5d]    base_type
             byte_size            8
             encoding             signed (5)
             name                 "long long int"

So, as __m128i is to long long[2], __m256i is to long long[4].

The instructions that use this type are called Intel AVX.
The ABI spec mentions __m256 (which is like float[8]) and
AVX in particular.  But it doesn't exactly clear things up.

The old SSE registers like %xmm0 are actually the same thing as the low
half of the new, larger AVX registers, called %ymm0 (go Intel).  The ABI
does not give any new DWARF register numbers, so in locations you would
indicate %xmm0 and %ymm0 the same way and (I guess) rely on the type and
size of the particular access to know which you're really using.  The
ABI (3.2.1) says, "We use vector register to refer to either SSE or AVX
register."  Later (3.2.3) it talks about, "... types that fit into a
vector register."

If we were to follow the oblique combination of 3.2.1 and 3.2.3 then we
might think that since __m256 "fits into a vector register" if "vector
register" means "AVX register" (sensible when talking about 256-bit
types, which is the size of an AVX register), then an __m256 return
value would be in %ymm0.  But, just later in the part that actually
specifies the algorithm, it mentions __m256 by name and says it is
treated essentially like double[4].  If you follow the wording in the
rest of that section (about SSE and SSEUP), it's fairly clear this is
always talking about 16-byte %xmm0 et al.

However, what GCC actually does with -mavx is %ymm0.  I don't know if
there is or was any compiler mode and type that behaves like __m256 is
described in the ABI spec, i.e. 256 bits in %xmm0+%xmm1.  If there were,
then we can only hope that it would be distinguished in the DWARF by
appearing as an aggregate totalling 256 bits but without DW_AT_GNU_vector.
Those already qualify by the generic rules as going into %xmm0+%xmm1.

Though __m128i and __m256i are not mentioned at all in this ABI spec, in
a similar "oblique" way you could try to construe from just the first
part of 3.2.3 that since these integral types do not "fit into one of
the general purpose registers" but do "fit into a vector register" (the
latter only so if assuming AVX), they thus are SSE{,UP} class and so
land in %xmm0 and %ymm0 respectively.  But you certainly can't come
clearly to any such conclusion from following the ABI algorithm steps
directly laid out in the spec.

So grab that Dutchman on his way out of alley four and when he goes to
talk to the x86-64 ABI people have him discuss the disparity between
spec and compiler for __m256, and include __m256i and thorough AVX
clarifications in the whole the whole integer vector type subject.

Worn out yet?  But, wait!  There's more!  If you keep reading in the
next ten minutes, you get a bonus alley!  That's right, not one, not
two, not three, not four, not five, but six--count them--six blind
alleys for your hard-earned euro.  Way back in the "trivial example"
I included a few things like this:

	unsigned long arr_128[2] = {};

Those weren't part of your test case at all, but I wanted them for
comparison in the DWARF.  (That's how I figured out that subrange_type
but no byte_size is normal for array_type, which re-reading the spec a
few times confirmed moderately clearly too.  In the DWARF, the type of
arr_128 resembles __m128i exactly, just lacking DW_AT_GNU_vector.)

So what's with that initializer?  I should have just written:

	unsigned long arr_128[2];

and that's what I did.  But I was testing by just compiling this file as
I said, with "-c -g -mavx", and then using eu-readelf --debug-dump=info
on the .o file to look and also running "tests/funcretval -e file.o" to
test the new backend and libdw code.  

But the latter was mysteriously getting nothing, seeing no CUs.  It
turns out this was because dwfl_module_getdwarf fails, ultimately with
"relocation refers to undefined symbol".  That's because libdwfl is
relocating the .o's .debug_info to read it.  It hits:

  0x0000000000000240  X86_64_64       0x0000000000000010      +0 arr_128

in the place where the DW_AT_location for the "arr_128" DIE is giving
the variable's address.  The symbol is:

   17: 0000000000000010     16 OBJECT  GLOBAL DEFAULT   COMMON arr_128

The libdwfl code treats this like an UNDEF symbol, which means it will
try to look it up in other modules.  For "-e file.o" there are no other
modules, so it finds nothing and yields the error.  This behavior makes
some amount of sense in the usual case of libdwfl ET_REL handling, which
is for Linux .ko kernel modules.  In that context (-k/-K), there will be
other modules to resolve symbols in, and this works nicely for SHN_UNDEF
there.  

I think the kernel may actually be compiled with -fno-common now, so
SHN_COMMON might not appear there at all.  But if it does, then in that
context treating them like SHN_UNDEF may work in some cases.

This goes back to the origins of SHN_COMMON, long, long ago.  In fact,
it really came from ELF's predecessor a.out (N_COMMON).  There it was
originally intended for Fortran COMMON blocks, hence the name.  But in
long-standing Unix tradition, it was also used for uninitialized
variables (static/global) in C since it had the right implicit-zero
semantics when there is no initializer.

This led to the possibility of e.g. writing in foo.h:

	int foo;
	int foo2;

and #include "foo.h" would copy that into bar.o and baz.o,
while in foo.c:

	int foo = 1;

But no *.c has "int foo2 = 0;", it's implicit.

For Fortran COMMON blocks (whose syntax I don't know), this was more or
less the intended use.  But in C it just fell out that way so people
could do this.  It's long been considered gauche and people usually
write "extern int foo;" in foo.h plus "int foo2;" in foo.c instead.

But you could do it once and so you still can, and them's the semantics
of C by default.  There is -fno-common and it's a fine thing to use so
that SHN_COMMON never appears, assuming you aren't relying on the
semantics that many uninitialized definitions can coexist given at most
one initialized definition.  Of course, Fortran itself has since moved
on to using COMDAT groups for this, but ELF still has SHN_COMMON too.

So, you see that in this arcane and obsolescent case treating the
SHN_COMMON "foo" in bar.o as undefined would rightly find the foo.o
definition (which is in a real .data section, not marked SHN_COMMON).
OTOH, for "foo2" where all the options are SHN_COMMON, the right
behavior is to pick the first one.  The plain "-e file.o" is the
degenerate case of this.  That's what libdwfl is not handling well.

There is really no address to resolve an SHN_COMMON symbol to.
They're not allowed in kernel modules.  So this only comes up
with "-e file.o" kinds of uses, I suppose.  I've made libdwfl's
relocation treat "unresolved" SHN_COMMON symbols as 0 instead of
an error.

I'll admit that alley number six there really didn't have anything to do
with anything you raised at all.  It just came up in how I happened to
write and use my own little test case.  So I'll take the responsibility
for the consequences of my test case, and you can be responsible for
yours.  In English we call that, "Going Dutch."

So, I'll see your bug report and raise you two or three fixes, another
bug report (that you get to file on GCC), a dubious incidental
workaround for that, a spec clarification (which you could follow up
on), another bug report (that you get to file on GDB), another couple of
spec clarifications (and GDB might be wrong about that too), and an
entirely unrelated fix (which is completely on me, don't lift a finger!).
Take that to your dike and plug it.

In fact, all of those items are pretty low priority.  They don't really
need to be resolved until some use is actually affected, and not much
code ever uses these types, let alone code that people are using stap
$return or gdb finish on to notice a wrong register choice.  But they
each do deserve follow-up and not to get dropped on the floor.

1. GCC bug about missing array bounds with GNU_vector in libc binary
2. x86-64 ABI corrections/clarifications wrt vector types (256 and integer)
3. GDB bug(?) about __m128i

The only thing that has much priority right now is to test the new
elfutils code to make sure that it has no regressions (i.e. no changes
outside obscure vector cases) in $return resolution.  At least, however
much (little) you normally test that in stap, I suppose.


Thanks,
Roland
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]