This is the mail archive of the
gdb@sourceware.cygnus.com
mailing list for the GDB project.
MMX: Messy Multimedia eXtensions
- To: gdb at sourceware dot cygnus dot com
- Subject: MMX: Messy Multimedia eXtensions
- From: Jim Blandy <jimb at cygnus dot com>
- Date: Tue, 9 Nov 1999 02:06:17 -0500 (EST)
Intel has contracted with Cygnus to provide support for the MMX and
SSE registers in the GNU toolchain. We've just finished the beta.
The work is on a branch at the moment, so it won't be showing up in
snapshots. I'd like to explain what I did in GDB, and get folks'
criticisms and ideas for what we would be happy with in mainline GDB.
There are some exquisitely twisted problems here.
If someone wants to really scrutinize my code, then I can post diffs.
In GCC, you can now write code like this (toy example, not tested):
#include <math.h>   /* for cos and sin */

/* This declares the type V4SF to be a vector of four single-precision
   floats, in a way that encourages GCC to map it onto the Pentium-III
   SSE registers.  (The mode attribute overrides the base type, so the
   `int' here is only a placeholder.) */
typedef int v4sf __attribute__ ((mode(V4SF)));

/* Given a bunch of points (X[i], Y[i]), 0 <= i < N, rotate
   each one counterclockwise by ANGLE radians.  For simplicity's sake,
   N must be a multiple of four, and X and Y must be 16-byte aligned,
   since loadaps and storeaps require aligned operands. */
void
rotate (double angle, int n, float *x, float *y)
{
  int i;

  /* Load all four slots of these with the cos and sin of ANGLE. */
  v4sf cos_angle = __builtin_ia32_setps1 (cos (angle));
  v4sf sin_angle = __builtin_ia32_setps1 (sin (angle));

  /* Rotate all the points, four at a time. */
  for (i = 0; i < n; i += 4)
    {
      /* new_x = cos (angle) * x - sin (angle) * y
         new_y = sin (angle) * x + cos (angle) * y */
      v4sf x4 = __builtin_ia32_loadaps (x + i);
      v4sf y4 = __builtin_ia32_loadaps (y + i);
      v4sf new_x4
        = __builtin_ia32_subps (__builtin_ia32_mulps (cos_angle, x4),
                                __builtin_ia32_mulps (sin_angle, y4));
      v4sf new_y4
        = __builtin_ia32_addps (__builtin_ia32_mulps (sin_angle, x4),
                                __builtin_ia32_mulps (cos_angle, y4));
      __builtin_ia32_storeaps (x + i, new_x4);
      __builtin_ia32_storeaps (y + i, new_y4);
    }
}
All the v4sf values get mapped onto SSE registers automatically, and
the __builtin_ia32_foo forms turn into single SSE instructions. It's
very sexy. (Automatic vectorization would be even sexier, of course,
but that's for another day.)
In GDB, you can now debug code like this (real example):
(gdb) break *0x0804846b
Breakpoint 1 at 0x804846b: file sse-mandel.c, line 42.
(gdb) run
Starting program: /home/jimb/play/sse-mandel
Breakpoint 1, 0x804846b in iter.aligned () at sse-mandel.c:42
(gdb) next
iter.aligned () at sse-mandel.c:43
(gdb) p count
$1 = {f = {1, 1, 1, 1}}
(gdb) p countadd
$2 = {f = {0, 0, 0, 0}}
(gdb) p countadd
$3 = {f = {1, 1, 1, 1}}
(gdb) p zx
$4 = {f = {-2.5, -2.482337, -2.464674, -2.44701076}}
(gdb) p zy
$5 = {f = {-1.25, -1.25, -1.25, -1.25}}
(gdb) p countadd
$6 = {f = {1, 1, 1, 1}}
(gdb) set countadd.f[1] = 0
(gdb) p countadd
$7 = {f = {1, 0, 1, 1}}
(gdb)
If you want to print SSE registers, you can:
(gdb) p $xmm3
$14 = {f = {-2.5, -2.482337, -2.464674, -2.44701076}}
You can print MMX registers, too, but it's messier, since GDB doesn't
know whether it's eight 8-bit values, four 16-bit values, et cetera:
(gdb) p $mm2
$1 = {v8qi = {f = "\001\000\001\000\001\000\001"}, v4hi = {f = {1,
1, 1, 1}}, v2si = {f = {65537, 65537}}, uint64 = 281479271743489}
(Please ignore the fact that the eight 8-bit integers are printed as
characters. I'm going to fix that.)
The SSE work is pretty uncontroversial. I think there's basically one
right way to do this. The only unusual step is to assign the
appropriate virtual type to the registers --- choose something like
struct __builtin_v4sf { float f[4]; };
and everything just works.
The MMX arrangement, however, is controversial. That's what I'd like
people's criticism and comments on.
There are eight MMX registers, 64 bits long each. They're actually
not new registers --- they occupy the 64-bit mantissas of the eight
floating-point registers. The MMX registers map to physical FP
registers; the correspondence is unaffected by the FPU's top-of-stack
register.
The interaction between the MMX instructions and the FPU is odd.
Whenever you read or write an MMX register, the processor sets the
FPU's TOS to zero, and marks all FP registers as "Valid". That is,
the stack is now full. If you write an MMX register, the processor
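sets the corresponding FP register's upper 16 bits (the sign and
exponent) to 0xffff, so that, read as a floating-point number, the
register holds a NaN or an infinity, depending on the mantissa bits.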
So, how should we represent the MMX registers in GDB's register file?
There are two basic approaches:
- Assign them register numbers separate from the FP stack registers'.
- Assign them the same numbers as the FP stack registers, and treat them as
an alternative way of looking at the FP registers' mantissas.
The first approach has some problems.
- Do you assign the MMX registers a separate region of the register
file as well?
- If so, when your target-specific code writes back GDB register values to
the inferior, which copy does it write --- the FP registers, or the
MMX registers?
- If the user assigns to an FP stack register, the corresponding MMX
register's contents must be updated. Is that handled in
architecture-specific code? Via what interface to the
architecture-independent code? How can that interface be designed
so that future hackers, perhaps innocent of the delights of the
x86, won't break it? Would *you* expect writing register 12 to
affect the value, in GDB's register file, of register 42?
I think this approach is fundamentally wrong, because the register
file doesn't match reality. There are not really two separate sets of
bits --- the FP mantissas and the MMX registers are the same object.
If our model doesn't reflect that, we're going to be perpetually
discovering bugs with no correct solution. I hate that.
The second approach is the one I took. The typing information
provided by the compiler tells GDB how to interpret the register's
bits anyway. The only wrinkle is that the FP registers are
REGISTER_CONVERTIBLE, so REGISTER_CONVERT_TO_{VIRTUAL,RAW} need to
expect MMX types as well as FP types. They simply memcpy them.
With this approach, the register file accurately reflects the reality:
there is only one set of bits.
To let people access the MMX registers using names like `$mm2', I
added a new thing, "register views". Register views allow you to see a
register's bits using different types, depending on the name you call
it by.
When the parser sees `$FOO', after checking whether `FOO' is a
register name, it calls the architecture-defined macro
IS_REGISTER_VIEW_NAME. This macro either returns -1, meaning that it
doesn't recognize the name, or a register view number. The macros
REGISTER_VIEW_REGNO and REGISTER_VIEW_TYPE map this register view
number to an ordinary register number, and a type to apply to that
register. So for the x86, we have register views named "mm0", "mm1",
and so on, for which REGISTER_VIEW_REGNO returns FP0_REGNUM,
FP0_REGNUM + 1, and so on, and for which REGISTER_VIEW_TYPE returns an
appropriate union type for MMX registers. There is a new expression
op, OP_REGISTER_VIEW, which works much like OP_REGISTER, but uses
REGISTER_VIEW_TYPE instead of REGISTER_VIRTUAL_TYPE.
I think this concept is useful for other architectures, too. You
could use register views to provide more helpful interpretations of
control registers. For example, perhaps a new register view $ftos
could apply the type
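struct { unsigned int :11; unsigned int tos:3; }
to $fstat, whose TOS field occupies bits 11-13, or $fprec could apply
the type
struct { unsigned int :8; enum { single, reserved, dbl, extended } pc:2; }
to $fctrl, whose precision-control field occupies bits 8-9 (`double' is
a C keyword, hence `dbl'). Thus: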
(gdb) print $ftos
$1 = 3
(gdb) print $fprec
$2 = extended
Or something like that.
But, getting back to the MMX registers...
The problem is, we're using the same register number for %mm0 and
%st(0), but %mm0 doesn't really correspond to %st(0). %mm0 is always
physical register 0, while %st(0) is whichever physical register the
FPU TOS field points to, so the two coincide only when TOS is zero.
However, every MMX instruction does reset TOS to zero. And you can't
really mix FP and MMX code very effectively; the processor's behavior
(marking the stack as full; resetting TOS) seems designed to prevent
this, without actually losing data. So it's almost always right.
Another problem is, we've added an entirely new concept --- register
views --- which affects the parser and the evaluator. But the changes
are simple and straightforward, and they could be useful on other
architectures, if you want to view a single register in more than one
way.
Still, though, it's not quite right. All the information is available
to do the job perfectly --- we have the TOS in $fstat and everything.
And for something which requires (even simple) changes to the parser,
expression evaluator, and everything else that touches expressions,
you'd like to get perfection.
The real obstacle is the assumption, pervasive in GDB, that each
distinct register is an independent part of the machine state. This
makes it very difficult to implement a truly accurate solution. I
don't really know how to work around that.
So, I'm interested in folks' opinions on the current support, and
ideas on how to do better. How should we do this?