This is the mail archive of the gdb@sourceware.cygnus.com mailing list for the GDB project.



MMX: Messy Multimedia eXtensions



Intel has contracted with Cygnus to provide support for the MMX and
SSE registers in the GNU toolchain.  We've just finished the beta.
The work is on a branch at the moment, so it won't be showing up in
snapshots.  I'd like to explain what I did in GDB, and get folks'
criticisms and ideas for what we would be happy with in mainline GDB.
There are some exquisitely twisted problems here.

If someone wants to really scrutinize my code, then I can post diffs.

In GCC, you can now write code like this (toy example, not tested):

    /* This declares the type V4SF to be a vector of four single-precision
       floats, in a way that encourages GCC to map it onto the Pentium-III
       SSE registers.  */
    typedef int v4sf __attribute__ ((mode(V4SF)));

    /* Given a bunch of points (X[i], Y[i]), 0 <= i < N, rotate
       each one counterclockwise by ANGLE radians (with the Y axis
       pointing up).  For simplicity's sake, N must be a multiple of four.  */
    void
    rotate (double angle, int n, float *x, float *y)
    {
      int i;

      /* Load all four slots of these with the sin and cos of angle.  */
      v4sf cos_angle = __builtin_ia32_setps1 (cos (angle));
      v4sf sin_angle = __builtin_ia32_setps1 (sin (angle));

      /* Rotate all the points, four at a time.  */
      for (i = 0; i < n; i += 4)
	{
	  /* new_x = cos (angle) * x - sin (angle) * y
	     new_y = sin (angle) * x + cos (angle) * y */
	  v4sf x4 = __builtin_ia32_loadaps (x + i);
	  v4sf y4 = __builtin_ia32_loadaps (y + i);
	  v4sf new_x4
	    = __builtin_ia32_subps (__builtin_ia32_mulps (cos_angle, x4),
				    __builtin_ia32_mulps (sin_angle, y4));
	  v4sf new_y4
	    = __builtin_ia32_addps (__builtin_ia32_mulps (sin_angle, x4),
				    __builtin_ia32_mulps (cos_angle, y4));

	  __builtin_ia32_storeaps (x + i, new_x4);
	  __builtin_ia32_storeaps (y + i, new_y4);
	}
    }

All the v4sf values get mapped onto SSE registers automatically, and
the __builtin_ia32_foo forms turn into single SSE instructions.  It's
very sexy.  (Automatic vectorization would be even sexier, of course,
but that's another day.)

In GDB, you can now debug code like this (real example):


    (gdb) break *0x0804846b
    Breakpoint 1 at 0x804846b: file sse-mandel.c, line 42.
    (gdb) run
    Starting program: /home/jimb/play/sse-mandel 

    Breakpoint 1, 0x804846b in iter.aligned () at sse-mandel.c:42
    (gdb) next
    iter.aligned () at sse-mandel.c:43
    (gdb) p count
    $1 = {f = {1, 1, 1, 1}}
    (gdb) p countadd
    $2 = {f = {0, 0, 0, 0}}
    (gdb) p countadd
    $3 = {f = {1, 1, 1, 1}}
    (gdb) p zx
    $4 = {f = {-2.5, -2.482337, -2.464674, -2.44701076}}
    (gdb) p zy
    $5 = {f = {-1.25, -1.25, -1.25, -1.25}}
    (gdb) p countadd
    $6 = {f = {1, 1, 1, 1}}
    (gdb) set countadd.f[1] = 0
    (gdb) p countadd
    $7 = {f = {1, 0, 1, 1}}
    (gdb) 

If you want to print SSE registers, you can:

    (gdb) p $xmm3
    $14 = {f = {-2.5, -2.482337, -2.464674, -2.44701076}}

You can print MMX registers, too, but it's messier, since GDB doesn't
know whether it's eight 8-bit values, four 16-bit values, et cetera:

    (gdb) p $mm2
    $1 = {v8qi = {f = "\001\000\001\000\001\000\001"}, v4hi = {f = {1,
      1, 1, 1}}, v2si = {f = {65537, 65537}}, uint64 = 281479271743489}

(Please ignore the fact that the eight 8-bit integers are printed as
characters.  I'm going to fix that.)

The SSE work is pretty uncontroversial.  I think there's basically one
right way to do this.  The only unusual step is to assign the
appropriate virtual type to the registers --- choose something like

	struct __builtin_v4sf { float f[4]; };

and everything just works.

The MMX arrangement, however, is controversial.  That's what I'd like
people's criticism and comments on.

There are eight MMX registers, 64 bits long each.  They're actually
not new registers --- they occupy the 64-bit mantissas of the eight
floating-point registers.  The MMX registers map to physical FP
registers; the correspondence is unaffected by the FPU's top-of-stack
register.

The interaction between the MMX instructions and the FPU is odd.
Whenever you read or write an MMX register, the processor sets the
FPU's TOS to zero, and marks all FP registers as "Valid".  That is,
the stack is now full.  If you write an MMX register, the processor
sets the corresponding FP register's upper 16 bits to 0xffff.  (I
think this is a quiet NaN.)
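
The side effects described above can be written down as a tiny model.
This is only a sketch for discussion, not code from GDB or from any
Intel document: `struct fpu_model' and the `mmx_*' functions are names
invented here, and the tag word is simplified to one valid/empty flag
per register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the x87 register file, for illustration only.
   Each physical register is 80 bits: a 64-bit mantissa plus 16 bits
   of sign and exponent.  */
struct fpu_model
{
  uint64_t mantissa[8];   /* physical registers; %mmN aliases mantissa[N] */
  uint16_t sign_exp[8];   /* upper 16 bits of each physical register */
  unsigned int tos;       /* top-of-stack index, 0..7 */
  uint8_t valid[8];       /* simplified tag: 1 = valid, 0 = empty */
};

/* Side effects shared by every MMX read or write: TOS becomes zero
   and every register is tagged valid, so the stack looks full.  */
void
mmx_touch (struct fpu_model *fpu)
{
  int i;

  fpu->tos = 0;
  for (i = 0; i < 8; i++)
    fpu->valid[i] = 1;
}

/* Read %mmN: the mantissa of physical register N, regardless of TOS.  */
uint64_t
mmx_read (struct fpu_model *fpu, unsigned int n)
{
  assert (n < 8);
  mmx_touch (fpu);
  return fpu->mantissa[n];
}

/* Write %mmN: replace the mantissa and force the upper 16 bits of
   the same physical register to 0xffff.  */
void
mmx_write (struct fpu_model *fpu, unsigned int n, uint64_t value)
{
  assert (n < 8);
  mmx_touch (fpu);
  fpu->mantissa[n] = value;
  fpu->sign_exp[n] = 0xffff;
}
```

Note that in this model even mmx_read changes the FPU state, which is
part of what makes the MMX registers awkward for a debugger to
represent.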

So, how should we represent the MMX registers in GDB's register file?
There are two basic approaches:
- Assign them register numbers separate from the FP stack registers'.
- Assign them the same numbers as the FP stack registers, and treat them as
  an alternative way of looking at the FP registers' mantissas.

The first approach has some problems.
- Do you assign the MMX registers a separate region of the register
  file as well?
  - If so, when your target-specific code writes back GDB register values to
    the inferior, which copy does it write --- the FP registers, or the
    MMX registers?
  - If the user assigns to an FP stack register, the corresponding MMX
    registers' contents must be updated.  Is that handled in
    architecture-specific code?  Via what interface to the
    architecture-independent code?  How can that interface be designed
    so that future hackers, perhaps innocent of the delights of the
    x86, won't break it?  Would *you* expect writing register 12 to
    affect the value, in GDB's register file, of register 42?

I think this approach is fundamentally wrong, because the register
file doesn't match reality.  There are not really two separate sets of
bits --- the FP mantissas and the MMX registers are the same object.
If our model doesn't reflect that, we're going to be perpetually
discovering bugs with no correct solution.  I hate that.

The second approach is the one I took.  The typing information
provided by the compiler tells GDB how to interpret the register's
bits anyway.  The only wrinkle is that the FP registers are
REGISTER_CONVERTIBLE, so REGISTER_CONVERT_TO_{VIRTUAL,RAW} need to
expect MMX types as well as FP types.  They simply memcpy them.
With this approach, the register file accurately reflects the reality:
there is only one set of bits.
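
To make the "one set of bits" point concrete, here is a minimal sketch
of what the memcpy conversion amounts to.  The union below is a
simplified stand-in, not the actual type in my patch: the member names
follow the $mm2 output shown earlier, the function name
raw_to_mmx_view is made up, and the code assumes a little-endian host
such as the x86 itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for the union type applied to an MMX
   register.  All members overlay the same 64 bits.  */
union mmx_view
{
  int8_t v8qi[8];
  int16_t v4hi[4];
  int32_t v2si[2];
  uint64_t uint64;
};

/* RAW points at the 10-byte raw image of an FP register as the x86
   stores it: mantissa in bytes 0..7, sign and exponent in bytes
   8..9.  Copy the mantissa bytes into the view unchanged; no
   floating-point conversion takes place.  */
void
raw_to_mmx_view (const unsigned char *raw, union mmx_view *view)
{
  assert (raw != NULL && view != NULL);
  memcpy (view, raw, sizeof *view);
}
```

Feeding it the bytes 01 00 01 00 01 00 01 00 (plus two exponent
bytes) gives v4hi = {1, 1, 1, 1}, v2si = {65537, 65537}, and uint64 =
281479271743489 on a little-endian host, the same numbers as the
$mm2 output above.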

To let people access the MMX registers using names like `$mm2', I
added a new concept, "register views".  Register views allow you to
see a register's bits using different types, depending on the name you
call it.

When the parser sees `$FOO', after checking whether `FOO' is a
register name, it calls the architecture-defined macro
IS_REGISTER_VIEW_NAME.  This macro either returns -1, meaning that it
doesn't recognize the name, or a register view number.  The macros
REGISTER_VIEW_REGNO and REGISTER_VIEW_TYPE map this register view
number to an ordinary register number, and a type to apply to that
register.  So for the x86, we have register views named "mm0", "mm1",
and so on, for which REGISTER_VIEW_REGNO returns FP0_REGNUM,
FP0_REGNUM + 1, and so on, and for which REGISTER_VIEW_TYPE returns an
appropriate union type for MMX registers.  There is a new expression
op, OP_REGISTER_VIEW, which works much like OP_REGISTER, but uses
REGISTER_VIEW_TYPE instead of REGISTER_VIRTUAL_TYPE.
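
For the x86, the name and number lookups could look roughly like the
sketch below.  It is written as plain functions rather than macros,
the lower-case function names are invented for this example, and the
value given to FP0_REGNUM is a placeholder (the real number is
whatever the target defines).  The type lookup is left out, since it
needs GDB's own type machinery.

```c
#include <assert.h>
#include <string.h>

/* Placeholder value, an assumption for this sketch only.  */
#define FP0_REGNUM 16
#define NUM_MMX_VIEWS 8

/* Return the view number for NAME ("mm0" .. "mm7"), or -1 if NAME
   is not a register view.  NAME excludes the leading `$'.  */
int
is_register_view_name (const char *name)
{
  assert (name != NULL);
  if (strlen (name) == 3
      && name[0] == 'm' && name[1] == 'm'
      && name[2] >= '0' && name[2] <= '7')
    return name[2] - '0';
  return -1;
}

/* Map a view number to the ordinary register whose bits it shows.  */
int
register_view_regno (int view)
{
  assert (view >= 0 && view < NUM_MMX_VIEWS);
  return FP0_REGNUM + view;
}
```

With this, "mm2" yields view 2 and register FP0_REGNUM + 2, while a
name like "xmm3" or "mm8" yields -1 and is left for the parser to
treat as an ordinary convenience variable.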

I think this concept is useful for other architectures, too.  You
could use register views to provide more helpful interpretations of
control registers.  For example, perhaps a new register view $ftos
could apply the type
  struct { unsigned int :11; unsigned int tos:3; }
to $fstat, or $fprec could apply the type
  struct { unsigned int :8; enum { single, reserved, double, extended } pc:2; }
to $fctrl.  Thus:

    (gdb) print $ftos
    $1 = 3
    (gdb) print $fprec
    $2 = extended

Or something like that.
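
In terms of plain shifts and masks, those two views would compute
something like the following.  This is a sketch, not code from the
branch: the function and enum names are made up, and it assumes the
x87 layout in which the top-of-stack field occupies bits 11..13 of
the status word and the precision-control field occupies bits 8..9 of
the control word.

```c
#include <assert.h>
#include <stdint.h>

/* Encodings of the precision-control field (assumed layout).  */
enum x87_precision
{
  PC_SINGLE = 0,
  PC_RESERVED = 1,
  PC_DOUBLE = 2,
  PC_EXTENDED = 3
};

/* What a $ftos view would show: bits 11..13 of the status word.  */
unsigned int
view_ftos (uint16_t fstat)
{
  return (fstat >> 11) & 0x7;
}

/* What a $fprec view would show: bits 8..9 of the control word.  */
enum x87_precision
view_fprec (uint16_t fctrl)
{
  return (enum x87_precision) ((fctrl >> 8) & 0x3);
}
```

For instance, a status word of 0x1800 decodes to a top of stack of 3,
and the control word 0x037f decodes to extended precision.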

But, getting back to the MMX registers...

The problem is, we're using the same register number for %mm0 and
%st(0), but %mm0 doesn't really correspond to %st(0).  It depends on
the value of the FPU TOS register.  However, every MMX instruction
does reset TOS to zero.  And you can't really mix FP and MMX code very
effectively; the processor's behavior (marking the stack as full;
resetting TOS) seems designed to prevent this, without actually losing
data.  So it's almost always right.
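
The exact correspondence is easy to state, even if it is hard to fit
into GDB.  A sketch, with function names invented for this message:
%st(i) names physical register (TOS + i) mod 8, and %mmN is always
physical register N.

```c
#include <assert.h>

/* Physical register number that %st(I) names when the top of stack
   is TOS.  */
unsigned int
st_to_physical (unsigned int tos, unsigned int i)
{
  assert (tos < 8 && i < 8);
  return (tos + i) & 7;
}

/* Stack-relative index I such that %st(I) aliases %mmN when the top
   of stack is TOS.  Unsigned arithmetic wraps, so the mask yields
   the result modulo 8 even when N < TOS.  */
unsigned int
mmx_to_st (unsigned int tos, unsigned int n)
{
  assert (tos < 8 && n < 8);
  return (n - tos) & 7;
}
```

When TOS is zero the two numberings coincide, which is why sharing
register numbers gives the right answer after any MMX instruction.
With TOS at 3, however, %mm0 aliases %st(5), not %st(0).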

Another problem is, we've added an entirely new concept --- register
views --- which affects the parser and the evaluator.  But the changes
are simple and straightforward, and they could be useful on other
architectures, if you want to view a single register's bits under
more than one type.

Still, though, it's not quite right.  All the information is available
to do the job perfectly --- we have the TOS in $fstat and everything.
And for something which requires (even simple) changes to the parser,
expression evaluator, and everything else that touches expressions,
you'd like to get perfection.

The real obstacle is the assumption, pervasive in GDB, that each
distinct register is an independent part of the machine state.  This
makes it very difficult to implement a truly accurate solution.  I
don't really know how to work around that.

So, I'm interested in folks' opinions on the current support, and
ideas on how to do better.  How should we do this?
