This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.



Re: [RFA] MIPS 24K Errata Patch


Catherine Moore <clm@codesourcery.com> writes:
> 	* gas/mips/24k-branch-delay-1.d: New.
> 	* gas/mips/24k-branch-delay-1.s: New.

It looks like this just contains mips1 code, so I think it should
be run for all architectures, rather than be forced to -march=24kfx.
This is important because -mfix-* options should work with generic ISAs;
they shouldn't be restricted to the corresponding -march=* option.

Also, please remove the -mmips:* options from the objdump command line.
They shouldn't be needed.

> 	* gas/mips/24k-triple-stores-1.d: New.
> 	* gas/mips/24k-triple-stores-1.s: New.

Likewise every mips32r2+ architecture.

> 	* gas/mips/24k-triple-stores-2.d: New.
> 	* gas/mips/24k-triple-stores-2.s: New.

mips2+ for this one.

> 	* gas/mips/24k-triple-stores-3.d: New.
> 	* gas/mips/24k-triple-stores-3.s: New.

mips3+ here.

> 	* gas/mips/24k-triple-stores-5.d: New.
> 	* gas/mips/24k-triple-stores-5.s: New.

mips1+ again here.

No -4? :-)  Not a big deal, but could you renumber the highest-numbered
test to 4?

> 	* gas/mips/24k-triple-stores-6.d: New.
> 	* gas/mips/24k-triple-stores-6.s: New.
> 	* gas/mips/24k-triple-stores-7.d: New.
> 	* gas/mips/24k-triple-stores-7.s: New.

mips2+ for these two.

> 	* gas/mips/24k-triple-stores-8.d: New.
> 	* gas/mips/24k-triple-stores-8.s: New.

mips1+ here.

> 	* gas/mips/24k-triple-stores-9.d: New.
> 	* gas/mips/24k-triple-stores-9.s: New.

mips2+ here.

(Let me know if I've got some of the classifications wrong here.)

> +#define BASE_REG_EQ(INSN1, INSN2) 	\
> +  ((((INSN1) >> OP_SH_RS) & OP_MASK_RS) \
> +      == (((INSN2) >> OP_SH_RS) & OP_MASK_RS))
> +
> +/* Return the offset, if any, for this store instruction.  */
> +static int
> +fix_24k_offset (unsigned long opcode, const struct mips_opcode *mo)

Prevailing style is to have a blank line between the comment and
the function.

> +{
> +  if (!strcmp (mo->name, "swxc1")
> +      || !strcmp (mo->name, "suxc1")
> +      || !strcmp (mo->name, "sdxc1"))
> +    return 0;

Another style nitlet, but the nop insertion code uses
"strcmp (...) == 0".  That said...

> +  return (opcode >> OP_SH_IMMEDIATE) & OP_MASK_IMMEDIATE;

...if you're checking for OP_SH_IMMEDIATE, I think it'd be better
to use:

  if (strstr (mo->args, "o("))
    return (opcode >> OP_SH_IMMEDIATE) & OP_MASK_IMMEDIATE;
  /* The only other stores on the 24k have a register index.  */
  return 0;

I'm uneasy about the return 0 though; see below.

> +/* Return the the minimum alignment for this store instruction.  */
> +static int
> +fix_24k_align_to (const struct mips_opcode *mo)

Blank line after comment.  Typo: "the the".

> +{
> +  

Excess blank line here.

> +  if (!strcmp (mo->name, "sh"))
> +    return 2;
> +
> +  if (!strcmp (mo->name, "swc1")
> +      || !strcmp (mo->name, "swc2")
> +      || !strcmp (mo->name, "sw")
> +      || !strcmp (mo->name, "sc"))
> +    return 4;
> +
> +  if (!strcmp (mo->name, "sdc1")
> +       || !strcmp (mo->name, "sdc2"))
> +    return 8;
> +
> +  /* sb, swl, swr */
> +  return 1;

This doesn't handle s.s and s.d, which are direct aliases as well
as macros:

{"s.s",     "T,o(b)",	0xe4000000, 0xfc000000,	SM|RD_T|RD_b|FP_S,	0,		I1	}, /* swc1 */
{"s.s",     "T,A(b)",	0,    (int) M_SWC1_AB,	INSN_MACRO,		INSN2_M_FP_S,	I1	},

It looks like you'd wrongly (but conservatively) classify them as having
alignment 1.

Same strcmp == 0 comment.

> +/* 24K Errata: Lost Data on Stores During Refill. 
> +  
> +  Problem: The FSB (fetch store buffer) acts as an intermediate buffer
> +  for the data cache refills and store data. The following describes
> +  the scenario where the store data could be lost.
> +  
> +  * A data cache miss, due to either a load or a store, causing fill
> +    data to be supplied by the memory subsystem
> +  * The first three doublewords of fill data are returned and written
> +    into the cache
> +  * A sequence of four stores occurs in consecutive cycles around the
> +    final doubleword of the fill:
> +  * Store A
> +  * Store B
> +  * Store C
> +  * Zero, One or more instructions
> +  * Store D
> +  
> +  The four stores A-D must be to different doublewords of the line that
> +  is being filled. The fourth instruction in the sequence above permits
> +  the fill of the final doubleword to be transferred from the FSB into
> +  the cache. In the sequence above, the stores may be either integer
> +  (sb, sh, sw, swr, swl, sc) or coprocessor (swc1/swc2, sdc1/sdc2,
> +  swxc1, sdxc1, suxc1) stores, as long as the four stores are to
> +  different doublewords on the line. If the floating point unit is
> +  running in 1:2 mode, it is not possible to create the sequence above
> +  using only floating point store instructions.  In this case, the cache
> +  line being filled is incorrectly marked invalid, thereby losing the
> +  data from any store to the line that occurs between the original miss
> +  and the completion of the five cycle sequence shown above.

Nice explanation, thanks.  Please start a new paragraph at "In this case",
so that it's obvious you're talking about the whole errata situation,
rather than the FPU being in 1:2 mode.

> +  
> +  * Run the data cache in write-through mode.
> +  * Insert a non-store instruction between
> +    Store A and Store B or Store B and Store C.  */

Add something like

  The workarounds are:

above this.

> +static int
> +nops_for_24k (const struct mips_cl_insn *hist,
> +	      const struct mips_cl_insn *insn,
> +	      expressionS *address_expr)
> +{
> +
> +  struct store_info
> +    {
> +      /* Immediate offset, if any, for this store instruction.  */
> +      short off;
> +      /* Alignment required by this store instruction.  */
> +      int align_to;
> +    } pos[3], tmp;
> +
> +  int align, i, align_insn;
> +  int range;
> +  int new_off[3];
> +
> +  /* Three stores in a row are required to trigger the errata.  */
> +  if (!insn 
> +      || !(insn->insn_mo->pinfo & INSN_STORE_MEMORY)
> +      || !(hist[0].insn_mo->pinfo & INSN_STORE_MEMORY)
> +      || !(hist[1].insn_mo->pinfo & INSN_STORE_MEMORY))
> +    return 0;
> +
> +  if (frag_now != hist[0].frag
> +      || frag_now != hist[1].frag)
> +    return 1;

I don't think this is correct.  If INSN is null, the function is
supposed to assume the worst for INSN, not the best.

I also don't understand what the frag checks are doing.  If we reach a point
at which we can no longer be sure that the sequence is consecutive,
the generic nop infrastructure is supposed to insert nops for the worst
case (see mips_emit_delays).  So when do we see (and care about) two
stores in different frags?

I'd have expected something like:

  /* If INSN is definitely not a store, there's nothing to worry about.  */
  if (insn && (insn->insn_mo->pinfo & INSN_STORE_MEMORY) == 0)
    return 0;

  /* Likewise if the previous instruction wasn't a store.  */
  if ((hist[0].insn_mo->pinfo & INSN_STORE_MEMORY) == 0)
    return 0;

  /* If we don't know what came before the store, assume the worst.  */
  if (hist[1].frag == NULL)
    return 1;

  /* If the instruction was not a store, there's nothing to worry about.  */
  if ((hist[1].insn_mo->pinfo & INSN_STORE_MEMORY) == 0)
    return 0;

although I haven't tested it.

I think there should be tests for things like:

	.text
        sb      $2,0($8)
        .data
        .word   1
        .text
        sb      $3,8($8)
        .data
        .word   1
        .text
        sb      $4,16($8)

and:

	.text
        sb      $2,0($8)
        sb      $3,8($8)
        .data
        .word   1
        .text
        sb      $4,16($8)

to cover this situation.  It's OK to be conservatively correct and
insert nops where they're not needed.

It would also be nice to have checks in which some individual stores
are wrapped in ".set noreorder"...".set reorder".

Then, following on from the above:

> +  if (!BASE_REG_EQ (insn->insn_opcode, hist[0].insn_opcode)
> +      || !BASE_REG_EQ (insn->insn_opcode, hist[1].insn_opcode))
> +    return 1;

  /* If we don't know the relationship between the store addresses,
     assume the worst.  */
  if (insn == NULL
      || !BASE_REG_EQ (insn->insn_opcode, hist[0].insn_opcode)
      || !BASE_REG_EQ (insn->insn_opcode, hist[1].insn_opcode))
    return 1;

> +  if (!address_expr)
> +    return 1;
> +
> +  pos[0].off = address_expr->X_add_number;
> +  pos[1].off = fix_24k_offset (hist[0].insn_opcode, hist[0].insn_mo);
> +  pos[2].off = fix_24k_offset (hist[1].insn_opcode, hist[1].insn_mo);

I think this ignores situations like:

        sw      $2,%gprel(s1)($28)
        sw	$3,%gprel(s2)($28)
        sw	$4,%gprel(s3)($28)

where the in-place offsets will all be 0.  I think the code will
assume that there is no problem, whereas there could be for certain
offsets of s1, s2 and s3.  It would be nice to have a test for this too.

For INSN, I think we should move the:

      if (address_expr->X_op == O_constant)
	{
	  unsigned int tmp;

	  switch (*reloc_type)

block of append_insn so that it runs before:

  if (mips_relax.sequence != 2 && !mips_opts.noreorder)
    {
      /* There are a lot of optimizations we could do that we don't.
	 In particular, we do not, in general, reorder instructions.
	 If you use gcc with optimization, it will reorder
	 instructions and generally do much more optimization then we
	 do here; repeating all that work in the assembler would only
	 benefit hand written assembly code, and does not seem worth
	 it.  */
      int nops = (mips_optimize == 0
		  ? nops_for_insn (history, NULL)
		  : nops_for_insn_or_target (history, ip));

We could then add a:

  /* True if this instruction is complete.  */
  unsigned int complete_p : 1;

field to mips_cl_insn.  Adding the associated code to the append_insn
function I mentioned above gives:

  if (address_expr == NULL)
    ip->complete_p = 1;
  else if (*reloc_type <= BFD_RELOC_UNUSED
           && address_expr->X_op == O_constant)
    {
      unsigned int tmp;

      switch (*reloc_type)
      ...
      ip->complete_p = 1;
    }

We could then check complete_p for all three instructions and return
1 if any of them is false.  We could also get the offset from INSN
in the same way as for the others.  There would be (and should be)
no need to pass address_expr around.

I think anything that makes the handling equivalent for all three
insns (rather than treating INSN as a special case) is likely to
be more robust.

Also, doesn't this code (incorrectly?) assume that two register-indexed
stores followed by an offset store will overlap?  fix_24k_offset returns
0 for register-indexed stores, which we'll then treat as having the
same offset.

> +  /* Sort pos from smallest to largest.  */
> +  if (pos[1].off < pos[0].off)
> +    {
> +      tmp = pos[0];
> +      pos[0] = pos[1];
> +      pos[1] = tmp;
> +    }
> +  if (pos[2].off < pos[1].off)
> +    {
> +      tmp = pos[2];
> +      pos[2] = pos[1];
> +      pos[1] = tmp;
> +
> +      if (pos[1].off < pos[0].off)
> +	{
> +	  tmp = pos[0];
> +	  pos[0] = pos[1];
> +	  pos[1] = tmp;
> +	}
> +    }

It might sound like overkill, but please use qsort here.

> +  /* Do a quick check to see if the range is more than 32 bytes.
> +     If so, we are definately not on the same cache line.
> +     There may be a chance that range after alignment adjustment
> +     is >= 32, so we recheck this again later on.  */
> +
> +  if (pos[2].off - pos[0].off >= 32)
> +    return 0;

And given my previous comment, I think this micro optimisation isn't
necessary.

> +  /* Check for different double-words.  We also check for
> +     corner cases (including unaligned addresses).
> +     Depending on the type of store, a max distance of
> +     (pos[2] - pos[0]) < X will guarantee one double-word
> +     overlap.  */
> +		  
> +  align_insn = 0;
> +
> +  /* The value of X depends on the insn at pos[i] for the
> +     alignment, where i is the widest insn of store type.  */
> +
> +  if (((insn->insn_opcode >> OP_SH_RS) & OP_MASK_RS) == SP)
> +    align = 8;
> +  else
> +    {
> +      align = 1;
> +      for (i = 0; i < 3; i++)
> +	{
> +	  if (align < pos[i].align_to)
> +	    {
> +	      align = pos[i].align_to;
> +	      align_insn = i;
> +	    }
> +	}
> +    }
> +
> +  /* Align everything using align_insn's alignment.
> +     1. Change align_insn's offset to 0.
> +     2. Align all insns to "align".  */
> +
> +  new_off[0] = pos[0].off;
> +  new_off[1] = pos[1].off;
> +  new_off[2] = pos[2].off;
> +
> +  if (((insn->insn_opcode >> OP_SH_RS) & OP_MASK_RS) != SP)
> +    {
> +      /* Take care of unaligned offsets.  */
> +      new_off[0] -= pos[align_insn].off;
> +      new_off[1] -= pos[align_insn].off;
> +      new_off[2] -= pos[align_insn].off;
> +    }

I found this a bit confusing.  How about:

  /* Pick a value of ALIGN and X such that all offsets are adjusted by
     X bytes and such that the base register + X is known to be aligned
     to ALIGN bytes.  */
  if (((insn->insn_opcode >> OP_SH_RS) & OP_MASK_RS) == SP)
    align = 8;
  else
    {
      align = pos[0].align_to;
      base_offset = pos[0].off;
      for (i = 1; i < 3; i++)
	if (align < pos[i].align_to)
          {
            align = pos[i].align_to;
            base_offset = pos[i].off;
          }
      for (i = 0; i < 3; i++)
        pos[i].off -= base_offset;
    }

(and get rid of new_off).  Might be a case of picking your poison though. :-)

> +  /* Determine max range using align_insn's offset.  */
> +  range = 8 + align;
> +  new_off[0] &= ~align + 1;
> +  new_off[1] &= ~align + 1;
> +  new_off[2] &= ~align + 1;

  /* Divide the offset range into ALIGN-byte chunks.  Represent each
     offset by the first byte in its chunk.  */
  pos[0].off &= ~align + 1;
  pos[1].off &= ~align + 1;
  pos[2].off &= ~align + 1;

> +  if (new_off[0] == new_off[1]
> +      || new_off[0] == new_off[2]
> +      || new_off[1] == new_off[2]
> +      || new_off[2] - new_off[0] < range
> +      || new_off[2] - new_off[1] >= 24
> +      || new_off[1] - new_off[0] >= 24
> +      || new_off[2] - new_off[0] >= 32)
> +   return 0;

Given the masking that you've done, isn't:

      || new_off[2] - new_off[0] < range

equivalent to:

      || new_off[2] - new_off[0] <= 8

?  The latter seems more readable to me.  So:

  /* If any two stores write to the same chunk, they also write to the
     same doubleword.  The offsets are still sorted at this point.  */
  if (pos[0].off == pos[1].off || pos[1].off == pos[2].off)
    return 0;

  /* A range of at least 9 bytes is needed for the stores to be in
     non-overlapping doublewords.  */
  if (pos[2].off - pos[0].off <= 8)
    return 0;

The last case is interesting.  Does the errata still trigger if
a suxc1 instruction straddles a cache boundary?  E.g., playing
battleships for a moment:

    0 2 4 6
  0 ........
  8 ........
 10 ........
 18 ....XXXX   suxc1 to 0x18
 20 XXXX....
 28 ........
 30 ..XX....   sh to 0x32
 38 ......XX   sh to 0x38

It looks like the range checks will return 0 in this case, even though
all three stores write to different doublewords of the second cache line.

The answer might well be "no, the errata doesn't trigger for this case".
If so, it'd be worth mentioning that in the main comment.  The answer
affects what the comments above the remaining 24 and 32 checks should be.

Richard

