This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



Re: [PATCH 4/5 v4] Tracepoint for ppc64.


Wei-cheng Wang wrote:

> Ulrich Weigand wrote:
> > Well, but we know the logic the stub uses.  For example, we know that
> > we certainly cannot install a fast tracepoint in any shared library code,
> > since the jump pad will definitely be too far away.  We can check for
> > this condition here.  (We could also check for tracepoints in executables
> > that have a text section larger than 32 MB ...)
> 
> Now in this path, ppc_fast_tracepoint_valid_at will check the distance
> and return 0 (invalid) if ADDR is too far from the jump pad.
> 
> However, if a tracepoint was pending and is later found to be invalid,
> it will cause an internal error.  See remote.c:
> 
>   if (gdbarch_fast_tracepoint_valid_at (target_gdbarch (),
> 					tpaddr, &isize, NULL))
>     xsnprintf (buf + strlen (buf), BUF_SIZE - strlen (buf), ":F%x",
> 	       isize);
>   else
>     /* If it passed validation at definition but fails now,
>        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>        something is very wrong.  */
>     internal_error (__FILE__, __LINE__, _("Fast tracepoint not "
> 					  "valid during download"));
> 
> If the tracepoint is pending at definition, it won't be checked at all.
> 
>   /* Fast tracepoints may have additional restrictions on location.  */
>   if (!pending && type_wanted == bp_fast_tracepoint)
>       ^^^^^^^^
>     {
>       for (ix = 0; VEC_iterate (linespec_sals, canonical.sals, ix, iter); ++ix)
> 	check_fast_tracepoint_sals (gdbarch, &iter->sals);
>     }
> 
> Maybe we should use "error" instead of "internal_error"?

Huh.  Shouldn't the location be re-checked in breakpoint.c once the
tracepoint is re-set from "pending" to active?  It seems there's a
call to gdbarch_fast_tracepoint_valid_at missing at that point.
(If the gdbarch_fast_tracepoint_valid_at fails then, we probably
should emit an error and either leave the tracepoint pending or
else convert it to a regular tracepoint at that point.)

> A new function (gdbserver target op), ppc_get_thread_area, is implemented
> for use with the jump pad lock object (collecting_t).  Please have a look.

This looks fine to me.

> The timeout change in tspeed.exp is reverted in this patch.  Recently I
> tested the patch on the gcc112 server, and the timeout had to be 540
> instead of 360 for tspeed.exp to pass.  I am not sure what the proper
> timeout value is; someone may test it in another environment where even
> 540 is not enough.

OK.  Do we know why the test case is taking so long?  Is this simply to be
expected, or is the slowness due to some hidden issue that should be fixed?  

> BTW, in this patch, unlock is implemented by simply writing 0 instead
> of a full atomic swap.  Is that ok?
> 
>   /* Prepare collecting_t object for lock.  */
>   p += GEN_STORE (p, 3, 1, min_frame + 37 * rsz);
>   p += GEN_STORE (p, tp_reg, 1, min_frame + 38 * rsz);
>   /* Set R5 to collecting object.  */
>   p += GEN_ADDI (p, 5, 1, 37 * rsz);
> 
>   p += gen_atomic_xchg (p, lockaddr, 0, 5);
> 
>   /* Call to collector.  */
>   p += gen_call (p, collector);
> 
>   /* Simply write 0 to release the lock.  */
>   p += gen_limm (p, 3, lockaddr);
>   p += gen_limm (p, 4, 0);
>   p += GEN_LWSYNC (p);
>   p += GEN_STORE (p, 4, 3, 0);

This should be OK.  However, we need to take care to use appropriate
syncs when taking the lock as well.

> +  /*
> +  1: lwsync
> +  2: lwarx   TMP, 0, LOCK
> +     cmpwi   TMP, OLD
> +     bne     1b
> +     stwcx.  NEW, 0, LOCK
> +     bne     2b */

In particular, in order to guarantee proper acquire semantics, there
needs to be an lwsync (or isync) *after* the loop here.

On the other hand, you do not need to loop back to the lwsync at the
beginning.  In fact, for a normal lock-acquire sequence you would not
need the lwsync before the loop at all.

However, in this specific case, since the lock value is itself a
pointer to a data structure, you *do* want to have a lwsync between
storing that data structure and taking the lock.

It's probably best to move the lwsync out of this subroutine and
place the two lwsyncs in the caller.


Also, a couple of comments from an earlier review still apply:

> +#if __PPC64__

Please use #ifdef __powerpc64__ throughout.

> +static int
> +gen_limm (uint32_t *buf, int reg, uint64_t imm)
> +{
> +  uint32_t *p = buf;
> +
> +  if ((imm + 32768) < 65536)
> +    {
> +      /* li	reg, imm[15:0] */
> +      p += GEN_LI (p, reg, imm);
> +    }
> +  else if ((imm >> 16) == 0)
> +    {
> +      /* li	reg, 0
> +	 ori	reg, reg, imm[15:0] */
> +      p += GEN_LI (p, reg, 0);
> +      p += GEN_ORI (p, reg, reg, imm);
> +    }
> +  else if ((imm >> 32) == 0)
> +    {
> +      /* lis	reg, imm[31:16]
> +	 ori	reg, reg, imm[15:0] */
> +      p += GEN_LIS (p, reg, (imm >> 16) & 0xffff);
> +      p += GEN_ORI (p, reg, reg, imm & 0xffff);

This is still incorrect if bit 31 of imm is set.
(The lis will fill the 32 high bits of the register
with 1s instead of 0s in that case.)

> +static void
> +ppc_relocate_instruction (CORE_ADDR *to, CORE_ADDR oldloc)
> +{
[...]
> +      else if ((PPC_BO (insn) & 0x14) == 0x4 || (PPC_BO (insn) & 0x14) == 0x10)
> +	{
> +	  newrel -= 4;
> +
> +	  /* Out of range. Cannot relocate instruction.  */
> +	  if (newrel >= (1 << 25) || newrel < -(1 << 25))
> +	    return;
> +
> +	  if ((PPC_BO (insn) & 0x14) == 0x4)
> +	    insn ^= (1 << 24);
> +	  else if ((PPC_BO (insn) & 0x14) == 0x10)
> +	    insn ^= (1 << 22);
> +
> +	  /* Jump over the unconditional branch.  */
> +	  insn = (insn & ~0xfffc) | 0x8;
> +	  write_inferior_memory (*to, (unsigned char *) &insn, 4);
> +	  *to += 4;
> +
> +	  /* Build an unconditional branch and copy the LK bit.  */
> +	  insn = (18 << 26) | (0x3fffffc & newrel) | (insn & 0x3);
> +	  write_inferior_memory (*to, (unsigned char *) &insn, 4);
> +	  *to += 4;
> +
> +	  return;
> +	}
> +      else

This should check for (PPC_BO (insn) & 0x14) == 0.
Note that (PPC_BO (insn) & 0x14) == 0x14 is the "branch always"
case, which should be replaced simply by an unconditional branch.

> +	{
> +	  uint32_t bdnz_insn = (16 << 26) | (0x10 << 21) | 12;
> +	  uint32_t bf_insn = (16 << 26) | (0x4 << 21) | 8;
> +
> +	  newrel -= 12;

That should be -= 8, shouldn't it?

> +static void
> +ppc64_emit_stack_adjust (int n)
> +{
> +  uint32_t buf[4];
> +  uint32_t *p = buf;
> +
> +  n = n << 3;
> +  if ((n >> 7) != 0)
> +    {
> +      emit_error = 1;
> +      return;
> +    }
> +
> +  p += GEN_ADDI (p, 30, 30, n);

ADDI takes a 16-bit signed immediate (up to 32767 for
non-negative operands), so this check seems overly strict.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com

