This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.

Re: Moribund breakpoints and hardware single-step

From: Frederic Riss <frederic dot riss at gmail dot com>
To: gdb at sourceware dot org
Date: Mon, 2 May 2011 09:30:27 +0200
Subject: Re: Moribund breakpoints and hardware single-step
References: <BANLkTinOzBkkJfmM5zwy2vTykEt+bQ+p3g@mail.gmail.com>

On 28 April 2011 18:26, Frederic Riss <frederic.riss@gmail.com> wrote:
> Hi,
>
> I just debugged a very interesting problem in the moribund breakpoints
> machinery. First, I'm working on sources that must be ~2 months old. I
> haven't had time to upgrade, but from looking at the diff, the current
> GDB master should be subject to the same behavior.

For the record, I reproduced this issue on HEAD with x86_64. It's
quite easy to reproduce, as every 1-byte instruction with cumulative
side effects (e.g. a pushd) is a candidate reproducer:

-------------------8<-----------------------------------8<----------------------------
$ gcc -g -fno-omit-frame-pointer ../gdb/testsuite/gdb.base/recurse.c
$ ./gdb/gdbserver/gdbserver :10000 ./a.out &
[1] 30543
Process ./a.out created; pid = 30548
Listening on port 10000
$ ./gdb/gdb a.out --silent
Reading symbols from /tmp/gdb/build/a.out...done.
(gdb) set target-async
(gdb) set non-stop
(gdb) tar extended-remote :10000
Remote debugging using :10000
Remote debugging from host 127.0.0.1
[New Thread 30548.30548]
(gdb)
[Thread 30548.30548] #1 stopped.
0x0000003eac400b20 in ?? ()
tb *main
Temporary breakpoint 1 at 0x4004b9: file
../gdb/testsuite/gdb.base/recurse.c, line 24.
(gdb) c
Continuing.

Temporary breakpoint 1, main () at ../gdb/testsuite/gdb.base/recurse.c:24
24	{
(gdb) s
main () at ../gdb/testsuite/gdb.base/recurse.c:29
29	  recurse (10);
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
-------------------8<-----------------------------------8<----------------------------

What happened is that the pushd instruction at the start of main was
executed a bunch of times by the 'step' command because of the
behavior described below, thus corrupting the stack. I have kept the
full description below:

> The target is in async + non-stop mode and uses displaced stepping.
> When stepping into a function, infrun.c:handle_step_into_function()
> inserts a step_resume breakpoint at the end of the prologue and
> resumes execution. When the breakpoint is hit, it is removed from the
> target and from the breakpoint list and is remembered in the moribund
> breakpoints list for a bit. We have the current PC that points to the
> location of the moribund breakpoint, and we try to step further. GDB
> asks the target to step one instruction and gets control back.
> Currently if the size of an instruction equals decr_pc_after_break(),
> infrun.c:adjust_pc_after_break() will consider that the target hit the
> moribund breakpoint and reset the PC to the breakpoint address, thus
> executing again and again the same instruction until the breakpoint is
> ripped off the moribund list.
>
> The issue is quite serious as it breaks the inferior's behavior (it will
> go unnoticed if the instruction being repeatedly stepped always has
> the same side effect, but $r0 = $r0 + 1 will become $r0 = $r0 + 3 *
> (thread_count () + 1)).
>
> The comment in adjust_pc_after_break reads:
>
>       /* When using hardware single-step, a SIGTRAP is reported for both
>          a completed single-step and a software breakpoint.  Need to
>          differentiate between the two, as the latter needs adjusting
>          but the former does not.
>
>          The SIGTRAP can be due to a completed hardware single-step only if
>           - we didn't insert software single-step breakpoints
>           - the thread to be examined is still the current thread
>           - this thread is currently being stepped
>
>          If any of these events did not occur, we must have stopped due
>          to hitting a software breakpoint, and have to back up to the
>          breakpoint address.
>
>          As a special case, we could have hardware single-stepped a
>          software breakpoint.  In this case (prev_pc == breakpoint_pc),
>          we also need to back up to the breakpoint address.  */
>
> It's the last special case here that bites. I 'fixed' that in my tree
> with the following simple patch:
>
> @@ -2941,7 +2884,8 @@ adjust_pc_after_break (struct execution_control_state *ecs)
>        if (singlestep_breakpoints_inserted_p
>            || !ptid_equal (ecs->ptid, inferior_ptid)
>            || !currently_stepping (ecs->event_thread)
> -          || ecs->event_thread->prev_pc == breakpoint_pc)
> +          || (software_breakpoint_inserted_here_p (aspace, breakpoint_pc)
> +              && ecs->event_thread->prev_pc == breakpoint_pc))
>          regcache_write_pc (regcache, breakpoint_pc);
>
>        if (RECORD_IS_USED)
>
> The patch is based on the fact that we won't ever hardware single-step
> a moribund-breakpoint. However, I'm not sure this assertion always
> holds, and I'm a bit nervous that there might be some other cases that
> lead to the same kind of behavior. What do you think?
>
> As an aside, why do we use a step-resume breakpoint when stepping into
> a function? In these days of massive multi-threading, wouldn't it be
> much better to just change the thread's stepping range to avoid other
> threads hitting the temporary breakpoint?
>
> Regards,
> Fred
>
