This is the mail archive of the archer@sourceware.org mailing list for the Archer project.



gdb-6.7-bz233852-attach-signalled-fix.patch


I hope one of the first orders of business, before the fancy stuff,
will be to feed all the work in the Fedora gdb that's not upstream
through the group's review and reconcile all that work into a branch
in the Archer project and unify the tracks of pushing fixes upstream.
Jan has been doing excellent work, but has mostly been out there alone
as far as getting more eyeballs on the code and good review feedback.

Once the usage of git gets on its feet, I think it will make sense to have
a "reviewed fixes ready for upstream" branch that forks from upstream and
is what all the Archer project branches for bigger and experimental items
fork from.  Fedora's gdb will use this branch, and use ad hoc rpm patches
only for building quick fixes pending review.

To wit, I came across something I wasn't expecting in GDB today
and it turned out to be code in a Fedora patch that's not upstream.
I'm using gdb-6.8-11.fc9.x86_64, and the code in question comes from
gdb-6.7-bz233852-attach-signalled-fix.patch.

This came up when using the crash-suspend utrace module, which I was
dusting off and updating today.  (It's been months since I tried it,
and that was with a much earlier gdb.)  The module creates the situation
where a process is suspended but has a fatal signal pending and
unblocked, so SIGCONT (fg) makes it wake up and crash immediately.

It's otherwise impossible to create this situation without using races.
So it's easier just to talk about the code than to show a test case
that one can try without a very special new kernel.

What surprised me was "Redelivering pending <signal desc>", a message I
never saw from gdb before.  This comes from Jan's new code in the patch,
which is trying to clean up the SIGSTOP/SIGCONT mess you can't avoid
creating when attaching ptrace to a process that might be stopped.

Actually, what surprised me first was that utrace and crash-suspend had
started working right.  I used strace on gdb to see what bad thing the
kernel was doing to provoke it.  But lo and behold, it wasn't my fault,
for a change!  ;-)

The interesting snippet is this:

     write(1, "Attaching to process 2600\n", 26) = 26
     ptrace(PTRACE_ATTACH, 2600, 0, 0)       = 0
     open("/proc/2600/status", O_RDONLY)     = 5
     fstat(5, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
     mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f17594f9000
     read(5, "Name:\tcat\nState:\tT (stopped)\nTgi"..., 1024) = 682
     close(5)                                = 0
     munmap(0x7f17594f9000, 4096)            = 0
     tkill(2600, SIGCONT)                    = 0
     wait4(2600, 0x7fff614fc64c, 0, NULL)    = ? ERESTARTSYS (To be restarted)
     --- SIGCHLD (Child exited) @ 0 (0) ---
     rt_sigreturn(0x11)                      = 61
     wait4(2600, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGQUIT}], 0, NULL) = 2600
     write(1, "Redelivering pending Quit.\n", 27) = 27
     ptrace(PTRACE_CONT, 2600, 0, SIGQUIT)   = 0
     wait4(2600, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGQUIT}], 0, NULL) = 2600

So it attaches, resumes from the stop, and then the guy stops with the
pending SIGQUIT, just like he should.  But gdb doesn't tell me about the
signal so I can debug the program!  (This defeats the whole purpose of
crash-suspend.)  It just acts like I'd done
"handle SIGQUIT pass nostop noprint".
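
For reference, the stopped-process check visible in the trace (reading
/proc/2600/status between PTRACE_ATTACH and the tkill) presumably amounts
to something like the following.  This is only a minimal sketch, not code
taken from the patch; the function name status_text_is_stopped is made up
for illustration, and it works on text that has already been read from
/proc/PID/status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return true if TEXT, the contents of
   /proc/PID/status, has a "State:" line reading "T (stopped)".  */
bool status_text_is_stopped(const char *text)
{
    assert(text != NULL);
    const char *p = text;

    /* Walk the buffer one line at a time.  */
    while (p != NULL && *p != '\0') {
        if (strncmp(p, "State:", 6) == 0) {
            p += 6;
            /* Skip the blanks between the field name and its value.  */
            while (*p == ' ' || *p == '\t')
                p++;
            /* Match the full text so "T (tracing stop)" is not
               mistaken for a job-control stop.  */
            return strncmp(p, "T (stopped)", 11) == 0;
        }
        /* Advance to the character after the next newline, if any.  */
        p = strchr(p, '\n');
        if (p != NULL)
            p++;
    }
    return false;
}
```

Fed the buffer from the read() above, which begins
"Name:\tcat\nState:\tT (stopped)\n", this returns true, which is the
was-stopped case where gdb goes on to send SIGCONT itself.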

The follow-on bug is that after making this delivery, which I wish it
hadn't, gdb is then utterly confused:

Program process 2600 exited: Unknown signal 0 (terminated)

(gdb) c
Continuing.
Couldn't get registers: No such process.
(gdb) det
../../gdb/linux-nat.c:808: internal-error: iterate_over_lwps: Assertion `!is_lwp (inferior_ptid)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y

../../gdb/linux-nat.c:808: internal-error: iterate_over_lwps: Assertion `!is_lwp (inferior_ptid)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
[...]


I mention the second problem just to avoid any possible loose ends.
I suspect it won't come up without the code that makes the first problem.

I think the intent is to swallow the SIGSTOP that PTRACE_ATTACH sent
(was-not-stopped case) or the SIGCONT that gdb sent (was-stopped case).

I don't think you can do it locally after attach the way this code tries
to.  Any intervening signal dequeued first (like my SIGQUIT) ought to go
through the normal infrun loop.  So I think you have to integrate the
SIGSTOP/SIGCONT magic case into the general loop, not just isolate it here.
That is, set flags saying what spurious signal is expected and should be
swallowed, and have the standard got-a-signal case check and reset the flags.
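
To make the flag idea concrete, here is a rough sketch of the bookkeeping
I have in mind.  It is not GDB code: struct attach_expect, note_attach and
should_report_signal are hypothetical names, and where such state would
really live (per-LWP, presumably) is an open question.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread bookkeeping; not a real GDB structure.  */
struct attach_expect {
    bool expect_sigstop;  /* PTRACE_ATTACH queued a SIGSTOP (was-not-stopped) */
    bool expect_sigcont;  /* the debugger sent SIGCONT itself (was-stopped) */
};

/* Record which spurious signal the attach sequence will produce.  */
void note_attach(struct attach_expect *e, bool was_stopped)
{
    assert(e != NULL);
    e->expect_sigstop = !was_stopped;
    e->expect_sigcont = was_stopped;
}

/* Meant to be called from the ordinary got-a-signal path for every
   reported stop.  Returns true if the signal should be handled as usual
   (reported to the user, subject to "handle" settings), false if it was
   the expected attach artifact, which is swallowed and its flag reset.  */
bool should_report_signal(struct attach_expect *e, int sig)
{
    assert(e != NULL);
    if (sig == SIGSTOP && e->expect_sigstop) {
        e->expect_sigstop = false;
        return false;
    }
    if (sig == SIGCONT && e->expect_sigcont) {
        e->expect_sigcont = false;
        return false;
    }
    return true;
}
```

With this shape, a SIGQUIT that gets dequeued ahead of the expected signal
goes through the normal path and is reported, the expected SIGSTOP or
SIGCONT is swallowed exactly once whenever it does show up, and any later
SIGSTOP or SIGCONT is treated like any other signal.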


Thanks,
Roland

