This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



Re: RFA [PATCH v4] Implement 'catch syscall' for gdbserver (was Re: RFA [PATCH v3] Implement 'catch syscall' for gdbserver)


On 10/03/2013 11:02 PM, Philippe Waroquiers wrote:
> Thanks for the in depth comments (sorry for still not getting all
> style aspects ok).
> 
> I can handle the (easy) comments in the coming days.
> 
> However, there are a few not easy comments on which I have
> questions/(long) feedback below ...
> 
> 
> On Thu, 2013-10-03 at 19:40 +0100, Pedro Alves wrote:
> 
>> Please don't ever send back ffffffff as meaning -1.
>> Numbers in the RSP are variable length hex.  You can't assume the
>> host side interprets them as two's complement 32-bit integers.  If
>> sending a negative number, the '-' sign needs to be there explicitly
>> in the packet, like '-1'.  Perhaps it would even be better to leave
>> -1 as implementation detail, and at the protocol level send something
>> else, like "unknown" or just "-", or nothing.
>>
>>> (do we really want that to be part of the protocol spec?
>>> This is really a very special case, caused by a user "strangely"
>>> forcing the use of the QCatchSyscalls packet).
>>
>> Yes, please specify how that should work.  It can happen, so
>> better define it.  It should also be possible for the server to be
>> dumb, and have GDB itself figure out the syscall number.  GDB
>> could read the registers itself, just like gdbserver is currently
>> doing.  I'd assume the -ENOSYS trick might not always work on
>> all archs, for instance?
> I agree "unknown" or "-" or something like that is cleaner.
> As for the -ENOSYS nasty trick: no, this does not work on
> all archs. I quickly hacked ppc64 gdbserver, and there the
> nasty trick is to compare sysret and orig_r3: if identical, it is
> a syscall entry.
> All of these logics are partially wrong today:
> * The "flip/flop" logic in native GDB becomes completely lost
>   in case of enable/disable (e.g. between entry and exit).
> * x86 gdbserver has a minor bug (I think): a syscall "exit" from
>   an unimplemented syscall is seen as an entry.
> * The hacky ppc64 trick is wrong in case the sysret of the previous
>   syscall is equal to the r3 argument of the new syscall.
> 
> In summary: differentiating syscall entry from exit is currently
> fully correct nowhere (the least bad being x86 gdbserver, which is almost
> always correct and "repairs" itself). ppc64 is not always correct,
> but also "repairs" itself. The flip/flop logic, once broken, confuses
> all entries/exits until further notice.
> 
> I think we need to have something better, e.g. in common/linux-ptrace.c,
> which would maintain the correct expected state, maybe using the
> following behaviour described in the ptrace man page:
>        Syscall-enter-stop and  syscall-exit-stop  are  indistinguishable  from
>        each  other  by  the  tracer.   The  tracer  needs to keep track of the
>        sequence of ptrace-stops in order to  not  misinterpret  syscall-enter-
>        stop  as  syscall-exit-stop  or  vice versa.  The rule is that syscall-
>        enter-stop is always followed by syscall-exit-stop,  PTRACE_EVENT  stop
>        or  the  tracee's  death;  no  other  kinds of ptrace-stop can occur in
>        between.
> It is however not clear to me how the above works if e.g. gdb attaches
> to a process which is blocked in a syscall, and then does PTRACE_SYSCALL.
> Will GDB get an entry for such a restarted syscall?
> Or will it get only an exit?

I don't know for sure off hand, I'd have to experiment, but I think yes,
GDB gets a syscall entry for the restarted syscall.  I think
that's similar to what I saw when I wrote that comment in
linux-nat.c:linux_handle_syscall_trap -- "Later, when the user re-resumes
this LWP, we'd see another syscall entry event".
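
For reference, the bookkeeping the man page describes boils down to a flip/flop like the following (a minimal illustrative sketch, not GDB's or gdbserver's actual code; the names are made up):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-LWP state for the flip/flop bookkeeping: whether
   the next syscall ptrace-stop should be read as an entry or an exit.  */
struct lwp_syscall_state
{
  bool entry_pending;	/* True if the next syscall stop is an entry.  */
};

/* Classify a syscall stop, then flip the state, per the ptrace(2)
   rule that syscall-enter-stop is always followed by syscall-exit-stop
   (or a PTRACE_EVENT stop or the tracee's death).  Note the fragility
   discussed above: if catching is disabled and re-enabled between an
   entry and its exit, the state desynchronizes and every subsequent
   stop is misclassified until something resets it.  */
static bool
handle_syscall_stop (struct lwp_syscall_state *lwp)
{
  bool is_entry = lwp->entry_pending;
  lwp->entry_pending = !is_entry;	/* Flip for the next stop.  */
  return is_entry;
}
```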

> I was intending to improve (and factorise) this logic after the RSP
> protocol patch, but I have to admit I do not understand the above man
> page text.
> I first intended to make a regtest showing the bug in the native
> implementation.

Well, the above man snippet makes it look like GDB's implementation
is closer to the "correct" one, modulo the bug you found.  That makes
me think that we should try and fix the GDB-side bug in the existing
framework (to make sure it's solvable using that approach), and then
depending on the result of that, make gdbserver follow the same
approach.

>>> +QCatchSyscalls:1 [;SYSNO]...
>>> +QCatchSyscalls:0
>>> +  Enable ("QCatchSyscalls:1") or disable ("QCatchSyscalls:0")
>>> +  catching syscalls from the inferior process.
>>
>> So, "catch syscall" is per-inferior/process on the GDB side, but
>> this always sets the catchpoints on all processes.  Was that
>> intended?
> As I understand now (after the hint about add-inferior you gave
> on IRC), gdbserver supports multi process.
> However, it is not clear to me how gdbserver properly processes
> e.g. the packet QPassSignals.

The "handle" command, or anything signal related (even "catch signal")
on the GDB side is _not_ per process.

> The QCatchSyscalls packet is just following the model of the QPassSignals packet.
> 
> So, if I understand properly the comment, the QCatchSyscalls packet
> should modify a list of syscalls to maintain per process
> i.e. in the inferiors.h struct process_info ?

Hmm, not sure that'd be the ultimate way.  I think it may be best to
leave this global.  Thinking forward, with something like itsets on
the target side, a given "catch syscall" could be applicable to all
processes/threads, or some process, or multiple processes, but also to some
thread, to multiple threads, or any mix of those things.  When you put
threads in the mix, you start wondering whether to maintain the list
also per-thread.  But then, it'd probably make more sense to
keep the list of syscall catchpoints a global list of syscall catchpoint
objects, where this object has a "scope" field attached.  When querying
whether GDB is interested in the event, we'd iterate over all catchpoints,
and check if the event thread matches the scope field of each.  In fact,
this is just like the breakpoint module on the GDB side checks whether a
thread-specific event matches any thread-specific breakpoint.
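
The scope-field idea could be sketched roughly like this (purely illustrative; none of these types exist in gdbserver today, and -1 stands in for "any"):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scope attached to a syscall catchpoint.  */
struct catchpoint_scope
{
  int pid;	/* Process to match, or -1 for all processes.  */
  long tid;	/* Thread to match, or -1 for all threads.  */
};

struct syscall_catchpoint
{
  int sysno;	/* Syscall number to match, or -1 for any syscall.  */
  struct catchpoint_scope scope;
};

/* Return true if any catchpoint in CATCHPOINTS[0..N) is interested in
   syscall SYSNO occurring in thread TID of process PID -- iterating
   over a global list and checking the event against each scope, much
   like GDB's breakpoint module checks thread-specific breakpoints.  */
static bool
event_matches (const struct syscall_catchpoint *catchpoints, size_t n,
	       int pid, long tid, int sysno)
{
  for (size_t i = 0; i < n; i++)
    {
      const struct syscall_catchpoint *cp = &catchpoints[i];
      if (cp->sysno != -1 && cp->sysno != sysno)
	continue;
      if (cp->scope.pid != -1 && cp->scope.pid != pid)
	continue;
      if (cp->scope.tid != -1 && cp->scope.tid != tid)
	continue;
      return true;
    }
  return false;
}
```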

> To set the process to be used for a QCatchSyscalls packet, gdb will
> then first have to send a Hg packet.

I'm thinking that, looking ahead at itsets, with the concept of two
sets - the trigger set (which threads should trigger/cause a stop), and
the stop set (which threads should be implicitly paused when the event
triggers, e.g., stop the whole process), Hg will not be sufficient.  We
need a "scoping" machinery like that for all sorts of breakpoint/catchpoint
packets.  E.g., we don't do thread-specific breakpoints on the target
side yet, and that would also benefit from a generalized scoping operator
or some such.  So it's probably best to come up later with a
generalized machinery for specifying those scopes.  Not clear to me
whether or not that should be a separate packet (like Hg is).  There's
also the merging of breakpoints on the GDB side and on the server side
to handle.  E.g., if the user does "catch syscall 2" for process 1, and
then "catch syscall 2" for process 2, will that end up sending just
one QCatchSyscalls packet, or multiple?  The "a QCatchSyscalls always
replaces a previous QCatchSyscalls" rule is a point to consider there.  What if
we support target-side commands and conditions associated with QCatchSyscalls, but
each catchpoint has a different target-side condition?   Then the way merging
is to be done isn't as clear.

Hmm, why _don't_ we support target-side commands and conditions with
this?  Indeed it looks like a step backward to not support features we've
been adding to the Z packets.  So it looks like Luis may be right in that
we should really consider doing catchpoints in the RSP differently.

Thinking only in terms of multi-process/thread, I'm inclined to ignore the
"per-process" thing now, and just leave it as gdbserver making catch
syscall apply to all processes.
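
For concreteness, a server-side parse of the packet as specified in the quoted docs above ("QCatchSyscalls:0" / "QCatchSyscalls:1 [;SYSNO]...") might look roughly like this (a sketch only, assuming SYSNO is hex per the usual RSP convention; not the actual gdbserver code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parse the argument part of a QCatchSyscalls packet, i.e. the text
   after "QCatchSyscalls:".  On success, set *ENABLED, store up to MAX
   syscall numbers into SYSNOS, set *COUNT, and return true.  An
   enabled request with no SYSNOs means "catch all syscalls".  */
static bool
parse_qcatch_syscalls (const char *args, bool *enabled,
		       int *sysnos, size_t max, size_t *count)
{
  *count = 0;
  if (args[0] == '0' && args[1] == '\0')
    {
      *enabled = false;
      return true;
    }
  if (args[0] != '1')
    return false;
  *enabled = true;
  const char *p = args + 1;
  while (*p == ';')
    {
      char *end;
      long sysno = strtol (p + 1, &end, 16);	/* Numbers are hex in the RSP.  */
      if (end == p + 1 || *count == max)
	return false;	/* Empty SYSNO, or too many for the caller.  */
      sysnos[(*count)++] = (int) sysno;
      p = end;
    }
  return *p == '\0';	/* Reject trailing junk.  */
}
```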

> Do I correctly understand the comment (and the way GDB will or should
> use Hg to indicate the inferior for the next QCatchSyscalls) ?

Not sure what comment you're referring to.

>> The whole 'SIGTRAP | 0x80' thing is super silly.  The ideal way
>> would be for the kernel to tell us.  See:
>>
>>  https://sourceware.org/gdb/wiki/LinuxKernelWishList
>>
>> "A PTRACE_O_SYSCALL(_ENTER|_EXIT) ptrace option, along with
>>  PTRACE_EVENT_SYSCALL(_ENTER|_EXIT).  PTRACE_SYSCALL/sysgood doesn't leave
>>  any way to trace syscalls when single-stepping.  Separate syscall enter
>>  and exit events would just be nice, to avoid having to keep track of
>>  enter/exit - the kernel knows, might as well give us the info. (While
>>  at it, a way to filter which syscalls trigger events would be nice, to
>>  avoid unnecessarily slowing down the inferior until it calls the syscall the user
>>  is interested in (you can catch specific syscalls with "catch syscall foosyscall";
>>  gdb currently filters after the fact), though that's obviously a
>>  bit more work.)"
>>
>> Want to work on that?  It'd be great...
> Well, I will first concentrate on trying to make a correct GDB patch
> and properly follow the GDB coding style rules.

Sure, I didn't mean to imply doing that as prerequisite for the
GDB patch.

>> Another idea would be, in addition to supporting syscall_entry
>> and syscall_exit stop replies, also support plain "syscall" (and
>> a new TARGET_WAITKIND_SYSCALL), meaning, "I don't know if it was
>> an entry or an exit, you decide." (just like SYSCALL_SIGTRAP), and then
>> we'd leave GDB to decide.  GDB will end up fetching registers
>> off the target anyway, so I think it'd end up costing 0 in terms of
>> performance.
>>
>> Then when we have real PTRACE_EVENT_SYSCALL(_ENTER|_EXIT), we'd
>> use syscall_entry/syscall_exit.
>>
>> (not completely sure about this).
> I am not sure either.  Adding this to the protocol before we know it is needed
> is not ideal. However, it looks to me that not adding it now means
> that a GDB supporting only syscall enter/exit will have a problem
> with a newer gdbserver that would give back the 3rd reason.
> 
> 
>>
>> How portable is this -ENOSYS thing?  What does strace do?
> -ENOSYS is documented in man ptrace, but only for x86.
> strace uses -ENOSYS for x86, and (from what I understood reading
> the code) cannot differentiate entry/exit (in all cases) for other
> archs.
> I suspect that for these archs strace uses a flip/flop, which might
> suffer from the same bug/weakness as the native GDB flip/flop.

It'd be good to confirm that.  Perhaps reach out to the strace
developers/list asking for their opinion on which scheme to
use, etc.?  We shouldn't be guessing this stuff really.  They
have experience we should be able to leverage.
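
For reference, the x86 trick under discussion boils down to a check like the following (an illustrative sketch; in real code the register value would come from PTRACE_GETREGS / PTRACE_PEEKUSER rather than being passed in):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* On x86, at a syscall-enter-stop the kernel has preloaded the
   return-value register (eax/rax) with -ENOSYS, before the syscall
   has actually run.  At a syscall-exit-stop it holds the real return
   value, so -ENOSYS there can only mean the syscall itself failed
   with ENOSYS -- the "minor bug" case mentioned earlier, where an
   exit from an unimplemented syscall is misread as an entry.  */
static bool
x86_looks_like_syscall_entry (long retval_register)
{
  return retval_register == -ENOSYS;
}
```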

> 
>>
>>
>> BTW, it'd be very very nice if GDB and GDBserver didn't end
>> up with different mechanisms for this.  We've been trying to
>> make the gdb and gdbserver backends more similar, with the end
>> goal of merging them, and more points of divergence don't really
>> help that.
> Yes, I agree. I was intending to look more in depth to that after
> the RSP patch.

As we're seeing, these issues interfere with how we end up
designing the packets.  IMO, we should really take a look
at the GDB bug before finalizing the RSP side.  We can go back
and fix bugs easily, but we can't go back and change the
RSP as easily...

> 
>>> +# This shall be updated whenever QCatchSyscalls packet support is implemented
>>> +# on some gdbserver architecture.
>>> +if { [is_remote target]
>>> +     && ![istarget "x86_64-*-linux*"] 
>>> +     && ![istarget "i\[34567\]86-*-linux*"] } {
>>> +    continue
>>> +}
>>
>> This isn't strictly correct, as it's quite conceivable that
>> e.g., a non-x86 valgrind (or some other random target) gets
>> support for these packets before our own corresponding gdbserver
>> port gets it.  Ideally we'd probe for support somehow, like by
>> looking for some GDB error/warning in the first "catch syscall", or
>> some such.
> Indeed, on the Valgrind side, it will be supported for all
> archs out of the box (as Valgrind has already an arch independent
> syscall interception, and properly identifies entry from exit).
> However, up to now, I never dared to run the gdb testsuite with
> Valgrind gdbserver.

Eh, perhaps you should.  ;-)

>> - I think that if you set a catch syscall, and then disconnect
>> in a way that gdbserver leaves the target running (disconnected
>> tracing, or if we have any persistent commands, see server.c), we'll end up
>> with the program held stopped as soon as the program calls a
>> syscall.  I think at least handle_target_event should be updated.
> I will need to dig into that.
>>
>> - linux-nat.c:linux_handle_syscall_trap has some hairy stuff
>> when "stopping" is true, due to syscall restarts.  I was
>> a little surprised that the gdbserver patch didn't need
>> anything of the sort, but then again, I don't recall whether
>> that bit is covered by the testsuite.
>> I'm wondering whether this -ENOSYS mechanism would allow
>> getting rid of all that, or if something is missing on the
>> gdbserver side.
> I believe the hairy stuff for native is to implement the flip/flop
> logic, but this logic is broken when enabling/disabling.

It may be a simple bug.  Until we analyze it more deeply, we
don't know.

> -ENOSYS does not suffer from this problem (see above), but is
> arch dependent.
> So, I believe the current x86 gdbserver -ENOSYS is better/simpler,
> but for other archs, it seems such tricks do not exist or are
> not working as well (e.g. ppc64).
> 
> I am not sure how to continue with this problem of differentiating
> entry/exit, together with the GDB 7.7 deadline.
>
> I see 3 approaches:
> 1. implement the QCatchSyscalls packet as it is now
>    (with only entry/exit), and -ENOSYS for x86,
>     and see later what to do better for other archs or native GDB.
> 2. Same as 1, but we add a 3rd stop reason,
>          syscall_no_idea_if_it_is_an_entry_or_return,
>    so as to have the protocol ready for archs less easy than x86.
> 3. Work longer on the patch, first trying to have a proper
>    way to differentiate entry/return in both native GDB and
>    gdbserver, preferably arch independent.
>    (unclear if this is possible. If not, we fallback to
>     option 2 or 1).
>    (or possibly, the bug in the GDB flip/flop is easy to fix ?
>     then the flip/flop should be used by gdbserver also)

Argh, things look backwards to me when time-based releases get
delayed for features.  It's supposed to be the opposite.  :-/  Perhaps
Sergio or I can try to look at the bug, though I already have a huge
backlog of distractions, so I shouldn't promise anything...

> I guess 1 and 2 could be done in time for GDB 7.7, but I will not have
> enough evening/weekend time to finish 3 in, let's say, the next two weeks
> (there is soon a valgrind 3.9.0 to go out, for which I have a few
> things still to do).

-- 
Pedro Alves

