This is the mail archive of the gdb-prs@sourceware.org mailing list for the GDB project.



build/2481: GDB 6.8 (and 6.7.1) - Solaris 10 Sparc - Failing to Detect Stopped Process Properly


>Number:         2481
>Category:       build
>Synopsis:       GDB 6.8 (and 6.7.1) - Solaris 10 Sparc - Failing to Detect Stopped Process Properly
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    unassigned
>State:          open
>Class:          support
>Submitter-Id:   net
>Arrival-Date:   Tue Jul 22 03:58:01 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Jonathan Leffler <jonathan.leffler@gmail.com>
>Release:        6.8
>Organization:
>Environment:
Sun Sparc E450 (4 CPU) - Solaris 10 - GCC 3.4.1
>Description:

I have an ongoing problem on a Sun E450 (4 CPUs, slow, 450 MHz) with both the Sun Studio 11 dbx (which core dumps) and GDB 6.7.1 and 6.8.  I did most of the investigation on 6.7.1, then found that 6.8 had been released and hit the same problem with it.

I have rebuilt GDB in both 32-bit and 64-bit mode, but it makes no noticeable difference.  I've also built a version with some tracing added to procfs.c.  What seems to be happening is that the code is traversing a couple of "can't happen" paths.  I'm not sure where to look next...

Here is the trace output from 'gdbtui ifxchkpath' (ifxchkpath happens to be the program I'd really like to debug).  The lines I added are tagged with JL: (the tag is part of the printf() statements).


GNU gdb 6.7.1
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10"...
Reading symbols from /work1/jleffler/src/cmd/ifxchkpath...done.
Reading in symbols for ifxchkpath.c...done.
(gdb) br main
Breakpoint 1 at 0x100004f4c: file ifxchkpath.c, line 1199.
(gdb) r
Starting program: /work1/jleffler/src/cmd/ifxchkpath
JL:-->> proc_flags(): pid 22655 (status valid = 0)
JL:-->> proc_get_status(): pid = 22655
JL:---- proc_get_status(): pid = 22655 - read() for zero tid (1232 (1232) bytes requested)
JL:---- proc_get_status(): pid = 22655 - read 1232 bytes
JL:---- proc_get_status(): pid = 22655 - read() complete - fd = 7; status valid = 1
JL:---- proc_get_status(): pid = 22655 - status valid = 1
JL:-->> proc_flags(): pid 22655 (status valid = 1)
JL:-->> proc_flags(): pid 22655 flags 0x09000003
JL:<<-- proc_get_status(): pid = 22655 - status valid = 1
JL:-->> proc_flags(): pid 22655 flags 0x09000003
JL:-->> proc_get_status(): pid = 22655
JL:---- proc_get_status(): pid = 22655 - read() for zero tid (1232 (1232) bytes requested)
JL:---- proc_get_status(): pid = 22655 - read 1232 bytes
JL:---- proc_get_status(): pid = 22655 - read() complete - fd = 7; status valid = 1
JL:---- proc_get_status(): pid = 22655 - status valid = 1
JL:-->> proc_flags(): pid 22655 (status valid = 1)
JL:-->> proc_flags(): pid 22655 flags 0x09200003
JL:<<-- proc_get_status(): pid = 22655 - status valid = 1
JL:-->> procfs_wait(): label wait_again
JL:---- procfs_wait(): pid = 22655
JL:-->> proc_flags(): pid 22655 (status valid = 0)
JL:-->> proc_get_status(): pid = 22655
JL:---- proc_get_status(): pid = 22655 - read() for zero tid (1232 (1232) bytes requested)
JL:---- proc_get_status(): pid = 22655 - read 1232 bytes
JL:---- proc_get_status(): pid = 22655 - read() complete - fd = 7; status valid = 1
JL:---- proc_get_status(): pid = 22655 - status valid = 1
JL:-->> proc_flags(): pid 22655 (status valid = 1)
JL:-->> proc_flags(): pid 22655 flags 0x09200003
JL:<<-- proc_get_status(): pid = 22655 - status valid = 1
JL:-->> proc_flags(): pid 22655 flags 0x09200003
JL:---- procfs_wait(): pid = 22655 - long else
JL:-->> proc_flags(): pid 22655 (status valid = 1)
JL:-->> proc_flags(): pid 22655 flags 0x09200003
JL:---- procfs_wait(): pid = 22655 - if (flags&(PR_STOPPED|PR_ISTOP))
JL:---- procfs_wait(): flags = 0x09200003, why = 4, what = 59
PR_SYSEXIT : Exit from a traced system call Exit from SYS_execve
JL:---- procfs_wait(): post prettyprint
JL:---- procfs_wait(): case PR_SYSEXIT
JL:---- procfs_wait(): hopefully exec()...
JL:<<-- procfs_wait(): pid = 22655
JL:-->> proc_flags(): pid 22655 (status valid = 1)
JL:-->> proc_flags(): pid 22655 flags 0x09200003
JL:-->> procfs_wait(): label wait_again
JL:---- procfs_wait(): pid = 22655
JL:-->> proc_flags(): pid 22655 (status valid = 0)
JL:-->> proc_get_status(): pid = 22655
JL:---- proc_get_status(): pid = 22655 - read() for zero tid (1232 (1232) bytes requested)
JL:---- proc_get_status(): pid = 22655 - read 1232 bytes
JL:---- proc_get_status(): pid = 22655 - read() complete - fd = 7; status valid = 1
JL:---- proc_get_status(): pid = 22655 - status valid = 1
JL:-->> proc_flags(): pid 22655 (status valid = 1)
JL:-->> proc_flags(): pid 22655 flags 0x09200020
JL:<<-- proc_get_status(): pid = 22655 - status valid = 1
JL:-->> proc_flags(): pid 22655 flags 0x09200020
JL:---- procfs_wait(): pid = 22655 - long else
JL:-->> proc_flags(): pid 22655 (status valid = 0)
JL:-->> proc_get_status(): pid = 22655
JL:---- proc_get_status(): pid = 22655 - read() for zero tid (1232 (1232) bytes requested)
JL:---- proc_get_status(): pid = 22655 - read -1 bytes
JL:---- proc_get_status(): pid = 22655 - read() complete - fd = 7; status valid = 0
JL:<<-- proc_get_status(): pid = 22655 - status valid = 0
JL:<<-- proc_flags(): pid 22655 - bad failure - 0 returned
JL:-->> proc_get_status(): pid = 22655
JL:---- proc_get_status(): pid = 22655 - read() for zero tid (1232 (1232) bytes requested)
JL:---- proc_get_status(): pid = 22655 - read -1 bytes
JL:---- proc_get_status(): pid = 22655 - read() complete - fd = 7; status valid = 0
JL:<<-- proc_get_status(): pid = 22655 - status valid = 0
JL:-->> proc_get_status(): pid = 22655
JL:---- proc_get_status(): pid = 22655 - read() for zero tid (1232 (1232) bytes requested)
JL:---- proc_get_status(): pid = 22655 - read -1 bytes
JL:---- proc_get_status(): pid = 22655 - read() complete - fd = 7; status valid = 0
JL:<<-- proc_get_status(): pid = 22655 - status valid = 0
JL:---- procfs_wait(): pid = 22655 - else for (flags&(PR_STOPPED|PR_ISTOP)) 0x00000000 vs 0x00000003
JL:---- procfs_wait(): pid = 22655 - surely this can't happen...but it is happening!
(gdb) : ...giving up...not stopped.


When I run 'ps -lp xxxxx', I see the status T (stopped, traced).
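To make the mismatch with 'ps' concrete, here is a minimal sketch of the stopped test that the trace reports.  The mask value 0x3 is taken from the "0x00000000 vs 0x00000003" trace line rather than from a system header, and the names STOP_MASK and looks_stopped() are hypothetical; this is not the code in procfs.c.

```c
#include <stdio.h>

/* Combined PR_STOPPED | PR_ISTOP mask.  The value 0x3 is an assumption
   read off the trace line "0x00000000 vs 0x00000003".  */
#define STOP_MASK 0x00000003UL

/* Hypothetical helper: nonzero when FLAGS has any stop bit set, which
   mirrors the "if (flags&(PR_STOPPED|PR_ISTOP))" test in the trace.  */
int
looks_stopped (unsigned long flags)
{
  return (flags & STOP_MASK) != 0;
}

/* Print one flags word in the same style as the trace output.  */
void
report_flags (unsigned long flags)
{
  printf ("flags = 0x%08lx -> %s\n", flags,
          looks_stopped (flags) ? "stopped" : "not stopped");
}
```

With the values from the trace, 0x09200003 passes the test, while 0x09200020 and the 0 returned after the failed read() do not, even though 'ps' shows the process as stopped.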

When I type q, it says "the program is running.  Exit anyway (y, n)?" and exits in response to 'y'.

I set the variable info_verbose = 1 in top.c as well.

The output 'PR_SYSEXIT : Exit from a traced system call Exit from SYS_execve' seems significant: things start going wrong after it, as shown by read() returning -1.  However, the first read() after that point still works (it yields flags 0x09200020); it is the subsequent ones that fail.
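The trace records that read() returned -1 but not why.  One possible next step is to log errno at the point of failure.  The sketch below assumes a hypothetical wrapper named traced_read(); it is not part of GDB, and I don't know which errno the status read actually returns on this machine.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper around read().  On failure it prints errno in the
   same JL: style as the other trace lines and stores it in *SAVED_ERRNO;
   on success *SAVED_ERRNO is set to 0.  */
ssize_t
traced_read (int fd, void *buf, size_t len, int *saved_errno)
{
  ssize_t n;

  errno = 0;
  n = read (fd, buf, len);
  *saved_errno = (n < 0) ? errno : 0;

  if (n < 0)
    fprintf (stderr, "JL:---- read(fd = %d, %lu bytes) failed: errno %d (%s)\n",
             fd, (unsigned long) len, *saved_errno, strerror (*saved_errno));

  return n;
}
```

Calling this in place of the bare read() in the traced build would show the errno value next to each "read -1 bytes" line.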

Can someone guide me on where to look next?

...time passes...find GDB 6.8...download it...compile it...'make check' can't find 'runtest' again (do I need DejaGnu or something similar to provide runtest?)...and running the newly compiled GDB (after fixing a cast '(unsigned int)pid_t' at line 2859 of gdb/remote.c), I get basically the same result as before:

GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10"...
(gdb) r
Starting program: /work1/jleffler/src/cmd/ifxchkpath
procfs:4337 -- process not stopped.
procfs: ...giving up...
(gdb) br main
procfs: couldn't find pid 27395 (kernel thread 1) in procinfo list.
(gdb) q
The program is running.  Exit anyway? (y or n)


The machine was last booted 11th May 2008.  I can't absolutely rule out that someone has done something weird (like installing patches without rebooting) since then.  I have had periods when GDB seemed to work and others when it didn't, with "didn't" prevailing.  I'll risk a reboot if you think it likely to help, but I don't normally do it because sshd doesn't get restarted automatically (or didn't the last time I did an ad hoc reboot, which might well have been 11th May).  The machine is a couple of thousand miles from where I'm sitting, so it is hard to fix that problem remotely.

Any ideas?
>How-To-Repeat:

>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:

