This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.


Partial cores using Linux "pipe" core_pattern


I'm not sure this is the best list for this question; if anyone can
suggest a better place to ask, please let me know.

I'm having problems debugging some cores being generated on a
distributed system.  The "client" (where the cores are being dumped) is
a cut-down GNU/Linux system running out of a ramdisk (no local disk).
To preserve cores I have set up NFS and automount, and I'm dumping cores
over the network to a host.  To make this as efficient as possible I am
using the Linux kernel's (I'm running 2.6.27) pipe capability in
core_pattern, piping the core to my own program, which writes compressed
output using gzopen() and friends.  I have some other locking, etc. to
do myself, which is why I have my own program instead of just piping to
gzip.

Most of the time this works great; the core appears on the host and I
can decompress it and debug it and it's very nice.

But sometimes the core is truncated and can't be debugged.  It contains
the first part of the core file without error (I've seen sizes of both
64K(!) and about 65M), but with the whole last part of the core missing
you can't even get a backtrace.  It is still a valid compressed file
(it decompresses just fine), so it's not a network error.  After some
experimentation I have determined that the generated core file does
contain all the data that was read from the kernel; in this situation,
it appears, the kernel simply doesn't give me all the data needed to
construct the core.

I've instrumented every single function to check for errors and to
report any problems to syslog (including informational messages, so I
know the logging works), and no errors are printed.  The amount of core
data I get from read(2)'ing stdin is simply short, yet read(2) never
fails or reports any error!
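
For reference, here is a minimal sketch of the kind of read loop I mean.
It is not my actual program: the name copy_until_eof() and the 64K
buffer size are made up for illustration, and the real thing writes
through zlib rather than write(2).  It treats only a zero return from
read(2) as end of input, retries on EINTR, and counts the bytes read so
the total can be logged.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): copy everything from in_fd
 * to out_fd until read(2) reports end of file.
 *
 * Returns 0 on a clean EOF and -1 on a read or write error (errno is
 * left set by the failing call).  *total receives the number of bytes
 * read from in_fd, so the caller can log how much data arrived.
 */
int copy_until_eof(int in_fd, int out_fd, unsigned long long *total)
{
    char buf[65536];    /* arbitrary buffer size chosen for the sketch */

    assert(total != NULL);
    *total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);

        if (n == 0)
            return 0;           /* EOF: the writer closed its end */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real read error */
        }
        *total += (unsigned long long)n;

        /* write(2) may accept fewer bytes than asked; loop until done. */
        size_t off = 0;
        while (off < (size_t)n) {
            ssize_t w = write(out_fd, buf + off, (size_t)n - off);

            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;      /* real write error */
            }
            off += (size_t)w;
        }
    }
}
```

The handler would call something like copy_until_eof(STDIN_FILENO,
out_fd, &total) and then log total.  Note that from the reader's side a
pipe that is closed early looks exactly like a normal end of file, so a
loop like this cannot by itself tell a complete core from a short one.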

Does anyone have any thoughts about where I can look next to try to
figure out what's going on?  Does anyone know of limitations of the
kernel's core_pattern pipe capability, such as timing issues, that
might be leaving me with short cores?

I'm pretty stumped here!

-- 
-------------------------------------------------------------------------------
 Paul D. Smith <psmith@gnu.org>          Find some GNU make tips at:
 http://www.gnu.org                      http://make.mad-scientist.us
 "Please remain calm...I may be mad, but I am a professional." --Mad Scientist

