This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.
RE: collecting data from a coring process
- From: Paul Marquess <Paul dot Marquess at owmobility dot com>
- To: Dmitry Samersoff <dms at samersoff dot net>, "gdb at sourceware dot org" <gdb at sourceware dot org>
- Date: Thu, 8 Sep 2016 17:12:36 +0000
- Subject: RE: collecting data from a coring process
From: Dmitry Samersoff [mailto:firstname.lastname@example.org]
> > Thanks, will take a look at that. When you say "more or less safely",
> > I'm reading that as saying there will be issues with it. :-)
> I don't know of a way to do anything with a crashing process with 100% reliability -- not even coredumping.
> Custom code in a signal handler doesn't make the situation worse.
I'd rephrase that as: if you're careful and know all the limitations of what is possible in a signal handler, you won't make the situation worse.
> For complicated apps, the crash is quite often the result of something that happened long before the crash point. E.g. when you see memory corruption, you are typically interested in where the memory was corrupted, not where the corrupted memory was hit by the app.
Tell me about it. Memory corruption errors can be impossible to track down.
> So signal handlers that know the application's data structures and can print meaningful information are quite useful and save a lot of time in debugging.
> Also, it might be necessary to free some resources before the process starts dumping core, to allow a faster restart.
> > Trouble is I soon will not allow a core file to be written -- the
> > process is reaching a size where I cannot allow it to be out of action
> > for the amount of time it takes to write that to disk.
> One possible solution is to add a keep-alive protocol between child and parent (e.g. the child keeps touching a file on disk or sending UDP packets); if the keep-alive doesn't arrive in time, the parent considers the child dead, sends it an abort, and starts a new process.
> This solution also covers the situation when a child process hangs or deadlocks.
Luckily I already have a health check probe that does that.
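For reference, the file-touching variant of Dmitry's keep-alive idea can be sketched like this (the file name and staleness window are made-up values for the example): the child bumps the mtime of a heartbeat file from its main loop, and the parent treats a stale mtime as a dead or hung child.

```c
#include <sys/stat.h>
#include <time.h>
#include <utime.h>

#define HEARTBEAT_FILE "/tmp/child.heartbeat" /* hypothetical path */
#define STALE_AFTER    10                     /* seconds; made up  */

/* Child side: call periodically from the main loop. */
void touch_heartbeat(void)
{
    utime(HEARTBEAT_FILE, NULL); /* set mtime to "now" */
}

/* Parent side: returns 1 if the child looks dead or hung. */
int heartbeat_stale(void)
{
    struct stat st;
    if (stat(HEARTBEAT_FILE, &st) != 0)
        return 1; /* no heartbeat file at all */
    return (time(NULL) - st.st_mtime) > STALE_AFTER;
}
```

On a stale heartbeat the parent would then kill(child_pid, SIGABRT) and fork/exec a replacement, as described above.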