On Fri, Dec 09, 2005 at 04:57:57PM -0500, Greg Bronevetsky wrote:
> I see that there's been some discussion on this list on checkpointing
> techniques that may be included in gdb. My research group at Cornell is
> working on a number of such checkpointers for both sequential and
> parallel programs and we recently decided to try a more challenging
> variant of checkpointing where the user can take a checkpoint of their
> program, modify their source code a bit (add or remove stack variables,
> move function calls around a bit and a few other things) and then resume
> computation using the modified code. This seems to be very useful for
> debugging long-running applications since the user would be able to work
> around the bug without losing a week's or month's worth of results (this
> can happen in high-performance computing). Similarly, it's useful for
> situations where your execution is in some particularly buggy corner
> case and you want to keep making modifications and trying them out
> without having to guide the program's execution back into that corner
> case after every code change.
>
> My question is, has anybody heard of anything that can do this?
> Obviously, this kind of checkpointing would require compiler support, so
> gdb wouldn't have done this, but have you heard of any systems/research
> that have addressed this question? Thanks.

Better: I know at least one production debug environment which supports
this - Apple's Xcode. The option is called fix-and-continue. I don't
think they combine it with checkpointing, though; as far as I know it
is only offered as an action on a running process. It's partly
compiler-based and partly in their debug environment.
Merging that with Michael's fork-based code would be fairly
straightforward, I expect.
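
For anyone who has not seen the fork-based approach, here is a rough
sketch of the general idea only. It is not Michael's actual code, and the
names take_checkpoint and demo are made up for illustration. The snapshot
is a forked copy of the process that stops itself immediately; "rolling
back" means resuming that copy, which still holds the state from the
moment of the fork. It assumes a POSIX system and says nothing about the
fix-and-continue half (patching in modified code).

```python
import os
import signal


def take_checkpoint():
    """Fork a frozen copy of this process.

    Returns the snapshot's pid in the live process, and 0 in the
    snapshot once somebody resumes it with SIGCONT.
    """
    pid = os.fork()
    if pid == 0:
        # Snapshot: freeze right away, holding the state as of the fork.
        os.kill(os.getpid(), signal.SIGSTOP)
        return 0  # execution continues here only after SIGCONT
    # Live process: wait until the snapshot has actually stopped.
    os.waitpid(pid, os.WUNTRACED)
    return pid


def demo():
    """Checkpoint, change state, then resume the snapshot and report what it sees."""
    read_fd, write_fd = os.pipe()
    state = {"step": 1}

    snapshot = take_checkpoint()
    if snapshot == 0:
        # Resumed snapshot: report the state it was frozen with.
        os.close(read_fd)
        os.write(write_fd, str(state["step"]).encode())
        os._exit(0)

    os.close(write_fd)
    state["step"] = 99                 # the live process runs on
    os.kill(snapshot, signal.SIGCONT)  # "roll back": wake the snapshot
    os.waitpid(snapshot, 0)
    restored = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    return state["step"], restored


if __name__ == "__main__":
    print(demo())
```

In a real debugger the live process would be discarded and the debugger
would switch over to the snapshot; the sketch keeps both around only so
the two states can be compared.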