This is the mail archive of the guile@cygnus.com mailing list for the Guile project.



Re: SIGSEGV in scm_gc_mark () [RESOLVED by upgrading]


On Monday, 19 July 1999, Jim Blandy writes:

> > Any ideas of how to tackle this bug?
> 
> I've forwarded this reply to guile@cygnus.com; there are lots of
> people there with experience tracking down these sorts of problems.

Thanks very much for your reply.  In the meantime, however, I got
the chance to upgrade to LinuxPPC-R5 (aka -1999), and I haven't observed
the problem since.

I'd been bothered by the problem for about two months, but was
hesitant to contact guile development, because my LinuxPPC-R4
platform was rather shaky: I'd been having quite a few other
problems with libc-1.99 and egcs-1.1b (both special hack-ups for ppc).

It was only after I'd verified that

   * gnome compiled and ran flawlessly, and
   * LilyPond compiled on a debian potato snapshot (glibc-2.1,
     egcs-1.1.2, etc.) still showed the problem,

that I decided the problem wouldn't be solved easily, and that we'd
have to tackle it.

Maybe I should have sent you a 'RESOLVED by upgrading' message
sooner.  However, I haven't tested whether the problem still remains
on the latest debian snapshot.  Also, I don't have a clue as to
what part of the major upgrade 'solved' the problem.  In short,
there will most probably not be a 'binary search for heap
corruption' patch from me any time soon.  Now I just hope this
won't bite us again.

Greetings,

Jan.

> One approach would be to use the garbage collector as a heap
> validator, and do an n-ary search to find out exactly when the cell is
> corrupted.  Change scm_igc so that, after doing a garbage collection,
> it truncates the free list to some small number of cells, controlled
> by a global variable, say scm_debug_alloc_count.  Then you will get a
> garbage collection after every `scm_debug_alloc_count' allocations.
> 
> Run the program under GDB, set scm_debug_alloc_count to 1000 or so,
> and set a breakpoint with an `ignore' count (with the `ignore'
> command) of a million or so on scm_igc.  When scm_igc crashes, use
> `info break' to check the remaining ignore count; see how many times
> the function has been called.
> 
> Start the program again, and set the ignore count to run up to the
> last successful call to scm_igc.  Now set scm_debug_alloc_count to a
> smaller value, so GC's will happen more often, and see how many
> further calls to scm_igc succeed.  Repeat the process with smaller and
> smaller values of scm_debug_alloc_count, until you know the two calls
> to SCM_NEWCELL between which the corruption occurs.  Then start
> looking at your code.
> 
> 
> In general, it would be nice to automate this whole process by having
> the GC say, when it notices an error, "the heap was corrupted sometime
> between the NNNNth and MMMMth cell allocation," and then further have
> an environment variable that forces a GC after a certain number of
> allocations, by counting the free list.  Then it would be pretty easy
> to do these binary searches for heap corruption.  That would be a nice
> patch to have, if someone wanted to write it.

Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond - The music typesetter
http://www.xs4all.nl/~jantien/      | http://www.lilypond.org/
