This is the mail archive of the guile@cygnus.com mailing list for the Guile project.
Re: SIGSEGV in scm_gc_mark () [RESOLVED by upgrading]
- To: Jim Blandy <jimb@red-bean.com>
- Subject: Re: SIGSEGV in scm_gc_mark () [RESOLVED by upgrading]
- From: Jan Nieuwenhuizen <janneke@gnu.org>
- Date: Tue, 20 Jul 1999 20:58:53 +0200
- cc: Jan Nieuwenhuizen <janneke@gnu.org>, guile@gnu.org, "ir. Wendy" <hanwen@cs.uu.nl>
On Monday, 19 July 1999, Jim Blandy writes:
> > Any ideas of how to tackle this bug?
>
> I've forwarded this reply to guile@cygnus.com; there are lots of
> people there with experience tracking down these sorts of problems.
Thanks very much for your reply. In the meantime, however, I got
the chance to upgrade to LinuxPPC-R5 (aka -1999) and I haven't observed
the problem since.
I'd been bothered by the problem for about two months, but hesitated
to contact the Guile developers because my LinuxPPC-R4 platform was
rather shaky: I'd been having quite a few other problems with
libc-1.99 and egcs-1.1b (both special hack-ups for ppc).
It was only after I'd verified that
* GNOME compiled and ran flawlessly, and
* LilyPond compiled on a Debian potato snapshot (glibc-2.1,
  egcs-1.1.2, etc.) still showed the problem,
that I decided the problem wouldn't be solved easily, and that we'd
have to tackle it.
Maybe I should have sent you a 'RESOLVED by upgrading' message
sooner. However, I haven't tested whether the problem still remains
on the latest Debian snapshot. Also, I don't have a clue as to
which part of the major upgrade 'solved' the problem. In short,
there will most probably not be a binary-search-for-heap-corruption
patch from me any time soon. Now I just hope this won't bite us
again.
Greetings,
Jan.
> One approach would be to use the garbage collector as a heap
> validator, and do an n-ary search to find out exactly when the cell is
> corrupted. Change scm_igc so that, after doing a garbage collection,
> it truncates the free list to some small number of cells, controlled
> by a global variable, say scm_debug_alloc_count. Then you will get a
> garbage collection after every `scm_debug_alloc_count' allocations.
>
> Run the program under GDB, set scm_debug_alloc_count to 1000 or so,
> and set a breakpoint with an `ignore' count (with the `ignore'
> command) of a million or so on scm_igc. When scm_igc crashes, use
> `info break' to check the remaining ignore count; see how many times
> the function has been called.
>
> Start the program again, and set the ignore count to run up to the
> last successful call to scm_igc. Now set scm_debug_alloc_count to a
> smaller value, so GC's will happen more often, and see how many
> further calls to scm_igc succeed. Repeat the process with smaller and
> smaller values of scm_debug_alloc_count, until you know the two calls
> to SCM_NEWCELL between which the corruption occurs. Then start
> looking at your code.
>
>
> In general, it would be nice to automate this whole process by having
> the GC say, when it notices an error, "the heap was corrupted sometime
> between the NNNNth and MMMMth cell allocation," and then further have
> an environment variable that forces a GC after a certain number of
> allocations, by counting the free list. Then it would be pretty easy
> to do these binary searches for heap corruption. That would be a nice
> patch to have, if someone wanted to write it.
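For anyone who wants to try the search described in the quoted text, here is a minimal, self-contained sketch of the idea on a toy heap. It is not Guile code and does not touch scm_igc or SCM_NEWCELL: toy_heap, heap_is_valid, run_with_interval and find_corruption are hypothetical names, the "bug" is planted deliberately, and `interval' merely plays the role of scm_debug_alloc_count while `start' plays the role of the GDB ignore count. It assumes the workload is deterministic, so that every re-run corrupts the heap at the same allocation.

```c
#include <assert.h>
#include <stddef.h>

#define CELL_TAG   0x5ca1ab1eUL  /* hypothetical marker for an intact cell */
#define HEAP_CELLS 4096

typedef struct { unsigned long tag; long value; } cell;

typedef struct {
    cell   cells[HEAP_CELLS];
    size_t allocated;            /* cells handed out so far */
} toy_heap;

/* Corruption happened after allocation `last_good' and no later than
   allocation `first_bad'; `found' is 0 if every check passed. */
typedef struct {
    size_t last_good;
    size_t first_bad;
    int    found;
} corruption_window;

static cell *heap_alloc(toy_heap *h)
{
    cell *c;
    assert(h->allocated < HEAP_CELLS);
    c = &h->cells[h->allocated++];
    c->tag = CELL_TAG;
    c->value = 0;
    return c;
}

/* Stand-in for "use the garbage collector as a heap validator". */
static int heap_is_valid(const toy_heap *h)
{
    size_t i;
    for (i = 0; i < h->allocated; i++)
        if (h->cells[i].tag != CELL_TAG)
            return 0;
    return 1;
}

/* One step of the program under test.  The planted bug smashes cell 0
   right after allocation number `bug_at' (0 means "no bug"). */
static void workload_step(toy_heap *h, size_t n, size_t bug_at)
{
    heap_alloc(h);
    if (n == bug_at)
        h->cells[0].tag = 0;
}

/* Re-run the workload from scratch.  Skip validation for the first
   `start' allocations (known good from an earlier run), then validate
   after every `interval' allocations, and once more at the end. */
static corruption_window run_with_interval(toy_heap *h, size_t total,
                                           size_t bug_at, size_t start,
                                           size_t interval)
{
    corruption_window w;
    size_t n;

    assert(interval > 0);
    w.last_good = start;
    w.first_bad = 0;
    w.found = 0;
    h->allocated = 0;

    for (n = 1; n <= total; n++) {
        workload_step(h, n, bug_at);
        if (n > start && (n - start) % interval == 0) {
            if (!heap_is_valid(h)) {
                w.first_bad = n;
                w.found = 1;
                return w;
            }
            w.last_good = n;
        }
    }
    if (!heap_is_valid(h)) {     /* final check catches a late failure */
        w.first_bad = total;
        w.found = 1;
    }
    return w;
}

/* Shrink the interval tenfold per round until the window is a single
   allocation wide. */
static corruption_window find_corruption(toy_heap *h, size_t total,
                                         size_t bug_at, size_t interval)
{
    corruption_window w = run_with_interval(h, total, bug_at, 0, interval);
    while (w.found && w.first_bad - w.last_good > 1) {
        interval = (interval > 10) ? interval / 10 : 1;
        w = run_with_interval(h, total, bug_at, w.last_good, interval);
    }
    return w;
}
```

Calling find_corruption(&heap, 3000, 1234, 1000) on a statically allocated toy_heap narrows the window over four runs (intervals 1000, 100, 10, 1) to the single allocation at which the planted bug fires. In a real program the re-runs are the expensive part, which is why starting each round from the last known-good check matters.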
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond - The music typesetter
http://www.xs4all.nl/~jantien/ | http://www.lilypond.org/