This is the mail archive of the cygwin@cygwin.com mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Heads up: *possible* bug in cygwin


Christopher Faylor wrote:
On Tue, Dec 31, 2002 at 09:43:50AM -0500, Charles Wilson wrote:

Did gcc (pre 3.2) automatically initialize data to 0, while gcc-3.2 does not? Hmmm...waitaminute, I do have gcc2 installed...

If gcc/ld is not initializing static data to zero then there are some
pretty serious problems.  Neither gcc, nor any other compiler that I
am aware of, is supposed to initialize automatic data to zero.
You're right, as usual. (Plus empirical evidence: no change when compiling with gcc-2. Still get the SIGSEGV)

You're also right [I think] about the buffer overrun/whatever problem in glib. I haven't found the specific, offending command yet, but it seems pretty obvious from the postmortem that that is what has happened.

Single-stepping thru the code shows some interesting things. Basically, we have a g_string (structure that contains a char* field, a length field, plus some other fields that are hidden from the public interface. The whole structure is dynamically allocated, and the char* field points to additional dynamically allocation storage.

This g_string is initially allocated as a minimum size string, where the char* points to a minimum size buffer (4 bytes, it appears), but contains "" (e.g. *char = '\0', len = 0). [as far as glib is concerned, the char* points to a chunk of memory 4 bytes long. But dlmalloc actually uses a chunk that is 16 bytes long == 3 32bit words of dlmalloc overhead, plus the user data]

Then, the testsuite attempts to append a char* that contains approx 20k characters. This means that glib must realloc a larger buffer for the g_string's internal storage. It rounds up to the nearest power of two -- 32768 -- and attempts to realloc the space:

line 220: gstring.c (g_string_maybe_expand)
string->str = g_realloc (string->str, string->alloc);
"string" is the g_string, recast to GRealString* to reveal the
private members of the structure (like 'alloc', which is how
much storage glib originally requested for this buffer, even
though it only needed 1 byte for the 0-length string "".)

in g_realloc, we simply thunk to the system realloc
line 338: gmem.c (g_realloc)
p = (gpointer) realloc (mem, size);
where mem is the string->str argument above, and size is
the string->alloc argument (it equals 32768)

mem, at this point (or string->str, if you like) is a viable pointer to a previously allocated chunk of memory (4 bytes long) that contains '\0' and 3 unknown other values. It has not previously been freed.

But now, as I said, we're in the bowels of dlmalloc.

line 146: malloc_wrapper.cc (realloc)
res = dlrealloc (p, size);
again, p (== mem == string->str) is a valid pointer. size is
32768.

line 4078: malloc.cc (dlrealloc == rEALLOc)
newmem = mALLOc(nb - MALLOC_ALIGN_MASK);
we've already determined that the request cannot be satisfied by expanding the allocation within the current chunk, nor can it be satisfied by expanding the allocation into the next chunk. So, we fall back on the "allocate-copy-free" algorithm, and start by allocating a new chunk of memory.

line 3516: malloc.cc (dlmalloc == mALLOc)
bck->fd = unsorted_chunks(av);
How we got to this line: there have been (other pointers) freed, so we have to first search the list of recently freed memory for appropriately sized chunks.
1) this request is not a 'fast bin' size, so skip that codeblock
2) it's not a small request, so skip the smallbin check
3) so, we consolidate all fastbins. That succeeds.
4) We enter a while loop, as described by this comment:

/* Process recently freed or remaindered chunks, taking one only if it is exact fit, or, if this a small request, the chunk is remainder from the most recent non-exact fit. Place other traversed chunks in bins. Note that this step is the only place in any routine where chunks are placed in bins. */

while ( (victim = unsorted_chunks(av)->bk) != unsorted_chunks(av)) {
bck = victim->bk;
size = chunksize(victim);
...(not a small request, so skip the next codeblock)...
(now we pull "victim" out of the doubly-linked list.)
unsorted_chunks(av)->bk = bck;
bck->fd = unsorted_chunks(av);

And it's here that the error occurs: trying to hook up the forward pointer in victim's predecessor to point to victim's successor.

The problem is, 'victim' is full of garbage. Supposedly this free'd chunk of memory is 808,464,432 bytes long. Its predecessor is also 808,464,432 bytes long. And its predecessor and successor are both located at 0x30303030. Which is bogus, becuase dlmalloc hasn't obtained that region of memory from the system (and besides, 0x30303030 is suspiciously like " ")

So it looks like glib's test program has clobbered out-of-bounds memory in a given malloc'ed block -- but only a few bytes, enough to clobber dlmalloc's bookkeeping but not enough to trigger an actual segfaut. And then it freed the (clobbered) block -- but we don't notice THAT until the NEXT time we try to malloc/realloc some memory. E.g. now.

So the error is not in
string-test.c (main) line 109: g_string_sprintf() --> g_string_sprintfa_int() --> g_string_append()...

It might be in
string-test.c (main) line 109: g_string_sprintf() --> g_string_sprintfa_int() --> g_strdup_vprintf(), which is immediately before the line above,

or it might be in some earlier test in string-test.c (main).

Have I mentioned how much I hate memory allocation? I'm starting to think, since the problem DOES seem to be in glib, that it'd be faster to compile this mess on linux and use a debugging malloc like electricfence. But I am having a hard time believing that of the thousands of glib/gtk/gnome developers out there, nobody has noticed a fundamental problem in [glib|glib-test-suite] before now.

[...but I can't reproduce the fault on linux. Even if I link in dlmalloc. Bleah. ElectricFence on linux couldn't find anything suspicious either.]

But I'm still confused as to why this worked in May. When cgf introduced Doug Lea's malloc on 2002/08/16, what were we using before? Lea's malloc is interesting, in that memory fragments obtained from the systems memorymanager contain both dlmalloc's own bookkeeping inline and user data. So **small** buffer overruns by a userland program will screw up dlmalloc's own accounting without triggering a segfault immediately. Other malloc algorithms maintain the bookkeeping elsewhere, and buffer overruns will (usually) only clobber other user data. (According to Lea's documentation, this is a feature of dlmalloc, not a bug).

So if cygwin's earlier malloc implementation had "separate books" as it were, then the buffer overrun (which I have not yet found) could have occurred without creating an obvious runtime error.

But that still doesn't explain why it works without fault on linux. Mebbe I'll try to build efence on cygwin. But not tonight; that's enough. Happy New Year, everybody.

--Chuck


--
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Bug reporting: http://cygwin.com/bugs.html
Documentation: http://cygwin.com/docs.html
FAQ: http://cygwin.com/faq/


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]