This is the mail archive of the
gdb@sources.redhat.com
mailing list for the GDB project.
Re: Checking function calls
- From: Fredrik Tolf <fredrik at dolda2000 dot cjb dot net>
- To: Michael Elizabeth Chastain <mec at shout dot net>
- Cc: gdb at sources dot redhat dot com
- Date: 06 Dec 2002 20:31:00 +0100
- Subject: Re: Checking function calls
- References: <200212061708.gB6H8BS01208@duracef.shout.net>
On Fri, 2002-12-06 at 18:08, Michael Elizabeth Chastain wrote:
> > I know, I didn't plan ahead well enough when I started writing it, and
> > now I'm stuck with either this, or a large rewrite.
>
> When I run into this kind of problem, I like to step back -- way back --
> get away from computers for a day or two and think about it.
>
> I think there is no easy way out, that you actually are stuck with a
> large rewrite. There are just too many pthread_mutex_lock's flying
> around.
I'm beginning to believe that, too. Maybe I have just been too
optimistic.
>
> For instance:
>
> client.c:findtransfer() does not have any locks.
>
> in client.c:freesharecache(), there is code:
>
> if (cache->parent != NULL)
> {
>     pthread_mutex_lock(&cache->parent->mutex);
>     ...
> }
>
> in general, it's unsafe to test a member and then acquire the lock,
> because someone else can delete cache->parent between the "if" statement
> and the acquisition of the lock.
>
Here, however, that isn't possible: all deletions from that list go
through the freesharecache function, and deleting a parent also loops
through, locks, and deletes all of its children. Since one of the
children is apparently locked at that point, the deletion won't go any
further. I suspect it might deadlock, though.
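To illustrate the test-then-lock race discussed above, here is a minimal
sketch of one common way to close it: do the NULL test while holding a
lock that also guards the parent link. The names cache_node, tree_lock,
and detach_from_parent are hypothetical and not taken from the actual
program, and the sketch assumes that every change to a parent link
(including freeing a node) is made with tree_lock held.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical node type; the real share cache struct is not shown here. */
struct cache_node {
    pthread_mutex_t mutex;      /* protects this node's own fields */
    struct cache_node *parent;  /* protected by tree_lock (assumption) */
    int nchildren;              /* protected by mutex */
};

/* Assumed: a single lock guarding every parent/child link in the tree. */
static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Detach node from its parent.
 * Returns 1 if a parent was present, 0 if the node had none.
 */
static int detach_from_parent(struct cache_node *node)
{
    int had_parent = 0;

    assert(node != NULL);

    /* Take the link lock first, so the test and the use cannot be split. */
    pthread_mutex_lock(&tree_lock);
    if (node->parent != NULL) {
        /*
         * Nobody can free the parent here, because under the stated
         * assumption freeing it would also require tree_lock.
         */
        pthread_mutex_lock(&node->parent->mutex);
        node->parent->nchildren--;
        pthread_mutex_unlock(&node->parent->mutex);

        node->parent = NULL;
        had_parent = 1;
    }
    pthread_mutex_unlock(&tree_lock);

    return had_parent;
}
```

Taking tree_lock before any per-node mutex, everywhere, gives a fixed
lock order, which is the usual way to rule out the kind of deadlock
suspected above. Whether that fits the real code depends on how the
existing locks are used.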
> I recommend finding a textbook on multi-threaded programming that covers
> "how to write thread-safe lists". From your package, it looks like
> you are in it to learn, so you could step way back from the code and
> learn some theory at this point.
Yeah, when I began writing this program, I did not have much experience
in multithreading. That's why there are far too few mutexes in the
program.
Still, I don't think that's the reason for this bug. The loop in which
it crashes is quite thread-safe.
> Another alternative is to use one big mutex for the whole list.
That is precisely what I have been wanting to implement for a long time.
It's only that it would require an enormous rewrite to implement
everywhere that it should be used.
> The drawback is that walking the list locks the whole list against
> addition and deletion. If your list walker is just "print status
> information" then that is fine. If your list walker does some
> long-lived network operation at each node then it is not fine.
I have, however, made sure that doesn't happen by only using nonblocking
I/O.
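As a rough sketch of the "one big mutex for the whole list" approach,
the following guards a singly linked list with a single lock. Everything
here is hypothetical (struct transfer, transfers_lock, and the
transfer_* functions are illustrative names, not the program's actual
ones). It assumes that no work done under the lock blocks, which is the
same condition that nonblocking I/O is meant to guarantee.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical transfer record; the real struct is not shown here. */
struct transfer {
    int id;
    long bytes;
    struct transfer *next;
};

static struct transfer *transfers = NULL;
/* The one big lock: guards the list head and every next pointer. */
static pthread_mutex_t transfers_lock = PTHREAD_MUTEX_INITIALIZER;

/* Insert at the head. Returns 0 on success, -1 if allocation fails. */
static int transfer_add(int id, long bytes)
{
    struct transfer *t = malloc(sizeof *t);

    if (t == NULL)
        return -1;
    t->id = id;
    t->bytes = bytes;

    pthread_mutex_lock(&transfers_lock);
    t->next = transfers;
    transfers = t;
    pthread_mutex_unlock(&transfers_lock);
    return 0;
}

/* Unlink and free the node with the given id. Returns 1 if found. */
static int transfer_remove(int id)
{
    struct transfer **link, *t;
    int found = 0;

    pthread_mutex_lock(&transfers_lock);
    for (link = &transfers; (t = *link) != NULL; link = &t->next) {
        if (t->id == id) {
            *link = t->next;
            free(t);
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&transfers_lock);
    return found;
}

/*
 * Look up a node and copy it into *out while the lock is held.
 * A copy is returned instead of a pointer, because a pointer into the
 * list could be freed by another thread once the lock is released.
 */
static int transfer_find(int id, struct transfer *out)
{
    const struct transfer *t;
    int found = 0;

    assert(out != NULL);

    pthread_mutex_lock(&transfers_lock);
    for (t = transfers; t != NULL; t = t->next) {
        if (t->id == id) {
            *out = *t;
            out->next = NULL;   /* do not leak a link into the list */
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&transfers_lock);
    return found;
}

/* Walk the whole list under the lock and return the number of nodes. */
static int transfer_count(void)
{
    const struct transfer *t;
    int n = 0;

    pthread_mutex_lock(&transfers_lock);
    for (t = transfers; t != NULL; t = t->next)
        n++;
    pthread_mutex_unlock(&transfers_lock);
    return n;
}
```

The trade-off is the one described above: while any walker holds
transfers_lock, additions and deletions wait, so each visit has to stay
short.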
Once again, though, I don't think that a lack of thread safety is what
causes this bug. But I've added checks to that loop now, so I should
discover it sooner or later. Thank you very much for all your help.