This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.
Re: bug in spinlock.c?
- From: Kaz Kylheku <kaz at ashi dot footprints dot net>
- To: Andreas Jaeger <aj at suse dot de>
- Cc: libc-alpha at sources dot redhat dot com, Karsten Keil <kkeil at suse dot de>
- Date: Fri, 21 Feb 2003 08:27:37 -0800 (PST)
- Subject: Re: bug in spinlock.c?
On Fri, 21 Feb 2003, Andreas Jaeger wrote:
> Date: Fri, 21 Feb 2003 13:54:13 +0100
> From: Andreas Jaeger <aj at suse dot de>
> To: libc-alpha at sources dot redhat dot com
> Cc: Karsten Keil <kkeil at suse dot de>
> Subject: [libc-alpha] bug in spinlock.c?
>
>
> Looking at the ex18 hang (sometimes ex18 even segfaulted) on x86-64,
> Karsten noticed that we allocate a struct wait_node in
> __pthread_alt_lock on the stack, and somehow also put it on the list
> of waiting nodes.
>
> In __pthread_alt_unlock we go through the waiting nodes and dequeue it.
>
> This looks broken, since we allocate something on the stack of a
> function and leave the function with this data hanging around.
>
> Can somebody confirm this? Or do you have other ideas that would
> explain the segfaults we noticed? gdb pointed to this code,

Let me respond to this because I designed this little mousetrap.

I introduced wait nodes specifically to handle timeouts. The problem
with timed-out locks is that a thread whose timed wait expires wakes up
on its own and has no easy way to remove its node from the middle of the
list, so it must simply abandon the node there to be "garbage collected"
later. But you can't do that with the thread descriptor itself: the
descriptor cannot stay linked into one lock's queue while its thread
goes on to do other things, such as queueing on another lock. Solution:
use a dynamically allocated node which points to the thread, and abandon
that node instead.
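A minimal, single-threaded sketch of the abandonment scheme follows. It is a simplified model, not the glibc code: the real struct wait_node points to a thread descriptor and is manipulated with atomic operations, and the real __pthread_alt_unlock chooses the waiter by priority. Here `thread_id`, `wait_queue_push()` and `wait_queue_pick()` are hypothetical names, the first live node in list order is chosen, and it is assumed that abandoned nodes were malloc'd by a timed wait and are freed by the unlocking thread as it walks past them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified model of a wait node: the queue links these nodes, never the
   thread descriptors themselves. "thread_id" is a hypothetical stand-in
   for the descriptor pointer a real node would carry. */
struct wait_node {
    struct wait_node *next;
    int thread_id;
    bool abandoned;   /* set by a waiter whose timed wait expired */
};

/* Push a node onto the head of the waiter list. */
static void wait_queue_push(struct wait_node **head, struct wait_node *node)
{
    node->next = *head;
    *head = node;
}

/* Run by the unlocking thread: walk the list, unlinking and freeing every
   abandoned node it meets (assumed heap-allocated, since only timed waits
   abandon nodes), and unlink the first live waiter. Returns that waiter so
   the caller can wake its thread, or NULL if no live waiter remains. */
static struct wait_node *wait_queue_pick(struct wait_node **head)
{
    struct wait_node **link = head;

    while (*link != NULL) {
        struct wait_node *node = *link;

        if (node->abandoned) {
            *link = node->next;   /* drop the garbage node from the list */
            free(node);           /* reclaim it on the timed-out waiter's behalf */
            continue;
        }
        *link = node->next;       /* dequeue the chosen waiter before waking it */
        node->next = NULL;
        return node;
    }
    return NULL;
}
```

A timed waiter that gives up only sets `abandoned` and returns; the node stays in the list, still valid because it lives on the heap, until an unlocker skips past it and frees it.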

For waits without a timeout, which only ever end normally, these nodes
can be allocated on the stack, so code that never calls the timed
operations pays no memory-allocation penalty.

This is safe because the thread's stack stays intact while the thread
is suspended. And since this is a wait without a timeout, the thread
does not wake up spontaneously: the lock owner chooses it, removes its
node from the queue, and only then wakes it up. So the stack-allocated
node is already off the queue by the time __pthread_alt_lock returns.
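To illustrate the invariant for untimed waits, here is a sketch that uses a mutex and condition variable as a stand-in for LinuxThreads' own suspend/restart mechanism; that substitution is an assumption of the model, and `struct model_lock`, `model_acquire()` and `model_release()` are hypothetical names. The waiter's node lives in its own stack frame. The releasing thread unlinks the node before setting its `woken` flag, and the waiter cannot return until that flag is set, so the node is never reachable from the queue once its frame is gone.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct wait_node {
    struct wait_node *next;
    bool woken;                /* set by the owner only after dequeuing */
};

struct model_lock {
    pthread_mutex_t guard;     /* protects the fields below (model only) */
    pthread_cond_t wakeup;     /* stands in for suspend()/restart() */
    bool held;
    struct wait_node *waiters; /* singly linked, head insertion */
};

#define MODEL_LOCK_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, NULL }

static void model_acquire(struct model_lock *l)
{
    struct wait_node node = { NULL, false };   /* lives on this stack frame */

    pthread_mutex_lock(&l->guard);
    if (!l->held) {
        l->held = true;                        /* uncontended: node never queued */
        pthread_mutex_unlock(&l->guard);
        return;
    }
    node.next = l->waiters;
    l->waiters = &node;
    /* Untimed wait: only the owner ends it, and it dequeues us first.
       The loop also absorbs spurious condition-variable wakeups. */
    while (!node.woken)
        pthread_cond_wait(&l->wakeup, &l->guard);
    /* Ownership was handed to us and the node is off the queue, so
       returning (and discarding this frame) is safe. */
    pthread_mutex_unlock(&l->guard);
}

static void model_release(struct model_lock *l)
{
    pthread_mutex_lock(&l->guard);
    struct wait_node *w = l->waiters;
    if (w == NULL) {
        l->held = false;
    } else {
        l->waiters = w->next;   /* 1. remove the node from the queue */
        w->next = NULL;
        w->woken = true;        /* 2. only then wake its thread; the lock
                                   stays held and passes to that waiter */
        pthread_cond_broadcast(&l->wakeup);  /* each waiter checks its own flag */
    }
    pthread_mutex_unlock(&l->guard);
}
```

Compile with `-pthread`. Waiters are queued at the head, so this model is not fair; it only demonstrates the remove-then-wake ordering that makes the stack-allocated node safe.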

If this code is causing a segfault, the cause is a bug in the
implementation, not a flaw in the design.