This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] malloc: Re-add protection for recursive calls to __malloc_fork_lock_parent
- From: Florian Weimer <fweimer at redhat dot com>
- To: Tulio Magno Quites Machado Filho <tuliom at linux dot vnet dot ibm dot com>, libc-alpha at sourceware dot org
- Date: Wed, 11 May 2016 14:52:21 +0200
- References: <57323BCE dot 3070305 at redhat dot com> <1462919631-30048-1-git-send-email-tuliom at linux dot vnet dot ibm dot com>
On 05/11/2016 12:33 AM, Tulio Magno Quites Machado Filho wrote:
> I've just understood what you meant by "this could have happened
> before". I do agree it was already broken. It was just too difficult
> to reproduce it here.
>
> With commit ID 8a727af9, I get:
>
> $ x=0; while ./testrun.sh ./malloc/tst-mallocfork ; do x=$((x + 1)); \
> done; echo $? $x
> Timed out: killed the child process
> 0 10
>
> After applying this patch, it runs for hours without failing. But it
> clearly doesn't fix it.
>
> However, if you believe the test case is invalid, let's remove it.
>
> I wonder if it requires a new bug report as 8a727af9 has been backported
> to glibc 2.23.
We already have a bug for this, I think:
https://sourceware.org/bugzilla/show_bug.cgi?id=19703
I've just attached a more reliable test case to this bug. For me, it
reproduces the problem reliably even with quite old glibc versions: the
test case reports delivery of a few signals and then deadlocks.
I believe the new test is completely valid because sigusr1_handler calls
only async-signal-safe functions (the old one should really call _exit
instead of exit in the signal handler).
I tried this test with current master and your patch applied on top, and
I still get deadlocks. Can you give this test a try as well?
A true fix for bug 19703 depends on bug 19702 (Provide a flag indicating
whether a thread is in a signal handler), which is not easy to address
because we need an async-signal-safe sigaction/signal function and need
to atomically change the signal handler and its associated flags.
I think I can provide a partial fix for single-threaded programs without
bug 19702. It's going to be rather small but quite ugly.
Florian