This is the mail archive of the
libc-help@sourceware.org
mailing list for the glibc project.
Re: Help, any one ever meet hanging on _IO_lock_lock(list_all_lock) issue ?
- From: Wuqixuan <wuqixuan at huawei dot com>
- To: "Carlos O'Donell" <carlos at systemhalted dot org>
- Cc: "libc-help at sourceware dot org" <libc-help at sourceware dot org>, "schwab at redhat dot com" <schwab at redhat dot com>, Jiazhenghua <jiazhenghua at huawei dot com>, "Liuyong (John)" <john dot liuyong at huawei dot com>
- Date: Sat, 16 Nov 2013 01:18:07 +0000
- Subject: Re: Help, any one ever meet hanging on _IO_lock_lock(list_all_lock) issue ?
- Authentication-results: sourceware.org; auth=none
- References: <BB7C62C2B0732E4DA93834A501E846456C8D8003 at szxema505-mbx dot china dot huawei dot com> <CAE2sS1ishHhT+LEqHkcadXyP4wBeWJFGRMroLmVQGrMEBMD9tg at mail dot gmail dot com> <BB7C62C2B0732E4DA93834A501E846456C8D8023 at szxema505-mbx dot china dot huawei dot com> <BB7C62C2B0732E4DA93834A501E846456C8D80D2 at szxema505-mbx dot china dot huawei dot com>,<CAE2sS1hQG7m3fKsqyqTXi-izB5cWM0ruqTw5Z2RofQH64-M+VQ at mail dot gmail dot com>
>> Because the issue happened on my side only once and cannot be reproduced. The environment no longer exists.
>>
>> Yes, if the cnt value is corrupted, nobody can use this lock anymore. But do you know how the cnt value was corrupted in our case, and how to reproduce it? I guess some other bug in glibc 2.4 causes an unbalanced sequence of locks and unlocks. Do you know what it is?
> I don't know what it is or I would have fixed it :-)
>> We found http://sourceware.org/git/?p=glibc.git;a=commit;h=7583a88d1c7170caad26966bcea8bfc2c92093ba, a fix committed by schwab.
>> The patch seems to say that flush_cleanup has a bug that can corrupt cnt. Do you know what the exact issue was that the patch was meant to fix?
> I don't know what was previously wrong, I would have to review the code.
> Andreas can best comment on that.
We found a discussion of a scenario that can cause this problem. Andreas took part in it, so we guess
that the discussion below led Andreas to write the patch (can Andreas confirm?):
http://sourceware-org.1504.n7.nabble.com/PATCH-Fix-possible-deadlock-in-stdio-locking-code-td6853.html
Also, from our analysis of the code, this issue can be reproduced when thread A calls pthread_cancel
while thread B calls fcloseall at the same time:
Thread A              Thread B
                      fcloseall
                      _IO_cleanup
pthread_cancel(B)     _IO_flush_all_lockp(0)
                      _IO_cleanup_region_start_noarg (flush_cleanup)
                        (flush_cleanup is registered here; with the patch
                        applied, it would not be registered)
                      write
                        (a cancellation point, so the thread checks
                        whether it has been cancelled)
                      flush_cleanup is called
                        (list_all_lock->cnt drops to -1; the bug occurs)
                      thread exits
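
The sequence above can be modelled with a small toy program. This is only a sketch under stated assumptions, not the glibc code: toy_cnt stands in for list_all_lock->cnt, toy_flush_cleanup and toy_flush_all are hypothetical stand-ins for flush_cleanup and _IO_flush_all_lockp, pthread_testcancel plays the role of the cancellation point in write, and the guard_cleanup flag is our own way of imitating "flush_cleanup is not registered when the lock is not taken".

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-in for list_all_lock: only the recursion count is modelled. */
static int toy_cnt;

struct flush_args {
    int do_lock;        /* plays the role of the _IO_flush_all_lockp argument */
    int guard_cleanup;  /* 1 = cleanup unlocks only if the lock was taken */
    atomic_int ready;   /* set once the cleanup handler is registered */
};

/* Toy flush_cleanup: releases the lock when the thread is cancelled. */
static void toy_flush_cleanup(void *p)
{
    struct flush_args *a = p;
    if (a->guard_cleanup && !a->do_lock)
        return;         /* guarded variant: nothing was locked, nothing to undo */
    toy_cnt--;          /* unlock */
}

/* Thread B: registers the handler, locks only if do_lock is set, then
   waits at a cancellation point (standing in for write). */
static void *toy_flush_all(void *p)
{
    struct flush_args *a = p;
    pthread_cleanup_push(toy_flush_cleanup, a);
    if (a->do_lock)
        toy_cnt++;      /* lock */
    atomic_store(&a->ready, 1);
    for (;;) {
        pthread_testcancel();
        sched_yield();
    }
    pthread_cleanup_pop(0);  /* never reached; closes the push block */
    return NULL;
}

/* Thread A: cancels B once the handler is registered and returns the
   final value of the toy lock count. */
int run_scenario(int do_lock, int guard_cleanup)
{
    struct flush_args a = { do_lock, guard_cleanup, 0 };
    pthread_t b;
    int rc;

    toy_cnt = 0;
    rc = pthread_create(&b, NULL, toy_flush_all, &a);
    assert(rc == 0);
    while (!atomic_load(&a.ready))
        sched_yield();
    rc = pthread_cancel(b);
    assert(rc == 0);
    rc = pthread_join(b, NULL);
    assert(rc == 0);
    return toy_cnt;
}
```

In this model, run_scenario(0, 0) returns -1, which corresponds to the unbalanced unlock in the timeline, while run_scenario(0, 1) and run_scenario(1, 0) both return 0. Build with -pthread.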
So we guess our problem is similar to this one.
Anyway, thank you for your help.
Thanks a lot & Regards
Wuqixuan