This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



Re: Help, any one ever meet hanging on _IO_lock_lock(list_all_lock) issue ?


On Tue, Nov 12, 2013 at 8:03 AM, Wuqixuan <wuqixuan@huawei.com> wrote:
> Hi All ,
>
>     I am using glibc 2.4 and have run into the problem below:

That's quite an old glibc. Can you reproduce the problem with master?

>     a)  Four threads are hanging on _IO_lock_lock(list_all_lock).
>         myfunc
>          -->fopen
>             -->_IO_fopen@@GLIBC_2.1
>                 -->  __fopen_internal
>                    --> _IO_file_init@@GLIBC_2.1  (actually, it calls _IO_link_in--> _IO_lock_lock)
>                       --> _L_lock_335
>                           --> __lll_mutex_lock_wait
>
>         From our analysis, the code is hanging at the line below:

This might happen if a thread died while doing I/O with the lock held,
leaving the lock stuck in the locked state.
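A minimal sketch of that failure mode, assuming glibc and using an ordinary (non-robust) pthread mutex as a stand-in for list_all_lock: a hypothetical worker thread takes the lock and exits without unlocking it, after which nobody can acquire it again. exit_while_holding() and lock_after_holder_exits() are illustrative names, not glibc functions.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical worker: takes the lock and exits without unlocking,
   like a thread that died in the middle of an I/O call. */
static void *exit_while_holding(void *arg)
{
    pthread_mutex_lock((pthread_mutex_t *) arg);
    return NULL;  /* no pthread_mutex_unlock(): the lock is leaked */
}

/* Returns the pthread_mutex_trylock() result seen after the holder
   has exited (EBUSY means the lock is stuck). */
int lock_after_holder_exits(void)
{
    pthread_mutex_t m;
    pthread_t t;

    pthread_mutex_init(&m, NULL);  /* default, non-robust mutex */
    if (pthread_create(&t, NULL, exit_while_holding, &m) != 0)
        return -1;
    pthread_join(t, NULL);

    /* A blocking pthread_mutex_lock(&m) here would never return,
       much like the four threads stuck in __lll_mutex_lock_wait. */
    return pthread_mutex_trylock(&m);
    /* m is abandoned while locked: destroying a locked mutex is undefined. */
}
```

With glibc, lock_after_holder_exits() returns EBUSY: the lock word still says "held" even though the holder no longer exists.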

> void
> _IO_link_in (fp)
>      struct _IO_FILE_plus *fp;
> {
>   if ((fp->file._flags & _IO_LINKED) == 0)
>     {
>       fp->file._flags |= _IO_LINKED;
> #ifdef _IO_MTSAFE_IO
>       _IO_cleanup_region_start_noarg (flush_cleanup);

This cleanup region should prevent thread cancellation from leaving the lock held.
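To illustrate the idea with public POSIX calls rather than glibc's internal _IO_cleanup_region_* macros, here is a sketch in which a hypothetical worker registers a cleanup handler that releases the lock if the thread is cancelled; presumably flush_cleanup plays a similar role for list_all_lock. worker(), unlock_on_cancel() and lock_free_after_cancel() are made-up names for the example.

```c
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static sem_t holding;  /* posted once the worker holds the lock */

/* Cleanup handler: runs if the thread is cancelled inside the region. */
static void unlock_on_cancel(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *) arg);
}

static void *worker(void *arg)
{
    pthread_mutex_t *m = arg;

    pthread_mutex_lock(m);
    pthread_cleanup_push(unlock_on_cancel, m);
    sem_post(&holding);      /* tell the main thread we hold the lock */
    for (;;)
        pause();             /* cancellation point: the cancel lands here */
    pthread_cleanup_pop(1);  /* not reached, but must pair with the push */
    return NULL;
}

/* Returns 0 if the lock is free again after the worker was cancelled. */
int lock_free_after_cancel(void)
{
    pthread_mutex_t m;
    pthread_t t;
    void *res;
    int rc;

    pthread_mutex_init(&m, NULL);
    sem_init(&holding, 0, 0);
    if (pthread_create(&t, NULL, worker, &m) != 0)
        return -1;
    sem_wait(&holding);      /* worker now holds m */
    pthread_cancel(t);
    pthread_join(t, &res);
    sem_destroy(&holding);
    if (res != PTHREAD_CANCELED)
        return -1;

    rc = pthread_mutex_trylock(&m);  /* 0 if the handler released m */
    if (rc == 0) {
        pthread_mutex_unlock(&m);
        pthread_mutex_destroy(&m);
    }
    return rc;
}
```

lock_free_after_cancel() should return 0 because the handler ran during cancellation. Without the pthread_cleanup_push()/pthread_cleanup_pop() pair, the cancelled thread would leave the mutex held and the trylock would report EBUSY.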

>       _IO_lock_lock (list_all_lock);                  // ******** 4 threads hanging here. *******

This is normal if the lock is already taken.

>       run_fp = (_IO_FILE *) fp;
>       _IO_flockfile ((_IO_FILE *) fp);
> #endif
>       fp->file._chain = (_IO_FILE *) INTUSE(_IO_list_all);
>       INTUSE(_IO_list_all) = fp;
>       ++_IO_list_all_stamp;
> #ifdef _IO_MTSAFE_IO
>       _IO_funlockfile ((_IO_FILE *) fp);
>       run_fp = NULL;
>       _IO_lock_unlock (list_all_lock);
>       _IO_cleanup_region_end (0);
> #endif
>     }
> }
> INTDEF(_IO_link_in)
>
>     b)  From gdb, the members of _IO_lock_t (typedef struct { int lock; int cnt; void *owner; } _IO_lock_t) are:
>          (gdb) x /10x &list_all_lock
>         0xb7da60a8 <list_all_lock>: 0x00000002 0xffffffff 0x8a1f3ba0
>
>            lock: 0x00000002         It seems somebody has taken this lock.

Yes.
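As a sketch of how to read that dump, the raw words can be reinterpreted as the quoted struct. This assumes a 32-bit target (the 0xb7... addresses suggest one), so owner is modelled as a 32-bit integer. It also assumes glibc's usual low-level lock protocol, in which 0 means free, 1 means locked with no waiters, and 2 means locked with other threads possibly sleeping on the futex; that would fit four threads waiting in __lll_mutex_lock_wait. struct io_lock32, decode_dump() and lock_state() are hypothetical helpers.

```c
#include <stdint.h>
#include <string.h>

/* Assumed 32-bit layout of
   typedef struct { int lock; int cnt; void *owner; } _IO_lock_t; */
struct io_lock32 {
    int32_t  lock;   /* low-level lock word */
    int32_t  cnt;    /* recursion count */
    uint32_t owner;  /* holder's thread pointer, as a 32-bit value */
};

/* The first three words printed by `x /10x &list_all_lock`. */
static const uint32_t dump[3] = { 0x00000002, 0xffffffff, 0x8a1f3ba0 };

struct io_lock32 decode_dump(void)
{
    struct io_lock32 l;
    memcpy(&l, dump, sizeof l);  /* reinterpret the raw words */
    return l;
}

/* Meaning of the lock word under the assumed 0/1/2 protocol. */
const char *lock_state(int32_t word)
{
    switch (word) {
    case 0:  return "unlocked";
    case 1:  return "locked, no waiters";
    default: return "locked, possible waiters";
    }
}
```

decode_dump() yields lock == 2, cnt == -1 and owner == 0x8a1f3ba0; cnt reads as -1 because 0xffffffff is the two's-complement encoding of -1 in a 32-bit int.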

>            cnt: 0xffffffff                It's very strange that cnt is -1; it should be 1.

That is odd. However, it could be the result of an unbalanced set of
locks and unlocks, which could cause exactly the problem you're seeing.

The IO lock can be taken recursively: each lock increments cnt, and
each unlock decrements it.

Once cnt decrements to 0, the lock is released.

If something corrupts the cnt value, the lock may never be released.

For example, consider _IO_lock_unlock:
#define _IO_lock_unlock(_name) \
  do {                                                                        \
    if (--(_name).cnt == 0)                                                   \
      {                                                                       \
        (_name).owner = NULL;                                                 \
        lll_unlock ((_name).lock, LLL_PRIVATE);                               \
      }                                                                       \
  } while (0)

With a bad cnt, the `cnt == 0' test won't be true, so the macro neither
unlocks nor clears the owner, and this thread goes on to do something else.

The lock will be leaked at that point.
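A small user-space model makes the leak concrete. It is a sketch under stated assumptions: a pthread mutex stands in for the low-level lock, the address of a thread-local byte stands in for THREAD_SELF, and _IO_lock_lock is assumed to take the low-level lock only when the caller is not already the owner and then always increment cnt. model_lock(), model_unlock() and unbalanced_unlock_demo() are hypothetical. One stray unlock drives cnt to -1; the next lock/unlock pair then leaves the mutex held with owner set.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for glibc's internal _IO_lock_t. */
typedef struct {
    pthread_mutex_t lock;  /* models the low-level lock word */
    int cnt;               /* recursion count */
    void *owner;           /* models THREAD_SELF of the holder */
} model_lock_t;

/* Each thread gets its own byte; its address identifies the thread. */
static _Thread_local char self_marker;
#define SELF ((void *) &self_marker)

/* Assumed shape of _IO_lock_lock. */
static void model_lock(model_lock_t *l)
{
    if (l->owner != SELF) {
        pthread_mutex_lock(&l->lock);
        l->owner = SELF;
    }
    ++l->cnt;
}

/* Same logic as the _IO_lock_unlock macro quoted above. */
static void model_unlock(model_lock_t *l)
{
    if (--l->cnt == 0) {
        l->owner = NULL;
        pthread_mutex_unlock(&l->lock);
    }
}

struct leak_result {
    int cnt;         /* final recursion count */
    int owner_set;   /* nonzero if owner was never cleared */
    int trylock_rc;  /* EBUSY means the lock is still held */
};

struct leak_result unbalanced_unlock_demo(void)
{
    model_lock_t l = { .cnt = 0, .owner = NULL };
    struct leak_result r;

    pthread_mutex_init(&l.lock, NULL);

    model_lock(&l);    /* cnt 1, lock held */
    model_unlock(&l);  /* cnt 0, lock released: balanced so far */
    model_unlock(&l);  /* stray unlock: cnt -1, nothing released */

    model_lock(&l);    /* lock taken again, cnt back to 0 */
    model_unlock(&l);  /* cnt -1, not 0: lock and owner left behind */

    r.cnt = l.cnt;
    r.owner_set = (l.owner != NULL);
    r.trylock_rc = pthread_mutex_trylock(&l.lock);

    /* Demo cleanup only: this thread really does hold the mutex. */
    pthread_mutex_unlock(&l.lock);
    pthread_mutex_destroy(&l.lock);
    return r;
}
```

unbalanced_unlock_demo() reports cnt == -1, an owner that was never cleared, and EBUSY from the trylock, the same picture as the gdb dump in the report. Any other thread calling model_lock() at that point would block forever.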

>            owner: 0x8a1f3ba0       The task that owns it is actually sleeping in the kernel and is not waiting on any lock.

Is that thread alive or dead? Can you get a backtrace of it?

Cheers,
Carlos.

