This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



Re: Need help to interpret pthread_mutex_t structure as printed by gdb


I have a test case that fails 9 out of 10 times on my machine. Should I send
the code and instructions to you at this mail address? The code will be
released under the GPL before release, but I don't want it floating around
on the Internet just yet, so I would prefer to send it outside this
conversation if that's OK.

On Tue, Aug 16, 2016 at 10:44 PM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
> For thread race condition checks I usually run with Thread Sanitizer
> (both gcc and clang support it).
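>
> As a minimal sketch of that workflow (nothing from your code, just an
> illustration): a deliberate data race that tsan reports when the program
> is built with -fsanitize=thread:
>
>   /* race.c: gcc -fsanitize=thread -g -O1 race.c -o race -lpthread */
>   #include <pthread.h>
>   #include <stdio.h>
>
>   static int counter;                /* shared and unprotected: a data race */
>
>   static void *worker (void *arg)
>   {
>     (void) arg;
>     for (int i = 0; i < 100000; i++)
>       counter++;                     /* racy read-modify-write */
>     return NULL;
>   }
>
>   int main (void)
>   {
>     pthread_t t1, t2;
>     pthread_create (&t1, NULL, worker, NULL);
>     pthread_create (&t2, NULL, worker, NULL);
>     pthread_join (t1, NULL);
>     pthread_join (t2, NULL);
>     printf ("counter = %d\n", counter);   /* tsan flags the writes above */
>     return 0;
>   }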
>
> But again, bug reports for runtime issues in default usage modes are
> usually debugged with a *reproducible* testcase. It could be either a snippet
> that stresses the issue or a more complex scenario describing how to trigger
> it.
>
> What is somewhat frustrating is a report with partial debug information,
> where one must constantly assume or guess scenarios about the original
> environment.
>
> So if you could provide a way to reproduce this issue it would be
> *really* helpful in checking what is actually happening.
>
> On 16/08/2016 16:53, Olle Blomgren wrote:
>> I have now run the valgrind tools memcheck, helgrind and drd on both the
>> client and the server. The latter two are intended for MT programs using a
>> pthread implementation. When I run the tools on the client side, the hang
>> never occurs. When I run the tools on the server, the hang occurs as usual,
>> but even then there is nothing in the logs that suggests an error that
>> would cause the hang.
>>
>> When I use the helgrind tool it is possible to attach a GDB client to the
>> valgrind GDB server and query all the known mutexes to see if any of them
>> are held. This query (monitor info locks) reports 0 locked mutexes, even
>> though the client is hanging on a mutex it believes is locked at the moment
>> I issue the query.
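>>
>> For reference, this is roughly the session I use (the server command line
>> is a placeholder; flags may differ with other valgrind versions):
>>
>>   # start the server under helgrind with the embedded gdbserver enabled
>>   valgrind --tool=helgrind --vgdb=yes --vgdb-error=0 ./server
>>
>>   # in another terminal, attach a GDB client to it
>>   gdb ./server
>>   (gdb) target remote | vgdb
>>   (gdb) monitor info locks      # list the locks helgrind knows about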
>>
>> I have also run memtest86+ from the GRUB boot menu to ensure my RAM is
>> physically OK, and it is.
>>
>> Do you have any debug tools or any other hints on what I can try to
>> crack this on my own?  If not, I would really appreciate being able to send
>> my code for someone more knowledgeable than myself to build and run on an
>> x86_64 machine.
>>
>>
>> On Mon, Aug 15, 2016 at 11:45 PM, Adhemerval Zanella
>> <adhemerval.zanella@linaro.org> wrote:
>>>
>>>
>>> On 15/08/2016 18:14, Olle Blomgren wrote:
>>>>> Could you provide the glibc version, a complete backtrace of both processes
>>>>> and if possible a testcase?
>>>>
>>>> I use glibc-2.23.
>>>> I have downloaded the source and built it with debug symbols. Full
>>>> back traces follow below.
>>>> Client back trace + printout of offending mutex. (This is the one that hangs)
>>>> -------------------------------------------------------------------------------------------------------------------
>>>> Thread 3 (Thread 0x7ff8d1fcc700 (LWP 14005)):
>>>> #0  0x00007ff8d231b82d in poll () at ../sysdeps/unix/syscall-template.S:84
>>>> #1  0x00007ff8d2b39434 in _daemon_monitor (attr=0x7ff8d2da8300
>>>> <_mon_thr_attr>) at mam.c:1346
>>>> #2  0x00007ff8d2db03e4 in start_thread (arg=0x7ff8d1fcc700) at
>>>> pthread_create.c:333
>>>> #3  0x00007ff8d23248ad in clone () at
>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>
>>>> Thread 2 (Thread 0x7ff8d15cb700 (LWP 14006)):
>>>> #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>>>> #1  0x00007ff8d2db2a75 in __GI___pthread_mutex_lock
>>>> (mutex=0x7ff8d16ac810) at ../nptl/pthread_mutex_lock.c:80
>>>> #2  0x00007ff8d2b1f962 in lch_lock (name=919552, type=LCH_EXCL,
>>>> tx=0x7ff8d3446080, ctx=0x7ff8c4001110)
>>>>     at lch.c:124
>>>> #3  0x00007ff8d2b40c1c in _get_page_get_entry (entry=0x7ff8d15cab80,
>>>> nbr_bytes=8, ctx=0x7ff8c4001110)
>>>>     at pge.c:1194
>>>> #4  0x00007ff8d2b41b36 in pge_alloc_entry (entry=0x7ff8d15cab80,
>>>> nbr_bytes=8, page_sz=1 '\001',
>>>>     ctx=0x7ff8c4001110) at pge.c:1416
>>>> #5  0x00007ff8d2b5d0bf in api_int_setup_entry
>>>> (entry_arr=0x7ff8d15cacb8, page_ptr=0x7ff8d15cac88,
>>>>     page_ref=0x7ff8d15cac80, lr=0x7ff8d15cad50, meta=0xf68d50,
>>>> tx=0x7ff8d3446080, nbr_entries=1, sct=0,
>>>>     ctx=0x7ff8c4001110) at api_int.c:47
>>>> #6  0x00007ff8d2b5b410 in hope_new (tx=0x7ff8c4000cd0,
>>>> elm_type=0x40b1c9 "NoColl", nv_list=0x40adf9 "tjing")
>>>>     at hope.c:2495
>>>> #7  0x0000000000405d6d in _new1 (attr=0x0) at api_tests.c:1579
>>>> #8  0x00007ff8d2db03e4 in start_thread (arg=0x7ff8d15cb700) at
>>>> pthread_create.c:333
>>>> #9  0x00007ff8d23248ad in clone () at
>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>
>>>> Thread 1 (Thread 0x7ff8d34e0700 (LWP 14004)):
>>>> #0  0x00007ff8d2db164d in pthread_join (threadid=140706641131264,
>>>> thread_return=0x0) at pthread_join.c:90
>>>> #1  0x0000000000406252 in _scan_1new40000_1read_1lockmod5remove_3thr
>>>> (tc=30, print_desc_only=0)
>>>>     at api_tests.c:1682
>>>> #2  0x000000000040a1b5 in _all_test_cases (tc=30, nbr_updates=0,
>>>> start_value=0, name=0x0,
>>>>     ref=18446744073709551615, print_desc_only=0) at api_tests.c:2981
>>>> #3  0x000000000040a784 in main (argc=4, argv=0x7fffed178988) at api_tests.c:3084
>>>> (gdb) t 2
>>>> [Switching to thread 2 (Thread 0x7ff8d15cb700 (LWP 14006))]
>>>> #1  0x00007ff8d2db2a75 in __GI___pthread_mutex_lock
>>>> (mutex=0x7ff8d16ac810) at ../nptl/pthread_mutex_lock.c:80
>>>> 80          LLL_MUTEX_LOCK (mutex);
>>>> (gdb) f 1
>>>> #1  0x00007ff8d2db2a75 in __GI___pthread_mutex_lock
>>>> (mutex=0x7ff8d16ac810) at ../nptl/pthread_mutex_lock.c:80
>>>> 80          LLL_MUTEX_LOCK (mutex);
>>>> (gdb) p *mutex
>>>> $2 = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
>>>> __kind = 0, __spins = 0, __elision = 0,
>>>>     __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats
>>>> 39 times>, __align = 0}
>>>>
>>>>
>>>> Server back trace (All threads are just waiting around.)
>>>> ------------------------------------------------------------------------
>>>> Thread 6 (Thread 0x7f0686b45700 (LWP 14094)):
>>>> #0  pthread_cond_wait@@GLIBC_2.3.2 () at
>>>> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>>>> #1  0x00007f0687838d8b in migrator (attr=0x7f0687c4f758) at hsh_int.c:207
>>>> #2  0x00007f06875fb3e4 in start_thread (arg=0x7f0686b45700) at
>>>> pthread_create.c:333
>>>> #3  0x00007f0686e2d8ad in clone () at
>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>
>>>> Thread 5 (Thread 0x7f0684b41700 (LWP 14098)):
>>>> #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
>>>> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
>>>> #1  0x00007f0687896555 in _do_checkpoint (attr=0x0) at ckp.c:1411
>>>> #2  0x00007f06875fb3e4 in start_thread (arg=0x7f0684b41700) at
>>>> pthread_create.c:333
>>>> #3  0x00007f0686e2d8ad in clone () at
>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>
>>>> Thread 4 (Thread 0x7f0685342700 (LWP 14097)):
>>>> #0  pthread_cond_wait@@GLIBC_2.3.2 () at
>>>> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>>>> #1  0x00007f0687838d8b in migrator (attr=0x7f0687c51280) at hsh_int.c:207
>>>> #2  0x00007f06875fb3e4 in start_thread (arg=0x7f0685342700) at
>>>> pthread_create.c:333
>>>> #3  0x00007f0686e2d8ad in clone () at
>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>
>>>> Thread 3 (Thread 0x7f0685b43700 (LWP 14096)):
>>>> #0  pthread_cond_wait@@GLIBC_2.3.2 () at
>>>> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>>>> #1  0x00007f06878541bc in _rm_log (attr=0x0) at tlg.c:1527
>>>> #2  0x00007f06875fb3e4 in start_thread (arg=0x7f0685b43700) at
>>>> pthread_create.c:333
>>>> #3  0x00007f0686e2d8ad in clone () at
>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>
>>>> Thread 2 (Thread 0x7f0686344700 (LWP 14095)):
>>>> #0  pthread_cond_wait@@GLIBC_2.3.2 () at
>>>> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>>>> #1  0x00007f0687853110 in flush_log_buffer (attr=0x0) at tlg.c:1385
>>>> #2  0x00007f06875fb3e4 in start_thread (arg=0x7f0686344700) at
>>>> pthread_create.c:333
>>>> #3  0x00007f0686e2d8ad in clone () at
>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>
>>>> Thread 1 (Thread 0x7f0687cef700 (LWP 14093)):
>>>> #0  0x00007f0686e264f3 in select () at ../sysdeps/unix/syscall-template.S:84
>>>> #1  0x00007f068785dcf0 in _daemon_server_loop (attr=0x7ffff2d255d0) at mam.c:730
>>>> #2  0x00007f068785e1bc in mam_daemon_server_run (ctx=0x7ffff2d255d0)
>>>> at mam.c:860
>>>> #3  0x00007f0687870053 in dae_init (instance=0x212d180
>>>> "/tmp/func_api", force=1, ctx=0x7ffff2d255d0)
>>>>     at dae.c:297
>>>> #4  0x00007f068785bf4a in _do_start_daemon (ctx=0x7ffff2d255d0) at clu.c:320
>>>> #5  0x00007f068785c388 in clu_execute (ctx=0x7ffff2d255d0) at clu.c:449
>>>> #6  0x0000000000400994 in main (argc=6, argv=0x7ffff2d25b58) at db.c:21
>>>>
>>>> I can see that it is always server thread #5 that interferes with the
>>>> hanging client thread in the following way:
>>>> Server thread acquires mutex X.
>>>> Client thread tries to acquire mutex X (it sits waiting for the server
>>>> thread to release mutex X).
>>>> Server thread releases mutex X and continues its work...
>>>> Client thread never wakes up. It continues to hang on mutex X.
>>>>
>>>> I have written a test program to try to provoke the same access
>>>> pattern but I cannot make the same thing happen.
>>>> I can see that the same access pattern:
>>>> 1) server acquires mutex X
>>>> 2) client tries to acquire mutex X (and blocks)
>>>> 3) server releases mutex X
>>>> 4) client acquires mutex X
>>>> completes successfully numerous times before the client hangs as shown
>>>> above; a minimal sketch of this pattern follows below.
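>>>>
>>>> For reference, this is a minimal sketch of the kind of test program I
>>>> have been trying to provoke it with, assuming (as in my real code) a
>>>> PROCESS_SHARED mutex living in memory mapped into both processes; the
>>>> names below are made up:
>>>>
>>>>   /* pshared.c: gcc -g pshared.c -o pshared -lpthread */
>>>>   #include <pthread.h>
>>>>   #include <sys/mman.h>
>>>>   #include <sys/wait.h>
>>>>   #include <unistd.h>
>>>>
>>>>   int main (void)
>>>>   {
>>>>     /* One mutex in a shared anonymous mapping, visible to both processes. */
>>>>     pthread_mutex_t *m = mmap (NULL, sizeof *m, PROT_READ | PROT_WRITE,
>>>>                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
>>>>     pthread_mutexattr_t attr;
>>>>     pthread_mutexattr_init (&attr);
>>>>     pthread_mutexattr_setpshared (&attr, PTHREAD_PROCESS_SHARED);
>>>>     pthread_mutex_init (m, &attr);
>>>>
>>>>     if (fork () == 0)
>>>>       {                                 /* "client": steps 2 and 4 */
>>>>         for (int i = 0; i < 1000000; i++)
>>>>           {
>>>>             pthread_mutex_lock (m);     /* blocks while the server holds it */
>>>>             pthread_mutex_unlock (m);
>>>>           }
>>>>         _exit (0);
>>>>       }
>>>>
>>>>     for (int i = 0; i < 1000000; i++)
>>>>       {                                 /* "server": steps 1 and 3 */
>>>>         pthread_mutex_lock (m);
>>>>         pthread_mutex_unlock (m);
>>>>       }
>>>>     wait (NULL);
>>>>     return 0;
>>>>   }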
>>>>
>>>> Can you give me hints on how to continue further or do you want me to
>>>> send my code to you?
>>>> As seen above the hardware is x86_64.
>>>>
>>>> Brs
>>>> /Olle
>>>
>>> I think either a testcase or a way to reproduce this issue would be the
>>> way to proceed, since the current log is still inconclusive.  I tried to
>>> check the sourceware bug database for issues fixed since 2.19, but you
>>> seem to have already tested with a more recent version.
>>>
>>> I strongly recommend you first try a memory checker (valgrind, asan)
>>> and/or a thread checker (tsan) to see if you are doing something wrong.

