This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

race condition in dl code

From: Sripathi Kodi <sripathik at in dot ibm dot com>
To: libc-alpha at sources dot redhat dot com
Cc: dino at in dot ibm dot com
Date: Wed, 6 Sep 2006 15:22:06 +0530
Subject: race condition in dl code
Reply-to: sripathik at in dot ibm dot com

Hi,

We  are seeing  a race  condition in  dl code  resulting in  our application
crashing  with sigsegv.  We have  seen the  problem only  while running  -rt
(CONFIG_PREEMPT_RT)  kernels. We  have seen  the problem  on glibc-2.4.90-17
from FC6 as well as latest glibc from cvs.

Backtrace of the application thread that crashes is as follows. It is a Java
based testcase.

(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x4f14c546 in __nanosleep_nocancel () from /lib/libc.so.6
#2  0x4f14c37b in sleep () from /lib/libc.so.6
#3  0xb7e65724 in Hack_print_info_before_abort (whichCase=0xb7e71898 "CASE 1: not-SI_USER", signal=11, sigInfo=0xb7ee4e0c,
    context=0xb7ee4e8c, portLibType=4, previousSignal=0x0) at j9signal.c:414
#4  0xb7e665dc in masterSynchSignalHandler (signal=11, sigInfo=0xb7ee4e0c, contextInfo=0xb7ee4e8c) at j9signal.c:1134
#5  <signal handler called>
#6  0x4ef694ad in do_lookup_x () from /lib/ld-linux.so.2
#7  0x4ef69883 in _dl_lookup_symbol_x () from /lib/ld-linux.so.2
#8  0x4ef6d8c4 in _dl_fixup () from /lib/ld-linux.so.2
#9  0x4ef72cb0 in _dl_runtime_resolve () from /lib/ld-linux.so.2
#10 0xb7ee9058 in thread_wrapper (arg=0x805bd5c) at j9thread.c:933
#11 0x4b359f9a in start_thread () from /lib/libpthread.so.0
#12 0x4f18abae in clone () from /lib/libc.so.6

Our analysis:
_dl_fixup  passes   link_map  structure  (l)   as  well  as   l->l_scope  to
_dl_lookup_symbol_x. In  the link_map structure  passed to fixup,  the field
l_scope_mem[4]  is  the  statically  allocated  memory  for  the  first  few
'r_scope_elem's. When this gets filled  up, more r_scope_elems are allocated
and attached at 'l_scope'. Memory pointed  by l_scope too can get exhausted.
When this  happens, this  memory is  doubled using realloc  in this  part of
dl_open_worker() function:

              else
                {
                  newp = (struct r_scope_elem **)
                    realloc (imap->l_scope,
                             new_size * sizeof (struct r_scope_elem *));
                  if (newp == NULL)
                    _dl_signal_error (ENOMEM, "dlopen", NULL,
                                      N_("cannot create scope list"));
                  imap->l_scope = newp;
                }

              imap->l_scope_max = new_size;
            }

So this  reallocs the  memory for  l_scope (doubling  it's size)  and points
l_scope to  the new  area of  memory. Old memory  is now  free for  reuse by
malloc code.

Sequence of events happening in our testcase is:
i) Thread  A calls  _dl_lookup_symbol_x from _dl_fixup,  passing it  'l' and
'l->l_scope' as parameters. It is preempted before _dl_lookup_symbol_x runs.

ii) Thread B does (possibly many)  dlopen()s, it exhausts the memory pointed
to by l_scope. Hence dl_open_worker() reallocs this memory.

iii) Memory  previously allocated  to l_scope is  reused. However,  thread A
still  has a  reference to  this area  of memory  through l_scope  passed to
_dl_lookup_symbol_x.

iv) Thread  A tries  to access memory  pointed to by  it's copy  of l_scope,
hence gets a SEGV.

Occasionally, when  it doesn't  get a  SEGV, thread A  fails to  resolve the
symbol  and prints  out a  message like  "undefined symbol:  printf, version
GLIBC_2.0"

We  feel that  l_scope will  have to  be protected  with a  lock (preferably
read-write lock) to solve this problem. The idea is to make the thread doing
dlopen  to hold  a write  lock on  l_scope before  doing realloc.  This will
prevent  memory being  freed from  under the  feet of  reader threads  doing
symbol lookup holding read  lock on l_scope. I can send such  a patch if the
idea is acceptable. If the idea of introducing a lock is not good, could you
please suggest any other way to solve this problem?

Thanks and regards,
Sripathi Kodi.

Follow-Ups:
- Re: race condition in dl code
  - From: Sripathi Kodi

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]