This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[RFC] Race in _dl_open with r_debug.r_state consistency


hi,

One of our application was getting terminated with the following error

"Inconsistency detected by ld.so: dl-open.c: 610: _dl_open: Assertion `_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT' failed!

with glibc-2.4.31. This race seems to be present in the libc I got from the CVS [at code inspection]. We were able to reproduce this consistently after running for 4-5hrs.

Upon debugging we found that it is due to a race between two threads doing a _dl_open().

The scenario is something like this :

In elf/dl-open.c, _dl_open:


/* Make sure we are alone. */ __rtld_lock_lock_recursive (GL(dl_load_lock));

[...]

  int errcode = _dl_catch_error (&objname, &errstring, &malloced,
                                 dl_open_worker, &args);
#ifndef MAP_COPY
  /* We must munmap() the cache file.  */
  _dl_unload_cache ();
#endif

  /* Release the lock.  */
  __rtld_lock_unlock_recursive (GL(dl_load_lock));

^^^^^ This would kick any other thread waiting on the lock.


if (__builtin_expect (errstring != NULL, 0)) { [...] assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT); }

assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);

And, if the thread which gets woken up is playing with the same namespace, and sets the r_state to RT_ADD in _dl_map_object_from_fd even before we reach here (truly possible in an SMP system), ( due to getting scheduled out ), we would hit the assert !

So, it is not safe to believe that the r_state won't get changed once we release the lock.

One of the ways in which this can be fixed is releasing the lock only after we have done the consistency check. Attached patch does the same. This has been tested and does resolve the issue.

Could you please share your thoughts about the issue ? Is there a better way to fix this ?


Thanks,


-Suzuki
Index: glibc-2.4/elf/dl-open.c
===================================================================
--- glibc-2.4.orig/elf/dl-open.c        2005-11-15 05:30:08.000000000 -0800
+++ glibc-2.4/elf/dl-open.c     2006-10-25 19:09:48.527370480 -0700
@@ -553,10 +553,14 @@
   /* We must munmap() the cache file.  */
   _dl_unload_cache ();
 #endif
-
-  /* Release the lock.  */
-  __rtld_lock_unlock_recursive (GL(dl_load_lock));
-
+
+  /* Releasing the lock here would expose a window for other threads
+     to load libs and thus make the state of r_debug, !RT_CONSISTENT.
+     We release the lock only after we have performed the RT_CONSISTENT
+     check.  */
+
+   //__rtld_lock_unlock_recursive (GL(dl_load_lock));
+
   if (__builtin_expect (errstring != NULL, 0))
     {
       /* Some error occurred during loading.  */
@@ -565,6 +569,7 @@

       /* Remove the object from memory.  It may be in an inconsistent
         state if relocation failed, for example.  */
+
       if (args.map)
        {
 #ifdef USE_TLS
@@ -578,9 +583,14 @@
          if ((mode & __RTLD_AUDIT) == 0)
            GL(dl_tls_dtv_gaps) = true;
 #endif
-
+          /* Note that we already hold the dl_load_lock here. But thats ok, since
+             it is a recursive lock ! */
          _dl_close (args.map);
        }
+      /* Now we are sure that the state is consistent. Do the check and
+         release the lock */
+      assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
+      __rtld_lock_unlock_recursive (GL(dl_load_lock));

       /* Make a local copy of the error string so that we can release the
         memory allocated for it.  */
@@ -601,14 +611,16 @@
       if (malloced)
        free ((char *) errstring);

-      assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
-
       /* Reraise the error.  */
       _dl_signal_error (errcode, objname, NULL, local_errstring);
+    }
+  else
+    {
+      /* Successful load, state should be consistent. Release the lock */
+      assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
+      __rtld_lock_unlock_recursive (GL(dl_load_lock));
     }
-
-  assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
-
+
 #ifndef SHARED
   DL_STATIC_INIT (args.map);
 #endif

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]