This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[RFC] Race in _dl_open()


Hi,

One of our application was getting terminated with the following error

"Inconsistency detected by ld.so: dl-open.c: 610: _dl_open: Assertion `_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT' failed!

with glibc-2.4.31. This race seems to be present in the libc I got from the CVS [at code inspection]. We were able to reproduce this consistently after running for 4-5hrs.

Upon debugging we found that it is due to a race between two threads doing a _dl_open().

The scenario is something like this :

In elf/dl-open.c, _dl_open:


/* Make sure we are alone. */ __rtld_lock_lock_recursive (GL(dl_load_lock));

[...]

  int errcode = _dl_catch_error (&objname, &errstring, &malloced,
                                 dl_open_worker, &args);
#ifndef MAP_COPY
  /* We must munmap() the cache file.  */
  _dl_unload_cache ();
#endif

  /* Release the lock.  */
  __rtld_lock_unlock_recursive (GL(dl_load_lock));

^^^^^ This would kick any other thread waiting on the lock.


if (__builtin_expect (errstring != NULL, 0)) { [...] }

assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);

And, if the thread which gets woken up is playing with the same namespace, and sets the r_state to RT_ADD in _dl_map_object_from_fd even before we reach here (truly possible in an SMP system), ( due to getting scheduled out ), we would hit the assert !

In our case we found that the r_state was RT_ADD and which was set by another thread.

So, it is not safe to believe that the r_state won't get changed once we release the lock.

We have tried the following solution for this issue, and it works fine.

Solution: Ensure the consistency in case of a successful dl_open_worker, before releasing the lock ! Our solution ignored the case where dl_open_worker may fail.

Attached patch addresses all the issues and we think this may be a complete solution for the problem.

1) For successful load by dl_open_worker or a failure where we don't have to remove any object from memory ( i.e, we don't have to do a _dl_close to ensure the consistency ), the assertion is performed before unlocking.

2) For the latter case, where we have to do a _dl_close(), the assertion is done just after the _dl_close.



Comments ?


Thanks,


Suzuki K P <suzuki@in.ibm.com>
Linux Technology Center,
IBM Systems & Technology Labs.



Index: glibc-2.4/elf/dl-open.c
===================================================================
--- glibc-2.4.orig/elf/dl-open.c	2006-10-03 11:10:22.000000000 -0700
+++ glibc-2.4/elf/dl-open.c	2006-10-10 15:28:33.000000000 -0700
@@ -554,6 +554,16 @@
   _dl_unload_cache ();
 #endif
 
+  /* Check the consistency state, for a successful load, before we release
+     the lock. Somebody else might come up and modify the states once we 
+     release this lock. In case of the failures, where we need a _dl_close,
+     consistency is achieved only after the _dl_close. So for that case
+     we delay the check until we finish _dl_close.  */
+  if (__builtin_expect (errstring == NULL, 1) 
+      || args.map == NULL)
+    assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
+    
+
   /* Release the lock.  */
   __rtld_lock_unlock_recursive (GL(dl_load_lock));
 
@@ -564,7 +574,8 @@
       size_t len_errstring;
 
       /* Remove the object from memory.  It may be in an inconsistent
-	 state if relocation failed, for example.  */
+	 state if relocation failed, for example. Enusre the conistent
+	 state.  */
       if (args.map)
 	{
 #ifdef USE_TLS
@@ -580,6 +591,7 @@
 #endif
 
 	  _dl_close (args.map);
+          assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
 	}
 
       /* Make a local copy of the error string so that we can release the
@@ -601,13 +613,9 @@
       if (malloced)
 	free ((char *) errstring);
 
-      assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
-
       /* Reraise the error.  */
       _dl_signal_error (errcode, objname, NULL, local_errstring);
-    }
-
-  assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
+    } 
 
 #ifndef SHARED
   DL_STATIC_INIT (args.map);

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]