This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Re: Example ex9 fails on SMP systems


Continuing on this topic....

The patch that fixed ex11's hang does not improve the behavior in ex9. A
script that runs ex9 in a loop will on occasion see ex9 fail with a
SIGSEGV. ex9 running on glibc-2.2.4 would also see an occasional hang but
this seems to be fixed in glibc-2.2.5.

The net is that ex9 segfaults in "exit()" while attempting to dereference
"__exit_funcs". "exit()" is not thread safe, but in this case it has been
called by two different threads. The flight recorder trace shows this
clearly:

             pthread-
entry addr   descr addr value 1    value 2    entry type
------------ ---------- ---------- ---------- --------------------------
<0x001ab300> 0xffffe768 0x00000000 0xa617f135 exit
<0x001ab310> 0xffffe768 0x1009ec4c 0xa617f1d8 exit func
<0x001ab320> 0xffffe768 0x1009ec64 0xa617f2b9 exit idx
<0x001ab330> 0xffffe768 0x1009ec54 0xa617f42f exit idx
<0x001ab340> 0x10099b60 0x00000000 0xa617f482 pthread_onexit_process
<0x001ab350> 0x10099b60 0x00000020 0xa618091e pthread_wait_for_restart_signal

<0x001ab4b0> 0xff5ffae8 0x00000000 0xa61834ed exit
<0x001ab4c0> 0xff5ffae8 0x1009ec4c 0xa6183512 exit func
<0x001ab4d0> 0xff5ffae8 0x00000000 0xa6183611 exit atexit
<0x001ab4e0> 0xff5ffc00 0x1009c390 0xa6183774 pthread_mutex_lock

<0x001ab5d0> 0xff5ffc00 0x1009c520 0xa6183fa2 pthread_mutex_lock exit

<0x001ab690> 0x10099b60 0x00000000 0xa6188397 pthread_handle_exit

<0x001ab7d0> 0xff5ffae8 0x00000000 0xa6198258 exit exit

<0x001ab870> 0x10099b60 0x00000000 0xa61ac5cc pthread_handle_exit exit
<0x001ab880> 0x10099b60 0x00000020 0xa61b0d8e pthread_handle_sigrestart
<0x001ab890> 0x10099b60 0x00000000 0xa61b15f7 pthread_wait_for_restart_signal exit
<0x001ab8a0> 0x10099b60 0x1009cc20 0xa61b3170 pthread_mutex_lock
<0x001ab8d0> 0x10099b60 0x1009cc20 0xa61b34ff pthread_mutex_lock exit
<0x001ab8e0> 0x10099b60 0x1009cc20 0xa61b549d pthread_mutex_unlock
<0x001ab8f0> 0x10099b60 0x1009cc30 0xa61b5703 pthread_alt_unlock
<0x001ab900> 0x10099b60 0x00000000 0xa61b58df pthread_onexit_process exit

The Main thread (stack address 0xffffe768, pthread descr address
0x10099b60) has called exit() and exit() has called pthread_onexit_process.
pthread_onexit_process sends a REQ_PROCESS_EXIT request to the
pthread_manager and suspends.

Meanwhile thread 0xff5... calls exit(), zips through the __exit_funcs while
loop (leaving __exit_funcs == NULL), and calls the IO_cleanup code before
calling _exit().

Finally the main thread gets control back from the suspend, continues
pthread_onexit_process, and returns to exit(). exit() promptly segfaults on
the statement:

      __exit_funcs = __exit_funcs->next;

because thread 0xff5... has already set __exit_funcs to null.
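
To make that interleaving concrete, here is a small single-threaded replay of
it. This is only a sketch under my own assumptions: struct handler_list,
exit_funcs_model, advance_head and replay_double_exit are made-up names for
illustration, not glibc internals, and the list is reduced to a bare "next"
link. The NULL check in advance_head marks the spot where the real statement
would fault instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one block of registered exit handlers.
   Only the link field matters for this illustration.  */
struct handler_list
{
  struct handler_list *next;
};

/* Hypothetical stand-in for the global __exit_funcs list head.  */
static struct handler_list *exit_funcs_model;

/* Models "__exit_funcs = __exit_funcs->next;".  Returns 0 on success,
   or -1 where the unguarded statement would dereference NULL.  */
static int
advance_head (void)
{
  if (exit_funcs_model == NULL)
    return -1;
  exit_funcs_model = exit_funcs_model->next;
  return 0;
}

/* Replays the ordering described above in one thread and returns the
   result of the main thread's final advance.  */
static int
replay_double_exit (void)
{
  struct handler_list only = { NULL };
  exit_funcs_model = &only;

  /* Main thread: enters the loop with a non-NULL head, then suspends
     inside the handler it is running.  */
  assert (exit_funcs_model != NULL);

  /* Second thread: walks the whole list while main is suspended,
     leaving the head NULL.  */
  while (exit_funcs_model != NULL)
    advance_head ();

  /* Main thread resumes and tries to advance the head.  */
  return advance_head ();
}
```

Calling replay_double_exit() returns -1, i.e. the main thread reaches the
advance step with the list head already NULL.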

But we already knew that exit() is not thread safe. The real question is
why it is called by two different threads. Looking at the "thread()"
function from ex9.c we see:

   static void *
   thread (void *arg)
   {
     int i;
     pthread_t self = pthread_self ();
     static pthread_t last_serial_thread;
     static int linecount; /* protected by flockfile(stdout) */

     for (i = 0; i < NUM_ITERS; i++)
       {
         switch (pthread_barrier_wait (&barrier))
           {
           case 0:
             flockfile (stdout);
             printf ("%04d: non-serial thread %lu\n", ++linecount,
                     (unsigned long) self);
             funlockfile (stdout);
             break;
           case PTHREAD_BARRIER_SERIAL_THREAD:
             flockfile (stdout);
             printf ("%04d: serial thread %lu\n", ++linecount,
                     (unsigned long) self);
             funlockfile (stdout);
             last_serial_thread = self;
             break;
           default:
             /* Huh? */
             error (EXIT_FAILURE, 0,
                    "unexpected return value from barrier wait");
           }
       }

     if (pthread_equal (self, last_serial_thread))
       {
         flockfile (stdout);
         printf ("%04d: last serial thread %lu terminating process\n",
                 ++linecount, (unsigned long) self);
         funlockfile (stdout);
         exit (0);
       }

     pthread_exit(NULL);
   }

The test "pthread_equal (self, last_serial_thread)" is used to detect that
the current thread is, presumably, the main thread, which then calls exit().
[The main thread of ex9.c creates 10 pthreads and then calls the "thread"
function itself, so there are effectively 11 threads involved in this
pthread_barrier_wait loop.]

This test is not thread safe and does not guarantee that other threads have
completed or prevent last_serial_thread from being changed by any other
thread that happens to be the "SERIAL" thread.
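
One ordering that would produce two callers can be replayed with plain
integers. This is my own illustration, not something taken from the trace:
THREAD_A, THREAD_S, last_serial_model and the two helper functions are
hypothetical names, and the integer compare stands in for pthread_equal().

```c
#include <assert.h>

/* Hypothetical thread ids for the replay; the real code uses pthread_t.  */
enum { THREAD_A = 1, THREAD_S = 2 };

/* Hypothetical stand-in for the unsynchronized static in thread().  */
static int last_serial_model;

/* The check made after the loop, modelled with integer ids.  */
static int
would_call_exit (int self)
{
  return self == last_serial_model;
}

/* Replays one possible ordering and returns how many threads would go
   on to call exit().  */
static int
replay_last_serial_race (void)
{
  int callers = 0;

  /* Second-to-last iteration: A is the serial thread and records itself.  */
  last_serial_model = THREAD_A;

  /* Last iteration: S is the serial thread, but A leaves the loop and
     runs the check before S has stored its own id.  */
  callers += would_call_exit (THREAD_A);

  /* S now stores its id and runs the same check.  */
  last_serial_model = THREAD_S;
  callers += would_call_exit (THREAD_S);

  return callers;
}
```

In this replay both A and S pass the check, so replay_last_serial_race()
returns 2. Nothing in thread() orders A's read after S's store.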

It would be thread safe if the "last_serial_thread" simply returned. If
it is the main thread, it returns to main(), where it can execute a
pthread_join() loop to ensure that all other threads complete before main
calls exit().
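
As a stand-alone sketch of that shape (not ex9.c itself), the code below has
every thread return from its start function and leaves process termination
to the caller after a join loop. The names worker, run_barrier_sketch,
SKETCH_THREADS and SKETCH_ITERS are assumptions made for this illustration,
and the counts are not the ones ex9.c uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_THREADS 4   /* hypothetical; not the ex9.c value */
#define SKETCH_ITERS 3     /* hypothetical; not the ex9.c value */

static pthread_barrier_t sketch_barrier;
static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
static int serial_count;   /* protected by sketch_lock */

static void *
worker (void *arg)
{
  (void) arg;
  for (int i = 0; i < SKETCH_ITERS; i++)
    {
      int rc = pthread_barrier_wait (&sketch_barrier);
      if (rc == PTHREAD_BARRIER_SERIAL_THREAD)
        {
          pthread_mutex_lock (&sketch_lock);
          serial_count++;
          pthread_mutex_unlock (&sketch_lock);
        }
      else if (rc != 0)
        return (void *) 1;      /* unexpected barrier error */
    }
  /* Just return; no thread calls exit() from here.  */
  return NULL;
}

/* Runs the barrier loop in SKETCH_THREADS threads plus the caller and
   returns the number of serial-thread wakeups seen, or -1 on error.  */
static int
run_barrier_sketch (void)
{
  pthread_t threads[SKETCH_THREADS];

  serial_count = 0;
  /* The caller takes part too, hence the + 1.  */
  if (pthread_barrier_init (&sketch_barrier, NULL, SKETCH_THREADS + 1) != 0)
    return -1;

  for (int i = 0; i < SKETCH_THREADS; i++)
    if (pthread_create (&threads[i], NULL, worker, NULL) != 0)
      return -1;  /* sketch only: already created threads stay blocked at
                     the barrier; a real program should treat this as fatal */

  (void) worker (NULL);

  /* Wait for every other thread before the caller goes on to exit.  */
  for (int i = 0; i < SKETCH_THREADS; i++)
    pthread_join (threads[i], NULL);

  pthread_barrier_destroy (&sketch_barrier);
  return serial_count;
}
```

A main() that calls run_barrier_sketch() and then returns is the only place
where the process exits. Build with the compiler's thread option (for
example -pthread with gcc).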

The following patch to ex9.c has run 10000 iterations without failure on a
4-way PowerPC:

>>>>>>>>>>>>>>
diff -rc2P glibc-2.2.5/ChangeLog glibc-2.2.5-pthreads/ChangeLog
*** glibc-2.2.5/ChangeLog     Sun Jan 20 21:20:18 2002
--- glibc-2.2.5-pthreads/ChangeLog  Fri Apr 26 14:58:58 2002
***************
*** 1,2 ****
--- 1,8 ----
+ 2002-04-26  Steven Munroe  <sjmunroe@us.ibm.com>
+
+     * linuxthreads/Examples/ex9.c
+     ex9 thread function is not thread safe and can call exit() from
+     two or more threads.
+
  2002-01-18  Andreas Schwab  <schwab@suse.de>


diff -rc2P glibc-2.2.5/linuxthreads/Examples/ex9.c glibc-2.2.5-pthreads/linuxthreads/Examples/ex9.c
*** glibc-2.2.5/linuxthreads/Examples/ex9.c     Tue Jun 20 23:32:01 2000
--- glibc-2.2.5-pthreads/linuxthreads/Examples/ex9.c  Fri Apr 26 14:55:51 2002
***************
*** 33,37 ****
  main (void)
  {
!   pthread_t th;
    int i;

--- 33,38 ----
  main (void)
  {
!   pthread_t th;
!   pthread_t thread_list[NUM_THREADS];
    int i;

***************
*** 41,50 ****
    for (i = 0; i < NUM_THREADS; i++)
      {
!       if (pthread_create (&th, NULL, thread, NULL) != 0)
      error (EXIT_FAILURE, 0, "cannot create thread");
      }

    (void) thread (NULL);
!   /* notreached */
    return 0;
  }
--- 42,56 ----
    for (i = 0; i < NUM_THREADS; i++)
      {
!       if (pthread_create (&thread_list[i], NULL, thread, NULL) != 0)
      error (EXIT_FAILURE, 0, "cannot create thread");
      }

    (void) thread (NULL);
!
!   for (i = 0; i < NUM_THREADS; i++)
!     {
!       pthread_join(thread_list[i], NULL);
!     }
!
    return 0;
  }
***************
*** 88,92 ****
          ++linecount, (unsigned long) self);
      funlockfile (stdout);
!     exit (0);
    }

--- 94,100 ----
          ++linecount, (unsigned long) self);
      funlockfile (stdout);
!     return NULL;
!
! /*    exit (0); */
    }

<<<<<<<<<<<<<<


