This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug libc/5084] New: Calling getifaddrs from multiple threads or processes sometimes takes >100 seconds


One of our customers occasionally sees hangs in getifaddrs. TAO, the CORBA
implementation we use, calls getifaddrs frequently from multiple threads.

The attached testcase reproduces the problem (unpack and run run-test.sh) if
left running for long enough. The problem seems to be much more frequent on
hosts running inside VMware (ESX), but I've also seen it on native Linux
installations, at a much lower frequency. Sample output:

[-1218528336]: Got through 0 iterations
[-1229026384]: Got through 0 iterations
[-1239516240]: Got through 0 iterations
Took 661 seconds to call getifaddrs!
Took 661 seconds to call getifaddrs!
Took 661 seconds to call getifaddrs!
[-1218528336]: Got through 100000 iterations
[-1229026384]: Got through 100000 iterations
[-1239516240]: Got through 100000 iterations
[-1229026384]: Got through 200000 iterations
[-1218528336]: Got through 200000 iterations
[-1239516240]: Got through 200000 iterations
Took 781 seconds to call getifaddrs!
Took 781 seconds to call getifaddrs!
Took 781 seconds to call getifaddrs!

This shows 3 threads each taking an unexpectedly long time to complete the call,
yet all of them completing at the same time. The machine has 4 processor cores.

Changing the attached testcase so NTHREADS is 1 and running more than one
gia-test process also reproduces the bug.

This is on 32-bit Linux: a standard Red Hat Enterprise Linux AS release 3
(Taroon Update 5) installation, that is, Linux 2.4.21-32.ELsmp with glibc from
the rpm glibc-2.3.2-95.33.

pstack output from the gia-test process while it is hung:

Thread 4 (Thread -1218552912 (LWP 14946)):
#0  0x0084b5de in recvmsg () from /lib/tls/libc.so.6
#1  0x0086835f in netlink_receive () from /lib/tls/libc.so.6
#2  0x008673d9 in getifaddrs () from /lib/tls/libc.so.6
#3  0x080485b3 in testgia ()
#4  0x00487de8 in start_thread () from /lib/tls/libpthread.so.0
#5  0x0084a93a in clone () from /lib/tls/libc.so.6
Thread 3 (Thread -1229050960 (LWP 14947)):
#0  0x0084b5de in recvmsg () from /lib/tls/libc.so.6
#1  0x0086835f in netlink_receive () from /lib/tls/libc.so.6
#2  0x008673d9 in getifaddrs () from /lib/tls/libc.so.6
#3  0x080485b3 in testgia ()
#4  0x00487de8 in start_thread () from /lib/tls/libpthread.so.0
#5  0x0084a93a in clone () from /lib/tls/libc.so.6
Thread 2 (Thread -1241515088 (LWP 14948)):
#0  0x0084b5de in recvmsg () from /lib/tls/libc.so.6
#1  0x0086835f in netlink_receive () from /lib/tls/libc.so.6
#2  0x00867404 in getifaddrs () from /lib/tls/libc.so.6
#3  0x080485b3 in testgia ()
#4  0x00487de8 in start_thread () from /lib/tls/libpthread.so.0
#5  0x0084a93a in clone () from /lib/tls/libc.so.6
Thread 1 (Thread -1218551680 (LWP 14945)):
#0  0x00488c68 in pthread_join () from /lib/tls/libpthread.so.0
#1  0x080486ed in main ()


If I run the process under strace, the problem does not occur (or is much less
frequent).

-- 
           Summary: Calling getifaddrs from multiple threads or processes
                    sometimes takes >100 seconds
           Product: glibc
           Version: 2.3.2
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
        AssignedTo: drepper at redhat dot com
        ReportedBy: cr at progress dot com
                CC: glibc-bugs at sources dot redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=5084


