Bug 13119 - Erroneous "libgcc_s.so.1 must be installed for pthread_cancel to work" message
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: nptl
Version: 2.14
Importance: P2 minor
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2011-08-19 21:35 UTC by Tavian Barnes
Modified: 2016-10-14 20:20 UTC
fweimer: security-


Description Tavian Barnes 2011-08-19 21:35:20 UTC
This is a bit awkward to reproduce, but I found it when attempting to run my raytracer with a ridiculous number (50,000) of threads.  Here's a reduced testcase:

$ cat foo.c
#include <pthread.h>

void *
bar(void *ptr)
{
  return NULL;
}

void *
foo(void *ptr)
{
  pthread_t thread;
  while (1) {
    if (pthread_create(&thread, NULL, bar, NULL) != 0) {
      pthread_exit(NULL);
    }
  }
  return NULL;
}

int
main()
{
  pthread_t thread;
  pthread_create(&thread, NULL, foo, NULL);
  pthread_join(thread, NULL);
  return 0;
}
$ gcc foo.c -pthread && ./a.out
libgcc_s.so.1 must be installed for pthread_cancel to work

The error comes from nptl/sysdeps/pthread/unwind-forcedunwind.c, which tries to "__libc_dlopen (LIBGCC_S_SO);".  But in elf/dl-load.c, _dl_map_object_from_fd(), line 1281, the __mmap() fails with ENOMEM, presumably because the thousands of zombie threads have left no available memory.  Thus the dlopen fails, and that message gets printed.

Obviously this isn't a very important issue, but the error message is not exactly informative, since there's no usage of pthread_cancel() anywhere and libgcc_s.so.1 is certainly installed.
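To see why a load like this fails, rather than relying on the fixed message, one option is to ask the dynamic loader directly. The following is a minimal sketch, not glibc's internal code path (which uses __libc_dlopen); the helper name try_dlopen is hypothetical and it uses only the public dlopen/dlerror API:

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: try to load a shared object and record the
   loader's own explanation in msg. Returns 0 on success, -1 on failure. */
int
try_dlopen(const char *name, char *msg, size_t msg_len)
{
  void *handle = dlopen(name, RTLD_NOW);
  if (handle == NULL) {
    /* dlerror() describes the most recent dlopen failure. */
    const char *err = dlerror();
    snprintf(msg, msg_len, "%s", err != NULL ? err : "unknown dlopen failure");
    return -1;
  }
  snprintf(msg, msg_len, "loaded %s", name);
  dlclose(handle);
  return 0;
}
```

Calling try_dlopen("libgcc_s.so.1", buf, sizeof buf) from a thread in the same state as the testcase would presumably report the loader's reason instead of the pthread_cancel message. Older glibc versions need -ldl when linking.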
Comment 1 Florian Weimer 2014-06-27 12:23:55 UTC
Read barriers have been added to the cancellation initialization code.  Can you check if this fixes this issue?  If it does not, fixing this would have to use the double-checked locking idiom to prevent duplicating work.
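For reference, the double-checked locking idiom mentioned here can be sketched as follows. This is a generic illustration with C11 atomics and a pthread mutex, not the actual glibc initialization code; get_handle, expensive_init and init_calls are hypothetical names:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static _Atomic(void *) cached_handle = NULL;
int init_calls = 0;  /* counts how often the slow path ran */

/* Stand-in for the expensive one-time work (e.g. loading a library). */
static void *
expensive_init(void)
{
  static int token;
  init_calls++;
  return &token;
}

void *
get_handle(void)
{
  /* First check, without the lock: acquire pairs with the release below. */
  void *h = atomic_load_explicit(&cached_handle, memory_order_acquire);
  if (h == NULL) {
    pthread_mutex_lock(&init_lock);
    /* Second check, under the lock: another thread may have won the race. */
    h = atomic_load_explicit(&cached_handle, memory_order_relaxed);
    if (h == NULL) {
      h = expensive_init();
      atomic_store_explicit(&cached_handle, h, memory_order_release);
    }
    pthread_mutex_unlock(&init_lock);
  }
  return h;
}
```

The second check under the lock is what prevents two threads from both performing the initialization.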
Comment 2 Tavian Barnes 2014-06-27 18:09:59 UTC
Still reproduces against git glibc from today.
Comment 3 Vlad Frolov 2016-02-12 14:28:32 UTC
This bug is still reproducible on glibc 2.22.
Comment 4 Adhemerval Zanella 2016-10-14 20:20:30 UTC
This is not really because the process has run out of virtual memory, nor because of synchronization issues while loading libgcc_s.so. Rather, Linux overcommit handling makes the subsequent mprotect on the memory returned by mmap fail with ENOMEM.

With vm.overcommit_memory=0 (default on my system) I get:

[pid 28817] mmap(NULL, 2185488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fc0b850b000
[pid 28817] mprotect(0x7fc0b8521000, 2093056, PROT_NONE) = -1 ENOMEM (Cannot allocate memory)
[pid 28817] mmap(0x7fc0b8720000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = -1 ENOMEM (Cannot allocate memory)

Leaving overcommit aside, this should not fail: mmap requested 0x215910 (2185488) bytes and the kernel returned 0x7fc0b850b000, so an mprotect of [0x7fc0b8521000, 0x7fc0b8720000) lies entirely within the mapping [0x7fc0b850b000, 0x7fc0b8720910).
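The address arithmetic can be checked mechanically. A small sketch, where range_within is a hypothetical helper and the constants are taken from the strace output above:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if [sub, sub + sub_len) lies entirely inside [base, base + len).
   Written with subtractions so the comparison cannot overflow. */
bool
range_within(uint64_t base, uint64_t len, uint64_t sub, uint64_t sub_len)
{
  return sub >= base && sub_len <= len && sub - base <= len - sub_len;
}
```

With base 0x7fc0b850b000 and length 2185488 for the mmap, and start 0x7fc0b8521000 and length 2093056 for the mprotect, range_within returns true: the mprotect ends at 0x7fc0b8720000, which is 0x910 bytes before the end of the mapping.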

The test passes with vm.overcommit_memory=2 (always check, never overcommit):

[pid 29891] mmap(NULL, 2185488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f71c0028000
[pid 29891] mprotect(0x7f71c003e000, 2093056, PROT_NONE) = 0
[pid 29891] mmap(0x7f71c023d000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x7f71c023d000
[...]
[pid 29891] exit(0)                     = ?
[pid 29889] <... futex resumed> )       = 0
[pid 29891] +++ exited with 0 +++
exit_group(0) 

So I do not think this is an issue with glibc itself, and AFAIK there is no mmap flag, not even a non-portable one, that avoids overcommit for such an mmap call.
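When reproducing this, it helps to record which overcommit mode the machine is in. A minimal Linux-only sketch; read_overcommit_mode is a hypothetical helper that reads the same setting as sysctl vm.overcommit_memory:

```c
#include <stdio.h>

/* Returns the value of vm.overcommit_memory (0 = heuristic, 1 = always
   overcommit, 2 = never overcommit), or -1 if it cannot be read,
   for example on a non-Linux system. */
int
read_overcommit_mode(void)
{
  FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
  int mode = -1;

  if (f == NULL)
    return -1;
  if (fscanf(f, "%d", &mode) != 1)
    mode = -1;
  fclose(f);
  return mode;
}
```

Printing this value next to the strace output makes it clear which of the two runs above a given report corresponds to.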