Bug 1128

Summary: Malloc not trimming filled-up sbrk space causes unnecessary system call overhead
Product: glibc Reporter: Jonathan Paisley <jp-www>
Component: mallocAssignee: GOTO Masanori <gotom>
Status: NEW ---    
Severity: normal CC: bugzilla1, fweimer, glibc-bugs, jtaylor.debian, kkylheku, neleai
Priority: P2 Flags: fweimer: security-
Version: 2.3.5   
Target Milestone: ---   
Host: i686-pc-linux-gnu Target: i686-pc-linux-gnu
Build: i686-pc-linux-gnu Last reconfirmed:
Attachments: test case

Description Jonathan Paisley 2005-07-27 13:46:34 UTC
Overview:

After filling up the sbrk-allocated space, each call to free() incurs a system call to brk(), thus consuming 
unnecessary CPU time.



Further Details:

This thread from libc-hacker:

   "Malloc not trimming sbrk space once it fills up"
   http://sourceware.org/ml/libc-hacker/2004-01/msg00012.html

describes how malloc doesn't trim sbrk space once it has started using an mmap-ed region.

In a program I'm developing that does hundreds of thousands of malloc and free per second, I noticed a 
sudden increase in CPU usage and system calls once this threshold had been reached. It turns out that 
every free() operation ends up invoking the brk() system call to find out what the current limit is. When 
this is compared to the current end-of-memory as known by the malloc library at line 3208 of malloc.c, 
in function sYSTRIm(), no trimming is performed so the brk() system call (via morecore) is called again 
at next free().

Relevant lines from malloc.c:

    /*
      Only proceed if end of memory is where we last set it.
      This avoids problems if there were foreign sbrk calls.
    */
    current_brk = (char*)(MORECORE(0));
    if (current_brk == (char*)(av->top) + top_size) {   // malloc.c:3208

I modified Jakub Jelinek's tst-malloc-trim.c from his original email linked above to demonstrate the 
problem. The output from the program contains "count %d" showing the number of times morecore was 
called when freeing blocks. This should ideally be small. As it is, it tends to be close to the number of 
blocks freed. For example:

Actual Result: "free did not trim sbrk area at all: midsum 540672 sum 540672 count 450"
Expected Result: "free did not trim sbrk area at all: midsum 540672 sum 540672 count 0"

I'm currently doing 400,000 frees per second, which adds a non-trivial overhead from the 400,000 brk
() system calls.



Workaround:

I can workaround this by disabling trimming (since it doesn't work anyway) through mallopt
(M_TRIM_THRESHOLD,-1).



Systems Tested:

Confirmed with glibc-2.3.2 from debian 3.0:
GNU C Library stable release version 2.3.2, by Roland McGrath et al.
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 3.3.5 (Debian 1:3.3.5-12).
Compiled on a Linux 2.6.0-test7 system on 2005-05-10.
Available extensions:
        GNU libio by Per Bothner
        crypt add-on version 2.1 by Michael Glad and others
        linuxthreads-0.10 by Xavier Leroy
        BIND-8.2.3-T5B
        libthread_db work sponsored by Alpha Processor Inc
        NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
Thread-local storage support included.

Also with self-compiled glibc-2.3.5 on debian 3.0:
GNU C Library stable release version 2.3.5, by Roland McGrath et al.
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 3.3.5 (Debian 1:3.3.5-13).
Compiled on a Linux 2.6.0-test7 system on 2005-07-27.
Available extensions:
        GNU libio by Per Bothner
        crypt add-on version 2.1 by Michael Glad and others
        Native POSIX Threads Library by Ulrich Drepper et al
        BIND-8.2.3-T5B
        NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
Thread-local storage support included.




test-malloc-trim2.c
==SNIP==
/* Copyright (C) 2004 Free Software Foundation, Inc.
   This file is part of the GNU C Library.
   Contributed by Jakub Jelinek <jakub@redhat.com>, 2004.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, write to the Free
   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
   02111-1307 USA.  */

#include <malloc.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

ptrdiff_t sum;
int count;

__malloc_ptr_t
counting_morecore (ptrdiff_t size)
{
  sum += size;
  count++;
  return __default_morecore (size);
}

int
main (void)
{
  size_t pagesize = sysconf (_SC_PAGESIZE);

  char *p = sbrk (0);
  p += (128 * pagesize) & ~(pagesize - 1);
  if (mmap (p, pagesize, PROT_NONE, MAP_PRIVATE|MAP_ANON, -1, 0) != p)
    {
      puts ("Couldn't test");
      return 0;
    }

  if (mallopt (M_TRIM_THRESHOLD, 32 * pagesize) == 0)
    {
      puts ("Couldn't test");
      return 0;
    }

  // uncomment to enable work-around
  //mallopt(M_TRIM_THRESHOLD, (unsigned long) -1);

  __morecore = counting_morecore;

  char *array[512];
  int i;
  for (i = 0; i < 512; ++i)
    array[i] = malloc (pagesize / 4);

  ptrdiff_t midsum = sum;
 
  count = 0;
  for (i = 511; i >= 0; --i)
    free (array[i]);

  __morecore = __default_morecore;

  if (midsum == sum && midsum > 32 * pagesize)
    {
      printf ("free did not trim sbrk area at all: midsum %td sum %td count %td\n",
	      midsum, sum, count);
      return 1;
    }
  return 0;
}
==SNIP==
Comment 1 Kaz Kylheku 2008-03-28 05:54:17 UTC
This bug can be addressed with a trivial one liner patch, which is not even in 
C code proper but a preprocessing directive!

ROFL!!!

I verified this to work on a MIPS system based on kernel 2.6.17.7 and glibc 
2.5.

The system boots just fine, everything seems to run, and my malloc test case 
shows that   1)  memory is being returned to the OS by free even after sbrk 
fails, and 2) sbrk is not repeatedly called after it fails. (The test case 
simply allocates about 500 megs worth of memory in 500 byte pieces). Malloc 
nicely switches to an alternate arena, whereby it uses mmap to grab heaps. 
These heaps are nicely freed when they become deallocated, and the memory 
originally obtained from sbrk is nicely returned with a negative sbrk. I ran 
it under gdb, placing breakpoints on mmap, munap, and sbrk, and also watching 
the virtual memory size of the process using ps aux.

--- glibc-2.5.orig/malloc/malloc.c      2006-09-07 09:06:02.000000000 -0700
+++ glibc-2.5/malloc/malloc.c   2008-04-04 01:01:32.862714704 -0800
@@ -3056,15 +3056,24 @@
       (*__after_morecore_hook) ();
   } else {
   /*
-    If have mmap, try using it as a backup when MORECORE fails or
-    cannot be used. This is worth doing on systems that have "holes" in
+    If we have arenas, then we are basically done with this
+    main_arena. We can return null, because the higher level malloc
+    routine like public_mALLOc will try fetching from another
+    arena, creating one if necessary. This way we won't
+    cause the main_arena to become non-contiguous.
+    A non-contiguous main_arena has the bug that it won't return
+    memory to the OS when it is freed, and that it keeps
+    hammering away on a nonworking sbrk for each allocation request
+
+    If have mmap, but not arenas, try using it as a backup when MORECORE fails
+    or cannot be used. This is worth doing on systems that have "holes" in
     address space, so sbrk cannot extend to give contiguous space, but
     space is available elsewhere.  Note that we ignore mmap max count
     and threshold limits, since the space will not be used as a
     segregated mmap region.
   */

-#if HAVE_MMAP
+#if HAVE_MMAP && !USE_ARENAS
     /* Cannot merge with old top, so add its size back in */
     if (contiguous(av))
       size = (size + old_size + pagemask) & ~pagemask;
Comment 2 Wolfram Gloger 2008-03-28 17:44:57 UTC
That would fix the test case in question, but only because it prohibits
allocation from the main arena altogether in this specific case.

I think it would not fix the following (IMHO in practice more likely) situation:

1. lot's of malloc/free from the main arena
2. a single sbrk(+something) call from the app, making the main arena
discontiguous after just one additional malloc()
3. lot's of free() from the main arena -- now every free() can still incur an
sbrk(0) call because the end of the main arena never can pass the user-allocated
space

I think the proper fix would be to prevent or alleviate the sbrk(0) in every
free(), given that sbrk(0) is very expensive (but see below).
Maybe we should give up expecting a discontiguous main_arena to magically become
contiguous again, i.e. not call trim at all when the arena is marked noncontiguous.

BUT, taking a step back, is this really a serious problem?  I consider the
statement that "each call to free() incurs a system call" _false_, because libc
caches the last brk value, so sbrk(0) does _not_ perform a system call AFAICS.
Or are we talking about specific systems where sbrk(0) is just like sbrk(+x)?
Comment 3 Kaz Kylheku 2008-07-18 20:07:07 UTC
(In reply to comment #2)
> I think it would not fix the following (IMHO in practice more likely) 
situation:
> 1. lot's of malloc/free from the main arena
> 2. a single sbrk(+something) call from the app

IMHO, apps that call sbrk deserve less than optimal behavior from malloc, such 
as memory not being returned when it is freed.

> 3. lot's of free() from the main arena
> -- now every free() can still incur an
> sbrk(0) call because the end of the main arena never can pass the user-
allocated
> space
> I think the proper fix would be to prevent or alleviate the sbrk(0) in every
> free(), given that sbrk(0) is very expensive (but see below).

But, seeing below, it is successfully argued that sbrk(0) is actually not 
expensive at all thanks to cashing. The remaining problem is that the freed 
memory won't be returned to the operating system.  But the application 
arguably deserves that, because it interfered with malloc by calling sbrk
(+delta).
Comment 4 Jonathan Paisley 2008-07-20 19:41:37 UTC
I reported this bug about three years ago now, so my memory is a little bit hazy. However, I do remember 
that calls to free() were definitely causing expensive syscalls. I note the recent comments about sbrk(0) 
not being expensive due to caching and, reviewing the glibc sources, I have to agree with this. So I suspect  
there may be some detail missing from my original report that explains more what was going on.

Sorry!
Comment 5 Simon Berger 2009-10-01 12:03:30 UTC
Created attachment 4242 [details]
test case

Is there any real solution for this problem? The "mallopt(M_TRIM_THRESHOLD,
(unsigned long) -1)" workaround does not work for me. It just causes every
malloc/free to result in a mmap/munmap syscall, which is even slower.

I have attached a small test program which can (kind of) work around the
problem by placing an explicit 'block buffer' behind the work buffers (set
'avoid_intelligence = true'). This would be hard to achieve in a real-world
program, though.

system info:
glibc 2.10.1-4
$ uname -a
Linux myhost 2.6.31-bfs #1 SMP PREEMPT Mon Sep 28 14:26:47 UTC 2009 x86_64
Intel(R) Core(TM)2 Duo CPU P9400 @ 2.40GHz GenuineIntel GNU/Linux

The strace output on the test program is:
brk(0xcab000)				= 0xcab000
brk(0xc2b000)				= 0xc2b000
brk(0xc6b000)				= 0xc6b000
brk(0xcab000)				= 0xcab000
(... continued ...)
Comment 6 Simon Berger 2009-10-01 12:19:48 UTC
oops, misinterpreted the topic of this bug:
my previous test case is for the Problem
'Malloc trimming filled-up sbrk space too aggressively causes unnecessary system
call overhead'
It is still a serious problem (caused about 50% slow down in a real world
program. The only possible work around was to introduce much more complicated
memory management in my application)

Comment 7 Wolfram Gloger 2009-10-01 17:14:50 UTC
In reply to comment #5:

The test program you've posted does potentially cause lots of system calls, but
for a different reason than the original report:
You're allocating 256k blocks which exceed the default mmap threshold.
So, to avoid any mmapping in that kind of app you have to specify _both_

mallopt(M_TRIM_THRESHOLD, (unsigned long) -1);
mallopt(M_MMAP_THRESHOLD, (unsigned long) -1);
Comment 8 Ondrej Bilka 2013-10-16 16:52:05 UTC
Does following patch fixes performance for you? I want to be sure that problem was in sbrk but not in something else.

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 2938234..ab6d46f 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -2716,6 +2716,10 @@ static int systrim(size_t pad, mstate av)
   char* current_brk;     /* address returned by pre-check sbrk call */
   char* new_brk;         /* address returned by post-check sbrk call */
   size_t pagesz;
+  static int give_up;
+
+  if (give_up)
+    return 0;
 
   pagesz = GLRO(dl_pagesize);
   top_size = chunksize(av->top);
@@ -2762,7 +2766,8 @@ static int systrim(size_t pad, mstate av)
          return 1;
        }
       }
-    }
+    } else
+      give_up = 1;
   }
   return 0;
 }
Comment 9 Ondrej Bilka 2014-08-30 12:34:00 UTC
*** Bug 17195 has been marked as a duplicate of this bug. ***