This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.


[Bug libc/11261] malloc uses excessive memory for multi-threaded applications


------- Additional Comments From rich at testardi dot com  2010-02-10 13:10 -------
Hi Ulrich,

I apologize in advance, and I want you to know I will not reopen this bug
again.  But I felt I had to show you a new test program that demonstrates
that "the cost of large amounts of allocated address space is
insignificant" can be exceedingly untrue for heavily threaded programs
using large amounts of memory.  In our product, we require 2x the RAM on
Linux vs. other OSes because of this. :-(

I've reduced the problem to a program that runs fine when invoked with no
options, but thrashes wildly with the "-x" option.  The only difference is
that in the "-x" case the threads do some dummy malloc/free calls up front
to create thread-preferred arenas.

The program simply has a number of threads that, in turn (i.e., not
concurrently), each allocate a large amount of memory and then free most
(but not all!) of it.  The resulting allocations easily fit in RAM, even
when fragmented.  The program then memsets the unfreed memory to 0.

The problem is that in the thread-preferred arena case, the fragmented
allocations are now spread over 10x the virtual address space, and
accessing them commits at least 2x the physical memory -- enough to push
us past the top of RAM and into thrashing.
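
(To be concrete about the mechanism: the kernel commits anonymous memory
page by page, as each page is first touched, so a live 16K chunk pins its
pages no matter how much was freed around it.  A trivial stand-alone
illustration, independent of malloc -- touching one byte of a page commits
the whole page:)

// sketch: demonstrate page-granular commitment of anonymous memory
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

int
main(void)
{
    size_t i;
    size_t npages = 100000;
    char command[64];
    char *p;

    p = mmap(NULL, npages*4096, PROT_READ|PROT_WRITE,
             MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return 1;
    }
    // touch one byte in every 10th page -- VmRSS grows by ~10000
    // whole pages (~40MB) even though only 10000 bytes were written
    for (i = 0; i < npages; i += 10) {
        p[i*4096] = 1;
    }
    sprintf(command, "grep -i vmrss /proc/%d/status", (int)getpid());
    (void)system(command);
    return 0;
}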

As a result, without the -x option the program's memset pass runs in two
seconds or so on my system (8-way, 2GHz, 12GB RAM); with the -x option it
can take hundreds to thousands of seconds.

I know this sounds contrived, but it was in fact *derived* from a real-life 
problem.

All I am hoping to convey is that there are memory-intensive applications
for which thread-preferred arenas hurt performance significantly.
Furthermore, turning on MALLOC_PER_THREAD can have an even more
devastating effect on these applications than the default behavior.  And
unfortunately, neither MALLOC_ARENA_MAX nor MALLOC_ARENA_TEST can prevent
the thread-preferred arena proliferation.
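
(For reference, these are all environment variables read at startup, e.g.:

    MALLOC_ARENA_MAX=1 ./memx2 -x

but no setting I tried keeps the threads from acquiring their own arenas.)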

The test-run output without and with the "-x" option is below; the source
code is below that.

Thank you for your time.  Like I said, I won't reopen this again, but I
hope you'll consider giving applications like ours a "way out" of the
thread-preferred arenas in the future -- especially since our future looks
even bleaker with MALLOC_PER_THREAD, and that is the direction you are
moving (and for certain applications, MALLOC_PER_THREAD makes sense!).

Anyway, I've already written a small-block binned allocator that lives on
top of mmap'd pages for us on Linux, so we're OK.  But I'd rather just use
malloc(3).
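
In case it helps anyone else who hits this, the shape of it is roughly the
following (a simplified sketch: no locking, no large-block fallback, and
my_free() takes the block size instead of storing a header -- the real
version handles all of that):

// sketch of a small-block binned allocator on top of mmap'd pages --
// one free list per power-of-two size class; whole chunks of pages
// are carved on demand and freed blocks just go back on their list
#include <stddef.h>
#include <sys/mman.h>

#define NBINS       8    // size classes of 16, 32, 64, ... 2048 bytes
#define CHUNKPAGES  16   // pages to carve per refill

static void *bins[NBINS];  // free lists; link stored in the block

static int
bin_of(
    size_t size
    )
{
    int b = 0;
    size_t cap = 16;

    while (cap < size) {
        cap <<= 1;
        b++;
    }
    return b;
}

static void
refill(
    int b
    )
{
    size_t blocksize = (size_t)16 << b;
    size_t bytes = CHUNKPAGES*4096;
    size_t off;
    char *p;

    p = mmap(NULL, bytes, PROT_READ|PROT_WRITE,
             MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return;
    }
    // thread every block of the new chunk onto the bin's free list
    for (off = 0; off+blocksize <= bytes; off += blocksize) {
        *(void **)(p+off) = bins[b];
        bins[b] = p+off;
    }
}

void *
my_alloc(
    size_t size
    )
{
    int b = bin_of(size);
    void *p;

    if (b >= NBINS) {
        return NULL;  // real version falls back to plain mmap here
    }
    if (! bins[b]) {
        refill(b);
    }
    p = bins[b];
    if (p) {
        bins[b] = *(void **)p;
    }
    return p;
}

void
my_free(
    void *p,
    size_t size
    )
{
    int b = bin_of(size);

    *(void **)p = bins[b];
    bins[b] = p;
}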

-- Rich

[root@lab2-160 test_heap]# ./memx2
cpus = 8; pages = 3072694; pagesize = 4096
nallocs = 307200
--- creating 100 threads ---
--- waiting for threads to allocate memory ---
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 
30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 
56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 
82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
--- malloc_stats() ---
Arena 0:
system bytes     = 1557606400
in use bytes     =  743366944
Total (incl. mmap):
system bytes     = 1562529792
in use bytes     =  748290336
max mmap regions =          2
max mmap bytes   =    4923392
--- cat /proc/29565/status | grep -i vm ---
VmPeak:  9961304 kB
VmSize:  9951060 kB
VmLck:         0 kB
VmHWM:   2517656 kB
VmRSS:   2517656 kB
VmData:  9945304 kB
VmStk:        84 kB
VmExe:         8 kB
VmLib:      1532 kB
VmPTE:     19432 kB
--- accessing memory ---
--- done in 3 seconds ---


[root@lab2-160 test_heap]# ./memx2 -x
cpus = 8; pages = 3072694; pagesize = 4096
nallocs = 307200
--- creating 100 threads ---
--- allowing threads to create preferred arenas ---
--- waiting for threads to allocate memory ---
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 
30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 
56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 
82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
--- malloc_stats() ---
Arena 0:
system bytes     = 1264455680
in use bytes     =  505209392
Arena 1:
system bytes     = 1344937984
in use bytes     =  653695200
Arena 2:
system bytes     = 1396580352
in use bytes     =  705338800
Arena 3:
system bytes     = 1195057152
in use bytes     =  503815408
Arena 4:
system bytes     = 1295818752
in use bytes     =  604577136
Arena 5:
system bytes     = 1094295552
in use bytes     =  403053744
Arena 6:
system bytes     = 1245437952
in use bytes     =  554196272
Arena 7:
system bytes     = 1144676352
in use bytes     =  453434608
Arena 8:
system bytes     = 1346199552
in use bytes     =  654958000
Total (incl. mmap):
system bytes     = 2742448128
in use bytes     =  748234656
max mmap regions =          2
max mmap bytes   =    4923392
--- cat /proc/29669/status | grep -i vm ---
VmPeak: 49213720 kB
VmSize: 49182988 kB
VmLck:         0 kB
VmHWM:  12052384 kB
VmRSS:  11861284 kB
VmData: 49177232 kB
VmStk:        84 kB
VmExe:         8 kB
VmLib:      1532 kB
VmPTE:     95452 kB
--- accessing memory ---
60 secs... 120 secs... 180 secs... 240 secs... 300 secs... 360 secs... 420 
secs... 480 secs... 540 secs... 600 secs... 660 secs... 720 secs... 780 secs...
--- done in 818 seconds ---
[root@lab2-160 test_heap]#


[root@lab2-160 test_heap]# cat memx2.c
// ****************************************************************************

#include <stdio.h>
#include <errno.h>
#include <assert.h>
#include <limits.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <inttypes.h>
#include <time.h>       // nanosleep(), time()
#include <malloc.h>     // malloc_stats()
#include <sys/types.h>  // uint

#define NTHREADS  100
#define ALLOCSIZE  16384
#define STRAGGLERS  100

static uint cpus;
static uint pages;
static uint pagesize;

static uint nallocs;

static volatile int go;
static volatile int done;
static volatile int spin;
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

static void **ps;  // allocations that are freed in turn by each thread
static int nps;
static void **ss;  // straggling allocations to prevent arena free
static int nss;

void
my_sleep(
    int ms
    )
{
    int rv;
    struct timespec ts;
    struct timespec rem;

    ts.tv_sec  = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000;
    for (;;) {
        rv = nanosleep(&ts, &rem);
        if (! rv) {
            break;
        }
        assert(errno == EINTR);
        ts = rem;
    }
}

void *
my_thread(
    void *context
    )
{
    int i;
    int n;
    int si;
    int rv;
    void *p;

    n = (int)(intptr_t)context;

    while (! go) {
        my_sleep(100);
    }

    // first we spin to get our own arena
    while (spin) {
        p = malloc(ALLOCSIZE);
        assert(p);
        if (rand()%20000 == 0) {
            my_sleep(10);
        }
        free(p);
    }

    my_sleep(1000);

    // then one thread at a time, do our big allocs
    rv = pthread_mutex_lock(&mutex);
    assert(! rv);
    for (i = 0; i < nallocs; i++) {
        assert(i < nps);
        ps[i] = malloc(ALLOCSIZE);
        assert(ps[i]);
    }
    // N.B. we leave 1 of every STRAGGLERS allocations straggling
    for (i = 0; i < nallocs; i++) {
        assert(i < nps);
        if (i%STRAGGLERS == 0) {
            si = nallocs/STRAGGLERS*n + i/STRAGGLERS;
            assert(si < nss);
            ss[si] = ps[i];
        } else {
            free(ps[i]);
        }
    }
    done++;
    printf("%d ", done);
    fflush(stdout);
    rv = pthread_mutex_unlock(&mutex);
    assert(! rv);

    return NULL;
}

int
main(int argc, char **argv)
{
    int i;
    int rv;
    time_t n;
    time_t t;
    time_t lt;
    pthread_t thread;
    char command[128];


    if (argc > 1) {
        if (! strcmp(argv[1], "-x")) {
            spin = 1;
            argc--;
            argv++;
        }
    }
    if (argc > 1) {
        printf("usage: memx2 [-x]\n");
        return 1;
    }

    cpus = sysconf(_SC_NPROCESSORS_CONF);
    pages = sysconf(_SC_PHYS_PAGES);
    pagesize = sysconf(_SC_PAGESIZE);
    printf("cpus = %u; pages = %u; pagesize = %u\n", cpus, pages, pagesize);

    nallocs = pages/10/STRAGGLERS*STRAGGLERS;
    assert(! (nallocs%STRAGGLERS));
    printf("nallocs = %d\n", nallocs);

    nps = nallocs;
    ps = malloc(nps*sizeof(*ps));
    assert(ps);
    nss = NTHREADS*nallocs/STRAGGLERS;
    ss = malloc(nss*sizeof(*ss));
    assert(ss);

    if (pagesize != 4096) {
        printf("WARNING -- this program expects 4096 byte pagesize!\n");
    }

    printf("--- creating %d threads ---\n", NTHREADS);
    for (i = 0; i < NTHREADS; i++) {
        rv = pthread_create(&thread, NULL, my_thread, (void *)(intptr_t)i);
        assert(! rv);
        rv = pthread_detach(thread);
        assert(! rv);
    }
    go = 1;

    if (spin) {
        printf("--- allowing threads to create preferred arenas ---\n");
        my_sleep(5000);
        spin = 0;
    }

    printf("--- waiting for threads to allocate memory ---\n");
    while (done != NTHREADS) {
        my_sleep(1000);
    }
    printf("\n");

    printf("--- malloc_stats() ---\n");
    malloc_stats();
    sprintf(command, "cat /proc/%d/status | grep -i vm", (int)getpid());
    printf("--- %s ---\n", command);
    (void)system(command);

    // access the stragglers
    printf("--- accessing memory ---\n");
    t = time(NULL);
    lt = t;
    for (i = 0; i < nss; i++) {
        memset(ss[i], 0, ALLOCSIZE);
        n = time(NULL);
        if (n-lt >= 60) {
            printf("%d secs... ", (int)(n-t));
            fflush(stdout);
            lt = n;
        }
    }
    if (lt != t) {
        printf("\n");
    }
    printf("--- done in %d seconds ---\n", (int)(time(NULL)-t));

    return 0;
}
[root@lab2-160 test_heap]#
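
For reference, the program builds with nothing special, e.g.:

[root@lab2-160 test_heap]# gcc -O2 -pthread -o memx2 memx2.c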


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WONTFIX                     |


http://sourceware.org/bugzilla/show_bug.cgi?id=11261


