This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: The direction of malloc?


On Wed, Dec 18, 2013 at 01:11:08PM +0100, Torvald Riegel wrote:
> > > > Real implementation will be bit faster as dynamic tls slows this down a
> > > > bit.
> > > > 
> > > > Memory system effects are not a factor here, as allocation pattern is
> > > > identical (stack in both cases).
> > > 
> > > I was referring to memory system effects when the data is actually
> > > accessed by an application.  It does matter on which page allocated data
> > > ends up (and where on a page relative to other allocations).
> > 
> > And as it ends at same virtual address as algorithms are identical this
> > does not matter here.
> 
> You seemed to say that you want to move from concurrent code with
> synchronization to nonconcurrent code.  As far as I was able to
> interpret what you wrote, it seemed that you wanted to move from
> (potentially) concurrent allocation from fastbins(?) to strictly
> per-thread allocation bins.  Unless the former is concurrent and uses
> synchronization for no reason, it should be possible that you can have
> situations in which threads allocate from more areas than before.
>
Not necessarily. You could add a check for whether the chunk belongs to
the current thread and fall back to the standard free if it does not. I
wrote a bit more about that here, mainly to get comments:

https://www.sourceware.org/ml/libc-alpha/2013-12/msg00280.html

> > >  The speed
> > > of allocations is just one thing.  For example, just to illustrate, if
> > > you realloc by copying to a different arena, this can slow down programs
> > > due to NUMA effects if those programs expect the allocation to likely
> > > remain on the same page after a realloc; such effects can be much more
> > > costly than a somewhat slower allocation.
> > 
> > You cannot optimize code for unlikely case.
> 
> We do NOT know what the unlikely case is, currently.  This is why I
> suggested to start with analyzing application workloads and access
> patterns, building a model of it (ie, informal but at a level of detail
> that is sufficient to actually agree on a clear set of assumptions and
> not just handwaving), document it, and discuss it with the rest of the
> community.
> 
> > When a memory is allocated
> > in thread A and reallocated in thread B there could be three cases
> > 
> > 1) Memory is passed to thread B which primarily access it.
> > 2) Memory is shared between A and B.
> > 3) Memory is primarily accessed by thread A.
> > 
> > As effect of cases 1) and 3) is symmetrical
> 
> Yes, both can happen, and there might always be a trade-off, and however
> you decide, you might decrease performance in some situations.
> 
> > it suffices to estimate which
> > one is more likely and case 1 seems a best candidate.
> 
> We do NOT know that.  If you do, please show the evidence.
> 
> > Realloc definitely does move in most cases as common usage pattern is
> > doubling size allocated and as we use best fit there is not enough room.
> 
> How do you know it's really a common usage pattern?  And, why should it
> not just be common but one of the most common usage patterns?  What is
> common?  Which applications?  And so on...
>
It is about the only way to avoid quadratic slowdown when repeatedly
reallocating. Ideally this should not be necessary, as realloc could
preallocate twice what was requested, but we do not do this yet. This
affects gcc, which repeatedly tries to extend a buffer by 8.

To test this, use the following program.

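/* Build as a shared object and load it with LD_PRELOAD.  */
#define _GNU_SOURCE
#include <stdlib.h>
#include <dlfcn.h>
#include <stdio.h>
#include <stdint.h>

/* Running totals of realloc calls that did and did not move the block.  */
int moved, unmoved;

/* Chunk header preceding user memory; assumes 64-bit glibc malloc layout.  */
struct header
{
  uint64_t prev;
  uint64_t size;
};

/* The real realloc, resolved once when the library is loaded.  */
void *(*reallocp)(void *, size_t);
void __attribute__ ((constructor))
foo (void)
{
  reallocp = dlsym (RTLD_NEXT, "realloc");
}

void *
realloc (void *old, size_t size)
{
  if (!old)
    return malloc (size);

  struct header *h = (struct header *) old;
  h--;
  /* Mask off the low flag bits and subtract the header size.  */
  size_t oldsize = (h->size & ~(uint64_t) 15) - 16;
  /* Save the address as an integer; old is invalid once the block moves.  */
  uintptr_t oldaddr = (uintptr_t) old;
  void *n = reallocp (old, size);
  int did_move = (uintptr_t) n != oldaddr;
  if (did_move)
    moved++;
  else
    unmoved++;
  fprintf (stderr, "ptr: %p old: %zu new: %zu moved: %i\n",
           (void *) oldaddr, oldsize, size, did_move);
  return n;
}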

