This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [RFC] malloc: add random offset to mmapped memory


On 24 Jan 2015 22:01, Maarten Bosmans wrote:
> When the current malloc implementation uses mmap to directly fulfil an
> allocation request, it returns an address that is always aligned to a
> page boundary + 16 bytes. When multiple such allocations are accessed
> in the same interleaved order, as in the following example code,
> performance is suboptimal due to cache conflicts.
> 
> #include <stdint.h>
> #include <stdlib.h>
> 
> static void mmap_alignment_test(unsigned n_arr, size_t length) {
>   /* allocate [n_arr] arrays of [length] 16-bit integers */
>   int16_t *arr[n_arr];
>   for (unsigned a = 0; a < n_arr; a++) {
>     arr[a] = malloc(length * sizeof(int16_t));
>   }
>   /* fill the arrays, interleaving the writes to each array */
>   for (size_t i = 0; i < length; i++) {
>     for (unsigned a = 0; a < n_arr; a++) {
>       arr[a][i] = (int16_t)i;
>     }
>   }
>   /* release the arrays again */
>   for (unsigned a = 0; a < n_arr; a++) {
>     free(arr[a]);
>   }
> }
> 
> The performance impact can be seen in this graph[1], which shows the
> results of running this code for n_arr=1 to 20 with length=50000. By
> default glibc satisfies these small (100kB) requests from its heap,
> but by setting MALLOC_MMAP_THRESHOLD_ to a suitably small value they
> can be forced to come directly from the mmap system call. You can
> tell quite clearly that the code was run on a CPU with an 8-way
> set-associative cache: the mmapped arrays all start at the same
> offset within a page, so accesses to the same index land in the same
> cache set, and an 8-way set can hold at most 8 of those lines at
> once. That is the point where the similarly aligned arrays start
> conflicting.
> 
> My proposal is to use the extra (unused) space that we get from mmap
> anyway (because the mapping is rounded up to a whole page) to add a
> random offset to the returned pointer. This would improve the
> performance of this example test case when the arrays are large
> enough to be mmapped directly.
> 
> I would like to get some feedback on whether glibc developers think
> this is a worthwhile goal to pursue before I start working on a patch.
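
for reference, the idea being proposed boils down to something like this
minimal standalone sketch.  mmap_alloc_offset and the use of rand() are
stand-ins i made up for illustration; an actual patch would sit in
malloc's mmap path and would have to record the chosen offset so that
free() can recover the mapping base for munmap():

#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

static void *mmap_alloc_offset(size_t nbytes)
{
  size_t page = (size_t)sysconf(_SC_PAGESIZE);
  /* mmap always rounds the mapping up to whole pages, so everything
     past nbytes in the last page is unused slack */
  size_t mapped = (nbytes + page - 1) & ~(page - 1);
  size_t slack = mapped - nbytes;
  char *base = mmap(NULL, mapped, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED)
    return NULL;
  /* pick a random multiple of the 16-byte malloc alignment within the
     slack; rand() is only a stand-in for a proper entropy source */
  size_t offset = 0;
  if (slack >= 16)
    offset = ((size_t)rand() % (slack / 16 + 1)) * 16;
  return base + offset;
}

and to reproduce the measurements above without patching anything, the
test can be pushed onto the mmap path via the existing environment knob,
e.g.:
	MALLOC_MMAP_THRESHOLD_=4096 ./a.out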

while i'm not against making programs work faster when possible, i'm not sure
your example here is a good one.  it seems like you're purposefully writing
(imo) bad code that ignores the realities of cpu caches.  iow, if your program
is at this level of optimization, you'd probably be better off reading:
	http://www.akkadia.org/drepper/cpumemory.pdf

especially when you start talking about creating artificially bad scenarios by
turning the MALLOC_MMAP_THRESHOLD_ knob way down.  forcing lots of allocations
to come from direct mmaps will put pressure on the system and can be even
worse for performance than cache-hostile code like you've shown here.

it might help your case if you had a real-world example that didn't
specifically do both of those things ...
-mike


