This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: malloc - cache locality - TLB


TLB effects can mostly be interpreted as a cache-locality problem in which
the cache lines are a page large. The links that I gave earlier also try to
handle these.

On Fri, Dec 20, 2013 at 03:24:11AM +0100, Ondřej Bílka wrote:
> 
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.43.6621&rep=rep1&type=pdf
> http://www.cs.technion.ac.il/~itai/Courses/Cache/pointers.pdf
> http://people.cs.umass.edu/~emery/pubs/p33-feng.pdf
>

The main idea there for minimizing TLB misses is to allocate chunks in
address order, which causes frequently allocated entries to reside mostly at
low addresses and eventually frees the high addresses when they are not
needed. There is also a bitmap alternative where we allocate from a single
page while we can, and then pick the page with the smallest usage ratio
(one that is also reasonably fresh, for cache concerns).
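A minimal sketch of that bitmap alternative, with hypothetical names like `toy_alloc` (this is a toy model, not glibc code): serve chunks from the current page while it has free slots, then switch to the page with the smallest usage ratio.

```c
/* Toy bitmap allocator: one 64-bit bitmap per page, one bit per chunk.
   Illustrates the "fill one page, then pick the emptiest page" policy. */
#include <assert.h>
#include <stdint.h>

#define NPAGES 4
#define CHUNKS_PER_PAGE 64

static uint64_t used[NPAGES];   /* bit i set => chunk i of that page is live */
static int current = 0;         /* page we are currently filling */

/* Allocate one chunk; returns page*CHUNKS_PER_PAGE + slot, or -1 if full. */
static int toy_alloc(void)
{
    if (used[current] == UINT64_MAX) {
        /* Current page exhausted: pick the page with smallest usage ratio. */
        int best = 0, best_cnt = CHUNKS_PER_PAGE + 1;
        for (int p = 0; p < NPAGES; p++) {
            int cnt = __builtin_popcountll(used[p]);
            if (cnt < best_cnt) { best_cnt = cnt; best = p; }
        }
        if (best_cnt == CHUNKS_PER_PAGE)
            return -1;          /* every page is full */
        current = best;
    }
    int slot = __builtin_ctzll(~used[current]);   /* lowest free bit */
    used[current] |= 1ULL << slot;
    return current * CHUNKS_PER_PAGE + slot;
}

static void toy_free(int idx)
{
    used[idx / CHUNKS_PER_PAGE] &= ~(1ULL << (idx % CHUNKS_PER_PAGE));
}
```

Allocations stay clustered in one page at a time, so the working set touches few TLB entries; freeing concentrates reuse in already-hot pages.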

However, technology has progressed somewhat, and the main problems are now
administrative rather than technological.

A prerequisite is that we should introduce a per-application file where we
could store application-specific profile feedback.

A second requirement is using separate heaps for small and large allocations.
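A sketch of the split, assuming a hypothetical entry point `heap_alloc` and a fixed size threshold (both made up here for illustration): small requests come from a dedicated arena, large ones go elsewhere.

```c
/* Toy small/large split: small requests bump-allocate from a dedicated
   arena (which could later be backed by huge pages); large requests take
   a separate path, here plain malloc. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SMALL_MAX 256           /* hypothetical threshold */

static unsigned char small_arena[4096];
static size_t small_top = 0;

static void *small_alloc(size_t n)
{
    n = (n + 15) & ~(size_t)15;               /* keep 16-byte alignment */
    if (small_top + n > sizeof small_arena)
        return NULL;                          /* toy arena exhausted */
    void *p = small_arena + small_top;
    small_top += n;
    return p;
}

static void *heap_alloc(size_t n)
{
    return n <= SMALL_MAX ? small_alloc(n) : malloc(n);
}
```

The point is only the dispatch: once small allocations live in their own region, that region can be placed and profiled independently of the large-object heap.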

As Linux has supported huge pages since 2003, we could try to use them.

One of the benefits of paging is that the system does not have to allocate
pages that are never used. However, when requests are smaller than a page,
a single write will occupy the whole page and the benefits of virtual space
vanish.

If we placed these allocations into huge pages, we could considerably
improve TLB coverage, as small requests are the likeliest candidates for
hot entries.
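One way to back such a region, as a sketch (assuming Linux; `map_small_heap` is a made-up name): try an explicit huge-page mapping first, and fall back to a normal anonymous mapping with a transparent-huge-page hint.

```c
/* Map a region intended for the small-allocation heap, preferring huge
   pages.  MAP_HUGETLB fails when no huge pages are reserved, so fall back
   to a normal mapping and give the kernel an advisory MADV_HUGEPAGE. */
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define HUGE_LEN (2UL * 1024 * 1024)   /* common x86-64 huge page size */

static void *map_small_heap(size_t len)
{
    void *p = MAP_FAILED;
#ifdef MAP_HUGETLB
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
#endif
    if (p == MAP_FAILED) {
        /* No reserved huge pages available: take a normal mapping. */
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
#ifdef MADV_HUGEPAGE
        madvise(p, len, MADV_HUGEPAGE);   /* advisory; failure is harmless */
#endif
    }
    return p;
}
```

Either way the caller gets a usable region; the huge-page backing is an optimization, not a correctness requirement.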

A downside is that huge pages might be too big: we could waste a page while
using only a small part of it, give pages to a process that does not compute
much and starve a computing one, copy too much on fork, etc.

These problems are solvable with profiling. Into the file that I mentioned
earlier we could write statistics of the small-heap size, and allocate huge
pages only when it is likely that we will use more than a page, there was no
fork, the process is cpu-intensive, and so on.
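The decision could look roughly like this sketch, where the struct fields and function name are all hypothetical stand-ins for whatever the profile file would actually record:

```c
/* Decide from recorded per-application statistics whether the small heap
   should be backed by huge pages.  All names here are illustrative. */
#include <assert.h>
#include <stddef.h>

#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)

struct app_profile {
    size_t peak_small_heap;   /* largest small-heap size seen in past runs */
    int    forked;            /* the process called fork() */
    int    cpu_bound;         /* the process actually computes */
};

static int should_back_with_huge_pages(const struct app_profile *p)
{
    /* Only pay for a huge page when past runs suggest it will be used:
       the heap grew past one huge page, no fork would duplicate it, and
       the process is compute-heavy enough to benefit from TLB savings. */
    return p->peak_small_heap > HUGE_PAGE_SIZE
           && !p->forked
           && p->cpu_bound;
}
```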

Another hurdle is that on Linux the total number of huge pages available in
the system needs to be set by root, so we need to coordinate with
distributions about setting a default.
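Concretely, that system-wide reservation lives in the `vm.nr_hugepages` sysctl; the value 128 below is just an example, as the actual default is exactly the distribution decision in question:

```shell
# Query the current huge-page reservation.
sysctl vm.nr_hugepages

# Reserve 128 huge pages system-wide (requires root).
sysctl -w vm.nr_hugepages=128

# Verify what the kernel actually reserved.
grep -i hugepages /proc/meminfo
```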



