This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v2] Add malloc micro benchmark


On 01/05/2018 06:26 PM, Carlos O'Donell wrote:
It would still be a win if we did not have co-located metadata (something
Florian whispered into my ear years ago now) for small constant sized blocks.

We would go from this:

N * 1-byte allocations => N * (32-byte header
                                + 1-byte allocation
                                + 15-bytes alignment)
                           [97% constant waste]

Actually, we have an 8-byte header, a 16-byte alignment requirement, and a 24-byte minimum allocation size. (i386: 4-byte header, 16-byte alignment, 12-byte minimum size.)

The non-main arenas actually need only a four-byte chunk header (or even fewer bits) because there, the chunk size is very limited.

The alignment is non-negotiable, considering our current stance regarding fundamental alignment, even for smaller allocations.

If we introduce heap layout for all arenas (which needs some way to compensate for the variable sbrk offset, preferably without wasting a gap there), then we should bring down the minimum allocation size to 12 bytes on 64-bit as well because for the smallest bin, we can use 4-byte forward/backward links within a heap, and keep 8-byte pointers separate for each heap.

None of this needs algorithm changes, but it's still difficult to identify all the places which need changing, due to the code duplication and some other issues with the code.

But it only helps with oddly-sized allocations (such as 12 bytes or 28 bytes), so it is unclear whether this work is worthwhile.

To this:

N * 1-byte allocations => N * (1-byte allocation
                                + 15-bytes alignment)
                           + (N/8)-bytes in-use-bit + 16-bytes header
                           [96% waste for 1-byte]
			  [94% waste for 100*1-byte]
                           ... towards a 93.75% constant waste (limit of the alignment e.g. 15/16)

Another significant win is for allocation sizes such as 32, where we currently need to allocate 48 bytes. In fact, powers-of-two are quite bad for the current allocator.

However, it is difficult to combine this with the existing allocator because you need to perform some table lookup during free to discover whether the pointer has adjacent metadata or not, or waste some address space and use bits inside the address for that.

If we want to bring the existing allocator further along, we should perhaps try to port dlmalloc changes which get rid of the unsorted bins, using balanced binary trees. I have a feeling that this would allow us to consolidate far more often, and this should help us to avoid some of the anomalies we have seen.

Thanks,
Florian


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]