This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Add malloc micro benchmark
- From: DJ Delorie <dj at redhat dot com>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- Cc: carlos at redhat dot com, libc-alpha at sourceware dot org, nd at arm dot com
- Date: Mon, 18 Dec 2017 18:02:10 -0500
- Subject: Re: [PATCH] Add malloc micro benchmark
Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> Since DJ didn't seem keen on increasing the tcache size despite it
> showing major gains across a wide range of benchmarks,
It's not that I'm not keen on increasing the size, it's that there are
drawbacks to doing so and I don't want to base such a change on a guess
(even a good guess). If you have benchmarks, let's collect them and add
them to the trace corpus. I can send you my corpus. (We don't have a
good solution for centrally storing such a corpus yet.) Let's run all
the tests against all the options and make an informed decision; that's
all. If it shows gains for synthetic benchmarks but makes qemu slower,
we need to know that.
Also, as Carlos noted, there are some downstream uses where a larger
cache may be detrimental. Sometimes there are no universally "better"
defaults, and we provide tunables for those cases.
And, as always, I can be out-voted if the consensus disagrees with me ;-)
> I decided to fix the performance for the single-threaded case at
> least. It's now 2.5x faster on a few server benchmarks (of course the
> next question is whether tcache is actually useful in its current
> form).
Again, tcache is intended to help the multi-threaded case. Your patches
help the single-threaded case. If you recall, I ran your patch against
my corpus of multi-threaded tests, and saw no regressions, which is
good.
So our paranoia here is twofold...
1. Make sure that when someone says "some benchmarks" we have those
benchmarks available to us, either as a microbenchmark in glibc or as
a trace we can simulate and benchmark. No more random benchmarks! :-)
2. When we say a patch "is faster", let's run all our benchmarks and
make sure that we don't mean "on some benchmarks." The whole point
of the trace/sim stuff is to make sure key downstream users aren't
left out of the optimization work and don't end up with worse
performance.
We probably should add "on all major architectures" too, but that assumes
we have machines on which we can run the benchmarks.
So we should be able to answer your question, not just wonder...
> I'd have to check how easy it is to force it to use the thread arena.
I'm guessing we could have a glibc-internal API to tag the heap as
"corrupt" which would preclude using it.
> If consolidation doesn't work that's a serious bug.
Sometimes it's less a case of "doesn't work" than a case of "not attempted
for performance reasons". If we can show that a different design choice
is universally better[*], we should change it.
[*] or at least universally enough for a "system" allocator like the one
glibc must provide.