This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: memcpy performance regressions 2.19 -> 2.24(5)
I definitely think increasing the threshold on processors with a large
number of cores makes sense. Hopefully, with some testing, we can
confirm that it is a net win and/or arrive at a better, empirically
derived number.
Thanks for the patch with tunable support. I've just put a similar
patch up for review so I can share it. Like the corresponding code in
arena.c, it also handles the case where HAVE_TUNABLES isn't defined.
It also makes a minor change that turns init_cacheinfo into
init_cacheinfo_impl, a hidden (internal) function. init_cacheinfo is
now a constructor that just calls the impl and passes it the
cpu_features struct. This makes the code a bit more modular, which
we'll need in order to test it internally.
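The constructor/impl split described above can be sketched as follows. This is a minimal illustration, not the actual patch: struct cpu_features_sketch, init_cacheinfo_impl_sketch, init_cacheinfo_sketch, non_temporal_threshold_sketch and the environment variable SKETCH_NON_TEMPORAL_THRESHOLD are all hypothetical names. It assumes the "6 times the per-core share" rule quoted below, treats the per-core share as the total divided evenly among the cores, uses the GCC/Clang constructor attribute, and does not model glibc's hidden symbol visibility or the real tunables API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for glibc's internal cpu_features struct,
   reduced to the two values this sketch needs. */
struct cpu_features_sketch
{
  long shared_cache_size;  /* total shared (last-level) cache, in bytes */
  long num_cores;          /* cores sharing that cache */
};

long non_temporal_threshold_sketch;

/* The "impl": all of the logic, parameterized on the features struct,
   so a test can call it with synthetic values instead of real CPUID data. */
void
init_cacheinfo_impl_sketch (const struct cpu_features_sketch *cpu)
{
  assert (cpu->num_cores > 0);
  /* Rule described in the thread: 6 x the per-core share. */
  long threshold = cpu->shared_cache_size / cpu->num_cores * 6;

  /* Non-tunables fallback, loosely modeled on the environment-variable
     handling in arena.c.  The variable name is invented for this sketch. */
  const char *env = getenv ("SKETCH_NON_TEMPORAL_THRESHOLD");
  if (env != NULL)
    {
      long value = strtol (env, NULL, 10);
      if (value > 0)
        threshold = value;
    }

  non_temporal_threshold_sketch = threshold;
}

/* The constructor runs before main, gathers the inputs, and delegates. */
static void __attribute__ ((constructor))
init_cacheinfo_sketch (void)
{
  /* Made-up values standing in for CPUID results: 4 MiB shared, 4 cores. */
  struct cpu_features_sketch cpu = { 4L * 1024 * 1024, 4 };
  init_cacheinfo_impl_sketch (&cpu);
}
```

A unit test can then call init_cacheinfo_impl_sketch directly with a synthetic struct (for example, 16 MiB shared across 8 cores yields a 12 MiB threshold) without depending on the machine it runs on.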
On Mon, May 22, 2017 at 12:17 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, May 18, 2017 at 1:59 PM, Erich Elsen <eriche@google.com> wrote:
>> Hi H.J.,
>>
>> I was on vacation; sorry for the slow reply. The updated benchmark
>> still shows the same behavior. Thanks.
>>
>> I'll try my hand at creating a patch that makes the variable
>> __x86_shared_non_temporal_threshold a tunable. We'll need that
>> for our internal experiments anyway.
>>
>
> __x86_shared_non_temporal_threshold was set to 6 times the per-core
> shared cache size, based on the large memcpy microbenchmark in glibc
> on an 8-core processor. For processors with more than 8 cores, this
> threshold is too low. Set __x86_shared_non_temporal_threshold to
> 3/4 of the total shared cache size, so that it is unchanged on 8-core
> processors (6/8 = 3/4). On processors with fewer than 8 cores, the
> threshold is lower.
>
> Any comments?
>
> --
> H.J.
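The threshold arithmetic in the proposal above can be checked with a small sketch. The function names old_threshold and new_threshold are hypothetical, and the sketch assumes the per-core share is simply the total shared cache divided evenly among the cores; it only illustrates the two formulas, not glibc's actual cache detection.

```c
#include <assert.h>

/* Rule described in the thread: 6 x the per-core share of the
   shared cache (total divided evenly among the cores). */
long
old_threshold (long total_shared, long num_cores)
{
  assert (num_cores > 0);
  return total_shared / num_cores * 6;
}

/* Proposed rule: 3/4 of the total shared cache, whatever the core count.
   On 8 cores, 6 x (total / 8) = (6/8) x total = (3/4) x total. */
long
new_threshold (long total_shared)
{
  return total_shared / 4 * 3;
}
```

For example, with a hypothetical 32 MiB shared cache on 16 cores, the old rule gives 12 MiB while the new rule gives 24 MiB; with an 8 MiB cache on 4 cores, the old rule gives 12 MiB and the new rule 6 MiB. On 8 cores the two always agree.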