This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: memcpy performance regressions 2.19 -> 2.24(5)

From: Erich Elsen <eriche at google dot com>
To: "H.J. Lu" <hjl dot tools at gmail dot com>
Cc: "Carlos O'Donell" <carlos at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>
Date: Mon, 22 May 2017 18:23:01 -0700
Subject: Re: memcpy performance regressions 2.19 -> 2.24(5)
Authentication-results: sourceware.org; auth=none
References: <CAOVZoAPo-A5-bRZFHeu_wvTASzh_4nYwmqfCVfHQ7h34GyWKAA@mail.gmail.com> <9c563a4b-424b-242f-b82f-4650ab2637f7@redhat.com> <CAOVZoAO9Ryz0uqGx4aXi+vf27c_+j8e+_opxcfQK27qO0OnBpw@mail.gmail.com> <CAMe9rOo8SbuodNhZugDLhdRPU=8OZuKVp4eT+3-6Gw-K0OLs3Q@mail.gmail.com> <CAOVZoAMmOoPU2ECGGiM6=tXdNXmd6HhKEqu67dP3u_WZ5gfm3Q@mail.gmail.com> <CAMe9rOr-E6OsDYsUCc7ci54e9R7A+BjH82eiMvLydtiJD61uzw@mail.gmail.com> <28e34264-e8c5-5570-c48c-9125893808b2@redhat.com> <CAOVZoAPp3_T+ourRkNFXHfCSQUOMFn4iBBm9j50==h=VJcGSzw@mail.gmail.com> <CAMe9rOpi75y5ATt8bUYSB6LJexQNfOXbOb=gYdMG3d3g4P6U9Q@mail.gmail.com> <CAMe9rOrznBXKWHL5bOUSs75A96j_5jiHF+W+D-U3tusYpbwp0Q@mail.gmail.com> <CAOVZoANj7Oqu66oXcfn-Fmi5UHVaxBRd-bkxVbaSr3bWUPXaXg@mail.gmail.com> <CAMe9rOpO_q8hMt+U4xUB8HoR9i7E0LO92SCiU_=v_306XdtjJQ@mail.gmail.com>

I definitely think increasing the size on processors with a large
number of cores makes sense.  Hopefully with some testing we can
confirm it is a net win and/or find a more empirically grounded number.

Thanks for the patch with tunable support.  I've just put a similar
patch into review so that I can share it.  It also handles the case
where HAVE_TUNABLES isn't defined, like the similar code in arena.c,
and makes a minor change that splits init_cacheinfo in two: the logic
moves into init_cacheinfo_impl (a hidden callable), and init_cacheinfo
is now a constructor that just calls the impl and passes it the
cpu_features struct.  This makes the code a bit more modular, which
we'll need in order to test this internally.
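
A minimal sketch of that split, not the actual patch.  Everything
with a _sketch suffix is a hypothetical stand-in: the struct is not
glibc's real cpu_features, the environment variable name is invented,
and reading the environment is only an assumption about how a
non-HAVE_TUNABLES fallback could look (the non-tunables path in arena.c
reads its malloc settings from the environment).  The default uses the
"6 times per-core shared cache" rule quoted below.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the part of cpu_features used here.  */
struct cpu_features_sketch
{
  long shared_cache_per_core;	/* Bytes of shared cache per core.  */
};

/* Stand-in for __x86_shared_non_temporal_threshold.  */
long non_temporal_threshold_sketch;

/* All of the logic lives in the impl, so a test can call it directly
   with whatever CPU description it wants.  Hidden visibility keeps it
   out of the exported symbol table (GCC/Clang attribute).  */
__attribute__ ((visibility ("hidden"))) void
init_cacheinfo_impl_sketch (const struct cpu_features_sketch *cpu_features)
{
  assert (cpu_features != NULL);

  /* Default: 6 times the per-core shared cache size.  */
  long threshold = cpu_features->shared_cache_per_core * 6;

  /* Assumed fallback when tunables are unavailable: a hypothetical
     environment variable overrides the default if it parses cleanly
     as a positive number.  */
  const char *s = getenv ("SKETCH_NON_TEMPORAL_THRESHOLD");
  if (s != NULL)
    {
      char *end;
      long value = strtol (s, &end, 0);
      if (end != s && *end == '\0' && value > 0)
	threshold = value;
    }

  non_temporal_threshold_sketch = threshold;
}

/* Placeholder description of the host; the real code would use the
   detected CPU features instead.  */
static const struct cpu_features_sketch host_features_sketch =
  { 1024 * 1024 };

/* The constructor is now a thin wrapper that only forwards to the
   impl.  */
static void __attribute__ ((constructor))
init_cacheinfo_sketch (void)
{
  init_cacheinfo_impl_sketch (&host_features_sketch);
}
```

With this shape a test can build its own struct, call the impl, and
check the resulting threshold without depending on the machine it
runs on.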

On Mon, May 22, 2017 at 12:17 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, May 18, 2017 at 1:59 PM, Erich Elsen <eriche@google.com> wrote:
>> Hi H.J.,
>>
>> I was on vacation, sorry for the slow reply.  The updated benchmark
>> still shows the same behavior, thanks.
>>
>> I'll try my hand at creating a patch that makes that variable
>> __x86_shared_non_temporal_threshold a tunable.  It will be necessary
>> to do internal experiments anyway.
>>
>
> __x86_shared_non_temporal_threshold was set to 6 times the per-core
> shared cache size, based on the large memcpy micro benchmark in glibc
> on an 8-core processor.  For a processor with more than 8 cores, the
> threshold is too low.  Set __x86_shared_non_temporal_threshold to
> 3/4 of the total shared cache size so that it is unchanged on 8-core
> processors.  On processors with fewer than 8 cores, the threshold is
> lower.
>
> Any comments?
>
> --
> H.J.
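
The arithmetic behind the two rules quoted above can be checked with a
small sketch.  The function names are hypothetical, and it assumes the
total shared cache size is simply the per-core size times the number
of cores.

```c
#include <assert.h>

/* Existing rule as described above: 6 times the per-core shared
   cache size, independent of the number of cores.  */
long
threshold_six_per_core (long shared_cache_per_core)
{
  return shared_cache_per_core * 6;
}

/* Proposed rule: 3/4 of the total shared cache size, where the total
   is assumed to be per-core size times core count.  */
long
threshold_three_quarters_total (long shared_cache_per_core,
				unsigned int cores)
{
  assert (cores > 0);
  long total = shared_cache_per_core * (long) cores;
  return total / 4 * 3;
}
```

With 8 cores the total is 8 times the per-core size, and 3/4 of that
is 6 times the per-core size, so the two rules agree.  With 2 MiB per
core, the proposed rule gives 6 MiB on 4 cores and 24 MiB on 16 cores,
against 12 MiB from the existing rule in both cases.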

