This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
On Mon, 2017-03-27 at 11:52 +0100, Ramana Radhakrishnan wrote:

> > Also Adhemerval Zanella did some benchmarking that showed the
> > prefetching done in the thunderx version might be appropriate for the
> > generic version.  However if you look at the prefetching we only do it
> > every other time through the loop.  This is because the loop copies 64
> > bytes and the ThunderX cache line size is 128 bytes.  If other aarch64
> > chips have a 64 byte cache line they might want a different prefetching
> > setup.
>
> Can you link to the benchmark numbers, workloads and what systems?
>
> Ramana

The only reference I have to Adhemerval's results is at:

https://sourceware.org/ml/libc-alpha/2017-02/msg00118.html

Attached are my latest results on ThunderX with the IFUNC numbers from
the glibc memcpy performance benchmarks.  They include the new
bench-memcpy-random benchmark, which doesn't show much difference.  It is
really bench-memcpy-large that stands out.

Steve Ellcey
sellcey@cavium.com
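
For readers following the prefetching point quoted above, here is a rough C sketch of the idea. It is not the glibc ThunderX memcpy (which is written in assembly). The function name `copy_with_prefetch`, the `PREFETCH_AHEAD` distance, and the use of the GCC/Clang `__builtin_prefetch` builtin are all assumptions made for illustration. The loop copies 64 bytes per iteration and issues one prefetch per cache line, so with a 128-byte line it prefetches every other iteration, and with a 64-byte line it would prefetch every iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 64             /* bytes copied per loop iteration, as in the message */
#define PREFETCH_AHEAD 1024  /* hypothetical prefetch distance, chosen for illustration */

/* Hypothetical helper, not glibc code: copy n bytes, prefetching the
   source once per cache line rather than once per 64-byte chunk.  */
void *copy_with_prefetch(void *dst, const void *src, size_t n, size_t cache_line)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* The scheme only makes sense if a line holds a whole number of chunks. */
    assert(cache_line >= CHUNK && cache_line % CHUNK == 0);

    size_t iters_per_line = cache_line / CHUNK;  /* 2 for a 128-byte line */
    size_t iter = 0;

    while (n >= CHUNK) {
        /* Prefetch only on the first chunk of each cache line, and only
           while the prefetch address still lies inside the source buffer. */
        if (iter % iters_per_line == 0 && n > PREFETCH_AHEAD)
            __builtin_prefetch(s + PREFETCH_AHEAD, 0, 0);

        memcpy(d, s, CHUNK);
        d += CHUNK;
        s += CHUNK;
        n -= CHUNK;
        iter++;
    }

    /* Copy the remaining tail of fewer than CHUNK bytes. */
    memcpy(d, s, n);
    return dst;
}
```

Calling `copy_with_prefetch(dst, src, len, 128)` models the ThunderX case, and passing 64 as the last argument models a chip with a 64-byte cache line. Whether either setting helps on a given chip would have to be measured with the benchmarks mentioned in this thread.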
Attachment:
bench-memcpy.out
Description: Text document
Attachment:
bench-memcpy-large.out
Description: Text document
Attachment:
bench-memcpy-random.out
Description: Text document