This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Ping: [Patch] aarch64: Thunderx specific memcpy and memmove


Andrew Pinski <pinskia@gmail.com> wrote:
> 
> One memcpy does not fit all micro-arch.  Just look at x86, where they
> have many different versions and even do selection based on cache size
> (see the current discussion about the memcpy regression).

Given how many microarchitectures already exist, ending up with one memcpy
per microarchitecture would be a really bad situation...

Microarchitectures tend to converge rather than diverge as performance
levels increase. So I believe it's generally best for memcpy to use the same
instructions as compiled code, since that is what CPUs actually encounter
and optimize for. For the rare, very large copies we could do something different
if it helps (e.g. prefetch, non-temporal stores, SIMD registers).
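
As a rough illustration of that split, here is a minimal sketch in C. It is
not glibc code: copy_dispatch, copy_large, the 1 MiB threshold and the 4 KiB
chunk size are all made-up names and values, and the prefetch hint relies on
the GCC/Clang builtin __builtin_prefetch (it compiles to nothing elsewhere).
Whether such a path actually helps would have to be measured per target.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical values; real ones would come from benchmarking. */
#define LARGE_COPY_THRESHOLD ((size_t)1 << 20)
#define CHUNK ((size_t)4096)

#if defined(__GNUC__)
#define PREFETCH(p) __builtin_prefetch((p), 0, 0) /* read, low temporal locality */
#else
#define PREFETCH(p) ((void)0)
#endif

/* Large-copy path: copy chunk by chunk, hinting at the next source chunk. */
static void *copy_large(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n > CHUNK) {
        PREFETCH(s + CHUNK);      /* still inside the source, since n > CHUNK */
        memcpy(d, s, CHUNK);
        d += CHUNK;
        s += CHUNK;
        n -= CHUNK;
    }
    memcpy(d, s, n);              /* remaining tail of at most CHUNK bytes */
    return dst;
}

/* Common sizes take the generic path; only very large copies differ. */
void *copy_dispatch(void *dst, const void *src, size_t n)
{
    if (n >= LARGE_COPY_THRESHOLD)
        return copy_large(dst, src, n);
    return memcpy(dst, src, n);
}
```

The point of the shape is that the size check is the only extra cost paid by
ordinary copies; everything specialised sits behind it.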

> >> - non-thunderx systems are affected: static linked code using
> >> memcpy will start to go through an indirection (iplt) instead
> >> of direct call. if there are complaints about it or other ifunc
> >> related issues come up, then again we will have to reconsider it.
>
> Just to answer this.  This is true on x86 and PowerPC already so there
> should be no difference on aarch64 than those two targets.

Unfortunately an ifunc has measurable overhead, and static linking would no
longer be a trivial way to avoid it. Most calls to memcpy are for very
small copies. Maybe we should investigate statically linking the small-copy part
of memcpy with, say, -O3?
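
One possible reading of that idea, sketched in C: handle small sizes in an
inline wrapper so they never reach the indirect call, and fall through to the
out-of-line memcpy for everything else. The name copy_small_inline and the
16-byte cutoff are assumptions for illustration only, not anything from the
patch or from glibc.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Small copies are done inline with overlapping fixed-size moves;
   anything above the (hypothetical) 16-byte cutoff calls memcpy. */
static inline void *copy_small_inline(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n > 16)
        return memcpy(dst, src, n);   /* general out-of-line path */

    if (n >= 8) {
        /* 8..16 bytes: first 8 and last 8 bytes, possibly overlapping. */
        uint64_t a, b;
        memcpy(&a, s, 8);
        memcpy(&b, s + n - 8, 8);
        memcpy(d, &a, 8);
        memcpy(d + n - 8, &b, 8);
    } else if (n >= 4) {
        /* 4..7 bytes: first 4 and last 4 bytes, possibly overlapping. */
        uint32_t a, b;
        memcpy(&a, s, 4);
        memcpy(&b, s + n - 4, 4);
        memcpy(d, &a, 4);
        memcpy(d + n - 4, &b, 4);
    } else if (n > 0) {
        /* 1..3 bytes: first, middle and last byte cover every case. */
        unsigned char a = s[0], b = s[n >> 1], c = s[n - 1];
        d[0] = a;
        d[n >> 1] = b;
        d[n - 1] = c;
    }
    return dst;
}
```

The fixed-size memcpy calls are there to keep the unaligned accesses well
defined; an optimizing compiler would normally turn each into a single load
or store, so the small cases involve no call at all.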

Cheers,
Wilco
