This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Variations of memset()
On 08/04/2017 03:11 PM, Carlos O'Donell wrote:
> On 08/04/2017 03:02 PM, Matthew Wilcox wrote:
>> Here's the sample usage from the symbios driver:
>>
>> - for (i = 0 ; i < 64 ; i++)
>> - tp->luntbl[i] = cpu_to_scr(vtobus(&np->badlun_sa));
>> + memset32(tp->luntbl, cpu_to_scr(vtobus(&np->badlun_sa)), 64);
>>
>> I expect a lot of users would be of this type; simply replacing the
>> explicit for-loop equivalent with a library call.
>
> Have you measured the performance of this kind of conversion when using a
> simple application and a library implementing your various memset routines?
> In the kernel is one thing, outside of the kernel we have dynamic linking
> and no inlining across that shared object boundary.
I want to reiterate that measuring the performance of the various options in
userspace is relevant (particularly where the results differ from the kernel):
* Application doing the naive loop above (-O0).
* Application doing the naive loop above ([-O2,-O3] + <vectorize options>).
* Application calling memset32 (-O0).
* Application calling memset32 (-O3).
<vectorize options>="-ftree-vectorize [-msse2,-mavx] -fopt-info-missed=missed.all"
You need to split memset32 out into a separate DSO to simulate this accurately,
so that the call crosses a shared object boundary and cannot be inlined.
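
A minimal sketch of the two variants being compared might look like the
following. The memset32() signature is an assumption taken from the call in
the quoted patch (glibc does not provide such a function), and fill_loop()
and fills_match() are hypothetical helper names used only for illustration.
The build commands in the comments are one possible way to keep memset32()
in its own DSO; the file and library names are made up.

```c
/* Sketch only: memset32() is not a glibc function. The signature is
   assumed from the call memset32(tp->luntbl, value, 64) in the patch. */
#include <stddef.h>
#include <stdint.h>

/* Candidate library routine. For a realistic measurement, move this
   definition into its own file and build it as a shared object, e.g.
     gcc -O3 -fPIC -shared memset32.c -o libmemset32.so
   so that the application cannot inline it. */
void *memset32(uint32_t *s, uint32_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        s[i] = v;
    return s;
}

/* The naive loop as the application would write it. Build the caller at
   -O0 and again with, e.g.,
     gcc -O3 -ftree-vectorize -mavx -fopt-info-missed=missed.all app.c
   to compare against the library call. */
void fill_loop(uint32_t *s, uint32_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        s[i] = v;
}

/* Returns 1 if both variants fill a 64-entry table identically. */
int fills_match(uint32_t v)
{
    uint32_t a[64], b[64];

    fill_loop(a, v, 64);
    memset32(b, v, 64);
    for (size_t i = 0; i < 64; i++)
        if (a[i] != b[i] || a[i] != v)
            return 0;
    return 1;
}
```

This only checks that the two variants are equivalent; the timing harness
around them, and the choice of buffer sizes, is left open here.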
--
Cheers,
Carlos.