This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Variations of memset()


On 08/04/2017 03:11 PM, Carlos O'Donell wrote:
> On 08/04/2017 03:02 PM, Matthew Wilcox wrote:
>> Here's the sample usage from the symbios driver:
>>
>> -               for (i = 0 ; i < 64 ; i++)
>> -                       tp->luntbl[i] = cpu_to_scr(vtobus(&np->badlun_sa));
>> +               memset32(tp->luntbl, cpu_to_scr(vtobus(&np->badlun_sa)), 64);
>>
>> I expect a lot of users would be of this type; simply replacing the
>> explicit for-loop equivalent with a library call.
>  
> Have you measured the performance of this kind of conversion when using a
> simple application and a library implementing your various memset routines?
> In the kernel is one thing, outside of the kernel we have dynamic linking
> and no inlining across that shared object boundary.

I want to reiterate that measuring the performance of the various options in
userspace is going to be relevant (particularly when they vary from the kernel):

* Application doing the naive loop above (-O0).

* Application doing the naive loop above ([-O2,-O3] + <vectorize options>).

* Application calling memset32 (-O0).

* Application calling memset32 (-O3).

<vectorize options>="-ftree-vectorize [-msse2,-mavx] -fopt-info-missed=missed.all"

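You need to split memset32 out into a separate DSO to simulate this accurately;
otherwise the compiler can inline the call and you are no longer measuring the
shared object boundary.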
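As a rough illustration of the two variants being compared, here is a minimal
sketch. It assumes a hypothetical userspace memset32 whose signature mirrors the
kernel helper quoted above; glibc does not provide this function. The helper
names fill_loop, all_equal and now_ns, and the file names in the build comment,
are made up for the example. Everything is shown in one file for brevity, so
the noinline attribute only approximates the call boundary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/*
 * For a faithful measurement, memset32 would live in its own shared object,
 * for example (file names are hypothetical):
 *   gcc -O2 -fPIC -shared memset32.c -o libmemset32.so
 *   gcc -O3 -ftree-vectorize -mavx -fopt-info-missed=missed.all \
 *       bench.c -L. -lmemset32 -o bench
 */

/* Hypothetical userspace memset32: store v into n 32-bit words at dst.
   Assumption: not a glibc API; noinline stands in for the DSO boundary. */
__attribute__((noinline))
void *memset32(uint32_t *dst, uint32_t v, size_t n)
{
    assert(dst != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        dst[i] = v;
    return dst;
}

/* The naive open-coded loop, left visible to the optimizer at the call site
   so it can be inlined and vectorized. */
static inline void fill_loop(uint32_t *dst, uint32_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = v;
}

/* Return 1 if all n words at p equal v, 0 otherwise. */
static inline int all_equal(const uint32_t *p, uint32_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] != v)
            return 0;
    return 1;
}

/* Monotonic timestamp in nanoseconds, for timing many repetitions. */
static inline int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

To time a variant, take now_ns() before and after a large number of
repetitions of fill_loop(buf, v, 64) or memset32(buf, v, 64), and check the
buffer afterwards with all_equal so the stores cannot be optimized away.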

-- 
Cheers,
Carlos.

