This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.


On Fri, Jul 12, 2013 at 10:12:34AM +0400, Liubov Dmitrieva wrote:
> Do you mean AMD? For Intel there is no machine without SSE4_1 where the
> sse2 unaligned version is faster than ssse3.
>
Good to know. 

I looked at the sources and found that memcmp is, as usual, horribly misoptimized.

Since in 70% of cases the difference is found in the first 16 characters,
and in 99% of cases within the first 64 characters, the loop case is cold.

This is not much of a problem when n > 48, as the starting unaligned
comparison handles differences in the first 16 characters efficiently.
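As a rough illustration of such a first-16-byte check, here is a C sketch with SSE2 intrinsics. It is not the code in glibc's memcmp-*.S files, and the helper name `cmp_first16` is hypothetical. It assumes an x86-64 GCC or Clang build (for `__builtin_ctz`) and n >= 16, so a single unaligned 16-byte load per buffer stays inside the compared range.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_cmpeq_epi8, _mm_movemask_epi8 */
#include <stddef.h>

/* Hypothetical helper (not glibc's code): look for a difference in the
   first 16 bytes of s1 and s2 using one unaligned 16-byte load per buffer.
   Precondition: n >= 16, so both loads stay inside the caller's range.
   Returns <0, 0 or >0 like memcmp, but only considers bytes 0..15. */
static int cmp_first16(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1, *b = s2;
    assert(n >= 16);
    (void)n;  /* only used by the assert */

    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* movdqu */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i eq = _mm_cmpeq_epi8(va, vb);               /* pcmpeqb: 0xFF where bytes are equal */

    /* pmovmskb gives one bit per byte; invert so set bits mark differences. */
    unsigned diff = ~(unsigned)_mm_movemask_epi8(eq) & 0xFFFFu;
    if (diff == 0)
        return 0;                                      /* no difference in bytes 0..15 */

    unsigned i = (unsigned)__builtin_ctz(diff);        /* lowest set bit = first difference */
    return (int)a[i] - (int)b[i];
}
```

For example, `cmp_first16("abcdefghijklmnoX", "abcdefghijklmnop", 16)` is negative because 'X' sorts below 'p'; as with memcmp, only the sign is meaningful.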

Otherwise, however, there are a lot of jumps to select a code path based on
size, which is inefficient.

The code also answered what I thought was a roadblock, and the reason I had
not tried to optimize memcmp: n is authoritative, so we are allowed to
segfault when there is unallocated memory after the first difference within
the range specified by n.
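A minimal sketch of what that property permits, again in C with SSE2 intrinsics and hypothetical names (`memcmp_sketch`, `diff_mask16`), assuming GCC or Clang on x86-64. Each 16-byte load may read bytes past the first difference, which is acceptable only because all n bytes are assumed readable. A final overlapping block at offset n - 16 stands in for a per-size chain of branches on the tail; this is one possible design, not the patch announced here.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Returns a 16-bit mask with bit i set where a[i] != b[i]. */
static unsigned diff_mask16(const unsigned char *a, const unsigned char *b)
{
    __m128i eq = _mm_cmpeq_epi8(_mm_loadu_si128((const __m128i *)a),
                                _mm_loadu_si128((const __m128i *)b));
    return ~(unsigned)_mm_movemask_epi8(eq) & 0xFFFFu;
}

/* Hypothetical sketch, not glibc's implementation. */
static int memcmp_sketch(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1, *b = s2;

    if (n < 16) {                         /* short inputs: plain byte loop */
        for (size_t i = 0; i < n; i++)
            if (a[i] != b[i])
                return (int)a[i] - (int)b[i];
        return 0;
    }

    size_t off = 0;
    for (;;) {
        /* Invariant: every 16-byte load stays inside [0, n). */
        assert(off + 16 <= n);
        unsigned m = diff_mask16(a + off, b + off);
        if (m != 0) {
            size_t i = off + (size_t)__builtin_ctz(m);
            return (int)a[i] - (int)b[i];
        }
        if (off + 16 >= n)
            return 0;                     /* the whole range [0, n) is equal */
        off += 16;
        if (off + 16 > n)
            off = n - 16;                 /* overlapping last block; bytes before off are known equal */
    }
}
```

With n = 20, the loop checks bytes 0..15 and then 4..19; the overlap is harmless because bytes 4..15 already compared equal, so the first set bit in the second mask is still the first real difference.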

I will prepare patch with faster memcmp.
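The suggestion quoted further down, replacing ptest with a pmovmskb/test pair, can be sketched with intrinsics. Assumptions: GCC or Clang on x86-64 (for `__attribute__((target("sse4.1")))` and `__builtin_cpu_supports`), and hypothetical function names. Both helpers answer whether two 16-byte blocks are equal. glibc itself selects memcmp variants through its IFUNC machinery, not through a runtime branch like `equal16` below.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 */
#include <smmintrin.h>   /* SSE4.1: _mm_testz_si128 (ptest) */

/* SSE4.1 form: xor the blocks, then ptest reports whether the result is all zero. */
__attribute__((target("sse4.1")))
static int equal16_ptest(const void *a, const void *b)
{
    __m128i x = _mm_xor_si128(_mm_loadu_si128((const __m128i *)a),
                              _mm_loadu_si128((const __m128i *)b));
    return _mm_testz_si128(x, x);       /* 1 if x == 0, i.e. blocks are equal */
}

/* SSE2 form: pcmpeqb + pmovmskb, then compare the mask with 0xFFFF. */
static int equal16_movmsk(const void *a, const void *b)
{
    __m128i eq = _mm_cmpeq_epi8(_mm_loadu_si128((const __m128i *)a),
                                _mm_loadu_si128((const __m128i *)b));
    return _mm_movemask_epi8(eq) == 0xFFFF;
}

/* Use the SSE4.1 form only when the running CPU supports it. */
static int equal16(const void *a, const void *b)
{
    if (__builtin_cpu_supports("sse4.1"))
        return equal16_ptest(a, b);
    return equal16_movmsk(a, b);
}
```

Both paths return the same answer for any input, which is why dropping the SSE4.1 requirement mainly becomes a question of measured performance on older cores.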
 
> --
> Liubov
> 
> On Fri, Jul 12, 2013 at 7:01 AM, Ondřej Bílka <neleai@seznam.cz> wrote:
> > On Thu, Jul 11, 2013 at 06:07:49PM +0400, Liubov Dmitrieva wrote:
> >> My Silvermont patch in the latest edition doesn't touch memcmp and
> >> wmemcmp at all because I didn't see a good boost from switching SSE42
> >> off for these 2 functions.
> >> Now I see why. There are no SSE42 instructions there. :)
> >> The patch looks good. I will just check performance regressions for Penryn.
> >>
> > Now the question is whether this is also good for other archs. SSE 4.1 is not
> > really needed; we can just replace ptest with a pmovmskb/test pair and
> > performance will be nearly identical, so it is worth checking whether old
> > cores benefit. (I see possible optimizations, which I will send later.)
> >
> >> --
> >> Liubov
> >>
> >> On Wed, Jul 10, 2013 at 10:23 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> >> > On Wed, Jul 10, 2013 at 11:19 AM, Matt Turner <mattst88@gmail.com> wrote:
> >> >> On Wed, Jul 10, 2013 at 11:16 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> >> >>> On Wed, Jul 10, 2013 at 10:41 AM, Matt Turner <mattst88@gmail.com> wrote:
> >> >>>> On Wed, Jul 10, 2013 at 8:30 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> >> >>>>> On Tue, Jul 9, 2013 at 9:37 PM, Andreas Jaeger <aj@suse.com> wrote:
> >> >>>>>> On 07/10/2013 03:17 AM, Matt Turner wrote:
> >> >>>>>>> It uses SSE 4.1 instructions (ptest) but no SSE 4.2 instructions.
> >> >>>>>>
> >> >>>>>> There are two parts to this: It should only run on cpus with those
> >> >>>>>> instructions but we also need to ensure that it gives a better
> >> >>>>>> performance on such cpus. HJ, Matt, please do run performance tests on a
> >> >>>>>> variety of affected cpus to show that this change really helps in all cases,
> >> >>>>>>
> >> >>>>>> Andreas
> >> >>>>>
> >> >>>>> Only Penryn has SSE4.1 without SSE4.2.  Liubov, can
> >> >>>>> you compare performance of memcmp-sse4.S vs
> >> >>>>> memcmp-ssse3.S on Penryn?
> >> >>>>
> >> >>>> Is it also the case that this path would now be used on Silvermont?
> >> >>>
> >> >>> It is used on Silvermont since it supports SSE4.2
> >> >>>
> >> >>> --
> >> >>> H.J.
> >> >>
> >> >> To confirm, setting bit_Slow_SSE4_2 on Silvermont (which we do)
> >> >> wouldn't prevent this path from executing?
> >> >
> >> > I don't think so.  Liubov, can you verify it?
> >> >
> >> > --
> >> > H.J.
> >
> > --
> >
> > The cord jumped over and hit the power switch.

-- 

endothermal recalibration

