This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Don't use SSE4_2 instructions on Intel Silvermont Micro Architecture.


On Mon, Jun 17, 2013 at 11:07:33AM -0700, H.J. Lu wrote:
> On Mon, Jun 17, 2013 at 10:56 AM, Dmitrieva Liubov
> <liubov.dmitrieva@gmail.com> wrote:
> > I checked those functions.
> > In the case of strspn/strcspn/strpbrk, switching SSE4_2 off is bad because
> > there are no optimized SSE2 versions to call instead.
> > The default versions there are not SSE.
> >
> > So it seems we need to create a new flag for Silvermont, like
> > "slowPcmpistri", and fix the switches in functions where optimized SSE2
> > versions exist.
> >
> > Or implement optimized SSE2 strspn/strcspn/strpbrk and switch SSE4_2 off completely.
> >
I asked because these are about the only cases where I cannot get comparable
results with SSE2. The closest I could get is to split the input into up to
four character intervals and check these in parallel.
This has somewhat expensive preprocessing, so I am still looking for a
better way to do it.
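
A minimal sketch of the interval idea, not the implementation discussed in
this thread. It assumes the byte set has already been reduced to at most four
inclusive intervals (the expensive preprocessing is not shown), and the names
byte_intervals, match_mask16 and find_first_in_set are hypothetical. SSE2 has
no unsigned byte compare, so the range test uses pminub: x lies in [lo, hi]
exactly when min(x - lo, hi - lo) == x - lo.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical representation: the byte set is the union of up to four
   inclusive intervals [lo[i], hi[i]], each with lo[i] <= hi[i].  */
struct byte_intervals
{
  unsigned char lo[4];
  unsigned char hi[4];
  int count;
};

/* Return a 16-bit mask; bit k is set when block[k] lies in some interval.  */
static unsigned int
match_mask16 (const unsigned char *block, const struct byte_intervals *set)
{
#if defined(__SSE2__)
  __m128i x = _mm_loadu_si128 ((const __m128i *) block);
  __m128i hit = _mm_setzero_si128 ();
  for (int i = 0; i < set->count; i++)
    {
      /* x is in [lo, hi] iff (x - lo) <= (hi - lo) as unsigned bytes.  */
      __m128i d = _mm_sub_epi8 (x, _mm_set1_epi8 ((char) set->lo[i]));
      __m128i w = _mm_set1_epi8 ((char) (set->hi[i] - set->lo[i]));
      hit = _mm_or_si128 (hit, _mm_cmpeq_epi8 (_mm_min_epu8 (d, w), d));
    }
  return (unsigned int) _mm_movemask_epi8 (hit);
#else
  /* Scalar fallback so the sketch also builds without SSE2.  */
  unsigned int mask = 0;
  for (int k = 0; k < 16; k++)
    for (int i = 0; i < set->count; i++)
      if (block[k] >= set->lo[i] && block[k] <= set->hi[i])
        mask |= 1u << k;
  return mask;
#endif
}

/* Return the index of the first byte of buf[0..len) that is in the set,
   or len if there is none.  The tail is copied into a padded block so
   that no byte past buf + len is ever read.  */
static size_t
find_first_in_set (const unsigned char *buf, size_t len,
                   const struct byte_intervals *set)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char padded[16] = { 0 };
      size_t n = len - i < 16 ? len - i : 16;
      const unsigned char *p = buf + i;
      if (n < 16)
        {
          memcpy (padded, p, n);
          p = padded;
        }
      unsigned int mask = match_mask16 (p, set);
      if (n < 16)
        mask &= (1u << n) - 1;  /* Ignore the padding bytes.  */
      if (mask != 0)
        {
          size_t k = 0;
          while ((mask & 1) == 0)
            {
              mask >>= 1;
              k++;
            }
          return i + k;
        }
      i += n;
    }
  return len;
}
```

With the single interval ['0', '9'], find_first_in_set behaves like strcspn
with a digit reject set on a buffer of known length. A real string function
would also have to detect the terminating NUL and deal with page boundaries,
which this sketch leaves out.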
> 
> We can add bit_Prefer_SSE2_for_stringop.  When it is set, we
> will use SSE2 version if it is available.  Otherwise, we use
> SSE4_2 version if it is available.
> 
> 
As a short-term solution I would prefer bit_Slow_SSE4_2.

As a long-term solution, I have optimized implementations of the other
functions that do not use SSE4_2 and are faster.
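
The selection rule quoted above (use the SSE2 version if one exists, otherwise
fall back to the SSE4_2 version) can be sketched as a plain C function. This
is only an illustration under stated assumptions: the feature bits, the enum
and select_impl are hypothetical names, not glibc's actual ifunc selectors or
cpu feature macros.

```c
#include <stdbool.h>

/* Hypothetical feature bits, modeled on the flags named in this thread.  */
enum
{
  FEATURE_SSE4_2 = 1 << 0,      /* The CPU supports SSE4.2.  */
  FEATURE_SLOW_SSE4_2 = 1 << 1  /* SSE4.2 string instructions are slow.  */
};

enum impl
{
  IMPL_GENERIC,
  IMPL_SSE2,
  IMPL_SSE4_2
};

/* Pick an implementation for one string function.  have_sse2 and
   have_sse4_2 say which optimized variants exist for that function.  */
static enum impl
select_impl (unsigned int cpu, bool have_sse2, bool have_sse4_2)
{
  bool can_sse4_2 = have_sse4_2 && (cpu & FEATURE_SSE4_2) != 0;
  bool slow = (cpu & FEATURE_SLOW_SSE4_2) != 0;

  if (can_sse4_2 && !slow)
    return IMPL_SSE4_2;
  if (have_sse2)
    return IMPL_SSE2;
  /* No SSE2 variant exists (the strspn/strcspn/strpbrk case): keep the
     SSE4.2 version even where it is slow, rather than the generic code.  */
  if (can_sse4_2)
    return IMPL_SSE4_2;
  return IMPL_GENERIC;
}
```

On a CPU flagged as slow, a function with both variants gets IMPL_SSE2, while
a function with only an SSE4_2 variant still gets IMPL_SSE4_2.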



When I ran `git grep "cmp[ie]str[ie]"` I got:

sysdeps/i386/i686/multiarch/strcmp-sse4.S
sysdeps/x86_64/multiarch/strcmp-sse42.S

I have several ideas but have not gotten to them yet. This has low priority,
as the hot case is when the strings differ within the first 16 characters
(for example, when you are sorting).


sysdeps/x86_64/multiarch/rawmemchr.S

Not our case, as it requires bit_SSE4_2 to be set and
bit_Prefer_PMINUB_for_stringop to be clear.

This is false on all Intel processors. Most AMD processors are
misclassified because we do not set anything at all for them. They have
slower SSE4_2, which causes a performance regression.


sysdeps/x86_64/multiarch/strchr.S
sysdeps/x86_64/multiarch/strrchr.S

I have an implementation with faster asymptotic time, but I have not yet
tuned it for small cases.


sysdeps/x86_64/multiarch/strend-sse4.S

It is a bit weird that we have this. You could definitely improve
performance by calling strlen and adjusting the return value.
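
A minimal sketch of that suggestion, assuming the helper's job is to return a
pointer to the terminating NUL of its argument. The name strend_via_strlen is
hypothetical.

```c
#include <string.h>

/* Hypothetical replacement: return a pointer to the terminating NUL of s
   by reusing whichever strlen the library has already selected.  */
static const char *
strend_via_strlen (const char *s)
{
  return s + strlen (s);
}
```

This way the helper inherits any optimized strlen without needing its own
SSE4_2 code path.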

sysdeps/x86_64/multiarch/strstr.c

I have a better implementation, but I decided to wait for 2.19.

