This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Don't use SSE4_2 instructions on Intel Silvermont Micro Architecture.
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: "H.J. Lu" <hjl dot tools at gmail dot com>
- Cc: Dmitrieva Liubov <liubov dot dmitrieva at gmail dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Tue, 18 Jun 2013 08:49:10 +0200
- Subject: Re: [PATCH] Don't use SSE4_2 instructions on Intel Silvermont Micro Architecture.
- References: <CAHjhQ93=uegeZg9iTqoJ+PFuUrvn8e2mA8tZ96Jy4CaV6aPbWg at mail dot gmail dot com> <20130617163729 dot GA15981 at domone dot kolej dot mff dot cuni dot cz> <CAHjhQ93zmP525hqW-2RnHBREc_949XLnm7sE-CSv3Nj8PQgUig at mail dot gmail dot com> <CAMe9rOqT31AFq1S3V0Krh2CZnHu=FiyXqhg840fimRtfU4_hXQ at mail dot gmail dot com>
On Mon, Jun 17, 2013 at 11:07:33AM -0700, H.J. Lu wrote:
> On Mon, Jun 17, 2013 at 10:56 AM, Dmitrieva Liubov
> <liubov.dmitrieva@gmail.com> wrote:
> > I checked that functions.
> > In case of strspn/strcspn/strpbrk to switch SSE4_2 off is bad because
> > there are no optimized sse2 versions to call instead.
> > Default versions are not sse there.
> >
> > So, it seems we need to create a new flag for Silvermont like
> > "slowPcmpistri" and fix switches in functions where optimized sse2
> > exist.
> >
> > Or implement optimized sse2 strspn/strcspn/strpbrk and switch SSE4_2 completely.
> >
I asked because these are about the only cases where I cannot get comparable
results with SSE2. The closest I could get is to split the input into up to
four character intervals and check these in parallel.
This has somewhat expensive preprocessing, so I am still looking at how to
do it better.
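
A minimal scalar sketch of the interval idea, for illustration only. It assumes the "input" being split is the reject set of strcspn, and it models the parallel check with a plain loop over bytes; an SSE2 version would presumably perform the same range comparisons on 16 bytes at a time. The names build_intervals and strcspn_intervals are hypothetical and are not glibc code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: up to four closed byte ranges [lo, hi].  */
struct intervals
{
  unsigned char lo[4], hi[4];
  int n;   /* Ranges in use, or -1 if the set needs more than four.  */
};

/* Preprocessing: collapse the set into maximal runs of adjacent bytes.  */
static struct intervals
build_intervals (const char *set)
{
  unsigned char present[256] = { 0 };
  struct intervals iv = { { 0 }, { 0 }, 0 };

  for (const unsigned char *p = (const unsigned char *) set; *p; p++)
    present[*p] = 1;

  for (int c = 1; c < 256; c++)
    {
      if (!present[c])
        continue;
      int start = c;
      while (c + 1 < 256 && present[c + 1])
        c++;
      if (iv.n == 4)
        {
          iv.n = -1;   /* A fifth run: the set does not fit.  */
          return iv;
        }
      iv.lo[iv.n] = (unsigned char) start;
      iv.hi[iv.n] = (unsigned char) c;
      iv.n++;
    }
  return iv;
}

/* Same result as strcspn, using range checks when the set fits.  */
static size_t
strcspn_intervals (const char *s, const char *reject)
{
  struct intervals iv = build_intervals (reject);
  if (iv.n < 0)
    return strcspn (s, reject);   /* Fall back to the library version.  */

  size_t i = 0;
  for (; s[i] != '\0'; i++)
    {
      unsigned char c = (unsigned char) s[i];
      int hit = 0;
      /* Branch-free test against every range, as a vector version would do.  */
      for (int k = 0; k < iv.n; k++)
        hit |= (c >= iv.lo[k]) & (c <= iv.hi[k]);
      if (hit)
        break;
    }
  return i;
}
```

A reject set such as "0123456789abcdef" collapses to two ranges, while "aeiou" needs five and takes the fallback path. The 256-entry table built for every call is where the preprocessing cost shows up in this sketch.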
>
> We can add bit_Prefer_SSE2_for_stringop. When it is set, we
> will use SSE2 version if it is available. Otherwise, we use
> SSE4_2 version if it is available.
>
>
As a short-term solution I would prefer bit_Slow_SSE4_2.
As a long-term solution, I have optimized implementations of other
functions that do not use SSE4_2 and are faster.
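
For reference, the selection rule quoted above ("use SSE2 version if it is available, otherwise use SSE4_2 version if it is available") could be sketched roughly as follows. This is only a model: the enum and function names are hypothetical, the real glibc selectors are written differently, and CPU_PREFER_SSE2 merely stands in for the proposed bit_Prefer_SSE2_for_stringop.

```c
/* Hypothetical names, for illustration only.  */
enum cpu_bits
{
  CPU_HAS_SSE4_2 = 1 << 0,    /* CPU supports SSE4.2.  */
  CPU_PREFER_SSE2 = 1 << 1    /* Models bit_Prefer_SSE2_for_stringop.  */
};

/* Which optimized versions exist for a given string function.  */
enum avail_bits
{
  HAVE_SSE2_VERSION = 1 << 0,
  HAVE_SSE4_2_VERSION = 1 << 1
};

enum impl { IMPL_GENERIC, IMPL_SSE2, IMPL_SSE4_2 };

static enum impl
select_impl (unsigned cpu, unsigned avail)
{
  /* Preferred path on CPUs where SSE4.2 string instructions are slow.  */
  if ((cpu & CPU_PREFER_SSE2) && (avail & HAVE_SSE2_VERSION))
    return IMPL_SSE2;
  /* Otherwise take the SSE4.2 version when the CPU can run it.  */
  if ((cpu & CPU_HAS_SSE4_2) && (avail & HAVE_SSE4_2_VERSION))
    return IMPL_SSE4_2;
  if (avail & HAVE_SSE2_VERSION)
    return IMPL_SSE2;
  return IMPL_GENERIC;
}
```

Under this rule a function with no optimized SSE2 version (the strspn/strcspn/strpbrk case) still gets its SSE4_2 version even when the preference bit is set.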
When I ran `git grep "cmp[ie]str[ie]"` I got:
sysdeps/i386/i686/multiarch/strcmp-sse4.S
sysdeps/x86_64/multiarch/strcmp-sse42.S
I have several ideas but have not gotten to them yet. This has low priority,
as the hot case is when the strings differ in the first 16 characters (for
example, when you are sorting).
sysdeps/x86_64/multiarch/rawmemchr.S
Not our case, as it needs bit_SSE4_2 and not bit_Prefer_PMINUB_for_stringop.
This is false on all Intel processors. Most AMD processors are
misclassified because we do not set anything at all. They have slower
SSE4_2, which causes a performance regression.
sysdeps/x86_64/multiarch/strchr.S
sysdeps/x86_64/multiarch/strrchr.S
I have an implementation with faster asymptotic time, but I have not
tuned it for small cases.
sysdeps/x86_64/multiarch/strend-sse4.S
It is a bit weird that we have this. You could definitely improve
performance by calling strlen and adjusting the return value.
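
A minimal sketch of that suggestion, assuming the strend routine returns a pointer to the terminating NUL of its argument (an assumption; the function name below is hypothetical):

```c
#include <string.h>

/* Hypothetical replacement: find the end of the string through strlen.  */
static const char *
strend_via_strlen (const char *s)
{
  return s + strlen (s);
}
```

This way the routine inherits whatever optimized strlen is selected for the CPU, without a separate SSE4_2 code path.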
sysdeps/x86_64/multiarch/strstr.c
I have a better implementation, but I decided to wait for 2.19.