This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Don't use SSE4_2 instructions on Intel Silvermont Micro Architecture.


My patch is ready. Ok to commit?

ChangeLog:

2013-06-19  Liubov Dmitrieva  <liubov.dmitrieva@intel.com>

* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
Set bit_Slow_SSE4_2 for Intel Silvermont architecture.
Set bit_Prefer_PMINUB_for_stringop for Intel Silvermont.
* sysdeps/x86_64/multiarch/init-arch.h: Define
bit_Slow_SSE4_2 and index_Slow_SSE4_2.
Define index_Prefer_PMINUB_for_stringop which was undefined.
* sysdeps/x86_64/multiarch/strchr.S: Use SSE2 version if
bit_Slow_SSE4_2 is on.
* sysdeps/x86_64/multiarch/strrchr.S: Use SSE2 version if
bit_Slow_SSE4_2 is on.
* sysdeps/x86_64/multiarch/strcmp.S: Use SSSE3 or SSE2 version if
bit_Slow_SSE4_2 is on.
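
For readers who do not want to open the patch, the selection rule described in the ChangeLog can be sketched in C roughly as follows. This is only an illustration, not the patch itself: the real selectors are written in assembly in strchr.S, strrchr.S and strcmp.S, and the SKETCH_* bits, the *_stub functions and select_strchr are hypothetical names made up for this example.

```c
#include <string.h>

/* Hypothetical feature bits for illustration only; the real bit_* and
   index_* macros are defined in init-arch.h.  */
#define SKETCH_SSE4_2       (1u << 0)
#define SKETCH_SLOW_SSE4_2  (1u << 1)

typedef char *(*strchr_impl) (const char *, int);

/* Stand-ins for the SSE2 and SSE4_2 variants.  Both simply call strchr
   so that the sketch stays portable.  */
static char *
strchr_sse2_stub (const char *s, int c)
{
  return strchr (s, c);
}

static char *
strchr_sse42_stub (const char *s, int c)
{
  return strchr (s, c);
}

/* Pick the SSE4_2 variant only when the CPU has SSE4_2 and it is not
   flagged as slow; otherwise fall back to the SSE2 variant.  */
static strchr_impl
select_strchr (unsigned int features)
{
  if ((features & SKETCH_SSE4_2) != 0
      && (features & SKETCH_SLOW_SSE4_2) == 0)
    return strchr_sse42_stub;
  return strchr_sse2_stub;
}
```

With this rule, a CPU that reports SSE4_2 but also has the slow flag set (as the ChangeLog describes for Silvermont) ends up on the SSE2 path.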



Is it still the case that we should not use the optimized versions of
strrchr and strcmp for static glibc?

--
Liubov Dmitrieva
Intel Corporation

On Wed, Jun 19, 2013 at 12:17 PM, Dmitrieva Liubov
<liubov.dmitrieva@gmail.com> wrote:
> Moreover, SSSE3 is not good for Silvermont, and there are no SSE2
> unaligned versions of strcmp and memcmp to switch to at the moment. I
> think we need to have unaligned versions for Core i7 as well.
>
> This is another area for optimization.
>
> I will add a new flag, bit_Slow_SSE4_2, and switch some functions as a
> short-term solution.
>
> --
> Liubov Dmitrieva
>
> On Tue, Jun 18, 2013 at 10:49 AM, Ondřej Bílka <neleai@seznam.cz> wrote:
>> On Mon, Jun 17, 2013 at 11:07:33AM -0700, H.J. Lu wrote:
>>> On Mon, Jun 17, 2013 at 10:56 AM, Dmitrieva Liubov
>>> <liubov.dmitrieva@gmail.com> wrote:
>>> > I checked those functions.
>>> > For strspn/strcspn/strpbrk, switching SSE4_2 off is bad because
>>> > there are no optimized SSE2 versions to call instead.
>>> > The default versions there do not use SSE.
>>> >
>>> > So it seems we need to create a new flag for Silvermont, like
>>> > "slowPcmpistri", and fix the switches in functions where optimized SSE2
>>> > versions exist.
>>> >
>>> > Or implement optimized SSE2 strspn/strcspn/strpbrk and switch SSE4_2 off completely.
>>> >
>> I asked because these are about the only cases where I cannot get comparable
>> results with SSE2. The closest I could get is to split the input into up to
>> four character intervals and check them in parallel.
>> This has somewhat expensive preprocessing, so I am still looking at how to
>> do it better.
>>>
>>> We can add bit_Prefer_SSE2_for_stringop.  When it is set, we
>>> will use SSE2 version if it is available.  Otherwise, we use
>>> SSE4_2 version if it is available.
>>>
>>>
>> As a short-term solution I would prefer bit_Slow_SSE4_2.
>>
>> As a long-term solution, I have optimized implementations of other
>> functions that do not use SSE4_2 and are faster.
>>
>>
>>
>> When I ran `git grep "cmp[ie]str[ie]"` I got
>>
>> sysdeps/i386/i686/multiarch/strcmp-sse4.S
>> sysdeps/x86_64/multiarch/strcmp-sse42.S
>>
>> I have several ideas but have not gotten to them yet. This has low
>> priority because the hot case is when the strings differ within the first
>> 16 characters (for example, when you are sorting).
>>
>>
>> sysdeps/x86_64/multiarch/rawmemchr.S
>>
>> Not our case as it needs bit_SSE4_2 and not bit_Prefer_PMINUB_for_stringop
>>
>> This is false on all Intel processors. Most AMD processors are
>> misclassified because we do not set anything at all. They have slower
>> SSE4_2, which causes a performance regression.
>>
>>
>> sysdeps/x86_64/multiarch/strchr.S
>> sysdeps/x86_64/multiarch/strrchr.S
>>
>> I have an implementation with faster asymptotic time, but I have not yet
>> tuned it for small cases.
>>
>>
>> sysdeps/x86_64/multiarch/strend-sse4.S
>>
>> It is a bit weird that we have this. You could definitely improve
>> performance by taking strlen and adjusting the return value.
>>
>> sysdeps/x86_64/multiarch/strstr.c
>>
>> I have a better implementation; I decided to wait for 2.19.
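
The remark about strend-sse4.S quoted above (take strlen and adjust the return value) might look like the sketch below. It assumes that strend is meant to return a pointer to the terminating null byte, which its name suggests but the thread does not spell out; strend_via_strlen is a hypothetical name used only for this example.

```c
#include <string.h>

/* Hypothetical helper: return a pointer to the terminating null byte
   of S by reusing strlen instead of a dedicated SSE4_2 routine.  */
static char *
strend_via_strlen (const char *s)
{
  return (char *) s + strlen (s);
}
```

Any speedup in strlen would then carry over to this helper without a separate implementation to maintain.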

Attachment: silvermont.patch
Description: Binary data

