This is the mail archive of the mailing list for the glibc project.
Re: RFC: Make string/memory functions optimized for unaligned SSE2 as default
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: "H.J. Lu" <hjl dot tools at gmail dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Thu, 27 Aug 2015 21:37:29 +0200
- Subject: Re: RFC: Make string/memory functions optimized for unaligned SSE2 as default
- References: <CAMe9rOo29oE+nHc+U9KvUU3OqdmGi+A--Sy4kKySFxGikKszbw at mail dot gmail dot com>
On Thu, Aug 27, 2015 at 10:02:36AM -0700, H.J. Lu wrote:
> The current default string/memory functions for x86-64 in libc and ld.so were
> implemented before SSE was allowed in ld.so and before unaligned SSE load/store
> became fast on most processors. Today, we can use the same string/memory
> functions in libc and ld.so, and most x86-64 processors have fast unaligned
> SSE load/store. We should update the default string/memory functions for
> x86-64 to the unaligned SSE2 versions. Those functions are:
> memcpy-sse2-unaligned.S strcat-sse2-unaligned.S strncat-sse2-unaligned.S
> stpcpy-sse2-unaligned.S strcmp-sse2-unaligned.S strncpy-sse2-unaligned.S
> stpncpy-sse2-unaligned.S strcpy-sse2-unaligned.S strstr-sse2-unaligned.S
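To illustrate what "unaligned SSE2" means for these routines: SSE2 provides unaligned 16-byte loads and stores (MOVDQU, exposed in C as `_mm_loadu_si128`/`_mm_storeu_si128`), so a copy of 16 to 32 bytes can be done with two possibly overlapping moves and no alignment fixup at all. The sketch below is a hypothetical C illustration of that idea (`copy_16_to_32` is not a glibc function), not the assembly in memcpy-sse2-unaligned.S.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: copy n bytes, 16 <= n <= 32, with two unaligned
   16-byte moves. The second move starts at n - 16, so it overlaps the
   first one whenever n < 32; neither pointer needs 16-byte alignment. */
static void copy_16_to_32(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n >= 16 && n <= 32);
    __m128i head = _mm_loadu_si128((const __m128i *)src);
    __m128i tail = _mm_loadu_si128((const __m128i *)(src + n - 16));
    _mm_storeu_si128((__m128i *)dst, head);
    _mm_storeu_si128((__m128i *)(dst + n - 16), tail);
}
```

The overlapping tail store replaces a byte-by-byte remainder loop, which is why fast unaligned access makes this layout attractive.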
I would like that.
As I recall, these are also faster on older processors in practice. I do
not have data at hand; I would need to retest, as I omitted the SSE2
implementations from my tests because they were too slow to consider. I
will report back once I have retested.
I had patches that improve the SSSE3 implementations by handling sizes up
to 64 bytes as in the unaligned case and only then using SSSE3 to align;
the same applies to SSE2. So until those patches land, the unaligned
versions should be the default, except perhaps for memset and memcpy.
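The strategy described above (handle sizes up to 64 bytes with unaligned moves, and only align for the bulk of a larger copy) can be sketched in C with SSE2 intrinsics. This is a minimal illustration under stated assumptions, not code from the patches mentioned: `sketch_memcpy` and `copy16` are hypothetical names, and where the SSSE3 variants realign data with the PALIGNR instruction, this sketch instead aligns the destination and keeps unaligned loads, which plain SSE2 allows.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one unaligned 16-byte load + unaligned store. */
static inline void copy16(unsigned char *d, const unsigned char *s)
{
    _mm_storeu_si128((__m128i *)d, _mm_loadu_si128((const __m128i *)s));
}

/* Hypothetical memcpy sketch (buffers must not overlap, as for memcpy). */
static void *sketch_memcpy(void *dstv, const void *srcv, size_t n)
{
    unsigned char *dst = dstv;
    const unsigned char *src = srcv;

    if (n < 16) {
        /* Tiny sizes: byte loop; a production routine would typically
           use narrower overlapping moves instead. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
        return dstv;
    }
    if (n <= 32) { /* head and tail moves, overlapping if n < 32 */
        copy16(dst, src);
        copy16(dst + n - 16, src + n - 16);
        return dstv;
    }
    if (n <= 64) { /* first 32 and last 32 bytes, overlapping if n < 64 */
        copy16(dst, src);
        copy16(dst + 16, src + 16);
        copy16(dst + n - 32, src + n - 32);
        copy16(dst + n - 16, src + n - 16);
        return dstv;
    }

    /* Large sizes: copy the first 16 bytes unaligned, then skip ahead to
       the next 16-byte boundary of dst (skew is 1..16), so a few bytes
       may be copied twice. */
    copy16(dst, src);
    size_t skew = 16 - ((uintptr_t)dst & 15);
    dst += skew;
    src += skew;
    n -= skew;

    /* Bulk loop: unaligned loads, aligned stores (dst is now aligned). */
    while (n > 16) {
        _mm_store_si128((__m128i *)dst, _mm_loadu_si128((const __m128i *)src));
        dst += 16;
        src += 16;
        n -= 16;
    }

    /* 1..16 bytes remain: copy the last 16 bytes unaligned; this may
       overlap bytes the loop already stored. */
    copy16(dst + n - 16, src + n - 16);
    return dstv;
}
```

Every size up to 64 bytes completes without any alignment computation, so the alignment setup cost is paid only when the copy is long enough for the aligned bulk loop to amortize it.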