This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
RE: [RFC] Clean up SSE variable shifts
[hjl@gnu-6 strcspn]$ size sse4-strcspn-vshft1.o sse4-strcspn-vshft2.o sse4-strcspn-vshft3.o sse4-strcspn-vshft4.o sse4-strcspn-vshft5.o varshift?.o
   text    data     bss     dec     hex filename
    684       0       0     684     2ac sse4-strcspn-vshft1.o
    591       0       0     591     24f sse4-strcspn-vshft2.o
    335       0       0     335     14f sse4-strcspn-vshft3.o
    335       0       0     335     14f sse4-strcspn-vshft4.o
    324       0       0     324     144 sse4-strcspn-vshft5.o
    174       0       0     174      ae varshift3.o
    256       0       0     256     100 varshift4.o
     31       0       0      31      1f varshift5.o
Ordered by size, smallest first:
Replace palignr with unaligned load, replace intrinsic with pshufb + unaligned load
Replace palignr with unaligned load, replace intrinsic with a function call
Replace palignr with unaligned load, replace intrinsic with pshufb
Replace palignr with unaligned load, replace intrinsic with asm statement
Replace palignr with unaligned load
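
For readers unfamiliar with the "pshufb + unaligned load" idea, here is a minimal sketch of how such a variable byte shift can work. It is an illustration under stated assumptions, not the code from the patch: the names vec16, shift_table, pshufb_model and shift_right_bytes are hypothetical, and pshufb is modeled in plain C so the example runs anywhere. With SSSE3 intrinsics, the memcpy would typically be _mm_loadu_si128 and the model would be _mm_shuffle_epi8.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte vector type standing in for __m128i. */
typedef struct { uint8_t b[16]; } vec16;

/* Indices 0..15 followed by 15 bytes with the high bit set: 31 bytes total.
   A 16-byte window starting at `offset` is the shuffle control for a
   right shift by `offset` bytes. */
static const uint8_t shift_table[31] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
    0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
    0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
};

/* Scalar model of pshufb: a control byte with bit 7 set produces zero,
   otherwise its low 4 bits select a source byte. */
static vec16 pshufb_model(vec16 src, vec16 ctl)
{
    vec16 out;
    for (int i = 0; i < 16; i++)
        out.b[i] = (ctl.b[i] & 0x80) ? 0 : src.b[ctl.b[i] & 0x0f];
    return out;
}

/* Shift `value` right by `offset` bytes (toward index 0), zero-filling
   the top. `offset` must be in 0..15. */
static vec16 shift_right_bytes(vec16 value, unsigned offset)
{
    assert(offset <= 15);
    vec16 ctl;
    memcpy(ctl.b, shift_table + offset, 16);  /* the "unaligned load" */
    return pshufb_model(value, ctl);
}
```

The appeal of this shape is that the shift count only selects a window into a small constant table, so no per-count code paths are needed.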
H.J.
> -----Original Message-----
> From: Ulrich Drepper [mailto:drepper@redhat.com]
> Sent: Monday, August 23, 2010 10:05 PM
> To: Lu, Hongjiu
> Cc: libc-alpha@sourceware.org; Richard Henderson
> Subject: Re: [RFC] Clean up SSE variable shifts
>
> ----- "Hongjiu Lu" <hongjiu.lu@intel.com> wrote:
> > Here are TSC deltas between different implementations. It is
> > hard to tell which one is faster.
>
> Agreed. This is, though, the best result. It means the
> implementations really don't differ at all in micro-benchmarks.
> Therefore we can use the one which has the minimum resource use, in
> code and data. Can you post that data, too? I mean the 'size' for
> the various sections.
>
> --
> Ulrich Drepper, Red Hat, Inc., 444 Castro St, Mountain View, CA