This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



RE: [RFC] Clean up SSE variable shifts


Your email doesn't show up on libc-alpha.

Your patch has the wrong path for varshift.h.

> 
> (1) Instead of having the compiler generate a jump table, use a
> computed branch inside inline assembly.

Have you compared the generated code against the C version in varshift.h?

> It's tempting to actually share code here, and generate the table out-
> of-line with entries like
> 
> 	psrldq $1, %xmm0
> 	ret
> 
> and use call *%1 in the inline assembly.  The use of
> 
>   register __m128i value __asm__("%xmm0");
> 
> could be used to restrict the compiler to the single register
> supported by the out-of-line table.  It doesn't look like this would
> unduly hamper the compiler in the places it's used.
> 
> There are currently 5 copies of this jump table in libc.
> We'd save 4*8*16 = 512 bytes of code space with the out-of-line
> version.

What is the performance impact of an extra function call
vs. multiple copies of the same jump table?
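
For context, the jump table under discussion exists because psrldq only
accepts an immediate shift count, so a run-time count has to be dispatched
to one of 16 fixed-count instructions. The following is only a minimal
sketch of that pattern using SSE2 intrinsics. The function name
shift_right_bytes is hypothetical, and this is not the code in varshift.h
or in the patch.

```c
#include <emmintrin.h>  /* SSE2: _mm_srli_si128, _mm_setzero_si128 */

/* Hypothetical helper: shift a 128-bit value right by n bytes, where n
   is only known at run time.  Each case uses a literal count because
   psrldq takes an immediate; a compiler will typically lower this
   switch to a jump table.  */
static __m128i
shift_right_bytes (__m128i v, unsigned int n)
{
  switch (n)
    {
    case 0:  return v;
    case 1:  return _mm_srli_si128 (v, 1);
    case 2:  return _mm_srli_si128 (v, 2);
    case 3:  return _mm_srli_si128 (v, 3);
    case 4:  return _mm_srli_si128 (v, 4);
    case 5:  return _mm_srli_si128 (v, 5);
    case 6:  return _mm_srli_si128 (v, 6);
    case 7:  return _mm_srli_si128 (v, 7);
    case 8:  return _mm_srli_si128 (v, 8);
    case 9:  return _mm_srli_si128 (v, 9);
    case 10: return _mm_srli_si128 (v, 10);
    case 11: return _mm_srli_si128 (v, 11);
    case 12: return _mm_srli_si128 (v, 12);
    case 13: return _mm_srli_si128 (v, 13);
    case 14: return _mm_srli_si128 (v, 14);
    case 15: return _mm_srli_si128 (v, 15);
    default: return _mm_setzero_si128 ();  /* 16 or more: all bytes shifted out */
    }
}
```

Every place that inlines a helper like this gets its own copy of the
dispatch code, which is the duplication the out-of-line table would remove.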
 
> (2) The two instances of jump tables involving palignr can be done
> just as easily by re-reading the data via an unaligned load.  From a
> hot cache, that's surely faster than anything else we could do here.
> 

Sure.
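
A minimal sketch of the idea in (2), assuming the data is still in memory
and that all 16 bytes starting at the requested offset are readable. The
function name load_at_offset is hypothetical and is not taken from the
patch. Instead of combining two aligned blocks with palignr and an
immediate count, it re-reads the bytes with a single unaligned load.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128 */

/* Hypothetical helper: return the 16 bytes starting n bytes past p.
   For 0 <= n <= 16 this gives the same bytes that palignr with count n
   would extract from the two adjacent 16-byte blocks at p and p + 16,
   but n may be a run-time value.
   Assumption: the caller guarantees p[n] .. p[n + 15] are readable.  */
static __m128i
load_at_offset (const unsigned char *p, unsigned int n)
{
  return _mm_loadu_si128 ((const __m128i *) (p + n));
}
```

No dispatch on the shift count is needed here, so this form needs no jump
table at all.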

I have been wishing for variable vector shift instructions.


H.J.
