This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Compile AVX libm functions with -mavx
On Tue, Oct 2, 2012 at 4:45 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Oct 2, 2012 at 4:07 PM, Matt Turner <mattst88@gmail.com> wrote:
>> On Tue, Oct 2, 2012 at 1:19 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Tue, Oct 2, 2012 at 12:47 PM, Ondřej Bílka <neleai@seznam.cz> wrote:
>>>> On Tue, Oct 02, 2012 at 03:31:50PM -0400, Mike Frysinger wrote:
>>>>> On Tuesday 02 October 2012 15:20:54 H.J. Lu wrote:
>>>>> > On Tue, Oct 2, 2012 at 12:02 PM, Mike Frysinger <vapier@gentoo.org> wrote:
>>>>> > > On Tuesday 02 October 2012 09:53:25 H.J. Lu wrote:
>>>>> > >> This patch compiles AVX libm functions with -mavx. It reduces text size
>>>>> > >> of libm.so by about 1%:
>>>>> > >
>>>>> > > looks like you're reverting 56f6f6a2403cfa7267cad722597113be35ecf70d.
>>>>> > > shouldn't you revert all of it and not just change the CFLAGS back ?
>>>>> >
>>>>> > Doesn't this patch:
>>>>> >
>>>>> > http://sourceware.org/ml/libc-alpha/2012-10/msg00055.html
>>>>> >
>>>>> > do that?
>>>>>
>>>>> yes, i missed the follow up
>>>>>
>>>>> > > it'd be useful to know *why* Ulrich moved away from -mavx, but
>>>>> > > unfortunately his commit message is useless.
>>>>> >
>>>>> > I can only guess:
>>>>>
>>>>> might be useful to put some notes (like referring to the older commit) into
>>>>> the commit message when you do commit things
>>>>> -mike
>>>>
>>>> could it be a 60 cycle penalty when switching between legacy sse and avx
>>>> state?
>>>
>>> This is true. We can use -mprefer-avx128 to make sure that only 128bit AVX
>>> instructions are used.
>>>
>>> --
>>> H.J.
>>
>> The latency for switching between old SSE and new (AVX-style
>
> Latency comes from switching between the 128-bit SSE context and
> the 256-bit AVX context. If we only use the lower 128-bit AVX context,
> there is no latency.
I'm having a hard time confirming that.
From pages 53/54 of the PDF at http://software.intel.com/file/36945 :
> However, there is a performance impact with intermixing VEX-encoded SIMD
> instructions (AVX, FMA) and legacy SSE instructions that only operate on
> the XMM register state.
And more to the point:
> Intermixed 256-bit, 128-bit or scalar SIMD instructions that are encoded
> with VEX prefixes have no transition delay due to internal state management.
>> 3-operand) form is what causes the penalty. What is the purpose of
>> -mprefer-avx128? I can't find a description of it online.
>
> I just fixed it:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54785
>
> -mprefer-avx128 will avoid 256-bit AVX instructions. Only 128-bit
> AVX instructions are generated. It has the same effect on context
> switch as -msse2avx.
I think your claim is that mixing legacy 128-bit SSE with 256-bit AVX
produces stalls, but I read the documentation as saying that it is
mixing VEX-prefixed instructions in general (256-bit or otherwise) with
legacy SSE instructions that leads to stalls.
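
Since the question turns on which objects end up VEX-encoded, here is a
minimal sketch of how one might check what a given translation unit is
built for. It relies on GCC's predefined macros __AVX__ and __SSE2__; as
far as I know GCC uses VEX encodings for SSE operations whenever -mavx is
in effect. The helper names simd_encoding() and report_simd_encoding()
are hypothetical and not part of glibc.

```c
#include <stdio.h>

/* Report which SIMD encoding this translation unit is compiled for.
   __AVX__ and __SSE2__ are predefined by GCC according to -mavx / -msse2.
   simd_encoding() is a hypothetical helper, not a glibc interface. */
const char *simd_encoding(void)
{
#if defined(__AVX__)
    return "vex";         /* -mavx in effect: VEX encodings are used */
#elif defined(__SSE2__)
    return "legacy-sse";  /* SSE2 without AVX: legacy encodings */
#else
    return "none";        /* no x86 SIMD baseline detected */
#endif
}

/* Print one line so builds made with different flags can be compared. */
void report_simd_encoding(const char *unit_name)
{
    printf("%s: %s\n", unit_name, simd_encoding());
}
```

Building this once with -mavx and once without, and calling
report_simd_encoding() from a small driver, should show which build
would introduce VEX-encoded code next to legacy SSE code. It says
nothing about whether a stall actually occurs; that still depends on
the processor's state handling described in the Intel document.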