This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.
Re: x86 optimization notes
- From: "Jan Beulich" <JBeulich at suse dot com>
- To: "H.J. Lu" <hjl dot tools at gmail dot com>
- Cc: "Binutils" <binutils at sourceware dot org>
- Date: Thu, 08 Mar 2018 06:59:50 -0700
- Subject: Re: x86 optimization notes
- Authentication-results: sourceware.org; auth=none
- References: <5AA1012A02000078001AFB01@prv-mh.provo.novell.com> <CAMe9rOp3cVkeEs4rzuVaKBc=ZS8r54M5cH3t9rVgqhgTC+Hw8Q@mail.gmail.com>
>>> On 08.03.18 at 13:56, <hjl.tools@gmail.com> wrote:
> On Thu, Mar 8, 2018 at 12:23 AM, Jan Beulich <JBeulich@suse.com> wrote:
>> H.J.,
>>
>> having taken another look at the optimizations you've added
>> recently, I have a couple of remarks to make:
>>
>> 1) I don't think optimizations should raise the ISA requirements.
>> The conversions you do from AVX512F to AVX512VL insns are in
>> direct contradiction to the Disp32 -> Disp8 conversion I had
>> suggested a couple of weeks ago, and that you objected to even if
>> done very carefully (I still intend to produce a patch to that effect,
>> to see whether you would want to reconsider). Since changing the
>> vector length doesn't alter the encoding length, and doesn't - afaict -
>> provide any other benefits, I don't think those conversions are
>> useful at all. All that is useful imo are conversions from EVEX to VEX.
>
> It does reduce the vector size which may reduce CPU power and boost
> CPU frequency. I am checking this patch to use AVX512VL only if it is
> enabled.
Hmm, for the moment this is indeed an option, but it'll be one more
thing that needs re-thinking when the AVX512VL templates finally get
merged into their AVX512* base variants, which is the long-term goal
of my template folding work (and which will require quite a few more
preparatory steps).
I would assume that said displacement optimization could be done
using a similar cpu_arch_isa_flags check - would that address the
concerns you had earlier on?
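
For illustration only, here is a minimal sketch of what such a gated check might look like. It is not code from gas: the feature bits, function names and the policy function are hypothetical, and the real layout of cpu_arch_isa_flags is not reproduced. The only thing taken as given is the EVEX compressed displacement rule, under which a displacement fits in one byte when it is a multiple of the scale N and the quotient lies in [-128, 127].

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits standing in for the enabled-ISA set.  */
enum { ISA_AVX512F = 1u << 0, ISA_AVX512VL = 1u << 1 };

/* EVEX compressed displacement: disp fits in a single byte when it is
   a multiple of the scale N and disp / N lies in [-128, 127].  */
static bool fits_disp8_scaled(int32_t disp, int32_t n)
{
    if (n <= 0 || disp % n != 0)
        return false;
    int32_t q = disp / n;
    return q >= -128 && q <= 127;
}

/* A rewrite is acceptable only when every ISA extension it needs is
   already enabled, so the optimization never raises requirements.  */
static bool optimization_allowed(unsigned enabled, unsigned required)
{
    return (enabled & required) == required;
}

/* Hypothetical policy: pick the Disp8 form only if the target
   encoding's ISA is enabled and the displacement is encodable.  */
static bool use_disp8(unsigned enabled, unsigned required,
                      int32_t disp, int32_t n)
{
    return optimization_allowed(enabled, required)
           && fits_disp8_scaled(disp, n);
}
```

With only ISA_AVX512F enabled, use_disp8() rejects a rewrite that would need ISA_AVX512VL, even if the displacement itself would fit.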
>> 3) While merge masking indeed precludes the optimization, zeroing
>> masking doesn't - after all it doesn't matter for what reason the
>> respective part of the destination gets zeroed.
>
> Would you mind creating a patch to do that?
Also added to my todo list.
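
To make the masking point concrete, here is a small model, assuming the optimization in question is the same-register zeroing idiom (e.g. an XOR of a register with itself); the quoted text does not spell that out. The lane count, function names and test values are illustrative only. With merge masking the masked-off lanes keep their old contents, so the result depends on the mask; with zeroing masking every lane ends up zero either way.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 16  /* 32-bit elements in a 512-bit register */

/* Model of dst = a ^ b under a write mask.  Masked-off lanes are
   cleared when zeroing is set and left untouched when merging.  */
static void masked_xor(uint32_t dst[LANES], const uint32_t a[LANES],
                       const uint32_t b[LANES], uint16_t mask,
                       bool zeroing)
{
    for (int i = 0; i < LANES; i++) {
        if (mask & (1u << i))
            dst[i] = a[i] ^ b[i];
        else if (zeroing)
            dst[i] = 0;
        /* merging: dst[i] keeps its previous value */
    }
}

/* XOR a register holding non-zero data with itself and report
   whether the whole register ends up zero.  */
static bool xor_self_is_zero(uint16_t mask, bool zeroing)
{
    uint32_t reg[LANES];

    for (int i = 0; i < LANES; i++)
        reg[i] = (uint32_t)i + 1;   /* arbitrary non-zero contents */

    masked_xor(reg, reg, reg, mask, zeroing);

    for (int i = 0; i < LANES; i++)
        if (reg[i] != 0)
            return false;
    return true;
}
```

In this model xor_self_is_zero(0x00ff, true) holds while xor_self_is_zero(0x00ff, false) does not, which is the distinction between the two masking kinds.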
Jan