This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] X86-64: Add _dl_runtime_resolve_avx[512]_opt [BZ #20508]
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Richard Henderson <rth at twiddle dot net>
- Cc: Florian Weimer <fweimer at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Mon, 29 Aug 2016 17:01:14 -0700
- Subject: Re: [PATCH] X86-64: Add _dl_runtime_resolve_avx[512]_opt [BZ #20508]
- References: <CAMe9rOojpuFz1jTbMpNcqZK1KVDqaWozNuEuS3E67dvD3Rh=hw@mail.gmail.com> <c6e21847-bc95-3c75-ed54-798c62194072@redhat.com> <CAMe9rOp=bvrk0MBF8SrP4JqLHembRdeyfGNqNkcVWvoEs3R=tA@mail.gmail.com> <CAMe9rOo7MmGNPKct=AzzbtR564yH1P96tUcLD7pFv7GtxF3-Ng@mail.gmail.com> <48eeea78-99e8-f255-bd26-b6d28929b4f0@twiddle.net>
On Mon, Aug 29, 2016 at 4:07 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 08/26/2016 10:18 AM, H.J. Lu wrote:
>>
>> + vpcmpeqd %xmm8, %xmm8, %xmm8
>> + vorpd %ymm9, %ymm10, %ymm10
>> + vptest %ymm10, %ymm8
>
>
> No need to create a mask of all -1; use vptest ymm10, ymm10.
>
ymm8 isn't all -1. Only its lower 128 bits are all -1, because the VEX-encoded vpcmpeqd on %xmm8 zeroes the upper bits of %ymm8:
(gdb) p/x $ymm8
$4 = {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {
0x8000000000000000, 0x8000000000000000, 0x0, 0x0}, v32_int8 = {
0xff <repeats 16 times>, 0x0 <repeats 16 times>}, v16_int16 = {0xffff,
0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0xffffffff, 0xffffffff, 0xffffffff,
0xffffffff, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0xffffffffffffffff,
0xffffffffffffffff, 0x0, 0x0}, v2_int128 = {
0xffffffffffffffffffffffffffffffff, 0x00000000000000000000000000000000}}
(gdb)
ymm10, which holds the bitwise OR of ymm0 through ymm7, has:
(gdb) p/x $ymm10
$2 = {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {
0x8000000000000000, 0x8000000000000000, 0x0, 0x0}, v32_int8 = {0x6d,
0x79, 0x72, 0x6f, 0x7f, 0x74, 0x6f, 0x73, 0x77, 0x6f, 0x6f, 0x67, 0x6f,
0xff, 0x6f, 0xff, 0x0 <repeats 16 times>}, v16_int16 = {0x796d, 0x6f72,
0x747f, 0x736f, 0x6f77, 0x676f, 0xff6f, 0xff6f, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0}, v8_int32 = {0x6f72796d, 0x736f747f, 0x676f6f77,
0xff6fff6f, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x736f747f6f72796d,
0xff6fff6f676f6f77, 0x0, 0x0}, v2_int128 = {
0xff6fff6f676f6f77736f747f6f72796d, 0x00000000000000000000000000000000}}
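Since for
vptest %ymm10, %ymm8
(AT&T operand order, so SRC is %ymm10 and DEST is %ymm8) the carry flag is defined as
IF (SRC[255:0] BITWISE AND NOT DEST[255:0] = 0)
THEN CF = 1;
ELSE CF = 0;
and NOT ymm8 has only the upper 128 bits set, this ignores the lower 128
bits of ymm10 and clears CF only if the upper 128 bits of ymm10 aren't
zero. If we use
vptest %ymm10, %ymm10
instead, CF is always 1, because ymm10 AND NOT ymm10 is always zero, so CF
can no longer detect non-zero upper bits. ZF is set only if all 256 bits
of ymm10 are zero, and the lower 128 bits are often non-zero (as in the
dump above), so testing ZF would make us preserve ymm0-ymm7 even when
their upper 128 bits are zero.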
--
H.J.