This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] X86-64: Add _dl_runtime_resolve_avx[512]_opt [BZ #20508]


On Mon, Aug 29, 2016 at 4:07 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 08/26/2016 10:18 AM, H.J. Lu wrote:
>>
>> +       vpcmpeqd %xmm8, %xmm8, %xmm8
>> +       vorpd %ymm9, %ymm10, %ymm10
>> +       vptest %ymm10, %ymm8
>
>
> No need to create a mask of all -1; use vptest ymm10, ymm10.
>

ymm8 isn't all -1.  Only the lower 128 bits are all -1:


(gdb) p/x $ymm8
$4 = {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {
    0x8000000000000000, 0x8000000000000000, 0x0, 0x0}, v32_int8 = {
    0xff <repeats 16 times>, 0x0 <repeats 16 times>}, v16_int16 = {0xffff,
    0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0x0, 0x0, 0x0,
    0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0xffffffff, 0xffffffff, 0xffffffff,
    0xffffffff, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0xffffffffffffffff,
    0xffffffffffffffff, 0x0, 0x0}, v2_int128 = {
    0xffffffffffffffffffffffffffffffff, 0x00000000000000000000000000000000}}
(gdb)
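
The upper half is zero because a VEX-encoded instruction with an xmm
destination clears bits 255:128 of the corresponding ymm register.  A
minimal sketch of that behaviour in Python, modelling registers as plain
integers (the function name vex128_vpcmpeqd is made up for illustration
and is not part of the patch):

```python
MASK32 = 0xFFFFFFFF
MASK128 = (1 << 128) - 1


def vex128_vpcmpeqd(a: int, b: int) -> int:
    """Model VEX.128 vpcmpeqd on registers held as Python integers.

    Each of the four 32-bit lanes in the low 128 bits becomes all ones
    when the lanes compare equal.  Bits 255:128 of the result are zero,
    as for any VEX-encoded instruction with an xmm destination.
    """
    result = 0
    for lane in range(4):
        shift = 32 * lane
        if ((a >> shift) & MASK32) == ((b >> shift) & MASK32):
            result |= MASK32 << shift
    return result & MASK128


# Comparing a register with itself: every lane is equal, whatever it held.
ymm8 = vex128_vpcmpeqd(0x1234, 0x1234)
```

Here ymm8 ends up with 128 one bits in the low half and nothing above,
matching the v2_int128 view in the gdb output.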

ymm10, the OR of ymm0 through ymm7 (ymm0|..|ymm7), has

(gdb) p/x $ymm10
$2 = {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {
    0x8000000000000000, 0x8000000000000000, 0x0, 0x0}, v32_int8 = {0x6d,
    0x79, 0x72, 0x6f, 0x7f, 0x74, 0x6f, 0x73, 0x77, 0x6f, 0x6f, 0x67, 0x6f,
    0xff, 0x6f, 0xff, 0x0 <repeats 16 times>}, v16_int16 = {0x796d, 0x6f72,
    0x747f, 0x736f, 0x6f77, 0x676f, 0xff6f, 0xff6f, 0x0, 0x0, 0x0, 0x0, 0x0,
    0x0, 0x0, 0x0}, v8_int32 = {0x6f72796d, 0x736f747f, 0x676f6f77,
    0xff6fff6f, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x736f747f6f72796d,
    0xff6fff6f676f6f77, 0x0, 0x0}, v2_int128 = {
    0xff6fff6f676f6f77736f747f6f72796d, 0x00000000000000000000000000000000}}

Since

vptest %ymm10, %ymm8

IF (SRC[255:0] BITWISE AND NOT DEST[255:0] = 0)
THEN CF = 1;
ELSE CF = 0;

with SRC = ymm10 and DEST = ymm8, this ignores the lower 128 bits
of ymm10 and sets CF = 0 only if the upper 128 bits of ymm10 aren't
zero.  If we use

vptest %ymm10, %ymm10

SRC AND NOT DEST is always zero, so CF is always 1 and we will always
preserve ymm0-ymm7 even when the upper 128 bits are zero.
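
The two cases can be checked with a small software model.  This is only
a sketch in Python, with registers as integers and a hypothetical helper
vptest_cf that implements just the CF rule quoted above; the ymm10 value
is the one from the gdb dump, and the variant with an upper bit set is
invented for the example:

```python
MASK256 = (1 << 256) - 1


def vptest_cf(src: int, dest: int) -> int:
    """Return CF as defined for vptest: 1 if (SRC AND NOT DEST) == 0.

    In AT&T syntax `vptest %ymm10, %ymm8` has SRC = ymm10, DEST = ymm8.
    """
    return 1 if (src & ~dest & MASK256) == 0 else 0


# ymm8 after the VEX.128 vpcmpeqd: low 128 bits set, high 128 bits clear.
YMM8 = (1 << 128) - 1

# ymm10 as shown in the gdb dump: upper 128 bits are zero.
YMM10 = 0xFF6FFF6F676F6F77736F747F6F72796D

# Invented variant with one bit set in the upper half.
YMM10_UPPER_SET = YMM10 | (1 << 200)

# Testing against the ymm8 mask distinguishes the two cases.
cf_mask_upper_zero = vptest_cf(YMM10, YMM8)
cf_mask_upper_set = vptest_cf(YMM10_UPPER_SET, YMM8)

# Testing a register against itself cannot distinguish them.
cf_self_upper_zero = vptest_cf(YMM10, YMM10)
cf_self_upper_set = vptest_cf(YMM10_UPPER_SET, YMM10_UPPER_SET)
```

With the mask, CF is 1 for the dumped value and 0 for the variant; with
the register tested against itself, CF is 1 in both cases.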

-- 
H.J.

