This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] X86-64: Add _dl_runtime_resolve_avx[512]_opt [BZ #20508]


On Thu, Sep 1, 2016 at 2:52 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Aug 30, 2016 at 1:30 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Mon, Aug 29, 2016 at 5:01 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Mon, Aug 29, 2016 at 4:07 PM, Richard Henderson <rth@twiddle.net> wrote:
>>>> On 08/26/2016 10:18 AM, H.J. Lu wrote:
>>>>>
>>>>> +       vpcmpeqd %xmm8, %xmm8, %xmm8
>>>>> +       vorpd %ymm9, %ymm10, %ymm10
>>>>> +       vptest %ymm10, %ymm8
>>>>
>>>>
>>>> No need to create a mask of all -1; use vptest ymm10, ymm10.
>>>>
>>>
>>> ymm8 isn't all -1.  Only the lower 128 bits are all -1:
>>>
>>>
>>> (gdb) p/x $ymm8
>>> $4 = {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {
>>>     0x8000000000000000, 0x8000000000000000, 0x0, 0x0}, v32_int8 = {
>>>     0xff <repeats 16 times>, 0x0 <repeats 16 times>}, v16_int16 = {0xffff,
>>>     0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0x0, 0x0, 0x0,
>>>     0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0xffffffff, 0xffffffff, 0xffffffff,
>>>     0xffffffff, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0xffffffffffffffff,
>>>     0xffffffffffffffff, 0x0, 0x0}, v2_int128 = {
>>>     0xffffffffffffffffffffffffffffffff, 0x00000000000000000000000000000000}}
>>> (gdb)
>>>
>>> ymm10 (ymm0|..|ymm7) has
>>>
>>> (gdb) p/x $ymm10
>>> $2 = {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {
>>>     0x8000000000000000, 0x8000000000000000, 0x0, 0x0}, v32_int8 = {0x6d,
>>>     0x79, 0x72, 0x6f, 0x7f, 0x74, 0x6f, 0x73, 0x77, 0x6f, 0x6f, 0x67, 0x6f,
>>>     0xff, 0x6f, 0xff, 0x0 <repeats 16 times>}, v16_int16 = {0x796d, 0x6f72,
>>>     0x747f, 0x736f, 0x6f77, 0x676f, 0xff6f, 0xff6f, 0x0, 0x0, 0x0, 0x0, 0x0,
>>>     0x0, 0x0, 0x0}, v8_int32 = {0x6f72796d, 0x736f747f, 0x676f6f77,
>>>     0xff6fff6f, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x736f747f6f72796d,
>>>     0xff6fff6f676f6f77, 0x0, 0x0}, v2_int128 = {
>>>     0xff6fff6f676f6f77736f747f6f72796d, 0x00000000000000000000000000000000}}
>>>
>>> Since
>>>
>>> vptest %ymm10, %ymm8
>>>
>>> IF (SRC[255:0] BITWISE AND NOT DEST[255:0] = 0)
>>> THEN CF = 1;
>>> ELSE CF = 0;
>>>
>>> this ignores the lower 128 bits of ymm10 and sets CF = 0
>>> only if the upper 128 bits of ymm10 aren't zero.  If we use
>>>
>>> vptest ymm10, ymm10
>>>
>>> CF is always 1 and we will always preserve ymm0-ymm7 even
>>> when the upper 128 bits are zero.
>>>
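The CF argument above can be checked with a small model. This is only a sketch in Python, not part of the patch: it treats each ymm register as a 256-bit integer and models just the CF result of vptest as quoted in the pseudocode above. The names vptest_cf, YMM8, lower_only and upper_set are hypothetical; lower_only reuses the lower 128 bits from the ymm10 dump, and upper_set is a made-up value with one upper bit set.

```python
MASK128 = (1 << 128) - 1
MASK256 = (1 << 256) - 1


def vptest_cf(src: int, dest: int) -> int:
    """Model only the CF result of `vptest %src, %dest` (AT&T operand order).

    CF = 1 if (SRC AND NOT DEST) == 0, else CF = 0.
    """
    return 1 if (src & ~dest & MASK256) == 0 else 0


# ymm8 after `vpcmpeqd %xmm8, %xmm8, %xmm8`: lower 128 bits all ones,
# upper 128 bits zero (matches the gdb dump of ymm8).
YMM8 = MASK128

# Lower 128 bits copied from the gdb dump of ymm10; upper 128 bits are zero.
lower_only = 0xFF6FFF6F676F6F77736F747F6F72796D

# Hypothetical value: same lower half, plus one bit set in the upper half.
upper_set = lower_only | (1 << 128)

print(vptest_cf(lower_only, YMM8))      # upper 128 bits zero
print(vptest_cf(upper_set, YMM8))       # upper 128 bits non-zero
print(vptest_cf(upper_set, upper_set))  # register tested against itself
```

With the ymm8 mask, CF depends only on the upper 128 bits of the first operand. Testing a register against itself gives CF = 1 for any value, so CF alone cannot distinguish the two cases in that form.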
>>
>> Here is the updated patch to add PRESERVE_BND_REGS_PREFIX
>> before branches.  Otherwise bound registers will be cleared.  OK
>> for master?
>>
>
> Any comments? I will check it in next week if there is no objection.

I'd like to backport it to the 2.23 and 2.24 branches.  Any objections?



-- 
H.J.

