This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v3 04/18] Add string vectorized find and detection functions



On 11/01/2018 16:54, Adhemerval Zanella wrote:
> 
> 
> On 11/01/2018 14:47, Paul Eggert wrote:
>> On 01/10/2018 04:47 AM, Adhemerval Zanella wrote:
>>> +  op_t lsb = (op_t)-1 / 0xff;
>>> +  op_t msb = lsb << (CHAR_BIT - 1);
>> This would be simpler and clearer if it were rewritten as:
>>
>>     op_t lsb = repeat_bytes (0x01);
>>     op_t msb = repeat_bytes (0x80);
>>
>> There are several other opportunities for this kind of simplification.
> 
> Indeed, I changed it locally
> 
>>
>>> +static inline op_t
>>> +find_zero_eq_low (op_t x1, op_t x2)
>>> +{
>>> +  op_t lsb = (op_t)-1 / 0xff;
>>> +  op_t msb = lsb << (CHAR_BIT - 1);
>>> +  op_t eq = x1 ^ x2;
>>> +  return (((x1 - lsb) & ~x1) | ((eq - lsb) & ~eq)) & msb;
>>> +}
>>
>> How about the following simpler implementation instead? I expect it's just as fast:
>>
>>    return find_zero_low (x1) | find_zero_low (x1 ^ x2);
>>
>> Similarly for find_zero_eq_all, find_zero_ne_low, find_zero_ne_all.
> 
> I think this seems ok and code generation for at least aarch64, powerpc64le,
> sparc64, and x86_64 seems similar.
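
A minimal sketch for checking the suggested composition, assuming op_t is a
64-bit unsigned integer and that repeat_bytes and find_zero_low are defined
as below. Both definitions are assumptions made for illustration; only
find_zero_eq_low is taken from the quoted patch, and the _patch/_composed
suffixes are hypothetical names. Under these assumptions the two forms are
the same expression once the mask is distributed, so they should agree on
every input:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

typedef uint64_t op_t;  /* Assumption: 64-bit word type.  */

/* Assumed helper: replicate byte C into every byte of an op_t.  */
static inline op_t
repeat_bytes (unsigned char c)
{
  return ((op_t)-1 / 0xff) * c;
}

/* Assumed definition, mirroring the terms used in find_zero_eq_low.  */
static inline op_t
find_zero_low (op_t x)
{
  op_t lsb = repeat_bytes (0x01);
  op_t msb = repeat_bytes (0x80);
  return (x - lsb) & ~x & msb;
}

/* Body as in the quoted patch.  */
static inline op_t
find_zero_eq_low_patch (op_t x1, op_t x2)
{
  op_t lsb = (op_t)-1 / 0xff;
  op_t msb = lsb << (CHAR_BIT - 1);
  op_t eq = x1 ^ x2;
  return (((x1 - lsb) & ~x1) | ((eq - lsb) & ~eq)) & msb;
}

/* Suggested composition.  */
static inline op_t
find_zero_eq_low_composed (op_t x1, op_t x2)
{
  return find_zero_low (x1) | find_zero_low (x1 ^ x2);
}

/* Compare both forms over a pseudo-random sample of inputs.  */
static void
check_eq_low_forms (void)
{
  op_t x1 = 0x4847464544434241ULL;
  op_t x2 = 0x4847464500434241ULL;
  for (int i = 0; i < 100000; i++)
    {
      assert (find_zero_eq_low_patch (x1, x2)
              == find_zero_eq_low_composed (x1, x2));
      /* Simple LCG steps, used only to vary the inputs.  */
      x1 = x1 * 6364136223846793005ULL + 1442695040888963407ULL;
      x2 = x2 * 2862933555777941757ULL + 3037000493ULL;
    }
}
```

Calling check_eq_low_forms () from a test driver exercises the comparison; it
says nothing about code generation, which still has to be inspected per target.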

Trying to compose find_zero_ne_{low,all} from find_zero_{low,all} made me
less sure that it would be a gain. To do so we need to add another
operation, such as:

---
static inline op_t
find_zero_ne_low (op_t x1, op_t x2)
{
  op_t x = repeat_bytes (0x80);
  return find_zero_low (x1) | (find_zero_low (x1 ^ x2) ^ x);
}
---

This seems slightly worse than the current version in terms of generated
instructions (15 versus 13 in the listings below). Using GCC 7.2.1 for
x86_64 I see:

* Patch version:

find_zero_ne_low.constprop.0:
.LFB28: 
        .cfi_startproc
        movabsq $1229782938247303441, %rdx
        movq    %rdx, %rcx
        movabsq $9187201950435737471, %rdx
        leaq    (%rdi,%rdx), %rax
        xorq    %rdi, %rcx
        addq    %rcx, %rdx
        orq     %rdi, %rax
        orq     %rcx, %rdx
        movabsq $-9187201950435737472, %rdi
        notq    %rax
        orq     %rdx, %rax
        andq    %rdi, %rax
        ret


* find_zero_low version above:

find_zero_ne_low.constprop.0:
.LFB28: 
        .cfi_startproc
        movabsq $1229782938247303441, %rax
        movabsq $-72340172838076673, %rdx
        movabsq $-1229782938247303442, %rcx
        xorq    %rdi, %rax
        xorq    %rdi, %rcx
        addq    %rdx, %rax
        addq    %rdi, %rdx
        notq    %rdi
        andq    %rcx, %rax
        andq    %rdi, %rdx
        movabsq $-9187201950435737472, %rdi
        notq    %rax
        orq     %rdx, %rax
        andq    %rdi, %rax
        ret
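
Apart from instruction count, the composed variant can be exercised on a few
sample words. The sketch below assumes op_t is a 64-bit unsigned integer and
that repeat_bytes and find_zero_low are defined as shown (both are
assumptions, chosen to match the instruction sequence in the second listing).
Under that assumed definition find_zero_low is only exact for its lowest
flagged byte, since a borrow out of a zero byte can also flag a following
0x01 byte, so inverting its result may miss a difference. The last sample
illustrates this for the sketch only:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t op_t;  /* Assumption: 64-bit word type.  */

/* Assumed helper: replicate byte C into every byte of an op_t.  */
static inline op_t
repeat_bytes (unsigned char c)
{
  return ((op_t)-1 / 0xff) * c;
}

/* Assumed definition of the zero-byte test.  */
static inline op_t
find_zero_low (op_t x)
{
  op_t lsb = repeat_bytes (0x01);
  op_t msb = repeat_bytes (0x80);
  return (x - lsb) & ~x & msb;
}

/* Composed variant from the message above.  */
static inline op_t
find_zero_ne_low (op_t x1, op_t x2)
{
  op_t x = repeat_bytes (0x80);
  return find_zero_low (x1) | (find_zero_low (x1 ^ x2) ^ x);
}

static void
check_ne_low_samples (void)
{
  op_t a = 0x4847464544434241ULL;  /* No zero bytes.  */

  /* Identical words without a zero byte: nothing is flagged.  */
  assert (find_zero_ne_low (a, a) == 0);

  /* Words differ only in the least significant byte: byte 0 is flagged.  */
  assert (find_zero_ne_low (a, a ^ 0x01) == 0x80);

  /* A zero in byte 3 of x1 is flagged even when the words are equal.  */
  assert (find_zero_ne_low (0x4847464500434241ULL, 0x4847464500434241ULL)
          == 0x80000000ULL);

  /* Byte 1 differs by 0x01 while byte 0 is equal: the borrow from byte 0
     makes find_zero_low flag byte 1 as well, and the inversion clears it,
     so the difference is not reported.  */
  assert (find_zero_ne_low (a, a ^ 0x0100) == 0);
}
```

Calling check_ne_low_samples () from a test driver runs the four cases.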

