This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH v3 04/18] Add string vectorized find and detection functions
On 11/01/2018 16:54, Adhemerval Zanella wrote:
>
>
> On 11/01/2018 14:47, Paul Eggert wrote:
>> On 01/10/2018 04:47 AM, Adhemerval Zanella wrote:
>>> + op_t lsb = (op_t)-1 / 0xff;
>>> + op_t msb = lsb << (CHAR_BIT - 1);
>> This would be simpler and clearer if it were rewritten as:
>>
>> op_t lsb = repeat_bytes (0x01);
>> op_t msb = repeat_bytes (0x80);
>>
>> There are several other opportunities for this kind of simplification.
>
> Indeed, I changed it locally
>
>>
>>> +static inline op_t
>>> +find_zero_eq_low (op_t x1, op_t x2)
>>> +{
>>> + op_t lsb = (op_t)-1 / 0xff;
>>> + op_t msb = lsb << (CHAR_BIT - 1);
>>> + op_t eq = x1 ^ x2;
>>> + return (((x1 - lsb) & ~x1) | ((eq - lsb) & ~eq)) & msb;
>>> +}
>>
>> How about the following simpler implementation instead? I expect it's just as fast:
>>
>> return find_zero_low (x1) | find_zero_low (x1 ^ x2);
>>
>> Similarly for find_zero_eq_all, find_zero_ne_low, find_zero_ne_all.
>
> I think this seems ok and code generation for at least aarch64, powerpc64le,
> sparc64, and x86_64 seems similar.
While trying to compose find_zero_ne_{low,all} from find_zero_{low,all},
I became less sure that it would be a gain. To accomplish it we would need
to add another operation, such as:
---
static inline op_t
find_zero_ne_low (op_t x1, op_t x2)
{
  op_t x = repeat_bytes (0x80);
  return find_zero_low (x1) | (find_zero_low (x1 ^ x2) ^ x);
}
---
This seems slightly worse than the current version in terms of generated instructions.
Using GCC 7.2.1 for x86_64 I see:
* Patch version:
find_zero_ne_low.constprop.0:
.LFB28:
.cfi_startproc
movabsq $1229782938247303441, %rdx
movq %rdx, %rcx
movabsq $9187201950435737471, %rdx
leaq (%rdi,%rdx), %rax
xorq %rdi, %rcx
addq %rcx, %rdx
orq %rdi, %rax
orq %rcx, %rdx
movabsq $-9187201950435737472, %rdi
notq %rax
orq %rdx, %rax
andq %rdi, %rax
ret
* find_zero_low version above:
find_zero_ne_low.constprop.0:
.LFB28:
.cfi_startproc
movabsq $1229782938247303441, %rax
movabsq $-72340172838076673, %rdx
movabsq $-1229782938247303442, %rcx
xorq %rdi, %rax
xorq %rdi, %rcx
addq %rdx, %rax
addq %rdi, %rdx
notq %rdi
andq %rcx, %rax
andq %rdi, %rdx
movabsq $-9187201950435737472, %rdi
notq %rax
orq %rdx, %rax
andq %rdi, %rax
ret
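
For anyone who wants to experiment with the composed variants outside the
glibc tree, the following is a minimal standalone sketch. It rests on
assumptions that are not taken from the patch itself: op_t is fixed to
uint64_t here, and repeat_bytes and find_zero_low are reconstructed from the
expressions quoted in this thread rather than copied from the patch. The
check_examples function is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 64-bit word.  The real patch selects op_t per target.  */
typedef uint64_t op_t;

/* Replicate the byte value C into every byte of an op_t.
   (op_t) -1 / 0xff is 0x0101...01, as in the quoted patch lines.  */
static inline op_t
repeat_bytes (unsigned char c)
{
  return ((op_t) -1 / 0xff) * c;
}

/* Set bit 7 of each byte of X that is zero.  The subtraction borrows into
   higher bytes, so the mask is only reliable up to and including the
   least significant zero byte.  Reconstructed from the thread.  */
static inline op_t
find_zero_low (op_t x)
{
  op_t lsb = repeat_bytes (0x01);
  op_t msb = repeat_bytes (0x80);
  return (x - lsb) & ~x & msb;
}

/* Composed form suggested in the thread: X1 byte is zero, or X1 == X2.  */
static inline op_t
find_zero_eq_low (op_t x1, op_t x2)
{
  return find_zero_low (x1) | find_zero_low (x1 ^ x2);
}

/* Composed form from this message: X1 byte is zero, or X1 != X2.
   The extra XOR flips the "equal" mask into a "not equal" mask.  */
static inline op_t
find_zero_ne_low (op_t x1, op_t x2)
{
  op_t x = repeat_bytes (0x80);
  return find_zero_low (x1) | (find_zero_low (x1 ^ x2) ^ x);
}

/* Hypothetical self-check; byte 0 is the least significant byte.  */
void
check_examples (void)
{
  /* Only byte 2 is zero.  */
  assert (find_zero_low (UINT64_C (0x4141414141004141))
          == UINT64_C (0x0000000000800000));

  /* The words differ only in byte 2, and X1 has no zero byte.  */
  op_t x1 = UINT64_C (0x4142434445464748);
  op_t x2 = UINT64_C (0x4142434445994748);
  assert (find_zero_eq_low (x1, x2) == UINT64_C (0x8080808080008080));
  assert (find_zero_ne_low (x1, x2) == UINT64_C (0x0000000000800000));

  /* Limitation of this sketch: when x1 ^ x2 has a 0x01 byte directly
     above a zero byte, find_zero_low flags both, and the XOR then clears
     the genuine mismatch in byte 1.  */
  assert (find_zero_ne_low (UINT64_C (0x4141414141414041),
                            UINT64_C (0x4141414141414141)) == 0);
}
```

Calling check_examples () from a small test driver exercises all three
helpers. The last assertion is worth keeping in mind when weighing the
composed find_zero_ne_low against the patch version: because find_zero_low
is only exact up to the first zero byte, inverting its result can hide a
mismatch above an equal byte.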