This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH 2/*] Optimize generic strchrnul and strchr


Ondřej Bílka wrote:
> This is my generic strchr algorithm resubmitted to use skeleton.
>
> Idea to split into cases c<128 and c>128 didn't change.

Why do this?

> So comments? How this perform on different architectures?

In my view, using 9 operations for a combined zero check and a test
for another character is too many; it should be 5-7 operations at
most (the general zero check is (x - 0x01010101) & ~x & 0x80808080,
which is just 3 operations).
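
As a rough sketch of the kind of operation count meant here, assuming
32-bit words: the helper names (has_zero, has_zero_or_char) are made up
for illustration and are not taken from the patch. The combined test
below shares the final AND with 0x80808080, so it comes to roughly 7
operations once the repeated-character constant is hoisted out of the
loop, and fewer on targets with an and-not instruction.

```c
#include <stdint.h>

#define ONES  0x01010101u
#define HIGHS 0x80808080u

/* Nonzero iff at least one byte of x is zero (sub, and-not, and). */
static inline uint32_t has_zero(uint32_t x)
{
    return (x - ONES) & ~x & HIGHS;
}

/* Nonzero iff at least one byte of x is zero or equals c.
   XORing with c repeated in every byte turns "byte == c" into
   "byte == 0", so the same zero check can be reused.  */
static inline uint32_t has_zero_or_char(uint32_t x, unsigned char c)
{
    uint32_t cmask = ONES * c;   /* c in every byte; loop-invariant */
    uint32_t y = x ^ cmask;
    return (((x - ONES) & ~x) | ((y - ONES) & ~y)) & HIGHS;
}
```

The result only says whether a match exists somewhere in the word;
locating it is a separate step.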

You can optimize things further by calculating partial masks for each
of the unrolled cases, ORing them together and doing only a single test
per loop iteration rather than 4 or 8. This also avoids adding a lot of
code and branches to the inner loop, which would make the unrolling
pointless.
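
A minimal sketch of that loop shape, unrolled 4 times. The function
names are hypothetical, and for simplicity it assumes the caller passes
an array of 32-bit words whose length is a multiple of 4 and which is
guaranteed to contain a NUL byte; a real implementation would handle
alignment and the string head separately.

```c
#include <stddef.h>
#include <stdint.h>

#define ONES  0x01010101u
#define HIGHS 0x80808080u

/* Nonzero iff some byte of x is zero or equals the byte replicated
   in cmask.  */
static inline uint32_t match_mask(uint32_t x, uint32_t cmask)
{
    uint32_t y = x ^ cmask;
    return (((x - ONES) & ~x) | ((y - ONES) & ~y)) & HIGHS;
}

/* Index of the first word containing NUL or c.
   Assumption: w holds a multiple of 4 words and contains a NUL.  */
static size_t first_matching_word(const uint32_t *w, unsigned char c)
{
    uint32_t cmask = ONES * c;

    for (size_t i = 0;; i += 4) {
        uint32_t m0 = match_mask(w[i + 0], cmask);
        uint32_t m1 = match_mask(w[i + 1], cmask);
        uint32_t m2 = match_mask(w[i + 2], cmask);
        uint32_t m3 = match_mask(w[i + 3], cmask);

        /* One branch per iteration on the ORed masks.  */
        if ((m0 | m1 | m2 | m3) != 0) {
            /* Off the hot path: work out which word matched.  */
            if (m0) return i;
            if (m1) return i + 1;
            if (m2) return i + 2;
            return i + 3;
        }
    }
}
```

The per-word checks only run once, after the loop has already found a
match, so the inner loop stays a straight line of loads and ALU
operations with a single exit branch.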

The other thing is support for big-endian - this is generally tricky as
the mask returned by the zero check won't work even if byte-reversed.
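
One way to see the problem, as an illustration rather than a statement
of what the patch does: the subtraction borrows from low bytes into
high bytes, so the short form can also flag a 0x01 byte sitting just
above a real zero byte. On little-endian the lowest flag is still the
first match in memory, but on big-endian the first byte in memory is
the most significant one, and the highest flag may be one of those
false positives. The function names below are made up; the second one
is the well-known carry-free variant that flags exactly the zero bytes.

```c
#include <stdint.h>

/* Short form: correct as a yes/no test, but bytes above a zero byte
   can be flagged as well because of borrow propagation.  */
static inline uint32_t fast_zero_mask(uint32_t x)
{
    return (x - 0x01010101u) & ~x & 0x80808080u;
}

/* Carry-free form: bit 7 of each byte is set iff that byte is zero.
   No carry crosses a byte boundary, since 0x7f + 0x7f = 0xfe.  */
static inline uint32_t exact_zero_mask(uint32_t x)
{
    uint32_t m = 0x7f7f7f7fu;
    return ~(((x & m) + m) | x | m);
}

/* Example word 0x41420100, bytes from most significant: 41 42 01 00.
   fast_zero_mask  gives 0x00008080 (the 0x01 byte is flagged too).
   exact_zero_mask gives 0x00000080 (only the real zero byte).  */
```

On a big-endian machine the example word is the memory sequence
41 42 01 00, so picking the highest flag from the short form would
report the 0x01 byte instead of the terminator.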

Finally first_nonzero_byte should just use __builtin_ffsl (yet another
function that should be inlined by default in the generic string.h...).
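
For illustration, a sketch of what that could look like on a
little-endian target with GCC or Clang. The name first_flagged_byte is
hypothetical and the patch's first_nonzero_byte may have a different
signature; the sketch assumes the mask has bit 7 set in each matching
byte and is nonzero.

```c
/* Byte index (0 = lowest-addressed byte on little-endian) of the
   first flagged byte in mask.  __builtin_ffsl returns one plus the
   index of the least significant set bit, or 0 for a zero argument.
   Assumption: mask != 0.  */
static inline unsigned int first_flagged_byte(unsigned long mask)
{
    return (unsigned int) (__builtin_ffsl((long) mask) - 1) / 8;
}
```

With flags in bit 7 of each byte, a mask of 0x8000 yields index 1 and a
mask of 0x80800000 yields index 2.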

Wilco


