This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



RE: [PATCH 4/*] Generic string memchr and strnlen


> Ondřej Bílka wrote:
> On Fri, Jul 24, 2015 at 05:38:43PM +0100, Wilco Dijkstra wrote:
> > > Ondřej Bílka wrote:
> > > On Fri, Jul 24, 2015 at 04:10:24PM +0100, Wilco Dijkstra wrote:
> > > > Getting back to this, if you don't have an optimized strnlen then
> > > > it is always better to try to use memchr (there are 14 optimized
> > > > implementations of memchr but only 6 for strnlen).
> > > >
> > > > So I'd suggest changing strnlen in an independent patch as:
> > > >
> > > > size_t
> > > > __strnlen (const char *str, size_t n)
> > > > {
> > > >   const char *ret = __memchr (str, 0, n);
> > > >   return ret ? ret - str : n;
> > > > }
> > > >
> > > > It also looks worthwhile to express strlen and rawmemchr as memchr
> > > > so that you only need one highly optimized function rather than many.
> > > > Deferring to more widely implemented optimized assembler functions
> > > > should result in better performance than trying to optimize these
> > > > functions in C.
> > > >
> > > No, that is a bad idea. Unless you inline strnlen or memchr, you add
> > > extra call overhead.
> >
> > The goal is to call the optimized assembler version of memchr when there
> > isn't one for strnlen - you could inline the above in headers if a target
> > decides that there will only be an optimized memchr and not a strnlen
> > (assuming that strnlen shows similar performance as memchr on a particular
> > target).
> >
> Which, as I explained, is worse than the alternatives unless you are saving size.

Which alternatives? I didn't see a mention of an alternative that would
actually be faster.
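
For reference, the strnlen-via-memchr suggestion quoted above can be written
as a self-contained sketch. The name strnlen_via_memchr is hypothetical and
the public memchr stands in for glibc's internal __memchr; this is only an
illustration of the idea, not the proposed patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name, not a glibc symbol.  Mirrors the quoted __strnlen
   suggestion, using the public memchr instead of the internal __memchr.  */
size_t
strnlen_via_memchr (const char *str, size_t n)
{
  assert (str != NULL);
  /* memchr returns a pointer to the first NUL within the first n bytes,
     or NULL when none of those bytes is NUL.  */
  const char *end = memchr (str, 0, n);
  return end != NULL ? (size_t) (end - str) : n;
}
```

Whether this wins depends on the target: it only pays off where memchr has an
optimized implementation and strnlen does not, and unless it is inlined it
adds one extra call.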

> > > That is unless you want to claim that you want to save size.
> > >
> > > As for optimized implementations of strnlen vs memchr it isn't clear
> > > that we will delete all of them as they are slower.
> >
> > Delete what? We could certainly decide on a core set of functions which
> > every target should implement in assembler. Candidates are memcpy, memset,
> > memmove, memchr, strchr, strlen. Then for those we do not try to provide
> > an optimized C implementation as it won't ever be used. But deleting them
> > seems a bridge too far.
> >
> This patch is about generic string functions. When they have good
> performance they will replace the current ones for architectures. So soon
> there won't be an architecture where that holds.

I'd find it hard to believe you can beat assembly implementations. Do you
have any performance results for your patches? There were a lot of patches
posted but I don't recall any performance results in any.

> > > Also it's the wrong way to solve it. An architecture maintainer should
> > > add an optimized strnlen implementation; that is quite easy when you
> > > have a memchr implementation: add a few macros for the different start
> > > and end handling.
> >
> > The problem with the non-standard functions that are rarely used is that
> > there are very few optimized implementations. We can't force maintainers to
> > implement all string functions in assembler, so the generic code should use
> > the fastest possible alternative if there isn't an optimized implementation.
> > And that is pretty much always a more commonly used function which does
> > have an optimized implementation.
> >
> But that isn't what I said. I said that if there is an optimized
> memchr implementation, then assembly for the other functions is trivial
> for a maintainer to add. That gives you better performance.

That's only possible in a few cases. I'm talking about missing optimized
implementations. Are you saying we should continue to use slow C code rather
than trying to call an optimized assembler function?

> > > The suggestion to express strlen as memchr would just cause a regression.
> > > On my system there were 9535682 calls to strlen, while memchr was called
> > > just 11633 times and rawmemchr 1742 times.
> >
> > Why would it cause a regression? If you don't have an optimized strlen,
> > what other implementation would be the fastest alternative?
> >
> It would be my generic strlen implementation. If you don't have an
> optimized strlen then you certainly don't have an optimized memchr, which
> is called 819 times less often.

Well I'd like to see results that show a C version of strlen beating an
optimized memchr on x64. Still it seems to me there is no real need for an
optimized C version of strlen - every target already provides an optimized
version and it is hard to believe it is possible to beat those.

> > > Also the purpose of strlen and rawmemchr is to be faster than memchr.
> > > Again, these should be implemented by the architecture maintainer by
> > > removing the size checks from the memchr implementation.
> >
> > Yes it would be perfect if we had optimized assembler implementations for
> > all functions. However that's unfortunately not the case given there is a
> > high cost for creating assembler implementations.
> 
> No, there isn't. If you have an optimized memchr then deriving these is
> simple mechanical work. Just do the equivalent of dead code elimination on
> memchr and you will get strlen.

That's only true for a few cases. Note that given how rarely rawmemchr is
used, it seems better to change any call to rawmemchr into
memchr(s, c, SIZE_MAX) - the gain due to cache sharing should far outweigh
the small loss due to the extra length checks.
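
As a rough sketch of that idea (the helper names below are hypothetical, not
glibc symbols), rawmemchr and strlen can both be expressed as an unbounded
memchr. This assumes memchr stops at the first match and never reads beyond
it, which C11 describes as reading the characters sequentially; the caller
must guarantee that the byte is actually present:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: rawmemchr expressed as memchr with no effective
   bound.  The caller must guarantee that c occurs somewhere in s.  */
void *
rawmemchr_via_memchr (const void *s, int c)
{
  assert (s != NULL);
  return memchr (s, c, SIZE_MAX);
}

/* Hypothetical helper: strlen in the same style, searching for the
   terminating NUL that every C string is guaranteed to have.  */
size_t
strlen_via_memchr (const char *s)
{
  const char *end = rawmemchr_via_memchr (s, 0);
  return (size_t) (end - s);
}
```

Some compilers may warn about the SIZE_MAX bound. The trade-off is the one
described above: one shared memchr stays hot in the cache, at the cost of
length checks that rawmemchr itself would not need.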

Wilco


