This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
RE: Gcc builtin review: strcpy, stpcpy, strcat, stpcat?
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: 'Ondřej Bílka' <neleai at seznam dot cz>
- Cc: "'Richard Earnshaw'" <Richard dot Earnshaw at foss dot arm dot com>, "GNU C Library" <libc-alpha at sourceware dot org>
- Date: Wed, 10 Jun 2015 11:35:30 +0100
- Subject: RE: Gcc builtin review: strcpy, stpcpy, strcat, stpcat?
- Authentication-results: sourceware.org; auth=none
- References: <A610E03AD50BFC4D95529A36D37FA55E769B14FEFF at GEORGE dot Emea dot Arm dot com> <000901d09ecd$5dc2b4b0$19481e10$ at com> <20150609085323 dot GB26925 at domone>
> Ondřej Bílka wrote:
> On Thu, Jun 04, 2015 at 02:50:07PM +0100, Wilco Dijkstra wrote:
> > The usual problem of knowing whether all targets define assembler versions of
> > stpcpy applies - so I don't think it is a good idea to change all strcpy into
> > stpcpy in general. The only useful case is strcpy(x,y)+strlen(x) which could
> > potentially give a major speedup.
> >
> Then it's a situation where the decision depends on implementation details,
> as on some architectures you could save some cycles with stpcpy itself.
Yes, I think the optimization to convert strcpy into stpcpy would need
to be done in a target-specific way in the GLIBC headers, for targets where it
makes sense. It's not something you could easily do in GCC, as stpcpy is
not a standard C function. In general it is best to optimize towards simpler,
standard C90 functions (e.g. mempcpy->memcpy, even though mempcpy might
be a better ABI to standardize on).
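To make the strcpy(x,y)+strlen(x) case concrete, here is a minimal sketch.
The helper names copy_then_len and copy_with_stpcpy are hypothetical and
exist only for illustration. stpcpy is specified by POSIX.1-2008 rather than
ISO C, which is why the sketch sets a feature-test macro.

```c
#define _POSIX_C_SOURCE 200809L  /* stpcpy is POSIX.1-2008, not ISO C */
#include <string.h>

/* The pattern from the discussion: copy, then rescan dst to find its end. */
char *copy_then_len(char *dst, const char *src)
{
    return strcpy(dst, src) + strlen(dst);
}

/* Same result in one pass: stpcpy returns a pointer to the
   terminating NUL it wrote into dst. */
char *copy_with_stpcpy(char *dst, const char *src)
{
    return stpcpy(dst, src);
}
```

Both helpers return the same pointer for the same input; whether the second
is actually faster depends on how the target implements stpcpy.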
> As for useful cases, on the gcc thread I said that gcc could use the available
> length to convert strchr to memchr, and similar optimizations, so strcpy
> will be called more.
>
> As for the cache issues I mentioned, so far I have measured mostly noise. I know
> that overall stpcpy is often called five times less than strcpy, so
> the potential is there, but it depends on the actual savings when strcpy costs
> a cycle less.
> Data about strcpy and stpcpy when running make of zlib with Debian gcc-5
> follow:
>
> ./summary_strcpy calls 52218 average n: 71.0
> ./summary_stpcpy calls 4950 average n: 7.5
This says that stpcpy processes only 1% of the data that strcpy does,
so optimization of strcpy is 100 times more important. I.e.
slowing down strcpy just to share code with stpcpy does not make any sense.
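The 1% figure follows from the two summary lines, assuming "average n" is the
average number of bytes copied per call. A small sketch of the arithmetic (the
function names are made up for illustration):

```c
/* Total bytes handled = number of calls * average length per call. */
double total_bytes(unsigned long calls, double average_n)
{
    return (double)calls * average_n;
}

/* stpcpy traffic as a percentage of strcpy traffic, using the
   numbers from the zlib build quoted above. */
double stpcpy_share_percent(void)
{
    double strcpy_bytes = total_bytes(52218, 71.0);  /* 3707478 bytes */
    double stpcpy_bytes = total_bytes(4950, 7.5);    /* 37125 bytes */
    return 100.0 * stpcpy_bytes / strcpy_bytes;      /* roughly 1 percent */
}
```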
Also, given the relatively small strings, the generic version of stpcpy would
be quite competitive already (the generic version using strlen+memcpy was
beating optimized strcpy/stpcpy implementations on several targets at the
time I made the change). So I'm just not convinced stpcpy needs a lot more
optimization.
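For reference, a strlen+memcpy version of stpcpy can be sketched as below.
This is an illustration of the approach, not a copy of the GLIBC source, and
the name generic_stpcpy is hypothetical (chosen to avoid clashing with the
libc symbol).

```c
#include <string.h>

/* Sketch of stpcpy built from strlen and memcpy. */
char *generic_stpcpy(char *dst, const char *src)
{
    size_t len = strlen(src);
    /* Copy len characters plus the terminating NUL, then return a
       pointer to that NUL inside dst. */
    return (char *)memcpy(dst, src, len + 1) + len;
}
```

The appeal of this form is that it reuses whatever optimized strlen and
memcpy the target already provides.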
Wilco