This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Improve generic strcspn performance
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: 'GNU C Library' <libc-alpha at sourceware dot org>, "adhemerval dot zanella at linaro dot org" <adhemerval dot zanella at linaro dot org>
- Cc: nd <nd at arm dot com>
- Date: Mon, 18 Jan 2016 19:09:38 +0000
- Subject: Re: [PATCH] Improve generic strcspn performance
- References: <AM3PR08MB0088F82AB469E058650B493283F60 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com>
Adhemerval Zanella Netto - Jan. 8, 2016, 8:05 p.m. wrote:
> > + if (reject[0] == '\0')
> > + return strlen (str);
> > + if (reject[1] == '\0')
> > + return __strchrnul (str, reject [0]) - str;
>
> I am not sure how often strcspn is used with empty or one char argument to
> validate this optimization in specific since it adds more branch cases for
> more general inputs.
An empty string is extremely unlikely; however, one- and two-character strings seem to
occur frequently (grep the GLIBC sources for str(c)spn/strpbrk). My goal was
to get rid of the odd inlines in the headers and enable the generic C implementation
to beat the special cases by a good margin. Compared to the overhead of
initializing the table, these extra checks cost very little (and once you check
for a single-character string, you also need to check for an empty string).
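
For reference, the shape being discussed can be sketched roughly as below. This is
only an illustration, not the patch itself: the name sketch_strcspn is made up, and
the single-character path uses standard strchr plus a strlen fallback instead of
__strchrnul so that it builds without GNU extensions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration of the fast paths plus the table lookup;
   not the code from the patch.  */
static size_t
sketch_strcspn (const char *str, const char *reject)
{
  /* Empty reject set: nothing can match, so the span is the whole string.  */
  if (reject[0] == '\0')
    return strlen (str);

  /* Single-character reject set: a plain character search is enough.
     The patch uses __strchrnul here; strchr is a portable stand-in.  */
  if (reject[1] == '\0')
    {
      const char *p = strchr (str, reject[0]);
      return p != NULL ? (size_t) (p - str) : strlen (str);
    }

  /* General case: build a 256-entry table of rejected bytes.  */
  unsigned char table[256];
  memset (table, 0, sizeof table);

  const unsigned char *r = (const unsigned char *) reject;
  do
    table[*r] = 1;
  while (*r++ != '\0');   /* Also marks NUL, so the scan below stops at the end.  */

  const unsigned char *s = (const unsigned char *) str;
  while (!table[*s])
    s++;
  return (size_t) (s - (const unsigned char *) str);
}
```

The two early checks are a compare and branch each, which is small next to clearing
and filling 256 bytes of table.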
> > - return count;
> > + /* Use multiple small memsets to enable inlining on most targets. */
> > + p = memset (table, 0, 64);
> > + memset (p + 64, 0, 64);
> > + memset (p + 128, 0, 64);
> > + memset (p + 192, 0, 64);
>
> It is unfortunate we need to use this to force inline instead to let the
> compiler handle it directly (and also simplifying the code by using
> c99 initializers). I noted x86_64 does no inline, although aarch64 and
> powerpc64le calls memset. How bad is avoiding this explicit calls now
> and work on compiler side to detect this aligned memset?
Yes, but unfortunately inlining of memset is essential to get reasonable
performance on small sizes. E.g. for sizes 30-60 the overhead of not inlining
is 25-30% on Cortex-A57.
We could maybe add a --param max-inline-memset=N option to a future GCC for
building GLIBC (or just these files), however this doesn't help when GLIBC is
built with any current GCC version.
Another possibility might be to write a loop with stores of size_t and build with
a huge value for max-completely-peeled-insns. Or just give up and use macros
to write out all stores explicitly...
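
To make the two options concrete, here is a rough sketch of both ways of clearing
the 256-byte table. The function and type names are hypothetical. Whether the
memsets get inlined or the loop gets fully unrolled depends on the target,
compiler version and flags, so treat this as an illustration of the idea rather
than a guarantee of the generated code.

```c
#include <stddef.h>
#include <string.h>

/* Option 1: four fixed-size memsets, as in the quoted hunk.  Each call is
   small enough that many targets expand it inline.  */
static void
clear_table_memset (unsigned char *table)
{
  unsigned char *p = memset (table, 0, 64);
  memset (p + 64, 0, 64);
  memset (p + 128, 0, 64);
  memset (p + 192, 0, 64);
}

/* Option 2: a loop of size_t stores.  The union (an assumption made for
   this sketch) keeps the table word-aligned and lets it be accessed both
   as bytes and as words.  */
typedef union
{
  unsigned char bytes[256];
  size_t words[256 / sizeof (size_t)];
} sketch_table_t;

static void
clear_table_words (sketch_table_t *t)
{
  /* Fixed trip count, so the compiler may peel the loop completely if
     its unrolling limits allow it.  */
  for (size_t i = 0; i < 256 / sizeof (size_t); i++)
    t->words[i] = 0;
}
```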
Wilco