This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] avoid initialization overhead in strcoll
- From: OndÅej BÃlka <neleai at seznam dot cz>
- To: Siddhesh Poyarekar <siddhesh at redhat dot com>
- Cc: Leonhard Holz <leonhard dot holz at web dot de>, libc-alpha at sourceware dot org
- Date: Thu, 4 Jun 2015 22:02:58 +0200
- Subject: Re: [PATCH] avoid initialization overhead in strcoll
- Authentication-results: sourceware.org; auth=none
- References: <55673695 dot 8010804 at web dot de> <20150604183837 dot GR32684 at spoyarek dot pnq dot redhat dot com>
On Fri, Jun 05, 2015 at 12:08:37AM +0530, Siddhesh Poyarekar wrote:
> On Thu, May 28, 2015 at 05:39:01PM +0200, Leonhard Holz wrote:
> > This patch removes some initialization overhead in the hot path of strcoll. It
> > improves the file listing benchmark by 20% on x86.
>
> You've removed a memset call which seems to be the reason for the 20%
> overhead; I can't see anything else that could have done this, but I
> haven't looked hard enough.
>
> This change however should be split up to only include the useful
> stuff.
>
> > diff --git a/string/strcoll_l.c b/string/strcoll_l.c
> > index 0fa005f..14bfb8b 100644
> > --- a/string/strcoll_l.c
> > +++ b/string/strcoll_l.c
> > @@ -62,7 +62,6 @@ typedef struct
> > int len; /* Length of the current sequence. */
> > size_t val; /* Position of the sequence relative to the
> > previous non-ignored sequence. */
> > - size_t idxnow; /* Current index in sequences. */
>
> This should go in as a separate change since it is valid without any
> performance justification. idxnow is unused. Please post this as a
> separate patch.
>
> > size_t idxmax; /* Maximum index in sequences. */
> > size_t idxcnt; /* Current count of indices. */
> > size_t backw; /* Current Backward sequence index. */
> > @@ -313,20 +312,22 @@ STRCOLL (const STRING_TYPE *s1, const STRING_TYPE *s2,
> > __locale_t l)
> > assert (((uintptr_t) extra) % __alignof__ (extra[0]) == 0);
> > assert (((uintptr_t) indirect) % __alignof__ (indirect[0]) == 0);
> >
> > - int result = 0, rule = 0;
> > -
> > + int result = 0;
> > coll_seq seq1, seq2;
> > - memset (&seq1, 0, sizeof (seq1));
> > - seq2 = seq1;
> > + seq1.len = 0;
> > + seq1.idxmax = 0;
> > + seq1.rule = 0;
> > + seq2.len = 0;
> > + seq2.idxmax = 0;
>
> I'm surprised that this actually makes a difference. The memset
> should have been optimized out and replaced with movs. The change to
> remove the RULE variable seems unrelated to me and should just be
> avoided if it does not result in any interesting code change..
>
> Can you please post a more detailed justification of what caused the
> performance improvement?
>
Main difference from assembly output is saving 64 bytes on stack space,
otherwise looks like random renaming of jumps.