This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

[PING^5][PATCH 3/2] Use strspn/strcspn/strpbrk ifunc in internal calls.

From: Ondřej Bílka <neleai at seznam dot cz>
To: Carlos O'Donell <carlos at redhat dot com>
Cc: libc-alpha at sourceware dot org
Date: Sat, 24 May 2014 01:23:13 +0200
Subject: [PING^5][PATCH 3/2] Use strspn/strcspn/strpbrk ifunc in internal calls.
Authentication-results: sourceware.org; auth=none
References: <5318A03D dot 3000705 at redhat dot com> <20140306163241 dot GA11843 at domone dot podge> <5318B58B dot 5040704 at redhat dot com> <20140306205212 dot GB11843 at domone dot podge> <53192422 dot 2050101 at redhat dot com> <20140318100138 dot GC8415 at domone dot podge> <20140327211806 dot GA23645 at domone dot podge> <20140405144841 dot GA25242 at domone dot podge> <20140412192447 dot GA1608 at domone dot podge> <20140512120011 dot GB7220 at domone dot podge>

ping
On Mon, May 12, 2014 at 02:00:11PM +0200, Ondřej Bílka wrote:
> ping
> On Sat, Apr 12, 2014 at 09:24:47PM +0200, Ondřej Bílka wrote:
> > On Sat, Apr 05, 2014 at 04:48:41PM +0200, Ondřej Bílka wrote:
> > > ping
> > > On Thu, Mar 27, 2014 at 10:18:06PM +0100, Ondřej Bílka wrote:
> > > > ping
> > > > On Tue, Mar 18, 2014 at 11:01:38AM +0100, Ondřej Bílka wrote:
> > > > > > To make strtok faster and improve performance in general, we need to make one
> > > > > > additional change.
> > > > > 
> > > > > A comment:
> > > > > 
> > > > > /* It doesn't make sense to send libc-internal strcspn calls through a PLT.
> > > > >    The speedup we get from using SSE4.2 instruction is likely eaten away
> > > > >    by the indirect call in the PLT.  */
> > > > > 
> > > > > > This claim does not make sense at all, because nobody bothered to check it. The
> > > > > > gap between these implementations is quite large: when the haystack is empty,
> > > > > > the sse2 version is around 40 cycles slower because it needs to populate a
> > > > > > lookup table, and the difference only increases with size. That is much larger
> > > > > > than the PLT slowdown, which is a few cycles.
> > > > > 
> > > > > > Even the benchtest shows a gap, which could also be reversed by branch
> > > > > > misprediction, but my internal benchmark showed it as well.
> > > > > 
> > > > > >                                         simple_strspn  stupid_strspn  __strspn_sse42  __strspn_sse2
> > > > > > Length    0, alignment  0, acc len  6:        18.6562        35.2344         17.0469        61.6719
> > > > > > Length    6, alignment  0, acc len  6:        59.5469        72.5781         16.4219         73.625
> > > > > 
> > > > > > This patch also handles strpbrk, which is implemented by including the
> > > > > > x86_64/multiarch/strcspn.S file.
> > > > > 
> > > > > 	* sysdeps/x86_64/multiarch/strspn.S: Remove plt indirection.
> > > > > 	* sysdeps/x86_64/multiarch/strcspn.S: Likewise.
> > > > > 
> > > > > diff --git a/sysdeps/x86_64/multiarch/strcspn.S b/sysdeps/x86_64/multiarch/strcspn.S
> > > > > index 24f55e9..1b3e1aa 100644
> > > > > --- a/sysdeps/x86_64/multiarch/strcspn.S
> > > > > +++ b/sysdeps/x86_64/multiarch/strcspn.S
> > > > > @@ -65,14 +65,7 @@ END(STRCSPN)
> > > > >  # undef END
> > > > >  # define END(name) \
> > > > >  	cfi_endproc; .size STRCSPN_SSE2, .-STRCSPN_SSE2
> > > > > -# undef libc_hidden_builtin_def
> > > > > -/* It doesn't make sense to send libc-internal strcspn calls through a PLT.
> > > > > -   The speedup we get from using SSE4.2 instruction is likely eaten away
> > > > > -   by the indirect call in the PLT.  */
> > > > > -# define libc_hidden_builtin_def(name) \
> > > > > -	.globl __GI_STRCSPN; __GI_STRCSPN = STRCSPN_SSE2
> > > > >  #endif
> > > > > -
> > > > >  #endif /* HAVE_SSE4_SUPPORT */
> > > > >  
> > > > >  #ifdef USE_AS_STRPBRK
> > > > > diff --git a/sysdeps/x86_64/multiarch/strspn.S b/sysdeps/x86_64/multiarch/strspn.S
> > > > > index bf7308e..fde1e1e 100644
> > > > > --- a/sysdeps/x86_64/multiarch/strspn.S
> > > > > +++ b/sysdeps/x86_64/multiarch/strspn.S
> > > > > @@ -50,12 +50,6 @@ END(strspn)
> > > > >  # undef END
> > > > >  # define END(name) \
> > > > >  	cfi_endproc; .size __strspn_sse2, .-__strspn_sse2
> > > > > -# undef libc_hidden_builtin_def
> > > > > -/* It doesn't make sense to send libc-internal strspn calls through a PLT.
> > > > > -   The speedup we get from using SSE4.2 instruction is likely eaten away
> > > > > -   by the indirect call in the PLT.  */
> > > > > -# define libc_hidden_builtin_def(name) \
> > > > > -	.globl __GI_strspn; __GI_strspn = __strspn_sse2
> > > > >  #endif
> > > > >  
> > > > >  #endif /* HAVE_SSE4_SUPPORT */
> > > > 
> > > > -- 
> > > > 
> > > > Too many little pins on CPU confusing it, bend back and forth until 10-20% are neatly removed. Do _not_ leave metal bits visible!
> > > 
> > > -- 
> > > 
> > > Look, buddy:  Windows 3.1 IS A General Protection Fault.
> > 
> > -- 
> > 
> > Failure to adjust for daylight savings time.
> 
> -- 
> 
> monitor VLF leakage

-- 

Stale file handle (next time use Tupperware(tm)!)
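
As a rough illustration of the lookup-table cost mentioned in the quoted message, here is a minimal C sketch of a table-driven strspn. The function name table_strspn is hypothetical, and this is not glibc's __strspn_sse2 code; it only shows why a table-based approach pays a fixed setup cost before it looks at the input, even when the input string is empty.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration only; not the glibc implementation.
   Returns the length of the initial segment of s that consists
   entirely of bytes found in accept.  */
static size_t
table_strspn (const char *s, const char *accept)
{
  unsigned char table[256];

  /* Fixed setup cost: clear the table and mark every accepted byte.
     This work is done even if s turns out to be the empty string.  */
  memset (table, 0, sizeof table);
  for (const unsigned char *a = (const unsigned char *) accept; *a != 0; a++)
    table[*a] = 1;

  /* Scan: table[0] is never set, so the terminating NUL stops the loop.  */
  const unsigned char *p = (const unsigned char *) s;
  while (table[*p])
    p++;

  return (size_t) (p - (const unsigned char *) s);
}
```

For an empty s the scan loop exits immediately, so essentially all of the time goes into filling the table. That would be consistent with the fixed overhead reported for the "Length 0" row, though the real assembly differs in detail.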

References:
- [PING^4][PATCH 3/2] Use strspn/strcspn/strpbrk ifunc in internal calls.
  - From: Ondřej Bílka