This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Save and restore xmm0-xmm7 in _dl_runtime_resolve


On Sat, Jul 11, 2015 at 04:50:02PM -0700, H.J. Lu wrote:
> On Sat, Jul 11, 2015 at 01:27:42PM -0700, H.J. Lu wrote:
> > > On Sat, Jul 11, 2015 at 12:46:54PM +0200, Ondřej Bílka wrote:
> > > On Thu, Jul 09, 2015 at 09:07:24AM -0700, H.J. Lu wrote:
> > > > > On Thu, Jul 9, 2015 at 7:28 AM, Ondřej Bílka <neleai@seznam.cz> wrote:
> > > > > On Thu, Jul 09, 2015 at 07:12:24AM -0700, H.J. Lu wrote:
> > > > >> On Thu, Jul 9, 2015 at 6:37 AM, Zamyatin, Igor <igor.zamyatin@intel.com> wrote:
> > > > >> >> On Wed, Jul 8, 2015 at 8:56 AM, Zamyatin, Igor <igor.zamyatin@intel.com>
> > > > >> >> wrote:
> > > > >> >> > Fixed in the attached patch
> > > > >> >> >
> > > > >> >>
> > > > >> >> I fixed some typos and updated sysdeps/i386/configure for
> > > > >> >> HAVE_MPX_SUPPORT.  Please verify both with HAVE_MPX_SUPPORT and
> > > > >> >> without on i386 and x86-64.
> > > > >> >
> > > > >> > Done, all works fine
> > > > >> >
> > > > >>
> > > > >> I checked it in for you.
> > > > >>
> > > > > These are nice but you could have same problem with lazy tls allocation.
> > > > > I wrote patch to merge trampolines, which now conflicts. Could you write
> > > > > similar patch to solve that? Original purpose was to always save xmm
> > > > > registers so we could use sse2 routines which speeds up lookup time.
> > > > 
> > > > So we will preserve only xmm0 to xmm7 in _dl_runtime_resolve? How
> > > > much gain will it give us?
> > > >
> > > I couldn't measure that without a patch. The gain now would be big, as we
> > > currently use a byte-by-byte loop to check symbol names, which is slow,
> > > especially with C++ name mangling. Would the following benchmark be good
> > > for measuring the speedup, or do I need to measure startup time, which is
> > > a bit harder?
> > > 
> > 
> > Please try this.
> > 
> 
> We have to use movups instead of movaps due to
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> 
>
Thanks, this looks promising.

I am thinking about how to do a definitive benchmark. For now I have
evidence that it is likely an improvement, but it is not definitive.

I found that the benchmark I intended to use causes too much noise, and I
haven't gotten anything useful from it yet. It creates 1000 functions in a
library and calls them from main; performance between runs varies by a
factor of 3 for the same implementation.

I do have indirect evidence. With the attached patch to use SSE2 routines,
I decreased the startup time of the binaries run by "make bench" by ~6000
cycles, and dlopen time by 4%, on Haswell and Ivy Bridge.

See the results on Haswell of

LD_DEBUG=statistics make bench &> old_rtld

They are large, so you can browse them here:

http://kam.mff.cuni.cz/~ondra/old_rtld
http://kam.mff.cuni.cz/~ondra/new_rtld

For the dlopen benchmark I measure, ten times, the performance of

dlsym (RTLD_DEFAULT, "memcpy");
dlsym (RTLD_DEFAULT, "strlen");

Without the patch I get
 624.49  559.58  556.6  556.04  558.42  557.86  559.46  555.17  556.93  555.32
and with the patch
  604.71  536.74  536.08  535.78  534.11  533.67  534.8  534.8  533.46  536.08

I attached the WIP patches; I didn't change memcpy yet.

So if you have an idea how to measure the fixup change directly, it would
be welcome.


Attachment: 0002-dlopen-benchmark.patch
Description: Text document

Attachment: 0004-rtld.patch
Description: Text document

