This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: Builtin expansion versus headers optimization: Reductions

From: Andi Kleen <andi at firstfloor dot org>
To: Ondřej Bílka <neleai at seznam dot cz>
Cc: gcc at gcc dot gnu dot org, law at redhat dot org, libc-alpha at sourceware dot org
Date: Fri, 05 Jun 2015 09:40:45 -0700
Subject: Re: Builtin expansion versus headers optimization: Reductions
Authentication-results: sourceware.org; auth=none
References: <20150604105929 dot GA19141 at domone> <87fv67nonj dot fsf at tassilo dot jf dot intel dot com> <20150605090203 dot GA16032 at domone>

Ondřej Bílka <neleai@seznam.cz> writes:
>
> On Ivy Bridge I found that using rep stosq for memset(x,0,4096) is 20%
> slower than a libcall for L1-cache-resident data, while 50% faster for data
> outside the cache. How do you teach the compiler that?

It would in theory be possible with AutoFDO. Profile with a cache-miss
event. Correlate the misses with the call sites. Maintain that information
in addition to the basic block frequencies.

Probably not simple, but definitely possible.
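
To make the idea concrete, here is a rough sketch of what a per-call-site
decision could look like if such miss data were available. Everything in it
is an assumption for illustration: struct site_profile, its fields, the 50%
threshold, and the helper names are hypothetical and are not part of GCC,
AutoFDO, or glibc. A compiler would make this choice at compile time from
the profile; the run-time branch only stands in for that decision.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-call-site profile record (illustrative only). */
struct site_profile {
    unsigned long executions;   /* how often the call site ran */
    unsigned long cache_misses; /* cache-miss samples attributed to it */
};

/* Zero n bytes with rep stosq where available; n must be a multiple of 8. */
static void zero_rep_stosq(void *dst, size_t n)
{
    assert(n % 8 == 0);
#if defined(__x86_64__) && defined(__GNUC__)
    size_t words = n / 8;
    /* rdi = destination, rcx = count of 8-byte words, rax = fill value. */
    __asm__ volatile("rep stosq"
                     : "+D"(dst), "+c"(words)
                     : "a"(0UL)
                     : "memory");
#else
    memset(dst, 0, n); /* portable fallback on other targets */
#endif
}

/* Returns 1 when more than half of the executions saw a cache miss.
   The 50% threshold is an arbitrary assumption, not a measured value. */
static int prefers_inline_expansion(const struct site_profile *p)
{
    if (p->executions == 0)
        return 0;
    return p->cache_misses * 2 > p->executions;
}

/* Zero one 4096-byte block, picking the strategy from the profile. */
static void zero_page(void *dst, const struct site_profile *p)
{
    if (prefers_inline_expansion(p))
        zero_rep_stosq(dst, 4096); /* data usually outside the cache */
    else
        memset(dst, 0, 4096);      /* data usually cache resident: libcall */
}
```

A call site whose profile shows mostly misses would take the rep stosq
path, and one that mostly hits would keep the libcall, matching the two
cases in the quoted measurement.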

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

References:
- Builtin expansion versus headers optimization: Reductions
  - From: Ondřej Bílka
- Re: Builtin expansion versus headers optimization: Reductions
  - From: Andi Kleen
- Re: Builtin expansion versus headers optimization: Reductions
  - From: Ondřej Bílka