This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Builtin expansion versus headers optimization: Reductions
- From: Andi Kleen <andi at firstfloor dot org>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: gcc at gcc dot gnu dot org, law at redhat dot org, libc-alpha at sourceware dot org
- Date: Fri, 05 Jun 2015 09:40:45 -0700
- Subject: Re: Builtin expansion versus headers optimization: Reductions
- Authentication-results: sourceware.org; auth=none
- References: <20150604105929 dot GA19141 at domone> <87fv67nonj dot fsf at tassilo dot jf dot intel dot com> <20150605090203 dot GA16032 at domone>
Ondřej Bílka <neleai@seznam.cz> writes:
>
> On Ivy Bridge I found that using rep stosq for memset(x,0,4096) is 20%
> slower than the libcall for L1-cache-resident data, but 50% faster for
> data outside the cache. How do you teach the compiler that?
In theory it would be possible with AutoFDO: profile with a cache-miss
event, correlate the samples with the code, and maintain that information
in addition to the basic block frequencies.
Probably not simple, but definitely possible.
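
A rough sketch of what the "correlate" step could feed into, in C. Everything
here is hypothetical: the struct site_profile record, the choose_memset
helper, and the 0.5 threshold are illustrative assumptions, not part of
AutoFDO or GCC. It only shows how a cache-miss sample count kept next to the
block frequency could pick between the libcall and an inline rep stosq.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-call-site profile record (not an AutoFDO structure). */
struct site_profile {
    uint64_t exec_count;    /* basic block frequency of the call site */
    uint64_t miss_samples;  /* cache-miss samples attributed to the site */
    uint64_t sample_period; /* number of events one sample stands for */
};

enum memset_strategy { USE_LIBCALL, USE_REP_STOSQ };

/* Estimated cache misses per execution of the call site. */
static double misses_per_exec(const struct site_profile *p)
{
    if (p->exec_count == 0)
        return 0.0; /* no frequency data: assume cache-resident */
    return (double)p->miss_samples * (double)p->sample_period
           / (double)p->exec_count;
}

/*
 * Pick a strategy for memset(x, 0, bytes). The 0.5 cutoff is an arbitrary
 * assumption: if at least half of the touched cache lines miss, treat the
 * data as being outside the cache and prefer rep stosq.
 */
static enum memset_strategy choose_memset(const struct site_profile *p,
                                          size_t bytes, size_t line_size)
{
    assert(line_size != 0);
    size_t lines = (bytes + line_size - 1) / line_size;
    if (lines == 0)
        return USE_LIBCALL;
    double miss_fraction = misses_per_exec(p) / (double)lines;
    return miss_fraction >= 0.5 ? USE_REP_STOSQ : USE_LIBCALL;
}
```

For a 4096-byte memset with 64-byte lines, a site with no miss samples stays
on the libcall, while a site averaging 64 misses per execution would be
expanded inline. A real implementation would have to tune the cutoff per
microarchitecture.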
-Andi
--
ak@linux.intel.com -- Speaking for myself only