This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: glibc.cpu.cached_memopt (was Re: [PATCH] Rename the glibc.tune namespace to glibc.cpu)
- From: Siddhesh Poyarekar <siddhesh at sourceware dot org>
- To: Tulio Magno Quites Machado Filho <tuliom at ascii dot art dot br>, Carlos O'Donell <carlos at redhat dot com>, libc-alpha at sourceware dot org, "H.J. Lu" <hjl dot tools at gmail dot com>
- Date: Mon, 6 Aug 2018 16:30:34 +0530
- Subject: Re: glibc.cpu.cached_memopt (was Re: [PATCH] Rename the glibc.tune namespace to glibc.cpu)
- References: <20180716141633.6948-1-siddhesh@sourceware.org> <902a4076-7b87-ea27-bab4-3740ab0a04ec@redhat.com> <25d88c07-1e8f-bd73-cc28-989930a55933@sourceware.org> <87tvozx83k.fsf@linux.ibm.com> <56f276ad-f0da-a075-b5d1-0d03520ea4fd@sourceware.org> <87o9ejgpgf.fsf@linux.ibm.com>
On 08/04/2018 02:03 AM, Tulio Magno Quites Machado Filho wrote:
> Maybe it isn't restricted only to powerpc:
> https://sourceware.org/ml/libc-alpha/2018-08/msg00069.html
> Obviously other machine maintainers may not be interested in cached_memopt,
> but this thread helps me explain why I was thinking cached_memopt was
> generic.
OK.
> Notice the optimization is not specific to a CPU, but specific to a user
> scenario (cacheable memory). In other words, the optimization can't be used
> whenever PPC_FEATURE2_ARCH_2_07 is set, because it could degrade performance
> when cache-inhibited memory is being used.
Ahh OK, I got thrown off by the fact that there's a separate routine for
it and assumed that it is Power8-specific. I have a different concern
then; a tunable is process-wide so the cached_memopt tunable essentially
assumes that the entire process is using cache-inhibited memory. Is
that a reasonable assumption?
> It's the opposite.
> When cached_memopt=1, it's assumed the process only uses cacheable memory.
> If cached_memopt=0 (default value), nothing is assumed and the safe
> execution path is taken.
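Since tunables are process-wide, the opt-in happens at program startup through the GLIBC_TUNABLES environment variable; a sketch of what that would look like (the application name below is illustrative):

```shell
# Opt the whole process into the cacheable-memory fast path.
# Only safe if the process never touches cache-inhibited memory.
GLIBC_TUNABLES=glibc.cpu.cached_memopt=1 ./my_app

# Default (cached_memopt=0): nothing is assumed and the safe
# execution path is taken.
./my_app
```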
OK, thanks for the clarification. It doesn't change my question though;
is there a performance loss when you do a safe execution and does it
make sense to fix this in glibc? I haven't formed a strong opinion
either way for the latter yet but one thing that would be nice to ensure
is that we don't do different things for different architectures. There
seems to be scope to come to a consensus across architectures for this
and we should try to do that.
Given that Cauldron is only a month away, we could have a more detailed
conversation on this in the glibc BoF too if necessary.
>> 1. A new relocation that overlays on top of ifuncs and allows selection
>> of routines based on specific properties. I have had this idea for a
>> while but no time to implement it and it has much more general scope
>> than memory type; for example memory alignment could also be a factor to
>> short-cut parts of string routines at compile time itself. It does not
>> have the runtime flexibility of a tunable but is probably far more
>> configurable.
> Sounds interesting. Where are these properties coming from?
I haven't thought this through tbh, but something like this:
- Add new relocations for each special case: R_MEMCPY_REG,
R_MEMCPY_CACHE_INHIBITED, R_MEMCPY_ALIGN16, etc. that can be generated
based on properties of the inputs such as volatility, alignment, etc.
- Create separate entry points memcpy@plt and memcpy_noncached@plt for
each relocation we end up using for that TU.
- Have the ifunc resolver take into consideration the relocation type
when patching in the PLT.
It may be simpler to just emit different entry points (similar to the
*_finite math functions) and separate ifunc resolvers if there is no
overlap between ifunc implementations for these entry points.
> I don't think this option would help in this case.
> I can't correlate size to cache-inhibited memory.
Right, I had not understood where you were coming from then and assumed
you were talking about non-temporal accesses.
Siddhesh