This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: glibc.cpu.cached_memopt (was Re: [PATCH] Rename the glibc.tune namespace to glibc.cpu)


On 08/04/2018 02:03 AM, Tulio Magno Quites Machado Filho wrote:
> Maybe it isn't restricted only to powerpc:
> https://sourceware.org/ml/libc-alpha/2018-08/msg00069.html
>
> Obviously other machine maintainers may not be interested in cached_memopt,
> but this thread helps me explain why I was thinking cached_memopt was
> generic.

OK.

>>> Notice the optimization is not specific to a CPU, but specific to a user
>>> scenario (cacheable memory).  In other words, the optimization can't be
>>> used whenever PPC_FEATURE2_ARCH_2_07 is available, because it could
>>> degrade performance when cache-inhibited memory is being used.

>> Ahh OK, I got thrown off by the fact that there's a separate routine for
>> it and assumed that it was Power8-specific.  I have a different concern
>> then; a tunable is process-wide, so the cached_memopt tunable essentially
>> assumes that the entire process is using cache-inhibited memory.  Is
>> that a reasonable assumption?

> It's the opposite.
> When cached_memopt=1, it's assumed the process only uses cacheable memory.
> If cached_memopt=0 (the default), nothing is assumed and the safe
> execution path is taken.

OK, thanks for the clarification.  It doesn't change my question though: is
there a performance loss when you take the safe execution path, and does it
make sense to fix this in glibc?  I haven't formed a strong opinion either
way on the latter yet, but one thing that would be nice to ensure is that
we don't do different things for different architectures.  There seems to
be scope to reach a consensus across architectures on this, and we should
try to do that.

Given that Cauldron is only a month away, we could have a more detailed
conversation on this in the glibc BoF too if necessary.
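
For reference, if I understand the intent correctly, a user would opt the
whole process into the optimization with something like this (using the
renamed namespace from this thread):

  GLIBC_TUNABLES=glibc.cpu.cached_memopt=1 ./application

i.e. an explicit, process-wide promise that the affected routines will only
ever see cacheable memory.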

>> 1. A new relocation that overlays on top of ifuncs and allows selection
>> of routines based on specific properties.  I have had this idea for a
>> while but no time to implement it, and it has a much more general scope
>> than memory type; for example, memory alignment could also be a factor
>> to short-cut parts of string routines at compile time itself.  It does
>> not have the runtime flexibility of a tunable but is probably far more
>> configurable.
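
To expand on the alignment point above: the closest thing we have today is
probably __builtin_assume_aligned, which only feeds the compiler's own
inline expansion rather than anything the static or dynamic linker can act
on.  A rough sketch (copy_header is a made-up helper, not a proposed
interface):

#include <string.h>

/* The builtin tells the compiler that SRC is 16-byte aligned, so the
   memcpy call below can be expanded or specialized at compile time.  The
   relocation idea would instead record such properties for
   relocation-time selection of the out-of-line routine.  */
void
copy_header (void *dst, const void *src)
{
  const void *s = __builtin_assume_aligned (src, 16);
  memcpy (dst, s, 64);
}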

> Sounds interesting.  Where are these properties coming from?

I haven't thought this through tbh, but something like this:

- Add new relocations for each special case: R_MEMCPY_REG, R_MEMCPY_CACHE_INHIBITED, R_MEMCPY_ALIGN16, etc., which can be generated based on properties of the inputs such as volatility, alignment, etc.

- Create separate entry points memcpy@plt and memcpy_noncached@plt for each relocation we end up using for that TU.

- Have the ifunc resolver take into consideration the relocation type when patching in the PLT.

It may be simpler to just emit different entry points (similar to the *_finite math functions) and separate ifunc resolvers if there is no overlap between ifunc implementations for these entry points.
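
To make that last alternative a bit more concrete, here is a rough
user-space sketch using the existing GCC ifunc attribute; the entry point
and variant names are made up for illustration, and the AT_HWCAP2 check is
only a placeholder for whatever feature or property check we would really
want:

#include <string.h>
#include <sys/auxv.h>

typedef void *(*memcpy_fn) (void *, const void *, size_t);

/* Two placeholder implementations: one that is always safe and one that
   would assume only cacheable memory is ever passed in.  */
static void *
memcpy_safe (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

static void *
memcpy_cacheable_opt (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

/* Resolver for the separate entry point, run when its relocation is
   processed.  A real implementation would key this off the relevant CPU
   feature (e.g. PPC_FEATURE2_ARCH_2_07 on powerpc) and/or the tunable;
   real resolvers also have to be careful about what they call this
   early.  */
static memcpy_fn
resolve_memcpy_cached (void)
{
  return getauxval (AT_HWCAP2) != 0 ? memcpy_cacheable_opt : memcpy_safe;
}

/* A distinct entry point, analogous to the *_finite math entry points,
   rather than a single memcpy resolver keyed on relocation type.  */
void *memcpy_cached (void *dst, const void *src, size_t n)
  __attribute__ ((ifunc ("resolve_memcpy_cached")));

The point is just that each property gets its own entry point with its own
resolver, so nothing has to know which relocation triggered the
resolution; the cost is more exported symbols, much like the *_finite
entry points.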

> I don't think this option would help in this case.
> I can't correlate size to cache-inhibited memory.

Right, I had not understood where you were coming from then and assumed you were talking about non-temporal accesses.

Siddhesh

