Re: glibc.cpu.cached_memopt (was Re: [PATCH] Rename the glibc.tune namespace to glibc.cpu)


Siddhesh Poyarekar <siddhesh@sourceware.org> writes:

> On 08/04/2018 02:03 AM, Tulio Magno Quites Machado Filho wrote:
>>>> Notice the optimization is not specific to a CPU, but specific to a user
>>>> scenario (cacheable memory).  In other words, the optimization can't be used
>>>> whenever PPC_FEATURE2_ARCH_2_07 is set, because it could degrade performance
>>>> when cache-inhibited memory is being used.
>>>
>>> Ahh OK, I got thrown off by the fact that there's a separate routine for
>>> it and assumed that it is Power8-specific.  I have a different concern
>>> then; a tunable is process-wide so the cached_memopt tunable essentially
>>> assumes that the entire process is using cache-inhibited memory.  Is
>>> that a reasonable assumption?
>> 
>> It's the opposite.
>> When cached_memopt=1, it's assumed the process only uses cacheable memory.
>> If cached_memopt=0 (the default), nothing is assumed and a safe execution
>> path is taken.
>
> OK, thanks for the clarification.  It doesn't change my question though;
> is there a performance loss when you do a safe execution?

Yes, for cacheable memory.  A safe execution uses only naturally aligned memory
accesses, so it doesn't deliver the best performance we can get.

However, an unsafe execution on cache-inhibited memory is catastrophic: every
memory access that is not naturally aligned generates an alignment interrupt
that has to be handled by the kernel, causing an even greater performance
impact than a safe execution on cacheable memory.
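
To make the trade-off concrete, here is a rough, hypothetical C sketch (not
the actual glibc code) of the two strategies: a safe copy that issues only
naturally aligned (byte) accesses, and a fast copy that issues wide, possibly
misaligned accesses.  On cacheable memory the fast copy wins; on
cache-inhibited memory every misaligned access it issues has to be emulated
by the kernel.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Safe path: naturally aligned (byte) accesses only.  It never triggers
   an alignment interrupt, but it is slow on cacheable memory.  */
static void *
copy_safe (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n-- > 0)
    *d++ = *s++;
  return dst;
}

/* Fast path: wide, possibly misaligned accesses.  Cheap on cacheable
   memory, but on cache-inhibited memory each misaligned access raises an
   alignment interrupt that the kernel has to emulate.  */
static void *
copy_fast (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n >= sizeof (uint64_t))
    {
      uint64_t v;
      memcpy (&v, s, sizeof v);  /* usually one (possibly unaligned) load */
      memcpy (d, &v, sizeof v);  /* usually one (possibly unaligned) store */
      d += sizeof v;
      s += sizeof v;
      n -= sizeof v;
    }
  while (n-- > 0)
    *d++ = *s++;
  return dst;
}

That's why the fast path is opt-in: a process that knows it only touches
cacheable memory can enable it with GLIBC_TUNABLES=glibc.cpu.cached_memopt=1
(glibc.tune.cached_memopt before the rename), while the default keeps the
safe behavior.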

> does it make sense to fix this in glibc?

IMHO, yes.  I haven't yet seen a good explanation of why userspace programs
should not use memcpy under these conditions; e.g. AFAIK, ISO C11 does not
prohibit it.

> There seems to be scope to come to a consensus across architectures for this 
> and we should try to do that.

Agreed.

>>> 1. A new relocation that overlays on top of ifuncs and allows selection
>>> of routines based on specific properties.  I have had this idea for a
>>> while but no time to implement it and it has much more general scope
>>> than memory type; for example memory alignment could also be a factor to
>>> short-cut parts of string routines at compile time itself.  It does not
>>> have the runtime flexibility of a tunable but is probably far more
>>> configurable.
>> 
>> Sounds interesting.  Where are these properties coming from?
>
> I haven't thought this through tbh, but something like this:
>
> - Add new relocations for each special case: R_MEMCPY_REG, 
> R_MEMCPY_CACHE_INHIBITED, R_MEMCPY_ALIGN16, etc. that can be generated 
> based on properties of the inputs such as volatileness, alignment, etc.
>
> - Create separate entry points memcpy@plt and memcpy_noncached@plt for 
> each relocation we end up using for that TU.
>
> - Have the ifunc resolver take into consideration the relocation type 
> when patching in the PLT.
>
> It may be simpler to just emit different entry points (similar to the 
> *_finite math functions) and separate ifunc resolvers if there is no 
> overlap between ifunc implementations for these entry points.

I still believe this could help, but one issue remains open: how do we know a
memcpy call is accessing cache-inhibited memory?
I'm afraid this property is not that easy to detect.
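
For illustration only (hypothetical names, none of this is glibc code), this
is roughly how an ifunc-style dispatch between the two kinds of routine would
look.  The point to note is that the resolver runs once per process, at
relocation time, so the only inputs it can use are process-wide ones such as
hwcap bits or a tunable; nothing at that point can tell it whether the
buffers a particular call will touch are cache-inhibited.

#include <stddef.h>

static void *copy_safe (void *dst, const void *src, size_t n);
static void *copy_fast (void *dst, const void *src, size_t n);

typedef void *(*copy_fn) (void *dst, const void *src, size_t n);

static int cached_memopt_enabled;  /* stand-in for the tunable value */

/* The resolver is called once, before the first call, and the routine it
   returns is patched into the PLT for the rest of the process's lifetime.  */
static copy_fn
resolve_copy (void)
{
  return cached_memopt_enabled ? copy_fast : copy_safe;
}

void *my_copy (void *dst, const void *src, size_t n)
     __attribute__ ((ifunc ("resolve_copy")));

/* Placeholder for the aligned-accesses-only routine.  */
static void *
copy_safe (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n-- > 0)
    *d++ = *s++;
  return dst;
}

/* Placeholder for the optimized, unaligned-access routine.  */
static void *
copy_fast (void *dst, const void *src, size_t n)
{
  return copy_safe (dst, src, n);
}

Whatever the resolver picks is then used for every call in the process, which
is why the relocation idea would need the compiler (or the programmer, via
some annotation) to describe the memory type at each call site, and I don't
see how to detect that reliably today.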

-- 
Tulio Magno

