This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] powerpc: Use aligned stores in memset



On 18/08/2017 06:10, Florian Weimer wrote:
> On 08/18/2017 08:51 AM, Rajalakshmi Srinivasaraghavan wrote:
>>
>>
>> On 08/18/2017 11:51 AM, Florian Weimer wrote:
>>> On 08/18/2017 07:11 AM, Rajalakshmi Srinivasaraghavan wrote:
>>>>     * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
>>>>     for unaligned inputs if size is less than 8.
>>>
>>> This makes me rather nervous.  powerpc64le was supposed to have
>>> reasonable efficient unaligned loads and stores.  GCC happily generates
>>> them, too.
>>
>> This is meant ONLY for caching inhibited accesses.  Caching Inhibited
>> accesses are required to be Guarded and properly aligned.
> 
> The intent is to support memset for such memory regions, right?  This
> change is insufficient.  You have to fix GCC as well because it will
> inline memset of unaligned pointers, like this:
> 
> typedef long __attribute__ ((aligned(1))) long_unaligned;
> 
> void
> clear (long_unaligned *p)
> {
>   memset (p, 0, sizeof (*p));
> }
> 
> clear:
> 	li 9,0
> 	std 9,0(3)
> 	blr
> 
> That's why I think your change is not useful in isolation.


POWER8 does have fast unaligned memory access, and in fact unaligned accesses
could be used to provide a faster memcpy/memmove implementation (I created
one some time ago that I never sent upstream [1]). Unaligned accesses are
used extensively in some optimized str* implementations I created for POWER8.
They also allow GCC to use unaligned accesses for builtin mem* operations
without issue in *most* cases.

The problem is that memset/memcpy/memmove *specifically* are used in some
userland drivers for DMA (if I recall correctly, some XORG drivers). For
these specific use cases, unaligned accesses, especially vector ones, will
cause the kernel to trap on *every* unaligned instruction, leading to abysmal
performance. That's why I pushed 87868c2418fb74357757e3b739ce5b76b17a8929
to fix this very issue for POWER7 memcpy.

We already discussed this same issue some time ago [2] to try to overcome this
limitation. I think ideally the drivers that rely on aligned mem* operations
should use their own mem* operations (similar to how dpdk does [3]).
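
As a rough illustration of what a driver-private routine could look like,
here is a minimal sketch in portable C. The name aligned_memset is
hypothetical; it is not part of glibc, and it is not how dpdk implements its
routines. It assumes that naturally aligned byte and doubleword stores are
acceptable for the target region, and it uses volatile pointers so that the
compiler neither merges the stores nor replaces the loops with a call to
memset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver-private helper (an assumption for illustration, not
   a glibc interface): fill N bytes at DST with the byte C, issuing only
   naturally aligned stores.  */
static void *
aligned_memset (void *dst, int c, size_t n)
{
  volatile unsigned char *p = dst;
  unsigned char b = (unsigned char) c;

  /* Head: single-byte stores until P reaches an 8-byte boundary.  */
  while (n > 0 && ((uintptr_t) p & 7) != 0)
    {
      *p++ = b;
      n--;
    }

  /* Body: aligned doubleword stores of the byte replicated 8 times.  */
  uint64_t pattern = b * UINT64_C (0x0101010101010101);
  volatile uint64_t *q = (volatile uint64_t *) p;
  for (; n >= 8; n -= 8)
    *q++ = pattern;

  /* Tail: the remaining 0 to 7 bytes, again one byte at a time.  */
  p = (volatile unsigned char *) q;
  for (; n > 0; n--)
    *p++ = b;

  return dst;
}
```

A driver would call this in place of memset only on the mappings that need
it, for example aligned_memset (regs + 3, 0, 20), and keep using the libc
routines for ordinary cacheable memory.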

[1] https://github.com/zatrazz/glibc/commits/memopt-power8
[2] https://sourceware.org/ml/libc-alpha/2015-01/msg00130.html
[3] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/common/include/arch/ppc_64/rte_memcpy.h

