This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] powerpc: unaligned memcpy and DMA
- From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- To: libc-alpha at sourceware dot org
- Date: Tue, 06 Jan 2015 18:44:50 -0200
- Subject: Re: [PATCH] powerpc: unaligned memcpy and DMA
- References: <54A59CAC dot 1070303 at linux dot vnet dot ibm dot com> <20150106185317 dot GA27726 at domone> <54AC3381 dot 9040808 at linux dot vnet dot ibm dot com> <20150106203548 dot GA29771 at domone>
On 06-01-2015 18:35, Ondřej Bílka wrote:
> On Tue, Jan 06, 2015 at 05:12:01PM -0200, Adhemerval Zanella wrote:
>> On 06-01-2015 16:53, Ondřej Bílka wrote:
>>> The main question is: why is there no POWER8 memcpy using unaligned loads yet?
>>>
>>> memcpy is called about a hundred times more often than strcpy (and strncpy
>>> is not called at all) on my computer, so the possible gains are bigger, and
>>> with an optimized memcpy a generic strncpy will be faster as well.
>> Mainly because powerpc still triggers kernel traps when VMX/VSX instructions are
>> issued on non-cacheable memory. That's why I pushed 87868c2418fb74357757e3b739ce5b76b17a8929,
>> by the way.
>>
>> Although this is not an issue in 99% of cases, where memory is cacheable, some
>> code (especially libdrm and xorg) uses memcpy (and possibly memset) on DMA-mapped
>> memory. That's why memcpy/memset for POWER8 still use only aligned accesses.
> That looks like overkill. A better way would be to add a variable that detects
> whether the application can do it.
>
> Probably the simplest way would be to add a variable in the vDSO that the kernel
> sets to 1 when it takes the trap.
>
> Otherwise it would be more complicated, as we would need to set it when the
> application allocates non-cacheable memory. Is mmap the only way to do that?
>
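The quoted point about keeping memcpy to aligned accesses so that it stays safe on non-cacheable (DMA-mapped) memory can be illustrated with a minimal portable C sketch. It is not glibc's POWER8 implementation, which uses VMX/VSX; the function name `aligned_only_memcpy` is hypothetical, and the `may_alias` attribute assumes GCC or Clang. When source and destination share the same offset modulo 8, it copies 8-byte aligned words; otherwise it falls back to single bytes, which are always naturally aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang-specific: allows reading and writing byte buffers through
   a 64-bit type without violating strict-aliasing rules. */
typedef uint64_t __attribute__((may_alias)) word_t;

/* Hypothetical helper: copy n bytes using only naturally aligned loads
   and stores, so no access can straddle an alignment boundary. */
void *aligned_only_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Word copies are only possible when both pointers can reach an
       8-byte boundary at the same time. */
    if (((uintptr_t)d & 7) == ((uintptr_t)s & 7)) {
        /* Head: byte copies until both pointers are 8-byte aligned. */
        while (n > 0 && ((uintptr_t)d & 7) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Body: aligned 8-byte loads and stores. */
        while (n >= 8) {
            *(word_t *)d = *(const word_t *)s;
            d += 8;
            s += 8;
            n -= 8;
        }
    }

    /* Tail, or the whole copy when the relative alignment differs:
       single bytes are always aligned. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

One caveat for real use: GCC may recognize the byte loops and replace them with a call to memcpy itself unless the file is built with `-fno-tree-loop-distribute-patterns`. A production routine would also handle mismatched alignment with aligned loads plus shifts rather than falling back to bytes.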
My understanding is that DMA memory is allocated only through mmap with specific flags.
However, I don't see how a vDSO variable would help us in this case: any process
can mmap a DMA area, and it will then have a mix of cacheable and non-cacheable
pages.
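A minimal C sketch of the proposed vDSO-flag dispatch may make this objection concrete. No such vDSO variable exists: `hypothetical_vdso_trap_seen` is an ordinary global standing in for it, and all names here are hypothetical. Because the flag is process-wide, once it is set every later copy takes the slow path, including copies between ordinary cacheable buffers in the same process.

```c
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL: stands in for the proposed vDSO variable that the kernel
   would set to 1 after handling an alignment trap.  Plain global here. */
static volatile int hypothetical_vdso_trap_seen = 0;

/* Counts how often the slow path was taken, to make the cost visible. */
static unsigned long slow_path_calls = 0;

/* Stand-in for an aligned-only copy: single-byte accesses never
   cross an alignment boundary. */
static void *copy_aligned_only(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Process-wide dispatch: the flag cannot tell which pages are
   non-cacheable, so after one trap it penalizes every copy. */
void *flag_dispatch_memcpy(void *dst, const void *src, size_t n)
{
    if (hypothetical_vdso_trap_seen) {
        slow_path_calls++;
        return copy_aligned_only(dst, src, n);
    }
    return memcpy(dst, src, n); /* fast path, may use unaligned accesses */
}
```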
In my understanding, the correct approach would be to use a specialized memcpy on
non-cacheable memory (selected either through an environment variable or by the
application providing its own).
But this is another issue.
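Both selection options mentioned above can be sketched in C under stated assumptions. The environment variable name `APP_NONCACHEABLE_MEMCPY` and all function names are hypothetical, not glibc interfaces, and the byte-wise `noncacheable_copy` is only a placeholder for a real aligned-only routine.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Placeholder for a copy that is safe on non-cacheable memory: every
   access is a single byte and therefore naturally aligned. */
static void *noncacheable_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Option 1: a setting chooses the routine once for the whole process.
   "1" selects the non-cacheable-safe copy; anything else keeps memcpy. */
static copy_fn select_copy(const char *setting)
{
    if (setting != NULL && strcmp(setting, "1") == 0)
        return noncacheable_copy;
    return memcpy;
}

/* Reads the (hypothetical) environment variable at startup. */
copy_fn select_copy_from_env(void)
{
    return select_copy(getenv("APP_NONCACHEABLE_MEMCPY"));
}

/* Option 2: the application calls a dedicated routine only for its
   DMA-mapped buffers; all other copies keep using the fast memcpy. */
void *copy_to_dma_buffer(void *dma_dst, const void *src, size_t n)
{
    return noncacheable_copy(dma_dst, src, n);
}
```

The second option confines the slow path to the buffers that actually need it, which avoids the process-wide penalty of a single flag.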
Anyway, I need to evaluate what kind of gain it would yield for unaligned cases.