This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] powerpc: unaligned memcpy and DMA

From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
To: libc-alpha at sourceware dot org
Date: Tue, 06 Jan 2015 18:44:50 -0200
Subject: Re: [PATCH] powerpc: unaligned memcpy and DMA
Authentication-results: sourceware.org; auth=none
References: <54A59CAC dot 1070303 at linux dot vnet dot ibm dot com> <20150106185317 dot GA27726 at domone> <54AC3381 dot 9040808 at linux dot vnet dot ibm dot com> <20150106203548 dot GA29771 at domone>

On 06-01-2015 18:35, Ondřej Bílka wrote:
> On Tue, Jan 06, 2015 at 05:12:01PM -0200, Adhemerval Zanella wrote:
>> On 06-01-2015 16:53, Ondřej Bílka wrote:
>>> Main question is why there is no power8 memcpy using unaligned loads yet?
>>>
>>> Memcpy is called about a hundred times more often than strcpy (and there
>>> is no strncpy call) on my computer, so the possible gains are bigger, and
>>> with an optimized memcpy a generic strncpy will be faster as well.
>> Mainly because powerpc still triggers kernel traps when issuing VMX/VSX instructions
>> on non-cacheable memory. That's why I pushed 87868c2418fb74357757e3b739ce5b76b17a8929
>> by the way.
>>
>> Although it is not really an issue for 99% of cases, where memory will be cacheable,
>> some code (especially libdrm and xorg) uses memcpy (and possibly memset) on DMA-mapped
>> memory.  And that's why memcpy/memset for POWER8 are still using aligned accesses all
>> the time.
> That looks like overkill. A better way would be to add a variable that
> detects whether the application can do it.
>
> Probably the simplest way would be to add a variable in the vDSO that the
> kernel sets to 1 when it takes the trap.
>
> Otherwise it would be more complicated, as we would need to set it when the
> application allocates non-cacheable memory. Is mmap the only way to do that?
>
My understanding is that DMA memory is allocated only through mmap plus specific
flags. However, I don't see how a vDSO variable would help us in this case: any
process can mmap a DMA area, and it will then have a mix of cacheable and
non-cacheable pages.

The correct way, in my understanding, would be to use a specialized memcpy on
non-cacheable memory (selected either by environment flags or by using an app-specific one).
But this is another issue.
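
As a rough illustration of the environment-flag variant, here is a minimal
sketch. It is not glibc code: the variable name APP_MEMCPY_NONCACHEABLE and the
functions copy_noncacheable, select_copy and app_memcpy are all hypothetical,
and the byte loop is only a stand-in for a properly tuned aligned-access copy.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical stand-in for a copy routine that is safe on non-cacheable
   memory.  A real one would use aligned scalar accesses; a byte loop is
   the simplest form that never issues an unaligned access.  The volatile
   qualifiers keep the compiler from merging the byte accesses into wider
   or vector ones. */
static void *copy_noncacheable(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Hypothetical selector: returns the safe routine when the flag value is
   "1", and the regular memcpy otherwise. */
static copy_fn select_copy(const char *flag)
{
    if (flag != NULL && strcmp(flag, "1") == 0)
        return copy_noncacheable;
    return memcpy;
}

/* Application-level wrapper: reads the (made-up) environment variable on
   the first call and caches the chosen routine.  The caching is not
   synchronized; this sketch assumes single-threaded use. */
static void *app_memcpy(void *dst, const void *src, size_t n)
{
    static copy_fn impl;

    if (impl == NULL)
        impl = select_copy(getenv("APP_MEMCPY_NONCACHEABLE"));
    return impl(dst, src, n);
}
```

An application that maps DMA memory would be started with
APP_MEMCPY_NONCACHEABLE=1 and call app_memcpy for its copies; everything else
keeps using the regular memcpy. The choice is per process, not per page, so it
does not solve the mixed-pages case described above.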

Anyway, I need to evaluate which kind of gain it would yield for unaligned cases.

Follow-Ups:
- Re: [PATCH] powerpc: unaligned memcpy and DMA
  - From: Ondřej Bílka

References:
- [PATCH] powerpc: Optimized st{r,p}ncpy for POWER8/PPC64
  - From: Adhemerval Zanella
- Re: [PATCH] powerpc: Optimized st{r,p}ncpy for POWER8/PPC64
  - From: Ondřej Bílka
- Re: [PATCH] powerpc: Optimized st{r,p}ncpy for POWER8/PPC64
  - From: Adhemerval Zanella
- [PATCH] powerpc: unaligned memcpy and DMA
  - From: Ondřej Bílka
