This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [Aarch64] libmvec development status
- From: "Sekhar, Ashwin" <Ashwin dot Sekhar at cavium dot com>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, "Pinski, Andrew" <Andrew dot Pinski at cavium dot com>, Szabolcs Nagy <szabolcs dot nagy at arm dot com>, Ramana Radhakrishnan <Ramana dot Radhakrishnan at arm dot com>
- Cc: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>
- Date: Fri, 17 Mar 2017 06:19:38 +0000
- Subject: Re: [Aarch64] libmvec development status
- References: <AM5PR0802MB26107DF7330A0D893DC3CF4283260@AM5PR0802MB2610.eurprd08.prod.outlook.com>
On Friday 17 March 2017 01:19 AM, Wilco Dijkstra wrote:
> Andrew Pinski wrote:
>
>> The main justification is that Ashwin is working on the libmvec too.
>> He has proposed the ABI:
>> https://gcc.gnu.org/ml/gcc/2017-03/msg00077.html
>>
>> Basically I would like this collaboration to happen upstream rather than in
>> private and off the mailing list. Also, delaying upstreaming the
>> base support means there will be two versions out there in the wild
>> starting soon. This is not a good thing.
>
> Agreed, there is no point in having 2 ABIs for the same feature.
Since ARM has already started working on libmvec, I believe the ABI
would already be in place (at least as a draft). I would appreciate it if
ARM could share it.
>
>> I think he means core-specific versions. For example, it might make
>> sense to have a different version that is specific to ThunderX2
>> CN99xx. Some specific instruction sequences are faster on CN99xx
>> than on other cores.
>
> My general feeling is that the scope for microarchitecture-specific tuning is
> very limited. Most of the gains are due to (a) having a vector math function in
> the first place, and (b) a good algorithm and polynomial. After that you're typically
> limited by FMUL/FMA latency with little potential for improvement (e.g. using
> FMA may be essential to achieve the ULP goal, and the polynomial might have
> been designed for FMA, so changing it would increase the worst-case error,
> potentially significantly so).
>
> Wilco
>
Ashwin