This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
RE: [PATCH][X86_64] Set bit_Fast_Unaligned_Load for Excavator family CPU's
- From: "Pawar, Amit" <Amit dot Pawar at amd dot com>
- To: Carlos O'Donell <carlos at redhat dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Date: Wed, 13 Jan 2016 16:30:52 +0000
- Subject: RE: [PATCH][X86_64] Set bit_Fast_Unaligned_Load for Excavator family CPU's
- Authentication-results: sourceware.org; auth=none
- Authentication-results: spf=none (sender IP is ) smtp dot mailfrom=Amit dot Pawar at amd dot com;
- References: <SN1PR12MB07330EBFA0ED52D659F4F46C97EF0 at SN1PR12MB0733 dot namprd12 dot prod dot outlook dot com> <5695C851 dot 6020700 at redhat dot com>
- Spamdiagnosticmetadata: NSPM
- Spamdiagnosticoutput: 1:23
Thank you for your comments and is it possible to put this fix in GLIBC-2.23 release?
-----Original Message-----
From: Carlos O'Donell [mailto:carlos@redhat.com]
Sent: Wednesday, January 13, 2016 9:15 AM
To: Pawar, Amit; libc-alpha@sourceware.org
Subject: Re: [PATCH][X86_64] Set bit_Fast_Unaligned_Load for Excavator family CPU's
On 12/16/2015 05:03 AM, Pawar, Amit wrote:
> For latest AMD Excavator family 0x15h CPU's, it is found that by
> running GLIBC benchtest SSE2_Unaligned based implementations are
> performing faster compare to SSE2 based implementations for routines
> mentioned below. Based on this observation bit_Fast_Unaligned_Load
> flag is set for Excavator family CPU's. This makes SSE2_Unaligned
> implementations as default for these routines.
>
> Routines: strcmp, strcat, strncat, stpcpy, stpncpy, strcpy, strncpy and strstr.
>
> Is attached patch OK for trunk ?
The implementation looks correct, but it needs some cleanup.
Next steps:
(1) Please review the contribution checklist:
https://sourceware.org/glibc/wiki/Contribution%20checklist
(2) This is a user visible change and needs a bug filed for it.
File a bug please. Once you fix the bug you mark the bug fixed
and set the milestone to the devel release you fixed it in e.g.
if you fix it now, you set the milestone to 2.23.
Include your bug number as requested below.
(3) Cleanup nits e.g. unused ecx, wrong indentation etc.
(4) Repost v2 and indicate changes from v1.
> From dba107d0483fff9a1440216fdb17a0a4cab28bfa Mon Sep 17 00:00:00 2001
> From: Amit Pawar <Amit.Pawar@amd.com>
> Date: Wed, 16 Dec 2015 15:23:24 +0530
> Subject: [PATCH] X86_64: Set bit_Fast_Unaligned_Load for family 0x15h
> Excavator CPU's.
Add "(Bug XXXX)" to subject when you have a bug number.
>
> GLIBC benchtest testcases shows SSE2_Unaligned based implementations
> are performing faster compare to SSE2 based implementations for
> routines: strcmp, strcat, strncat, stpcpy, stpncpy, strcpy, strncpy
> and strstr. Flag bit_Fast_Unaligned_Load is set for Excavator family
> 0x15h CPU's. This makes SSE2_Unaligned based implementations as
> default for these routines.
>
> * sysdeps/x86/cpu-features.c (init_cpu_features):
> Set bit_Fast_Unaligned_Load flag for Excavator family 0x15h CPU's.
> This makes SSE2_Unaligned version as default for string routines like
> strcmp, strcat, strncat, stpcpy, stpncpy, strcpy, strncpy and strstr.
Just the canonical changes:
* sysdeps/x86/cpu-features.c (init_cpu_features): If CPU is Excavator
then set index_Fast_Unaligned_Load.
Your comment above includes all the information you need.
> ---
> ChangeLog | 7 +++++++
> sysdeps/x86/cpu-features.c | 11 +++++++++++
> 2 files changed, 18 insertions(+)
>
Add the BZ # to the ChangeLog.
> +2015-12-16 Amit Pawar <amit.pawar@amd.com>
> +
[BZ #XXXX]
> + * sysdeps/x86/cpu-features.c (init_cpu_features):
> + Set bit_Fast_Unaligned_Load flag for Excavator family 0x15h CPU's.
> + This makes SSE2_Unaligned version as default for string routines like
> + strcmp, strcat, strncat, stpcpy, stpncpy, strcpy, strncpy and strstr.
Again, just the canonical changes:
* sysdeps/x86/cpu-features.c (init_cpu_features): If CPU is Excavator
then set index_Fast_Unaligned_Load.
Your git commit message should include these details, but the ChangeLog should be more terse to help people find related changes.
> +
> 2015-12-15 H.J. Lu <hongjiu.lu@intel.com>
>
> [BZ #19367]
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index aff894c..cedbc45 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -152,6 +152,17 @@ init_cpu_features (struct cpu_features *cpu_features)
> cpu_features->cpuid[COMMON_CPUID_INDEX_80000001].ebx,
> cpu_features->cpuid[COMMON_CPUID_INDEX_80000001].ecx,
> cpu_features->cpuid[COMMON_CPUID_INDEX_80000001].edx);
> +
> + if (family == 0x15)
> + {
> + ecx = cpu_features->cpuid[COMMON_CPUID_INDEX_1].ecx;
Why set ecx? What uses it?
> +
> + /* "Excavator" */
> + if (model >= 0x60 && model <= 0x7f)
> + cpu_features->feature[index_Fast_Unaligned_Load]
> + |= bit_Fast_Unaligned_Load;
Wrong indentation?
Should be (with leading indentation removed):
if (model >= 0x60 && model <= 0x7f)
cpu_features->feature[index_Fast_Unaligned_Load]
|= bit_Fast_Unaligned_Load;
And then indent that as required for the braces.
> +
> + }
> }
> else
> kind = arch_kind_other;
> -- 1.8.3.1
Cheers,
Carlos.