This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v2] aarch64: Hoist ZVA check out of the memset function
- From: Siddhesh Poyarekar <siddhesh at sourceware dot org>
- To: Szabolcs Nagy <szabolcs dot nagy at arm dot com>, libc-alpha at sourceware dot org
- Cc: nd at arm dot com, adhemerval dot zanella at linaro dot org
- Date: Thu, 5 Oct 2017 09:52:00 +0530
- Subject: Re: [PATCH v2] aarch64: Hoist ZVA check out of the memset function
- Authentication-results: sourceware.org; auth=none
- References: <1507031555-3046-1-git-send-email-siddhesh@sourceware.org> <59D507C1.4000101@arm.com>
- Reply-to: siddhesh at sourceware dot org
On Wednesday 04 October 2017 09:39 PM, Szabolcs Nagy wrote:
>> +/* Memset entry point. */
>> +ENTRY_ALIGN (MEMSET, 6)
>>
>> DELOUSE (0)
>> DELOUSE (2)
>> @@ -46,9 +147,9 @@ ENTRY_ALIGN (__memset, 6)
>> add dstend, dstin, count
>>
>> cmp count, 96
>> - b.hi L(set_long)
>> + b.hi MEMSET_L(set_long)
>> cmp count, 16
>> - b.hs L(set_medium)
>> + b.hs MEMSET_L(set_medium)
>> mov val, v0.D[0]
>>
>> /* Set 0..15 bytes. */
>> @@ -68,9 +169,9 @@ ENTRY_ALIGN (__memset, 6)
>> 3: ret
>>
>> /* Set 17..96 bytes. */
>> -L(set_medium):
>> +MEMSET_L(set_medium):
>
> why do we need different label names for the different memsets?
>
> i'd expect a localized change where there are 4 variants of a
> piece of code under different ifdefs.
All of the variants are built in the same source file (memset_zva) which
is why they need different sets of labels. I chose that so that I don't
blow up the number of files, but I can split it into individual files
and get rid of this macro.
> now it's hard to see what's going on, e.g. how code alignment of
> various parts of memset is affected.
Code alignments are not the focus of this change, I'll fine tune those
in further iterations because they give minor comparative gains.
> i don't think we need this global,
> i'd just do mrs dczid_el0 here.
> (this code should be unreachable on current cpus)
Fair enough. FWIW, I test this routine unconditionally.
>> +# undef MEMSET
>> +# undef MEMSET_ZVA
>> +# undef MEMSET_L
>> +# define MEMSET __memset_zva_default
>> +# define MEMSET_ZVA 1
>> +# define MEMSET_L(label) L(label ## _zvadef)
>> +# include <sysdeps/aarch64/memset.S>
>> +#endif
>
> i think this is not ok, these should be in different files.
>
> you put code that is never used together in the same object file.
>
> (ideally the ifunc code for various cpus should be grouped
> together by cpu and on different pages for each cpu so you
> dont even load the code at runtime that is not for your cpu)
Yeah but these are not different CPU types. Anyway, I don't have a
strong opinion on this, so I'll just split it up. It'll get rid of the
macro you got confused about too.
>> diff --git a/sysdeps/aarch64/multiarch/rtld-memset.S b/sysdeps/aarch64/multiarch/rtld-memset.S
>> new file mode 100644
>> index 0000000..969cf14
>> --- /dev/null
>> +++ b/sysdeps/aarch64/multiarch/rtld-memset.S
>> @@ -0,0 +1,25 @@
>> +/* Memset for aarch64 for use within the dynamic linker.
>> + Copyright (C) 2017 Free Software Foundation, Inc.
>> +
>> + This file is part of the GNU C Library.
>> +
>> + The GNU C Library is free software; you can redistribute it and/or
>> + modify it under the terms of the GNU Lesser General Public
>> + License as published by the Free Software Foundation; either
>> + version 2.1 of the License, or (at your option) any later version.
>> +
>> + The GNU C Library is distributed in the hope that it will be useful,
>> + but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> + Lesser General Public License for more details.
>> +
>> + You should have received a copy of the GNU Lesser General Public
>> + License along with the GNU C Library. If not, see
>> + <http://www.gnu.org/licenses/>. */
>> +
>> +#if IS_IN (rtld)
>> +# define MEMSET memset
>> +# define INTERNAL_MEMSET
>> +# define MEMSET_L(label) L(label)
>> +# include <sysdeps/aarch64/memset.S>
>> +#endif
>
> is this performance critical?
> (i don't know if it's only used for early dynlinker startup code
> or all libc internal memsets)
>
> if not then we don't need it to be different variant, it could be
> just a copy of the _nozva/_defaultzva case
This is used only in rtld, the libc internal one is __memset_generic.
It is not really performance critical, but given that I'll be including
one or the other anyway, I chose to include memset.S instead of
memset_nozva.S. I can do the latter.
> from this code it's not clear to me that cpu_features->zva_size
> is always initialized.
cpu_features is static, so it is always initialized to zero.
Thanks, I'll post an updated version.
Siddhesh