This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.



PING Re: [ARM] architecture specific subdirectories, optimised memchr and some questions


Hi,
  Has anyone got any views on this patch and series of questions I
posted a couple of weeks ago:


On 15 July 2011 19:11, Dr. David Alan Gilbert <david.gilbert@linaro.org> wrote:
> Hi,
> Please find attached a patch that:
>    * Adds directories for architecture-specific optimised versions of
>      routines on ARM.
>    * Adds a memchr optimised for ARMv6T2/ARMv7
>      (for other architectures it just falls back to the .c).
>      It seems to help most in programs that use memchr heavily to find
>      things such as line endings (git seems to do this a lot).
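As a point of reference, the contract the optimised routine has to honour (and which the generic C code provides on other architectures) can be sketched as a plain byte loop. This is a minimal illustration, not glibc's actual string/memchr.c, and the name ref_memchr is hypothetical. The conversion of c to unsigned char corresponds to the "and r1,r1,#0xff" at the start of the assembly version in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-at-a-time reference for the memchr() contract: return a
   pointer to the first byte in s[0..n) equal to (unsigned char)c,
   or NULL if there is none.  Hypothetical name, illustration only. */
void *ref_memchr(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    const unsigned char target = (unsigned char)c;  /* like "and r1,r1,#0xff" */

    assert(s != NULL || n == 0);  /* this sketch only tolerates NULL when n == 0 */

    for (; n != 0; n--, p++)
        if (*p == target)
            return (void *)p;     /* memchr() also casts away const */
    return NULL;
}
```

Any optimised version should return exactly the same pointer as this loop for every buffer, offset, length and character value.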
>
> However, I've got a bunch of questions about how I've done things in
> that patch:
>
>  * Is preconfigure the right place to check for the current architecture,
>    and is it right to set $submachine there?
>  * Why did preconfigure previously append /arm to the end of $machine?
>  * Ideally, I don't think the architecture-specific directories should be
>    in the eabi subdirectory; they should be at the top level (they aren't
>    EABI specific), but I can't see a sensible way to rework the search
>    order to do that. Suggestions?
>
>    (ARM's use of arm/eabi looks a bit different from most other
>    architectures, which seem to use arch/cpuvariant.)
>  * Does the memchr boilerplate look OK? (It seems to work!) The code is
>    Thumb-2 only, which is a little unusual, but ARMv6T2 and ARMv7-A, the
>    architectures it supports, can both run Thumb-2.
>  * Given this directory structure, where would I put code that is
>    NEON specific? NEON is available in some ARMv7-A variants (and later?).
>    Perhaps arch/arm/eabi/armv7-a/neon?
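To make the search-order question concrete, here is a deliberately simplified, hypothetical model (not the real configure logic): the most specific directory is searched first, the directories named in its Implies file come straight after it, and then each parent directory follows. Under that assumption, the new armv7-a/Implies file gives the order arm/eabi/armv7-a, arm/eabi/armv6t2, arm/eabi, arm, and a hypothetical arm/eabi/armv7-a/neon directory would still pick up the armv7-a and armv6t2 code behind it. The table and function names are invented for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_DIRS 16
#define MAX_LEN 64

/* Hypothetical Implies table; the single entry mirrors the patch's
   sysdeps/arm/eabi/armv7-a/Implies file. */
static const struct { const char *dir, *implied; } implies_table[] = {
    { "arm/eabi/armv7-a", "arm/eabi/armv6t2" },
};

/* Append dir to list[0..n) unless already present; returns the new count. */
static int add_dir(char list[][MAX_LEN], int n, const char *dir)
{
    for (int i = 0; i < n; i++)
        if (strcmp(list[i], dir) == 0)
            return n;                 /* keep the earlier, higher-priority entry */
    assert(n < MAX_DIRS && strlen(dir) < MAX_LEN);
    strcpy(list[n], dir);
    return n + 1;
}

/* Simplified model: each directory, then whatever its Implies entry
   names, then its parent.  Returns the number of directories written
   to list, most specific first. */
int search_order(const char *start, char list[][MAX_LEN])
{
    char dir[MAX_LEN];
    int n = 0;

    assert(strlen(start) < MAX_LEN);
    strcpy(dir, start);
    for (;;) {
        n = add_dir(list, n, dir);
        for (size_t i = 0; i < sizeof implies_table / sizeof implies_table[0]; i++)
            if (strcmp(implies_table[i].dir, dir) == 0)
                n = add_dir(list, n, implies_table[i].implied);
        char *slash = strrchr(dir, '/');  /* arm/eabi/x -> arm/eabi -> arm */
        if (slash == NULL)
            break;
        *slash = '\0';
    }
    return n;
}
```

The real glibc configure script has more inputs (the OS directories, generic fallbacks and so on), so this only illustrates how an Implies entry slots in next to the directory that names it.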
>
> The code was built and tested against eglibc-ports svn (r14461, from
> last week); glibc-ports didn't seem to build cleanly on ARM (it complained
> about requiring TLS).
>
> Dave
>
>  * Add architecture-specific directories for ARM.
>  * Add a memchr for ARMv6T2 and later.
>
> Index: sysdeps/arm/preconfigure
> ===================================================================
> --- sysdeps/arm/preconfigure    (revision 14523)
> +++ sysdeps/arm/preconfigure    (working copy)
> @@ -3,11 +3,38 @@
>        base_machine=arm
>        case $config_os in
>        linux-gnueabi*)
> -               machine=arm/eabi/$machine
> +               machine=arm/eabi
>                if [ "${CFLAGS+set}" != "set" ]; then
>                  CFLAGS="-g -O2"
>                fi
>                CFLAGS="$CFLAGS -fno-unwind-tables"
> +
> +               if [ "${submachine}notset" = "notset" ]; then
> +                       # User didn't specify a CPU, so let's ask the compiler.
> +                       # Note: if you add patterns here you must ensure that
> +                       # an appropriate directory exists in sysdeps/arm/eabi.
> +                       archcppflag=`echo "" |
> +                           $CC $CFLAGS $CPPFLAGS -E -dM - |
> +                           grep __ARM_ARCH |
> +                           sed -e 's/^#define //' -e 's/ .*//'`
> +
> +                       case x$archcppflag in
> +                       x__ARM_ARCH_7A__)
> +                           submachine=armv7-a
> +                           echo "Found compiler is configured for $submachine"
> +                           ;;
> +
> +                       x__ARM_ARCH_6T2__)
> +                           submachine=armv6t2
> +                           echo "Found compiler is configured for $submachine"
> +                           ;;
> +
> +                       *)
> +                           echo "Did not find ARM architecture type; using base - use --with-cpu= to specify it explicitly" >&2
> +                           ;;
> +                       esac
> +               fi
> +
>                ;;
>        *)
>                machine=arm/$machine
> Index: sysdeps/arm/eabi/armv7-a/Implies
> ===================================================================
> --- sysdeps/arm/eabi/armv7-a/Implies    (revision 0)
> +++ sysdeps/arm/eabi/armv7-a/Implies    (revision 0)
> @@ -0,0 +1,2 @@
> +# We can do everything that 6T2 can
> +arm/eabi/armv6t2
> Index: sysdeps/arm/eabi/armv6t2/memchr.S
> ===================================================================
> --- sysdeps/arm/eabi/armv6t2/memchr.S   (revision 0)
> +++ sysdeps/arm/eabi/armv6t2/memchr.S   (revision 0)
> @@ -0,0 +1,142 @@
> +/* Copyright (C) 2011 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +   Code contributed by Dave Gilbert <david.gilbert@linaro.org>
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, write to the Free
> +   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
> +   02111-1307 USA.  */
> +
> +
> +#include <sysdep.h>
> +
> +@ This memchr routine is optimised for a Cortex-A9 and should work on all ARMv7
> +@ and ARMv6T2 processors.  It has a fast path for short sizes, and has an
> +@ optimised path for large data sets; the worst case is finding the match early
> +@ in a large data set.
> +@ Note: The use of cbz/cbnz means it's Thumb only
> +
> +@ 2011-07-15 david.gilbert@linaro.org
> +@    Copy from Cortex strings release 21 and change license
> +@ http://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/view/head:/src/linaro-a9/memchr.S
> +@    Change function declarations/entry/exit
> +
> +@ this lets us check a flag in a 00/ff byte easily in either endianness
> +#ifdef __ARMEB__
> +#define CHARTSTMASK(c) (1<<(31-((c)*8)))
> +#else
> +#define CHARTSTMASK(c) (1<<((c)*8))
> +#endif
> +        .syntax unified
> +
> +        .text
> +        .thumb
> +
> +@ ---------------------------------------------------------------------------
> +       .thumb_func
> +       .global memchr
> +       .type memchr,%function
> +ENTRY(memchr)
> +  @ r0 = start of memory to scan
> +  @ r1 = character to look for
> +  @ r2 = length
> +  @ returns r0 = pointer to character or NULL if not found
> +  and   r1,r1,#0xff      @ Don't think we can trust the caller to actually pass a char
> +
> +  cmp   r2,#16           @ If it's short don't bother with anything clever
> +  blt   20f
> +
> +  tst   r0, #7           @ If it's already aligned skip the next bit
> +  beq   10f
> +
> +  @ Work up to an aligned point
> +5:
> +  ldrb  r3, [r0],#1
> +  subs  r2, r2, #1
> +  cmp   r3, r1
> +  beq   50f              @ If it matches exit found
> +  tst   r0, #7
> +  cbz   r2, 40f          @ If we run off the end, exit not found
> +  bne   5b               @ If not aligned yet then do next byte
> +
> +10:
> +  @ At this point, we are aligned, we know we have at least 8 bytes to work with
> +  push  {r4,r5,r6,r7}
> +  orr   r1, r1, r1, lsl #8   @ expand the match byte across to all bytes
> +  orr   r1, r1, r1, lsl #16
> +  bic   r4, r2, #7           @ Number of bytes in whole double words to work with
> +  mvns  r7, #0               @ all F's
> +  movs  r3, #0
> +
> +15:
> +  ldmia r0!,{r5,r6}
> +  subs  r4, r4, #8
> +  eor   r5,r5, r1    @ Get it so that r5,r6 have 00's where the bytes match the target
> +  eor   r6,r6, r1
> +  uadd8 r5, r5, r7   @ Parallel add 0xff - sets the GE bits for anything that wasn't 0
> +  sel   r5, r3, r7   @ bytes are 00 for non-00 bytes, or ff for 00 bytes - NOTE INVERSION
> +  uadd8 r6, r6, r7   @ Parallel add 0xff - sets the GE bits for anything that wasn't 0
> +  sel   r6, r5, r7   @ chained....bytes are 00 for non-00 bytes, or ff for 00 bytes - NOTE INVERSION
> +  cbnz  r6, 60f
> +  bne   15b          @ (Flags from the subs above) If not run out of bytes then go around again
> +
> +  pop   {r4,r5,r6,r7}
> +  and   r1,r1,#0xff  @ Get r1 back to a single character from the expansion above
> +  and   r2,r2,#7     @ Leave the count remaining as the number after the double words have been done
> +
> +20:
> +  cbz   r2, 40f      @ 0 length or hit the end already then not found
> +
> +21:  @ Post aligned section, or just a short call
> +  ldrb  r3,[r0],#1
> +  subs  r2,r2,#1
> +  eor   r3,r3,r1     @ r3 = 0 if match - doesn't break flags from subs
> +  cbz   r3, 50f
> +  bne   21b          @ on r2 flags
> +
> +40:
> +  movs  r0,#0    @ not found
> +  DO_RET(lr)
> +
> +50:
> +  subs  r0,r0,#1 @ found
> +  DO_RET(lr)
> +
> +60:  @ We're here because the fast path found a hit - now we have to track down exactly which byte it was
> +     @ r0 points to the start of the double word after the one that was tested
> +     @ r5 has the 00/ff pattern for the first word, r6 has the chained value
> +  cmp   r5, #0
> +  itte  eq
> +  moveq r5, r6       @ the match is in the 2nd word
> +  subeq r0,r0,#3     @ Points to 2nd byte of 2nd word
> +  subne r0,r0,#7     @ or 2nd byte of 1st word
> +
> +  @ r0 currently points to the 2nd byte of the word containing the hit
> +  tst   r5, # CHARTSTMASK(0)  @ 1st character
> +  bne   61f
> +  adds  r0,r0,#1
> +  tst   r5, # CHARTSTMASK(1)  @ 2nd character
> +  ittt  eq
> +  addeq r0,r0,#1
> +  tsteq r5, # (3<<15) @ 2nd & 3rd character
> +  @ If not the 3rd, it must be the last one
> +  addeq r0,r0,#1
> +
> +61:
> +  pop     {r4,r5,r6,r7}
> +  subs    r0,r0,#1
> +  DO_RET(lr)
> +
> +END(memchr)
> +libc_hidden_builtin_def (memchr)
> +
>
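The preconfigure hunk above asks the compiler which __ARM_ARCH_*__ macro it predefines (via "$CC -E -dM") and maps that to a sysdeps/arm/eabi subdirectory. The same mapping can be sketched in C. submachine_for_macro and compiled_submachine are hypothetical names used only for illustration, and only the two cases the patch handles are covered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the case statement in preconfigure: map the macro name found
   by grep/sed to a sysdeps/arm/eabi subdirectory, or NULL for "use the
   base directory" (the *) branch).  Hypothetical helper. */
const char *submachine_for_macro(const char *macro)
{
    assert(macro != NULL);
    if (strcmp(macro, "__ARM_ARCH_7A__") == 0)
        return "armv7-a";
    if (strcmp(macro, "__ARM_ARCH_6T2__") == 0)
        return "armv6t2";
    return NULL;
}

/* The same decision taken at compile time from the predefined macros
   that "-E -dM" would list for this compiler and these flags. */
const char *compiled_submachine(void)
{
#if defined(__ARM_ARCH_7A__)
    return "armv7-a";
#elif defined(__ARM_ARCH_6T2__)
    return "armv6t2";
#else
    return NULL;
#endif
}
```

compiled_submachine() reflects whatever compiler and -march flags are used to build it, which is the same reason preconfigure queries $CC with $CFLAGS and $CPPFLAGS rather than looking at the build machine.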

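The fast path in the memchr.S above can be modelled in portable C to show what each step does: XOR with the replicated byte turns matches into 00 bytes, uadd8 with 0xff sets a GE flag for every non-zero byte, sel turns that into an ff/00 pattern, and the second sel chains the two words together. This sketch assumes a little-endian target (__ARMEL__), emulates uadd8 and sel with helper functions of the same names, and uses the hypothetical name model_memchr; it is an aid to reviewing the assembly, not a replacement for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulated UADD8: byte-wise add; GE[i] (bit i of *ge) is set when
   lane i carries out, i.e. the unsigned lane sum exceeds 0xff. */
static uint32_t uadd8(uint32_t a, uint32_t b, unsigned *ge)
{
    uint32_t r = 0;
    *ge = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t sum = ((a >> (8 * i)) & 0xffu) + ((b >> (8 * i)) & 0xffu);
        r |= (sum & 0xffu) << (8 * i);
        if (sum > 0xffu)
            *ge |= 1u << i;
    }
    return r;
}

/* Emulated SEL: lane i comes from n if GE[i] is set, otherwise from m. */
static uint32_t sel(uint32_t n, uint32_t m, unsigned ge)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t lane = UINT32_C(0xff) << (8 * i);
        r |= (ge & (1u << i)) ? (n & lane) : (m & lane);
    }
    return r;
}

/* Little-endian word load, as ldmia does on an __ARMEL__ target. */
static uint32_t load_le(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

void *model_memchr(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    const unsigned char target = (unsigned char)c;        /* and r1,r1,#0xff */
    const uint32_t rep = target * UINT32_C(0x01010101);   /* the two orr steps */
    unsigned ge;

    assert(s != NULL || n == 0);
    if (n >= 16) {
        /* Label 5: byte loop until p is 8-byte aligned (at most 7 bytes). */
        for (; ((uintptr_t)p & 7) != 0; p++, n--)
            if (*p == target)
                return (void *)p;
        /* Label 15: two words per iteration over the whole double words. */
        for (size_t bytes = n & ~(size_t)7; bytes != 0; bytes -= 8, p += 8) {
            uint32_t w0 = load_le(p) ^ rep;                   /* 00 lanes = match */
            uint32_t w1 = load_le(p + 4) ^ rep;
            (void)uadd8(w0, UINT32_C(0xffffffff), &ge);
            uint32_t m0 = sel(0, UINT32_C(0xffffffff), ge);   /* ff where w0 lane was 00 */
            (void)uadd8(w1, UINT32_C(0xffffffff), &ge);
            uint32_t m1 = sel(m0, UINT32_C(0xffffffff), ge);  /* chained over both words */
            if (m1 != 0) {
                /* Label 60: choose the word, then its first ff lane. */
                const unsigned char *w = (m0 != 0) ? p : p + 4;
                uint32_t m = (m0 != 0) ? m0 : m1;
                for (int i = 0; i < 4; i++)
                    if (m & (UINT32_C(0xff) << (8 * i)))
                        return (void *)(w + i);
            }
        }
        n &= 7;   /* bytes left after the double words */
    }
    /* Labels 20/21: short input or tail, one byte at a time. */
    for (; n != 0; p++, n--)
        if (*p == target)
            return (void *)p;
    return NULL;
}
```

Comparing model_memchr against a byte-by-byte loop for every start offset and length up to a few dozen bytes is a cheap way to exercise the alignment, double-word and tail paths in one go.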

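The byte-location code at label 60 in the patch relies on two masking tricks. CHARTSTMASK(c) picks one bit inside the lane that holds memory byte c for either endianness, and because every lane of the pattern is 00 or ff, testing one bit tests the whole byte. The constant 3<<15 sets bits 15 and 16, which straddle the lanes for bytes 1 and 2 in both byte orders; since byte 1 is already known to be clear at that point, a non-zero result means byte 2. A small check of both claims, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Lane (0 = bits 0-7) holding memory byte i of a loaded word:
   little-endian puts byte i in lane i, big-endian in lane 3-i. */
static int lane_of_byte(int i, int big_endian)
{
    assert(i >= 0 && i < 4);
    return big_endian ? 3 - i : i;
}

/* The patch's CHARTSTMASK(c), with the endianness as a parameter. */
static uint32_t chartstmask(int c, int big_endian)
{
    return big_endian ? UINT32_C(1) << (31 - c * 8) : UINT32_C(1) << (c * 8);
}

/* Bit k of the result is set if mask m has any bit in lane k. */
static unsigned lanes_touched(uint32_t m)
{
    unsigned lanes = 0;
    for (int k = 0; k < 4; k++)
        if (m & (UINT32_C(0xff) << (8 * k)))
            lanes |= 1u << k;
    return lanes;
}
```

For both byte orders, chartstmask(c, be) touches exactly the lane of byte c, and 3<<15 touches exactly the lanes of bytes 1 and 2.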