This is the mail archive of the glibc-cvs@sourceware.org mailing list for the glibc project.
GNU C Library master sources branch master updated. glibc-2.23-379-g7ab1de2

From: stli at sourceware dot org
To: glibc-cvs at sourceware dot org
Date: 25 May 2016 15:19:13 -0000
Subject: GNU C Library master sources branch master updated. glibc-2.23-379-g7ab1de2
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  7ab1de21067d72460ac14089bf6541b10fc14c80 (commit)
       via  8f25676c83eef5c85db98f9cf027890fbe810447 (commit)
       via  a42a95c43133d69b1108f582cffa0f6986a9c3da (commit)
       via  52f8a48e24563daa807f94824ce9782b9a9eece9 (commit)
       via  ee518b7070b1bcb41382b6db10f513e071b2c20e (commit)
       via  6896776c3c9c32fd22324e6de6737dd69ae73213 (commit)
       via  5bd11b19099b3f22d821515f9c93f1ecc1a7e15e (commit)
       via  421c5278d83e72740150259960a431706ac343f9 (commit)
       via  81c6380887c6d62c56e5f0f85a241f759f58b2fd (commit)
       via  3b704e26b33e35d99de920f8462d8e438f89be39 (commit)
       via  4690dab084f854bf0013b5eaabcf90c2d5b692ff (commit)
       via  9b7f05599a92dead97d6683bc838a57bc63ac52b (commit)
       via  c70e9913d2fc2d0bf6a3ca98a4dece759d40a4ec (commit)
      from  5ff81530dd14552a48a8fcb119e5867a1b504cc6 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=7ab1de21067d72460ac14089bf6541b10fc14c80

commit 7ab1de21067d72460ac14089bf6541b10fc14c80
Author: Stefan Liebler <stli@linux.vnet.ibm.com>
Date:   Wed May 25 17:18:06 2016 +0200

    Fix UTF-16 surrogate handling. [BZ #19727]
    
    According to the latest Unicode standard, a conversion from/to UTF-xx has
    to report an error if the character value is in the range of a UTF-16
    surrogate (0xd800..0xdfff).
    See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.
    Thus this patch fixes this behaviour for converting from UTF-32 to INTERNAL
    and from INTERNAL to UTF-8.
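
The range check described above can be sketched in isolation. This is only an illustration, not the glibc code: the names is_utf16_surrogate and encode_utf8 are hypothetical, and the sketch assumes an upper limit of 0x10ffff, whereas the patched branch in gconv_simple.c still accepts values up to 0x7fffffff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if cp lies in the UTF-16 surrogate range 0xd800..0xdfff.  */
int
is_utf16_surrogate (uint32_t cp)
{
  return cp >= 0xd800 && cp <= 0xdfff;
}

/* Hypothetical helper: encode cp as UTF-8 into out.
   Returns the number of bytes written, or 0 if cp must be rejected.
   Assumption: values above 0x10ffff are rejected as well.  */
size_t
encode_utf8 (uint32_t cp, unsigned char out[4])
{
  assert (out != NULL);

  if (cp > 0x10ffff || is_utf16_surrogate (cp))
    return 0;

  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xc0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3f));
      return 2;
    }
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xe0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3f));
      out[2] = (unsigned char) (0x80 | (cp & 0x3f));
      return 3;
    }
  out[0] = (unsigned char) (0xf0 | (cp >> 18));
  out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3f));
  out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3f));
  out[3] = (unsigned char) (0x80 | (cp & 0x3f));
  return 4;
}
```

In this sketch the boundary values 0xd7ff and 0xe000 are accepted and everything in between is refused, which mirrors the boundaries exercised by the new testcase.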
    
    Furthermore, the conversion from UTF-16 to INTERNAL does not report an error
    if the input stream consists of two low-surrogate values. If a uint16_t value
    is in the range 0xd800..0xdfff, the next uint16_t value is checked to see
    whether it is in the range of a low surrogate (0xdc00..0xdfff). These two
    uint16_t values are then interpreted as a high/low surrogate pair. But there
    is no test whether the first uint16_t value is really in the range of a high
    surrogate (0xd800..0xdbff). Two uint16_t values in the low-surrogate range
    are therefore treated as a valid high/low surrogate pair.
    This patch adds this test.
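
The decoding order described above, including the added check on the first word, can be sketched as follows. The function decode_utf16 and the enum utf16_status are hypothetical names for illustration; the real code is the BODY macro in iconvdata/utf-16.c and also handles byte order, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum utf16_status { UTF16_OK, UTF16_ILLEGAL, UTF16_INCOMPLETE };

/* Hypothetical helper: decode one character from units[0..n).
   On success, store the code point in *cp and the number of
   uint16_t units consumed in *used.  */
enum utf16_status
decode_utf16 (const uint16_t *units, size_t n, uint32_t *cp, size_t *used)
{
  assert (units != NULL && cp != NULL && used != NULL);

  if (n == 0)
    return UTF16_INCOMPLETE;

  uint16_t u1 = units[0];
  if (u1 < 0xd800 || u1 > 0xdfff)
    {
      /* Not a surrogate: the unit is the code point.  */
      *cp = u1;
      *used = 1;
      return UTF16_OK;
    }

  /* The check this patch adds: a low surrogate is no valid first word.  */
  if (u1 >= 0xdc00)
    return UTF16_ILLEGAL;

  /* A high surrogate needs a second unit.  */
  if (n < 2)
    return UTF16_INCOMPLETE;

  uint16_t u2 = units[1];
  if (u2 < 0xdc00 || u2 > 0xdfff)
    return UTF16_ILLEGAL;

  *cp = 0x10000 + (((uint32_t) (u1 - 0xd800) << 10) | (uint32_t) (u2 - 0xdc00));
  *used = 2;
  return UTF16_OK;
}
```

A lone high surrogate at the end of the input is reported as incomplete rather than illegal here, which corresponds to the EINVAL versus EILSEQ distinction in the new testcase.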
    
    This patch also adds a new testcase, which checks UTF conversions with input
    values in the range of UTF-16 surrogates. The test converts from UTF-xx to
    INTERNAL, from INTERNAL to UTF-xx, and directly from UTF-xx to UTF-yy. The
    latter conversion is needed because s390 has iconv modules that convert
    from/to UTF in one step.
    The new testcase was tested on s390, power and intel machines.
    
    ChangeLog:
    
    	[BZ #19727]
    	* iconvdata/utf-16.c (BODY): Report an error if first word is not a
    	valid high surrogate.
    	* iconvdata/utf-32.c (BODY): Report an error if the value is in range
    	of an utf16 surrogate.
    	* iconv/gconv_simple.c (BODY): Likewise.
    	* iconvdata/bug-iconv12.c: New file.
    	* iconvdata/Makefile (tests): Add bug-iconv12.
    
    rename test

diff --git a/ChangeLog b/ChangeLog
index 9c2d14a..1ecd6d5 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,16 @@
 2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
 
+	[BZ #19727]
+	* iconvdata/utf-16.c (BODY): Report an error if first word is not a
+	valid high surrogate.
+	* iconvdata/utf-32.c (BODY): Report an error if the value is in range
+	of an utf16 surrogate.
+	* iconv/gconv_simple.c (BODY): Likewise.
+	* iconvdata/bug-iconv12.c: New file.
+	* iconvdata/Makefile (tests): Add bug-iconv12.
+
+2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
+
 	[BZ #19726]
 	* iconv/gconv_simple.c (ucs4le_internal_loop): Update inptrp and
 	outptrp in case of an illegal input.
diff --git a/iconv/gconv_simple.c b/iconv/gconv_simple.c
index f66bf34..e5284e4 100644
--- a/iconv/gconv_simple.c
+++ b/iconv/gconv_simple.c
@@ -892,7 +892,8 @@ ucs4le_internal_loop_single (struct __gconv_step *step,
     if (__glibc_likely (wc < 0x80))					      \
       /* It's an one byte sequence.  */					      \
       *outptr++ = (unsigned char) wc;					      \
-    else if (__glibc_likely (wc <= 0x7fffffff))				      \
+    else if (__glibc_likely (wc <= 0x7fffffff				      \
+			     && (wc < 0xd800 || wc > 0xdfff)))		      \
       {									      \
 	size_t step;							      \
 	unsigned char *start;						      \
diff --git a/iconvdata/Makefile b/iconvdata/Makefile
index f9826b3..3df5aa4 100644
--- a/iconvdata/Makefile
+++ b/iconvdata/Makefile
@@ -68,7 +68,7 @@ modules.so := $(addsuffix .so, $(modules))
 ifeq (yes,$(build-shared))
 tests = bug-iconv1 bug-iconv2 tst-loading tst-e2big tst-iconv4 bug-iconv4 \
 	tst-iconv6 bug-iconv5 bug-iconv6 tst-iconv7 bug-iconv8 bug-iconv9 \
-	bug-iconv10 bug-iconv11
+	bug-iconv10 bug-iconv11 bug-iconv12
 ifeq ($(have-thread-library),yes)
 tests += bug-iconv3
 endif
@@ -309,6 +309,8 @@ $(objpfx)tst-iconv7.out: $(objpfx)gconv-modules \
 			 $(addprefix $(objpfx),$(modules.so))
 $(objpfx)bug-iconv10.out: $(objpfx)gconv-modules \
 			  $(addprefix $(objpfx),$(modules.so))
+$(objpfx)bug-iconv12.out: $(objpfx)gconv-modules \
+			  $(addprefix $(objpfx),$(modules.so))
 
 $(objpfx)iconv-test.out: run-iconv-test.sh $(objpfx)gconv-modules \
 			 $(addprefix $(objpfx),$(modules.so)) \
diff --git a/iconvdata/bug-iconv12.c b/iconvdata/bug-iconv12.c
new file mode 100644
index 0000000..49f5208
--- /dev/null
+++ b/iconvdata/bug-iconv12.c
@@ -0,0 +1,263 @@
+/* bug 19727: Testing UTF conversions with UTF16 surrogates as input.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <string.h>
+#include <inttypes.h>
+#include <iconv.h>
+#include <byteswap.h>
+
+static int
+run_conversion (const char *from, const char *to, char *inbuf, size_t inbuflen,
+		int exp_errno, int line)
+{
+  char outbuf[16];
+  iconv_t cd;
+  char *inptr;
+  size_t inlen;
+  char *outptr;
+  size_t outlen;
+  size_t n;
+  int e;
+  int fails = 0;
+
+  cd = iconv_open (to, from);
+  if (cd == (iconv_t) -1)
+    {
+      printf ("line %d: cannot convert from %s to %s: %m\n", line, from, to);
+      return 1;
+    }
+
+  inptr = (char *) inbuf;
+  inlen = inbuflen;
+  outptr = outbuf;
+  outlen = sizeof (outbuf);
+
+  errno = 0;
+  n = iconv (cd, &inptr, &inlen, &outptr, &outlen);
+  e = errno;
+
+  if (exp_errno == 0)
+    {
+      if (n == (size_t) -1)
+	{
+	  puts ("n should be >= 0, but n == -1");
+	  fails ++;
+	}
+
+      if (e != 0)
+	{
+	  printf ("errno should be 0: 'Success', but errno == %d: '%s'\n"
+		  , e, strerror(e));
+	  fails ++;
+	}
+    }
+  else
+    {
+      if (n != (size_t) -1)
+	{
+	  printf ("n should be -1, but n == %zd\n", n);
+	  fails ++;
+	}
+
+      if (e != exp_errno)
+	{
+	  printf ("errno should be %d: '%s', but errno == %d: '%s'\n"
+		  , exp_errno, strerror (exp_errno), e, strerror (e));
+	  fails ++;
+	}
+    }
+
+  iconv_close (cd);
+
+  if (fails > 0)
+    {
+      printf ("Errors in line %d while converting %s to %s.\n\n"
+	      , line, from, to);
+    }
+
+  return fails;
+}
+
+static int
+do_test (void)
+{
+  int fails = 0;
+  char buf[4];
+
+  /* This test runs iconv() with UTF characters in the range of a UTF-16
+     surrogate.  A UTF-16 high surrogate is in the range 0xD800..0xDBFF and
+     a UTF-16 low surrogate is in the range 0xDC00..0xDFFF.
+     Converting from or to UTF-xx has to report errors in those cases.
+     In UTF-16, a surrogate pair with a high surrogate in front of a low
+     surrogate is valid.  */
+
+  /* Use RUN_UCS4_UTF32_INPUT to test conversion ...
+
+     ... from INTERNAL to UTF-xx[LE|BE]:
+     Converting from UCS4 to UTF-xx[LE|BE] first converts UCS4 to INTERNAL
+     without checking for UTF-16 surrogate values
+     and then converts from INTERNAL to UTF-xx[LE|BE].
+     The latter conversion has to report an error in those cases.
+
+     ... from UTF-32[LE|BE] to INTERNAL:
+     Converting directly from UTF-32LE to UTF-8|16 is needed,
+     because e.g. s390x has iconv-modules which convert directly.  */
+#define RUN_UCS4_UTF32_INPUT(b0, b1, b2, b3, err, line)			\
+  buf[0] = b0;								\
+  buf[1] = b1;								\
+  buf[2] = b2;								\
+  buf[3] = b3;								\
+  fails += run_conversion ("UCS4", "UTF-8", buf, 4, err, line);		\
+  fails += run_conversion ("UCS4", "UTF-16LE", buf, 4, err, line);	\
+  fails += run_conversion ("UCS4", "UTF-16BE", buf, 4, err, line);	\
+  fails += run_conversion ("UCS4", "UTF-32LE", buf, 4, err, line);	\
+  fails += run_conversion ("UCS4", "UTF-32BE", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32BE", "WCHAR_T", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32BE", "UTF-8", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32BE", "UTF-16LE", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32BE", "UTF-16BE", buf, 4, err, line);	\
+  buf[0] = b3;								\
+  buf[1] = b2;								\
+  buf[2] = b1;								\
+  buf[3] = b0;								\
+  fails += run_conversion ("UTF-32LE", "WCHAR_T", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32LE", "UTF-8", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32LE", "UTF-16LE", buf, 4, err, line);	\
+  fails += run_conversion ("UTF-32LE", "UTF-16BE", buf, 4, err, line);
+
+  /* Use UCS4/UTF32 input of 0xD7FF.  */
+  RUN_UCS4_UTF32_INPUT (0x0, 0x0, 0xD7, 0xFF, 0, __LINE__);
+
+  /* Use UCS4/UTF32 input of 0xD800.  */
+  RUN_UCS4_UTF32_INPUT (0x0, 0x0, 0xD8, 0x00, EILSEQ, __LINE__);
+
+  /* Use UCS4/UTF32 input of 0xDBFF.  */
+  RUN_UCS4_UTF32_INPUT (0x0, 0x0, 0xDB, 0xFF, EILSEQ, __LINE__);
+
+  /* Use UCS4/UTF32 input of 0xDC00.  */
+  RUN_UCS4_UTF32_INPUT (0x0, 0x0, 0xDC, 0x00, EILSEQ, __LINE__);
+
+  /* Use UCS4/UTF32 input of 0xDFFF.  */
+  RUN_UCS4_UTF32_INPUT (0x0, 0x0, 0xDF, 0xFF, EILSEQ, __LINE__);
+
+  /* Use UCS4/UTF32 input of 0xE000.  */
+  RUN_UCS4_UTF32_INPUT (0x0, 0x0, 0xE0, 0x00, 0, __LINE__);
+
+
+  /* Use RUN_UTF16_INPUT to test conversion from UTF16[LE|BE] to INTERNAL.
+     Converting directly from UTF-16 to UTF-8|32 is needed,
+     because e.g. s390x has iconv-modules which convert directly.
+     Use len == 2 or 4 to specify one or two UTF-16 characters.  */
+#define RUN_UTF16_INPUT(b0, b1, b2, b3, len, err, line)			\
+  buf[0] = b0;								\
+  buf[1] = b1;								\
+  buf[2] = b2;								\
+  buf[3] = b3;								\
+  fails += run_conversion ("UTF-16BE", "WCHAR_T", buf, len, err, line);	\
+  fails += run_conversion ("UTF-16BE", "UTF-8", buf, len, err, line);	\
+  fails += run_conversion ("UTF-16BE", "UTF-32LE", buf, len, err, line); \
+  fails += run_conversion ("UTF-16BE", "UTF-32BE", buf, len, err, line); \
+  buf[0] = b1;								\
+  buf[1] = b0;								\
+  buf[2] = b3;								\
+  buf[3] = b2;								\
+  fails += run_conversion ("UTF-16LE", "WCHAR_T", buf, len, err, line);	\
+  fails += run_conversion ("UTF-16LE", "UTF-8", buf, len, err, line);	\
+  fails += run_conversion ("UTF-16LE", "UTF-32LE", buf, len, err, line); \
+  fails += run_conversion ("UTF-16LE", "UTF-32BE", buf, len, err, line);
+
+  /* Use UTF16 input of 0xD7FF.  */
+  RUN_UTF16_INPUT (0xD7, 0xFF, 0xD7, 0xFF, 4, 0, __LINE__);
+
+  /* Use [single] UTF16 high surrogate 0xD800 [with a valid character behind].
+     And check an UTF16 surrogate pair [without valid low surrogate].  */
+  RUN_UTF16_INPUT (0xD8, 0x0, 0x0, 0x0, 2, EINVAL, __LINE__);
+  RUN_UTF16_INPUT (0xD8, 0x0, 0xD7, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xD8, 0x0, 0xD8, 0x0, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xD8, 0x0, 0xE0, 0x0, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xD8, 0x0, 0xDC, 0x0, 4, 0, __LINE__);
+
+  /* Use [single] UTF16 high surrogate 0xDBFF [with a valid character behind].
+     And check an UTF16 surrogate pair [without valid low surrogate].  */
+  RUN_UTF16_INPUT (0xDB, 0xFF, 0x0, 0x0, 2, EINVAL, __LINE__);
+  RUN_UTF16_INPUT (0xDB, 0xFF, 0xD7, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDB, 0xFF, 0xDB, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDB, 0xFF, 0xE0, 0x0, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDB, 0xFF, 0xDF, 0xFF, 4, 0, __LINE__);
+
+  /* Use single UTF16 low surrogate 0xDC00 [with a valid character behind].
+     And check an UTF16 surrogate pair [without valid high surrogate].   */
+  RUN_UTF16_INPUT (0xDC, 0x0, 0x0, 0x0, 2, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDC, 0x0, 0xD7, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xD8, 0x0, 0xDC, 0x0, 4, 0, __LINE__);
+  RUN_UTF16_INPUT (0xD7, 0xFF, 0xDC, 0x0, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDC, 0x0, 0xDC, 0x0, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xE0, 0x0, 0xDC, 0x0, 4, EILSEQ, __LINE__);
+
+  /* Use single UTF16 low surrogate 0xDFFF [with a valid character behind].
+     And check an UTF16 surrogate pair [without valid high surrogate].   */
+  RUN_UTF16_INPUT (0xDF, 0xFF, 0x0, 0x0, 2, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDF, 0xFF, 0xD7, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDB, 0xFF, 0xDF, 0xFF, 4, 0, __LINE__);
+  RUN_UTF16_INPUT (0xD7, 0xFF, 0xDF, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xDF, 0xFF, 0xDF, 0xFF, 4, EILSEQ, __LINE__);
+  RUN_UTF16_INPUT (0xE0, 0x0, 0xDF, 0xFF, 4, EILSEQ, __LINE__);
+
+  /* Use UTF16 input of 0xE000.  */
+  RUN_UTF16_INPUT (0xE0, 0x0, 0xE0, 0x0, 4, 0, __LINE__);
+
+
+  /* Use RUN_UTF8_3BYTE_INPUT to test conversion from UTF-8 to INTERNAL.
+     Converting directly from UTF-8 to UTF-16|32 is needed,
+     because e.g. s390x has iconv-modules which convert directly.  */
+#define RUN_UTF8_3BYTE_INPUT(b0, b1, b2, err, line)			\
+  buf[0] = b0;								\
+  buf[1] = b1;								\
+  buf[2] = b2;								\
+  fails += run_conversion ("UTF-8", "WCHAR_T", buf, 3, err, line);	\
+  fails += run_conversion ("UTF-8", "UTF-16LE", buf, 3, err, line);	\
+  fails += run_conversion ("UTF-8", "UTF-16BE", buf, 3, err, line);	\
+  fails += run_conversion ("UTF-8", "UTF-32LE", buf, 3, err, line);	\
+  fails += run_conversion ("UTF-8", "UTF-32BE", buf, 3, err, line);
+
+  /* Use UTF-8 input of 0xD7FF.  */
+  RUN_UTF8_3BYTE_INPUT (0xED, 0x9F, 0xBF, 0, __LINE__);
+
+  /* Use UTF-8 input of 0xD800.  */
+  RUN_UTF8_3BYTE_INPUT (0xED, 0xA0, 0x80, EILSEQ, __LINE__);
+
+  /* Use UTF-8 input of 0xDBFF.  */
+  RUN_UTF8_3BYTE_INPUT (0xED, 0xAF, 0xBF, EILSEQ, __LINE__);
+
+  /* Use UTF-8 input of 0xDC00.  */
+  RUN_UTF8_3BYTE_INPUT (0xED, 0xB0, 0x80, EILSEQ, __LINE__);
+
+  /* Use UTF-8 input of 0xDFFF.  */
+  RUN_UTF8_3BYTE_INPUT (0xED, 0xBF, 0xBF, EILSEQ, __LINE__);
+
+  /* Use UTF-8 input of 0xF000.  */
+  RUN_UTF8_3BYTE_INPUT (0xEF, 0x80, 0x80, 0, __LINE__);
+
+  return fails > 0 ? EXIT_FAILURE : EXIT_SUCCESS;
+}
+
+#define TEST_FUNCTION do_test ()
+#include "../test-skeleton.c"
diff --git a/iconvdata/utf-16.c b/iconvdata/utf-16.c
index 2d74a13..dbbcd6d 100644
--- a/iconvdata/utf-16.c
+++ b/iconvdata/utf-16.c
@@ -295,6 +295,12 @@ gconv_end (struct __gconv_step *data)
 	  {								      \
 	    uint16_t u2;						      \
 									      \
+	    if (__glibc_unlikely (u1 >= 0xdc00))			      \
+	      {								      \
+		/* This is no valid first word for a surrogate.  */	      \
+		STANDARD_FROM_LOOP_ERR_HANDLER (2);			      \
+	      }								      \
+									      \
 	    /* It's a surrogate character.  At least the first word says      \
 	       it is.  */						      \
 	    if (__glibc_unlikely (inptr + 4 > inend))			      \
@@ -329,6 +335,12 @@ gconv_end (struct __gconv_step *data)
 	  }								      \
 	else								      \
 	  {								      \
+	    if (__glibc_unlikely (u1 >= 0xdc00))			      \
+	      {								      \
+		/* This is no valid first word for a surrogate.  */	      \
+		STANDARD_FROM_LOOP_ERR_HANDLER (2);			      \
+	      }								      \
+									      \
 	    /* It's a surrogate character.  At least the first word says      \
 	       it is.  */						      \
 	    if (__glibc_unlikely (inptr + 4 > inend))			      \
diff --git a/iconvdata/utf-32.c b/iconvdata/utf-32.c
index 0d6fe30..25f6fc6 100644
--- a/iconvdata/utf-32.c
+++ b/iconvdata/utf-32.c
@@ -239,7 +239,7 @@ gconv_end (struct __gconv_step *data)
     if (swap)								      \
       u1 = bswap_32 (u1);						      \
 									      \
-    if (__glibc_unlikely (u1 >= 0x110000))				      \
+    if (__glibc_unlikely (u1 >= 0x110000 || (u1 >= 0xd800 && u1 < 0xe000)))   \
       {									      \
 	/* This is illegal.  */						      \
 	STANDARD_FROM_LOOP_ERR_HANDLER (4);				      \

http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=8f25676c83eef5c85db98f9cf027890fbe810447

commit 8f25676c83eef5c85db98f9cf027890fbe810447
Author: Stefan Liebler <stli@linux.vnet.ibm.com>
Date:   Wed May 25 17:18:06 2016 +0200

    Fix ucs4le_internal_loop in error case. [BZ #19726]
    
    When converting from UCS4LE to INTERNAL, each input value is checked for
    being too large; if it is, the iconv() call sets errno to EILSEQ. In this
    case the inbuf argument of the iconv() call should point to the invalid
    character, but it points to the beginning of inbuf.
    Thus this patch updates the pointers inptrp and outptrp before returning in
    this error case.
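
The pointer update described above can be illustrated with a simplified loop. This is a sketch under assumptions, not the glibc function: copy_ucs4 and enum conv_status are hypothetical names, the loop works on host-order uint32_t values and leaves out the byte swapping and ignore-errors handling that ucs4le_internal_loop performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum conv_status { CONV_OK, CONV_ILLEGAL_INPUT, CONV_FULL_OUTPUT };

/* Hypothetical helper: copy UCS-4 values from *inptrp to *outptrp and
   stop at the first value above 0x7fffffff.  */
enum conv_status
copy_ucs4 (const uint32_t **inptrp, const uint32_t *inend,
	   uint32_t **outptrp, uint32_t *outend)
{
  assert (inptrp != NULL && outptrp != NULL);

  const uint32_t *inptr = *inptrp;
  uint32_t *outptr = *outptrp;
  enum conv_status status = CONV_OK;

  while (inptr < inend)
    {
      if (outptr == outend)
	{
	  status = CONV_FULL_OUTPUT;
	  break;
	}
      if (*inptr > 0x7fffffff)
	{
	  /* Illegal value: leave inptr on the offending element.  */
	  status = CONV_ILLEGAL_INPUT;
	  break;
	}
      *outptr++ = *inptr++;
    }

  /* Publish the progress on every exit path, including the error
     path; skipping this on error is the kind of bug fixed here.  */
  *inptrp = inptr;
  *outptrp = outptr;
  return status;
}
```

With the input { 0x41, 0x80000000, 0x42 }, the caller's input pointer ends up at the second element and exactly one output value has been written, which is the state the new test expects from iconv().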
    
    This patch also adds a new testcase for this issue.
    The new test was tested on s390, power and intel machines.
    
    ChangeLog:
    
    	[BZ #19726]
    	* iconv/gconv_simple.c (ucs4le_internal_loop): Update inptrp and
    	outptrp in case of an illegal input.
    	* iconv/tst-iconv6.c: New file.
    	* iconv/Makefile (tests): Add tst-iconv6.

diff --git a/ChangeLog b/ChangeLog
index c098194..9c2d14a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,13 @@
 2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
 
+	[BZ #19726]
+	* iconv/gconv_simple.c (ucs4le_internal_loop): Update inptrp and
+	outptrp in case of an illegal input.
+	* iconv/tst-iconv6.c: New file.
+	* iconv/Makefile (tests): Add tst-iconv6.
+
+2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
+
 	* sysdeps/s390/utf16-utf32-z9.c: Disable cu42 instruction and report
 	an error in case of a value in range of an utf16 low surrogate.
 
diff --git a/iconv/Makefile b/iconv/Makefile
index b008707..c2299c9 100644
--- a/iconv/Makefile
+++ b/iconv/Makefile
@@ -42,7 +42,7 @@ CFLAGS-charmap.c = -DCHARMAP_PATH='"$(i18ndir)/charmaps"' \
 CFLAGS-linereader.c = -DNO_TRANSLITERATION
 CFLAGS-simple-hash.c = -I../locale
 
-tests	= tst-iconv1 tst-iconv2 tst-iconv3 tst-iconv4 tst-iconv5
+tests	= tst-iconv1 tst-iconv2 tst-iconv3 tst-iconv4 tst-iconv5 tst-iconv6
 
 others		= iconv_prog iconvconfig
 install-others-programs	= $(inst_bindir)/iconv
diff --git a/iconv/gconv_simple.c b/iconv/gconv_simple.c
index 5412bd6..f66bf34 100644
--- a/iconv/gconv_simple.c
+++ b/iconv/gconv_simple.c
@@ -638,6 +638,8 @@ ucs4le_internal_loop (struct __gconv_step *step,
 	      continue;
 	    }
 
+	  *inptrp = inptr;
+	  *outptrp = outptr;
 	  return __GCONV_ILLEGAL_INPUT;
 	}
 
diff --git a/iconv/tst-iconv6.c b/iconv/tst-iconv6.c
new file mode 100644
index 0000000..57d7f38
--- /dev/null
+++ b/iconv/tst-iconv6.c
@@ -0,0 +1,117 @@
+/* Testing ucs4le_internal_loop() in gconv_simple.c.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <errno.h>
+#include <string.h>
+#include <inttypes.h>
+#include <iconv.h>
+#include <byteswap.h>
+
+static int
+do_test (void)
+{
+  iconv_t cd;
+  char *inptr;
+  size_t inlen;
+  char *outptr;
+  size_t outlen;
+  size_t n;
+  int e;
+  int result = 0;
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+  /* On big-endian machines, ucs4le_internal_loop() swaps the bytes before
+     error checking. Thus the input values have to be swapped.  */
+# define VALUE(val) bswap_32 (val)
+#else
+# define VALUE(val) val
+#endif
+  uint32_t inbuf[3] = { VALUE (0x41), VALUE (0x80000000), VALUE (0x42) };
+  uint32_t outbuf[3] = { 0, 0, 0 };
+
+  cd = iconv_open ("WCHAR_T", "UCS-4LE");
+  if (cd == (iconv_t) -1)
+    {
+      printf ("cannot convert from UCS4LE to wchar_t: %m\n");
+      return 1;
+    }
+
+  inptr = (char *) inbuf;
+  inlen = sizeof (inbuf);
+  outptr = (char *) outbuf;
+  outlen = sizeof (outbuf);
+
+  n = iconv (cd, &inptr, &inlen, &outptr, &outlen);
+  e = errno;
+
+  if (n != (size_t) -1)
+    {
+      printf ("incorrect iconv() return value: %zd, expected -1\n", n);
+      result = 1;
+    }
+
+  if (e != EILSEQ)
+    {
+      printf ("incorrect error value: %s, expected %s\n",
+	      strerror (e), strerror (EILSEQ));
+      result = 1;
+    }
+
+  if (inptr != (char *) &inbuf[1])
+    {
+      printf ("inptr=%p does not point to invalid character! Expected=%p\n"
+	      , (void *) inptr, (void *) &inbuf[1]);
+      result = 1;
+    }
+
+  if (inlen != sizeof (inbuf) - sizeof (uint32_t))
+    {
+      printf ("inlen=%zd != %zd\n"
+	      , inlen, sizeof (inbuf) - sizeof (uint32_t));
+      result = 1;
+    }
+
+  if (outptr != (char *) &outbuf[1])
+    {
+      printf ("outptr=%p does not point to the expected position in outbuf! "
+	      "Expected=%p\n"
+	      , (void *) outptr, (void *) &outbuf[1]);
+      result = 1;
+    }
+
+  if (outlen != sizeof (outbuf) - sizeof (uint32_t))
+    {
+      printf ("outlen=%zd != %zd\n"
+	      , outlen, sizeof (outbuf) - sizeof (uint32_t));
+      result = 1;
+    }
+
+  if (outbuf[0] != 0x41 || outbuf[1] != 0 || outbuf[2] != 0)
+    {
+      puts ("Characters conversion is incorrect!");
+      result = 1;
+    }
+
+  iconv_close (cd);
+
+  return result;
+}
+
+#define TEST_FUNCTION do_test ()
+#include "../test-skeleton.c"

http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=a42a95c43133d69b1108f582cffa0f6986a9c3da

commit a42a95c43133d69b1108f582cffa0f6986a9c3da
Author: Stefan Liebler <stli@linux.vnet.ibm.com>
Date:   Wed May 25 17:18:06 2016 +0200

    S390: Fix utf32 to utf16 handling of low surrogates (disable cu42).
    
    According to the latest Unicode standard, a conversion from/to UTF-xx has
    to report an error if the character value is in the range of a UTF-16
    surrogate (0xd800..0xdfff).
    See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.
    
    Thus the cu42 instruction, which converts from UTF-32 to UTF-16, has to be
    disabled because it does not report an error for a value in the range of
    a low surrogate (0xdc00..0xdfff). The etf3eh variant is removed and the C
    and vector variants are adjusted to handle values in the range of a UTF-16
    low surrogate correctly.
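
What the adjusted variants have to compute for one UTF-32 value can be written in portable C. This is a minimal sketch with a hypothetical name, encode_utf16; it follows the range test of the C variant (c <= 0xd7ff, or c > 0xdfff and c <= 0xffff) and the surrogate construction visible in the vector variant, but it is not the code in sysdeps/s390/utf16-utf32-z9.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one UTF-32 value c as UTF-16.
   Returns the number of uint16_t units written (1 or 2), or 0 if c
   is a surrogate value or lies above 0x10ffff.  */
size_t
encode_utf16 (uint32_t c, uint16_t out[2])
{
  assert (out != NULL);

  if (c <= 0xd7ff || (c > 0xdfff && c <= 0xffff))
    {
      /* One UTF-16 unit.  */
      out[0] = (uint16_t) c;
      return 1;
    }

  /* Rejects 0xd800..0xdfff, both high and low surrogates, and
     everything above 0x10ffff.  */
  if (c < 0x10000 || c > 0x10ffff)
    return 0;

  /* Two UTF-16 units: split the 20-bit offset into two 10-bit halves.  */
  c -= 0x10000;
  out[0] = (uint16_t) (0xd800 | (c >> 10));
  out[1] = (uint16_t) (0xdc00 | (c & 0x3ff));
  return 2;
}
```

The first condition is the important one: a test of the form c >= 0xdc00 && c <= 0xffff would let low surrogates through as ordinary two-byte characters, which is the behaviour this commit removes.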
    
    ChangeLog:
    
    	* sysdeps/s390/utf16-utf32-z9.c: Disable cu42 instruction and report
    	an error in case of a value in range of an utf16 low surrogate.

diff --git a/ChangeLog b/ChangeLog
index b50c3d0..c098194 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,10 @@
 2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
 
+	* sysdeps/s390/utf16-utf32-z9.c: Disable cu42 instruction and report
+	an error in case of a value in range of an utf16 low surrogate.
+
+2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
+
 	* sysdeps/s390/utf8-utf32-z9.c: Disable cu41 instruction and report
 	an error in case of a value in range of an utf16 low surrogate.
 
diff --git a/sysdeps/s390/utf16-utf32-z9.c b/sysdeps/s390/utf16-utf32-z9.c
index 8d42ab8..5d2ac44 100644
--- a/sysdeps/s390/utf16-utf32-z9.c
+++ b/sysdeps/s390/utf16-utf32-z9.c
@@ -145,42 +145,6 @@ gconv_end (struct __gconv_step *data)
   free (data->__data);
 }
 
-/* The macro for the hardware loop.  This is used for both
-   directions.  */
-#define HARDWARE_CONVERT(INSTRUCTION)					\
-  {									\
-    register const unsigned char* pInput __asm__ ("8") = inptr;		\
-    register size_t inlen __asm__ ("9") = inend - inptr;		\
-    register unsigned char* pOutput __asm__ ("10") = outptr;		\
-    register size_t outlen __asm__("11") = outend - outptr;		\
-    unsigned long cc = 0;						\
-									\
-    __asm__ __volatile__ (".machine push       \n\t"			\
-			  ".machine \"z9-109\" \n\t"			\
-			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
-			  "0: " INSTRUCTION "  \n\t"			\
-			  ".machine pop        \n\t"			\
-			  "   jo     0b        \n\t"			\
-			  "   ipm    %2        \n"			\
-			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-			    "+d" (outlen), "+d" (inlen)			\
-			  :						\
-			  : "cc", "memory");				\
-									\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-    cc >>= 28;								\
-									\
-    if (cc == 1)							\
-      {									\
-	result = __GCONV_FULL_OUTPUT;					\
-      }									\
-    else if (cc == 2)							\
-      {									\
-	result = __GCONV_ILLEGAL_INPUT;					\
-      }									\
-  }
-
 #define PREPARE_LOOP							\
   enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
   int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
@@ -310,7 +274,7 @@ gconv_end (struct __gconv_step *data)
 		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
 		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
 		  "12: lghi %[R_TMP2],16\n\t"				\
-		  "    sgr %[R_TMP2],%[R_TMP]\n\t"			\
+		  "    slgr %[R_TMP2],%[R_TMP]\n\t"			\
 		  "    srl %[R_TMP2],1\n\t"				\
 		  "    llh %[R_TMP],0(%[R_IN])\n\t"			\
 		  "    aghi %[R_OUTLEN],-4\n\t"				\
@@ -437,7 +401,7 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
     uint32_t c = get32 (inptr);						\
 									\
     if (__builtin_expect (c <= 0xd7ff, 1)				\
-	|| (c >=0xdc00 && c <= 0xffff))					\
+	|| (c > 0xdfff && c <= 0xffff))					\
       {									\
 	/* Two UTF-16 chars.  */					\
 	put16 (outptr, c);						\
@@ -475,29 +439,10 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
     inptr += 4;								\
   }
 
-#define BODY_TO_ETF3EH							\
-  {									\
-    HARDWARE_CONVERT ("cu42 %0, %1");					\
-									\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-									\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-									\
-    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
-  }
-
 #define BODY_TO_VX							\
   {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
+    size_t inlen = inend - inptr;					\
+    size_t outlen = outend - outptr;					\
     unsigned long tmp, tmp2, tmp3;					\
     asm volatile (".machine push\n\t"					\
 		  ".machine \"z13\"\n\t"				\
@@ -509,8 +454,8 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
 		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
 		  /* Loop which handles UTF-16 chars			\
 		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
-		  "0:  clgijl %[R_INLEN],32,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "0:  clgijl %[R_INLEN],32,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
 		  "1:  vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
 		  "    lghi %[R_TMP2],0\n\t"				\
 		  /* Shorten to UTF-16.  */				\
@@ -526,9 +471,15 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
 		  "    aghi %[R_INLEN],-32\n\t"				\
 		  "    aghi %[R_OUTLEN],-16\n\t"			\
 		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
-		  "    clgijl %[R_INLEN],32,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "    clgijl %[R_INLEN],32,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
 		  "    j 1b\n\t"					\
+		  /* Calculate remaining uint32_t values in inptr.  */	\
+		  "2:  \n\t"						\
+		  "    clgije %[R_INLEN],0,99f\n\t"			\
+		  "    clgijl %[R_INLEN],4,92f\n\t"			\
+		  "    srlg %[R_TMP2],%[R_INLEN],2\n\t"			\
+		  "    j 20f\n\t"					\
 		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff	\
 		     and check for ch >= 0x10000. (v30, v31)  */	\
 		  "9:  .long 0xd800,0xdfff,0x10000,0x10000\n\t"		\
@@ -540,21 +491,59 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
 		  "    agr %[R_TMP],%[R_TMP2]\n\t"			\
 		  "    srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
 		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
-		  "    jl 20f\n\t"					\
+		  "    jl 12f\n\t"					\
 		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
 		  /* Update pointers.  */				\
 		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
 		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
 		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
 		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
-		  /* Handles UTF16 surrogates with convert instruction.  */ \
-		  "20: cu42 %[R_OUT],%[R_IN]\n\t"			\
-		  "    jo 0b\n\t" /* Try vector implemenation again.  */ \
-		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
-		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  /* Calculate remaining uint32_t values in vrs.  */	\
+		  "12: lghi %[R_TMP2],8\n\t"				\
+		  "    srlg %[R_TMP3],%[R_TMP3],1\n\t"			\
+		  "    slgr %[R_TMP2],%[R_TMP3]\n\t"			\
+		  /* Handle remaining UTF-32 characters.  */		\
+		  "20: l %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-4\n\t"				\
+		  /* Test if ch is 2byte UTF-16 char. */		\
+		  "    clfi %[R_TMP],0xffff\n\t"			\
+		  "    jh 21f\n\t"					\
+		  /* Handle 2 byte UTF16 char.  */			\
+		  "    lgr %[R_TMP3],%[R_TMP]\n\t"			\
+		  "    nilf %[R_TMP],0xf800\n\t"			\
+		  "    clfi %[R_TMP],0xd800\n\t"			\
+		  "    je 91f\n\t" /* Do not accept UTF-16 surrogates.  */ \
+		  "    slgfi %[R_OUTLEN],2\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    sth %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],2(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 4byte UTF-16 char. */		\
+		  "21: clfi %[R_TMP],0x10ffff\n\t"			\
+		  "    jh 91f\n\t" /* ch > 0x10ffff is not allowed!  */	\
+		  /* Handle 4 byte UTF16 char.  */			\
+		  "    slgfi %[R_OUTLEN],4\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    slfi %[R_TMP],0x10000\n\t" /* zabcd = uvwxy - 1.  */ \
+		  "    llilf %[R_TMP3],0xd800dc00\n\t"			\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    risbgn %[R_TMP3],%[R_TMP],38,47,6\n\t" /* High surrogate.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],54,63,0\n\t" /* Low surrogate.  */ \
+		  "    st %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  "92: lghi %[R_RES],%[RES_IN_FULL]\n\t"		\
+		  "    j 99f\n\t"					\
+		  "91: lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "    j 99f\n\t"					\
+		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
+		  "99: \n\t"						\
 		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
 		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
 		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
 		    , [R_RES] "+d" (result)				\
@@ -567,17 +556,10 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
 		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
 		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
 		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
-									\
     if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
       break;								\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
+									\
     STANDARD_TO_LOOP_ERR_HANDLER (4);					\
   }
 
@@ -590,15 +572,6 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
 #define BODY			BODY_TO_C
 #include <iconv/loop.c>
 
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf16_loop_etf3eh
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_TO_ETF3EH
-#include <iconv/loop.c>
-
 #if defined HAVE_S390_VX_ASM_SUPPORT
 /* Generate loop-function with hardware vector instructions.  */
 # define MIN_NEEDED_INPUT	MIN_NEEDED_TO
@@ -623,10 +596,6 @@ __to_utf16_loop_resolver (unsigned long int dl_hwcap)
     return __to_utf16_loop_vx;
   else
 #endif
-  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
-      && dl_hwcap & HWCAP_S390_ETF3EH)
-    return __to_utf16_loop_etf3eh;
-  else
     return __to_utf16_loop_c;
 }
 

http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=52f8a48e24563daa807f94824ce9782b9a9eece9

commit 52f8a48e24563daa807f94824ce9782b9a9eece9
Author: Stefan Liebler <stli@linux.vnet.ibm.com>
Date:   Wed May 25 17:18:05 2016 +0200

    S390: Fix utf32 to utf8 handling of low surrogates (disable cu41).
    
    According to the latest Unicode standard, a conversion from/to UTF-xx has
    to report an error if the character value is in the range of a utf16
    surrogate (0xd800..0xdfff).
    See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.
    
    Thus the cu41 instruction, which converts from utf32 to utf8, has to be
    disabled because it does not report an error for a value in the range of
    a low surrogate (0xdc00..0xdfff). The etf3eh variant is removed, and the c
    and vector variants are adjusted to handle a value in the range of a utf16
    low surrogate correctly.
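    
    The range check described above can be sketched in plain C. This is only
    an illustration of the rule, not the glibc code: the helper name
    utf32_to_utf8 and its return convention (0 means illegal input) are
    assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of glibc: encode one UTF-32 value as
   UTF-8.  Returns the number of bytes written (1..4), or 0 if CH is a
   UTF-16 surrogate (0xd800..0xdfff) or greater than 0x10ffff.  */
static size_t
utf32_to_utf8 (uint32_t ch, unsigned char out[4])
{
  if (ch <= 0x7f)
    {
      out[0] = (unsigned char) ch;
      return 1;
    }
  if (ch <= 0x7ff)
    {
      out[0] = (unsigned char) (0xc0 | (ch >> 6));
      out[1] = (unsigned char) (0x80 | (ch & 0x3f));
      return 2;
    }
  if (ch <= 0xffff)
    {
      /* (ch & 0xf800) == 0xd800 matches high and low surrogates alike,
         i.e. the whole range 0xd800..0xdfff.  */
      if ((ch & 0xf800) == 0xd800)
        return 0;
      out[0] = (unsigned char) (0xe0 | (ch >> 12));
      out[1] = (unsigned char) (0x80 | ((ch >> 6) & 0x3f));
      out[2] = (unsigned char) (0x80 | (ch & 0x3f));
      return 3;
    }
  if (ch <= 0x10ffff)
    {
      out[0] = (unsigned char) (0xf0 | (ch >> 18));
      out[1] = (unsigned char) (0x80 | ((ch >> 12) & 0x3f));
      out[2] = (unsigned char) (0x80 | ((ch >> 6) & 0x3f));
      out[3] = (unsigned char) (0x80 | (ch & 0x3f));
      return 4;
    }
  /* Values above 0x10ffff are not valid Unicode.  */
  return 0;
}
```

    The mask-and-compare form is used here because it rejects 0xdc00..0xdfff
    together with 0xd800..0xdbff in a single test, which is the case the
    commit message says cu41 misses.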
    
    ChangeLog:
    
    	* sysdeps/s390/utf8-utf32-z9.c: Disable cu41 instruction and report
    	an error in case of a value in range of an utf16 low surrogate.

diff --git a/ChangeLog b/ChangeLog
index 78afe17..b50c3d0 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,10 @@
 2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
 
+	* sysdeps/s390/utf8-utf32-z9.c: Disable cu41 instruction and report
+	an error in case of a value in range of an utf16 low surrogate.
+
+2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
+
 	* sysdeps/s390/s390-64/Makefile (iconvdata-subdirectory):
 	Move to ...
 	* sysdeps/s390/Makefile: ... here.
diff --git a/sysdeps/s390/utf8-utf32-z9.c b/sysdeps/s390/utf8-utf32-z9.c
index e39e0a7..efae745 100644
--- a/sysdeps/s390/utf8-utf32-z9.c
+++ b/sysdeps/s390/utf8-utf32-z9.c
@@ -572,28 +572,6 @@ __from_utf8_loop_resolver (unsigned long int dl_hwcap)
 
 strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 
-
-/* Conversion from UTF-32 internal/BE to UTF-8.  */
-#define BODY_TO_HW(ASM)							\
-  {									\
-    ASM;								\
-    if (__glibc_likely (inptr == inend)					\
-	|| result == __GCONV_FULL_OUTPUT)				\
-      break;								\
-    if (inptr + 4 > inend)						\
-      {									\
-	result = __GCONV_INCOMPLETE_INPUT;				\
-	break;								\
-      }									\
-    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
-  }
-
-/* The hardware routine uses the S/390 cu41 instruction.  */
-#define BODY_TO_ETF3EH BODY_TO_HW (HARDWARE_CONVERT ("cu41 %0, %1"))
-
-/* The hardware routine uses the S/390 vector and cu41 instructions.  */
-#define BODY_TO_VX BODY_TO_HW (HW_TO_VX)
-
 /* The software routine mimics the S/390 cu41 instruction.  */
 #define BODY_TO_C						\
   {								\
@@ -632,7 +610,7 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 	    result = __GCONV_FULL_OUTPUT;			\
 	    break;						\
 	  }							\
-	if (wc >= 0xd800 && wc < 0xdc00)			\
+	if (wc >= 0xd800 && wc <= 0xdfff)			\
 	  {							\
 	    /* Do not accept UTF-16 surrogates.   */		\
 	    result = __GCONV_ILLEGAL_INPUT;			\
@@ -679,13 +657,12 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
     inptr += 4;							\
   }
 
-#define HW_TO_VX							\
+/* The hardware routine uses the S/390 vector instructions.  */
+#define BODY_TO_VX							\
   {									\
-    register const unsigned char* pInput asm ("8") = inptr;		\
-    register size_t inlen asm ("9") = inend - inptr;			\
-    register unsigned char* pOutput asm ("10") = outptr;		\
-    register size_t outlen asm("11") = outend - outptr;			\
-    unsigned long tmp, tmp2;						\
+    size_t inlen = inend - inptr;					\
+    size_t outlen = outend - outptr;					\
+    unsigned long tmp, tmp2, tmp3;					\
     asm volatile (".machine push\n\t"					\
 		  ".machine \"z13\"\n\t"				\
 		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
@@ -696,10 +673,10 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
 		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
 		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
-		  "0:  clgijl %[R_INLEN],64,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "0:  clgijl %[R_INLEN],64,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
 		  "1:  vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
-		  "    lghi %[R_TMP],0\n\t"				\
+		  "    lghi %[R_TMP2],0\n\t"				\
 		  /* Shorten to byte values.  */			\
 		  "    vpkf %%v23,%%v16,%%v17\n\t"			\
 		  "    vpkf %%v24,%%v18,%%v19\n\t"			\
@@ -719,41 +696,116 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 		  "    aghi %[R_OUTLEN],-16\n\t"			\
 		  "    la %[R_IN],64(%[R_IN])\n\t"			\
 		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
-		  "    clgijl %[R_INLEN],64,20f\n\t"			\
-		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "    clgijl %[R_INLEN],64,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
 		  "    j 1b\n\t"					\
 		  /* Found a value > 0x7f.  */				\
-		  "13: ahi %[R_TMP],4\n\t"				\
-		  "12: ahi %[R_TMP],4\n\t"				\
-		  "11: ahi %[R_TMP],4\n\t"				\
-		  "10: vlgvb %[R_I],%%v22,7\n\t"			\
-		  "    srlg %[R_I],%[R_I],2\n\t"			\
-		  "    agr %[R_I],%[R_TMP]\n\t"				\
-		  "    je 20f\n\t"					\
+		  "13: ahi %[R_TMP2],4\n\t"				\
+		  "12: ahi %[R_TMP2],4\n\t"				\
+		  "11: ahi %[R_TMP2],4\n\t"				\
+		  "10: vlgvb %[R_TMP],%%v22,7\n\t"			\
+		  "    srlg %[R_TMP],%[R_TMP],2\n\t"			\
+		  "    agr %[R_TMP],%[R_TMP2]\n\t"			\
+		  "    je 16f\n\t"					\
 		  /* Store characters before invalid one...  */		\
-		  "    slgr %[R_OUTLEN],%[R_I]\n\t"			\
-		  "15: aghi %[R_I],-1\n\t"				\
-		  "    vstl %%v23,%[R_I],0(%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP]\n\t"			\
+		  "15: aghi %[R_TMP],-1\n\t"				\
+		  "    vstl %%v23,%[R_TMP],0(%[R_OUT])\n\t"		\
 		  /* ... and update pointers.  */			\
-		  "    aghi %[R_I],1\n\t"				\
-		  "    la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"		\
-		  "    sllg %[R_I],%[R_I],2\n\t"			\
-		  "    la %[R_IN],0(%[R_I],%[R_IN])\n\t"		\
-		  "    slgr %[R_INLEN],%[R_I]\n\t"			\
-		  /* Handle multibyte utf8-char with convert instruction. */ \
-		  "20: cu41 %[R_OUT],%[R_IN]\n\t"			\
-		  "    jo 0b\n\t" /* Try vector implemenation again.  */ \
-		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
-		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  "    aghi %[R_TMP],1\n\t"				\
+		  "    la %[R_OUT],0(%[R_TMP],%[R_OUT])\n\t"		\
+		  "    sllg %[R_TMP2],%[R_TMP],2\n\t"			\
+		  "    la %[R_IN],0(%[R_TMP2],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP2]\n\t"			\
+		  /* Calculate remaining uint32_t values in loaded vrs.  */ \
+		  "16: lghi %[R_TMP2],16\n\t"				\
+		  "    sgr %[R_TMP2],%[R_TMP]\n\t"			\
+		  "    l %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-4\n\t"				\
+		  "    j 22f\n\t"					\
+		  /* Handle remaining bytes.  */			\
+		  "2:  clgije %[R_INLEN],0,99f\n\t"			\
+		  "    clgijl %[R_INLEN],4,92f\n\t"			\
+		  /* Calculate remaining uint32_t values in inptr.  */	\
+		  "    srlg %[R_TMP2],%[R_INLEN],2\n\t"			\
+		  /* Handle multibyte utf8-char. */			\
+		  "20: l %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-4\n\t"				\
+		  /* Test if ch is 1byte UTF-8 char. */			\
+		  "21: clijh %[R_TMP],0x7f,22f\n\t"			\
+		  /* Handle 1-byte UTF-8 char.  */			\
+		  "31: slgfi %[R_OUTLEN],1\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    stc %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],1(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 2byte UTF-8 char. */			\
+		  "22: clfi %[R_TMP],0x7ff\n\t"				\
+		  "    jh 23f\n\t"					\
+		  /* Handle 2-byte UTF-8 char.  */			\
+		  "32: slgfi %[R_OUTLEN],2\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    llill %[R_TMP3],0xc080\n\t"			\
+		  "    risbgn %[R_TMP3],%[R_TMP],51,55,2\n\t" /* 1. byte.   */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 2. byte.   */ \
+		  "    sth %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],2(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 3-byte UTF-8 char.  */		\
+		  "23: clfi %[R_TMP],0xffff\n\t"			\
+		  "    jh 24f\n\t"					\
+		  /* Handle 3-byte UTF-8 char.  */			\
+		  "33: slgfi %[R_OUTLEN],3\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    llilf %[R_TMP3],0xe08080\n\t"			\
+		  "    risbgn %[R_TMP3],%[R_TMP],44,47,4\n\t" /* 1. byte.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 2. byte.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 3. byte.  */ \
+		  /* Test if ch is a UTF-16 surrogate: ch & 0xf800 == 0xd800  */ \
+		  "    nilf %[R_TMP],0xf800\n\t"			\
+		  "    clfi %[R_TMP],0xd800\n\t"			\
+		  "    je 91f\n\t" /* Do not accept UTF-16 surrogates.  */ \
+		  "    stcm %[R_TMP3],7,0(%[R_OUT])\n\t"		\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],3(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 4-byte UTF-8 char.  */		\
+		  "24: clfi %[R_TMP],0x10ffff\n\t"			\
+		  "    jh 91f\n\t" /* ch > 0x10ffff is not allowed!  */	\
+		  /* Handle 4-byte UTF-8 char.  */			\
+		  "34: slgfi %[R_OUTLEN],4\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    llilf %[R_TMP3],0xf0808080\n\t"			\
+		  "    risbgn %[R_TMP3],%[R_TMP],37,39,6\n\t" /* 1. byte.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],42,47,4\n\t" /* 2. byte.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 3. byte.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 4. byte.  */ \
+		  "    st %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  "92: lghi %[R_RES],%[RES_IN_FULL]\n\t"		\
+		  "    j 99f\n\t"					\
+		  "91: lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "    j 99f\n\t"					\
+		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
+		  "99: \n\t"						\
 		  ".machine pop"					\
-		  : /* outputs */ [R_IN] "+a" (pInput)			\
-		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
-		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=d" (tmp)	\
-		    , [R_I] "=a" (tmp2)					\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=a" (tmp2), [R_TMP3] "=d" (tmp3)	\
 		    , [R_RES] "+d" (result)				\
 		  : /* inputs */					\
 		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
 		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
 		  : /* clobber list */ "memory", "cc"			\
 		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
 		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
@@ -761,8 +813,11 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
 		    ASM_CLOBBER_VR ("v24")				\
 		  );							\
-    inptr = pInput;							\
-    outptr = pOutput;							\
+    if (__glibc_likely (inptr == inend)					\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
+      break;								\
+									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
   }
 
 /* Generate loop-function with software routing.  */
@@ -774,15 +829,6 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 #define LOOP_NEED_FLAGS
 #include <iconv/loop.c>
 
-/* Generate loop-function with hardware utf-convert instruction.  */
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			__to_utf8_loop_etf3eh
-#define LOOP_NEED_FLAGS
-#define BODY			BODY_TO_ETF3EH
-#include <iconv/loop.c>
-
 #if defined HAVE_S390_VX_ASM_SUPPORT
 /* Generate loop-function with hardware vector and utf-convert instructions.  */
 # define MIN_NEEDED_INPUT	MIN_NEEDED_TO
@@ -807,10 +853,6 @@ __to_utf8_loop_resolver (unsigned long int dl_hwcap)
     return __to_utf8_loop_vx;
   else
 #endif
-  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
-      && dl_hwcap & HWCAP_S390_ETF3EH)
-    return __to_utf8_loop_etf3eh;
-  else
     return __to_utf8_loop_c;
 }
 

http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=ee518b7070b1bcb41382b6db10f513e071b2c20e

commit ee518b7070b1bcb41382b6db10f513e071b2c20e
Author: Stefan Liebler <stli@linux.vnet.ibm.com>
Date:   Wed May 25 17:18:05 2016 +0200

    S390: Use s390-64 specific iconv-modules on s390-32, too.
    
    This patch reworks the existing s390 64bit specific iconv modules in order
    to use them on s390 31bit, too.
    
    Thus the parts for subdirectory iconvdata in sysdeps/s390/s390-64/Makefile
    were moved to sysdeps/s390/Makefile so that they apply on 31bit, too.
    All those modules are moved from sysdeps/s390/s390-64 directory to sysdeps/s390.
    
    The iso-8859-1 to/from cp037 module was adjusted to use the brct (branch
    relative on count) instruction on 31bit s390 instead of brctg, because brctg
    is a zarch instruction and is not available on a 31bit kernel.
    
    The utf modules are using zarch instructions, thus the directive machinemode
    zarch_nohighgprs was added to the inline assemblies to omit the high-gprs flag
    in the shared libraries. Otherwise they can't be loaded on a 31bit kernel.
    The ifunc resolvers were adjusted in order to call the etf3eh or vector variants
    only if zarch instructions are available (64bit kernel in 31bit compat-mode).
    Furthermore, some variable types were changed. E.g. unsigned long long would
    be a register pair on s390 31bit, but we want only one single register.
    For variables of type size_t the register contents have to be enlarged from a
    32bit to a 64bit value on 31bit, because the inline assemblies use 64bit
    values in such cases.
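    
    The resolver rule described above can be sketched as follows. This is a
    simplified illustration only: the SKETCH_HWCAP_* bit values are
    placeholders and not the real HWCAP_S390_* constants, the enum stands in
    for the loop function pointers, and the vector check is an assumption
    about a condition the message does not spell out.

```c
/* Stand-ins for the loop functions an ifunc resolver would return.  */
enum loop_variant
{
  VARIANT_C,
  VARIANT_ETF3EH,
  VARIANT_VX
};

/* Placeholder capability bits for this sketch only; the real values
   come from the HWCAP_S390_* definitions.  */
#define SKETCH_HWCAP_ZARCH	(1UL << 0)
#define SKETCH_HWCAP_HIGH_GPRS	(1UL << 1)
#define SKETCH_HWCAP_ETF3EH	(1UL << 2)
#define SKETCH_HWCAP_VX		(1UL << 3)

/* Hypothetical resolver: prefer the vector variant, then the etf3eh
   variant, and fall back to plain C.  */
static enum loop_variant
select_loop_variant (unsigned long int hwcap)
{
  /* Assumption: a set vector bit implies that zarch instructions can
     be used.  */
  if (hwcap & SKETCH_HWCAP_VX)
    return VARIANT_VX;

  /* The convert-utf instructions are only usable if zarch instructions
     and 64bit registers are available, e.g. on a 64bit kernel running
     a 31bit process.  */
  if ((hwcap & SKETCH_HWCAP_ZARCH) && (hwcap & SKETCH_HWCAP_HIGH_GPRS)
      && (hwcap & SKETCH_HWCAP_ETF3EH))
    return VARIANT_ETF3EH;

  return VARIANT_C;
}
```

    With these placeholder bits, a capability word that only has the etf3eh
    bit set falls through to the C variant, which is the behaviour the
    message asks for on a 31bit kernel.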
    
    ChangeLog:
    
    	* sysdeps/s390/s390-64/Makefile (iconvdata-subdirectory):
    	Move to ...
    	* sysdeps/s390/Makefile: ... here.
    	* sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c: Move to ...
    	* sysdeps/s390/iso-8859-1_cp037_z900.c: ... here.
    	(BRANCH_ON_COUNT): New define.
    	(TR_LOOP): Use BRANCH_ON_COUNT instead of brctg.
    	* sysdeps/s390/s390-64/utf16-utf32-z9.c: Move to ...
    	* sysdeps/s390/utf16-utf32-z9.c: ... here and adjust to
    	run on s390-32, too.
    	* sysdeps/s390/s390-64/utf8-utf16-z9.c: Move to ...
    	* sysdeps/s390/utf8-utf16-z9.c: ... here and adjust to
    	run on s390-32, too.
    	* sysdeps/s390/s390-64/utf8-utf32-z9.c: Move to ...
    	* sysdeps/s390/utf8-utf32-z9.c: ... here and adjust to
    	run on s390-32, too.

diff --git a/ChangeLog b/ChangeLog
index 306c616..78afe17 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,24 @@
 2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
 
+	* sysdeps/s390/s390-64/Makefile (iconvdata-subdirectory):
+	Move to ...
+	* sysdeps/s390/Makefile: ... here.
+	* sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c: Move to ...
+	* sysdeps/s390/iso-8859-1_cp037_z900.c: ... here.
+	(BRANCH_ON_COUNT): New define.
+	(TR_LOOP): Use BRANCH_ON_COUNT instead of brctg.
+	* sysdeps/s390/s390-64/utf16-utf32-z9.c: Move to ...
+	* sysdeps/s390/utf16-utf32-z9.c: ... here and adjust to
+	run on s390-32, too.
+	* sysdeps/s390/s390-64/utf8-utf16-z9.c: Move to ...
+	* sysdeps/s390/utf8-utf16-z9.c: ... here and adjust to
+	run on s390-32, too.
+	* sysdeps/s390/s390-64/utf8-utf32-z9.c: Move to ...
+	* sysdeps/s390/utf8-utf32-z9.c: ... here and adjust to
+	run on s390-32, too.
+
+2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
+
 	* sysdeps/s390/s390-64/utf16-utf32-z9.c: Use ifunc to select c,
 	etf3eh or new vector loop-variant.
 
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/Makefile
similarity index 83%
copy from sysdeps/s390/s390-64/Makefile
copy to sysdeps/s390/Makefile
index ce4aa3b..d508365 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/Makefile
@@ -1,13 +1,3 @@
-ifeq ($(subdir),gmon)
-sysdep_routines += s390x-mcount
-endif
-
-ifeq ($(subdir),elf)
-CFLAGS-rtld.c += -Wno-uninitialized -Wno-unused
-CFLAGS-dl-load.c += -Wno-unused
-CFLAGS-dl-reloc.c += -Wno-unused
-endif
-
 ifeq ($(subdir),iconvdata)
 ISO-8859-1_CP037_Z900-routines := iso-8859-1_cp037_z900
 ISO-8859-1_CP037_Z900-map := gconv.map
diff --git a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c b/sysdeps/s390/iso-8859-1_cp037_z900.c
similarity index 98%
rename from sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
rename to sysdeps/s390/iso-8859-1_cp037_z900.c
index 3b63e6a..fc25dff 100644
--- a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
+++ b/sysdeps/s390/iso-8859-1_cp037_z900.c
@@ -175,6 +175,12 @@ __attribute__ ((aligned (8))) =
 #define MIN_NEEDED_FROM		1
 #define MIN_NEEDED_TO		1
 
+# if defined __s390x__
+#  define BRANCH_ON_COUNT(REG,LBL) "brctg %" #REG "," #LBL "\n\t"
+# else
+#  define BRANCH_ON_COUNT(REG,LBL) "brct %" #REG "," #LBL "\n\t"
+# endif
+
 #define TR_LOOP(TABLE)							\
   {									\
     size_t length = (inend - inptr < outend - outptr			\
@@ -188,7 +194,7 @@ __attribute__ ((aligned (8))) =
 			     "   tr 0(256,%[R_OUT]),0(%[R_TBL])\n\t"	\
 			     "   la %[R_IN],256(%[R_IN])\n\t"		\
 			     "   la %[R_OUT],256(%[R_OUT])\n\t"		\
-			     "   brctg %[R_LI],0b\n\t"			\
+			     BRANCH_ON_COUNT ([R_LI], 0b)		\
 			     : /* outputs */ [R_IN] "+a" (inptr)	\
 			       , [R_OUT] "+a" (outptr), [R_LI] "+d" (blocks) \
 			     : /* inputs */ [R_TBL] "a" (TABLE)		\
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index ce4aa3b..b4d793b 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -7,35 +7,3 @@ CFLAGS-rtld.c += -Wno-uninitialized -Wno-unused
 CFLAGS-dl-load.c += -Wno-unused
 CFLAGS-dl-reloc.c += -Wno-unused
 endif
-
-ifeq ($(subdir),iconvdata)
-ISO-8859-1_CP037_Z900-routines := iso-8859-1_cp037_z900
-ISO-8859-1_CP037_Z900-map := gconv.map
-
-UTF8_UTF32_Z9-routines := utf8-utf32-z9
-UTF8_UTF32_Z9-map := gconv.map
-
-UTF16_UTF32_Z9-routines := utf16-utf32-z9
-UTF16_UTF32_Z9-map := gconv.map
-
-UTF8_UTF16_Z9-routines := utf8-utf16-z9
-UTF8_UTF16_Z9-map := gconv.map
-
-s390x-iconv-modules = ISO-8859-1_CP037_Z900 UTF8_UTF16_Z9 UTF16_UTF32_Z9 UTF8_UTF32_Z9
-
-extra-modules-left += $(s390x-iconv-modules)
-include extra-module.mk
-
-cpp-srcs-left := $(foreach mod,$(s390x-iconv-modules),$($(mod)-routines))
-lib := iconvdata
-include $(patsubst %,$(..)cppflags-iterator.mk,$(cpp-srcs-left))
-
-extra-objs      += $(addsuffix .so, $(s390x-iconv-modules))
-install-others  += $(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules))
-
-$(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules)) : \
-$(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
-	$(do-install-program)
-
-sysdeps-gconv-modules = ../sysdeps/s390/gconv-modules
-endif
diff --git a/sysdeps/s390/s390-64/utf16-utf32-z9.c b/sysdeps/s390/utf16-utf32-z9.c
similarity index 97%
rename from sysdeps/s390/s390-64/utf16-utf32-z9.c
rename to sysdeps/s390/utf16-utf32-z9.c
index 61d0a94..8d42ab8 100644
--- a/sysdeps/s390/s390-64/utf16-utf32-z9.c
+++ b/sysdeps/s390/utf16-utf32-z9.c
@@ -36,6 +36,12 @@
 # define ASM_CLOBBER_VR(NR)
 #endif
 
+#if defined __s390x__
+# define CONVERT_32BIT_SIZE_T(REG)
+#else
+# define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+#endif
+
 /* UTF-32 big endian byte order mark.  */
 #define BOM_UTF32               0x0000feffu
 
@@ -144,13 +150,14 @@ gconv_end (struct __gconv_step *data)
 #define HARDWARE_CONVERT(INSTRUCTION)					\
   {									\
     register const unsigned char* pInput __asm__ ("8") = inptr;		\
-    register unsigned long long inlen __asm__ ("9") = inend - inptr;	\
+    register size_t inlen __asm__ ("9") = inend - inptr;		\
     register unsigned char* pOutput __asm__ ("10") = outptr;		\
-    register unsigned long long outlen __asm__("11") = outend - outptr;	\
-    uint64_t cc = 0;							\
+    register size_t outlen __asm__("11") = outend - outptr;		\
+    unsigned long cc = 0;						\
 									\
     __asm__ __volatile__ (".machine push       \n\t"			\
 			  ".machine \"z9-109\" \n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
 			  "0: " INSTRUCTION "  \n\t"			\
 			  ".machine pop        \n\t"			\
 			  "   jo     0b        \n\t"			\
@@ -260,6 +267,8 @@ gconv_end (struct __gconv_step *data)
 		  /* Setup to check for surrogates.  */			\
 		  "    larl %[R_TMP],9f\n\t"				\
 		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
 		  /* Loop which handles UTF-16 chars <0xd800, >0xdfff.  */ \
 		  "0:  clgijl %[R_INLEN],16,2f\n\t"			\
 		  "    clgijl %[R_OUTLEN],32,2f\n\t"			\
@@ -496,6 +505,8 @@ strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
 		  /* Setup to check for surrogates.  */			\
 		  "    larl %[R_TMP],9f\n\t"				\
 		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
 		  /* Loop which handles UTF-16 chars			\
 		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
 		  "0:  clgijl %[R_INLEN],32,20f\n\t"			\
@@ -612,7 +623,8 @@ __to_utf16_loop_resolver (unsigned long int dl_hwcap)
     return __to_utf16_loop_vx;
   else
 #endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
     return __to_utf16_loop_etf3eh;
   else
     return __to_utf16_loop_c;
diff --git a/sysdeps/s390/s390-64/utf8-utf16-z9.c b/sysdeps/s390/utf8-utf16-z9.c
similarity index 97%
rename from sysdeps/s390/s390-64/utf8-utf16-z9.c
rename to sysdeps/s390/utf8-utf16-z9.c
index 7520ef2..d3dc9bd 100644
--- a/sysdeps/s390/s390-64/utf8-utf16-z9.c
+++ b/sysdeps/s390/utf8-utf16-z9.c
@@ -36,6 +36,12 @@
 # define ASM_CLOBBER_VR(NR)
 #endif
 
+#if defined __s390x__
+# define CONVERT_32BIT_SIZE_T(REG)
+#else
+# define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+#endif
+
 /* Defines for skeleton.c.  */
 #define DEFINE_INIT		0
 #define DEFINE_FINI		0
@@ -140,13 +146,14 @@ gconv_end (struct __gconv_step *data)
 #define HARDWARE_CONVERT(INSTRUCTION)					\
   {									\
     register const unsigned char* pInput __asm__ ("8") = inptr;		\
-    register unsigned long long inlen __asm__ ("9") = inend - inptr;	\
+    register size_t inlen __asm__ ("9") = inend - inptr;		\
     register unsigned char* pOutput __asm__ ("10") = outptr;		\
-    register unsigned long long outlen __asm__("11") = outend - outptr;	\
-    uint64_t cc = 0;							\
+    register size_t outlen __asm__("11") = outend - outptr;		\
+    unsigned long cc = 0;						\
 									\
     __asm__ __volatile__ (".machine push       \n\t"			\
 			  ".machine \"z9-109\" \n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
 			  "0: " INSTRUCTION "  \n\t"			\
 			  ".machine pop        \n\t"			\
 			  "   jo     0b        \n\t"			\
@@ -221,6 +228,8 @@ gconv_end (struct __gconv_step *data)
 		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
 		  "    vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
 		  "    vrepib %%v31,0x20\n\t"				\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
 		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
 		  "0:  clgijl %[R_INLEN],16,20f\n\t"			\
 		  "    clgijl %[R_OUTLEN],32,20f\n\t"			\
@@ -479,7 +488,8 @@ __from_utf8_loop_resolver (unsigned long int dl_hwcap)
     return __from_utf8_loop_vx;
   else
 #endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
     return __from_utf8_loop_etf3eh;
   else
     return __from_utf8_loop_c;
@@ -602,6 +612,8 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 		  /* Setup to check for values <= 0x7f.  */		\
 		  "    larl %[R_TMP],9f\n\t"				\
 		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
 		  /* Loop which handles UTF-16 chars <=0x7f.  */	\
 		  "0:  clgijl %[R_INLEN],32,2f\n\t"			\
 		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
diff --git a/sysdeps/s390/s390-64/utf8-utf32-z9.c b/sysdeps/s390/utf8-utf32-z9.c
similarity index 97%
rename from sysdeps/s390/s390-64/utf8-utf32-z9.c
rename to sysdeps/s390/utf8-utf32-z9.c
index f9c9199..e39e0a7 100644
--- a/sysdeps/s390/s390-64/utf8-utf32-z9.c
+++ b/sysdeps/s390/utf8-utf32-z9.c
@@ -36,6 +36,12 @@
 # define ASM_CLOBBER_VR(NR)
 #endif
 
+#if defined __s390x__
+# define CONVERT_32BIT_SIZE_T(REG)
+#else
+# define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+#endif
+
 /* Defines for skeleton.c.  */
 #define DEFINE_INIT		0
 #define DEFINE_FINI		0
@@ -140,13 +146,14 @@ gconv_end (struct __gconv_step *data)
 #define HARDWARE_CONVERT(INSTRUCTION)					\
   {									\
     register const unsigned char* pInput __asm__ ("8") = inptr;		\
-    register unsigned long long inlen __asm__ ("9") = inend - inptr;	\
+    register size_t inlen __asm__ ("9") = inend - inptr;		\
     register unsigned char* pOutput __asm__ ("10") = outptr;		\
-    register unsigned long long outlen __asm__("11") = outend - outptr;	\
-    uint64_t cc = 0;							\
+    register size_t outlen __asm__("11") = outend - outptr;		\
+    unsigned long cc = 0;						\
 									\
     __asm__ __volatile__ (".machine push       \n\t"			\
 			  ".machine \"z9-109\" \n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
 			  "0: " INSTRUCTION "  \n\t"			\
 			  ".machine pop        \n\t"			\
 			  "   jo     0b        \n\t"			\
@@ -413,6 +420,8 @@ gconv_end (struct __gconv_step *data)
 		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
 		  "    vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
 		  "    vrepib %%v31,0x20\n\t"				\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
 		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
 		  "0:  clgijl %[R_INLEN],16,20f\n\t"			\
 		  "    clgijl %[R_OUTLEN],64,20f\n\t"			\
@@ -554,7 +563,8 @@ __from_utf8_loop_resolver (unsigned long int dl_hwcap)
     return __from_utf8_loop_vx;
   else
 #endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
     return __from_utf8_loop_etf3eh;
   else
     return __from_utf8_loop_c;
@@ -683,6 +693,8 @@ strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
 		  "    vzero %%v21\n\t"					\
 		  "    vleih %%v21,8192,0\n\t"  /* element 0:   >  */	\
 		  "    vleih %%v21,-8192,2\n\t" /* element 1: =<>  */	\
+		  CONVERT_32BIT_SIZE_T ([R_INLEN])			\
+		  CONVERT_32BIT_SIZE_T ([R_OUTLEN])			\
 		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
 		  "0:  clgijl %[R_INLEN],64,20f\n\t"			\
 		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
@@ -795,7 +807,8 @@ __to_utf8_loop_resolver (unsigned long int dl_hwcap)
     return __to_utf8_loop_vx;
   else
 #endif
-  if (dl_hwcap & HWCAP_S390_ETF3EH)
+  if (dl_hwcap & HWCAP_S390_ZARCH && dl_hwcap & HWCAP_S390_HIGH_GPRS
+      && dl_hwcap & HWCAP_S390_ETF3EH)
     return __to_utf8_loop_etf3eh;
   else
     return __to_utf8_loop_c;

http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=6896776c3c9c32fd22324e6de6737dd69ae73213

commit 6896776c3c9c32fd22324e6de6737dd69ae73213
Author: Stefan Liebler <stli@linux.vnet.ibm.com>
Date:   Wed May 25 17:18:05 2016 +0200

    S390: Optimize utf16-utf32 module.
    
    This patch reworks the s390 specific module to convert between utf16 and utf32.
    Now ifunc is used to choose either the c or etf3eh (with convert utf
    instruction) variants at runtime.
    Furthermore, a new vector variant for z13 is introduced which will be built
    and chosen if vector support is available at build / runtime.
    
    In case of converting utf32 to utf16, the vector variant optimizes input of
    2byte utf16 characters. The convert utf instruction is used if a utf16
    surrogate is found.
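    
    For reference, the utf32 to utf16 mapping itself can be sketched in plain
    C. This is not the glibc code: the helper name utf32_to_utf16 and its
    return convention (0 means illegal input) are assumptions for this
    example, and the sketch rejects surrogate values as the Unicode standard
    requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of glibc: encode one UTF-32 value as
   UTF-16.  Returns the number of 16-bit units written (1 or 2), or 0
   if CH is a surrogate value or greater than 0x10ffff.  */
static size_t
utf32_to_utf16 (uint32_t ch, uint16_t out[2])
{
  if (ch <= 0xffff)
    {
      /* A 2byte UTF-16 character must not be a surrogate.  */
      if ((ch & 0xf800) == 0xd800)
        return 0;
      out[0] = (uint16_t) ch;
      return 1;
    }
  if (ch > 0x10ffff)
    return 0;

  /* Split the 20 bits of ch - 0x10000 into two 10-bit halves.  */
  ch -= 0x10000;
  out[0] = (uint16_t) (0xd800 | (ch >> 10));	/* High surrogate.  */
  out[1] = (uint16_t) (0xdc00 | (ch & 0x3ff));	/* Low surrogate.  */
  return 2;
}
```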
    
    For the other direction, utf16 to utf32, the cu24 instruction can't be
    re-enabled, because it does not report an error if the input-stream consists
    of a single low surrogate utf16 char (e.g. 0xdc00). This applies to the newest
    z13, too. Thus there is only the c or the new vector variant, which can handle
    utf16 surrogate characters.
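    
    The lone low surrogate case can be illustrated with a small C decoder.
    This is a sketch under assumptions, not the glibc loop: the helper name
    utf16_to_utf32 is made up for this example, and a truncated pair is simply
    treated as an error here instead of being reported as incomplete input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of glibc: decode one character from N
   UTF-16 units at IN.  Returns the number of units consumed (1 or 2)
   and stores the value in *CH, or returns 0 on illegal input.  */
static size_t
utf16_to_utf32 (const uint16_t *in, size_t n, uint32_t *ch)
{
  if (n == 0)
    return 0;

  uint16_t u1 = in[0];
  if (u1 < 0xd800 || u1 > 0xdfff)
    {
      /* Plain 2byte character.  */
      *ch = u1;
      return 1;
    }

  /* A low surrogate without a preceding high surrogate is illegal.
     This is the case that has to be reported as an error.  */
  if (u1 >= 0xdc00)
    return 0;

  /* Sketch simplification: a missing second unit is an error.  */
  if (n < 2)
    return 0;

  uint16_t u2 = in[1];
  if (u2 < 0xdc00 || u2 > 0xdfff)
    return 0;

  *ch = 0x10000 + (((uint32_t) (u1 - 0xd800) << 10) | (uint32_t) (u2 - 0xdc00));
  return 2;
}
```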
    
    This patch also fixes some whitespace errors. Furthermore, the etf3eh variant
    now handles the "UTF-xx//IGNORE" case. Previously it disregarded the ignore
    flag and always stopped at an error.
    
    ChangeLog:
    
    	* sysdeps/s390/s390-64/utf16-utf32-z9.c: Use ifunc to select c,
    	etf3eh or new vector loop-variant.

diff --git a/ChangeLog b/ChangeLog
index c201dd1..306c616 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,10 @@
 2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
 
+	* sysdeps/s390/s390-64/utf16-utf32-z9.c: Use ifunc to select c,
+	etf3eh or new vector loop-variant.
+
+2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
+
 	* sysdeps/s390/s390-64/utf8-utf16-z9.c: Use ifunc to select c,
 	etf3eh or new vector loop-variant.
 
diff --git a/sysdeps/s390/s390-64/utf16-utf32-z9.c b/sysdeps/s390/s390-64/utf16-utf32-z9.c
index a3863ee..61d0a94 100644
--- a/sysdeps/s390/s390-64/utf16-utf32-z9.c
+++ b/sysdeps/s390/s390-64/utf16-utf32-z9.c
@@ -30,47 +30,27 @@
 #include <dl-procinfo.h>
 #include <gconv.h>
 
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
+
 /* UTF-32 big endian byte order mark.  */
 #define BOM_UTF32               0x0000feffu
 
 /* UTF-16 big endian byte order mark.  */
-#define BOM_UTF16	        0xfeff
+#define BOM_UTF16               0xfeff
 
 #define DEFINE_INIT		0
 #define DEFINE_FINI		0
 #define MIN_NEEDED_FROM		2
 #define MAX_NEEDED_FROM		4
 #define MIN_NEEDED_TO		4
-#define FROM_LOOP		from_utf16_loop
-#define TO_LOOP			to_utf16_loop
+#define FROM_LOOP		__from_utf16_loop
+#define TO_LOOP			__to_utf16_loop
 #define FROM_DIRECTION		(dir == from_utf16)
 #define ONE_DIRECTION           0
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      if (dir == to_utf16)						\
-	{								\
-          /* Emit the UTF-16 Byte Order Mark.  */			\
-          if (__glibc_unlikely (outbuf + 2 > outend))			      \
-	    return __GCONV_FULL_OUTPUT;					\
-									\
-	  put16u (outbuf, BOM_UTF16);					\
-	  outbuf += 2;							\
-	}								\
-      else								\
-	{								\
-          /* Emit the UTF-32 Byte Order Mark.  */			\
-	  if (__glibc_unlikely (outbuf + 4 > outend))			      \
-	    return __GCONV_FULL_OUTPUT;					\
-									\
-	  put32u (outbuf, BOM_UTF32);					\
-	  outbuf += 4;							\
-	}								\
-    }
 
 /* Direction of the transformation.  */
 enum direction
@@ -169,16 +149,16 @@ gconv_end (struct __gconv_step *data)
     register unsigned long long outlen __asm__("11") = outend - outptr;	\
     uint64_t cc = 0;							\
 									\
-    __asm__ volatile (".machine push       \n\t"			\
-		      ".machine \"z9-109\" \n\t"			\
-		      "0: " INSTRUCTION "  \n\t"			\
-		      ".machine pop        \n\t"			\
-		      "   jo     0b        \n\t"			\
-		      "   ipm    %2        \n"				\
-		      : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-		      "+d" (outlen), "+d" (inlen)			\
-		      :							\
-		      : "cc", "memory");				\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
 									\
     inptr = pInput;							\
     outptr = pOutput;							\
@@ -187,44 +167,46 @@ gconv_end (struct __gconv_step *data)
     if (cc == 1)							\
       {									\
 	result = __GCONV_FULL_OUTPUT;					\
-	break;								\
       }									\
     else if (cc == 2)							\
       {									\
 	result = __GCONV_ILLEGAL_INPUT;					\
-	break;								\
       }									\
   }
 
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf16_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf16_data *) step->__data)->emit_bom;	\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      if (dir == to_utf16)						\
+	{								\
+	  /* Emit the UTF-16 Byte Order Mark.  */			\
+	  if (__glibc_unlikely (outbuf + 2 > outend))			\
+	    return __GCONV_FULL_OUTPUT;					\
+									\
+	  put16u (outbuf, BOM_UTF16);					\
+	  outbuf += 2;							\
+	}								\
+      else								\
+	{								\
+	  /* Emit the UTF-32 Byte Order Mark.  */			\
+	  if (__glibc_unlikely (outbuf + 4 > outend))			\
+	    return __GCONV_FULL_OUTPUT;					\
+									\
+	  put32u (outbuf, BOM_UTF32);					\
+	  outbuf += 4;							\
+	}								\
+    }
+
 /* Conversion function from UTF-16 to UTF-32 internal/BE.  */
 
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define LOOPFCT			FROM_LOOP
 /* The software routine is copied from utf-16.c (minus bytes
    swapping).  */
-#define BODY								\
+#define BODY_FROM_C							\
   {									\
-    /* The hardware instruction currently fails to report an error for	\
-       isolated low surrogates so we have to disable the instruction	\
-       until this gets resolved.  */					\
-    if (0) /* (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH) */			\
-      {									\
-	HARDWARE_CONVERT ("cu24 %0, %1, 1");				\
-	if (inptr != inend)						\
-	  {								\
-	    /* Check if the third byte is				\
-	       a valid start of a UTF-16 surrogate.  */			\
-	    if (inend - inptr == 3 && (inptr[3] & 0xfc) != 0xdc)	\
-	      STANDARD_FROM_LOOP_ERR_HANDLER (3);			\
-									\
-	    result = __GCONV_INCOMPLETE_INPUT;				\
-	    break;							\
-	  }								\
-	continue;							\
-      }									\
-									\
     uint16_t u1 = get16 (inptr);					\
 									\
     if (__builtin_expect (u1 < 0xd800, 1) || u1 > 0xdfff)		\
@@ -235,15 +217,15 @@ gconv_end (struct __gconv_step *data)
       }									\
     else								\
       {									\
-        /* An isolated low-surrogate was found.  This has to be         \
+	/* An isolated low-surrogate was found.  This has to be         \
 	   considered ill-formed.  */					\
-        if (__glibc_unlikely (u1 >= 0xdc00))				      \
+	if (__glibc_unlikely (u1 >= 0xdc00))				\
 	  {								\
 	    STANDARD_FROM_LOOP_ERR_HANDLER (2);				\
 	  }								\
 	/* It's a surrogate character.  At least the first word says	\
 	   it is.  */							\
-	if (__glibc_unlikely (inptr + 4 > inend))			      \
+	if (__glibc_unlikely (inptr + 4 > inend))			\
 	  {								\
 	    /* We don't have enough input for another complete input	\
 	       character.  */						\
@@ -266,48 +248,200 @@ gconv_end (struct __gconv_step *data)
       }									\
     outptr += 4;							\
   }
-#define LOOP_NEED_FLAGS
-#include <iconv/loop.c>
+
+#define BODY_FROM_VX							\
+  {									\
+    size_t inlen = inend - inptr;					\
+    size_t outlen = outend - outptr;					\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for surrogates.  */			\
+		  "    larl %[R_TMP],9f\n\t"				\
+		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  /* Loop which handles UTF-16 chars <0xd800, >0xdfff.  */ \
+		  "0:  clgijl %[R_INLEN],16,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],32,2f\n\t"			\
+		  "1:  vl %%v16,0(%[R_IN])\n\t"				\
+		  /* Check for surrogate chars.  */			\
+		  "    vstrchs %%v19,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t"					\
+		  /* Enlarge to UTF-32.  */				\
+		  "    vuplhh %%v17,%%v16\n\t"				\
+		  "    la %[R_IN],16(%[R_IN])\n\t"			\
+		  "    vupllh %%v18,%%v16\n\t"				\
+		  "    aghi %[R_INLEN],-16\n\t"				\
+		  /* Store 32 bytes to buf_out.  */			\
+		  "    vstm %%v17,%%v18,0(%[R_OUT])\n\t"		\
+		  "    aghi %[R_OUTLEN],-32\n\t"			\
+		  "    la %[R_OUT],32(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],16,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],32,2f\n\t"			\
+		  "    j 1b\n\t"					\
+		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff. (v30, v31)  */ \
+		  "9:  .short 0xd800,0xdfff,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		  "    .short 0xa000,0xc000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		  /* At least one uint16_t is in range of surrogates.	\
+		     Store the preceding chars.  */			\
+		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		  "    vuplhh %%v17,%%v16\n\t"				\
+		  "    sllg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "    jl 12f\n\t"					\
+		  "    vstl %%v17,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "    vupllh %%v18,%%v16\n\t"				\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vstl %%v18,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "11: \n\t" /* Update pointers.  */			\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
+		  "12: lghi %[R_TMP2],16\n\t"				\
+		  "    sgr %[R_TMP2],%[R_TMP]\n\t"			\
+		  "    srl %[R_TMP2],1\n\t"				\
+		  "    llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    aghi %[R_OUTLEN],-4\n\t"				\
+		  "    j 16f\n\t"					\
+		  /* Handle remaining bytes.  */			\
+		  "2:  \n\t"						\
+		  /* Zero, one or more bytes available?  */		\
+		  "    clgfi %[R_INLEN],1\n\t"				\
+		  "    je 97f\n\t" /* Only one byte available.  */	\
+		  "    jl 99f\n\t" /* End if no bytes available.  */	\
+		  /* Calculate remaining uint16_t values in inptr.  */	\
+		  "    srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
+		  /* Handle remaining uint16_t values.  */		\
+		  "13: llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    slgfi %[R_OUTLEN],4\n\t"				\
+		  "    jl 96f \n\t"					\
+		  "    clfi %[R_TMP],0xd800\n\t"			\
+		  "    jhe 15f\n\t"					\
+		  "14: st %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],2(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-2\n\t"				\
+		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],13b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Handle UTF-16 surrogate pair.  */			\
+		  "15: clfi %[R_TMP],0xdfff\n\t"			\
+		  "    jh 14b\n\t" /* Jump away if ch > 0xdfff.  */	\
+		  "16: clfi %[R_TMP],0xdc00\n\t"			\
+		  "    jhe 98f\n\t" /* Jump away in case of low-surrogate.  */ \
+		  "    slgfi %[R_INLEN],4\n\t"				\
+		  "    jl 97f\n\t" /* Big enough input?  */		\
+		  "    llh %[R_TMP3],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
+		  "    slfi %[R_TMP],0xd7c0\n\t"			\
+		  "    sll %[R_TMP],10\n\t"				\
+		  "    risbgn %[R_TMP],%[R_TMP3],54,63,0\n\t" /* Insert klmnopqrst.  */ \
+		  "    nilf %[R_TMP3],0xfc00\n\t"			\
+		  "    clfi %[R_TMP3],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
+		  "    jne 98f\n\t"					\
+		  "    st %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
+		  "    aghi %[R_TMP2],-2\n\t"				\
+		  "    jh 13b\n\t" /* Handle remaining uint16_t values.  */ \
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  "96: \n\t" /* Return full output.  */			\
+		  "    lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
+		  "    j 99f\n\t"					\
+		  "97: \n\t" /* Return incomplete input.  */		\
+		  "    lghi %[R_RES],%[RES_IN_FULL]\n\t"		\
+		  "    j 99f\n\t"					\
+		  "98:\n\t" /* Return Illegal character.  */		\
+		  "    lghi %[R_RES],%[RES_IN_ILL]\n\t"			\
+		  "99:\n\t"						\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    if (__glibc_likely (inptr == inend)					\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
+      break;								\
+									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (2);					\
+  }
+
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#if defined HAVE_S390_VX_ASM_SUPPORT
+# define LOOPFCT		__from_utf16_loop_c
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_C
+# include <iconv/loop.c>
+
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		__from_utf16_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf16_loop_c)
+__attribute__ ((ifunc ("__from_utf16_loop_resolver")))
+__from_utf16_loop;
+
+static void *
+__from_utf16_loop_resolver (unsigned long int dl_hwcap)
+{
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf16_loop_vx;
+  else
+    return __from_utf16_loop_c;
+}
+
+strong_alias (__from_utf16_loop_c_single, __from_utf16_loop_single)
+#else
+# define LOOPFCT		FROM_LOOP
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_C
+# include <iconv/loop.c>
+#endif
 
 /* Conversion from UTF-32 internal/BE to UTF-16.  */
 
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			TO_LOOP
 /* The software routine is copied from utf-16.c (minus bytes
    swapping).  */
-#define BODY								\
+#define BODY_TO_C							\
   {									\
-    if (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH)				\
-      {									\
-	HARDWARE_CONVERT ("cu42 %0, %1");				\
-									\
-	if (inptr != inend)						\
-	  {								\
-	    result = __GCONV_INCOMPLETE_INPUT;				\
-	    break;							\
-	  }								\
-	continue;							\
-      }									\
-									\
     uint32_t c = get32 (inptr);						\
 									\
     if (__builtin_expect (c <= 0xd7ff, 1)				\
 	|| (c >=0xdc00 && c <= 0xffff))					\
       {									\
-        /* Two UTF-16 chars.  */					\
-        put16 (outptr, c);						\
+	/* Two UTF-16 chars.  */					\
+	put16 (outptr, c);						\
       }									\
     else if (__builtin_expect (c >= 0x10000, 1)				\
 	     && __builtin_expect (c <= 0x10ffff, 1))			\
       {									\
 	/* Four UTF-16 chars.  */					\
-        uint16_t zabcd = ((c & 0x1f0000) >> 16) - 1;			\
+	uint16_t zabcd = ((c & 0x1f0000) >> 16) - 1;			\
 	uint16_t out;							\
 									\
 	/* Generate a surrogate character.  */				\
-	if (__glibc_unlikely (outptr + 4 > outend))			      \
+	if (__glibc_unlikely (outptr + 4 > outend))			\
 	  {								\
 	    /* Overflow in the output buffer.  */			\
 	    result = __GCONV_FULL_OUTPUT;				\
@@ -326,12 +460,165 @@ gconv_end (struct __gconv_step *data)
       }									\
     else								\
       {									\
-        STANDARD_TO_LOOP_ERR_HANDLER (4);				\
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
       }									\
     outptr += 2;							\
     inptr += 4;								\
   }
+
+#define BODY_TO_ETF3EH							\
+  {									\
+    HARDWARE_CONVERT ("cu42 %0, %1");					\
+									\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+									\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+#define BODY_TO_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for surrogates.  */			\
+		  "    larl %[R_TMP],9f\n\t"				\
+		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  /* Loop which handles UTF-16 chars			\
+		     ch < 0xd800 || (ch > 0xdfff && ch < 0x10000).  */	\
+		  "0:  clgijl %[R_INLEN],32,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "1:  vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
+		  "    lghi %[R_TMP2],0\n\t"				\
+		  /* Shorten to UTF-16.  */				\
+		  "    vpkf %%v18,%%v16,%%v17\n\t"			\
+		  /* Check for surrogate chars.  */			\
+		  "    vstrcfs %%v19,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t"					\
+		  "    vstrcfs %%v19,%%v17,%%v30,%%v31\n\t"		\
+		  "    jno 11f\n\t"					\
+		  /* Store 16 bytes to buf_out.  */			\
+		  "    vst %%v18,0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],32(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-32\n\t"				\
+		  "    aghi %[R_OUTLEN],-16\n\t"			\
+		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],32,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "    j 1b\n\t"					\
+		  /* Setup to check for ch >= 0xd800 && ch <= 0xdfff	\
+		     and check for ch >= 0x10000. (v30, v31)  */	\
+		  "9:  .long 0xd800,0xdfff,0x10000,0x10000\n\t"		\
+		  "    .long 0xa0000000,0xc0000000, 0xa0000000,0xa0000000\n\t" \
+		  /* At least one UTF32 char is in range of surrogates.	\
+		     Store the preceding characters.  */		\
+		  "11: ahi %[R_TMP2],16\n\t"				\
+		  "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		  "    agr %[R_TMP],%[R_TMP2]\n\t"			\
+		  "    srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "    jl 20f\n\t"					\
+		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  /* Update pointers.  */				\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handles UTF16 surrogates with convert instruction.  */ \
+		  "20: cu42 %[R_OUT],%[R_IN]\n\t"			\
+		  "    jo 0b\n\t" /* Try vector implementation again.  */ \
+		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
+		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+									\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf16_loop_c
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_C
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf16_loop_etf3eh
 #define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_ETF3EH
 #include <iconv/loop.c>
 
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf16_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_TO_VX
+# include <iconv/loop.c>
+#endif
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf16_loop_c)
+__attribute__ ((ifunc ("__to_utf16_loop_resolver")))
+__to_utf16_loop;
+
+static void *
+__to_utf16_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf16_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ETF3EH)
+    return __to_utf16_loop_etf3eh;
+  else
+    return __to_utf16_loop_c;
+}
+
+strong_alias (__to_utf16_loop_c_single, __to_utf16_loop_single)
+
+
 #include <iconv/skeleton.c>

http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=5bd11b19099b3f22d821515f9c93f1ecc1a7e15e

commit 5bd11b19099b3f22d821515f9c93f1ecc1a7e15e
Author: Stefan Liebler <stli@linux.vnet.ibm.com>
Date:   Wed May 25 17:18:05 2016 +0200

    S390: Optimize utf8-utf16 module.
    
    This patch reworks the s390-specific module that converts between utf8 and
    utf16. ifunc is now used to choose either the c or the etf3eh (with convert
    utf instruction) variant at runtime. Furthermore, a new vector variant for
    z13 is introduced, which will be built and chosen if vector support is
    available at build time and runtime.
    
    When converting utf8 to utf16, the vector variant optimizes input of
    1-byte utf8 characters. The convert utf instruction is used if a multibyte
    utf8 character is found.
    
    For the other direction, utf16 to utf8, the cu21 instruction can't be
    re-enabled because it does not report an error if the input stream consists
    of a single low-surrogate utf16 char (e.g. 0xdc00). This applies to the
    newest z13, too. Thus there are only the c and the new vector variants, which
    can handle 1..4 byte utf8 characters.
    
    The c variant from utf16 to utf8 has been fixed. If a high surrogate was at
    the end of the input buffer, errno was set to EINVAL and the input pointer
    pointed just after the high surrogate. Now it points to the beginning of the
    high surrogate.
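
The intended behaviour (leave the input position at the start of a truncated surrogate pair) can be sketched with a standalone converter. The function `utf16_to_utf8` and its interface are assumptions made for illustration; the caller is assumed to supply an output buffer of at least 3 bytes per input unit.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert n UTF-16 units to UTF-8.  Returns the number of input units
   consumed; *outlen receives the number of bytes written.  Conversion
   stops in front of the first unit it cannot convert, so a trailing
   high surrogate is left unconsumed.  */
static size_t
utf16_to_utf8 (const uint16_t *in, size_t n, unsigned char *out,
               size_t *outlen)
{
  size_t i = 0, o = 0;
  while (i < n)
    {
      uint32_t c = in[i];
      size_t step = 1;
      if (c >= 0xd800 && c <= 0xdfff)
        {
          /* Low surrogate first, truncated pair, or bad second unit:
             stop with i still at the first unit of the problem.  */
          if (c >= 0xdc00 || i + 1 >= n
              || in[i + 1] < 0xdc00 || in[i + 1] > 0xdfff)
            break;
          c = 0x10000 + ((c - 0xd800) << 10) + (uint32_t) (in[i + 1] - 0xdc00);
          step = 2;
        }
      if (c < 0x80)
        out[o++] = (unsigned char) c;
      else if (c < 0x800)
        {
          out[o++] = (unsigned char) (0xc0 | (c >> 6));
          out[o++] = (unsigned char) (0x80 | (c & 0x3f));
        }
      else if (c < 0x10000)
        {
          out[o++] = (unsigned char) (0xe0 | (c >> 12));
          out[o++] = (unsigned char) (0x80 | ((c >> 6) & 0x3f));
          out[o++] = (unsigned char) (0x80 | (c & 0x3f));
        }
      else
        {
          out[o++] = (unsigned char) (0xf0 | (c >> 18));
          out[o++] = (unsigned char) (0x80 | ((c >> 12) & 0x3f));
          out[o++] = (unsigned char) (0x80 | ((c >> 6) & 0x3f));
          out[o++] = (unsigned char) (0x80 | (c & 0x3f));
        }
      i += step;
    }
  *outlen = o;
  return i;
}
```

A caller that receives more input later can resume at the returned index and see the complete surrogate pair.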
    
    This patch also fixes some whitespace errors. The c variant from utf8 to utf16
    now checks that the tail bytes start with 0b10... and that the value is not
    in the range of a utf16 surrogate.
    
    Furthermore, the etf3eh variants now handle the "UTF-xx//IGNORE" case.
    Previously they disregarded the //IGNORE suffix and always stopped at an error.
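
The difference between stopping at an error and honouring //IGNORE can be shown with a simplified loop. This sketch uses hypothetical names (`copy_valid`, `ignore`, `irreversible`) and operates on UTF-32 values for brevity; it only models the control flow, not the glibc error-handler macros.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy valid code points from in to out.  With ignore set, illegal
   values are skipped and counted in *irreversible; otherwise the loop
   stops at the first illegal value.  Returns the number of input
   elements consumed; *written receives the number stored.  out must
   have room for n elements.  */
static size_t
copy_valid (const uint32_t *in, size_t n, uint32_t *out, size_t *written,
            int ignore, size_t *irreversible)
{
  size_t i, o = 0;
  *irreversible = 0;
  for (i = 0; i < n; ++i)
    {
      uint32_t c = in[i];
      int bad = (c >= 0xd800 && c <= 0xdfff) || c > 0x10ffff;
      if (!bad)
        {
          out[o++] = c;
          continue;
        }
      if (!ignore)
        break;              /* Stop and report the error position.  */
      ++*irreversible;      /* Skip the illegal value and go on.  */
    }
  *written = o;
  return i;
}
```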
    
    ChangeLog:
    
    	* sysdeps/s390/s390-64/utf8-utf16-z9.c: Use ifunc to select c,
    	etf3eh or new vector loop-variant.

diff --git a/ChangeLog b/ChangeLog
index 42e37fd..c201dd1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,10 @@
 2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
 
+	* sysdeps/s390/s390-64/utf8-utf16-z9.c: Use ifunc to select c,
+	etf3eh or new vector loop-variant.
+
+2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
+
 	* sysdeps/s390/s390-64/utf8-utf32-z9.c: Use ifunc to select c, etf3eh
 	or new vector loop-variant.
 
diff --git a/sysdeps/s390/s390-64/utf8-utf16-z9.c b/sysdeps/s390/s390-64/utf8-utf16-z9.c
index 4148ed7..7520ef2 100644
--- a/sysdeps/s390/s390-64/utf8-utf16-z9.c
+++ b/sysdeps/s390/s390-64/utf8-utf16-z9.c
@@ -30,33 +30,27 @@
 #include <dl-procinfo.h>
 #include <gconv.h>
 
-/* UTF-16 big endian byte order mark.  */
-#define BOM_UTF16	0xfeff
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
 
+/* Defines for skeleton.c.  */
 #define DEFINE_INIT		0
 #define DEFINE_FINI		0
 #define MIN_NEEDED_FROM		1
 #define MAX_NEEDED_FROM		4
 #define MIN_NEEDED_TO		2
 #define MAX_NEEDED_TO		4
-#define FROM_LOOP		from_utf8_loop
-#define TO_LOOP			to_utf8_loop
+#define FROM_LOOP		__from_utf8_loop
+#define TO_LOOP			__to_utf8_loop
 #define FROM_DIRECTION		(dir == from_utf8)
 #define ONE_DIRECTION           0
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      /* Emit the UTF-16 Byte Order Mark.  */				\
-      if (__glibc_unlikely (outbuf + 2 > outend))			      \
-	return __GCONV_FULL_OUTPUT;					\
-									\
-      put16u (outbuf, BOM_UTF16);					\
-      outbuf += 2;							\
-    }
+
+
+/* UTF-16 big endian byte order mark.  */
+#define BOM_UTF16	0xfeff
 
 /* Direction of the transformation.  */
 enum direction
@@ -151,16 +145,16 @@ gconv_end (struct __gconv_step *data)
     register unsigned long long outlen __asm__("11") = outend - outptr;	\
     uint64_t cc = 0;							\
 									\
-    __asm__ volatile (".machine push       \n\t"			\
-		      ".machine \"z9-109\" \n\t"			\
-		      "0: " INSTRUCTION "  \n\t"			\
-		      ".machine pop        \n\t"			\
-		      "   jo     0b        \n\t"			\
-		      "   ipm    %2        \n"				\
-		      : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-			"+d" (outlen), "+d" (inlen)			\
-		      :							\
-		      : "cc", "memory");				\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
 									\
     inptr = pInput;							\
     outptr = pOutput;							\
@@ -169,50 +163,135 @@ gconv_end (struct __gconv_step *data)
     if (cc == 1)							\
       {									\
 	result = __GCONV_FULL_OUTPUT;					\
-	break;								\
       }									\
     else if (cc == 2)							\
       {									\
 	result = __GCONV_ILLEGAL_INPUT;					\
-	break;								\
       }									\
   }
 
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      /* Emit the UTF-16 Byte Order Mark.  */				\
+      if (__glibc_unlikely (outbuf + 2 > outend))			\
+	return __GCONV_FULL_OUTPUT;					\
+									\
+      put16u (outbuf, BOM_UTF16);					\
+      outbuf += 2;							\
+    }
+
 /* Conversion function from UTF-8 to UTF-16.  */
+#define BODY_FROM_HW(ASM)						\
+  {									\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+									\
+    int i;								\
+    for (i = 1; inptr + i < inend && i < 5; ++i)			\
+      if ((inptr[i] & 0xc0) != 0x80)					\
+	break;								\
+									\
+    if (__glibc_likely (inptr + i == inend				\
+			&& result == __GCONV_EMPTY_INPUT))		\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
+  }
+
+#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu12 %0, %1, 1"))
+
+#define HW_FROM_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "    vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
+		  "    vrepib %%v31,0x20\n\t"				\
+		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
+		  "0:  clgijl %[R_INLEN],16,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],32,20f\n\t"			\
+		  "1:  vl %%v16,0(%[R_IN])\n\t"				\
+		  "    vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t" /* Jump away if not all bytes are 1byte \
+				       UTF8 chars.  */			\
+		  /* Enlarge to UTF-16.  */				\
+		  "    vuplhb %%v18,%%v16\n\t"				\
+		  "    la %[R_IN],16(%[R_IN])\n\t"			\
+		  "    vupllb %%v19,%%v16\n\t"				\
+		  "    aghi %[R_INLEN],-16\n\t"				\
+		  /* Store 32 bytes to buf_out.  */			\
+		  "    vstm %%v18,%%v19,0(%[R_OUT])\n\t"		\
+		  "    aghi %[R_OUTLEN],-32\n\t"			\
+		  "    la %[R_OUT],32(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],16,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],32,20f\n\t"			\
+		  "    j 1b\n\t"					\
+		  "10:\n\t"						\
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "    vlgvb %[R_TMP],%%v17,7\n\t"			\
+		  "    sllk %[R_TMP2],%[R_TMP],1\n\t" /* Compute highest \
+							 index to store. */ \
+		  "    llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
+		  "    ahi %[R_TMP2],-1\n\t"				\
+		  "    jl 20f\n\t"					\
+		  "    vuplhb %%v18,%%v16\n\t"				\
+		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vupllb %%v19,%%v16\n\t"				\
+		  "    vstl %%v19,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "11: \n\t" /* Update pointers.  */			\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu12 %[R_OUT],%[R_IN],1\n\t"			\
+		  "    jo 0b\n\t" /* Try vector implemenation again.  */ \
+		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
+		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+  }
+#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
+
 
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
-#define LOOPFCT			FROM_LOOP
 /* The software implementation is based on the code in gconv_simple.c.  */
-#define BODY								\
+#define BODY_FROM_C							\
   {									\
-    if (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH)				\
-      {									\
-	HARDWARE_CONVERT ("cu12 %0, %1, 1");				\
-									\
-	if (inptr != inend)						\
-	  {								\
-	    int i;							\
-	    for (i = 1; inptr + i < inend; ++i)				\
-	      if ((inptr[i] & 0xc0) != 0x80)				\
-		break;							\
-								\
-	    if (__glibc_likely (inptr + i == inend))			      \
-	      {								\
-		result = __GCONV_INCOMPLETE_INPUT;			\
-		break;							\
-	      }								\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
-	  }								\
-	continue;							\
-    }									\
-									\
     /* Next input byte.  */						\
     uint16_t ch = *inptr;						\
 									\
-    if (__glibc_likely (ch < 0x80))					      \
+    if (__glibc_likely (ch < 0x80))					\
       {									\
 	/* One byte sequence.  */					\
 	++inptr;							\
@@ -230,13 +309,13 @@ gconv_end (struct __gconv_step *data)
 	    cnt = 2;							\
 	    ch &= 0x1f;							\
 	  }								\
-        else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
+	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
 	  {								\
 	    /* We expect three bytes.  */				\
 	    cnt = 3;							\
 	    ch &= 0x0f;							\
 	  }								\
-	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
+	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
 	  {								\
 	    /* We expect four bytes.  */				\
 	    cnt = 4;							\
@@ -257,7 +336,7 @@ gconv_end (struct __gconv_step *data)
 	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
 	  }								\
 									\
-	if (__glibc_unlikely (inptr + cnt > inend))			      \
+	if (__glibc_unlikely (inptr + cnt > inend))			\
 	  {								\
 	    /* We don't have enough input.  But before we report	\
 	       that check that all the bytes are correct.  */		\
@@ -265,7 +344,7 @@ gconv_end (struct __gconv_step *data)
 	      if ((inptr[i] & 0xc0) != 0x80)				\
 		break;							\
 									\
-	    if (__glibc_likely (inptr + i == inend))			      \
+	    if (__glibc_likely (inptr + i == inend))			\
 	      {								\
 		result = __GCONV_INCOMPLETE_INPUT;			\
 		break;							\
@@ -280,23 +359,31 @@ gconv_end (struct __gconv_step *data)
 	       low) are needed.  */					\
 	    uint16_t zabcd, high, low;					\
 									\
-	    if (__glibc_unlikely (outptr + 4 > outend))			      \
+	    if (__glibc_unlikely (outptr + 4 > outend))			\
 	      {								\
 		/* Overflow in the output buffer.  */			\
 		result = __GCONV_FULL_OUTPUT;				\
 		break;							\
 	      }								\
 									\
+	    /* Check if tail-bytes >= 0x80, < 0xc0.  */			\
+	    for (i = 1; i < cnt; ++i)					\
+	      {								\
+		if ((inptr[i] & 0xc0) != 0x80)				\
+		  /* This is an illegal encoding.  */			\
+		  goto errout;						\
+	      }								\
+									\
 	    /* See Principles of Operations cu12.  */			\
 	    zabcd = (((inptr[0] & 0x7) << 2) |				\
-                     ((inptr[1] & 0x30) >> 4)) - 1;			\
+		     ((inptr[1] & 0x30) >> 4)) - 1;			\
 									\
 	    /* z-bit must be zero after subtracting 1.  */		\
 	    if (zabcd & 0x10)						\
 	      STANDARD_FROM_LOOP_ERR_HANDLER (4)			\
 									\
 	    high = (uint16_t)(0xd8 << 8);       /* high surrogate id */ \
-	    high |= zabcd << 6;	                        /* abcd bits */	\
+	    high |= zabcd << 6;                         /* abcd bits */	\
 	    high |= (inptr[1] & 0xf) << 2;              /* efgh bits */	\
 	    high |= (inptr[2] & 0x30) >> 4;               /* ij bits */	\
 									\
@@ -326,8 +413,19 @@ gconv_end (struct __gconv_step *data)
 		ch <<= 6;						\
 		ch |= byte & 0x3f;					\
 	      }								\
-	    inptr += cnt;						\
 									\
+	    /* If i < cnt, some trail byte was not >= 0x80, < 0xc0.	\
+	       If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could \
+	       have been represented with fewer than cnt bytes.  */	\
+	    if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)	\
+		/* Do not accept UTF-16 surrogates.  */			\
+		|| (ch >= 0xd800 && ch <= 0xdfff))			\
+	      {								\
+		/* This is an illegal encoding.  */			\
+		goto errout;						\
+	      }								\
+									\
+	    inptr += cnt;						\
 	  }								\
       }									\
     /* Now adjust the pointers and store the result.  */		\
@@ -335,43 +433,70 @@ gconv_end (struct __gconv_step *data)
     outptr += sizeof (uint16_t);					\
   }
 
+/* Generate loop-function with software implementation.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_c
+#define LOOP_NEED_FLAGS
+#define BODY			BODY_FROM_C
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_etf3eh
 #define LOOP_NEED_FLAGS
+#define BODY			BODY_FROM_ETF3EH
 #include <iconv/loop.c>
 
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector and utf-convert instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_TO
+# define LOOPFCT		__from_utf8_loop_vx
+# define LOOP_NEED_FLAGS
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+#endif
+
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf8_loop_c)
+__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
+__from_utf8_loop;
+
+static void *
+__from_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ETF3EH)
+    return __from_utf8_loop_etf3eh;
+  else
+    return __from_utf8_loop_c;
+}
+
+strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
+
 /* Conversion from UTF-16 to UTF-8.  */
 
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MAX_NEEDED_INPUT	MAX_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			TO_LOOP
 /* The software routine is based on the functionality of the S/390
    hardware instruction (cu21) as described in the Principles of
    Operation.  */
-#define BODY								\
+#define BODY_TO_C							\
   {									\
-    /* The hardware instruction currently fails to report an error for	\
-       isolated low surrogates so we have to disable the instruction	\
-       until this gets resolved.  */					\
-    if (0) /* (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH) */			\
-      {									\
-	HARDWARE_CONVERT ("cu21 %0, %1, 1");				\
-	if (inptr != inend)						\
-	  {								\
-	    /* Check if the third byte is				\
-	       a valid start of a UTF-16 surrogate.  */			\
-	    if (inend - inptr == 3 && (inptr[3] & 0xfc) != 0xdc)	\
-	      STANDARD_TO_LOOP_ERR_HANDLER (3);				\
-									\
-	    result = __GCONV_INCOMPLETE_INPUT;				\
-	    break;							\
-	  }								\
-	continue;							\
-      }									\
-									\
     uint16_t c = get16 (inptr);						\
 									\
-    if (__glibc_likely (c <= 0x007f))					      \
+    if (__glibc_likely (c <= 0x007f))					\
       {									\
 	/* Single byte UTF-8 char.  */					\
 	*outptr = c & 0xff;						\
@@ -379,20 +504,20 @@ gconv_end (struct __gconv_step *data)
       }									\
     else if (c >= 0x0080 && c <= 0x07ff)				\
       {									\
-        /* Two byte UTF-8 char.  */					\
+	/* Two byte UTF-8 char.  */					\
 									\
-	if (__glibc_unlikely (outptr + 2 > outend))			      \
+	if (__glibc_unlikely (outptr + 2 > outend))			\
 	  {								\
 	    /* Overflow in the output buffer.  */			\
 	    result = __GCONV_FULL_OUTPUT;				\
 	    break;							\
 	  }								\
 									\
-        outptr[0] = 0xc0;						\
-        outptr[0] |= c >> 6;						\
+	outptr[0] = 0xc0;						\
+	outptr[0] |= c >> 6;						\
 									\
-        outptr[1] = 0x80;						\
-        outptr[1] |= c & 0x3f;						\
+	outptr[1] = 0x80;						\
+	outptr[1] |= c & 0x3f;						\
 									\
 	outptr += 2;							\
       }									\
@@ -400,7 +525,7 @@ gconv_end (struct __gconv_step *data)
       {									\
 	/* Three byte UTF-8 char.  */					\
 									\
-	if (__glibc_unlikely (outptr + 3 > outend))			      \
+	if (__glibc_unlikely (outptr + 3 > outend))			\
 	  {								\
 	    /* Overflow in the output buffer.  */			\
 	    result = __GCONV_FULL_OUTPUT;				\
@@ -419,22 +544,22 @@ gconv_end (struct __gconv_step *data)
       }									\
     else if (c >= 0xd800 && c <= 0xdbff)				\
       {									\
-        /* Four byte UTF-8 char.  */					\
+	/* Four byte UTF-8 char.  */					\
 	uint16_t low, uvwxy;						\
 									\
-	if (__glibc_unlikely (outptr + 4 > outend))			      \
+	if (__glibc_unlikely (outptr + 4 > outend))			\
 	  {								\
 	    /* Overflow in the output buffer.  */			\
 	    result = __GCONV_FULL_OUTPUT;				\
 	    break;							\
 	  }								\
-	inptr += 2;							\
-	if (__glibc_unlikely (inptr + 2 > inend))			      \
+	if (__glibc_unlikely (inptr + 4 > inend))			\
 	  {								\
 	    result = __GCONV_INCOMPLETE_INPUT;				\
 	    break;							\
 	  }								\
 									\
+	inptr += 2;							\
 	low = get16 (inptr);						\
 									\
 	if ((low & 0xfc00) != 0xdc00)					\
@@ -461,11 +586,221 @@ gconv_end (struct __gconv_step *data)
       }									\
     else								\
       {									\
-        STANDARD_TO_LOOP_ERR_HANDLER (2);				\
+	STANDARD_TO_LOOP_ERR_HANDLER (2);				\
       }									\
     inptr += 2;								\
   }
-#define LOOP_NEED_FLAGS
-#include <iconv/loop.c>
+
+#define BODY_TO_VX							\
+  {									\
+    size_t inlen  = inend - inptr;					\
+    size_t outlen  = outend - outptr;					\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  /* Setup to check for values <= 0x7f.  */		\
+		  "    larl %[R_TMP],9f\n\t"				\
+		  "    vlm %%v30,%%v31,0(%[R_TMP])\n\t"			\
+		  /* Loop which handles UTF-16 chars <=0x7f.  */	\
+		  "0:  clgijl %[R_INLEN],32,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
+		  "1:  vlm %%v16,%%v17,0(%[R_IN])\n\t"			\
+		  "    lghi %[R_TMP2],0\n\t"				\
+		  /* Check for > 1byte UTF-8 chars.  */			\
+		  "    vstrchs %%v19,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t" /* Jump away if not all bytes are 1byte \
+				       UTF8 chars.  */			\
+		  "    vstrchs %%v19,%%v17,%%v30,%%v31\n\t"		\
+		  "    jno 11f\n\t" /* Jump away if not all bytes are 1byte \
+				       UTF8 chars.  */			\
+		  /* Shorten to UTF-8.  */				\
+		  "    vpkh %%v18,%%v16,%%v17\n\t"			\
+		  "    la %[R_IN],32(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-32\n\t"				\
+		  /* Store 16 bytes to buf_out.  */			\
+		  "    vst %%v18,0(%[R_OUT])\n\t"			\
+		  "    aghi %[R_OUTLEN],-16\n\t"			\
+		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],32,2f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,2f\n\t"			\
+		  "    j 1b\n\t"					\
+		  /* Setup to check for ch > 0x7f. (v30, v31)  */	\
+		  "9:  .short 0x7f,0x7f,0x0,0x0,0x0,0x0,0x0,0x0\n\t"	\
+		  "    .short 0x2000,0x2000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "11: lghi %[R_TMP2],16\n\t" /* match was found in v17.  */ \
+		  "10:\n\t"						\
+		  "    vlgvb %[R_TMP],%%v19,7\n\t"			\
+		  /* Shorten to UTF-8.  */				\
+		  "    vpkh %%v18,%%v16,%%v17\n\t"			\
+		  "    ar %[R_TMP],%[R_TMP2]\n\t" /* Number of in bytes.  */ \
+		  "    srlg %[R_TMP3],%[R_TMP],1\n\t" /* Number of out bytes.  */ \
+		  "    ahik %[R_TMP2],%[R_TMP3],-1\n\t" /* Highest index to store.  */ \
+		  "    jl 13f\n\t"					\
+		  "    vstl %%v18,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  /* Update pointers.  */				\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  "13: \n\t"						\
+		  /* Calculate remaining uint16_t values in loaded vrs.  */ \
+		  "    lghi %[R_TMP2],16\n\t"				\
+		  "    slgr %[R_TMP2],%[R_TMP3]\n\t"			\
+		  "    llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-2\n\t"				\
+		  "    j 22f\n\t"					\
+		  /* Handle remaining bytes.  */			\
+		  "2:  \n\t"						\
+		  /* Zero, one or more bytes available?  */		\
+		  "    clgfi %[R_INLEN],1\n\t"				\
+		  "    locghie %[R_RES],%[RES_IN_FULL]\n\t" /* Only one byte.  */ \
+		  "    jle 99f\n\t" /* End if less than two bytes.  */	\
+		  /* Calculate remaining uint16_t values in inptr.  */	\
+		  "    srlg %[R_TMP2],%[R_INLEN],1\n\t"			\
+		  /* Handle multibyte utf8-char. */			\
+		  "20: llh %[R_TMP],0(%[R_IN])\n\t"			\
+		  "    aghi %[R_INLEN],-2\n\t"				\
+		  /* Test if ch is 1-byte UTF-8 char.  */		\
+		  "21: clijh %[R_TMP],0x7f,22f\n\t"			\
+		  /* Handle 1-byte UTF-8 char.  */			\
+		  "31: slgfi %[R_OUTLEN],1\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    stc %[R_TMP],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],2(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],1(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 2-byte UTF-8 char.  */		\
+		  "22: clfi %[R_TMP],0x7ff\n\t"				\
+		  "    jh 23f\n\t"					\
+		  /* Handle 2-byte UTF-8 char.  */			\
+		  "32: slgfi %[R_OUTLEN],2\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    llill %[R_TMP3],0xc080\n\t"			\
+		  "    la %[R_IN],2(%[R_IN])\n\t"			\
+		  "    risbgn %[R_TMP3],%[R_TMP],51,55,2\n\t" /* 1. byte.   */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 2. byte.   */ \
+		  "    sth %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "    la %[R_OUT],2(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 3-byte UTF-8 char.  */		\
+		  "23: clfi %[R_TMP],0xd7ff\n\t"			\
+		  "    jh 24f\n\t"					\
+		  /* Handle 3-byte UTF-8 char.  */			\
+		  "33: slgfi %[R_OUTLEN],3\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    llilf %[R_TMP3],0xe08080\n\t"			\
+		  "    la %[R_IN],2(%[R_IN])\n\t"			\
+		  "    risbgn %[R_TMP3],%[R_TMP],44,47,4\n\t" /* 1. byte.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],50,55,2\n\t" /* 2. byte.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 3. byte.  */ \
+		  "    stcm %[R_TMP3],7,0(%[R_OUT])\n\t"		\
+		  "    la %[R_OUT],3(%[R_OUT])\n\t"			\
+		  "    brctg %[R_TMP2],20b\n\t"				\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Test if ch is 4-byte UTF-8 char.  */		\
+		  "24: clfi %[R_TMP],0xdfff\n\t"			\
+		  "    jh 33b\n\t" /* Handle this 3-byte UTF-8 char.  */ \
+		  "    clfi %[R_TMP],0xdbff\n\t"			\
+		  "    locghih %[R_RES],%[RES_IN_ILL]\n\t"		\
+		  "    jh 99f\n\t" /* Jump away if this is a low surrogate \
+				      without a preceding high surrogate.  */ \
+		  /* Handle 4-byte UTF-8 char.  */			\
+		  "34: slgfi %[R_OUTLEN],4\n\t"				\
+		  "    jl 90f \n\t"					\
+		  "    slgfi %[R_INLEN],2\n\t"				\
+		  "    locghil %[R_RES],%[RES_IN_FULL]\n\t"		\
+		  "    jl 99f\n\t" /* Jump away if low surrogate is missing.  */ \
+		  "    llilf %[R_TMP3],0xf0808080\n\t"			\
+		  "    aghi %[R_TMP],0x40\n\t"				\
+		  "    risbgn %[R_TMP3],%[R_TMP],37,39,16\n\t" /* 1. byte: uvw  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],42,43,14\n\t" /* 2. byte: xy  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],44,47,14\n\t" /* 2. byte: efgh  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],50,51,12\n\t" /* 3. byte: ij */ \
+		  "    llh %[R_TMP],2(%[R_IN])\n\t" /* Load low surrogate.  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],52,55,2\n\t" /* 3. byte: klmn  */ \
+		  "    risbgn %[R_TMP3],%[R_TMP],58,63,0\n\t" /* 4. byte: opqrst  */ \
+		  "    nilf %[R_TMP],0xfc00\n\t"			\
+		  "    clfi %[R_TMP],0xdc00\n\t" /* Check if it starts with 0xdc00.  */ \
+		  "    locghine %[R_RES],%[RES_IN_ILL]\n\t"		\
+		  "    jne 99f\n\t" /* Jump away if low surrogate is invalid.  */ \
+		  "    st %[R_TMP3],0(%[R_OUT])\n\t"			\
+		  "    la %[R_IN],4(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],4(%[R_OUT])\n\t"			\
+		  "    aghi %[R_TMP2],-2\n\t"				\
+		  "    jh 20b\n\t"					\
+		  "    j 0b\n\t" /* Switch to vx-loop.  */		\
+		  /* Exit with __GCONV_FULL_OUTPUT.  */			\
+		  "90: lghi %[R_RES],%[RES_OUT_FULL]\n\t"		\
+		  "99: \n\t"						\
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (inptr)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (outptr)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		    , [RES_IN_FULL] "i" (__GCONV_INCOMPLETE_INPUT)	\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v30") ASM_CLOBBER_VR ("v31")	\
+		  );							\
+    if (__glibc_likely (inptr == inend)					\
+	|| result != __GCONV_ILLEGAL_INPUT)				\
+      break;								\
+									\
+    STANDARD_TO_LOOP_ERR_HANDLER (2);					\
+  }
+
+/* Generate loop-function with software implementation.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MAX_NEEDED_INPUT	MAX_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#if defined HAVE_S390_VX_ASM_SUPPORT
+# define LOOPFCT		__to_utf8_loop_c
+# define BODY                   BODY_TO_C
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MAX_NEEDED_INPUT	MAX_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf8_loop_vx
+# define BODY                   BODY_TO_VX
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf8_loop_c)
+__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
+__to_utf8_loop;
+
+static void *
+__to_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf8_loop_vx;
+  else
+    return __to_utf8_loop_c;
+}
+
+strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
+
+#else
+# define LOOPFCT		TO_LOOP
+# define BODY                   BODY_TO_C
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+#endif /* !HAVE_S390_VX_ASM_SUPPORT  */
 
 #include <iconv/skeleton.c>

http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=421c5278d83e72740150259960a431706ac343f9

commit 421c5278d83e72740150259960a431706ac343f9
Author: Stefan Liebler <stli@linux.vnet.ibm.com>
Date:   Wed May 25 17:18:05 2016 +0200

    S390: Optimize utf8-utf32 module.
    
    This patch reworks the s390 specific module to convert between utf8 and utf32.
    Now ifunc is used to choose either the c or etf3eh (with convert utf
    instruction) variants at runtime.
    Furthermore a new vector variant for z13 is introduced which will be built
    and chosen if vector support is available at build / runtime.
    The vector variants optimize input of 1byte utf8 characters. The convert utf
    instruction is used if a multibyte utf8 character is found.
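
The selection order described above (vx first, then etf3eh, then c) can be
sketched as follows. This is only an illustration under stated assumptions:
it uses a plain function pointer rather than a real ifunc, and the names
demo_resolve, demo_loop_*, DEMO_HWCAP_* and the built_with_vx flag are
hypothetical stand-ins, not glibc identifiers.

```c
#include <stddef.h>

/* Hypothetical capability bits for this sketch only.  The real
   HWCAP_S390_* values come from glibc's headers and differ.  */
#define DEMO_HWCAP_ETF3EH (1UL << 0)
#define DEMO_HWCAP_VX     (1UL << 1)

typedef const char *(*demo_loop_fn) (void);

/* Stand-ins for the three loop variants; each returns its own name.  */
static const char *demo_loop_c (void) { return "c"; }
static const char *demo_loop_etf3eh (void) { return "etf3eh"; }
static const char *demo_loop_vx (void) { return "vx"; }

/* Pick a variant in the priority order used by the resolver in the patch.
   built_with_vx models the build-time HAVE_S390_VX_ASM_SUPPORT check:
   without it the vx variant does not exist and must not be returned.  */
static demo_loop_fn
demo_resolve (unsigned long hwcap, int built_with_vx)
{
  if (built_with_vx && (hwcap & DEMO_HWCAP_VX))
    return demo_loop_vx;
  if (hwcap & DEMO_HWCAP_ETF3EH)
    return demo_loop_etf3eh;
  return demo_loop_c;
}
```

With a real ifunc the dynamic linker calls the resolver once at relocation
time, so callers pay no per-call dispatch cost; the function pointer here
only mirrors the decision logic.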
    
    This patch also fixes some whitespace errors. The c variants now reject
    UTF-16 surrogates and values above 0x10ffff.
    Furthermore, the etf3eh variants now handle the "UTF-xx//IGNORE" case.
    Previously they ignored the //IGNORE suffix and always stopped at an error.
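
The stricter acceptance test of the c variants can be sketched as a
standalone predicate. This is a minimal sketch, assuming the code point ch
has already been assembled from a cnt-byte sequence whose trail bytes were
checked separately; demo_is_acceptable is a hypothetical helper name, the
patch itself performs these checks inline in BODY_FROM_C.

```c
#include <stdint.h>

/* Return 1 if a code point decoded from a cnt-byte UTF-8 sequence may be
   emitted as UTF-32, 0 if the sequence has to be treated as illegal.  */
static int
demo_is_acceptable (uint32_t ch, int cnt)
{
  /* Overlong form: for cnt > 2, a value below 2^(5*cnt-4) could have
     been encoded with fewer than cnt bytes.  */
  if (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)
    return 0;

  /* UTF-16 surrogates are not valid scalar values.  */
  if (ch >= 0xd800 && ch <= 0xdfff)
    return 0;

  /* Nothing above the last Unicode code point is accepted.  */
  if (ch > 0x10ffff)
    return 0;

  return 1;
}
```

For example, U+20AC decoded from three bytes passes, while 0x7ff arriving
in a three-byte sequence is rejected as overlong and 0x110000 is rejected
as out of range.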
    
    ChangeLog:
    
    	* sysdeps/s390/s390-64/utf8-utf32-z9.c: Use ifunc to select c, etf3eh
    	or new vector loop-variant.

diff --git a/ChangeLog b/ChangeLog
index f303dea..42e37fd 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,10 @@
 2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
 
+	* sysdeps/s390/s390-64/utf8-utf32-z9.c: Use ifunc to select c, etf3eh
+	or new vector loop-variant.
+
+2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
+
 	* sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c (TROO_LOOP):
 	Rename to TR_LOOP and usage of tr instead of troo instruction.
 
diff --git a/sysdeps/s390/s390-64/utf8-utf32-z9.c b/sysdeps/s390/s390-64/utf8-utf32-z9.c
index defd47d..f9c9199 100644
--- a/sysdeps/s390/s390-64/utf8-utf32-z9.c
+++ b/sysdeps/s390/s390-64/utf8-utf32-z9.c
@@ -30,35 +30,25 @@
 #include <dl-procinfo.h>
 #include <gconv.h>
 
-/* UTF-32 big endian byte order mark.  */
-#define BOM	                0x0000feffu
+#if defined HAVE_S390_VX_GCC_SUPPORT
+# define ASM_CLOBBER_VR(NR) , NR
+#else
+# define ASM_CLOBBER_VR(NR)
+#endif
 
+/* Defines for skeleton.c.  */
 #define DEFINE_INIT		0
 #define DEFINE_FINI		0
-/* These definitions apply to the UTF-8 to UTF-32 direction.  The
-   software implementation for UTF-8 still supports multibyte
-   characters up to 6 bytes whereas the hardware variant does not.  */
 #define MIN_NEEDED_FROM		1
 #define MAX_NEEDED_FROM		6
 #define MIN_NEEDED_TO		4
-#define FROM_LOOP		from_utf8_loop
-#define TO_LOOP			to_utf8_loop
+#define FROM_LOOP		__from_utf8_loop
+#define TO_LOOP			__to_utf8_loop
 #define FROM_DIRECTION		(dir == from_utf8)
 #define ONE_DIRECTION           0
-#define PREPARE_LOOP							\
-  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
-  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
-									\
-  if (emit_bom && !data->__internal_use					\
-      && data->__invocation_counter == 0)				\
-    {									\
-      /* Emit the Byte Order Mark.  */					\
-      if (__glibc_unlikely (outbuf + 4 > outend))			      \
-	return __GCONV_FULL_OUTPUT;					\
-									\
-      put32u (outbuf, BOM);						\
-      outbuf += 4;							\
-    }
+
+/* UTF-32 big endian byte order mark.  */
+#define BOM			0x0000feffu
 
 /* Direction of the transformation.  */
 enum direction
@@ -155,16 +145,16 @@ gconv_end (struct __gconv_step *data)
     register unsigned long long outlen __asm__("11") = outend - outptr;	\
     uint64_t cc = 0;							\
 									\
-    __asm__ volatile (".machine push       \n\t"			\
-		      ".machine \"z9-109\" \n\t"			\
-		      "0: " INSTRUCTION "  \n\t"			\
-		      ".machine pop        \n\t"			\
-		      "   jo     0b        \n\t"			\
-		      "   ipm    %2        \n"				\
-		      : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
-		      "+d" (outlen), "+d" (inlen)			\
-		      :							\
-		      : "cc", "memory");				\
+    __asm__ __volatile__ (".machine push       \n\t"			\
+			  ".machine \"z9-109\" \n\t"			\
+			  "0: " INSTRUCTION "  \n\t"			\
+			  ".machine pop        \n\t"			\
+			  "   jo     0b        \n\t"			\
+			  "   ipm    %2        \n"			\
+			  : "+a" (pOutput), "+a" (pInput), "+d" (cc),	\
+			    "+d" (outlen), "+d" (inlen)			\
+			  :						\
+			  : "cc", "memory");				\
 									\
     inptr = pInput;							\
     outptr = pOutput;							\
@@ -173,49 +163,150 @@ gconv_end (struct __gconv_step *data)
     if (cc == 1)							\
       {									\
 	result = __GCONV_FULL_OUTPUT;					\
-	break;								\
       }									\
     else if (cc == 2)							\
       {									\
 	result = __GCONV_ILLEGAL_INPUT;					\
-	break;								\
       }									\
   }
 
+#define PREPARE_LOOP							\
+  enum direction dir = ((struct utf8_data *) step->__data)->dir;	\
+  int emit_bom = ((struct utf8_data *) step->__data)->emit_bom;		\
+									\
+  if (emit_bom && !data->__internal_use					\
+      && data->__invocation_counter == 0)				\
+    {									\
+      /* Emit the Byte Order Mark.  */					\
+      if (__glibc_unlikely (outbuf + 4 > outend))			\
+	return __GCONV_FULL_OUTPUT;					\
+									\
+      put32u (outbuf, BOM);						\
+      outbuf += 4;							\
+    }
+
 /* Conversion function from UTF-8 to UTF-32 internal/BE.  */
 
-#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
-#define LOOPFCT			FROM_LOOP
-/* The software routine is copied from gconv_simple.c.  */
-#define BODY								\
+#define STORE_REST_COMMON						      \
+  {									      \
+    /* We store the remaining bytes while converting them into the UCS4	      \
+       format.  We can assume that the first byte in the buffer is	      \
+       correct and that it requires a larger number of bytes than there	      \
+       are in the input buffer.  */					      \
+    wint_t ch = **inptrp;						      \
+    size_t cnt, r;							      \
+									      \
+    state->__count = inend - *inptrp;					      \
+									      \
+    assert (ch != 0xc0 && ch != 0xc1);					      \
+    if (ch >= 0xc2 && ch < 0xe0)					      \
+      {									      \
+	/* We expect two bytes.  The first byte cannot be 0xc0 or	      \
+	   0xc1, otherwise the wide character could have been		      \
+	   represented using a single byte.  */				      \
+	cnt = 2;							      \
+	ch &= 0x1f;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
+      {									      \
+	/* We expect three bytes.  */					      \
+	cnt = 3;							      \
+	ch &= 0x0f;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
+      {									      \
+	/* We expect four bytes.  */					      \
+	cnt = 4;							      \
+	ch &= 0x07;							      \
+      }									      \
+    else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
+      {									      \
+	/* We expect five bytes.  */					      \
+	cnt = 5;							      \
+	ch &= 0x03;							      \
+      }									      \
+    else								      \
+      {									      \
+	/* We expect six bytes.  */					      \
+	cnt = 6;							      \
+	ch &= 0x01;							      \
+      }									      \
+									      \
+    /* The first byte is already consumed.  */				      \
+    r = cnt - 1;							      \
+    while (++(*inptrp) < inend)						      \
+      {									      \
+	ch <<= 6;							      \
+	ch |= **inptrp & 0x3f;						      \
+	--r;								      \
+      }									      \
+									      \
+    /* Shift for the so far missing bytes.  */				      \
+    ch <<= r * 6;							      \
+									      \
+    /* Store the number of bytes expected for the entire sequence.  */	      \
+    state->__count |= cnt << 8;						      \
+									      \
+    /* Store the value.  */						      \
+    state->__value.__wch = ch;						      \
+  }
+
+#define UNPACK_BYTES_COMMON \
+  {									      \
+    static const unsigned char inmask[5] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };  \
+    wint_t wch = state->__value.__wch;					      \
+    size_t ntotal = state->__count >> 8;				      \
+									      \
+    inlen = state->__count & 255;					      \
+									      \
+    bytebuf[0] = inmask[ntotal - 2];					      \
+									      \
+    do									      \
+      {									      \
+	if (--ntotal < inlen)						      \
+	  bytebuf[ntotal] = 0x80 | (wch & 0x3f);			      \
+	wch >>= 6;							      \
+      }									      \
+    while (ntotal > 1);							      \
+									      \
+    bytebuf[0] |= wch;							      \
+  }
+
+#define CLEAR_STATE_COMMON \
+  state->__count = 0
+
+#define BODY_FROM_HW(ASM)						\
   {									\
-    if (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH)				\
-      {									\
-	HARDWARE_CONVERT ("cu14 %0, %1, 1");				\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
 									\
-	if (inptr != inend)						\
-	  {								\
-	    int i;							\
-	    for (i = 1; inptr + i < inend; ++i)				\
-	      if ((inptr[i] & 0xc0) != 0x80)				\
-		break;							\
+    int i;								\
+    for (i = 1; inptr + i < inend && i < 5; ++i)			\
+      if ((inptr[i] & 0xc0) != 0x80)					\
+	break;								\
 									\
-	    if (__glibc_likely (inptr + i == inend))			      \
-	      {								\
-		result = __GCONV_INCOMPLETE_INPUT;			\
-		break;							\
-	      }								\
-	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
-	  }								\
-	continue;							\
+    if (__glibc_likely (inptr + i == inend				\
+			&& result == __GCONV_EMPTY_INPUT))		\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
       }									\
-									\
+    STANDARD_FROM_LOOP_ERR_HANDLER (i);					\
+  }
+
+/* This hardware routine uses the Convert UTF8 to UTF32 (cu14) instruction.  */
+#define BODY_FROM_ETF3EH BODY_FROM_HW (HARDWARE_CONVERT ("cu14 %0, %1, 1"))
+
+
+/* The software routine is copied from gconv_simple.c.  */
+#define BODY_FROM_C							\
+  {									\
     /* Next input byte.  */						\
     uint32_t ch = *inptr;						\
 									\
-    if (__glibc_likely (ch < 0x80))					      \
+    if (__glibc_likely (ch < 0x80))					\
       {									\
 	/* One byte sequence.  */					\
 	++inptr;							\
@@ -233,30 +324,18 @@ gconv_end (struct __gconv_step *data)
 	    cnt = 2;							\
 	    ch &= 0x1f;							\
 	  }								\
-        else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
+	else if (__glibc_likely ((ch & 0xf0) == 0xe0))			\
 	  {								\
 	    /* We expect three bytes.  */				\
 	    cnt = 3;							\
 	    ch &= 0x0f;							\
 	  }								\
-	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
+	else if (__glibc_likely ((ch & 0xf8) == 0xf0))			\
 	  {								\
 	    /* We expect four bytes.  */				\
 	    cnt = 4;							\
 	    ch &= 0x07;							\
 	  }								\
-	else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
-	  {								\
-	    /* We expect five bytes.  */				\
-	    cnt = 5;							\
-	    ch &= 0x03;							\
-	  }								\
-	else if (__glibc_likely ((ch & 0xfe) == 0xfc))			      \
-	  {								\
-	    /* We expect six bytes.  */					\
-	    cnt = 6;							\
-	    ch &= 0x01;							\
-	  }								\
 	else								\
 	  {								\
 	    /* Search the end of this ill-formed UTF-8 character.  This	\
@@ -272,7 +351,7 @@ gconv_end (struct __gconv_step *data)
 	    STANDARD_FROM_LOOP_ERR_HANDLER (i);				\
 	  }								\
 									\
-	if (__glibc_unlikely (inptr + cnt > inend))			      \
+	if (__glibc_unlikely (inptr + cnt > inend))			\
 	  {								\
 	    /* We don't have enough input.  But before we report	\
 	       that check that all the bytes are correct.  */		\
@@ -280,7 +359,7 @@ gconv_end (struct __gconv_step *data)
 	      if ((inptr[i] & 0xc0) != 0x80)				\
 		break;							\
 									\
-	    if (__glibc_likely (inptr + i == inend))			      \
+	    if (__glibc_likely (inptr + i == inend))			\
 	      {								\
 		result = __GCONV_INCOMPLETE_INPUT;			\
 		break;							\
@@ -305,7 +384,10 @@ gconv_end (struct __gconv_step *data)
 	/* If i < cnt, some trail byte was not >= 0x80, < 0xc0.		\
 	   If cnt > 2 and ch < 2^(5*cnt-4), the wide character ch could	\
 	   have been represented with fewer than cnt bytes.  */		\
-	if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0))		\
+	if (i < cnt || (cnt > 2 && (ch >> (5 * cnt - 4)) == 0)		\
+	    /* Do not accept UTF-16 surrogates.  */			\
+	    || (ch >= 0xd800 && ch <= 0xdfff)				\
+	    || (ch > 0x10ffff))						\
 	  {								\
 	    /* This is an illegal encoding.  */				\
 	    goto errout;						\
@@ -318,137 +400,212 @@ gconv_end (struct __gconv_step *data)
     *((uint32_t *) outptr) = ch;					\
     outptr += sizeof (uint32_t);					\
   }
-#define LOOP_NEED_FLAGS
 
-#define STORE_REST							\
-  {									      \
-    /* We store the remaining bytes while converting them into the UCS4	      \
-       format.  We can assume that the first byte in the buffer is	      \
-       correct and that it requires a larger number of bytes than there	      \
-       are in the input buffer.  */					      \
-    wint_t ch = **inptrp;						      \
-    size_t cnt, r;							      \
-									      \
-    state->__count = inend - *inptrp;					      \
-									      \
-    if (ch >= 0xc2 && ch < 0xe0)					      \
-      {									      \
-	/* We expect two bytes.  The first byte cannot be 0xc0 or	      \
-	   0xc1, otherwise the wide character could have been		      \
-	   represented using a single byte.  */				      \
-	cnt = 2;							      \
-	ch &= 0x1f;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xf0) == 0xe0))			      \
-      {									      \
-	/* We expect three bytes.  */					      \
-	cnt = 3;							      \
-	ch &= 0x0f;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xf8) == 0xf0))			      \
-      {									      \
-	/* We expect four bytes.  */					      \
-	cnt = 4;							      \
-	ch &= 0x07;							      \
-      }									      \
-    else if (__glibc_likely ((ch & 0xfc) == 0xf8))			      \
-      {									      \
-	/* We expect five bytes.  */					      \
-	cnt = 5;							      \
-	ch &= 0x03;							      \
-      }									      \
-    else								      \
-      {									      \
-	/* We expect six bytes.  */					      \
-	cnt = 6;							      \
-	ch &= 0x01;							      \
-      }									      \
-									      \
-    /* The first byte is already consumed.  */				      \
-    r = cnt - 1;							      \
-    while (++(*inptrp) < inend)						      \
-      {									      \
-	ch <<= 6;							      \
-	ch |= **inptrp & 0x3f;						      \
-	--r;								      \
-      }									      \
-									      \
-    /* Shift for the so far missing bytes.  */				      \
-    ch <<= r * 6;							      \
-									      \
-    /* Store the number of bytes expected for the entire sequence.  */	      \
-    state->__count |= cnt << 8;						      \
-									      \
-    /* Store the value.  */						      \
-    state->__value.__wch = ch;						      \
+#define HW_FROM_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2, tmp3;					\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "    vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
+		  "    vrepib %%v31,0x20\n\t"				\
+		  /* Loop which handles UTF-8 chars <=0x7f.  */		\
+		  "0:  clgijl %[R_INLEN],16,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],64,20f\n\t"			\
+		  "1: vl %%v16,0(%[R_IN])\n\t"				\
+		  "    vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
+		  "    jno 10f\n\t" /* Jump away if not all bytes are 1byte \
+				   UTF8 chars.  */			\
+		  /* Enlarge to UCS4.  */				\
+		  "    vuplhb %%v18,%%v16\n\t"				\
+		  "    vupllb %%v19,%%v16\n\t"				\
+		  "    la %[R_IN],16(%[R_IN])\n\t"			\
+		  "    vuplhh %%v20,%%v18\n\t"				\
+		  "    aghi %[R_INLEN],-16\n\t"				\
+		  "    vupllh %%v21,%%v18\n\t"				\
+		  "    aghi %[R_OUTLEN],-64\n\t"			\
+		  "    vuplhh %%v22,%%v19\n\t"				\
+		  "    vupllh %%v23,%%v19\n\t"				\
+		  /* Store 64 bytes to buf_out.  */			\
+		  "    vstm %%v20,%%v23,0(%[R_OUT])\n\t"		\
+		  "    la %[R_OUT],64(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],16,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],64,20f\n\t"			\
+		  "    j 1b\n\t"					\
+		  "10: \n\t"						\
+		  /* At least one byte is > 0x7f.			\
+		     Store the preceding 1-byte chars.  */		\
+		  "    vlgvb %[R_TMP],%%v17,7\n\t"			\
+		  "    sllk %[R_TMP2],%[R_TMP],2\n\t" /* Compute highest \
+						     index to store. */ \
+		  "    llgfr %[R_TMP3],%[R_TMP2]\n\t"			\
+		  "    ahi %[R_TMP2],-1\n\t"				\
+		  "    jl 20f\n\t"					\
+		  "    vuplhb %%v18,%%v16\n\t"				\
+		  "    vuplhh %%v20,%%v18\n\t"				\
+		  "    vstl %%v20,%[R_TMP2],0(%[R_OUT])\n\t"		\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vupllh %%v21,%%v18\n\t"				\
+		  "    vstl %%v21,%[R_TMP2],16(%[R_OUT])\n\t"		\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vupllb %%v19,%%v16\n\t"				\
+		  "    vuplhh %%v22,%%v19\n\t"				\
+		  "    vstl %%v22,%[R_TMP2],32(%[R_OUT])\n\t"		\
+		  "    ahi %[R_TMP2],-16\n\t"				\
+		  "    jl 11f\n\t"					\
+		  "    vupllh %%v23,%%v19\n\t"				\
+		  "    vstl %%v23,%[R_TMP2],48(%[R_OUT])\n\t"		\
+		  "11: \n\t"						\
+		  /* Update pointers.  */				\
+		  "    la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_TMP]\n\t"			\
+		  "    la %[R_OUT],0(%[R_TMP3],%[R_OUT])\n\t"		\
+		  "    slgr %[R_OUTLEN],%[R_TMP3]\n\t"			\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu14 %[R_OUT],%[R_IN],1\n\t"			\
+		  "    jo 0b\n\t" /* Try vector implementation again.  */ \
+		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
+		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=a" (tmp)	\
+		    , [R_TMP2] "=d" (tmp2), [R_TMP3] "=a" (tmp3)	\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")	\
+		    ASM_CLOBBER_VR ("v31")				\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
   }
+#define BODY_FROM_VX BODY_FROM_HW (HW_FROM_VX)
 
-#define UNPACK_BYTES \
-  {									      \
-    static const unsigned char inmask[5] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };  \
-    wint_t wch = state->__value.__wch;					      \
-    size_t ntotal = state->__count >> 8;				      \
-									      \
-    inlen = state->__count & 255;					      \
-									      \
-    bytebuf[0] = inmask[ntotal - 2];					      \
-									      \
-    do									      \
-      {									      \
-	if (--ntotal < inlen)						      \
-	  bytebuf[ntotal] = 0x80 | (wch & 0x3f);			      \
-	wch >>= 6;							      \
-      }									      \
-    while (ntotal > 1);							      \
-									      \
-    bytebuf[0] |= wch;							      \
-  }
+/* These definitions apply to the UTF-8 to UTF-32 direction.  The
+   software implementation for UTF-8 still supports multibyte
+   characters up to 6 bytes whereas the hardware variant does not.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_c
 
-#define CLEAR_STATE \
-  state->__count = 0
+#define LOOP_NEED_FLAGS
 
+#define STORE_REST		STORE_REST_COMMON
+#define UNPACK_BYTES		UNPACK_BYTES_COMMON
+#define CLEAR_STATE		CLEAR_STATE_COMMON
+#define BODY			BODY_FROM_C
 #include <iconv/loop.c>
 
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			__from_utf8_loop_etf3eh
+
+#define LOOP_NEED_FLAGS
+
+#define STORE_REST		STORE_REST_COMMON
+#define UNPACK_BYTES		UNPACK_BYTES_COMMON
+#define CLEAR_STATE		CLEAR_STATE_COMMON
+#define BODY			BODY_FROM_ETF3EH
+#include <iconv/loop.c>
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_INPUT	MAX_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		__from_utf8_loop_vx
+
+# define LOOP_NEED_FLAGS
+
+# define STORE_REST		STORE_REST_COMMON
+# define UNPACK_BYTES		UNPACK_BYTES_COMMON
+# define CLEAR_STATE		CLEAR_STATE_COMMON
+# define BODY			BODY_FROM_VX
+# include <iconv/loop.c>
+#endif
+
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_utf8_loop_c)
+__attribute__ ((ifunc ("__from_utf8_loop_resolver")))
+__from_utf8_loop;
+
+static void *
+__from_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __from_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ETF3EH)
+    return __from_utf8_loop_etf3eh;
+  else
+    return __from_utf8_loop_c;
+}
+
+strong_alias (__from_utf8_loop_c_single, __from_utf8_loop_single)
+
+
 /* Conversion from UTF-32 internal/BE to UTF-8.  */
+#define BODY_TO_HW(ASM)							\
+  {									\
+    ASM;								\
+    if (__glibc_likely (inptr == inend)					\
+	|| result == __GCONV_FULL_OUTPUT)				\
+      break;								\
+    if (inptr + 4 > inend)						\
+      {									\
+	result = __GCONV_INCOMPLETE_INPUT;				\
+	break;								\
+      }									\
+    STANDARD_TO_LOOP_ERR_HANDLER (4);					\
+  }
+
+/* The hardware routine uses the S/390 cu41 instruction.  */
+#define BODY_TO_ETF3EH BODY_TO_HW (HARDWARE_CONVERT ("cu41 %0, %1"))
+
+/* The hardware routine uses the S/390 vector and cu41 instructions.  */
+#define BODY_TO_VX BODY_TO_HW (HW_TO_VX)
 
-#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
-#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
-#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
-#define LOOPFCT			TO_LOOP
 /* The software routine mimics the S/390 cu41 instruction.  */
-#define BODY							\
+#define BODY_TO_C						\
   {								\
-    if (GLRO (dl_hwcap) & HWCAP_S390_ETF3EH)			\
-      {								\
-	HARDWARE_CONVERT ("cu41 %0, %1");			\
-								\
-	if (inptr != inend)					\
-	  {							\
-	    result = __GCONV_INCOMPLETE_INPUT;			\
-	    break;						\
-	  }							\
-	continue;						\
-      }								\
-								\
     uint32_t wc = *((const uint32_t *) inptr);			\
 								\
-    if (__glibc_likely (wc <= 0x7f))					      \
+    if (__glibc_likely (wc <= 0x7f))				\
       {								\
-        /* Single UTF-8 char.  */				\
-        *outptr = (uint8_t)wc;					\
+	/* Single UTF-8 char.  */				\
+	*outptr = (uint8_t)wc;					\
 	outptr++;						\
       }								\
     else if (wc <= 0x7ff)					\
       {								\
-        /* Two UTF-8 chars.  */					\
-        if (__glibc_unlikely (outptr + 2 > outend))			      \
+	/* Two UTF-8 chars.  */					\
+	if (__glibc_unlikely (outptr + 2 > outend))		\
 	  {							\
 	    /* Overflow in the output buffer.  */		\
 	    result = __GCONV_FULL_OUTPUT;			\
 	    break;						\
 	  }							\
 								\
-        outptr[0] = 0xc0;					\
+	outptr[0] = 0xc0;					\
 	outptr[0] |= wc >> 6;					\
 								\
 	outptr[1] = 0x80;					\
@@ -459,12 +616,18 @@ gconv_end (struct __gconv_step *data)
     else if (wc <= 0xffff)					\
       {								\
 	/* Three UTF-8 chars.  */				\
-	if (__glibc_unlikely (outptr + 3 > outend))			      \
+	if (__glibc_unlikely (outptr + 3 > outend))		\
 	  {							\
 	    /* Overflow in the output buffer.  */		\
 	    result = __GCONV_FULL_OUTPUT;			\
 	    break;						\
 	  }							\
+	if (wc >= 0xd800 && wc < 0xdc00)			\
+	  {							\
+	    /* Do not accept UTF-16 surrogates.   */		\
+	    result = __GCONV_ILLEGAL_INPUT;			\
+	    STANDARD_TO_LOOP_ERR_HANDLER (4);			\
+	  }							\
 	outptr[0] = 0xe0;					\
 	outptr[0] |= wc >> 12;					\
 								\
@@ -479,7 +642,7 @@ gconv_end (struct __gconv_step *data)
       else if (wc <= 0x10ffff)					\
 	{							\
 	  /* Four UTF-8 chars.  */				\
-	  if (__glibc_unlikely (outptr + 4 > outend))			      \
+	  if (__glibc_unlikely (outptr + 4 > outend))		\
 	    {							\
 	      /* Overflow in the output buffer.  */		\
 	      result = __GCONV_FULL_OUTPUT;			\
@@ -505,7 +668,140 @@ gconv_end (struct __gconv_step *data)
 	}							\
     inptr += 4;							\
   }
+
+#define HW_TO_VX							\
+  {									\
+    register const unsigned char* pInput asm ("8") = inptr;		\
+    register size_t inlen asm ("9") = inend - inptr;			\
+    register unsigned char* pOutput asm ("10") = outptr;		\
+    register size_t outlen asm("11") = outend - outptr;			\
+    unsigned long tmp, tmp2;						\
+    asm volatile (".machine push\n\t"					\
+		  ".machine \"z13\"\n\t"				\
+		  ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		  "    vleif %%v20,127,0\n\t"   /* element 0: 127  */	\
+		  "    vzero %%v21\n\t"					\
+		  "    vleih %%v21,8192,0\n\t"  /* element 0:   >  */	\
+		  "    vleih %%v21,-8192,2\n\t" /* element 1: =<>  */	\
+		  /* Loop which handles UTF-32 chars <=0x7f.  */	\
+		  "0:  clgijl %[R_INLEN],64,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "1:  vlm %%v16,%%v19,0(%[R_IN])\n\t"			\
+		  "    lghi %[R_TMP],0\n\t"				\
+		  /* Shorten to byte values.  */			\
+		  "    vpkf %%v23,%%v16,%%v17\n\t"			\
+		  "    vpkf %%v24,%%v18,%%v19\n\t"			\
+		  "    vpkh %%v23,%%v23,%%v24\n\t"			\
+		  /* Checking for values > 0x7f.  */			\
+		  "    vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
+		  "    jno 10f\n\t"					\
+		  "    vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
+		  "    jno 11f\n\t"					\
+		  "    vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
+		  "    jno 12f\n\t"					\
+		  "    vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"		\
+		  "    jno 13f\n\t"					\
+		  /* Store 16bytes to outptr.  */			\
+		  "    vst %%v23,0(%[R_OUT])\n\t"			\
+		  "    aghi %[R_INLEN],-64\n\t"				\
+		  "    aghi %[R_OUTLEN],-16\n\t"			\
+		  "    la %[R_IN],64(%[R_IN])\n\t"			\
+		  "    la %[R_OUT],16(%[R_OUT])\n\t"			\
+		  "    clgijl %[R_INLEN],64,20f\n\t"			\
+		  "    clgijl %[R_OUTLEN],16,20f\n\t"			\
+		  "    j 1b\n\t"					\
+		  /* Found a value > 0x7f.  */				\
+		  "13: ahi %[R_TMP],4\n\t"				\
+		  "12: ahi %[R_TMP],4\n\t"				\
+		  "11: ahi %[R_TMP],4\n\t"				\
+		  "10: vlgvb %[R_I],%%v22,7\n\t"			\
+		  "    srlg %[R_I],%[R_I],2\n\t"			\
+		  "    agr %[R_I],%[R_TMP]\n\t"				\
+		  "    je 20f\n\t"					\
+		  /* Store characters before invalid one...  */		\
+		  "    slgr %[R_OUTLEN],%[R_I]\n\t"			\
+		  "15: aghi %[R_I],-1\n\t"				\
+		  "    vstl %%v23,%[R_I],0(%[R_OUT])\n\t"		\
+		  /* ... and update pointers.  */			\
+		  "    aghi %[R_I],1\n\t"				\
+		  "    la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"		\
+		  "    sllg %[R_I],%[R_I],2\n\t"			\
+		  "    la %[R_IN],0(%[R_I],%[R_IN])\n\t"		\
+		  "    slgr %[R_INLEN],%[R_I]\n\t"			\
+		  /* Handle multibyte utf8-char with convert instruction. */ \
+		  "20: cu41 %[R_OUT],%[R_IN]\n\t"			\
+		  "    jo 0b\n\t" /* Try vector implementation again.  */ \
+		  "    lochil %[R_RES],%[RES_OUT_FULL]\n\t" /* cc == 1.  */ \
+		  "    lochih %[R_RES],%[RES_IN_ILL]\n\t" /* cc == 2.  */ \
+		  ".machine pop"					\
+		  : /* outputs */ [R_IN] "+a" (pInput)			\
+		    , [R_INLEN] "+d" (inlen), [R_OUT] "+a" (pOutput)	\
+		    , [R_OUTLEN] "+d" (outlen), [R_TMP] "=d" (tmp)	\
+		    , [R_I] "=a" (tmp2)					\
+		    , [R_RES] "+d" (result)				\
+		  : /* inputs */					\
+		    [RES_OUT_FULL] "i" (__GCONV_FULL_OUTPUT)		\
+		    , [RES_IN_ILL] "i" (__GCONV_ILLEGAL_INPUT)		\
+		  : /* clobber list */ "memory", "cc"			\
+		    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+		    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+		    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+		    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
+		    ASM_CLOBBER_VR ("v24")				\
+		  );							\
+    inptr = pInput;							\
+    outptr = pOutput;							\
+  }
+
+/* Generate loop-function with software routine.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf8_loop_c
+#define BODY			BODY_TO_C
+#define LOOP_NEED_FLAGS
+#include <iconv/loop.c>
+
+/* Generate loop-function with hardware utf-convert instruction.  */
+#define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+#define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+#define LOOPFCT			__to_utf8_loop_etf3eh
 #define LOOP_NEED_FLAGS
+#define BODY			BODY_TO_ETF3EH
 #include <iconv/loop.c>
 
+#if defined HAVE_S390_VX_ASM_SUPPORT
+/* Generate loop-function with hardware vector and utf-convert instructions.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define MAX_NEEDED_OUTPUT	MAX_NEEDED_FROM
+# define LOOPFCT		__to_utf8_loop_vx
+# define BODY			BODY_TO_VX
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+#endif
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__to_utf8_loop_c)
+__attribute__ ((ifunc ("__to_utf8_loop_resolver")))
+__to_utf8_loop;
+
+static void *
+__to_utf8_loop_resolver (unsigned long int dl_hwcap)
+{
+#if defined HAVE_S390_VX_ASM_SUPPORT
+  if (dl_hwcap & HWCAP_S390_VX)
+    return __to_utf8_loop_vx;
+  else
+#endif
+  if (dl_hwcap & HWCAP_S390_ETF3EH)
+    return __to_utf8_loop_etf3eh;
+  else
+    return __to_utf8_loop_c;
+}
+
+strong_alias (__to_utf8_loop_c_single, __to_utf8_loop_single)
+
+
 #include <iconv/skeleton.c>
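
The software path of the UTF-32 to UTF-8 direction in the diff above
(BODY_TO_C) picks the sequence length from the code point value, checks the
remaining output space, and rejects the range 0xd800..0xdbff before emitting
bytes. The following is a minimal standalone sketch of that logic in
portable C, not the glibc code itself. The function name encode_utf8 and the
ENC_* status codes are hypothetical stand-ins for the loop body and the
__GCONV_* results.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch; glibc uses __GCONV_* values.  */
enum { ENC_OK = 0, ENC_FULL_OUTPUT = 1, ENC_ILLEGAL_INPUT = 2 };

/* Encode one UTF-32 code point WC at *OUTP without writing at or past
   OUTEND.  On success *OUTP is advanced by the number of bytes written.  */
static int
encode_utf8 (uint32_t wc, unsigned char **outp, const unsigned char *outend)
{
  unsigned char *out = *outp;
  size_t avail = (size_t) (outend - out);
  size_t n;

  /* Select the sequence length from the value range.  */
  if (wc <= 0x7f)
    n = 1;
  else if (wc <= 0x7ff)
    n = 2;
  else if (wc <= 0xffff)
    n = 3;
  else if (wc <= 0x10ffff)
    n = 4;
  else
    return ENC_ILLEGAL_INPUT;

  /* Overflow in the output buffer.  */
  if (avail < n)
    return ENC_FULL_OUTPUT;

  /* Same range test as in the diff: 0xd800 <= wc < 0xdc00.  */
  if (wc >= 0xd800 && wc < 0xdc00)
    return ENC_ILLEGAL_INPUT;

  if (n == 1)
    out[0] = (unsigned char) wc;
  else
    {
      /* Lead byte: n high bits set, followed by the top payload bits.  */
      static const unsigned char lead[5] = { 0, 0, 0xc0, 0xe0, 0xf0 };
      out[0] = (unsigned char) (lead[n] | (wc >> (6 * (n - 1))));
      /* Continuation bytes carry six payload bits each.  */
      for (size_t i = 1; i < n; i++)
	out[i] = (unsigned char) (0x80 | ((wc >> (6 * (n - 1 - i))) & 0x3f));
    }

  *outp = out + n;
  return ENC_OK;
}
```

A caller would invoke encode_utf8 once per 4-byte input unit and stop at the
first status other than ENC_OK, as the break statements in the loop body do.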

http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=81c6380887c6d62c56e5f0f85a241f759f58b2fd

commit 81c6380887c6d62c56e5f0f85a241f759f58b2fd
Author: Stefan Liebler <stli@linux.vnet.ibm.com>
Date:   Wed May 25 17:18:05 2016 +0200

    S390: Optimize iso-8859-1 to ibm037 iconv-module.
    
    This patch reworks the s390 specific module which used the z900
    translate one to one instruction. Now the g5 translate instruction is used,
    because it outperforms the troo instruction.
    
    ChangeLog:
    
    	* sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c (TROO_LOOP):
    	Rename to TR_LOOP and usage of tr instead of troo instruction.
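
The reworked loop translates the common length of the input and output
buffers in 256-byte blocks and then handles the remainder with plain table
lookups. A minimal portable sketch of that structure follows. It is not the
patch itself: the function name translate_bytes is hypothetical, memcpy plus
an in-place lookup stands in for the mvc/tr instruction pair, and the 8-byte
unrolling and the switch for the last 0...7 bytes are folded into one loop.

```c
#include <stddef.h>
#include <string.h>

/* Translate bytes from IN to OUT through a 256-entry TABLE.  Only
   min (INLEN, OUTLEN) bytes are processed; that count is returned.  */
static size_t
translate_bytes (const unsigned char table[256],
		 const unsigned char *in, size_t inlen,
		 unsigned char *out, size_t outlen)
{
  size_t length = inlen < outlen ? inlen : outlen;
  size_t done = length;

  /* Process in 256-byte blocks: copy the block, then translate it in
     place.  This mirrors "mvc" followed by "tr" on the output buffer.  */
  while (length >= 256)
    {
      memcpy (out, in, 256);
      for (size_t i = 0; i < 256; i++)
	out[i] = table[out[i]];
      in += 256;
      out += 256;
      length -= 256;
    }

  /* Process the remaining 0...255 bytes with direct table lookups.  */
  for (size_t i = 0; i < length; i++)
    out[i] = table[in[i]];

  return done;
}
```

Because every input byte maps to exactly one output byte, no test character
or restart logic is needed, which is what the removed troo variant required.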

diff --git a/ChangeLog b/ChangeLog
index 285f4fb..f303dea 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,10 @@
 2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
 
+	* sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c (TROO_LOOP):
+	Rename to TR_LOOP and usage of tr instead of troo instruction.
+
+2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
+
 	* sysdeps/s390/multiarch/gconv_simple.c: New File.
 	* sysdeps/s390/multiarch/Makefile (sysdep_routines): Add gconv_simple.
 
diff --git a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c b/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
index c59f87f..3b63e6a 100644
--- a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
+++ b/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
@@ -1,7 +1,6 @@
 /* Conversion between ISO 8859-1 and IBM037.
 
-   This module uses the Z900 variant of the Translate One To One
-   instruction.
+   This module uses the translate instruction.
    Copyright (C) 1997-2016 Free Software Foundation, Inc.
 
    Author: Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
@@ -176,50 +175,70 @@ __attribute__ ((aligned (8))) =
 #define MIN_NEEDED_FROM		1
 #define MIN_NEEDED_TO		1
 
-/* The Z900 variant of troo forces us to always specify a test
-   character which ends the translation.  So if we run into the
-   situation where the translation has been interrupted due to the
-   test character we translate the character by hand and jump back
-   into the instruction.  */
-
-#define TROO_LOOP(TABLE)						\
+#define TR_LOOP(TABLE)							\
   {									\
-    register const unsigned char test __asm__ ("0") = 0;		\
-    register const unsigned char *pTable __asm__ ("1") = TABLE;		\
-    register unsigned char *pOutput __asm__ ("2") = outptr;		\
-    register uint64_t length __asm__ ("3");				\
-    const unsigned char* pInput = inptr;				\
-    uint64_t tmp;							\
-									\
-    length = (inend - inptr < outend - outptr				\
-	      ? inend - inptr : outend - outptr);			\
+    size_t length = (inend - inptr < outend - outptr			\
+		     ? inend - inptr : outend - outptr);		\
 									\
-    __asm__ volatile ("0:                        \n\t"			\
-		      "  troo    %0,%1           \n\t"			\
-		      "  jz      1f              \n\t"			\
-		      "  jo      0b              \n\t"			\
-		      "  llgc    %3,0(%1)        \n\t"			\
-		      "  la      %3,0(%3,%4)     \n\t"			\
-		      "  mvc     0(1,%0),0(%3)   \n\t"			\
-		      "  aghi    %1,1            \n\t"			\
-		      "  aghi    %0,1            \n\t"			\
-		      "  aghi    %2,-1           \n\t"			\
-		      "  j       0b              \n\t"			\
-		      "1:                        \n"			\
+    /* Process in 256 byte blocks.  */					\
+    if (__builtin_expect (length >= 256, 0))				\
+      {									\
+	size_t blocks = length / 256;					\
+	__asm__ __volatile__("0: mvc 0(256,%[R_OUT]),0(%[R_IN])\n\t"	\
+			     "   tr 0(256,%[R_OUT]),0(%[R_TBL])\n\t"	\
+			     "   la %[R_IN],256(%[R_IN])\n\t"		\
+			     "   la %[R_OUT],256(%[R_OUT])\n\t"		\
+			     "   brctg %[R_LI],0b\n\t"			\
+			     : /* outputs */ [R_IN] "+a" (inptr)	\
+			       , [R_OUT] "+a" (outptr), [R_LI] "+d" (blocks) \
+			     : /* inputs */ [R_TBL] "a" (TABLE)		\
+			     : /* clobber list */ "memory"		\
+			     );						\
+	length = length % 256;						\
+      }									\
 									\
-     : "+a" (pOutput), "+a" (pInput), "+d" (length), "=&a" (tmp)        \
-     : "a" (pTable), "d" (test)						\
-     : "cc");								\
+    /* Process remaining 0...248 bytes in 8byte blocks.  */		\
+    if (length >= 8)							\
+      {									\
+	size_t blocks = length / 8;					\
+	for (int i = 0; i < blocks; i++)				\
+	  {								\
+	    outptr[0] = TABLE[inptr[0]];				\
+	    outptr[1] = TABLE[inptr[1]];				\
+	    outptr[2] = TABLE[inptr[2]];				\
+	    outptr[3] = TABLE[inptr[3]];				\
+	    outptr[4] = TABLE[inptr[4]];				\
+	    outptr[5] = TABLE[inptr[5]];				\
+	    outptr[6] = TABLE[inptr[6]];				\
+	    outptr[7] = TABLE[inptr[7]];				\
+	    inptr += 8;							\
+	    outptr += 8;						\
+	  }								\
+	length = length % 8;						\
+      }									\
 									\
-    inptr = pInput;							\
-    outptr = pOutput;							\
+    /* Process remaining 0...7 bytes.  */				\
+    switch (length)							\
+      {									\
+      case 7: outptr[6] = TABLE[inptr[6]];				\
+      case 6: outptr[5] = TABLE[inptr[5]];				\
+      case 5: outptr[4] = TABLE[inptr[4]];				\
+      case 4: outptr[3] = TABLE[inptr[3]];				\
+      case 3: outptr[2] = TABLE[inptr[2]];				\
+      case 2: outptr[1] = TABLE[inptr[1]];				\
+      case 1: outptr[0] = TABLE[inptr[0]];				\
+      case 0: break;							\
+      }									\
+    inptr += length;							\
+    outptr += length;							\
   }
 
+
 /* First define the conversion function from ISO 8859-1 to CP037.  */
 #define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
 #define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
 #define LOOPFCT			FROM_LOOP
-#define BODY TROO_LOOP (table_iso8859_1_to_cp037)
+#define BODY			TR_LOOP (table_iso8859_1_to_cp037)
 
 #include <iconv/loop.c>
 
@@ -228,7 +247,7 @@ __attribute__ ((aligned (8))) =
 #define MIN_NEEDED_INPUT	MIN_NEEDED_TO
 #define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
 #define LOOPFCT			TO_LOOP
-#define BODY TROO_LOOP (table_cp037_iso8859_1);
+#define BODY			TR_LOOP (table_cp037_iso8859_1);
 
 #include <iconv/loop.c>
 

http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=3b704e26b33e35d99de920f8462d8e438f89be39

commit 3b704e26b33e35d99de920f8462d8e438f89be39
Author: Stefan Liebler <stli@linux.vnet.ibm.com>
Date:   Wed May 25 17:18:04 2016 +0200

    S390: Optimize builtin iconv-modules.
    
    This patch introduces an s390-specific gconv_simple.c file which provides
    optimized versions for z13 with vector instructions, which are chosen at
    runtime via ifunc.
    The optimized conversions can convert between internal and ascii, ucs4, ucs4le,
    ucs2, ucs2le.
    If the build environment lacks vector support, then iconv/gconv_simple.c
    is used without any change. Otherwise iconv/gconv_simple.c is included to
    create conversion loop routines without vector instructions, which serve as
    a fallback if vector instructions aren't available at runtime.
    
    ChangeLog:
    
    	* sysdeps/s390/multiarch/gconv_simple.c: New File.
    	* sysdeps/s390/multiarch/Makefile (sysdep_routines): Add gconv_simple.
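
The selection scheme described above can be illustrated without ifunc or
s390 assembly. The sketch below is an assumption-laden illustration, not
the glibc mechanism: SKETCH_HWCAP_VX, select_ascii_to_ucs4 and both
conversion functions are hypothetical names, an ordinary function pointer
replaces the ifunc resolver, and a 16-byte blocked C loop stands in for the
vector variant. Both variants widen ASCII bytes to 32-bit values and stop at
the first byte above 0x7f.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bit; the real code tests HWCAP_S390_VX.  */
#define SKETCH_HWCAP_VX 1UL

typedef size_t (*ascii_to_ucs4_fn) (const unsigned char *in, size_t len,
				    uint32_t *out);

/* Fallback: widen byte by byte.  Returns the number of bytes converted,
   which is less than LEN if a byte > 0x7f was found.  */
static size_t
ascii_to_ucs4_c (const unsigned char *in, size_t len, uint32_t *out)
{
  size_t i;
  for (i = 0; i < len && in[i] <= 0x7f; i++)
    out[i] = in[i];
  return i;
}

/* Stand-in for the vector variant: test 16 bytes at once, widen the
   whole block if all are ASCII, else fall back to the bytewise loop.  */
static size_t
ascii_to_ucs4_blocked (const unsigned char *in, size_t len, uint32_t *out)
{
  size_t i = 0;
  while (len - i >= 16)
    {
      unsigned char any = 0;
      for (size_t j = 0; j < 16; j++)
	any |= in[i + j];
      if (any & 0x80)
	break;			/* At least one byte is > 0x7f.  */
      for (size_t j = 0; j < 16; j++)
	out[i + j] = in[i + j];
      i += 16;
    }
  /* Handle the remaining bytes, or the block with the invalid byte.  */
  return i + ascii_to_ucs4_c (in + i, len - i, out + i);
}

/* Resolver-like selection based on a capability bitmask.  */
static ascii_to_ucs4_fn
select_ascii_to_ucs4 (unsigned long hwcap)
{
  return (hwcap & SKETCH_HWCAP_VX) ? ascii_to_ucs4_blocked : ascii_to_ucs4_c;
}
```

With ifunc the selection happens once at symbol resolution time, so callers
pay no per-call test. The function pointer here only models the choice.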

diff --git a/ChangeLog b/ChangeLog
index 42f1b9d..285f4fb 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,10 @@
 2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
 
+	* sysdeps/s390/multiarch/gconv_simple.c: New File.
+	* sysdeps/s390/multiarch/Makefile (sysdep_routines): Add gconv_simple.
+
+2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
+
 	* sysdeps/s390/multiarch/8bit-generic.c: New File.
 	* sysdeps/s390/multiarch/gen-8bit.sh: New File.
 	* sysdeps/s390/multiarch/Makefile (generate-8bit-table):
diff --git a/sysdeps/s390/multiarch/Makefile b/sysdeps/s390/multiarch/Makefile
index 6073bbb..c893ebc 100644
--- a/sysdeps/s390/multiarch/Makefile
+++ b/sysdeps/s390/multiarch/Makefile
@@ -53,3 +53,7 @@ $(move-if-change) $(@:stmp=T) $(@:stmp=h)
 touch $@
 endef
 endif
+
+ifeq ($(subdir),iconv)
+sysdep_routines += gconv_simple
+endif
diff --git a/sysdeps/s390/multiarch/gconv_simple.c b/sysdeps/s390/multiarch/gconv_simple.c
new file mode 100644
index 0000000..dc53a48
--- /dev/null
+++ b/sysdeps/s390/multiarch/gconv_simple.c
@@ -0,0 +1,1266 @@
+/* Simple transformations functions - s390 version.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+# include <ifunc-resolve.h>
+
+# if defined HAVE_S390_VX_GCC_SUPPORT
+#  define ASM_CLOBBER_VR(NR) , NR
+# else
+#  define ASM_CLOBBER_VR(NR)
+# endif
+
+# define ICONV_C_NAME(NAME) __##NAME##_c
+# define ICONV_VX_NAME(NAME) __##NAME##_vx
+# define ICONV_VX_IFUNC(FUNC)						\
+  extern __typeof (ICONV_C_NAME (FUNC)) __##FUNC;			\
+  s390_vx_libc_ifunc (__##FUNC)						\
+  int FUNC (struct __gconv_step *step, struct __gconv_step_data *data,	\
+	    const unsigned char **inptrp, const unsigned char *inend,	\
+	    unsigned char **outbufstart, size_t *irreversible,		\
+	    int do_flush, int consume_incomplete)			\
+  {									\
+    return __##FUNC (step, data, inptrp, inend,outbufstart,		\
+		     irreversible, do_flush, consume_incomplete);	\
+  }
+# define ICONV_VX_SINGLE(NAME)						\
+  static __typeof (NAME##_single) __##NAME##_vx_single __attribute__((alias(#NAME "_single")));
+
+/* Generate the transformations which are used, if the target machine does not
+   support vector instructions.  */
+# define __gconv_transform_ascii_internal		\
+  ICONV_C_NAME (__gconv_transform_ascii_internal)
+# define __gconv_transform_internal_ascii		\
+  ICONV_C_NAME (__gconv_transform_internal_ascii)
+# define __gconv_transform_internal_ucs4le		\
+  ICONV_C_NAME (__gconv_transform_internal_ucs4le)
+# define __gconv_transform_ucs4_internal		\
+  ICONV_C_NAME (__gconv_transform_ucs4_internal)
+# define __gconv_transform_ucs4le_internal		\
+  ICONV_C_NAME (__gconv_transform_ucs4le_internal)
+# define __gconv_transform_ucs2_internal		\
+  ICONV_C_NAME (__gconv_transform_ucs2_internal)
+# define __gconv_transform_ucs2reverse_internal		\
+  ICONV_C_NAME (__gconv_transform_ucs2reverse_internal)
+# define __gconv_transform_internal_ucs2		\
+  ICONV_C_NAME (__gconv_transform_internal_ucs2)
+# define __gconv_transform_internal_ucs2reverse		\
+  ICONV_C_NAME (__gconv_transform_internal_ucs2reverse)
+
+
+# include <iconv/gconv_simple.c>
+
+# undef __gconv_transform_ascii_internal
+# undef __gconv_transform_internal_ascii
+# undef __gconv_transform_internal_ucs4le
+# undef __gconv_transform_ucs4_internal
+# undef __gconv_transform_ucs4le_internal
+# undef __gconv_transform_ucs2_internal
+# undef __gconv_transform_ucs2reverse_internal
+# undef __gconv_transform_internal_ucs2
+# undef __gconv_transform_internal_ucs2reverse
+
+/* Now define the functions with vector support.  */
+# if defined __s390x__
+#  define CONVERT_32BIT_SIZE_T(REG)
+# else
+#  define CONVERT_32BIT_SIZE_T(REG) "llgfr %" #REG ",%" #REG "\n\t"
+# endif
+
+/* Convert from ISO 646-IRV to the internal (UCS4-like) format.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	1
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ascii_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ascii_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ascii_internal)
+# define ONE_DIRECTION		1
+
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_ORIG_ERROR						\
+    /* The value is too large.  We don't try transliteration here since \
+       this is not an error because of the lack of possibilities to	\
+       represent the result.  This is a genuine bug in the input since	\
+       ASCII does not allow such values.  */				\
+    STANDARD_FROM_LOOP_ERR_HANDLER (1);
+
+# define BODY_ORIG							\
+  {									\
+    if (__glibc_unlikely (*inptr > '\x7f'))				\
+      {									\
+	BODY_ORIG_ERROR							\
+      }									\
+    else								\
+      {									\
+	/* It's a one-byte sequence.  */				\
+	*((uint32_t *) outptr) = *inptr++;				\
+	outptr += sizeof (uint32_t);					\
+      }									\
+  }
+# define BODY								\
+  {									\
+    size_t len = inend - inptr;						\
+    if (len > (outend - outptr) / 4)					\
+      len = (outend - outptr) / 4;					\
+    size_t loop_count, tmp;						\
+    __asm__ volatile (".machine push\n\t"				\
+		      ".machine \"z13\"\n\t"				\
+		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
+		      "    vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
+		      "    srlg %[R_LI],%[R_LEN],4\n\t"			\
+		      "    vrepib %%v31,0x20\n\t"			\
+		      "    clgije %[R_LI],0,1f\n\t"			\
+		      "0:  \n\t" /* Handle 16-byte blocks.  */		\
+		      "    vl %%v16,0(%[R_IN])\n\t"			\
+		      /* Checking for values > 0x7f.  */		\
+		      "    vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
+		      "    jno 10f\n\t"					\
+		      /* Enlarge to UCS4.  */				\
+		      "    vuplhb %%v17,%%v16\n\t"			\
+		      "    vupllb %%v18,%%v16\n\t"			\
+		      "    vuplhh %%v19,%%v17\n\t"			\
+		      "    vupllh %%v20,%%v17\n\t"			\
+		      "    vuplhh %%v21,%%v18\n\t"			\
+		      "    vupllh %%v22,%%v18\n\t"			\
+		      /* Store 64bytes to buf_out.  */			\
+		      "    vstm %%v19,%%v22,0(%[R_OUT])\n\t"		\
+		      "    la %[R_IN],16(%[R_IN])\n\t"			\
+		      "    la %[R_OUT],64(%[R_OUT])\n\t"		\
+		      "    brctg %[R_LI],0b\n\t"			\
+		      "    lghi %[R_LI],15\n\t"				\
+		      "    ngr %[R_LEN],%[R_LI]\n\t"			\
+		      "    je 20f\n\t" /* Jump away if no remaining bytes.  */ \
+		      /* Handle remaining bytes.  */			\
+		      "1: aghik %[R_LI],%[R_LEN],-1\n\t"		\
+		      "    jl 20f\n\t" /* Jump away if no remaining bytes.  */ \
+		      "    vll %%v16,%[R_LI],0(%[R_IN])\n\t"		\
+		      /* Checking for values > 0x7f.  */		\
+		      "    vstrcbs %%v17,%%v16,%%v30,%%v31\n\t"		\
+		      "    vlgvb %[R_TMP],%%v17,7\n\t"			\
+		      "    clr %[R_TMP],%[R_LI]\n\t"			\
+		      "    locrh %[R_TMP],%[R_LEN]\n\t"			\
+		      "    locghih %[R_LEN],0\n\t"			\
+		      "    j 12f\n\t"					\
+		      "10:\n\t"						\
+		      /* Found a value > 0x7f.				\
+			 Store the preceding chars.  */			\
+		      "    vlgvb %[R_TMP],%%v17,7\n\t"			\
+		      "12: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		      "    sllk %[R_TMP],%[R_TMP],2\n\t"		\
+		      "    ahi %[R_TMP],-1\n\t"				\
+		      "    jl 20f\n\t"					\
+		      "    lgr %[R_LI],%[R_TMP]\n\t"			\
+		      "    vuplhb %%v17,%%v16\n\t"			\
+		      "    vuplhh %%v19,%%v17\n\t"			\
+		      "    vstl %%v19,%[R_LI],0(%[R_OUT])\n\t"		\
+		      "    ahi %[R_LI],-16\n\t"				\
+		      "    jl 11f\n\t"					\
+		      "    vupllh %%v20,%%v17\n\t"			\
+		      "    vstl %%v20,%[R_LI],16(%[R_OUT])\n\t"		\
+		      "    ahi %[R_LI],-16\n\t"				\
+		      "    jl 11f\n\t"					\
+		      "    vupllb %%v18,%%v16\n\t"			\
+		      "    vuplhh %%v21,%%v18\n\t"			\
+		      "    vstl %%v21,%[R_LI],32(%[R_OUT])\n\t"		\
+		      "    ahi %[R_LI],-16\n\t"				\
+		      "    jl 11f\n\t"					\
+		      "    vupllh %%v22,%%v18\n\t"			\
+		      "    vstl %%v22,%[R_LI],48(%[R_OUT])\n\t"		\
+		      "11:\n\t"						\
+		      "    la %[R_OUT],1(%[R_TMP],%[R_OUT])\n\t"	\
+		      "20:\n\t"						\
+		      ".machine pop"					\
+		      : /* outputs */ [R_OUT] "+a" (outptr)		\
+			, [R_IN] "+a" (inptr)				\
+			, [R_LEN] "+d" (len)				\
+			, [R_LI] "=d" (loop_count)			\
+			, [R_TMP] "=a" (tmp)				\
+		      : /* inputs */					\
+		      : /* clobber list*/ "memory", "cc"		\
+			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+			ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30")	\
+			ASM_CLOBBER_VR ("v31")				\
+		      );						\
+    if (len > 0)							\
+      {									\
+	/* Found an invalid character at the next input byte.  */	\
+	BODY_ORIG_ERROR							\
+      }									\
+  }
+
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+# include <iconv/skeleton.c>
+# undef BODY_ORIG
+# undef BODY_ORIG_ERROR
+ICONV_VX_IFUNC (__gconv_transform_ascii_internal)
+
+/* Convert from the internal (UCS4-like) format to ISO 646-IRV.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	4
+# define MIN_NEEDED_TO		1
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (internal_ascii_loop)
+# define TO_LOOP		ICONV_VX_NAME (internal_ascii_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ascii)
+# define ONE_DIRECTION		1
+
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_ORIG_ERROR						\
+  UNICODE_TAG_HANDLER (*((const uint32_t *) inptr), 4);			\
+  STANDARD_TO_LOOP_ERR_HANDLER (4);
+
+# define BODY_ORIG							\
+  {									\
+    if (__glibc_unlikely (*((const uint32_t *) inptr) > 0x7f))		\
+      {									\
+	BODY_ORIG_ERROR							\
+      }									\
+    else								\
+      {									\
+	/* It's a one byte sequence.  */				\
+	*outptr++ = *((const uint32_t *) inptr);			\
+	inptr += sizeof (uint32_t);					\
+      }									\
+  }
+# define BODY								\
+  {									\
+    size_t len = (inend - inptr) / 4;					\
+    if (len > outend - outptr)						\
+      len = outend - outptr;						\
+    size_t loop_count, tmp, tmp2;					\
+    __asm__ volatile (".machine push\n\t"				\
+		      ".machine \"z13\"\n\t"				\
+		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
+		      /* Setup to check for ch > 0x7f.  */		\
+		      "    vzero %%v21\n\t"				\
+		      "    srlg %[R_LI],%[R_LEN],4\n\t"			\
+		      "    vleih %%v21,8192,0\n\t"  /* element 0:   >  */ \
+		      "    vleih %%v21,-8192,2\n\t" /* element 1: =<>  */ \
+		      "    vleif %%v20,127,0\n\t"   /* element 0: 127  */ \
+		      "    lghi %[R_TMP],0\n\t"				\
+		      "    clgije %[R_LI],0,1f\n\t"			\
+		      "0:\n\t"						\
+		      "    vlm %%v16,%%v19,0(%[R_IN])\n\t"		\
+		      /* Shorten to byte values.  */			\
+		      "    vpkf %%v23,%%v16,%%v17\n\t"			\
+		      "    vpkf %%v24,%%v18,%%v19\n\t"			\
+		      "    vpkh %%v23,%%v23,%%v24\n\t"			\
+		      /* Checking for values > 0x7f.  */		\
+		      "    vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
+		      "    jno 10f\n\t"					\
+		      "    vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
+		      "    jno 11f\n\t"					\
+		      "    vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
+		      "    jno 12f\n\t"					\
+		      "    vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"		\
+		      "    jno 13f\n\t"					\
+		      /* Store 16bytes to outptr.  */			\
+		      "    vst %%v23,0(%[R_OUT])\n\t"			\
+		      "    la %[R_IN],64(%[R_IN])\n\t"			\
+		      "    la %[R_OUT],16(%[R_OUT])\n\t"		\
+		      "    brctg %[R_LI],0b\n\t"			\
+		      "    lghi %[R_LI],15\n\t"				\
+		      "    ngr %[R_LEN],%[R_LI]\n\t"			\
+		      "    je 20f\n\t" /* Jump away if no remaining bytes.  */ \
+		      /* Handle remaining bytes.  */			\
+		      "1: sllg %[R_LI],%[R_LEN],2\n\t"			\
+		      "    aghi %[R_LI],-1\n\t"				\
+		      "    jl 20f\n\t" /* Jump away if no remaining bytes.  */ \
+		      /* Load remaining 1...63 bytes.  */		\
+		      "    vll %%v16,%[R_LI],0(%[R_IN])\n\t"		\
+		      "    ahi %[R_LI],-16\n\t"				\
+		      "    jl 2f\n\t"					\
+		      "    vll %%v17,%[R_LI],16(%[R_IN])\n\t"		\
+		      "    ahi %[R_LI],-16\n\t"				\
+		      "    jl 2f\n\t"					\
+		      "    vll %%v18,%[R_LI],32(%[R_IN])\n\t"		\
+		      "    ahi %[R_LI],-16\n\t"				\
+		      "    jl 2f\n\t"					\
+		      "    vll %%v19,%[R_LI],48(%[R_IN])\n\t"		\
+		      "2:\n\t"						\
+		      /* Shorten to byte values.  */			\
+		      "    vpkf %%v23,%%v16,%%v17\n\t"			\
+		      "    vpkf %%v24,%%v18,%%v19\n\t"			\
+		      "    vpkh %%v23,%%v23,%%v24\n\t"			\
+		      "    sllg %[R_LI],%[R_LEN],2\n\t"			\
+		      "    aghi %[R_LI],-16\n\t"			\
+		      "    jl 3f\n\t" /* v16 is not fully loaded.  */	\
+		      "    vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
+		      "    jno 10f\n\t"					\
+		      "    aghi %[R_LI],-16\n\t"			\
+		      "    jl 4f\n\t" /* v17 is not fully loaded.  */	\
+		      "    vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
+		      "    jno 11f\n\t"					\
+		      "    aghi %[R_LI],-16\n\t"			\
+		      "    jl 5f\n\t" /* v18 is not fully loaded.  */	\
+		      "    vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
+		      "    jno 12f\n\t"					\
+		      "    aghi %[R_LI],-16\n\t"			\
+		      /* v19 is not fully loaded.  */			\
+		      "    lghi %[R_TMP],12\n\t"			\
+		      "    vstrcfs %%v22,%%v19,%%v20,%%v21\n\t"		\
+		      "6: vlgvb %[R_I],%%v22,7\n\t"			\
+		      "    aghi %[R_LI],16\n\t"				\
+		      "    clrjl %[R_I],%[R_LI],14f\n\t"		\
+		      "    lgr %[R_I],%[R_LEN]\n\t"			\
+		      "    lghi %[R_LEN],0\n\t"				\
+		      "    j 15f\n\t"					\
+		      "3: vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"		\
+		      "    j 6b\n\t"					\
+		      "4: vstrcfs %%v22,%%v17,%%v20,%%v21\n\t"		\
+		      "    lghi %[R_TMP],4\n\t"				\
+		      "    j 6b\n\t"					\
+		      "5: vstrcfs %%v22,%%v18,%%v20,%%v21\n\t"		\
+		      "    lghi %[R_TMP],8\n\t"				\
+		      "    j 6b\n\t"					\
+		      /* Found a value > 0x7f.  */			\
+		      "13: ahi %[R_TMP],4\n\t"				\
+		      "12: ahi %[R_TMP],4\n\t"				\
+		      "11: ahi %[R_TMP],4\n\t"				\
+		      "10: vlgvb %[R_I],%%v22,7\n\t"			\
+		      "14: srlg %[R_I],%[R_I],2\n\t"			\
+		      "    agr %[R_I],%[R_TMP]\n\t"			\
+		      "    je 20f\n\t"					\
+		      /* Store characters before invalid one...  */	\
+		      "15: aghi %[R_I],-1\n\t"				\
+		      "    vstl %%v23,%[R_I],0(%[R_OUT])\n\t"		\
+		      /* ... and update pointers.  */			\
+		      "    la %[R_OUT],1(%[R_I],%[R_OUT])\n\t"		\
+		      "    sllg %[R_I],%[R_I],2\n\t"			\
+		      "    la %[R_IN],4(%[R_I],%[R_IN])\n\t"		\
+		      "20:\n\t"						\
+		      ".machine pop"					\
+		      : /* outputs */ [R_OUT] "+a" (outptr)		\
+			, [R_IN] "+a" (inptr)				\
+			, [R_LEN] "+d" (len)				\
+			, [R_LI] "=d" (loop_count)			\
+			, [R_I] "=a" (tmp2)				\
+			, [R_TMP] "=d" (tmp)				\
+		      : /* inputs */					\
+		      : /* clobber list*/ "memory", "cc"		\
+			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+			ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23")	\
+			ASM_CLOBBER_VR ("v24")				\
+		      );						\
+    if (len > 0)							\
+      {									\
+	/* Found an invalid character > 0x7f at next character.  */	\
+	BODY_ORIG_ERROR							\
+      }									\
+  }
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+# include <iconv/skeleton.c>
+# undef BODY_ORIG
+# undef BODY_ORIG_ERROR
+ICONV_VX_IFUNC (__gconv_transform_internal_ascii)
+
+
+/* Convert from internal UCS4 to UCS4 little endian form.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	4
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (internal_ucs4le_loop)
+# define TO_LOOP		ICONV_VX_NAME (internal_ucs4le_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ucs4le)
+# define ONE_DIRECTION		0
+
+static inline int
+__attribute ((always_inline))
+ICONV_VX_NAME (internal_ucs4le_loop) (struct __gconv_step *step,
+				      struct __gconv_step_data *step_data,
+				      const unsigned char **inptrp,
+				      const unsigned char *inend,
+				      unsigned char **outptrp,
+				      unsigned char *outend,
+				      size_t *irreversible)
+{
+  const unsigned char *inptr = *inptrp;
+  unsigned char *outptr = *outptrp;
+  int result;
+  size_t len = MIN (inend - inptr, outend - outptr) / 4;
+  size_t loop_count;
+  __asm__ volatile (".machine push\n\t"
+		    ".machine \"z13\"\n\t"
+		    ".machinemode \"zarch_nohighgprs\"\n\t"
+		    CONVERT_32BIT_SIZE_T ([R_LEN])
+		    "    bras %[R_LI],1f\n\t"
+		    /* Vector permute mask:  */
+		    "    .long 0x03020100,0x07060504,0x0B0A0908,0x0F0E0D0C\n\t"
+		    "1:  vl %%v20,0(%[R_LI])\n\t"
+		    /* Process 64byte (16char) blocks.  */
+		    "    srlg %[R_LI],%[R_LEN],4\n\t"
+		    "    clgije %[R_LI],0,10f\n\t"
+		    "0:  vlm %%v16,%%v19,0(%[R_IN])\n\t"
+		    "    vperm %%v16,%%v16,%%v16,%%v20\n\t"
+		    "    vperm %%v17,%%v17,%%v17,%%v20\n\t"
+		    "    vperm %%v18,%%v18,%%v18,%%v20\n\t"
+		    "    vperm %%v19,%%v19,%%v19,%%v20\n\t"
+		    "    vstm %%v16,%%v19,0(%[R_OUT])\n\t"
+		    "    la %[R_IN],64(%[R_IN])\n\t"
+		    "    la %[R_OUT],64(%[R_OUT])\n\t"
+		    "    brctg %[R_LI],0b\n\t"
+		    "    llgfr %[R_LEN],%[R_LEN]\n\t"
+		    "    nilf %[R_LEN],15\n\t"
+		    /* Process 16byte (4char) blocks.  */
+		    "10: srlg %[R_LI],%[R_LEN],2\n\t"
+		    "    clgije %[R_LI],0,20f\n\t"
+		    "11: vl %%v16,0(%[R_IN])\n\t"
+		    "    vperm %%v16,%%v16,%%v16,%%v20\n\t"
+		    "    vst %%v16,0(%[R_OUT])\n\t"
+		    "    la %[R_IN],16(%[R_IN])\n\t"
+		    "    la %[R_OUT],16(%[R_OUT])\n\t"
+		    "    brctg %[R_LI],11b\n\t"
+		    "    nill %[R_LEN],3\n\t"
+		    /* Process <16bytes.  */
+		    "20: sll %[R_LEN],2\n\t"
+		    "    ahi %[R_LEN],-1\n\t"
+		    "    jl 30f\n\t"
+		    "    vll %%v16,%[R_LEN],0(%[R_IN])\n\t"
+		    "    vperm %%v16,%%v16,%%v16,%%v20\n\t"
+		    "    vstl %%v16,%[R_LEN],0(%[R_OUT])\n\t"
+		    "    la %[R_IN],1(%[R_LEN],%[R_IN])\n\t"
+		    "    la %[R_OUT],1(%[R_LEN],%[R_OUT])\n\t"
+		    "30: \n\t"
+		    ".machine pop"
+		    : /* outputs */ [R_OUT] "+a" (outptr)
+		      , [R_IN] "+a" (inptr)
+		      , [R_LI] "=a" (loop_count)
+		      , [R_LEN] "+a" (len)
+		    : /* inputs */
+		    : /* clobber list*/ "memory", "cc"
+		      ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")
+		      ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")
+		      ASM_CLOBBER_VR ("v20")
+		    );
+  *inptrp = inptr;
+  *outptrp = outptr;
+
+  /* Determine the status.  */
+  if (*inptrp == inend)
+    result = __GCONV_EMPTY_INPUT;
+  else if (*outptrp + 4 > outend)
+    result = __GCONV_FULL_OUTPUT;
+  else
+    result = __GCONV_INCOMPLETE_INPUT;
+
+  return result;
+}
+
+ICONV_VX_SINGLE (internal_ucs4le_loop)
+# include <iconv/skeleton.c>
+ICONV_VX_IFUNC (__gconv_transform_internal_ucs4le)
+
+
+/* Transform from UCS4 to the internal, UCS4-like format.  Unlike
+   for the other direction we have to check for correct values here.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	4
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ucs4_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ucs4_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs4_internal)
+# define ONE_DIRECTION		0
+
+
+static inline int
+__attribute ((always_inline))
+ICONV_VX_NAME (ucs4_internal_loop) (struct __gconv_step *step,
+				    struct __gconv_step_data *step_data,
+				    const unsigned char **inptrp,
+				    const unsigned char *inend,
+				    unsigned char **outptrp,
+				    unsigned char *outend,
+				    size_t *irreversible)
+{
+  int flags = step_data->__flags;
+  const unsigned char *inptr = *inptrp;
+  unsigned char *outptr = *outptrp;
+  int result;
+  size_t len, loop_count;
+  do
+    {
+      len = MIN (inend - inptr, outend - outptr) / 4;
+      __asm__ volatile (".machine push\n\t"
+			".machine \"z13\"\n\t"
+			".machinemode \"zarch_nohighgprs\"\n\t"
+			CONVERT_32BIT_SIZE_T ([R_LEN])
+			/* Setup to check for ch > 0x7fffffff.  */
+			"    larl %[R_LI],9f\n\t"
+			"    vlm %%v20,%%v21,0(%[R_LI])\n\t"
+			"    srlg %[R_LI],%[R_LEN],2\n\t"
+			"    clgije %[R_LI],0,1f\n\t"
+			/* Process 16byte (4char) blocks.  */
+			"0:  vl %%v16,0(%[R_IN])\n\t"
+			"    vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"
+			"    jno 10f\n\t"
+			"    vst %%v16,0(%[R_OUT])\n\t"
+			"    la %[R_IN],16(%[R_IN])\n\t"
+			"    la %[R_OUT],16(%[R_OUT])\n\t"
+			"    brctg %[R_LI],0b\n\t"
+			"    llgfr %[R_LEN],%[R_LEN]\n\t"
+			"    nilf %[R_LEN],3\n\t"
+			/* Process <16bytes.  */
+			"1:  sll %[R_LEN],2\n\t"
+			"    ahik %[R_LI],%[R_LEN],-1\n\t"
+			"    jl 20f\n\t" /* No further bytes available.  */
+			"    vll %%v16,%[R_LI],0(%[R_IN])\n\t"
+			"    vstrcfs %%v22,%%v16,%%v20,%%v21\n\t"
+			"    vlgvb %[R_LI],%%v22,7\n\t"
+			"    clr %[R_LI],%[R_LEN]\n\t"
+			"    locgrhe %[R_LI],%[R_LEN]\n\t"
+			"    locghihe %[R_LEN],0\n\t"
+			"    j 11f\n\t"
+			/* v20: Vector string range compare values.  */
+			"9:  .long 0x7fffffff,0x0,0x0,0x0\n\t"
+			/* v21: Vector string range compare control-bits.
+			   element 0: >; element 1: =<> (always true)  */
+			"    .long 0x20000000,0xE0000000,0x0,0x0\n\t"
+			/* Found a value > 0x7fffffff.  */
+			"10: vlgvb %[R_LI],%%v22,7\n\t"
+			/* Store characters before invalid one.  */
+			"11: aghi %[R_LI],-1\n\t"
+			"    jl 20f\n\t"
+			"    vstl %%v16,%[R_LI],0(%[R_OUT])\n\t"
+			"    la %[R_IN],1(%[R_LI],%[R_IN])\n\t"
+			"    la %[R_OUT],1(%[R_LI],%[R_OUT])\n\t"
+			"20:\n\t"
+			".machine pop"
+			: /* outputs */ [R_OUT] "+a" (outptr)
+			  , [R_IN] "+a" (inptr)
+			  , [R_LI] "=a" (loop_count)
+			  , [R_LEN] "+d" (len)
+			: /* inputs */
+			: /* clobber list*/ "memory", "cc"
+			  ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v20")
+			  ASM_CLOBBER_VR ("v21") ASM_CLOBBER_VR ("v22")
+			);
+      if (len > 0)
+	{
+	  /* The value is too large.  We don't try transliteration here since
+	     this is not an error because of the lack of possibilities to
+	     represent the result.  This is a genuine bug in the input since
+	     UCS4 does not allow such values.  */
+	  if (irreversible == NULL)
+	    /* We are transliterating, don't try to correct anything.  */
+	    return __GCONV_ILLEGAL_INPUT;
+
+	  if (flags & __GCONV_IGNORE_ERRORS)
+	    {
+	      /* Just ignore this character.  */
+	      ++*irreversible;
+	      inptr += 4;
+	      continue;
+	    }
+
+	  *inptrp = inptr;
+	  *outptrp = outptr;
+	  return __GCONV_ILLEGAL_INPUT;
+	}
+    }
+  while (len > 0);
+
+  *inptrp = inptr;
+  *outptrp = outptr;
+
+  /* Determine the status.  */
+  if (*inptrp == inend)
+    result = __GCONV_EMPTY_INPUT;
+  else if (*outptrp + 4 > outend)
+    result = __GCONV_FULL_OUTPUT;
+  else
+    result = __GCONV_INCOMPLETE_INPUT;
+
+  return result;
+}
+
+ICONV_VX_SINGLE (ucs4_internal_loop)
+# include <iconv/skeleton.c>
+ICONV_VX_IFUNC (__gconv_transform_ucs4_internal)
+
+
+/* Transform from UCS4-LE to the internal encoding.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	4
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ucs4le_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ucs4le_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs4le_internal)
+# define ONE_DIRECTION		0
+
+static inline int
+__attribute ((always_inline))
+ICONV_VX_NAME (ucs4le_internal_loop) (struct __gconv_step *step,
+				      struct __gconv_step_data *step_data,
+				      const unsigned char **inptrp,
+				      const unsigned char *inend,
+				      unsigned char **outptrp,
+				      unsigned char *outend,
+				      size_t *irreversible)
+{
+  int flags = step_data->__flags;
+  const unsigned char *inptr = *inptrp;
+  unsigned char *outptr = *outptrp;
+  int result;
+  size_t len, loop_count;
+  do
+    {
+      len = MIN (inend - inptr, outend - outptr) / 4;
+      __asm__ volatile (".machine push\n\t"
+			".machine \"z13\"\n\t"
+			".machinemode \"zarch_nohighgprs\"\n\t"
+			CONVERT_32BIT_SIZE_T ([R_LEN])
+			/* Setup to check for ch > 0x7fffffff.  */
+			"    larl %[R_LI],9f\n\t"
+			"    vlm %%v20,%%v22,0(%[R_LI])\n\t"
+			"    srlg %[R_LI],%[R_LEN],2\n\t"
+			"    clgije %[R_LI],0,1f\n\t"
+			/* Process 16byte (4char) blocks.  */
+			"0:  vl %%v16,0(%[R_IN])\n\t"
+			"    vperm %%v16,%%v16,%%v16,%%v22\n\t"
+			"    vstrcfs %%v23,%%v16,%%v20,%%v21\n\t"
+			"    jno 10f\n\t"
+			"    vst %%v16,0(%[R_OUT])\n\t"
+			"    la %[R_IN],16(%[R_IN])\n\t"
+			"    la %[R_OUT],16(%[R_OUT])\n\t"
+			"    brctg %[R_LI],0b\n\t"
+			"    llgfr %[R_LEN],%[R_LEN]\n\t"
+			"    nilf %[R_LEN],3\n\t"
+			/* Process <16bytes.  */
+			"1:  sll %[R_LEN],2\n\t"
+			"    ahik %[R_LI],%[R_LEN],-1\n\t"
+			"    jl 20f\n\t" /* No further bytes available.  */
+			"    vll %%v16,%[R_LI],0(%[R_IN])\n\t"
+			"    vperm %%v16,%%v16,%%v16,%%v22\n\t"
+			"    vstrcfs %%v23,%%v16,%%v20,%%v21\n\t"
+			"    vlgvb %[R_LI],%%v23,7\n\t"
+			"    clr %[R_LI],%[R_LEN]\n\t"
+			"    locgrhe %[R_LI],%[R_LEN]\n\t"
+			"    locghihe %[R_LEN],0\n\t"
+			"    j 11f\n\t"
+			/* v20: Vector string range compare values.  */
+			"9: .long 0x7fffffff,0x0,0x0,0x0\n\t"
+			/* v21: Vector string range compare control-bits.
+			   element 0: >; element 1: =<> (always true)  */
+			"    .long 0x20000000,0xE0000000,0x0,0x0\n\t"
+			/* v22: Vector permute mask.  */
+			"    .long 0x03020100,0x07060504,0x0B0A0908,0x0F0E0D0C\n\t"
+			/* Found a value > 0x7fffffff.  */
+			"10: vlgvb %[R_LI],%%v23,7\n\t"
+			/* Store characters before invalid one.  */
+			"11: aghi %[R_LI],-1\n\t"
+			"    jl 20f\n\t"
+			"    vstl %%v16,%[R_LI],0(%[R_OUT])\n\t"
+			"    la %[R_IN],1(%[R_LI],%[R_IN])\n\t"
+			"    la %[R_OUT],1(%[R_LI],%[R_OUT])\n\t"
+			"20:\n\t"
+			".machine pop"
+			: /* outputs */ [R_OUT] "+a" (outptr)
+			  , [R_IN] "+a" (inptr)
+			  , [R_LI] "=a" (loop_count)
+			  , [R_LEN] "+d" (len)
+			: /* inputs */
+			: /* clobber list*/ "memory", "cc"
+			  ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v20")
+			  ASM_CLOBBER_VR ("v21") ASM_CLOBBER_VR ("v22")
+			  ASM_CLOBBER_VR ("v23")
+			);
+      if (len > 0)
+	{
+	  /* The value is too large.  We don't try transliteration here since
+	     this is not an error because of the lack of possibilities to
+	     represent the result.  This is a genuine bug in the input since
+	     UCS4 does not allow such values.  */
+	  if (irreversible == NULL)
+	    /* We are transliterating, don't try to correct anything.  */
+	    return __GCONV_ILLEGAL_INPUT;
+
+	  if (flags & __GCONV_IGNORE_ERRORS)
+	    {
+	      /* Just ignore this character.  */
+	      ++*irreversible;
+	      inptr += 4;
+	      continue;
+	    }
+
+	  *inptrp = inptr;
+	  *outptrp = outptr;
+	  return __GCONV_ILLEGAL_INPUT;
+	}
+    }
+  while (len > 0);
+
+  *inptrp = inptr;
+  *outptrp = outptr;
+
+  /* Determine the status.  */
+  if (*inptrp == inend)
+    result = __GCONV_EMPTY_INPUT;
+  else if (*inptrp + 4 > inend)
+    result = __GCONV_INCOMPLETE_INPUT;
+  else
+    {
+      assert (*outptrp + 4 > outend);
+      result = __GCONV_FULL_OUTPUT;
+    }
+
+  return result;
+}
+ICONV_VX_SINGLE (ucs4le_internal_loop)
+# include <iconv/skeleton.c>
+ICONV_VX_IFUNC (__gconv_transform_ucs4le_internal)
+
+/* Convert from UCS2 to the internal (UCS4-like) format.  */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	2
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ucs2_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ucs2_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs2_internal)
+# define ONE_DIRECTION		1
+
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_ORIG_ERROR						\
+  /* Surrogate characters in UCS-2 input are not valid.  Reject		\
+     them.  (Catching this here is not security relevant.)  */		\
+  STANDARD_FROM_LOOP_ERR_HANDLER (2);
+# define BODY_ORIG							\
+  {									\
+    uint16_t u1 = get16 (inptr);					\
+									\
+    if (__glibc_unlikely (u1 >= 0xd800 && u1 < 0xe000))			\
+      {									\
+	BODY_ORIG_ERROR							\
+      }									\
+									\
+    *((uint32_t *) outptr) = u1;					\
+    outptr += sizeof (uint32_t);					\
+    inptr += 2;								\
+  }
+# define BODY								\
+  {									\
+    size_t len, tmp, tmp2;						\
+    len = MIN ((inend - inptr) / 2, (outend - outptr) / 4);		\
+    __asm__ volatile (".machine push\n\t"				\
+		      ".machine \"z13\"\n\t"				\
+		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
+		      /* Setup to check for ch >= 0xd800 && ch < 0xe000.  */ \
+		      "    larl %[R_TMP],9f\n\t"			\
+		      "    vlm %%v20,%%v21,0(%[R_TMP])\n\t"		\
+		      "    srlg %[R_TMP],%[R_LEN],3\n\t"		\
+		      "    clgije %[R_TMP],0,1f\n\t"			\
+		      /* Process 16byte (8char) blocks.  */		\
+		      "0:  vl %%v16,0(%[R_IN])\n\t"			\
+		      "    vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
+		      /* Enlarge UCS2 to UCS4.  */			\
+		      "    vuplhh %%v17,%%v16\n\t"			\
+		      "    vupllh %%v18,%%v16\n\t"			\
+		      "    jno 10f\n\t"					\
+		      /* Store 32bytes to buf_out.  */			\
+		      "    vstm %%v17,%%v18,0(%[R_OUT])\n\t"		\
+		      "    la %[R_IN],16(%[R_IN])\n\t"			\
+		      "    la %[R_OUT],32(%[R_OUT])\n\t"		\
+		      "    brctg %[R_TMP],0b\n\t"			\
+		      "    llgfr %[R_LEN],%[R_LEN]\n\t"			\
+		      "    nilf %[R_LEN],7\n\t"				\
+		      /* Process <16bytes.  */				\
+		      "1:  sll %[R_LEN],1\n\t"				\
+		      "    ahik %[R_TMP],%[R_LEN],-1\n\t"		\
+		      "    jl 20f\n\t" /* No further bytes available.  */ \
+		      "    vll %%v16,%[R_TMP],0(%[R_IN])\n\t"		\
+		      "    vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
+		      /* Enlarge UCS2 to UCS4.  */			\
+		      "    vuplhh %%v17,%%v16\n\t"			\
+		      "    vupllh %%v18,%%v16\n\t"			\
+		      "    vlgvb %[R_TMP],%%v19,7\n\t"			\
+		      "    clr %[R_TMP],%[R_LEN]\n\t"			\
+		      "    locgrhe %[R_TMP],%[R_LEN]\n\t"		\
+		      "    locghihe %[R_LEN],0\n\t"			\
+		      "    j 11f\n\t"					\
+		      /* v20: Vector string range compare values.  */	\
+		      "9:  .short 0xd800,0xe000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		      /* v21: Vector string range compare control-bits.	\
+			 element 0: >=; element 1: <  */		\
+		      "    .short 0xa000,0x4000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		      /* Found an element: ch >= 0xd800 && ch < 0xe000  */ \
+		      "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		      "11: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		      "    sll %[R_TMP],1\n\t"				\
+		      "    lgr %[R_TMP2],%[R_TMP]\n\t"			\
+		      "    ahi %[R_TMP],-1\n\t"				\
+		      "    jl 20f\n\t"					\
+		      "    vstl %%v17,%[R_TMP],0(%[R_OUT])\n\t"		\
+		      "    ahi %[R_TMP],-16\n\t"			\
+		      "    jl 19f\n\t"					\
+		      "    vstl %%v18,%[R_TMP],16(%[R_OUT])\n\t"	\
+		      "19: la %[R_OUT],0(%[R_TMP2],%[R_OUT])\n\t"	\
+		      "20: \n\t"					\
+		      ".machine pop"					\
+		      : /* outputs */ [R_OUT] "+a" (outptr)		\
+			, [R_IN] "+a" (inptr)				\
+			, [R_TMP] "=a" (tmp)				\
+			, [R_TMP2] "=a" (tmp2)				\
+			, [R_LEN] "+d" (len)				\
+		      : /* inputs */					\
+		      : /* clobber list*/ "memory", "cc"		\
+			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+		      );						\
+    if (len > 0)							\
+      {									\
+	/* Found an invalid character at next input-char.  */		\
+	BODY_ORIG_ERROR							\
+      }									\
+  }
+
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+# include <iconv/skeleton.c>
+# undef BODY_ORIG
+# undef BODY_ORIG_ERROR
+ICONV_VX_IFUNC (__gconv_transform_ucs2_internal)
+
+/* Convert from UCS2 in other endianness to the internal (UCS4-like) format. */
+# define DEFINE_INIT		0
+# define DEFINE_FINI		0
+# define MIN_NEEDED_FROM	2
+# define MIN_NEEDED_TO		4
+# define FROM_DIRECTION		1
+# define FROM_LOOP		ICONV_VX_NAME (ucs2reverse_internal_loop)
+# define TO_LOOP		ICONV_VX_NAME (ucs2reverse_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_ucs2reverse_internal)
+# define ONE_DIRECTION		1
+
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_ORIG_ERROR						\
+  /* Surrogate characters in UCS-2 input are not valid.  Reject		\
+     them.  (Catching this here is not security relevant.)  */		\
+  if (! ignore_errors_p ())						\
+    {									\
+      result = __GCONV_ILLEGAL_INPUT;					\
+      break;								\
+    }									\
+  inptr += 2;								\
+  ++*irreversible;							\
+  continue;
+
+# define BODY_ORIG \
+  {									\
+    uint16_t u1 = bswap_16 (get16 (inptr));				\
+									\
+    if (__glibc_unlikely (u1 >= 0xd800 && u1 < 0xe000))			\
+      {									\
+	BODY_ORIG_ERROR							\
+      }									\
+									\
+    *((uint32_t *) outptr) = u1;					\
+    outptr += sizeof (uint32_t);					\
+    inptr += 2;								\
+  }
+# define BODY								\
+  {									\
+    size_t len, tmp, tmp2;						\
+    len = MIN ((inend - inptr) / 2, (outend - outptr) / 4);		\
+    __asm__ volatile (".machine push\n\t"				\
+		      ".machine \"z13\"\n\t"				\
+		      ".machinemode \"zarch_nohighgprs\"\n\t"		\
+		      CONVERT_32BIT_SIZE_T ([R_LEN])			\
+		      /* Setup to check for ch >= 0xd800 && ch < 0xe000.  */ \
+		      "    larl %[R_TMP],9f\n\t"			\
+		      "    vlm %%v20,%%v22,0(%[R_TMP])\n\t"		\
+		      "    srlg %[R_TMP],%[R_LEN],3\n\t"		\
+		      "    clgije %[R_TMP],0,1f\n\t"			\
+		      /* Process 16byte (8char) blocks.  */		\
+		      "0:  vl %%v16,0(%[R_IN])\n\t"			\
+		      "    vperm %%v16,%%v16,%%v16,%%v22\n\t"		\
+		      "    vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
+		      /* Enlarge UCS2 to UCS4.  */			\
+		      "    vuplhh %%v17,%%v16\n\t"			\
+		      "    vupllh %%v18,%%v16\n\t"			\
+		      "    jno 10f\n\t"					\
+		      /* Store 32bytes to buf_out.  */			\
+		      "    vstm %%v17,%%v18,0(%[R_OUT])\n\t"		\
+		      "    la %[R_IN],16(%[R_IN])\n\t"			\
+		      "    la %[R_OUT],32(%[R_OUT])\n\t"		\
+		      "    brctg %[R_TMP],0b\n\t"			\
+		      "    llgfr %[R_LEN],%[R_LEN]\n\t"			\
+		      "    nilf %[R_LEN],7\n\t"				\
+		      /* Process <16bytes.  */				\
+		      "1:  sll %[R_LEN],1\n\t"				\
+		      "    ahik %[R_TMP],%[R_LEN],-1\n\t"		\
+		      "    jl 20f\n\t" /* No further bytes available.  */ \
+		      "    vll %%v16,%[R_TMP],0(%[R_IN])\n\t"		\
+		      "    vperm %%v16,%%v16,%%v16,%%v22\n\t"		\
+		      "    vstrchs %%v19,%%v16,%%v20,%%v21\n\t"		\
+		      /* Enlarge UCS2 to UCS4.  */			\
+		      "    vuplhh %%v17,%%v16\n\t"			\
+		      "    vupllh %%v18,%%v16\n\t"			\
+		      "    vlgvb %[R_TMP],%%v19,7\n\t"			\
+		      "    clr %[R_TMP],%[R_LEN]\n\t"			\
+		      "    locgrhe %[R_TMP],%[R_LEN]\n\t"		\
+		      "    locghihe %[R_LEN],0\n\t"			\
+		      "    j 11f\n\t"					\
+		      /* v20: Vector string range compare values.  */	\
+		      "9:  .short 0xd800,0xe000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		      /* v21: Vector string range compare control-bits.	\
+			 element 0: >=; element 1: <  */		\
+		      "    .short 0xa000,0x4000,0x0,0x0,0x0,0x0,0x0,0x0\n\t" \
+		      /* v22: Vector permute mask.  */			\
+		      "    .short 0x0100,0x0302,0x0504,0x0706\n\t"	\
+		      "    .short 0x0908,0x0b0a,0x0d0c,0x0f0e\n\t"	\
+		      /* Found an element: ch >= 0xd800 && ch < 0xe000  */ \
+		      "10: vlgvb %[R_TMP],%%v19,7\n\t"			\
+		      "11: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t"		\
+		      "    sll %[R_TMP],1\n\t"				\
+		      "    lgr %[R_TMP2],%[R_TMP]\n\t"			\
+		      "    ahi %[R_TMP],-1\n\t"				\
+		      "    jl 20f\n\t"					\
+		      "    vstl %%v17,%[R_TMP],0(%[R_OUT])\n\t"		\
+		      "    ahi %[R_TMP],-16\n\t"			\
+		      "    jl 19f\n\t"					\
+		      "    vstl %%v18,%[R_TMP],16(%[R_OUT])\n\t"	\
+		      "19: la %[R_OUT],0(%[R_TMP2],%[R_OUT])\n\t"	\
+		      "20: \n\t"					\
+		      ".machine pop"					\
+		      : /* outputs */ [R_OUT] "+a" (outptr)		\
+			, [R_IN] "+a" (inptr)				\
+			, [R_TMP] "=a" (tmp)				\
+			, [R_TMP2] "=a" (tmp2)				\
+			, [R_LEN] "+d" (len)				\
+		      : /* inputs */					\
+		      : /* clobber list*/ "memory", "cc"		\
+			ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17")	\
+			ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19")	\
+			ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21")	\
+			ASM_CLOBBER_VR ("v22")				\
+		      );						\
+    if (len > 0)							\
+      {									\
+	/* Found an invalid character at next input-char.  */		\
+	BODY_ORIG_ERROR							\
+      }									\
+  }
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+# include <iconv/skeleton.c>
+# undef BODY_ORIG
+# undef BODY_ORIG_ERROR
+ICONV_VX_IFUNC (__gconv_transform_ucs2reverse_internal)
+
+/* Convert from the internal (UCS4-like) format to UCS2.  */
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		4
+#define MIN_NEEDED_TO		2
+#define FROM_DIRECTION		1
+#define FROM_LOOP		ICONV_VX_NAME (internal_ucs2_loop)
+#define TO_LOOP			ICONV_VX_NAME (internal_ucs2_loop) /* This is not used.  */
+#define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ucs2)
+#define ONE_DIRECTION		1
+
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			FROM_LOOP
+#define BODY_ORIG							\
+  {									\
+    uint32_t val = *((const uint32_t *) inptr);				\
+									\
+    if (__glibc_unlikely (val >= 0x10000))				\
+      {									\
+	UNICODE_TAG_HANDLER (val, 4);					\
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
+      }									\
+    else if (__glibc_unlikely (val >= 0xd800 && val < 0xe000))		\
+      {									\
+	/* Surrogate characters in UCS-4 input are not valid.		\
+	   We must catch this, because the UCS-2 output might be	\
+	   interpreted as UTF-16 by other programs.  If we let		\
+	   surrogates pass through, attackers could exploit this	\
+	   security hole by synthesizing any desired plane 1-16		\
+	   character.  */						\
+	result = __GCONV_ILLEGAL_INPUT;					\
+	if (! ignore_errors_p ())					\
+	  break;							\
+	inptr += 4;							\
+	++*irreversible;						\
+	continue;							\
+      }									\
+    else								\
+      {									\
+	put16 (outptr, val);						\
+	outptr += sizeof (uint16_t);					\
+	inptr += 4;							\
+      }									\
+  }
+# define BODY								\
+  {									\
+    if (__builtin_expect (inend - inptr < 32, 1)			\
+	|| outend - outptr < 16)					\
+      /* Convert remaining bytes with C code.  */			\
+      BODY_ORIG								\
+    else								\
+      {									\
+	/* Convert in 32 byte blocks.  */				\
+	size_t loop_count = (inend - inptr) / 32;			\
+	size_t tmp, tmp2;						\
+	if (loop_count > (outend - outptr) / 16)			\
+	  loop_count = (outend - outptr) / 16;				\
+	__asm__ volatile (".machine push\n\t"				\
+			  ".machine \"z13\"\n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  CONVERT_32BIT_SIZE_T ([R_LI])			\
+			  "    larl %[R_I],3f\n\t"			\
+			  "    vlm %%v20,%%v23,0(%[R_I])\n\t"		\
+			  "0:  \n\t"					\
+			  "    vlm %%v16,%%v17,0(%[R_IN])\n\t"		\
+			  /* Shorten UCS4 to UCS2.  */			\
+			  "    vpkf %%v18,%%v16,%%v17\n\t"		\
+			  "    vstrcfs %%v19,%%v16,%%v20,%%v21\n\t"	\
+			  "    jno 11f\n\t"				\
+			  "1:  vstrcfs %%v19,%%v17,%%v20,%%v21\n\t"	\
+			  "    jno 10f\n\t"				\
+			  /* Store 16bytes to buf_out.  */		\
+			  "2:  vst %%v18,0(%[R_OUT])\n\t"		\
+			  "    la %[R_IN],32(%[R_IN])\n\t"		\
+			  "    la %[R_OUT],16(%[R_OUT])\n\t"		\
+			  "    brctg %[R_LI],0b\n\t"			\
+			  "    j 20f\n\t"				\
+			  /* Setup to check for ch >= 0xd800. (v20, v21)  */ \
+			  "3:  .long 0xd800,0xd800,0x0,0x0\n\t"		\
+			  "    .long 0xa0000000,0xa0000000,0x0,0x0\n\t"	\
+			  /* Setup to check for ch >= 0xe000		\
+			     && ch < 0x10000. (v22,v23)  */		\
+			  "    .long 0xe000,0x10000,0x0,0x0\n\t"	\
+			  "    .long 0xa0000000,0x40000000,0x0,0x0\n\t"	\
+			  /* v16 contains only valid chars. Check in v17: \
+			     ch >= 0xe000 && ch <= 0xffff.  */		\
+			  "10: vstrcfs %%v19,%%v17,%%v22,%%v23,8\n\t"	\
+			  "    jo 2b\n\t" /* All ch's in this range, proceed.   */ \
+			  "    lghi %[R_TMP],16\n\t"			\
+			  "    j 12f\n\t"				\
+			  /* Maybe v16 contains invalid chars.		\
+			     Check ch >= 0xe000 && ch <= 0xffff.  */	\
+			  "11: vstrcfs %%v19,%%v16,%%v22,%%v23,8\n\t"	\
+			  "    jo 1b\n\t" /* All ch's in this range, proceed.   */ \
+			  "    lghi %[R_TMP],0\n\t"			\
+			  "12: vlgvb %[R_I],%%v19,7\n\t"		\
+			  "    agr %[R_I],%[R_TMP]\n\t"			\
+			  "    la %[R_IN],0(%[R_I],%[R_IN])\n\t"	\
+			  "    srl %[R_I],1\n\t"			\
+			  "    ahi %[R_I],-1\n\t"			\
+			  "    jl 20f\n\t"				\
+			  "    vstl %%v18,%[R_I],0(%[R_OUT])\n\t"	\
+			  "    la %[R_OUT],1(%[R_I],%[R_OUT])\n\t"	\
+			  "20:\n\t"					\
+			  ".machine pop"				\
+			  : /* outputs */ [R_OUT] "+a" (outptr)		\
+			    , [R_IN] "+a" (inptr)			\
+			    , [R_LI] "+d" (loop_count)			\
+			    , [R_I] "=a" (tmp2)				\
+			    , [R_TMP] "=d" (tmp)			\
+			  : /* inputs */				\
+			  : /* clobber list*/ "memory", "cc"		\
+			    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17") \
+			    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19") \
+			    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21") \
+			    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23") \
+			  );						\
+	if (loop_count > 0)						\
+	  {								\
+	    /* Found an invalid character at next character.  */	\
+	    BODY_ORIG							\
+	  }								\
+      }									\
+  }
+#define LOOP_NEED_FLAGS
+#include <iconv/loop.c>
+#include <iconv/skeleton.c>
+# undef BODY_ORIG
+ICONV_VX_IFUNC (__gconv_transform_internal_ucs2)
+
+/* Convert from the internal (UCS4-like) format to UCS2 in other endianness. */
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		4
+#define MIN_NEEDED_TO		2
+#define FROM_DIRECTION		1
+#define FROM_LOOP		ICONV_VX_NAME (internal_ucs2reverse_loop)
+#define TO_LOOP			ICONV_VX_NAME (internal_ucs2reverse_loop)/* This is not used.*/
+#define FUNCTION_NAME		ICONV_VX_NAME (__gconv_transform_internal_ucs2reverse)
+#define ONE_DIRECTION		1
+
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			FROM_LOOP
+#define BODY_ORIG							\
+  {									\
+    uint32_t val = *((const uint32_t *) inptr);				\
+    if (__glibc_unlikely (val >= 0x10000))				\
+      {									\
+	UNICODE_TAG_HANDLER (val, 4);					\
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				\
+      }									\
+    else if (__glibc_unlikely (val >= 0xd800 && val < 0xe000))		\
+      {									\
+	/* Surrogate characters in UCS-4 input are not valid.		\
+	   We must catch this, because the UCS-2 output might be	\
+	   interpreted as UTF-16 by other programs.  If we let		\
+	   surrogates pass through, attackers could make a security	\
+	   hole exploit by synthesizing any desired plane 1-16		\
+	   character.  */						\
+	if (! ignore_errors_p ())					\
+	  {								\
+	    result = __GCONV_ILLEGAL_INPUT;				\
+	    break;							\
+	  }								\
+	inptr += 4;							\
+	++*irreversible;						\
+	continue;							\
+      }									\
+    else								\
+      {									\
+	put16 (outptr, bswap_16 (val));					\
+	outptr += sizeof (uint16_t);					\
+	inptr += 4;							\
+      }									\
+  }
+# define BODY								\
+  {									\
+    if (__builtin_expect (inend - inptr < 32, 1)			\
+	|| outend - outptr < 16)					\
+      /* Convert remaining bytes with c code.  */			\
+      BODY_ORIG								\
+    else								\
+      {									\
+	/* Convert in 32 byte blocks.  */				\
+	size_t loop_count = (inend - inptr) / 32;			\
+	size_t tmp, tmp2;						\
+	if (loop_count > (outend - outptr) / 16)			\
+	  loop_count = (outend - outptr) / 16;				\
+	__asm__ volatile (".machine push\n\t"				\
+			  ".machine \"z13\"\n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  CONVERT_32BIT_SIZE_T ([R_LI])			\
+			  "    larl %[R_I],3f\n\t"			\
+			  "    vlm %%v20,%%v24,0(%[R_I])\n\t"		\
+			  "0:  \n\t"					\
+			  "    vlm %%v16,%%v17,0(%[R_IN])\n\t"		\
+			  /* Shorten UCS4 to UCS2 and byteswap.  */	\
+			  "    vpkf %%v18,%%v16,%%v17\n\t"		\
+			  "    vperm %%v18,%%v18,%%v18,%%v24\n\t"	\
+			  "    vstrcfs %%v19,%%v16,%%v20,%%v21\n\t"	\
+			  "    jno 11f\n\t"				\
+			  "1:  vstrcfs %%v19,%%v17,%%v20,%%v21\n\t"	\
+			  "    jno 10f\n\t"				\
+			  /* Store 16bytes to buf_out.  */		\
+			  "2: vst %%v18,0(%[R_OUT])\n\t"		\
+			  "    la %[R_IN],32(%[R_IN])\n\t"		\
+			  "    la %[R_OUT],16(%[R_OUT])\n\t"		\
+			  "    brctg %[R_LI],0b\n\t"			\
+			  "    j 20f\n\t"				\
+			  /* Setup to check for ch >= 0xd800. (v20, v21)  */ \
+			  "3: .long 0xd800,0xd800,0x0,0x0\n\t"		\
+			  "    .long 0xa0000000,0xa0000000,0x0,0x0\n\t"	\
+			  /* Setup to check for ch >= 0xe000		\
+			     && ch < 0x10000. (v22,v23)  */		\
+			  "    .long 0xe000,0x10000,0x0,0x0\n\t"	\
+			  "    .long 0xa0000000,0x40000000,0x0,0x0\n\t"	\
+			  /* Vector permute mask (v24)  */		\
+			  "    .short 0x0100,0x0302,0x0504,0x0706\n\t"	\
+			  "    .short 0x0908,0x0b0a,0x0d0c,0x0f0e\n\t"	\
+			  /* v16 contains only valid chars. Check in v17: \
+			     ch >= 0xe000 && ch <= 0xffff.  */		\
+			  "10: vstrcfs %%v19,%%v17,%%v22,%%v23,8\n\t"	\
+			  "    jo 2b\n\t" /* All ch's in this range, proceed.  */ \
+			  "    lghi %[R_TMP],16\n\t"			\
+			  "    j 12f\n\t"				\
+			  /* Maybe v16 contains invalid chars.		\
+			     Check ch >= 0xe000 && ch <= 0xffff.  */	\
+			  "11: vstrcfs %%v19,%%v16,%%v22,%%v23,8\n\t"	\
+			  "    jo 1b\n\t" /* All ch's in this range, proceed.  */ \
+			  "    lghi %[R_TMP],0\n\t"			\
+			  "12: vlgvb %[R_I],%%v19,7\n\t"		\
+			  "    agr %[R_I],%[R_TMP]\n\t"			\
+			  "    la %[R_IN],0(%[R_I],%[R_IN])\n\t"	\
+			  "    srl %[R_I],1\n\t"			\
+			  "    ahi %[R_I],-1\n\t"			\
+			  "    jl 20f\n\t"				\
+			  "    vstl %%v18,%[R_I],0(%[R_OUT])\n\t"	\
+			  "    la %[R_OUT],1(%[R_I],%[R_OUT])\n\t"	\
+			  "20:\n\t"					\
+			  ".machine pop"				\
+			  : /* outputs */ [R_OUT] "+a" (outptr)		\
+			    , [R_IN] "+a" (inptr)			\
+			    , [R_LI] "+d" (loop_count)			\
+			    , [R_I] "=a" (tmp2)				\
+			    , [R_TMP] "=d" (tmp)			\
+			  : /* inputs */				\
+			  : /* clobber list*/ "memory", "cc"		\
+			    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17") \
+			    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19") \
+			    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21") \
+			    ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23") \
+			    ASM_CLOBBER_VR ("v24")			\
+			  );						\
+	if (loop_count > 0)						\
+	  {								\
+	    /* Found an invalid character at next character.  */	\
+	    BODY_ORIG							\
+	  }								\
+      }									\
+  }
+#define LOOP_NEED_FLAGS
+#include <iconv/loop.c>
+#include <iconv/skeleton.c>
+# undef BODY_ORIG
+ICONV_VX_IFUNC (__gconv_transform_internal_ucs2reverse)
+
+
+#else
+/* Generate the internal transformations without ifunc if build environment
+   lacks vector support. Instead simply include the common version.  */
+# include <iconv/gconv_simple.c>
+#endif /* !defined HAVE_S390_VX_ASM_SUPPORT */

http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=4690dab084f854bf0013b5eaabcf90c2d5b692ff

commit 4690dab084f854bf0013b5eaabcf90c2d5b692ff
Author: Stefan Liebler <stli@linux.vnet.ibm.com>
Date:   Wed May 25 17:18:04 2016 +0200

    S390: Optimize 8bit-generic iconv modules.
    
    This patch introduces a s390 specific 8bit-generic.c file which provides an
    optimized version for z13 with translate-/vector-instructions, which will be
    chosen at runtime via ifunc.
    If the build-environment lacks vector support, then iconvdata/8bit-generic.c
    is used without any change. Otherwise iconvdata/8bit-generic.c is used to create
    conversion loop routines without vector instructions as fallback, if vector
    instructions aren't available at runtime.
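    
    The runtime selection described above can be sketched in portable C as
    follows. This is only an illustration of the decision the ifunc resolver
    makes, not the resolver itself: select_loop, loop_c, loop_vx and
    SKETCH_HWCAP_VX are hypothetical names, and the bit value is an assumed
    stand-in for HWCAP_S390_VX.

```c
#include <stddef.h>

/* Hypothetical stand-in for the vector capability bit; the real value
   comes from the s390 hwcap definitions.  */
#define SKETCH_HWCAP_VX (1UL << 11)

typedef int (*loop_fn) (void);

/* Placeholders for the two generated conversion loops.  */
int loop_c (void) { return 0; }
int loop_vx (void) { return 1; }

/* Pick the vector variant only if the charset table has at most 256
   entries and the CPU reports vector support; otherwise fall back to
   the plain C loop.  */
loop_fn
select_loop (size_t from_ucs4_entries, unsigned long hwcap)
{
  if (from_ucs4_entries <= 256 && (hwcap & SKETCH_HWCAP_VX) != 0)
    return loop_vx;
  return loop_c;
}
```

    Both conditions have to hold: a charset with a larger table keeps the C
    loop even on a machine with vector support.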
    
    The vector routines can only be used with charsets where the maximum UCS4 value
    fits in 1 byte size. Then the hardware translate-instruction is used
    to translate between up to 256 generic characters and "1 byte UCS4"
    characters at once. The vector instructions are used to convert between
    the "1 byte UCS4" and UCS4.
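    
    A scalar sketch of this two-step idea, assuming a 256 entry table such
    as to_ucs1. The function name sketch_8bit_to_ucs4 is hypothetical and
    the loops only model what the translate and vector unpack instructions
    do in bulk; they are not the code of this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert up to n bytes (at most 256 per call) of an 8bit charset to
   UCS4.  Returns the number of characters converted and stops in front
   of the first input byte that has no mapping.  */
size_t
sketch_8bit_to_ucs4 (const uint8_t *in, size_t n,
		     const uint8_t to_ucs1[256], uint32_t *out)
{
  uint8_t buf[256];
  size_t i;

  if (n > sizeof buf)
    n = sizeof buf;

  /* Step 1: translate every byte through the table (the job of the
     hardware translate instruction).  */
  for (i = 0; i < n; i++)
    buf[i] = to_ucs1[in[i]];

  /* Step 2: widen each "1 byte UCS4" value to 32 bit (the job of the
     vector unpack instructions).  A zero result for a non-zero input
     byte marks a hole in the charset.  */
  for (i = 0; i < n; i++)
    {
      if (buf[i] == 0 && in[i] != 0)
	break;
      out[i] = buf[i];
    }
  return i;
}
```

    A zero input byte is valid and maps to zero, which is why the check
    looks at both the translated value and the original byte.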
    
    The gen-8bit.sh script in sysdeps/s390/multiarch generates the conversion
    table to_ucs1. Therefore sysdeps/s390/multiarch/Makefile adds an override
    define for generate-8bit-table, which is originally defined in
    iconvdata/Makefile. This version calls both the gen-8bit.sh in the iconvdata
    folder and the s390 one.
    
    ChangeLog:
    
    	* sysdeps/s390/multiarch/8bit-generic.c: New File.
    	* sysdeps/s390/multiarch/gen-8bit.sh: New File.
    	* sysdeps/s390/multiarch/Makefile (generate-8bit-table):
    	New override define.
    	* sysdeps/s390/multiarch/iconv/skeleton.c: Likewise.

diff --git a/ChangeLog b/ChangeLog
index cf6315a..42f1b9d 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,13 @@
 2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
 
+	* sysdeps/s390/multiarch/8bit-generic.c: New File.
+	* sysdeps/s390/multiarch/gen-8bit.sh: New File.
+	* sysdeps/s390/multiarch/Makefile (generate-8bit-table):
+	New override define.
+	* sysdeps/s390/multiarch/iconv/skeleton.c: Likewise.
+
+2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
+
 	* config.h.in (HAVE_S390_VX_GCC_SUPPORT): New macro undefine.
 	* sysdeps/s390/configure.ac: Add test for S390 vector register
 	support in gcc.
diff --git a/sysdeps/s390/multiarch/8bit-generic.c b/sysdeps/s390/multiarch/8bit-generic.c
new file mode 100644
index 0000000..93565e1
--- /dev/null
+++ b/sysdeps/s390/multiarch/8bit-generic.c
@@ -0,0 +1,415 @@
+/* Generic conversion to and from 8bit charsets - S390 version.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if defined HAVE_S390_VX_ASM_SUPPORT
+
+# if defined HAVE_S390_VX_GCC_SUPPORT
+#  define ASM_CLOBBER_VR(NR) , NR
+# else
+#  define ASM_CLOBBER_VR(NR)
+# endif
+
+/* Generate the conversion loop routines without vector instructions as
+   fallback, if vector instructions aren't available at runtime.  */
+# define IGNORE_ICONV_SKELETON
+# define from_generic __from_generic_c
+# define to_generic __to_generic_c
+# include "iconvdata/8bit-generic.c"
+# undef IGNORE_ICONV_SKELETON
+# undef from_generic
+# undef to_generic
+
+/* Generate the conversion routines with vector instructions. The vector
+   routines can only be used with charsets where the maximum UCS4 value
+   fits in 1 byte size. Then the hardware translate-instruction is used
+   to translate between multiple generic characters and "1 byte UCS4"
+   characters at once. The vector instructions are used to convert between
+   the "1 byte UCS4" and UCS4.  */
+# include <unistd.h>
+# include <dl-procinfo.h>
+
+# undef FROM_LOOP
+# undef TO_LOOP
+# define FROM_LOOP		__from_generic_vx
+# define TO_LOOP		__to_generic_vx
+
+# define MIN_NEEDED_FROM	1
+# define MIN_NEEDED_TO		4
+# define ONE_DIRECTION		0
+
+/* First define the conversion function from the 8bit charset to UCS4.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+# define LOOPFCT		FROM_LOOP
+# define BODY_FROM_ORIG \
+  {									      \
+    uint32_t ch = to_ucs4[*inptr];					      \
+									      \
+    if (HAS_HOLES && __builtin_expect (ch == L'\0', 0) && *inptr != '\0')     \
+      {									      \
+	/* This is an illegal character.  */				      \
+	STANDARD_FROM_LOOP_ERR_HANDLER (1);				      \
+      }									      \
+									      \
+    put32 (outptr, ch);							      \
+    outptr += 4;							      \
+    ++inptr;								      \
+  }
+
+# define BODY								\
+  {									\
+    if (__builtin_expect (inend - inptr < 16, 1)			\
+	|| outend - outptr < 64)					\
+      /* Convert remaining bytes with c code.  */			\
+      BODY_FROM_ORIG							\
+    else								\
+       {								\
+	 /* Convert 16 ... 256 bytes at once with tr-instruction.  */	\
+	 size_t index;							\
+	 char buf[256];							\
+	 size_t loop_count = (inend - inptr) / 16;			\
+	 if (loop_count > (outend - outptr) / 64)			\
+	   loop_count = (outend - outptr) / 64;				\
+	 if (loop_count > 16)						\
+	   loop_count = 16;						\
+	 __asm__ volatile (".machine push\n\t"				\
+			   ".machine \"z13\"\n\t"			\
+			   ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			   "    sllk %[R_I],%[R_LI],4\n\t"		\
+			   "    ahi %[R_I],-1\n\t"			\
+			   /* Execute mvc and tr with correct len.  */	\
+			   "    exrl %[R_I],21f\n\t"			\
+			   "    exrl %[R_I],22f\n\t"			\
+			   /* Post-processing.  */			\
+			   "    lghi %[R_I],0\n\t"			\
+			   "    vzero %%v0\n\t"				\
+			   "0:  \n\t"					\
+			   /* Find invalid character - value is zero.  */ \
+			   "    vl %%v16,0(%[R_I],%[R_BUF])\n\t"	\
+			   "    vceqbs %%v23,%%v0,%%v16\n\t"		\
+			   "    jle 10f\n\t"				\
+			   "1:  \n\t"					\
+			   /* Enlarge to UCS4.  */			\
+			   "    vuplhb %%v17,%%v16\n\t"			\
+			   "    vupllb %%v18,%%v16\n\t"			\
+			   "    vuplhh %%v19,%%v17\n\t"			\
+			   "    vupllh %%v20,%%v17\n\t"			\
+			   "    vuplhh %%v21,%%v18\n\t"			\
+			   "    vupllh %%v22,%%v18\n\t"			\
+			   /* Store 64bytes to buf_out.  */		\
+			   "    vstm %%v19,%%v22,0(%[R_OUT])\n\t"	\
+			   "    aghi %[R_I],16\n\t"			\
+			   "    la %[R_OUT],64(%[R_OUT])\n\t"		\
+			   "    brct %[R_LI],0b\n\t"			\
+			   "    la %[R_IN],0(%[R_I],%[R_IN])\n\t"	\
+			   "    j 20f\n\t"				\
+			   "21: mvc 0(1,%[R_BUF]),0(%[R_IN])\n\t"	\
+			   "22: tr 0(1,%[R_BUF]),0(%[R_TBL])\n\t"	\
+			   /* Possibly invalid character found.  */	\
+			   "10: \n\t"					\
+			   /* Test if input was zero, too.  */		\
+			   "    vl %%v24,0(%[R_I],%[R_IN])\n\t"		\
+			   "    vceqb %%v24,%%v0,%%v24\n\t"		\
+			   /* Zeros in buf (v23) and inptr (v24) are marked \
+			      with one bits. After xor, invalid characters \
+			      are marked as one bits. Proceed, if no	\
+			      invalid characters are found.  */		\
+			   "    vx %%v24,%%v23,%%v24\n\t"		\
+			   "    vfenebs %%v24,%%v24,%%v0\n\t"		\
+			   "    jo 1b\n\t"				\
+			   /* Found an invalid translation.		\
+			      Store the preceding chars.  */		\
+			   "    la %[R_IN],0(%[R_I],%[R_IN])\n\t"	\
+			   "    vlgvb %[R_I],%%v24,7\n\t"		\
+			   "    la %[R_IN],0(%[R_I],%[R_IN])\n\t"	\
+			   "    sll %[R_I],2\n\t"			\
+			   "    ahi %[R_I],-1\n\t"			\
+			   "    jl 20f\n\t"				\
+			   "    lgr %[R_LI],%[R_I]\n\t"			\
+			   "    vuplhb %%v17,%%v16\n\t"			\
+			   "    vuplhh %%v19,%%v17\n\t"			\
+			   "    vstl %%v19,%[R_I],0(%[R_OUT])\n\t"	\
+			   "    ahi %[R_I],-16\n\t"			\
+			   "    jl 11f\n\t"				\
+			   "    vupllh %%v20,%%v17\n\t"			\
+			   "    vstl %%v20,%[R_I],16(%[R_OUT])\n\t"	\
+			   "    ahi %[R_I],-16\n\t"			\
+			   "    jl 11f\n\t"				\
+			   "    vupllb %%v18,%%v16\n\t"			\
+			   "    vuplhh %%v21,%%v18\n\t"			\
+			   "    vstl %%v21,%[R_I],32(%[R_OUT])\n\t"	\
+			   "    ahi %[R_I],-16\n\t"			\
+			   "    jl 11f\n\t"				\
+			   "    vupllh %%v22,%%v18\n\t"			\
+			   "    vstl %%v22,%[R_I],48(%[R_OUT])\n\t"	\
+			   "11: \n\t"					\
+			   "    la %[R_OUT],1(%[R_LI],%[R_OUT])\n\t"	\
+			   "20: \n\t"					\
+			   ".machine pop"				\
+			   : /* outputs */ [R_IN] "+a" (inptr)		\
+			     , [R_OUT] "+a" (outptr), [R_I] "=&a" (index) \
+			     , [R_LI] "+a" (loop_count)			\
+			   : /* inputs */ [R_BUF] "a" (buf)		\
+			     , [R_TBL] "a" (to_ucs1)			\
+			   : /* clobber list*/ "memory", "cc"		\
+			     ASM_CLOBBER_VR ("v0")  ASM_CLOBBER_VR ("v16") \
+			     ASM_CLOBBER_VR ("v17") ASM_CLOBBER_VR ("v18") \
+			     ASM_CLOBBER_VR ("v19") ASM_CLOBBER_VR ("v20") \
+			     ASM_CLOBBER_VR ("v21") ASM_CLOBBER_VR ("v22") \
+			     ASM_CLOBBER_VR ("v23") ASM_CLOBBER_VR ("v24") \
+			   );						\
+	 /* Error occurred?  */						\
+	 if (loop_count != 0)						\
+	   {								\
+	     /* Found an invalid character!  */				\
+	    STANDARD_FROM_LOOP_ERR_HANDLER (1);				\
+	  }								\
+      }									\
+    }
+
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+/* Next, define the other direction - from UCS4 to 8bit charset.  */
+# define MIN_NEEDED_INPUT	MIN_NEEDED_TO
+# define MIN_NEEDED_OUTPUT	MIN_NEEDED_FROM
+# define LOOPFCT		TO_LOOP
+# define BODY_TO_ORIG \
+  {									      \
+    uint32_t ch = get32 (inptr);					      \
+									      \
+    if (__builtin_expect (ch >= sizeof (from_ucs4) / sizeof (from_ucs4[0]), 0)\
+	|| (__builtin_expect (from_ucs4[ch], '\1') == '\0' && ch != 0))	      \
+      {									      \
+	UNICODE_TAG_HANDLER (ch, 4);					      \
+									      \
+	/* This is an illegal character.  */				      \
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				      \
+      }									      \
+									      \
+    *outptr++ = from_ucs4[ch];						      \
+    inptr += 4;								      \
+  }
+# define BODY								\
+  {									\
+    if (__builtin_expect (inend - inptr < 64, 1)			\
+	|| outend - outptr < 16)					\
+      /* Convert remaining bytes with c code.  */			\
+      BODY_TO_ORIG							\
+    else								\
+      {									\
+	/* Convert 64 ... 1024 bytes at once with tr-instruction.  */	\
+	size_t index, tmp;						\
+	char buf[256];							\
+	size_t loop_count = (inend - inptr) / 64;			\
+	uint32_t max = sizeof (from_ucs4) / sizeof (from_ucs4[0]);	\
+	if (loop_count > (outend - outptr) / 16)			\
+	  loop_count = (outend - outptr) / 16;				\
+	if (loop_count > 16)						\
+	  loop_count = 16;						\
+	size_t remaining_loop_count = loop_count;			\
+	/* Step 1: Check for ch>=max, ch == 0 and shorten to bytes.	\
+	   (ch == 0 is no error, but is handled differently)  */	\
+	__asm__ volatile (".machine push\n\t"				\
+			  ".machine \"z13\"\n\t"			\
+			  ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			  /* Setup to check for ch >= max.  */		\
+			  "    vzero %%v21\n\t"				\
+			  "    vleih %%v21,-24576,0\n\t" /* element 0:   >  */ \
+			  "    vleih %%v21,-8192,2\n\t"  /* element 1: =<>  */ \
+			  "    vlvgf %%v20,%[R_MAX],0\n\t" /* element 0: val  */ \
+			  /* Process in 64byte - 16 characters blocks.  */ \
+			  "    lghi %[R_I],0\n\t"			\
+			  "    lghi %[R_TMP],0\n\t"			\
+			  "0:  \n\t"					\
+			  "    vlm %%v16,%%v19,0(%[R_IN])\n\t"		\
+			  /* Test for ch >= max and ch == 0.  */	\
+			  "    vstrczfs %%v22,%%v16,%%v20,%%v21\n\t"	\
+			  "    jno 10f\n\t"				\
+			  "    vstrczfs %%v22,%%v17,%%v20,%%v21\n\t"	\
+			  "    jno 11f\n\t"				\
+			  "    vstrczfs %%v22,%%v18,%%v20,%%v21\n\t"	\
+			  "    jno 12f\n\t"				\
+			  "    vstrczfs %%v22,%%v19,%%v20,%%v21\n\t"	\
+			  "    jno 13f\n\t"				\
+			  /* Shorten to byte values.  */		\
+			  "    vpkf %%v16,%%v16,%%v17\n\t"		\
+			  "    vpkf %%v18,%%v18,%%v19\n\t"		\
+			  "    vpkh %%v16,%%v16,%%v18\n\t"		\
+			  /* Store 16bytes to buf.  */			\
+			  "    vst %%v16,0(%[R_I],%[R_BUF])\n\t"	\
+			  /* Loop until all blocks are processed.  */	\
+			  "    la %[R_IN],64(%[R_IN])\n\t"		\
+			  "    aghi %[R_I],16\n\t"			\
+			  "    brct %[R_LI],0b\n\t"			\
+			  "    j 20f\n\t"				\
+			  /* Found error ch >= max or ch == 0. */	\
+			  "13: aghi %[R_TMP],4\n\t"			\
+			  "12: aghi %[R_TMP],4\n\t"			\
+			  "11: aghi %[R_TMP],4\n\t"			\
+			  "10: vlgvb %[R_I],%%v22,7\n\t"		\
+			  "    srlg %[R_I],%[R_I],2\n\t"		\
+			  "    agr %[R_I],%[R_TMP]\n\t"			\
+			  "20: \n\t"					\
+			  ".machine pop"				\
+			  : /* outputs */ [R_IN] "+a" (inptr)		\
+			    , [R_I] "=&a" (index)			\
+			    , [R_TMP] "=d" (tmp)			\
+			    , [R_LI] "+d" (remaining_loop_count)	\
+			  : /* inputs */ [R_BUF] "a" (buf)		\
+			    , [R_MAX] "d" (max)				\
+			  : /* clobber list*/ "memory", "cc"		\
+			    ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17") \
+			    ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19") \
+			    ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21") \
+			    ASM_CLOBBER_VR ("v22")			\
+			  );						\
+	/* Error occurred in step 1? An error (ch >= max || ch == 0)	\
+	   occurred, if remaining_loop_count > 0. The error occurred	\
+	   at character-index (index) after already processed blocks.  */ \
+	loop_count -= remaining_loop_count;				\
+	if (loop_count > 0)						\
+	  {								\
+	    /* Step 2: Translate already processed blocks in buf and	\
+	       check for errors (from_ucs4[ch] == 0).  */		\
+	    __asm__ volatile (".machine push\n\t"			\
+			      ".machine \"z13\"\n\t"			\
+			      ".machinemode \"zarch_nohighgprs\"\n\t"	\
+			      "    sllk %[R_I],%[R_LI],4\n\t"		\
+			      "    ahi %[R_I],-1\n\t"			\
+			      /* Execute tr with correct len.  */	\
+			      "    exrl %[R_I],21f\n\t"			\
+			      /* Post-processing.  */			\
+			      "    lghi %[R_I],0\n\t"			\
+			      "0:  \n\t"				\
+			      /* Find invalid character - value == 0.  */ \
+			      "    vl %%v16,0(%[R_I],%[R_BUF])\n\t"	\
+			      "    vfenezbs %%v17,%%v16,%%v16\n\t"	\
+			      "    je 10f\n\t"				\
+			      /* Store 16bytes to buf_out.  */		\
+			      "    vst %%v16,0(%[R_I],%[R_OUT])\n\t"	\
+			      "    aghi %[R_I],16\n\t"			\
+			      "    brct %[R_LI],0b\n\t"			\
+			      "    la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"	\
+			      "    j 20f\n\t"				\
+			      "21: tr 0(1,%[R_BUF]),0(%[R_TBL])\n\t"	\
+			      /* Found an error: from_ucs4[ch] == 0.  */ \
+			      "10: la %[R_OUT],0(%[R_I],%[R_OUT])\n\t"	\
+			      "    vlgvb %[R_I],%%v17,7\n\t"		\
+			      "20: \n\t"				\
+			      ".machine pop"				\
+			      : /* outputs */ [R_OUT] "+a" (outptr)	\
+				, [R_I] "=&a" (tmp)			\
+				, [R_LI] "+d" (loop_count)		\
+			      : /* inputs */ [R_BUF] "a" (buf)		\
+				, [R_TBL] "a" (from_ucs4)		\
+			      : /* clobber list*/ "memory", "cc"	\
+				ASM_CLOBBER_VR ("v16")			\
+				ASM_CLOBBER_VR ("v17")			\
+			      );					\
+	    /* Error occurred in processed bytes of step 2?		\
+	       Thus possible error in step 1 is obsolete.*/		\
+	    if (tmp < 16)						\
+	      {								\
+		index = tmp;						\
+		inptr -= loop_count * 64;				\
+	      }								\
+	  }								\
+	/* Error occured in step 1/2?  */				\
+	if (index < 16)							\
+	  {								\
+	    /* Found an invalid character (see step 2) or zero		\
+	       (see step 1) at index! Convert the chars before index	\
+	       manually. If there is a zero at index detected by step 1, \
+	       there could be invalid characters before this zero.  */	\
+	    int i;							\
+	    uint32_t ch;						\
+	    for (i = 0; i < index; i++)					\
+	      {								\
+		ch = get32 (inptr);					\
+		if (__builtin_expect (from_ucs4[ch], '\1') == '\0')     \
+		  break;						\
+		*outptr++ = from_ucs4[ch];				\
+		inptr += 4;						\
+	      }								\
+	    if (i == index)						\
+	      {								\
+		ch = get32 (inptr);					\
+		if (ch == 0)						\
+		  {							\
+		    /* This is no error, but handled differently.  */	\
+		    *outptr++ = from_ucs4[ch];				\
+		    inptr += 4;						\
+		    continue;						\
+		  }							\
+	      }								\
+									\
+	    UNICODE_TAG_HANDLER (ch, 4);				\
+									\
+	    /* This is an illegal character.  */			\
+	    STANDARD_TO_LOOP_ERR_HANDLER (4);				\
+	  }								\
+      }									\
+  }
+
+# define LOOP_NEED_FLAGS
+# include <iconv/loop.c>
+
+
+/* Generate ifunc'ed loop function.  */
+__typeof(__from_generic_c)
+__attribute__ ((ifunc ("__from_generic_resolver")))
+__from_generic;
+
+static void *
+__from_generic_resolver (unsigned long int dl_hwcap)
+{
+  if (sizeof (from_ucs4) / sizeof (from_ucs4[0]) <= 256
+      && dl_hwcap & HWCAP_S390_VX)
+    return &__from_generic_vx;
+  else
+    return &__from_generic_c;
+}
+
+__typeof(__to_generic_c)
+__attribute__ ((ifunc ("__to_generic_resolver")))
+__to_generic;
+
+static void *
+__to_generic_resolver (unsigned long int dl_hwcap)
+{
+  if (sizeof (from_ucs4) / sizeof (from_ucs4[0]) <= 256
+      && dl_hwcap & HWCAP_S390_VX)
+    return &__to_generic_vx;
+  else
+    return &__to_generic_c;
+}
+
+strong_alias (__to_generic_c_single, __to_generic_single)
+
+# undef FROM_LOOP
+# undef TO_LOOP
+# define FROM_LOOP		__from_generic
+# define TO_LOOP		__to_generic
+# include <iconv/skeleton.c>
+
+#else
+/* Generate this module without ifunc if build environment lacks vector
+   support. Instead the common 8bit-generic.c is used.  */
+# include "iconvdata/8bit-generic.c"
+#endif /* !defined HAVE_S390_VX_ASM_SUPPORT */
diff --git a/sysdeps/s390/multiarch/Makefile b/sysdeps/s390/multiarch/Makefile
index 89324ca..6073bbb 100644
--- a/sysdeps/s390/multiarch/Makefile
+++ b/sysdeps/s390/multiarch/Makefile
@@ -43,3 +43,13 @@ sysdep_routines += wcslen wcslen-vx wcslen-c \
 		   wmemset wmemset-vx wmemset-c \
 		   wmemcmp wmemcmp-vx wmemcmp-c
 endif
+
+ifeq ($(subdir),iconvdata)
+override define generate-8bit-table
+$(make-target-directory)
+LC_ALL=C $(SHELL) ./gen-8bit.sh $< > $(@:stmp=T)
+LC_ALL=C $(SHELL) ../sysdeps/s390/multiarch/gen-8bit.sh $< >> $(@:stmp=T)
+$(move-if-change) $(@:stmp=T) $(@:stmp=h)
+touch $@
+endef
+endif
diff --git a/sysdeps/s390/multiarch/gen-8bit.sh b/sysdeps/s390/multiarch/gen-8bit.sh
new file mode 100644
index 0000000..6f88c4b
--- /dev/null
+++ b/sysdeps/s390/multiarch/gen-8bit.sh
@@ -0,0 +1,6 @@
+#!/bin/sh
+echo "static const uint8_t to_ucs1[256] = {"
+sed -ne '/^[^[:space:]]*[[:space:]]*.x00/d;/^END/q' \
+    -e 's/^<U00\(..\)>[[:space:]]*.x\(..\).*/  [0x\2] = 0x\1,/p' \
+    "$@" | sort -u
+echo "};"
diff --git a/sysdeps/s390/multiarch/iconv/skeleton.c b/sysdeps/s390/multiarch/iconv/skeleton.c
new file mode 100644
index 0000000..3a90031
--- /dev/null
+++ b/sysdeps/s390/multiarch/iconv/skeleton.c
@@ -0,0 +1,21 @@
+/* Skeleton for a conversion module - S390 version.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef IGNORE_ICONV_SKELETON
+# include_next <iconv/skeleton.c>
+#endif

http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=9b7f05599a92dead97d6683bc838a57bc63ac52b

commit 9b7f05599a92dead97d6683bc838a57bc63ac52b
Author: Stefan Liebler <stli@linux.vnet.ibm.com>
Date:   Wed May 25 17:18:04 2016 +0200

    S390: Configure check for vector support in gcc.
    
    The S390 specific test checks if the gcc has support for vector registers
    by compiling an inline assembly which clobbers vector registers.
    On success the macro HAVE_S390_VX_GCC_SUPPORT is defined.
    This macro can be used to determine if e.g. clobbering vector registers
    is allowed or not.
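    
    A minimal sketch of how such a macro can guard a clobber list, modelled
    on the ASM_CLOBBER_VR helper used by the s390 iconv code. The function
    sketch_barrier is hypothetical and its asm body is intentionally empty,
    so without HAVE_S390_VX_GCC_SUPPORT it builds with any gcc.

```c
#if defined HAVE_S390_VX_GCC_SUPPORT
/* gcc knows the vector registers: append one to the clobber list.  */
# define ASM_CLOBBER_VR(NR) , NR
#else
/* Older gcc: the macro expands to nothing, only "memory" remains.  */
# define ASM_CLOBBER_VR(NR)
#endif

int
sketch_barrier (int x)
{
  /* Empty asm statement; only the clobber list differs between the
     two configurations.  */
  __asm__ volatile ("" : "+r" (x) : : "memory" ASM_CLOBBER_VR ("v16"));
  return x;
}
```

    The leading comma lives inside the macro, so the clobber list stays
    syntactically valid in both cases.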
    
    ChangeLog:
    
    	* config.h.in (HAVE_S390_VX_GCC_SUPPORT): New macro undefine.
    	* sysdeps/s390/configure.ac: Add test for S390 vector register
    	support in gcc.
    	* sysdeps/s390/configure: Regenerated.

diff --git a/ChangeLog b/ChangeLog
index fceeeb3..cf6315a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,12 @@
 2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
 
+	* config.h.in (HAVE_S390_VX_GCC_SUPPORT): New macro undefine.
+	* sysdeps/s390/configure.ac: Add test for S390 vector register
+	support in gcc.
+	* sysdeps/s390/configure: Regenerated.
+
+2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
+
 	* iconvdata/Makefile ($(inst_gconvdir)/gconv-modules):
 	Install file from $(objpfx)gconv-modules.
 	($(objpfx)gconv-modules): Concatenate architecture specific file
diff --git a/config.h.in b/config.h.in
index 2c902b0..b28b513 100644
--- a/config.h.in
+++ b/config.h.in
@@ -73,6 +73,10 @@
 /* Define if assembler supports vector instructions on S390.  */
 #undef  HAVE_S390_VX_ASM_SUPPORT
 
+/* Define if gcc supports vector registers as clobbers in inline assembly
+   on S390.  */
+#undef  HAVE_S390_VX_GCC_SUPPORT
+
 /* Define if assembler supports Intel MPX.  */
 #undef  HAVE_MPX_SUPPORT
 
diff --git a/sysdeps/s390/configure b/sysdeps/s390/configure
index 0fa54c3..c9fb69c 100644
--- a/sysdeps/s390/configure
+++ b/sysdeps/s390/configure
@@ -144,6 +144,38 @@ else
 $as_echo "$as_me: WARNING: Use binutils with vector-support in order to use optimized implementations." >&2;}
 fi
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for S390 vector support in gcc" >&5
+$as_echo_n "checking for S390 vector support in gcc... " >&6; }
+if ${libc_cv_gcc_s390_vx+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat > conftest.c <<\EOF
+void testvecclobber ()
+{
+  __asm__ ("" : : : "v16");
+}
+EOF
+if { ac_try='${CC-cc} --shared conftest.c -o conftest.o &> /dev/null'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; } ;
+then
+  libc_cv_gcc_s390_vx=yes
+else
+  libc_cv_gcc_s390_vx=no
+fi
+rm -f conftest*
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libc_cv_gcc_s390_vx" >&5
+$as_echo "$libc_cv_gcc_s390_vx" >&6; }
+
+if test "$libc_cv_gcc_s390_vx" = yes ;
+then
+  $as_echo "#define HAVE_S390_VX_GCC_SUPPORT 1" >>confdefs.h
+
+fi
 
 test -n "$critic_missing" && as_fn_error $? "
 *** $critic_missing" "$LINENO" 5
diff --git a/sysdeps/s390/configure.ac b/sysdeps/s390/configure.ac
index 4da134e..1db6d84 100644
--- a/sysdeps/s390/configure.ac
+++ b/sysdeps/s390/configure.ac
@@ -64,6 +64,27 @@ else
   AC_MSG_WARN([Use binutils with vector-support in order to use optimized implementations.])
 fi
 
+AC_CACHE_CHECK(for S390 vector support in gcc, libc_cv_gcc_s390_vx, [dnl
+cat > conftest.c <<\EOF
+void testvecclobber ()
+{
+  __asm__ ("" : : : "v16");
+}
+EOF
+dnl
+dnl test, if gcc supports S390 vector registers as clobber in inline assembly
+if AC_TRY_COMMAND([${CC-cc} --shared conftest.c -o conftest.o &> /dev/null]) ;
+then
+  libc_cv_gcc_s390_vx=yes
+else
+  libc_cv_gcc_s390_vx=no
+fi
+rm -f conftest* ])
+
+if test "$libc_cv_gcc_s390_vx" = yes ;
+then
+  AC_DEFINE(HAVE_S390_VX_GCC_SUPPORT)
+fi
 
 test -n "$critic_missing" && AC_MSG_ERROR([
 *** $critic_missing])

http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=c70e9913d2fc2d0bf6a3ca98a4dece759d40a4ec

commit c70e9913d2fc2d0bf6a3ca98a4dece759d40a4ec
Author: Stefan Liebler <stli@linux.vnet.ibm.com>
Date:   Wed May 25 17:18:04 2016 +0200

    S390: Get rid of make warning: overriding recipe for target gconv-modules.
    
    This patch introduces a way to provide an architecture dependent gconv-modules
    file. Before this patch, the gconv-modules file was normally installed from
    src-dir/iconvdata/gconv-modules. The S390 Makefile had overridden the
    installation recipe (with a make warning) in order to install the
    gconv-modules-s390 file from build-dir.
    The iconvdata/Makefile provides another recipe, which copies the gconv-modules
    file from src to build dir; this copy is used by the testcases.
    Thus the testcases do not use the currently built s390 modules.
    
    This patch uses build-dir/iconvdata/gconv-modules for installation, which
    is generated by concatenating src-dir/iconvdata/gconv-modules and the
    architecture specific one. The latter one can be specified by setting the variable
    sysdeps-gconv-modules in sysdeps/.../Makefile.
    
    The architecture specific gconv-modules file is emitted before the common one,
    because otherwise these modules aren't used in all possible conversions. E.g. the
    conversion from INTERNAL to UTF-16 used the common UTF-16.so module instead of
    UTF16_UTF32_Z9.so.
    
    This way, the s390-Makefile does not need to override the recipe for gconv-modules
    and no warning is emitted anymore.
    Since we no longer support an empty objpfx, the conditional test in
    iconvdata/Makefile is removed.
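
The concatenation order described above can be illustrated with a small shell sketch. It is not part of the patch: the file names arch-modules and common-modules are hypothetical stand-ins for sysdeps/s390/gconv-modules and iconvdata/gconv-modules, and the awk lookup assumes a simplified model in which the first matching line wins when costs are equal. The real gconv lookup is more involved.

```shell
#!/bin/sh
set -eu

# Scratch directory standing in for the build dir (hypothetical layout).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# One entry per file for the same conversion, both with cost 1.
printf 'module\tINTERNAL\tUTF-16//\tUTF16_UTF32_Z9\t1\n' > "$workdir/arch-modules"
printf 'module\tINTERNAL\tUTF-16//\tUTF-16\t1\n' > "$workdir/common-modules"

# Same shape as the new recipe: architecture file first, common file second.
cat "$workdir/arch-modules" "$workdir/common-modules" > "$workdir/gconv-modules"

# Simplified lookup (an assumption): take the module on the first matching line.
first_module=$(awk -F'\t' \
  '$1 == "module" && $2 == "INTERNAL" && $3 == "UTF-16//" { print $4; exit }' \
  "$workdir/gconv-modules")
echo "$first_module"
```

Swapping the two arguments to cat in this sketch makes the common UTF-16 entry come first, which mirrors the situation the commit message describes.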
    
    ChangeLog:
    
    	* iconvdata/Makefile ($(inst_gconvdir)/gconv-modules):
    	Install file from $(objpfx)gconv-modules.
    	($(objpfx)gconv-modules): Concatenate architecture specific file
    	in variable sysdeps-gconv-modules and gconv-modules in src dir.
    	* sysdeps/s390/gconv-modules: New file.
    	* sysdeps/s390/s390-64/Makefile: ($(inst_gconvdir)/gconv-modules):
    	Deleted.
    	($(objpfx)gconv-modules-s390): Deleted.
    	(sysdeps-gconv-modules): New variable.

diff --git a/ChangeLog b/ChangeLog
index 8f119fa..fceeeb3 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,15 @@
+2016-05-25  Stefan Liebler  <stli@linux.vnet.ibm.com>
+
+	* iconvdata/Makefile ($(inst_gconvdir)/gconv-modules):
+	Install file from $(objpfx)gconv-modules.
+	($(objpfx)gconv-modules): Concatenate architecture specific file
+	in variable sysdeps-gconv-modules and gconv-modules in src dir.
+	* sysdeps/s390/gconv-modules: New file.
+	* sysdeps/s390/s390-64/Makefile: ($(inst_gconvdir)/gconv-modules):
+	Deleted.
+	($(objpfx)gconv-modules-s390): Deleted.
+	(sysdeps-gconv-modules): New variable.
+
 2016-05-24  Joseph Myers  <joseph@codesourcery.com>
 
 	[BZ #15479]
diff --git a/iconvdata/Makefile b/iconvdata/Makefile
index 357530b..f9826b3 100644
--- a/iconvdata/Makefile
+++ b/iconvdata/Makefile
@@ -244,7 +244,7 @@ headers: $(addprefix $(objpfx), $(generated-modules:=.h))
 $(addprefix $(inst_gconvdir)/, $(modules.so)): \
     $(inst_gconvdir)/%: $(objpfx)% $(+force)
 	$(do-install-program)
-$(inst_gconvdir)/gconv-modules: gconv-modules $(+force)
+$(inst_gconvdir)/gconv-modules: $(objpfx)gconv-modules $(+force)
 	$(do-install)
 ifeq (no,$(cross-compiling))
 # Update the $(prefix)/lib/gconv/gconv-modules.cache file. This is necessary
@@ -331,7 +331,5 @@ do-tests-clean common-mostlyclean: tst-tables-clean
 tst-tables-clean:
 	-rm -f $(objpfx)tst-*.table $(objpfx)tst-EUC-TW.irreversible
 
-ifdef objpfx
 $(objpfx)gconv-modules: gconv-modules
-	cp $^ $@
-endif
+	cat $(sysdeps-gconv-modules) $^ > $@
diff --git a/sysdeps/s390/gconv-modules b/sysdeps/s390/gconv-modules
new file mode 100644
index 0000000..7021105
--- /dev/null
+++ b/sysdeps/s390/gconv-modules
@@ -0,0 +1,50 @@
+# GNU libc iconv configuration.
+# Copyright (C) 1997-2016 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+# All lines contain the following information:
+
+# If the lines start with `module'
+#  fromset:	either a name triple or a regular expression triple.
+#  toset:	a name triple or an expression with \N to get regular
+#		expression matching results.
+#  filename:	filename of the module implementing the transformation.
+#		If it is not absolute the path is made absolute by prepending
+#		the directory the configuration file is found in.
+#  cost:	optional cost of the transformation.  Default is 1.
+
+# If the lines start with `alias'
+#  alias:	alias name which is not really recognized.
+#  name:	the real name of the character set
+
+# S/390 hardware accelerated modules
+#	from			to			module			cost
+module	ISO-8859-1//		IBM037//		ISO-8859-1_CP037_Z900	1
+module	IBM037//		ISO-8859-1//		ISO-8859-1_CP037_Z900	1
+module	ISO-10646/UTF8/		UTF-32//		UTF8_UTF32_Z9		1
+module	UTF-32BE//		ISO-10646/UTF8/		UTF8_UTF32_Z9		1
+module	ISO-10646/UTF8/		UTF-32BE//		UTF8_UTF32_Z9		1
+module	UTF-16BE//		UTF-32//		UTF16_UTF32_Z9		1
+module	UTF-32BE//		UTF-16//		UTF16_UTF32_Z9		1
+module	INTERNAL		UTF-16//		UTF16_UTF32_Z9		1
+module	UTF-32BE//		UTF-16BE//		UTF16_UTF32_Z9		1
+module	INTERNAL		UTF-16BE//		UTF16_UTF32_Z9		1
+module	UTF-16BE//		UTF-32BE//		UTF16_UTF32_Z9		1
+module	UTF-16BE//		INTERNAL		UTF16_UTF32_Z9		1
+module	UTF-16BE//		ISO-10646/UTF8/		UTF8_UTF16_Z9		1
+module	ISO-10646/UTF8/		UTF-16//		UTF8_UTF16_Z9		1
+module	ISO-10646/UTF8/		UTF-16BE//		UTF8_UTF16_Z9		1
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index 5909d1f..ce4aa3b 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -37,54 +37,5 @@ $(patsubst %, $(inst_gconvdir)/%.so, $(s390x-iconv-modules)) : \
 $(inst_gconvdir)/%.so: $(objpfx)%.so $(+force)
 	$(do-install-program)
 
-$(objpfx)gconv-modules-s390: gconv-modules $(+force)
-	cp $< $@
-	echo >> $@
-	echo "# S/390 hardware accelerated modules" >> $@
-	echo -n "module	ISO-8859-1//		IBM037//	" >> $@
-	echo "	ISO-8859-1_CP037_Z900	1" >> $@
-	echo -n "module	IBM037//		ISO-8859-1//	" >> $@
-	echo "	ISO-8859-1_CP037_Z900	1" >> $@
-	echo -n "module	ISO-10646/UTF8/		UTF-32//	" >> $@
-	echo "	UTF8_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-32BE//		ISO-10646/UTF8/	" >> $@
-	echo "	UTF8_UTF32_Z9		1" >> $@
-	echo -n "module	ISO-10646/UTF8/		UTF-32BE//	" >> $@
-	echo "	UTF8_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-16BE//		UTF-32//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-32BE//		UTF-16//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	INTERNAL		UTF-16//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-32BE//		UTF-16BE//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	INTERNAL		UTF-16BE//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-16BE//		UTF-32BE//	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-16BE//		INTERNAL	" >> $@
-	echo "	UTF16_UTF32_Z9		1" >> $@
-	echo -n "module	UTF-16BE//		ISO-10646/UTF8/	" >> $@
-	echo "	UTF8_UTF16_Z9		1" >> $@
-	echo -n "module	ISO-10646/UTF8/		UTF-16//	" >> $@
-	echo "	UTF8_UTF16_Z9		1" >> $@
-	echo -n "module	ISO-10646/UTF8/		UTF-16BE//	" >> $@
-	echo "	UTF8_UTF16_Z9		1" >> $@
-
-$(inst_gconvdir)/gconv-modules: $(objpfx)gconv-modules-s390 $(+force)
-	$(do-install)
-ifeq (no,$(cross-compiling))
-# Update the $(prefix)/lib/gconv/gconv-modules.cache file. This is necessary
-# if this libc has more gconv modules than the previously installed one.
-	if test -f "$(inst_gconvdir)/gconv-modules.cache"; then \
-	   LC_ALL=C \
-	   $(rtld-prefix) \
-	   $(common-objpfx)iconv/iconvconfig \
-	     $(addprefix --prefix=,$(install_root)); \
-	fi
-else
-	@echo '*@*@*@ You should recreate $(inst_gconvdir)/gconv-modules.cache'
-endif
-
+sysdeps-gconv-modules = ../sysdeps/s390/gconv-modules
 endif

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                                    |  100 ++
 config.h.in                                  |    4 +
 iconv/Makefile                               |    2 +-
 iconv/gconv_simple.c                         |    5 +-
 iconv/tst-iconv6.c                           |  117 +++
 iconvdata/Makefile                           |   10 +-
 iconvdata/bug-iconv12.c                      |  263 ++++++
 iconvdata/utf-16.c                           |   12 +
 iconvdata/utf-32.c                           |    2 +-
 sysdeps/s390/Makefile                        |   31 +
 sysdeps/s390/configure                       |   32 +
 sysdeps/s390/configure.ac                    |   21 +
 sysdeps/s390/gconv-modules                   |   50 +
 sysdeps/s390/iso-8859-1_cp037_z900.c         |  262 ++++++
 sysdeps/s390/multiarch/8bit-generic.c        |  415 +++++++++
 sysdeps/s390/multiarch/Makefile              |   14 +
 sysdeps/s390/multiarch/gconv_simple.c        | 1266 ++++++++++++++++++++++++++
 sysdeps/s390/multiarch/gen-8bit.sh           |    6 +
 sysdeps/s390/multiarch/iconv/skeleton.c      |   21 +
 sysdeps/s390/s390-64/Makefile                |   81 --
 sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c |  237 -----
 sysdeps/s390/s390-64/utf16-utf32-z9.c        |  337 -------
 sysdeps/s390/s390-64/utf8-utf16-z9.c         |  471 ----------
 sysdeps/s390/s390-64/utf8-utf32-z9.c         |  511 -----------
 sysdeps/s390/utf16-utf32-z9.c                |  605 ++++++++++++
 sysdeps/s390/utf8-utf16-z9.c                 |  818 +++++++++++++++++
 sysdeps/s390/utf8-utf32-z9.c                 |  862 ++++++++++++++++++
 27 files changed, 4910 insertions(+), 1645 deletions(-)
 create mode 100644 iconv/tst-iconv6.c
 create mode 100644 iconvdata/bug-iconv12.c
 create mode 100644 sysdeps/s390/Makefile
 create mode 100644 sysdeps/s390/gconv-modules
 create mode 100644 sysdeps/s390/iso-8859-1_cp037_z900.c
 create mode 100644 sysdeps/s390/multiarch/8bit-generic.c
 create mode 100644 sysdeps/s390/multiarch/gconv_simple.c
 create mode 100644 sysdeps/s390/multiarch/gen-8bit.sh
 create mode 100644 sysdeps/s390/multiarch/iconv/skeleton.c
 delete mode 100644 sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
 delete mode 100644 sysdeps/s390/s390-64/utf16-utf32-z9.c
 delete mode 100644 sysdeps/s390/s390-64/utf8-utf16-z9.c
 delete mode 100644 sysdeps/s390/s390-64/utf8-utf32-z9.c
 create mode 100644 sysdeps/s390/utf16-utf32-z9.c
 create mode 100644 sysdeps/s390/utf8-utf16-z9.c
 create mode 100644 sysdeps/s390/utf8-utf32-z9.c


hooks/post-receive
-- 
GNU C Library master sources