This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[PATCH 2/2] nptl: Invert the mmap/mprotect logic on allocated stacks (BZ#18988)
- From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- To: libc-alpha at sourceware dot org
- Date: Wed, 1 Feb 2017 16:34:38 -0200
- Subject: [PATCH 2/2] nptl: Invert the mmap/mprotect logic on allocated stacks (BZ#18988)
- Authentication-results: sourceware.org; auth=none
- References: <1485974078-12152-1-git-send-email-adhemerval.zanella@linaro.org>
Current allocate_stack logic for create stacks is to first mmap all
the required memory with the desirable memory and then mprotect the
guard area with PROT_NONE if required. Although it works as expected,
it pessimizes the allocation because it requires the kernel to actually
increase commit charge (it counts against the available physical/swap
memory available for the system).
The only issue is to actually check this change since side-effects are
really Linux specific and to actually account them it would require a
kernel specific tests to parse the system wide information. On the kernel
I checked /proc/self/statm does not show any meaningful difference for
vmm and/or rss before and after thread creation. I could only see
really meaningful information checking on system wide /proc/meminfo
between thread creation: MemFree, MemAvailable, and Committed_AS shows
large difference without the patch. I think trying to use these
kind of information on a testcase is fragile.
The BZ#18988 reports shows that the commit pages are easily seen with
mlockall (MCL_FUTURE) (with lock all pages that become mapped in the
process) however a more straighfoward testcase shows that pthread_create
could be faster using this patch:
--
static const int inner_count = 256;
static const int outer_count = 128;
static
void *thread1(void *arg)
{
return NULL;
}
static
void *sleeper(void *arg)
{
pthread_t ts[inner_count];
for (int i = 0; i < inner_count; i++)
pthread_create (&ts[i], &a, thread1, NULL);
for (int i = 0; i < inner_count; i++)
pthread_join (ts[i], NULL);
return NULL;
}
int main(void)
{
pthread_attr_init(&a);
pthread_attr_setguardsize(&a, 1<<20);
pthread_attr_setstacksize(&a, 1134592);
pthread_t ts[outer_count];
for (int i = 0; i < outer_count; i++)
pthread_create(&ts[i], &a, sleeper, NULL);
for (int i = 0; i < outer_count; i++)
pthread_join(ts[i], NULL);
assert(r == 0);
}
return 0;
}
--
On x86_64 (4.4.0-45-generic, gcc 5.4.0) running the small benchtests
I see:
$ time ./test
real 0m3.647s
user 0m0.080s
sys 0m11.836s
While with the patch I see:
$ time ./test
real 0m0.696s
user 0m0.040s
sys 0m1.152s
So I added a pthread_create benchtest (thread_create) which check
the thread creation latency. As for the simple benchtests, I saw
improvements in thread creation on all architectures I tested the
change.
Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
arm-linux-gnueabihf, and powerpc64le-linux-gnu.
[BZ #18988]
* benchtests/thread_create-inputs: New file.
* benchtests/thread_create-source.c: Likewise.
* support/xpthread_attr_setguardsize.c: Likewise.
* support/Makefile (libsupport-routines): Add
xpthread_attr_setguardsize object.
* support/xthread.h: Add xpthread_attr_setguardsize prototype.
* benchtests/Makefile (bench-pthread): Add thread_create.
* nptl/allocatestack.c (allocate_stack): Call mmap with PROT_NONE and
then mprotect the required area.
---
ChangeLog | 11 +++++++
benchtests/Makefile | 2 +-
benchtests/thread_create-inputs | 14 +++++++++
benchtests/thread_create-source.c | 58 ++++++++++++++++++++++++++++++++++++
nptl/allocatestack.c | 35 +++++++++++++++++++++-
support/Makefile | 1 +
support/xpthread_attr_setguardsize.c | 26 ++++++++++++++++
support/xthread.h | 2 ++
8 files changed, 147 insertions(+), 2 deletions(-)
create mode 100644 benchtests/thread_create-inputs
create mode 100644 benchtests/thread_create-source.c
create mode 100644 support/xpthread_attr_setguardsize.c
diff --git a/benchtests/Makefile b/benchtests/Makefile
index 81edf8a..6535373 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -25,7 +25,7 @@ bench-math := acos acosh asin asinh atan atanh cos cosh exp exp2 log log2 \
modf pow rint sin sincos sinh sqrt tan tanh fmin fmax fminf \
fmaxf
-bench-pthread := pthread_once
+bench-pthread := pthread_once thread_create
bench-string := ffs ffsll
diff --git a/benchtests/thread_create-inputs b/benchtests/thread_create-inputs
new file mode 100644
index 0000000..e3ca03b
--- /dev/null
+++ b/benchtests/thread_create-inputs
@@ -0,0 +1,14 @@
+## args: int:size_t:size_t
+## init: thread_create_init
+## includes: pthread.h
+## include-sources: thread_create-source.c
+
+## name: stack=1024,guard=1
+32, 1024, 1
+## name: stack=1024,guard=2
+32, 1024, 2
+
+## name: stack=2048,guard=1
+32, 2048, 1
+## name: stack=2048,guard=2
+32, 2048, 2
diff --git a/benchtests/thread_create-source.c b/benchtests/thread_create-source.c
new file mode 100644
index 0000000..74e7777
--- /dev/null
+++ b/benchtests/thread_create-source.c
@@ -0,0 +1,58 @@
+/* Measure pthread_create thread creation with different stack
+ and guard sizes.
+
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <support/xthread.h>
+
+static size_t pgsize;
+
+static void
+thread_create_init (void)
+{
+ pgsize = sysconf (_SC_PAGESIZE);
+}
+
+static void *
+thread_dummy (void *arg)
+{
+ return NULL;
+}
+
+static void
+thread_create (int nthreads, size_t stacksize, size_t guardsize)
+{
+ pthread_attr_t attr;
+ xpthread_attr_init (&attr);
+
+ stacksize = stacksize * pgsize;
+ guardsize = guardsize * pgsize;
+
+ xpthread_attr_setstacksize (&attr, stacksize);
+ xpthread_attr_setguardsize (&attr, guardsize);
+
+ pthread_t ts[nthreads];
+
+ for (int i = 0; i < nthreads; i++)
+ ts[i] = xpthread_create (&attr, thread_dummy, NULL);
+
+ for (int i = 0; i < nthreads; i++)
+ xpthread_join (ts[i]);
+}
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index e52c698..25c5698 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -490,7 +490,13 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
size += pagesize_m1 + 1;
#endif
- mem = mmap (NULL, size, prot,
+ /* If a guard page is required, avoid committing memory by first
+ allocate with PROT_NONE and then reserve with required permission
+ excluding the guard page. */
+ int prot_mmap = PROT_NONE;
+ if (__glibc_unlikely (guardsize == 0))
+ prot_mmap = prot;
+ mem = mmap (NULL, size, prot_mmap,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
if (__glibc_unlikely (mem == MAP_FAILED))
@@ -510,9 +516,36 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
- TLS_PRE_TCB_SIZE);
#endif
+ /* Now mprotect the required region excluding the guard area. */
+ if (__glibc_likely (prot_mmap == PROT_NONE))
+ {
+ char *mprotstart = mem;
+ size_t mprotsize = size;
+#ifdef NEED_SEPARATE_REGISTER_STACK
+ mprotstart += (((size - guardsize) / 2) & ~pagesize_m1)
+ + guardsize;
+ mprotsize -= mprotstart - (char*) mem;
+#elif _STACK_GROWS_DOWN
+ mprotstart += guardsize;
+ mprotsize -= guardsize;
+#elif _STACK_GROWS_UP
+ char *guard = (char *) (((uintptr_t) pd - guardsize)
+ & ~pagesize_m1);
+ mprotsize -= guard - mprotstart;
+#endif
+ if (mprotect (mprotstart, mprotsize, prot) != 0)
+ {
+ munmap (mem, size);
+ return errno;
+ }
+ }
+
/* Remember the stack-related values. */
pd->stackblock = mem;
pd->stackblock_size = size;
+ /* Update guardsize for newly allocated guardsize to avoid
+ an mprotect in guard resize below. */
+ pd->guardsize = guardsize;
/* We allocated the first block thread-specific data array.
This address will not change for the lifetime of this
diff --git a/support/Makefile b/support/Makefile
index 2ace559..c0a443f 100644
--- a/support/Makefile
+++ b/support/Makefile
@@ -68,6 +68,7 @@ libsupport-routines = \
xpthread_attr_init \
xpthread_attr_setdetachstate \
xpthread_attr_setstacksize \
+ xpthread_attr_setguardsize \
xpthread_barrier_destroy \
xpthread_barrier_init \
xpthread_barrier_wait \
diff --git a/support/xpthread_attr_setguardsize.c b/support/xpthread_attr_setguardsize.c
new file mode 100644
index 0000000..35fed5d
--- /dev/null
+++ b/support/xpthread_attr_setguardsize.c
@@ -0,0 +1,26 @@
+/* pthread_attr_setguardsize with error checking.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <support/xthread.h>
+
+void
+xpthread_attr_setguardsize (pthread_attr_t *attr, size_t guardsize)
+{
+ xpthread_check_return ("pthread_attr_setguardize",
+ pthread_attr_setguardsize (attr, guardsize));
+}
diff --git a/support/xthread.h b/support/xthread.h
index 6dd7e70..3552a73 100644
--- a/support/xthread.h
+++ b/support/xthread.h
@@ -67,6 +67,8 @@ void xpthread_attr_setdetachstate (pthread_attr_t *attr,
int detachstate);
void xpthread_attr_setstacksize (pthread_attr_t *attr,
size_t stacksize);
+void xpthread_attr_setguardsize (pthread_attr_t *attr,
+ size_t guardsize);
/* This function returns non-zero if pthread_barrier_wait returned
PTHREAD_BARRIER_SERIAL_THREAD. */
--
2.7.4