This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Linux: Implement interfaces for memory protection keys


On 11/06/2017 05:49 PM, Adhemerval Zanella wrote:

What's the general feeling about these interfaces?  Shall we add them, despite their limitations (especially with regard to key reuse)?

Then I'll write documentation.

This is currently only supported on x86 and I am not sure if it will be
ever used by other architectures, but Linux thinks its interface is
sufficient generic since it decided to extend it be part of generic
ABI.  I also do not foresee this interface to be deprecated by any other
soon, so I think it should be a valuable adition.

Do you know any project that is already using it?

I'm not aware of any users. To be honest, I just needed a distraction over the weekend.

On 04/11/2017 15:49, Florian Weimer wrote:
This adds system call wrappers for pkey_alloc, pkey_free, pkey_mprotect,
and x86-64 implementations of pkey_get and pkey_set, which abstract over
the PKRU CPU register and hide the actual number of memory protection
keys supported by the CPU.

The system call wrapers use unsigned int instead of unsigned long for
parameters, so that no special treatment for x32 is needed.  The flags
argument is currently unused, and the access rights bit mask is limited
to two bits by the current PKRU register layout anyway.

AFAIK there is no need to special handling of unsigned long, which is
32 bits on x32.  There were issues of using {INLINE,INTERNAL}_SYSCALL
along with 64 bits arguments (off_t for instance) and it should be fixed
by 78ca091cdd2c.  The only special handling is required for syscalls that
return 64 bits arguments, for instance 'times', which can't be handled
by {INLINE,INTERNAL}_SYSCALLS (x32 times.c get around by redefining
internal_syscall1 to return a 64 bits value).

Oh, I was worried that these arguments were expected to be true 64-bit values eventually. Not today, but perhaps in the future. Then using unsigned long would be no good on x32. But this does not seem to be the case, so we can just use unsigned int, to make clear that these arguments are actually 32 bits only.

So I think there is no reason to use a different symbol signature than
kernel.

I still think using unsigned long should be avoided.

+/* Compare the two numbers LEFT and RIGHT and report failure if they
+   are different.  */
+#define TEST_COMPARE(left, right)                                       \
+  ({                                                                    \
+    __typeof__ (left) __left_value = (left);                            \
+    __typeof__ (right) __right_value = (right);                         \
+    _Static_assert (sizeof (__left_value) <= sizeof (long long),        \
+                    "left value fits into long long");                  \
+    _Static_assert (sizeof (__right_value) <= sizeof (long long),       \
+                    "right value fits into long long");                 \
+    if (__left_value != __right_value                                   \
+        || ((__left_value > 0) != (__right_value > 0)))                 \
+      support_test_compare_failure                                      \
+        (__FILE__, __LINE__,                                            \
+         #left, __left_value, __left_value > 0,                         \
+         #right, __right_value, __right_value > 0);                     \
+  })
+

I think the macro name should be more explicit since it is not only
comparing two number but also their size.

Huh? No, it checks that conversion to long long plus the sign bit is not lossy and that the signs match. So I think the name is accurate.


+/* Internal implementation of TEST_COMPARE.  LEFT_POSITIVE and
+   RIGHT_POSITIVE are used to fit both unsigned long long and long
+   long arguments into LEFT_VALUE and RIGHT_VALUE.  */
+void support_test_compare_failure (const char *file, int line,
+                                   const char *left_expr,
+                                   long long left_value,
+                                   int left_positive,
+                                   const char *right_expr,
+                                   long long right_value,
+                                   int right_positive);
+
  /* Internal function called by the test driver.  */
  int support_report_failure (int status)
    __attribute__ ((weak, warn_unused_result));
diff --git a/support/support_test_compare_failure.c b/support/support_test_compare_failure.c
new file mode 100644
index 0000000000..38fec1ca89
--- /dev/null
+++ b/support/support_test_compare_failure.c
@@ -0,0 +1,46 @@
+/* Reporting mumeric comparison failure.

Typo.

Thanks, fixed.

diff --git a/sysdeps/unix/sysv/linux/bits/mman-linux.h b/sysdeps/unix/sysv/linux/bits/mman-linux.h
index b091181960..da5ec79334 100644
--- a/sysdeps/unix/sysv/linux/bits/mman-linux.h
+++ b/sysdeps/unix/sysv/linux/bits/mman-linux.h
@@ -109,3 +109,38 @@
  # define MCL_ONFAULT	4		/* Lock all pages that are
  					   faulted in.  */
  #endif
+
+/* Memory protection key support.  */
+#ifdef __USE_GNU
+
+/* FLags for pkey_alloc.  */
+# define PKEY_DISABLE_ACCESS 0x1
+# define PKEY_DISABLE_WRITE 0x2

Typo on 'FLags'.

Fixed.

diff --git a/sysdeps/unix/sysv/linux/syscalls.list b/sysdeps/unix/sysv/linux/syscalls.list
index 40c4fbb9ea..6f657eea2e 100644
--- a/sysdeps/unix/sysv/linux/syscalls.list
+++ b/sysdeps/unix/sysv/linux/syscalls.list
@@ -110,3 +110,6 @@ setns		EXTRA	setns		i:ii	setns
  process_vm_readv EXTRA	process_vm_readv i:ipipii process_vm_readv
  process_vm_writev EXTRA	process_vm_writev i:ipipii process_vm_writev
  memfd_create    EXTRA	memfd_create	i:si    memfd_create
+pkey_alloc	EXTRA	pkey_alloc	i:ii	pkey_alloc
+pkey_free	EXTRA	pkey_free	i:i	pkey_free
+pkey_mprotect	EXTRA	pkey_mprotect	i:aiii  pkey_mprotect

I do not think we can them as default since build against kernel headers
older than v4.9 won't have __NR_pkey_{alloc,free,mprotect).  We need
default implementation with #ifdef __NR_....

Hmm. I checked that we'd fall back to an ENOSYS stub, but this was for memfd_create, which had the stub in the generic API. That won't apply here, so I will have to rework this a bit.

I will sent an updated patch.

+/* Return the value of the PKRU register.  */
+static inline unsigned int
+pkey_read (void)
+{
+  unsigned int result;
+  __asm__ volatile (".byte 0x0f, 0x01, 0xee"
+                    : "=a" (result) : "c" (0) : "rdx");
+  return result;
+}

I think we should also constrain 'edx' since it is explict cleared on
'rdpkru' instruction:

   static inline unsigned int
   pkey_read (void)
   {
     unsigned int result, edx;
     __asm__ volatile (".byte 0x0f, 0x01, 0xee"
                       : "=a" (result), "=d" (edx)
		      : "c" (0));
     return result;
   }

I think the clobber for rdx takes care of that.

I think the kernel example to implement pkey_get is more parametrized and
may lead to a more portable code if the idea is to extend this syscalls
to other architectures than x86.  It adds a arch-specific PKRU_BITS_PER_PKEY
and sets the mask based on current supported flags (which I think won't
be extended at least for x86 since there is no more available unused bits):

Sorry, I disagree with that. Premature generalization usually does not work out because you don't know which parts to generalize.

If we get key revocation on pkey_alloc, pkey_set would have to turn into a vsyscall anyway (where the code sequence is known to the kernel, and the revocation code can override the PKRU value the pkey_set implementation has read, thus closing the race).

+int
+pkey_set (int key, unsigned int rights)
+{
+  if (key < 0 || key > 15 || rights > 3)
+    {
+      __set_errno (EINVAL);
+      return -1;
+    }
+  unsigned int mask = 3 << (2 * key);
+  unsigned int pkru = pkey_read ();
+  pkru = (pkru & ~mask) | (rights << (2 * key));
+  pkey_write (pkru);
+  return 0;
+}

According to this LWN article [2] and man pages [3], the pkey 0 is set
aside for the default one and this key will never be allocated with
pkey_alloc(), and its restrictions cannot be changed.  Assuming this
is still true (since the article is based on non upstream code), what
happen if try issue a pkey_set with a key 0? Does the kernel silent
ignore it or trap with an invalid instruction?

I think userspace can write whatever it wants to the PKRU register. The register update is always valid. It's just that some values will not be very useful.

I added

  TEST_COMPARE (pkey_set (0, PKEY_DISABLE_WRITE), 0);

to the test case, and it crashes here:

   0x00000000004023c3 <+2083>:  xor    %edi,%edi
   0x00000000004023c5 <+2085>:  mov    $0x2,%esi
   0x00000000004023ca <+2090>:  callq  0x4013d0 <pkey_set@plt>
   0x00000000004023cf <+2095>:  cmp    $0x0,%eax
   0x00000000004023d2 <+2098>:  jne    0x4026d0 <do_test+2864>
   0x00000000004023d8 <+2104>:  mov    $0x6062e0,%ebx
   0x00000000004023dd <+2109>:  mov    (%rbx),%rdi
   0x00000000004023e0 <+2112>:  mov    %r12,%rsi
   0x00000000004023e3 <+2115>:  add    $0x8,%rbx
=> 0x00000000004023e7 <+2119>:  callq  0x403730 <xmunmap>

This is the first memory write after the PKRU update. It is exactly what I expected because the stack has key 0, and we revoked write access. With PKEY_DISABLE_ACCESS, things get a bit weird because the coredump writer can't read the memory. But with single-stepping, I was able to confirm that the SIGSEGV happens on the retq instruction in pkey_set, which is again as expected.

So I don't think we need to document key 0 as special in any way. It is just a key that will never be returned from pkey_alloc, so unexpected things may happen. I think key 1 is currently special as well because it is PKEY_DISABLE_ACCESS, and execute-only mappings will use that.

Thanks,
Florian


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]