
Re: RFC: ABI support for special memory area



On Thu, Mar 9, 2017 at 7:23 AM, Suprateeka R Hegde
<hegdesmailbox@gmail.com> wrote:
> H.J,
>
> I think we are somehow a full 180 degrees out of phase in our discussion
> this time :-)
>
> As I have already asked, I want to know what the ONE-FIXED-FORM of
> __gnu_mbind_setup called by ld.so is.
>
> The code you provided seems to be Intel's implementation of libmbind. I
> am interested in what it looks like in ld.so, because that is what we want
> to document in the ABI support. We do not want implementation-specific
> details in GNU-gABI.
>
> So inside ld.so, would it be what I showed in my earlier mail or would it be
> something else?
>
> In my opinion, we have to spell that out in the ABI support proposal.
> Without the actual signature/prototype, __gnu_mbind_setup reads more like a
> guideline and less like an ABI spec/standard. In actual code (in ld.so),
> it may eventually look quite different for each vendor/implementation.
>
> So, either keep it as a guideline or make it generic. IMHO, we cannot keep
> the following (original text) as generic:
>
> ---
>>
>> Run-time support
>>
>> int __gnu_mbind_setup (unsigned int type, void *addr, size_t length);
>
> ---
>
> --
> Supra
>
>
>
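In the prototype attached at the end of this mail, the fixed form is __gnu_mbind_setup (type, addr, length), and ld.so derives all three arguments from each PT_GNU_MBIND program header. Below is a minimal standalone sketch of that derivation, modelled on init_mbind in the patch. The names struct mbind_call and mbind_call_from_phdr are hypothetical, and the sketch returns -1 where the patch calls _dl_fatal_printf.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the elf/elf.h hunk of the patch; older system headers may
   not define them.  */
#ifndef PT_GNU_MBIND_NUM
# define PT_GNU_MBIND_NUM 4096
#endif
#ifndef PT_GNU_MBIND_LO
# define PT_GNU_MBIND_LO 0x6474e555
#endif
#ifndef PT_GNU_MBIND_HI
# define PT_GNU_MBIND_HI (PT_GNU_MBIND_LO + PT_GNU_MBIND_NUM - 1)
#endif

/* The arguments ld.so passes to __gnu_mbind_setup for one segment.  */
struct mbind_call
{
  unsigned int type;
  void *addr;
  size_t length;
};

/* Return 1 and fill *CALL for a valid PT_GNU_MBIND segment, 0 for any
   other segment type, and -1 for a PT_GNU_MBIND segment that is not
   page aligned (the case the patch treats as fatal).  */
static int
mbind_call_from_phdr (const Elf64_Phdr *phdr, Elf64_Addr load,
                      Elf64_Addr pagesize, struct mbind_call *call)
{
  /* The alignment test below relies on PAGESIZE being a power of two.  */
  assert (pagesize != 0 && (pagesize & (pagesize - 1)) == 0);

  if (phdr->p_type < PT_GNU_MBIND_LO || phdr->p_type > PT_GNU_MBIND_HI)
    return 0;
  if (pagesize > phdr->p_align || (phdr->p_vaddr & (pagesize - 1)) != 0)
    return -1;

  call->type = phdr->p_type - PT_GNU_MBIND_LO;   /* 0 .. 4095 */
  call->addr = (void *) (uintptr_t) (load + phdr->p_vaddr);
  call->length = phdr->p_memsz;
  return 1;
}
```

So the memory type is simply the offset of p_type within the 4096-entry PT_GNU_MBIND range, and the address and length come from p_vaddr (relocated by the load address) and p_memsz.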
> On 07-Mar-2017 04:05 AM, H.J. Lu wrote:
>>
>> On Mon, Mar 6, 2017 at 5:25 AM, Suprateeka R Hegde
>> <hegdesmailbox@gmail.com> wrote:
>>>
>>> On 04-Mar-2017 07:37 AM, Carlos O'Donell wrote:
>>>>
>>>> On 03/03/2017 11:00 AM, H.J. Lu wrote:
>>>>>
>>>>> __gnu_mbind_setup is called from ld.so.  Since there is only one ld.so,
>>>>> it needs to know what to pass to __gnu_mbind_setup.  Not all arguments
>>>>> have to be used by all implementations nor all memory types.
>>>>
>>>> I think what Supra is suggesting is a pointer-to-implementation
>>>> interface which would allow ld.so to pass completely different arguments
>>>> to the library depending on what kind of memory is being defined by the
>>>> sh_info value. It avoids needing to encode all the types in the API, and
>>>> just uses an incomplete pointer to the type.
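To make the pointer-to-implementation idea concrete, here is a hypothetical sketch. The names example_mbind_setup, struct hbw_info and struct pinned_node_info are invented for illustration and are not part of any proposal. The fixed part of the interface shrinks to (type, pointer), and each memory type defines the layout the pointer refers to.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-type argument blocks.  Only the code handling a given
   memory type needs to know its layout; the entry point just receives an
   opaque pointer.  */
struct hbw_info                 /* type 0: address and size only.  */
{
  void *addr;
  size_t length;
};

struct pinned_node_info         /* type 1: hypothetical extra field.  */
{
  void *addr;
  size_t length;
  int node;
};

/* Hypothetical entry point.  Like mbind_setup in libmbind, it returns 0
   on success and a positive errno value on failure.  A real
   implementation would call mbind (2) with a type-specific policy here;
   this sketch only validates the arguments.  */
static int
example_mbind_setup (unsigned int type, const void *info)
{
  assert (info != NULL);

  switch (type)
    {
    case 0:
      {
        const struct hbw_info *hbw = info;
        return hbw->length == 0 ? EINVAL : 0;
      }
    case 1:
      {
        const struct pinned_node_info *p = info;
        return (p->length == 0 || p->node < 0) ? EINVAL : 0;
      }
    default:
      return ENXIO;             /* Unknown memory type.  */
    }
}
```

The trade-off is that the caller still has to fill in the right structure for each type, so a single ld.so would need to know every layout it passes; the fixed form moves from the function signature into the structure definitions.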
>>>
>>> That's absolutely right.
>>>
>>> However, I am not suggesting one is better than the other. I just want to
>>> get clarity on what the code looks like for different implementations.
>>>
>>> On 03-Mar-2017 09:30 PM, H.J. Lu wrote:
>>>>
>>>> __gnu_mbind_setup is called from ld.so.  Since there is only one ld.so,
>>>> it needs to know what to pass to __gnu_mbind_setup.
>>>
>>> So I want to know what the ONE-FIXED-FORM of __gnu_mbind_setup
>>> called by ld.so is.
>>>
>>>>  Not all arguments
>>>> have to be used by all implementations nor all memory types.
>>>
>>> I think I am still not getting this; sorry about that. Would it be
>>> possible for you to write some small pseudocode that shows what this
>>> design looks like for different implementations?
>>>
>>
>> For my usage, I only need to know the memory type, the address and the size:
>>
>> #define _GNU_SOURCE
>> #include <unistd.h>
>> #include <errno.h>
>> #include <stdint.h>
>> #include <cpuid.h>
>> #include <numa.h>
>> #include <numaif.h>
>> #include <mbind.h>
>>
>> #ifdef LIBMBIND_DEBUG
>> #include <stdio.h>
>> #endif
>>
>> /* High-Bandwidth Memory node mask.  */
>> static struct bitmask *hbw_node_mask;
>>
>> /* Initialize High-Bandwidth Memory node mask.  This must be called before
>>    __gnu_mbind_setup.  */
>> static void
>> __attribute__ ((used, constructor))
>> init_node_mask (void)
>> {
>>   if (__get_cpuid_max (0, 0) == 0)
>>     return;
>>
>>   /* Check if vendor is Intel.  */
>>   uint32_t eax, ebx, ecx, edx;
>>   __cpuid (0, eax, ebx, ecx, edx);
>>   if (!(ebx == 0x756e6547 && ecx == 0x6c65746e && edx == 0x49656e69))
>>     return;
>>
>>   /* Get family and model.  */
>>   uint32_t model;
>>   uint32_t family;
>>   __cpuid (1, eax, ebx, ecx, edx);
>>   family = (eax >> 8) & 0x0f;
>>   if (family != 0x6)
>>     return;
>>   model = (eax >> 4) & 0x0f;
>>   model += (eax >> 12) & 0xf0;
>>
>>   /* Check for KNL and KNM.  */
>>   switch (model)
>>     {
>>     default:
>>       return;
>>
>>     case 0x57: /* Knights Landing.  */
>>     case 0x85: /* Knights Mill.  */
>>       break;
>>     }
>>
>>   /* Check if NUMA configuration is supported.  */
>>   int nodes_num = numa_num_configured_nodes ();
>>   if (nodes_num < 2)
>>     return;
>>
>>   /* Get MCDRAM NUMA nodes.  */
>>   struct bitmask *node_mask = numa_allocate_nodemask ();
>>   struct bitmask *node_cpu = numa_allocate_cpumask ();
>>
>>   int i;
>>   for (i = 0; i < nodes_num; i++)
>>     {
>>       numa_node_to_cpus (i, node_cpu);
>>       /* NUMA node without CPU is MCDRAM node.  */
>>       if (numa_bitmask_weight (node_cpu) == 0)
>>         numa_bitmask_setbit (node_mask, i);
>>     }
>>
>>   if (numa_bitmask_weight (node_mask) != 0)
>>     {
>>       /* On Knights Landing and Knights Mill, MCDRAM is High-Bandwidth
>>          Memory.  */
>>       hbw_node_mask = node_mask;
>>     }
>>   else
>>     numa_bitmask_free (node_mask);
>>   numa_bitmask_free (node_cpu);
>> }
>>
>> /* Support all different memory types.  */
>>
>> static int
>> mbind_setup (unsigned int type, void *addr, size_t length,
>>              unsigned int mode, unsigned int flags)
>> {
>>   int err = ENXIO;
>>
>>   switch (type)
>>     {
>>     default:
>> #ifdef LIBMBIND_DEBUG
>>       printf ("Unsupported mbind type %u: from %p of size %zu\n",
>>               type, addr, length);
>> #endif
>>       return EINVAL;
>>
>>     case GNU_MBIND_HBW:
>>       if (hbw_node_mask)
>>         err = mbind (addr, length, mode, hbw_node_mask->maskp,
>>                      hbw_node_mask->size, flags);
>>       break;
>>     }
>>
>>   if (err < 0)
>>     err = errno;
>>
>> #ifdef LIBMBIND_DEBUG
>>   printf ("Mbind type %u: from %p of size %zu\n", type, addr, length);
>> #endif
>>
>>   return err;
>> }
>>
>> int
>> __gnu_mbind_setup (unsigned int type, void *addr, size_t length)
>> {
>>   return mbind_setup (type, addr, length, MPOL_BIND, MPOL_MF_MOVE);
>> }
>>
>> If other memory types need additional information, that information can
>> also be passed to __gnu_mbind_setup.  We just need to know what
>> information is needed.
>>
>>
>
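The family/model arithmetic in init_node_mask above can be checked in isolation. The helper names cpuid_family, cpuid_model and is_knl_or_knm below are hypothetical, and the EAX values used to exercise them are constructed so that they decode to models 0x57, 0x85 and 0x55; they are not quoted from any particular part.

```c
#include <assert.h>
#include <stdint.h>

/* Family field, bits 11:8 of CPUID leaf 1 EAX.  */
static uint32_t
cpuid_family (uint32_t eax)
{
  return (eax >> 8) & 0x0f;
}

/* Model as computed in init_node_mask: model bits 7:4 plus the
   extended-model bits 19:16 shifted into the high nibble.  That
   function only uses this for family 6.  */
static uint32_t
cpuid_model (uint32_t eax)
{
  return ((eax >> 4) & 0x0f) + ((eax >> 12) & 0xf0);
}

/* Nonzero for family 6, model 0x57 (Knights Landing) or 0x85
   (Knights Mill), mirroring the switch in init_node_mask.  */
static int
is_knl_or_knm (uint32_t eax)
{
  if (cpuid_family (eax) != 0x6)
    return 0;
  uint32_t model = cpuid_model (eax);
  return model == 0x57 || model == 0x85;
}
```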

Here is my glibc prototype.

-- 
H.J.
diff --git a/csu/init-first.c b/csu/init-first.c
index 099e7bc..c7b8f1f 100644
--- a/csu/init-first.c
+++ b/csu/init-first.c
@@ -75,6 +75,10 @@ _init (int argc, char **argv, char **envp)
   /* First the initialization which normally would be done by the
      dynamic linker.  */
   _dl_non_dynamic_init ();
+
+# ifdef INIT_MBIND
+  INIT_MBIND (argv[0], _dl_phdr, _dl_phnum, 0);
+# endif
 #endif
 
 #ifdef VDSO_SETUP
diff --git a/elf/dl-init.c b/elf/dl-init.c
index 5c5f3de..7bd6af6 100644
--- a/elf/dl-init.c
+++ b/elf/dl-init.c
@@ -35,6 +35,14 @@ call_init (struct link_map *l, int argc, char **argv, char **env)
      dependency.  */
   l->l_init_called = 1;
 
+#ifdef INIT_MBIND
+  if (l->l_phdr)
+    {
+      const char *name = l->l_name[0] == '\0' ? argv[0] : l->l_name;
+      INIT_MBIND (name, l->l_phdr, l->l_phnum, l->l_addr);
+    }
+#endif
+
   /* Check for object which constructors we do not run here.  */
   if (__builtin_expect (l->l_name[0], 'a') == '\0'
       && l->l_type == lt_executable)
diff --git a/elf/dl-support.c b/elf/dl-support.c
index 3c46a7a..aa240c4 100644
--- a/elf/dl-support.c
+++ b/elf/dl-support.c
@@ -385,3 +385,7 @@ _dl_non_dynamic_init (void)
 #ifdef DL_SYSINFO_IMPLEMENTATION
 DL_SYSINFO_IMPLEMENTATION
 #endif
+
+#ifdef INIT_MBIND
+# include <setup-mbind.c>
+#endif
diff --git a/elf/elf.h b/elf/elf.h
index 6d3b356..a743cda 100644
--- a/elf/elf.h
+++ b/elf/elf.h
@@ -728,6 +728,11 @@ typedef struct
 #define PT_LOPROC	0x70000000	/* Start of processor-specific */
 #define PT_HIPROC	0x7fffffff	/* End of processor-specific */
 
+/* GNU mbind segments */
+#define PT_GNU_MBIND_NUM	4096
+#define PT_GNU_MBIND_LO		0x6474e555
+#define PT_GNU_MBIND_HI		(PT_GNU_MBIND_LO + PT_GNU_MBIND_NUM - 1)
+
 /* Legal values for p_flags (segment flags).  */
 
 #define PF_X		(1 << 0)	/* Segment is executable */
diff --git a/sysdeps/unix/sysv/linux/x86/ldsodefs.h b/sysdeps/unix/sysv/linux/x86/ldsodefs.h
new file mode 100644
index 0000000..1b1c1f8
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86/ldsodefs.h
@@ -0,0 +1,26 @@
+/* Run-time dynamic linker data structures for loaded ELF shared objects.  x86
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef	_LDSODEFS_H
+
+/* Get the real definitions.  */
+#include_next <ldsodefs.h>
+
+#include <init-mbind.h>
+
+#endif /* ldsodefs.h */
diff --git a/sysdeps/x86/Makefile b/sysdeps/x86/Makefile
index 0d0326c2..7887a3c 100644
--- a/sysdeps/x86/Makefile
+++ b/sysdeps/x86/Makefile
@@ -3,7 +3,7 @@ gen-as-const-headers += cpu-features-offsets.sym
 endif
 
 ifeq ($(subdir),elf)
-sysdep-dl-routines += dl-get-cpu-features
+sysdep-dl-routines += dl-get-cpu-features setup-mbind
 
 tests += tst-get-cpu-features
 tests-static += tst-get-cpu-features-static
diff --git a/sysdeps/x86/Versions b/sysdeps/x86/Versions
index e029237..a627762 100644
--- a/sysdeps/x86/Versions
+++ b/sysdeps/x86/Versions
@@ -1,5 +1,8 @@
 ld {
   GLIBC_PRIVATE {
     __get_cpu_features;
+
+    # Set up special memory.
+    __gnu_mbind_setup;
   }
 }
diff --git a/sysdeps/x86/init-mbind.h b/sysdeps/x86/init-mbind.h
new file mode 100644
index 0000000..b881fdf
--- /dev/null
+++ b/sysdeps/x86/init-mbind.h
@@ -0,0 +1,46 @@
+/* This file is part of the GNU C Library.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <unistd.h>
+#include <libintl.h>
+#include <setup-mbind.h>
+
+static inline void
+init_mbind (const char *filename, const ElfW(Phdr) *phdr, size_t phnum,
+	    ElfW(Addr) load)
+{
+  ElfW(Addr) pagesize = GLRO(dl_pagesize);
+  for (; phnum; phnum--, phdr++)
+    if (phdr->p_type >= PT_GNU_MBIND_LO
+	&& phdr->p_type <= PT_GNU_MBIND_HI)
+      {
+	ElfW(Addr) start = phdr->p_vaddr;
+	if (pagesize > phdr->p_align
+	    || (start & (pagesize - 1)) != 0)
+	  _dl_fatal_printf (N_("%s: invalid PT_GNU_MBIND segment\n"),
+			    filename);
+
+	int error_code = __gnu_mbind_setup (phdr->p_type - PT_GNU_MBIND_LO,
+					    (void *) (load + start),
+					    phdr->p_memsz);
+	if (error_code < 0)
+	  _dl_fatal_printf (N_("__gnu_mbind_setup failed on file %s: error 0x%x\n"),
+			    filename, -error_code);
+      }
+}
+
+#define INIT_MBIND init_mbind
diff --git a/sysdeps/x86/setup-mbind.c b/sysdeps/x86/setup-mbind.c
new file mode 100644
index 0000000..d235b2e
--- /dev/null
+++ b/sysdeps/x86/setup-mbind.c
@@ -0,0 +1,27 @@
+/* This file is part of the GNU C Library.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#include <setup-mbind.h>
+
+int weak_function
+__gnu_mbind_setup (unsigned int type __attribute__ ((unused)),
+		   void *addr __attribute__ ((unused)),
+		   size_t length __attribute__ ((unused)))
+{
+  return 0;
+}
diff --git a/sysdeps/x86/setup-mbind.h b/sysdeps/x86/setup-mbind.h
new file mode 100644
index 0000000..f26972f
--- /dev/null
+++ b/sysdeps/x86/setup-mbind.h
@@ -0,0 +1,21 @@
+/* This file is part of the GNU C Library.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stddef.h>
+
+/* Set up special memory.  */
+extern int weak_function __gnu_mbind_setup (unsigned int, void *, size_t);
diff --git a/sysdeps/x86_64/localplt.data b/sysdeps/x86_64/localplt.data
index 014a9f4..8f4d47c 100644
--- a/sysdeps/x86_64/localplt.data
+++ b/sysdeps/x86_64/localplt.data
@@ -11,6 +11,7 @@ libc.so: realloc + RELA R_X86_64_GLOB_DAT
 libm.so: matherr
 # The main malloc is interposed into the dynamic linker, for
 # allocations after the initial link (when dlopen is used).
+ld.so: __gnu_mbind_setup + RELA R_X86_64_GLOB_DAT
 ld.so: malloc + RELA R_X86_64_GLOB_DAT
 ld.so: calloc + RELA R_X86_64_GLOB_DAT
 ld.so: realloc + RELA R_X86_64_GLOB_DAT
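To see which segments a running process would hand to __gnu_mbind_setup without going through ld.so, the PT_GNU_MBIND range check from init_mbind can be reproduced in user space with dl_iterate_phdr. This is a minimal sketch, assuming a glibc-based system (for <link.h> and the ElfW macro); it only prints the (type, address, length) triples and does not call __gnu_mbind_setup. The names report_mbind_segments and count_mbind_segments are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Values from the elf/elf.h hunk of the patch.  */
#ifndef PT_GNU_MBIND_NUM
# define PT_GNU_MBIND_NUM 4096
#endif
#ifndef PT_GNU_MBIND_LO
# define PT_GNU_MBIND_LO 0x6474e555
#endif
#ifndef PT_GNU_MBIND_HI
# define PT_GNU_MBIND_HI (PT_GNU_MBIND_LO + PT_GNU_MBIND_NUM - 1)
#endif

/* dl_iterate_phdr callback: print each PT_GNU_MBIND segment of one loaded
   object with the arguments the prototype would pass to
   __gnu_mbind_setup, and count them in *DATA.  */
static int
report_mbind_segments (struct dl_phdr_info *info, size_t size, void *data)
{
  size_t *count = data;
  (void) size;

  for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++)
    {
      const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
      if (ph->p_type < PT_GNU_MBIND_LO || ph->p_type > PT_GNU_MBIND_HI)
        continue;
      /* The main program usually has an empty name, which is why the
         patch substitutes argv[0] for it.  */
      printf ("%s: type %u at %p, %zu bytes\n",
              info->dlpi_name[0] != '\0' ? info->dlpi_name : "(main program)",
              (unsigned int) (ph->p_type - PT_GNU_MBIND_LO),
              (void *) (uintptr_t) (info->dlpi_addr + ph->p_vaddr),
              (size_t) ph->p_memsz);
      ++*count;
    }
  return 0;   /* Keep iterating over the remaining objects.  */
}

/* Return the number of PT_GNU_MBIND segments in the running process.  */
static size_t
count_mbind_segments (void)
{
  size_t count = 0;
  dl_iterate_phdr (report_mbind_segments, &count);
  return count;
}
```

An ordinary program built without any special memory sections has no PT_GNU_MBIND segments, so count_mbind_segments returns 0 and nothing is printed.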