This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC] improve tls access for tolower table and errno
- From: Florian Weimer <fweimer at redhat dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>, libc-alpha at sourceware dot org
- Cc: "H.J. Lu" <hjl dot tools at gmail dot com>
- Date: Sun, 07 Jun 2015 16:14:02 +0200
- Subject: Re: [RFC] improve tls access for tolower table and errno
- Authentication-results: sourceware.org; auth=none
- References: <20150606124002 dot GA11322 at domone>
On 06/06/2015 02:40 PM, Ondřej Bílka wrote:
> Hi, as I mentioned before, an inline strcasecmp would be problematic, as
> it needs a call to tolower, which is suboptimal.
>
> On architectures with a tls register you don't need to make a call for tls
> access, but start small.
Making the offset part of the ABI (like we do for the stack canary) has
been discussed before:
<https://sourceware.org/ml/libc-alpha/2015-03/msg00132.html>
> A sample implementation follows. Where should I add the
> initializer, and is there another way to get %fs than assembly?
>
> #include <errno.h>
> #include <stdio.h>
>
> static long __errno_offset;
> __attribute__((constructor))
> void get_offset (void)
> {
>   char *offset;
>   char *location = (char *) &errno;
>   __asm__ ("mov %%fs:0, %0" : "=r" (offset));
>
>   __errno_offset = location - offset;
> }
>
> static __always_inline
> int *
> __ep (void)
> {
>   char *__offset;
>   __asm__ ("mov %%fs:0, %0" : "=r" (__offset));
>
>   return (int *)(__offset + __errno_offset);
> }
>
> #define errno2 (*__ep())
Constructor functions in header files are a nightmare. C++ has
something similar for <iostream>, and the overhead from that is
substantial. Many projects ban inclusion of <iostream> as a result.

The problem remains that errno is mostly used on error paths, and

        call    __errno_location
        movl    (%rax), %eax

is much shorter than

        movq    __errno_offset(%rip), %rax
        movq    %fs:0, %rdx
        movl    (%rax, %rdx), %eax

(7 versus 19 bytes). On paths which are supposed to be executed rarely,
this is not desirable. There might be some wins because less spilling
is needed, but this seems rather theoretical because in most cases, the
__errno_location call clobbers registers which have been clobbered by
the preceding function call that failed. Therefore, I don't expect wins
on this front, either.
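
For reference, the two access paths compared above can be put side by side
in a small self-contained C sketch. This is only an illustration, not a
proposed glibc interface: thread_pointer, errno_offset, init_errno_offset,
errno_via_call, errno_via_offset and errno_paths_agree are hypothetical
names, and the x86-64 branch assumes Linux with glibc, where %fs:0 holds the
thread pointer. Other targets fall back to the compiler's
__builtin_thread_pointer, which is assumed to be available there.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: return the thread pointer.  On x86-64 Linux the
   word at %fs:0 is the thread pointer itself. */
static inline char *thread_pointer(void)
{
#if defined(__x86_64__)
    char *tp;
    __asm__ ("mov %%fs:0, %0" : "=r" (tp));
    return tp;
#else
    /* Assumption: the compiler provides this builtin for the target. */
    return (char *) __builtin_thread_pointer();
#endif
}

/* Distance from the thread pointer to errno (hypothetical name).
   Assumes errno lives in static TLS, so the distance is the same
   in every thread. */
static ptrdiff_t errno_offset;

/* Short path: on glibc, errno expands to (*__errno_location ()). */
static inline int *errno_via_call(void)
{
    return &errno;
}

/* Long path: thread pointer plus a stored offset. */
static inline int *errno_via_offset(void)
{
    return (int *) (thread_pointer() + errno_offset);
}

/* Explicit initializer instead of a constructor in a header.  The
   subtraction relies on implementation behaviour: ISO C does not define
   pointer arithmetic across unrelated objects. */
static void init_errno_offset(void)
{
    errno_offset = (char *) &errno - thread_pointer();
    assert(errno_via_offset() == &errno);
}

/* Returns 1 if both paths reach the same errno object. */
static int errno_paths_agree(void)
{
    init_errno_offset();
    errno = ERANGE;
    return errno_via_call() == errno_via_offset()
           && *errno_via_offset() == ERANGE;
}
```

Both paths return the same address; the difference under discussion is only
the size of the code emitted at each use.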

With the thread locale, performance concerns are different, but the
constructor issue is still valid.

Furthermore, future C++ versions may make caching the addresses of
thread-local variables invalid, so we should wait until the fate and
shape of resumable functions and coroutines are decided.
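
To illustrate why a cached address is tied to one thread, here is a
minimal sketch. It assumes POSIX threads and may need -pthread on older
toolchains. The names record_errno_address and errno_address_is_per_thread
are hypothetical. It only shows that &errno differs between threads; it does
not model coroutines, but code that resumed on another thread with an
address computed earlier would be looking at the wrong thread's errno.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Thread body: store the address of this thread's errno as an integer,
   so the value can still be compared after the thread has exited. */
static void *record_errno_address(void *out)
{
    *(uintptr_t *) out = (uintptr_t) &errno;
    return NULL;
}

/* Returns 1 if the worker thread saw a different errno address than the
   calling thread, 0 if the addresses matched, -1 if no thread could be
   created. */
static int errno_address_is_per_thread(void)
{
    uintptr_t here = (uintptr_t) &errno;
    uintptr_t there = 0;
    pthread_t worker;

    if (pthread_create(&worker, NULL, record_errno_address, &there) != 0)
        return -1;
    pthread_join(worker, NULL);

    assert(there != 0);   /* the worker ran and recorded an address */
    return here != there;
}
```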
--
Florian Weimer / Red Hat Product Security