This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: What's the best solution for getting me a cancellable FUTEX_WAIT?


On Wed, Aug 06, 2014 at 05:09:24AM +0000, Steven Stewart-Gallus wrote:
> > Yes. It's a bad idea to be using syscalls directly in general. Often
> > they have arch-specific quirks which applications can't be expected to
> > get right for all archs (e.g. differing argument orders).
> 
> I don't see what badly designed kernel APIs have to do with this topic
> though.

They make it problematic for applications to make syscalls directly,
since the application is then unlikely to be portable to all archs.

> > For some syscalls, libc needs to be aware of them happening since
> > they might affect state it has cached, or needs to modify the call
> > in subtle ways (e.g. adding flags, translating userspace/kernel
> > struct definitions, etc.), and often this is non-obvious to the
> > caller.
> 
> Sure for SOME system calls it makes sense for only GLibc to use
> them. But a whole lot of system calls are useful without being wrapped
> by GLibc.

Are you volunteering to document that list? :)

> > they might affect state it has cached
> 
> I consider most of these cases to be bugs in GLibc. They cause race
> conditions when the cache cannot be atomically updated when the
> resource changes (for example, fork, a signal and getpid). They cause
> pain when users go behind libc's back (invoking clone directly to use
> flags like CLONE_NEWUSER).

Using clone() "directly" with CLONE_NEWUSER should work as long as you
don't use CLONE_VM. Updating of the cached pid takes place automatically
in glibc. Making the syscall yourself is a very bad idea because clone
has one of the most notoriously arch-specific call signatures.
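To make the clone() point concrete, here is a minimal sketch assuming
Linux and glibc's clone() wrapper (run_in_new_userns and child_fn are
hypothetical names). The child only checks that getpid() reports its
own pid rather than the parent's. Where unprivileged user namespaces
are disabled, clone() fails (typically with EPERM) and the helper
returns -1.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Set before clone(); without CLONE_VM the child gets its own copy. */
static pid_t parent_pid;

/* Child: exit 0 if getpid() reports the child's own pid (not a stale
   copy of the parent's) and getppid() reports the parent. */
static int child_fn(void *arg)
{
    (void)arg;
    return (getpid() != parent_pid && getppid() == parent_pid) ? 0 : 1;
}

/* Hypothetical helper: runs child_fn in a new user namespace through
   glibc's clone() wrapper, with no CLONE_VM. Returns the child's exit
   status (0 means the pid checks passed), or -1 if clone() or
   waitpid() failed. */
int run_in_new_userns(void)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    parent_pid = getpid();
    /* Pass the top of the allocation: the stack grows down on most archs. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                      CLONE_NEWUSER | SIGCHLD, NULL);
    int status = 0;
    int ok = pid != -1 && waitpid(pid, &status, 0) == pid;
    int saved_errno = errno;
    free(stack);
    errno = saved_errno;

    if (!ok || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A result of 1 would mean the child saw a stale pid; -1 just means the
environment refused the new namespace.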

> > adding flags
> 
> While occasionally there are flags like O_LARGEFILE that are added
> transparently, most of the time there aren't, and the flags usually only
> enhance functionality and do not prevent the functionality from
> working in a reduced capacity.

Not the case. A good example of this and other issues below is
SYS_rt_sigaction. Not only does the struct vary by arch; it also
differs between userspace and kernelspace, and on some archs, you MUST
add SA_RESTORER and set pointers to functions written in asm that make
a SYS_sigreturn or SYS_rt_sigreturn syscall after adjusting the stack
pointer appropriately. This is NOT the kind of code that belongs in
individual applications.
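A minimal sketch of the userspace/kernel mismatch, assuming Linux with
glibc or musl (install_via_libc and raw_query_errno are hypothetical
helpers). It also assumes _NSIG / 8 equals the kernel's sigset size,
which holds on common targets. The raw call fails with EINVAL when
handed the userspace sizeof(sigset_t), because the kernel checks the
size against its own, smaller sigset_t. The libc route needs none of
that knowledge.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int sig)
{
    (void)sig;
    got_usr1 = 1;
}

/* Portable route: libc's sigaction() translates the struct and, where
   the arch needs it, supplies SA_RESTORER and the sigreturn trampoline.
   Returns 0 if the handler ran. */
int install_via_libc(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;
    if (raise(SIGUSR1) != 0)   /* handler runs before raise() returns */
        return -1;
    return got_usr1 ? 0 : -1;
}

/* Raw route, query only (both struct pointers NULL): returns the errno
   from SYS_rt_sigaction for the given sigset size, or 0 on success. */
int raw_query_errno(size_t sigsetsize)
{
    errno = 0;
    long r = syscall(SYS_rt_sigaction, SIGUSR1, NULL, NULL, sigsetsize);
    return r == -1 ? errno : 0;
}
```

The raw call here only queries with NULL pointers; actually installing
a handler this way would also require the kernel's struct layout and,
on some archs, SA_RESTORER with an asm trampoline.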

Actually sigaction is also potentially an example of the above
(caching), since there's a proposal for glibc to wrap signal handlers
to save/restore errno and possibly other state. Going around libc and
calling SYS_rt_sigaction yourself would of course break this. And
perhaps more importantly, even now, if you happened to call
SYS_rt_sigaction with one of the signal numbers reserved for internal
use by the threads implementation, you could badly break multithreaded
code in the same process.
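For the reserved-signal hazard, the safe pattern is to number an
application's realtime signals from SIGRTMIN, which libc evaluates at
run time, rather than from a hard-coded 32. A minimal sketch, with
app_rt_signal as a hypothetical helper name. On current glibc,
SIGRTMIN typically evaluates to 34 because the threads implementation
keeps the first two realtime signals for itself.

```c
#define _GNU_SOURCE
#include <signal.h>

/* Hypothetical helper: returns the n-th realtime signal available to
   the application, or -1 if there are not that many. SIGRTMIN already
   skips any signals libc reserves for internal use. */
int app_rt_signal(int n)
{
    if (n < 0 || SIGRTMIN + n > SIGRTMAX)
        return -1;
    return SIGRTMIN + n;
}
```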

Basically, I think there's a huge volume of knowledge needed to make
direct syscalls from applications "safe"; at present it's
undocumented, and documenting it tends to lock in current libc
implementation choices that may turn out to be bad in the future
(by officially "allowing" certain direct syscall usage that would
admit a better libc implementation if it were disallowed).

> > translating userspace/kernel struct definitions
> 
> I'll give you this case but it still leaves a lot of functionality
> usable via syscall.
> 
> > often this is non-obvious to the caller.
> 
> Then please file a bug with the kernel man pages or dedicate some time
> to improving them yourself.

I think the burden to document rests on whoever wants this to be a
documented public interface, not on those who say it shouldn't be.

> > Also the whole syscall() API is incompatible with x32 and other
> > archs/ABIs where syscall arguments are not "long", and it encodes an
> > ugly kernel implementation detail (use of "long" for everything)
> > into application code.
> 
> syscall should really use intptr_t for its arguments and
> results. That would be backward compatible with existing platforms and
> should solve the issue for x32.

On ILP32 and LP64 models (which are always used on Linux), intptr_t
has the same size/range/representation as long, so that does not solve
the problem.
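The size argument can be checked at compile time. This is a sketch
assuming a Linux ILP32 or LP64 target; syscall_slot_holds_int64 is a
hypothetical name. On x32 the function returns 0 even though the
kernel's argument registers are 64 bits wide, and switching the type
to intptr_t would not change that.

```c
#include <limits.h>
#include <stdint.h>

/* On ILP32 and LP64, intptr_t and long have the same size and range. */
_Static_assert(sizeof(intptr_t) == sizeof(long),
               "intptr_t and long differ in size");
_Static_assert(INTPTR_MAX == LONG_MAX && INTPTR_MIN == LONG_MIN,
               "intptr_t and long differ in range");

/* Returns 1 if a single long-typed syscall() argument slot can carry a
   64-bit value such as a 64-bit off_t; 0 on ILP32 ABIs such as x32. */
int syscall_slot_holds_int64(void)
{
    return sizeof(long) >= sizeof(int64_t);
}
```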

> > For these reasons, [...]
> 
> I have no comments on this part
> 
> > I would prefer that the public futex function take "void *volatile"
> > rather than "int *" so that it can be used with _Atomic int objects
> > and volatile ints, too.
> 
> I sort of agree, because it doesn't really need a signed integer and
> works with unsigned integers too, but does the system call have
> alignment requirements?

Yes, the system call does have alignment requirements (the futex word
must be 4-byte aligned). However, that could be an additional
documented part of the API rather than (non-)enforced via the type
system. I hadn't even thought of unsigned,
but of course that's a really good point too; in practice, the data
usually is unsigned (e.g. using 0x80000000 as a flag rather than a
sign bit) even when the type is nominally signed. In principle you
could also use futex with pointers on ILP32 targets, but of course
this would be bad since the code would not be portable to 64-bit...
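A sketch of what such an interface could look like, assuming Linux and
a 32-bit futex word. hypothetical_futex_wait is a made-up name, not an
existing glibc function. It accepts int, unsigned, volatile or _Atomic
32-bit objects through volatile void * and checks the 4-byte alignment
at run time instead of through the type. It does nothing to make the
wait a cancellation point.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: waits while the 32-bit word at addr equals
   expected. Returns 0 when woken, or -1 with errno set: EINVAL for a
   misaligned address (checked here), EAGAIN if *addr != expected. */
int hypothetical_futex_wait(volatile void *addr, unsigned expected)
{
    if ((uintptr_t)addr % 4 != 0) {
        errno = EINVAL;
        return -1;
    }
    /* No timeout; the kernel compares the word as an unsigned 32-bit value. */
    return (int)syscall(SYS_futex, addr, FUTEX_WAIT, (long)expected, NULL);
}
```

Calling it on a word holding 0x80000000 with expected == 0 returns -1
with EAGAIN immediately, without needing a signed type.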

Rich

