This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Remove signal handling for nanosleep (bug 16364)



On 09-11-2015 15:18, Florian Weimer wrote:
> On 11/09/2015 05:30 PM, Adhemerval Zanella wrote:
> 
>> I do not think this check is strictly required: the other failures (EFAULT and
>> EINVAL) should also indicate something very flaky on the system.
> 
> We should always check for unexpected errors.  We might have been
> overlooking something.
> 

Fair enough, I will change that.

>>> The test is racy in two ways: the child could exit before nanosleep in
>>> the parent starts, or the child exit could be delayed after nanosleep in
>>> the parent ends.  I'm not sure if there is way to make this more reliable.
>>
>> I am aware of that, that's why I try to mitigate this by making the time on
>> parent twice for the child and some magnitude higher than the syscalls itself.
>> However I also can't see a way to make this test entirely reliable: even by
>> doing some synchronization (either by shared semaphores or pthread barrier)
>> and/or sending the SIGCHLD directly using signal the two race scenarios you 
>> describe will still have a small window to occur.  I am not sure which 
>> strategy will be better and I think we should not rely or add hacks to try
>> to mitigate for such kernel failures.
> 
> I think you could make it more likely to hit the window if you forked
> several child processes.

Or if the system load is high enough.  My question is if we should push or
not this test even with this racy condition and if so which strategy would
be better to mitigate it.

> 
>>> The larger question is whether the EINTR check is sufficient, or if a
>>> time-based check is needed as well.  That is, if the kernel bug
>>> consistent of silent early termination of nanosleep.
>>
>> My understanding is on old kernels the nanosleep calls was not restarted
>> in a nanosleep call (with the restart_syscall), so it nanosleep will
>> early terminate.
> 
> To be sure, you should check how much time has elapsed in the nanosleep
> calls, with clock_gettime(CLOCK_MONOTONIC).  This would cover both the
> -1/EINTR case and the return value 0 case.  The existing code does not
> look at EINTR, so if there was a bug, it was the return value 0 scenario.

For nanosleep check itself, glibc already have posix/tst-nanosleep.c 
(which uses gettimeofday.c).

However my main questioning and the reason of this bikeshedding regarding
is if GLIBC should continue to provide a hack for non-compliant kernels
and if we should really check for this issue with a possible racy testcase.

> 
> Florian
> 
> 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]