This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Remove signal handling for nanosleep (bug 16364)

From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
To: Florian Weimer <fweimer at redhat dot com>
Cc: libc-alpha at sourceware dot org
Date: Mon, 9 Nov 2015 15:43:17 -0200
Subject: Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
Authentication-results: sourceware.org; auth=none
References: <1447023171-31542-1-git-send-email-adhemerval dot zanella at linaro dot com> <56405109 dot 9070404 at redhat dot com> <56408F14 dot 1040600 at linaro dot org> <56409561 dot 7050707 at redhat dot com> <5640A89D dot 80804 at linaro dot org> <5640AD98 dot 5030105 at redhat dot com> <5640CA33 dot 1040608 at linaro dot org> <5640D57C dot 8050300 at redhat dot com>


On 09-11-2015 15:18, Florian Weimer wrote:
> On 11/09/2015 05:30 PM, Adhemerval Zanella wrote:
> 
>> I do not think this check is strictly required: the other failures (EFAULT and
>> EINVAL) should also indicate something very flaky on the system.
> 
> We should always check for unexpected errors.  We might have been
> overlooking something.
> 

Fair enough, I will change that.

>>> The test is racy in two ways: the child could exit before nanosleep in
>>> the parent starts, or the child exit could be delayed after nanosleep in
>>> the parent ends.  I'm not sure if there is way to make this more reliable.
>>
>> I am aware of that, that's why I try to mitigate this by making the time on
>> parent twice for the child and some magnitude higher than the syscalls itself.
>> However I also can't see a way to make this test entirely reliable: even by
>> doing some synchronization (either by shared semaphores or pthread barrier)
>> and/or sending the SIGCHLD directly using signal the two race scenarios you 
>> describe will still have a small window to occur.  I am not sure which 
>> strategy will be better and I think we should not rely or add hacks to try
>> to mitigate for such kernel failures.
> 
> I think you could make it more likely to hit the window if you forked
> several child processes.

Or if the system load is high enough.  My question is if we should push or
not this test even with this racy condition and if so which strategy would
be better to mitigate it.

> 
>>> The larger question is whether the EINTR check is sufficient, or if a
>>> time-based check is needed as well.  That is, if the kernel bug
>>> consistent of silent early termination of nanosleep.
>>
>> My understanding is on old kernels the nanosleep calls was not restarted
>> in a nanosleep call (with the restart_syscall), so it nanosleep will
>> early terminate.
> 
> To be sure, you should check how much time has elapsed in the nanosleep
> calls, with clock_gettime(CLOCK_MONOTONIC).  This would cover both the
> -1/EINTR case and the return value 0 case.  The existing code does not
> look at EINTR, so if there was a bug, it was the return value 0 scenario.

For nanosleep check itself, glibc already have posix/tst-nanosleep.c 
(which uses gettimeofday.c).

However my main questioning and the reason of this bikeshedding regarding
is if GLIBC should continue to provide a hack for non-compliant kernels
and if we should really check for this issue with a possible racy testcase.

> 
> Florian
> 
>

References:
- [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Adhemerval Zanella
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Florian Weimer
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Adhemerval Zanella
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Florian Weimer
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Adhemerval Zanella
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Florian Weimer
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Adhemerval Zanella
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Florian Weimer

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]