This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Remove signal handling for nanosleep (bug 16364)

From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
To: Florian Weimer <fweimer at redhat dot com>
Cc: libc-alpha at sourceware dot org
Date: Mon, 9 Nov 2015 14:30:43 -0200
Subject: Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
Authentication-results: sourceware.org; auth=none
References: <1447023171-31542-1-git-send-email-adhemerval dot zanella at linaro dot com> <56405109 dot 9070404 at redhat dot com> <56408F14 dot 1040600 at linaro dot org> <56409561 dot 7050707 at redhat dot com> <5640A89D dot 80804 at linaro dot org> <5640AD98 dot 5030105 at redhat dot com>


On 09-11-2015 12:28, Florian Weimer wrote:
> On 11/09/2015 03:07 PM, Adhemerval Zanella wrote:
> 
>>> Could you write a glibc test case for this?  I don't feel comfortable
>>> with removing the workaround without a test, and I can't find an
>>> existing one.
>>
>> What about:
>>
>> 	[BZ #16364]
>> 	* sysdeps/unix/sysv/linux/sleep.c (__sleep): Simplify code to use
>> 	nanosleep without requiring to handle spurious wakeups.
>> 	* posix/tst-nanosleep-signal.c: New file.
>> 	* posix/Makefile (test): Add tst-nanosleep-signal.
> 
> Thanks, but see below for details.
> 
>> +  int ret;
>> +  pid_t f = fork ();
>> +  if (f == 0)
>> +    {
>> +      // child
>> +      struct timespec tv = { .tv_sec = 0, .tv_nsec = 250000000UL };
>> +      nanosleep (&tv, &tv);
>> +      exit (0);
> 
> 
> This needs to be “_exit (0);”, to avoid running termination handlers twice.
> 

I will change it.

>> +    }
>> +  else
>> +    {
>> +      // parent
>> +      struct timespec tv = { .tv_sec = 0, .tv_nsec = 500000000UL };
>> +      ret = nanosleep (&tv, &tv);
>> +    }
>> +  return ret;
>> +}
> 
> The parent needs to check for fork errors.
> 

Right, I will change it.

>> +
>> +void
>> +check_nanosleep_sigdfl(void)
> 
> Space before paren.
> 
> 
>> +{
>> +  /* For SIG_DFL nanosleep should not be interrupted by SIGCHLD.  */
>> +  signal(SIGCHLD, SIG_DFL);
> 
> Space before paren.
> 
>> +  if (check_nanosleep_base ())
>> +    {
>> +      puts ("error: nanosleep was interrupted with SIG_DFL");
>> +      errors = 1;
>> +    }
>> +}
> 
> This should check that errno is EINTR.

I do not think this check is strictly required: the other failures (EFAULT and
EINVAL) should also indicate something very flaky on the system.

> 
>> +
>> +void
>> +check_nanosleep_dummy(void)
> 
> Space before paren.
> 
>> +{
>> +  /* With a dummy handler nanosleep should be interrupted.  */
>> +  signal(SIGCHLD, dummy);
> 
> Space before paren.
> 
>> +  int ret = check_nanosleep_base ();
>> +  if (ret == 0)
>> +    {
>> +      puts ("error: nanosleep was not interrupted with dummy handler");
>> +      errors = 1;
>> +    }
>> +}
> 
> Missing EINTR check.

Also the idea here is to indicate any failure for nanosleep (same as before).

> 
>> +
>> +void
>> +check_nanosleep_sigign(void)
> 
> Space before paren.
> 
>> +{
>> +  /* For SIG_IGN nanosleep should not be interrupted by SIGCHLD.  */
>> +  signal(SIGCHLD, SIG_IGN);
> 
> Space before paren.
> 
> The test is racy in two ways: the child could exit before nanosleep in
> the parent starts, or the child exit could be delayed after nanosleep in
> the parent ends.  I'm not sure if there is way to make this more reliable.

I am aware of that, that's why I try to mitigate this by making the time on
parent twice for the child and some magnitude higher than the syscalls itself.
However I also can't see a way to make this test entirely reliable: even by
doing some synchronization (either by shared semaphores or pthread barrier)
and/or sending the SIGCHLD directly using signal the two race scenarios you 
describe will still have a small window to occur.  I am not sure which 
strategy will be better and I think we should not rely or add hacks to try
to mitigate for such kernel failures.

> 
> The larger question is whether the EINTR check is sufficient, or if a
> time-based check is needed as well.  That is, if the kernel bug
> consistent of silent early termination of nanosleep.

My understanding is on old kernels the nanosleep calls was not restarted
in a nanosleep call (with the restart_syscall), so it nanosleep will
early terminate.

> 
> Florian
>

Follow-Ups:
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Florian Weimer

References:
- [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Adhemerval Zanella
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Florian Weimer
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Adhemerval Zanella
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Florian Weimer
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Adhemerval Zanella
- Re: [PATCH] Remove signal handling for nanosleep (bug 16364)
  - From: Florian Weimer

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]