This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v3] getrandom system call wrapper [BZ #17252]


On Wed, 2016-10-12 at 17:58 +0200, Florian Weimer wrote:
> On 09/12/2016 09:25 AM, Florian Weimer wrote:
> > On 09/09/2016 05:23 PM, Torvald Riegel wrote:
> >> On Fri, 2016-09-09 at 16:28 +0200, Florian Weimer wrote:
> >>> On 09/09/2016 04:21 PM, Torvald Riegel wrote:
> >>>> On Thu, 2016-09-08 at 13:44 +0200, Florian Weimer wrote:
> >>>>> I have made the system call wrapper a cancellation point.  (If we
> >>>>> implement the simpler getentropy interface, it would not be a
> >>>>> cancellation point.)
> >>>>
> >>>> Why did you do that?
> >>>
> >>> I have to, because it can block indefinitely.
> >>
> >> That doesn't mean you have to make the default function a cancellation
> >> point.  There are many POSIX functions which can block indefinitely and
> >> which are not required to be cancellation points (eg, rwlocks only *may*
> >> be cancellation points).
> >>
> >> Can the system call really block indefinitely, or only for a long time
> >> and (ie, will return eventually)?
> >
> > Yes, if the system enters a deadlock condition where the waiting for
> > randomness prevents it from accumulating additional randomness.
> 
> This is what happens here:
> 
>    <https://bugzilla.redhat.com/show_bug.cgi?id=1383060>
> 
> systemd will eventually kill the blocked process and the boot continues, 
> but all network services will be missing.

I don't see how cancellation would be the best solution for this
problem.  It could be considered *a* solution, but for it to work, the
program (or Python in this case) needs to
(1) be aware of the blocking behavior, and of the possibility that
    getrandom has not been initialized yet,
(2) have a fallback plan for when getrandom() fails,
(3) run more than a single thread just to perform the cancellation, and
(4) synchronize between the thread that executes getrandom() and the
    cancelling thread so that pthread_cancel can cancel only the
    respective getrandom() call; for example, the getrandom thread could
    set/unset a flag in a critical section, and the cancelling thread
    would check the flag and cancel in a critical section protected by
    the same lock.

All solutions have to be aware of (1).

Maybe doing the fallback plan from (2) is the right solution.  This is
what they seem to have done in the BZ you cited.

Having a getrandom call that can time out would be easier for programs
to use than cancellation.  It would still require picking the right
timeout (i.e., less than systemd's), but that's something the
cancellation-based solution would have to do too.

Cancellation would be a more flexible means than a timeout if systemd
announced to the program that it will be killed in 5s or something like
that, so that the program could try to hurry up and cancel the
getrandom call.  But that seems like a complex and rather rare use
case, and thus doesn't suggest to me that a cancellable getrandom
should be the default.
Instead, using a provided getrandom_cancellable wrapper seems like the
right thing, and trivial compared to all the other complexity the
program has to deal with.

