This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: how to handle userspace string copy failures


Hi -

hunt wrote:

> [...]
> > We need to investigate to what extent this problem can be worked
> > around in other clever ways.  For example, can we arrange to
> > preemptively fault in more parts of programs when systemtap probes are
> > running?
> 
> We should do that, but it doesn't solve the short-term problem and it
> will not solve the problem for the long-term unless we find a way to
> always fulfill userspace copies,

"The problem" is not that some userspace pages will be inaccessible.
It is that so many interesting ones seem to be suddenly inaccessible
right now.


> > > and such failures should not terminate the script.
> > 
> > See the MAXERRORS parameter.
> 
> That would be for errors, which I do not consider this. At least it
> shouldn't be confused with real errors, like when an array overflows and
> no more data can be stored.

We do not have many kinds of "unusual condition" indications.  We have
soft errors, which are quiet and don't interrupt control flow, and
hard errors, which are noisy and do interrupt control flow.  I believe
there is no third category at the moment.

One could argue that array overflow could be turned into a soft error,
analogously to a string value that is too long and is quietly
truncated.  Whether it should be one or the other is a matter of
judgement: how much each kind of failure matters.


> > > At worst, I think we should print warnings.  I also propose that any
> > > user_string() request that fails should return "<unknown>".
> > 
> > I am uncomfortable with hard-coding such a decorated English term.  A
> > simple blank string would be fine.
> 
> blank strings do nothing to indicate that information was missing.

Nor does "<unknown>", except to an English-speaking user looking over
a literally transcribed output after the fact.  That's the point.


> [...]  What do you mean by "sentinel soft-error value"?

The term "sentinel value" comes from introductory computer science: a
special value intermingled into a data stream to identify an unusual
condition, like "9999" to end a list of input numbers.  A "soft-error
value" is a value that results from a soft error.  (*Some* legal value
must result, since control flow is uninterrupted, and thus a value
must be propagated into the enclosing expressions.)
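
To make the sentinel idea concrete, here is a minimal C sketch of the
"9999 ends the list" example.  The names END_OF_INPUT and
sum_until_sentinel are made up for illustration and are not part of
systemtap.

```c
#include <assert.h>
#include <stddef.h>

/* Sentinel: marks the end of the list; it is not itself a data value. */
#define END_OF_INPUT 9999

/* Sum the values up to (but not including) the sentinel.
   max_len bounds the scan in case the sentinel is missing. */
long sum_until_sentinel(const int *values, size_t max_len)
{
    assert(values != NULL);
    long total = 0;
    for (size_t i = 0; i < max_len && values[i] != END_OF_INPUT; i++)
        total += values[i];
    return total;
}
```

Note the usual weakness of a sentinel: 9999 can no longer appear as
ordinary data.  A soft-error string value has the same property, which
is why the choice of that value matters.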

> How would this work? [...]

We retain user_string(addr) just as it is now: a hard error if it
faults for whatever reason.  We add a new function
user_string_mayfail(addr,str), which quietly and softly returns the
str argument if the access faulted.  Your syscall tapset routines
would presumably use the second variant. (For bonus points, we support
overloading user_string() with one vs. two parameters.)
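
The intended difference between the two variants can be sketched in
plain C.  This is only a model of the semantics described above, not
systemtap runtime code: copy_user_string is a hypothetical stand-in
for the real userspace copy (a NULL source simulates a fault), and the
error counter merely suggests how a hard error might be tallied
against a limit such as MAXERRORS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a userspace copy.
   Returns 0 on success, -1 if the access "faults" (src == NULL here). */
static int copy_user_string(char *dst, size_t size, const char *src)
{
    if (src == NULL)
        return -1;
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';          /* long strings are quietly truncated */
    return 0;
}

/* Hard-error variant: a fault is counted and reported to the caller,
   which is expected to stop running the current probe handler. */
int user_string_hard(char *dst, size_t size, const char *src, int *error_count)
{
    assert(dst != NULL && size > 0 && error_count != NULL);
    if (copy_user_string(dst, size, src) != 0) {
        dst[0] = '\0';
        (*error_count)++;
        return -1;
    }
    return 0;
}

/* Soft variant: a fault is silent; the caller-supplied fallback string
   becomes the result and control flow continues normally. */
void user_string_mayfail(char *dst, size_t size, const char *src,
                         const char *fallback)
{
    assert(dst != NULL && size > 0 && fallback != NULL);
    if (copy_user_string(dst, size, src) != 0) {
        strncpy(dst, fallback, size - 1);
        dst[size - 1] = '\0';
    }
}
```

Because the fallback is an argument, the caller decides whether a
failed read shows up as "", "<unknown>", or something else, so no
particular string has to be hard-coded in the runtime.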


- FChE

