This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: how to handle userspace string copy failures


On Thu, 2006-05-25 at 15:21 -0400, Frank Ch. Eigler wrote: 
> hunt wrote:
> 
> > [...]  So, for the record, we cannot guarantee always being able to
> > always access userspace 
> 
> We need to investigate to what extent this problem can be worked
> around by clever other ways.  For example, can we arrange to
> preemptively fault in more parts of programs when systemtap probes are
> running?

We should do that, but it doesn't solve the short-term problem, and it
will not solve the long-term problem unless we find a way to always
fulfill userspace copies.

> > and such failures should not terminate the script.
> 
> See the MAXERRORS parameter.

That would be for errors, and I do not consider this an error. At least it
shouldn't be confused with real errors, such as when an array overflows and
no more data can be stored.

> > At worst, I think we should print warnings.  I also propose that any
> > user_string() request that fails should return "<unknown>".
> 
> I am uncomfortable with hard-coding such a decorated english term.  A
> simple blank string would be fine.

Blank strings do nothing to indicate that information was missing.

> I would be happier if the decision for treatment as a soft vs. hard
> error were left up to the caller script.  One way to do this would be
> to fork user_string() into two variants, one of which signals the
> current sort of error-level fault (as does kernel_string()), and one
> that just returns a sentinel soft-error value.  Hey, that sentinel
> value could even be passed to it as an additional argument.

What do you mean by "sentinel soft-error value"? How would this work?

I think we need to give error handling some more thought. Userspace
copy fails? Increment the error counter. Map overflows? Increment the
error counter. When the counter hits a user-defined threshold, terminate.
Any script doing userspace copies (which is almost all of them) needs to
raise MAXERRORS so it won't stop the first time a userspace copy doesn't
complete, but doing so also ignores serious errors. Also, MAXERRORS is a
define, so changing it means a recompile, which is a problem for
cross-compiled scripts.
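
To make the problem concrete, here is a rough model of a single shared
error counter, written in Python rather than in the runtime's C. The
class names are hypothetical, and the threshold rule (terminate once the
count exceeds the limit) is an assumption about how the limit behaves,
not a copy of the real code.

```python
class ScriptTerminated(Exception):
    """Stands in for the runtime shutting the script down."""


class ErrorCounter:
    """One shared counter for every kind of failure (hypothetical model)."""

    def __init__(self, max_errors=0):
        self.max_errors = max_errors  # plays the role of MAXERRORS
        self.count = 0

    def record(self, what):
        # Every failure, soft or serious, bumps the same counter.
        self.count += 1
        if self.count > self.max_errors:
            raise ScriptTerminated(f"too many errors, last: {what}")


# With a limit of 0, the first failed userspace copy ends the script.
strict = ErrorCounter(max_errors=0)
try:
    strict.record("userspace copy failed")
    stopped_on_first_copy = False
except ScriptTerminated:
    stopped_on_first_copy = True

# Raising the limit tolerates copy failures, but a map overflow is
# swallowed just as quietly.
lenient = ErrorCounter(max_errors=100)
lenient.record("userspace copy failed")
lenient.record("map overflow")
print(stopped_on_first_copy, lenient.count)
```

The lenient counter never distinguishes the map overflow from the copy
failure, which is the objection raised above.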

Instead maybe we need to classify errors into groups, set default
behavior for the groups and allow the user to set new defaults.
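
A minimal sketch of what grouping could look like, again in Python as a
model only. The group names ("user_copy", "map_overflow"), the three
actions, and the defaults are all hypothetical; nothing like this exists
in systemtap as far as this message is concerned.

```python
from enum import Enum


class ScriptTerminated(Exception):
    """Stands in for the runtime shutting the script down."""


class Action(Enum):
    IGNORE = "ignore"
    WARN = "warn"
    TERMINATE = "terminate"


# Hypothetical groups and default behavior for each.
DEFAULTS = {
    "user_copy": Action.WARN,
    "map_overflow": Action.TERMINATE,
}


class ErrorPolicy:
    def __init__(self, overrides=None):
        # User-supplied overrides replace the defaults per group.
        self.actions = {**DEFAULTS, **(overrides or {})}
        self.warnings = []

    def report(self, group, message):
        action = self.actions[group]
        if action is Action.TERMINATE:
            raise ScriptTerminated(f"{group}: {message}")
        if action is Action.WARN:
            self.warnings.append(f"{group}: {message}")
        return action


policy = ErrorPolicy()
policy.report("user_copy", "page not resident")  # recorded as a warning
try:
    policy.report("map_overflow", "array full")  # still fatal by default
except ScriptTerminated as exc:
    print("terminated:", exc)

# A user who does not care about copy failures can silence that group only.
quiet = ErrorPolicy({"user_copy": Action.IGNORE})
quiet.report("user_copy", "page not resident")
print(policy.warnings, quiet.warnings)
```

Because the override is ordinary data rather than a define, a policy of
this shape could in principle be changed without recompiling, though how
that would be wired into the runtime is left open here.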

Or we can just provide two variants of each function that might cause an
error. We need to support simple scripts that don't care and shouldn't
have to be bothered to handle expected, but rare, problems like
userspace copy errors. We also need to support more sophisticated
scripts that might need to detect when that happens and handle it.
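
As a rough illustration of the two-variant idea, the Python sketch below
pairs a strict function that raises on a failed copy with a soft one
that returns a fallback value supplied by the caller, along the lines of
the extra-argument suggestion quoted earlier. The function names and the
dictionary standing in for user memory are hypothetical; this is not the
tapset implementation.

```python
class UserCopyFault(Exception):
    """Raised when a simulated userspace copy cannot be completed."""


# Hypothetical stand-in for user memory: address -> string.
# Addresses missing from the dict model pages that cannot be read.
FAKE_USER_MEMORY = {0x1000: "/etc/passwd"}


def user_string_strict(addr):
    """Hard-error variant: a failed copy raises."""
    try:
        return FAKE_USER_MEMORY[addr]
    except KeyError:
        raise UserCopyFault(f"cannot read user string at {addr:#x}") from None


def user_string_soft(addr, fallback=""):
    """Soft-error variant: a failed copy returns the caller's fallback."""
    try:
        return user_string_strict(addr)
    except UserCopyFault:
        return fallback


# A simple script just prints whatever it gets.
print(user_string_soft(0x1000, "<unknown>"))
print(user_string_soft(0x2000, "<unknown>"))

# A more careful script can use the strict variant and react to the fault.
try:
    user_string_strict(0x2000)
except UserCopyFault as exc:
    print("handled:", exc)
```

Letting the caller pick the fallback sidesteps the disagreement over
whether the default should be a blank string or "<unknown>".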






