This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.

Re: Using systemtap for rewriting syscalls?

From: Josh Stone <jistone at redhat dot com>
To: Riccardo Murri <riccardo dot murri at uzh dot ch>, systemtap at sourceware dot org
Date: Tue, 5 Apr 2016 16:55:54 -0700
Subject: Re: Using systemtap for rewriting syscalls?
Authentication-results: sourceware.org; auth=none
References: <loom dot 20160405T230430-700 at post dot gmane dot org>

On 04/05/2016 02:14 PM, Riccardo Murri wrote:
> Hello,
> 
> I'm completely new to systemtap, so please pardon me if this question
> is trivial or is already answered in the docs -- there's so much to read!
> 
> On one of the systems I manage, a directory has become unreadable but
> I cannot take the system down for repairs for some time.  However,
> directories below it are fine, and the unreadable directory is only an
> issue with programs that recursively perform `lstat()` on every path
> component.
> 
> Would it be possible to use systemtap to live-patch the system to
> return a fixed value for `lstat()` if the path argument is the path to
> the unreadable dir?  In pseudo-code:
> 
>     probe kernel.syscall.lstat {
>       if (argument_path == "/path/to/bad/dir") {
>         /* return fake statbuf */
>         memcpy({.st_dev=..., .st_ino=..., ...}, argument_buf);
>       } else {
>         /* forward call to kernel */
>         do_real_lstat(argument_buf, argument_path);
>       }
>     }
> 
> If that makes sense, where can I start looking for examples to adapt
> and/or relevant documentation?
> 
> Thank you very much for any hint!

In general, stap can't completely replace a function.  However, with
guru mode "-g" you can often change parameters to cause a function to
take a shorter error path.  This requires looking at the function in
question to know how it works, of course.

In this case, syscalls get a little hairy in how arguments are
represented, thanks to the SYSCALL_DEFINE wrappers, which add a layer of
inlined argument casting from long register values.  Plus, the output of
lstat has to be written to user memory, which is possible but not so easy.

So I'd suggest probing this one a little bit down the call chain, in
vfs_fstatat.  On entry, check your conditions and fill in the mock
values, then trigger a quick error.  Then catch the return to correct
the error back to a mock success.  Something like this:

// Needs guru mode (stap -g), since it writes to $variables.
global mocked;
probe kernel.function("vfs_fstatat").call {
  if (user_string($filename) == "/path/to/bad/dir") {
    mocked[tid()] = 1;  // remember this thread for the .return probe
    $stat->dev = 123;   // mock values in the kernel's stat buffer
    $stat->ino = 456;
    // ... fill in the remaining fields as needed
    $flag = -1;  // bad flag bits will trigger EINVAL
  }
}
probe kernel.function("vfs_fstatat").return {
  if (tid() in mocked) {
    delete mocked[tid()];
    $return = 0; // mock success!
  }
}
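
To check whether the mock takes effect, it may help to imitate what the
affected programs do: call lstat() on every prefix of the path and see
which component fails. The following is only a rough Python sketch; the
helper name lstat_each_component is made up for illustration, and
whether a given lstat() call actually passes through vfs_fstatat depends
on the kernel version.

```python
import os

def lstat_each_component(path):
    """Call lstat() on every prefix of an absolute path, root first.

    Returns a list of (prefix, result) pairs, where result is either a
    (st_dev, st_ino) tuple or the OSError raised by lstat().
    """
    if not os.path.isabs(path):
        raise ValueError("expected an absolute path")
    parts = [p for p in path.split("/") if p]
    prefixes = ["/"] + ["/" + "/".join(parts[:i])
                        for i in range(1, len(parts) + 1)]
    results = []
    for prefix in prefixes:
        try:
            st = os.lstat(prefix)
            results.append((prefix, (st.st_dev, st.st_ino)))
        except OSError as err:
            results.append((prefix, err))
    return results

# "/path/to/bad/dir" is the placeholder from the script above.
for prefix, result in lstat_each_component("/path/to/bad/dir"):
    print(prefix, result)
```

With the script loaded, the entry for the bad directory would be expected
to show the mocked device and inode numbers instead of an error.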

PS- we're also assuming here that stats on a fully-specified filename
"/path/to/bad/dir" are the only thing you need to squash.  There are
lots of indirect ways a given path might be reached, so I hope this is
really enough for you...
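
To make that caveat concrete: the probe compares the raw string the
caller passed in, so other spellings of the same directory slip past it.
Here is a small Python illustration of the string test only; it does not
touch the filesystem, and literal_match and the list of spellings are
just examples made up for this sketch.

```python
TARGET = "/path/to/bad/dir"  # placeholder literal from the script

def literal_match(filename, target=TARGET):
    """Mimic the probe's test: exact string equality, no path resolution."""
    return filename == target

# Hypothetical ways a caller could name the same directory.
SPELLINGS = [
    "/path/to/bad/dir",    # the exact literal
    "/path/to/bad/dir/",   # trailing slash
    "/path/to//bad/dir",   # doubled slash
    "/path/to/bad/./dir",  # "." component
    "bad/dir",             # relative, e.g. with cwd or dirfd at /path/to
]

for s in SPELLINGS:
    print(f"{s!r:24} matches: {literal_match(s)}")
```

Only the first spelling matches. Symlinks pointing into the directory
would be missed in the same way.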
