This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



converting audit subsystem to markers for systemtap access


Hi -

With kernel markers now in the Linus tree, we would like to
investigate using them more broadly.  They could replace and
generalize existing special-purpose hooks, with two benefits: they can
reduce average overhead for their current users (for mostly-dormant
instrumentation); and they can expose the events to new consumers such
as systemtap.  This can make systemtap probes on such events faster,
more robust, and more maintainable.  It may not be quite a win-win
from the point of view of the current instrumentation maintainers, but
should be one to a user who gains more robust visibility into the
kernel.

Here is one way marker conversion could be done for the audit
subsystem.  It aims to retain current audit behavior, and just
interject a hook for others to use too.  This is based on a brief
source code scan, so it's only a rough outline.  Among the main data
for audit are the system call entry and exit events
(audit_syscall_entry and audit_syscall_exit).  These are called near
the low-level ptrace-related code dealing with syscall dispatching,
and look like this on x86, for example:

arch/x86/kernel/ptrace_32.c:
__attribute__((regparm(3)))
int do_syscall_trace(struct pt_regs *regs, int entryexit)
{
[...] 
        if (unlikely(current->audit_context) && !entryexit)
                audit_syscall_entry(AUDIT_ARCH_I386, regs->orig_eax,
                                    regs->ebx, regs->ecx, regs->edx, regs->esi);
[...]

Note that auditing is conditional on a per-task context struct created
at fork time (audit_alloc), which is done only if an auditing daemon
is attached to the kernel via netlink.  One could convert this call to
markers in at least two ways:

(a) within the conditional

        if (unlikely(current->audit_context) && !entryexit)
           trace_mark (audit_syscall_entry, "%d %d %d %d %d %d",
              AUDIT_ARCH_I386, regs->orig_eax, regs->ebx,
              regs->ecx, regs->edx, regs->esi);

    The audit code would use marker_probe_register/marker_arm to
    wrap its existing audit_syscall_entry() function; the systemtap
    user would 'probe kernel.mark("audit_syscall_entry") { $arg1
    ... $arg6 }'.

    This would be a net performance loss to the audit side if auditd
    is running; a performance tie without auditd; and would let
    systemtap see only processes that already have an audit context.

(b) outside the conditional

    if (!entryexit) /* entry as opposed to exit */
       trace_mark (audit_syscall_entry, "%d %d %d %d %d %d",
              AUDIT_ARCH_I386, regs->orig_eax, regs->ebx,
              regs->ecx, regs->edx, regs->esi);

    The audit-side marker backend would then contain the
        if (unlikely(current->audit_context))
           audit_syscall_entry(... incoming marker params ...)
    test and call.

    This could make a performance gain for the kernel if auditd is not
    running, since a single systemwide dormant marker should be
    cheaper to bypass than a per-task field fetch!  It also lets
    systemtap users of the marker see all processes' syscalls, even if
    auditd is not running and so no audit context is set.

    If auditd is running (and it attaches to the marker), it would
    suffer an additional indignity, er, indirection, but run otherwise
    unaffected.
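
To make the control flow of (b) concrete, here is a rough userspace
model.  It is only a sketch: none of these names are the kernel marker
API, struct task, struct marker, marker_attach and the two backends are
all made up for illustration, and a plain flag stands in for whatever
mechanism keeps a dormant marker cheap.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; every name here is hypothetical. */
struct task {
    void *audit_context;            /* non-NULL only for audited tasks */
};

typedef void (*probe_fn)(const struct task *t, long nr, const long args[4]);

#define MAX_PROBES 4

struct marker {
    bool armed;                     /* one system-wide flag; false while dormant */
    probe_fn probes[MAX_PROBES];
    size_t nprobes;
};

static struct marker syscall_entry_marker;
static int audited_calls;           /* calls the audit backend acted on */
static int observed_calls;          /* calls a second consumer saw */

/* Attach a consumer and arm the marker; returns -1 when the table is full. */
static int marker_attach(struct marker *m, probe_fn fn)
{
    if (m->nprobes == MAX_PROBES)
        return -1;
    m->probes[m->nprobes++] = fn;
    m->armed = true;
    return 0;
}

/* Stand-in for the marker site placed outside the audit conditional. */
static void syscall_entry_site(const struct task *t, long nr, const long args[4])
{
    if (!syscall_entry_marker.armed)    /* dormant: a single global test */
        return;
    for (size_t i = 0; i < syscall_entry_marker.nprobes; i++)
        syscall_entry_marker.probes[i](t, nr, args);
}

/* Audit-side backend: the per-task context test has moved in here. */
static void audit_backend(const struct task *t, long nr, const long args[4])
{
    (void)nr;
    (void)args;
    if (t->audit_context)
        audited_calls++;            /* would call audit_syscall_entry(...) */
}

/* A second consumer, e.g. a tracer, which sees every task. */
static void tracer_backend(const struct task *t, long nr, const long args[4])
{
    (void)t;
    (void)nr;
    (void)args;
    observed_calls++;
}
```

With nothing attached, syscall_entry_site() returns after one flag
test.  Once both backends are attached, the tracer counts every call
while the audit backend counts only calls from tasks that carry an
audit context, which is the visibility difference between (a) and (b).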


Note that this is a low-level hook, in that the system call arguments
are passed onward as simple integers/pointers.  A separate level in
the audit code (auditsc.c) performs semantic decoding and trace record
formatting of syscall arguments/results.  It would be nice to somehow
share some of this code with systemtap, since its result is similar to
the current tapset argstr computations.  Let's leave this aspect to
followup work.  In the meantime, the systemtap tapset code can do
exactly the same decoding as it does now, but based on marker $argN
context variables (such as $arg1) instead of dwarf-level ones.
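
As a small illustration of what "semantic decoding" means here, the
sketch below turns a raw integer argument, as a marker would deliver
it, into a readable string for the flags argument of open(2).  It is
not taken from auditsc.c or from the tapsets; decode_open_flags is a
hypothetical helper and it handles only a few flags.

```c
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper: render a raw open(2) flags value as text.
 * Only the access mode plus O_CREAT, O_TRUNC and O_APPEND are decoded. */
static const char *decode_open_flags(long flags, char *buf, size_t len)
{
    const char *mode;

    switch (flags & O_ACCMODE) {
    case O_WRONLY:
        mode = "O_WRONLY";
        break;
    case O_RDWR:
        mode = "O_RDWR";
        break;
    default:
        mode = "O_RDONLY";
        break;
    }

    snprintf(buf, len, "%s%s%s%s", mode,
             (flags & O_CREAT)  ? "|O_CREAT"  : "",
             (flags & O_TRUNC)  ? "|O_TRUNC"  : "",
             (flags & O_APPEND) ? "|O_APPEND" : "");
    return buf;
}
```

The same function works whether the integer came from a dwarf-level
variable or from a marker parameter, which is why the decoding layer
can stay as it is while the source of the raw values changes.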


David/Steve, does this sound interesting enough to explore in code?


- FChE

