SystemTap 3.2 includes an early prototype of SystemTap's new BPF backend (stapbpf). It represents a first step towards leveraging powerful new tracing and performance analysis capabilities recently added to the Linux kernel. In this post I will compare the translation process of stapbpf with that of the default backend (stap) and highlight some differences in functionality between the two backends.

Stap and stapbpf share common parsing and semantic analysis stages. As input for translation, they both receive data structures representing a parse tree of the script, complete with type information and references to the definitions of all variables and functions. To see a summary of this information, the '-p2' option can be used with the stap command.

$ cat sample.stp
probe kernel.function("sys_read") { printf("hi from sys_read!\n"); exit() }

$ stap -p2 sample.stp
# functions
exit:unknown ()
# probes
kernel.function("SyS_read@fs/read_write.c:542") /* pc=_stext+0x273da0 */ /* <- kernel.function("SyS_read@fs/read_write.c:542") */

$ stap -p2 --runtime=bpf sample.stp
# functions
_set_exit_status:long ()
exit:unknown ()
# probes
kernel.function("SyS_read@fs/read_write.c:542") /* pc=_stext+0x273da0 */ /* <- kernel.function("SyS_read@fs/read_write.c:542") */

You can see that stapbpf's exit function involves an additional call to _set_exit_status but otherwise the two backends are probing the exact same location.

From this point, the translation processes diverge. Stap's goal is to convert the script into a kernel module. To accomplish this, stap first translates the parse tree into the C source code of the desired kernel module (pass 3), then invokes GCC to compile this source code into the actual kernel module (pass 4). The '-p4' option can be used with the stap command to stop after pass 4 and produce the kernel object file.

# stap -p4 sample.stp
[...]_1316.ko
# staprun [...]_1316.ko
hi from sys_read!

Instead of C, stapbpf translates the script directly into BPF bytecode to be executed by an in-kernel virtual machine. The bytecode is then stored in a BPF-ELF file intended for use by the stapbpf runtime.

# stap -p4 --runtime=bpf sample.stp
stap_1348.bo
# stapbpf stap_1348.bo
hi from sys_read!

Unlike building stap's kernel modules, producing the BPF bytecode requires no external compiler. This helps keep stapbpf's compile times and installation footprint low. With the '-v' option we can see the duration of each stage of translation.

# stap -v -p4 sample.stp
[...]
Pass 3: translated to C [...] in 0usr/0sys/4real ms.
Pass 4: compiled C [...] in 1330usr/310sys/1559real ms.

# stap -v -p4 --runtime=bpf sample.stp
[...]
Pass 4: compiled BPF into "stap_3792.bo" in 0usr/0sys/0real ms.

Notice that passes 3 and 4 together take 1563 ms for stap but less than 1 ms for stapbpf (which combines passes 3 and 4 into a single pass).
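
The arithmetic behind that comparison can be reproduced with a small script. The following is a minimal sketch, not part of SystemTap: it assumes the pass lines end in the "NusrNsys/Nreal ms." form shown above, and the helper name real_ms_by_pass is made up for illustration. The log text is copied from the transcripts above, including the "[...]" elisions.

```python
import re

# Matches the tail of a "stap -v" pass line as it appears in the transcripts,
# e.g. "Pass 4: compiled C [...] in 1330usr/310sys/1559real ms."
PASS_RE = re.compile(r"^Pass (\d+): .* in (\d+)usr/(\d+)sys/(\d+)real ms\.$")


def real_ms_by_pass(log_text):
    """Map pass number -> wall-clock ("real") milliseconds (hypothetical helper)."""
    times = {}
    for line in log_text.splitlines():
        match = PASS_RE.match(line.strip())
        if match:
            times[int(match.group(1))] = int(match.group(4))
    return times


# Lines copied from the two transcripts above.
stap_log = """\
Pass 3: translated to C [...] in 0usr/0sys/4real ms.
Pass 4: compiled C [...] in 1330usr/310sys/1559real ms.
"""
bpf_log = 'Pass 4: compiled BPF into "stap_3792.bo" in 0usr/0sys/0real ms.\n'

stap_times = real_ms_by_pass(stap_log)
bpf_times = real_ms_by_pass(bpf_log)

print("stap passes 3+4:", sum(stap_times.values()), "ms")
print("stapbpf pass 4: ", sum(bpf_times.values()), "ms")
```

Only the "real" column is summed here, since that is the figure quoted in the text.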

Before a BPF bytecode program is loaded into the kernel, it is checked for safety by a verifier inside the kernel. The verifier checks for undesirable behaviors such as out-of-bounds jumps, out-of-bounds stack loads and stores, and reads from uninitialized addresses. It also checks for the presence of unreachable instructions and infinite loops. Any BPF program that does not pass verification will not be loaded into the BPF virtual machine. Although the default stap backend is held to similar standards and is known to be very safe to use, stapbpf has the advantage of inheriting BPF's simpler security model.
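
To give a feel for the kind of static checking involved, here is a toy model in Python. It is only a sketch: it is not the kernel's verifier and does not use real BPF instruction encodings. The tuple-based instruction format and the function name verify are invented for illustration, and only three of the checks mentioned above are modeled (out-of-bounds jumps, loops and unreachable instructions).

```python
# Toy model only: NOT the kernel verifier. The instruction format is hypothetical:
#   ("op",)         any straight-line instruction
#   ("jmp", off)    unconditional jump to index i + 1 + off
#   ("jcond", off)  conditional jump: either falls through or jumps
#   ("exit",)       end of program


def verify(program):
    """Return a list of problems found in `program` (empty list means accepted)."""
    problems = []
    n = len(program)
    successors = []  # successors[i] = indices that may execute right after insn i

    for i, insn in enumerate(program):
        kind = insn[0]
        succ = []
        if kind in ("jmp", "jcond"):
            target = i + 1 + insn[1]
            if not 0 <= target < n:
                problems.append(f"insn {i}: jump out of bounds")
            elif target <= i:
                # Any backward edge is treated as a loop and rejected.
                problems.append(f"insn {i}: backward jump (loop)")
            else:
                succ.append(target)
        if kind in ("op", "jcond"):
            if i + 1 < n:
                succ.append(i + 1)
            else:
                problems.append(f"insn {i}: falls off the end")
        successors.append(succ)

    # Walk the control-flow graph from instruction 0 to find dead code.
    reachable = set()
    todo = [0] if n else []
    while todo:
        i = todo.pop()
        if i not in reachable:
            reachable.add(i)
            todo.extend(successors[i])
    for i in range(n):
        if i not in reachable:
            problems.append(f"insn {i}: unreachable")
    return problems


ok_prog = [("op",), ("jcond", 1), ("op",), ("exit",)]
loop_prog = [("op",), ("jmp", -2), ("exit",)]
oob_prog = [("jmp", 5), ("exit",)]

for name, prog in [("ok", ok_prog), ("loop", loop_prog), ("oob", oob_prog)]:
    print(name, verify(prog))
```

The real verifier does considerably more, for example tracking register and stack state, but the accept-or-reject shape is the same: a program with any reported problem is never loaded.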

However, this advantage does come with some trade-offs. For example, BPF does not support writing to kernel memory. Although stap disables this capability by default, it does provide a "guru mode" that acts as an escape hatch for the user who wishes to have this level of control over their operating system. This means that stapbpf does not share stap's ability to, for example, administer security band-aids to a live system. Even more restrictive is that the verifier rejects any program with loops. While it would be possible for stapbpf to unroll loops, BPF also imposes a limit of 4096 instructions per program, so an unrolled loop of any significant size would quickly run into that limit.

# stap --runtime=bpf contains_loops.stp
Error loading /tmp/stapxSM7Kg/stap_8316.bo: bpf program load failed: Invalid argument
[...]
Pass 5: run failed.

# stap --runtime=bpf too_many_insns.stp
Error loading /tmp/stapqxRXi4/stap_8432.bo: bpf program load failed: Argument list too long
[...]
Pass 5: run failed.
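
To see how quickly loop unrolling would collide with the 4096-instruction limit, consider the arithmetic below. This is a back-of-the-envelope sketch: the setup and body sizes are made-up numbers chosen for illustration, not measurements of code generated by stapbpf.

```python
BPF_MAX_INSNS = 4096  # per-program instruction limit quoted in the text


def unrolled_size(setup_insns, body_insns, iterations):
    """Instruction count if the loop body is simply repeated `iterations` times."""
    return setup_insns + body_insns * iterations


def max_iterations(setup_insns, body_insns, limit=BPF_MAX_INSNS):
    """Largest iteration count whose unrolled form still fits under `limit`."""
    return (limit - setup_insns) // body_insns


# Hypothetical handler: 10 instructions of setup, 20 instructions per loop body.
setup, body = 10, 20
fits = max_iterations(setup, body)

print("max iterations that fit:", fits)
print("size at that count:     ", unrolled_size(setup, body, fits))
print("size with one more:     ", unrolled_size(setup, body, fits + 1))
```

Under these assumed sizes, only a couple of hundred iterations fit, and nested loops or loops with bounds unknown at translation time could not be handled this way at all.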

The following table summarizes the comparison between stap and stapbpf. Features that BPF permits but that are not yet implemented in stapbpf are marked 'possible'.

Feature                                                  stap                 stapbpf
-------------------------------------------------------  -------------------  --------------------------
non-blocking probe handlers                              yes                  yes
protected probe execution environment                    yes                  yes
lock-protected global variables                          per probe locking    per operation locking
kprobes (DWARF)                                          yes                  yes
kprobes (DWARF-less)                                     yes                  possible
uprobes                                                  yes                  possible
tracepoints                                              yes                  possible
timer-based probing                                      yes                  possible
probe dynamically loaded kernel objects                  yes                  possible
able to change state in probed program                   yes                  possible (userspace only)
means available to bypass protection for advanced users  yes                  no
loop support (for, while, foreach)                       yes                  no
string support (variables, literals)                     yes                  limited*
probe handler length limit                               1000 statements      4096 instructions
means available to increase handler length limit         yes                  no
kernel verifies safety of program                        no                   yes

* There is support for printf's format string literal.

It can be seen that stapbpf is able to provide only a subset of stap's functionality. However, for systems whose security policies either preclude the full kernel module backend or require software with a security model simpler than stap's, stapbpf aims to provide a convenient way to utilize this subset.

None: stapstapbpfComparison (last edited 2017-10-31 16:25:41 by AaronMerey)