5 Language elements

5.1 Identifiers

Identifiers are used to name variables and functions. They are an alphanumeric sequence that may include the underscore (_) and dollar sign ($) characters. They have the same syntax as C identifiers, except that the dollar sign is also a legal character. Identifiers that begin with a dollar sign are interpreted as references to variables in the target software, rather than to SystemTap script variables. Identifiers may not start with a plain digit.

5.2 Data types

The SystemTap language includes a small number of data types, but no type declarations. A variable’s type is inferred from its use. To support this inference, the translator enforces consistent typing of function arguments and return values, array indices and values. There are no implicit type conversions between strings and numbers. Inconsistent type-related use of an identifier signals an error.
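
As a minimal sketch of this inference (the variable names are hypothetical), the following script never declares a type. The translator should deduce each one from how the variable is used.

     global count, name

     probe begin {
        count++                          /* arithmetic use: count is an integer */
        name = "stap"                    /* string assignment: name is a string */
        printf("%s %d\n", name, count)
        /* name = count   would be rejected as a type mismatch */
        exit()
     }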

5.2.1 Literals

Literals are either strings or integers. Literal integers can be expressed as decimal, octal, or hexadecimal, using C notation. Type suffixes (e.g., L or U) are not used.

5.2.2 Integers

Integers are decimal, hexadecimal, or octal, and use the same notation as in C. Integers are 64-bit signed quantities, although the parser also accepts (and wraps around) values above positive 2^63 but below 2^64.
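
The three notations can be compared side by side. In the following sketch, each literal denotes the value 255, so the script should print the same number three times.

     probe begin {
        /* decimal, hexadecimal, and octal spellings of the same value */
        printf("%d %d %d\n", 255, 0xff, 0377)
        exit()
     }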

5.2.3 Strings

Strings are enclosed in quotation marks ("string"), and pass through standard C escape codes with backslashes. A string literal may be split into several pieces, which are glued together, as follows.

     str1 = "foo" "bar"
       /* --> becomes "foobar" */
     
     str2 = "a good way to do a multi-line\n"
            "string literal"
       /* --> becomes "a good way to do a multi-line\nstring literal" */
     
     str3 = "also a good way to " @1 " splice command line args"
       /* --> becomes "also a good way to foo splice command line args",
          assuming @1 is given as foo on the command line */

Observe that script arguments can also be glued into a string literal.

Strings are limited in length to MAXSTRINGLEN. For more information about this and other limits, see Section 1.6.

5.2.4 Associative arrays

See Section 7.

5.2.5 Statistics

See Section 8.

5.3 Semicolons

The semicolon is the null statement, or do nothing statement. It is optional, and useful as a separator between statements to improve detection of syntax errors and to reduce ambiguities in grammar.
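
The following sketch (with hypothetical variable names) illustrates both roles: a semicolon separating two statements on one line, and a lone semicolon acting as a null statement.

     probe begin {
        a = 1; b = 2      /* semicolon separates two statements */
        ;                 /* a lone semicolon does nothing */
        printf("%d\n", a + b)
        exit()
     }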

5.4 Comments

Three forms of comments are supported, as follows.

     # ... shell style, to the end of line
     // ... C++ style, to the end of line
     /* ... C style ... */

5.5 Whitespace

As in C, spaces, tabs, returns, newlines, and comments are treated as whitespace. Whitespace is ignored by the parser.

5.6 Expressions

SystemTap supports a number of operators that use the same general syntax, semantics, and precedence as in C and awk. Arithmetic is performed per C rules for signed integers. If the parser detects division by zero or an overflow, it generates an error. The following subsections list these operators.

5.6.1 Binary numeric operators

* / % + - >> >>> << & ^ | && ||
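
A brief sketch of a few of these operators in use follows. Because all operands are integers, division truncates as in C. The script should print 3 1, then 19, then 1.

     probe begin {
        printf("%d %d\n", 7 / 2, 7 % 2)      /* integer division and remainder */
        printf("%d\n", (1 << 4) | 3)         /* left shift, then bitwise or */
        printf("%d\n", 5 > 3 && 2 > 1)       /* logical and yields 1 or 0 */
        exit()
     }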

5.6.2 Binary string operators

. (string concatenation)

5.6.3 Numeric assignment operators

= *= /= %= += -= >>= <<= &= ^= |=

5.6.4 String assignment operators

= .=
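
The string operators can be sketched as follows. The variable name s is hypothetical; the script should print foobarbaz.

     probe begin {
        s = "foo" . "bar"      /* concatenation with the . operator */
        s .= "baz"             /* append in place with .= */
        println(s)
        exit()
     }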

5.6.5 Unary numeric operators

+ - ! ~ ++ --

5.6.6 Numeric & string comparison, regular expression matching operators

< > <= >= == != =~ !~

The =~ and !~ operators perform regular expression matching. The second operand must be a string literal containing a syntactically valid regular expression. The =~ operator returns 1 on a successful match and 0 on a failed match. The !~ operator returns 1 on a failed match and 0 on a successful match. The regular expression syntax supports most of the features of POSIX Extended Regular Expressions, except for subexpression reuse (\1) functionality. After a successful match, the matched substring and subexpressions can be extracted using the matched tapset function. The ngroups tapset function returns the number of subexpressions in the last successfully matched regular expression.
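
The following is a minimal sketch of matching and extraction. The input string and pattern are made up for illustration, and the sketch assumes that matched(0) yields the whole matched substring and matched(1) the first parenthesized subexpression.

     probe begin {
        s = "pid=1234"
        if (s =~ "pid=([0-9]+)") {
           printf("whole match: %s\n", matched(0))
           printf("first group: %s\n", matched(1))
           printf("group count: %d\n", ngroups())
        }
        if (s !~ "^tid=")
           println("no tid prefix")
        exit()
     }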

5.6.7 Ternary operator

cond ? exp1 : exp2

5.6.8 Grouping operator

( exp )

5.6.9 Function call

General syntax:

fn ([ arg1, arg2, ... ])
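
As an illustration, the following sketch defines a hypothetical function named sign and calls it with three arguments in turn. Its body also uses the ternary and grouping operators described above. The script should print 1 -1 0.

     function sign(n) {
        return (n > 0) ? 1 : ((n < 0) ? -1 : 0)
     }

     probe begin {
        printf("%d %d %d\n", sign(42), sign(-7), sign(0))
        exit()
     }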

5.6.10 $ptr->member

ptr is a kernel pointer available in a probed context.
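
A sketch of a member access follows. It assumes that kernel debugging information is installed and that the probed function, here vfs_read, has a pointer parameter named file whose structure has an f_flags member. These names depend on the running kernel and are used only as an example.

     probe kernel.function("vfs_read") {
        /* $file is a target variable; -> follows the pointer to a member */
        printf("f_flags=%d\n", $file->f_flags)
        exit()
     }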

5.6.11 Pointer typecasting

Typecasting is supported using the @cast() operator. A script can define a pointer type for a long value, then access type members using the same syntax as with $target variables. After a pointer is saved into a script integer variable, the translator loses the necessary type information to access members from that pointer. The @cast() operator tells the translator how to read a pointer.

The following statement interprets p as a pointer to a struct or union named type_name and dereferences the member value:

     @cast(p, "type_name"[, "module"])->member

The optional module parameter tells the translator where to look for information about that type. You can specify multiple modules as a list with colon (:) separators. If you do not specify the module parameter, the translator defaults to either the probe module for dwarf probes or to kernel for functions and all other probe types.

The following statement retrieves the parent PID from a kernel task_struct:

     @cast(pointer, "task_struct", "kernel")->parent->tgid

The translator can create its own module with type information from a header surrounded by angle brackets (< >) if normal debugging information is not available. For kernel headers, prefix it with kernel to use the appropriate build system. All other headers are built with default GCC parameters into a user module. The following statements are examples.

     @cast(tv, "timeval", "<sys/time.h>")->tv_sec
     @cast(task, "task_struct", "kernel<linux/sched.h>")->tgid

In guru mode, the translator allows scripts to assign new values to members of typecasted pointers.

Typecasting is also useful in the case of void* members whose type might be determinable at run time.

     probe foo {
        if ($var->type == 1) {
           value = @cast($var->data, "type1")->bar
        } else {
           value = @cast($var->data, "type2")->baz
        }
        print(value)
     }

5.6.12 <value> in <array_name>

This expression evaluates to true if the array contains an element with the specified index.

5.6.13 [ <value>, ... ] in <array_name>

The number of index values must match the number of indexes previously specified.
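
Both membership forms can be sketched as follows. The array names seen and pairs are hypothetical: seen has one index, and pairs has two.

     global seen, pairs

     probe begin {
        seen["init"] = 1
        pairs["init", 1] = 1

        if ("init" in seen)
           println("single index present")
        if (["init", 1] in pairs)
           println("index tuple present")
        if (!(["init", 2] in pairs))
           println("index tuple absent")
        exit()
     }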

5.7 Literals passed in from the stap command line

Literals are either strings enclosed in double quotes (" ") or integers. For information about integers, see Section 5.2.2. For information about strings, see Section 5.2.3.

Script arguments at the end of a command line are expanded as literals. You can use these in all contexts where literals are accepted. A reference to a nonexistent argument number is an error.

5.7.1 $1 … $<NN> for literal pasting

Use $1 … $<NN> to paste the entire argument string into the input stream, where it is then lexically tokenized.

5.7.2 @1 … @<NN> for strings

Use @1 … @<NN> for casting an entire argument as a string literal.

5.7.3 Examples

For example, if the following script named example.stp

     probe begin { printf("%d, %s\n", $1, @2) }

is invoked as follows

     # stap example.stp '5+5' mystring

then 5+5 is substituted for $1 and "mystring" for @2. The output will be

     10, mystring

5.8 Conditional compilation

5.8.1 Conditions

One of the steps of parsing is a simple preprocessing stage. The preprocessor supports conditionals with a general form similar to the ternary operator (Section 5.6.7).

     %( CONDITION %? TRUE-TOKENS %)
     %( CONDITION %? TRUE-TOKENS %: FALSE-TOKENS %)

The CONDITION is a limited expression whose format is determined by its first keyword. The following is the general syntax.

     %( <condition> %? <code> [ %: <code> ] %)

5.8.2 Conditions based on available target variables

The predicate @defined() is available for testing whether a particular $variable/expression is resolvable at translation time. The following is an example of its use:

       probe foo { if (@defined($bar)) log ("$bar is available here") }
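
A common way to use the predicate is to supply a fallback value when a target variable cannot be resolved. The following sketch assumes a probe point (vfs_read) and a target expression ($file->f_flags) that may or may not exist on a given kernel. Both are illustrative choices only.

     probe kernel.function("vfs_read") {
        /* use the member if it resolves at translation time, else -1 */
        flags = @defined($file->f_flags) ? $file->f_flags : -1
        printf("flags=%d\n", flags)
        exit()
     }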

5.8.3 Conditions based on kernel version: kernel_v, kernel_vr

If the first part of a conditional expression is the identifier kernel_v or kernel_vr, the second part must be one of six standard numeric comparison operators “<”, “<=”, “==”, “!=”, “>”, or “>=”, and the third part must be a string literal that contains an RPM-style version-release value. The condition is true if the version of the target kernel (as optionally overridden by the -r option) compares to the given version string as the operator specifies. The comparison is performed by the glibc function strverscmp.

kernel_v refers to the kernel version number only, such as “2.6.13”.

kernel_vr refers to the kernel version number including the release code suffix, such as “2.6.13-1.322FC3smp”.
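
As a sketch, the following script selects one of two statements at translation time. The threshold "3.10" is an arbitrary version chosen for illustration.

     probe begin {
        %( kernel_v >= "3.10" %?
           println("kernel 3.10 or newer")
        %:
           println("kernel older than 3.10")
        %)
        exit()
     }

Only the branch whose condition holds for the target kernel is passed on to the parser.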

5.8.4 Conditions based on architecture: arch

If the first part of the conditional expression is the identifier arch which refers to the processor architecture, then the second part is a string comparison operator "==" or "!=", and the third part is a string literal for matching it. This comparison is a simple string equality or inequality. The currently supported architecture strings are i386, i686, x86_64, ia64, s390, and powerpc.

5.8.5 Conditions based on privilege level: systemtap_privilege

If the first part of the conditional expression is the identifier systemtap_privilege which refers to the privilege level the systemtap script is being compiled with, then the second part is a string comparison operator "==" or "!=", and the third part is a string literal for matching it. This comparison is a simple string equality or inequality. The possible privilege strings to consider are "stapusr" for unprivileged scripts, and "stapsys" or "stapdev" for privileged scripts. (In general, to test for a privileged script it is best to use != "stapusr".)

This condition can be used to write scripts that can be run in both privileged and unprivileged modes, with additional functionality made available in the privileged case.
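
A minimal sketch of such a script follows. The messages are placeholders for the code that would differ between the two modes.

     probe begin {
        %( systemtap_privilege != "stapusr" %?
           println("privileged: extra functionality enabled")
        %:
           println("unprivileged: basic functionality only")
        %)
        exit()
     }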

5.8.6 True and False Tokens

TRUE-TOKENS and FALSE-TOKENS are zero or more general parser tokens, possibly including nested preprocessor conditionals, that are pasted into the input stream if the condition is true or false. For example, the following code induces a parse error unless the target kernel version is newer than 2.6.5.

     %( kernel_v <= "2.6.5" %? **ERROR** %) # invalid token sequence

The following code adapts to hypothetical kernel version drift.

     probe kernel.function (
         %( kernel_v <= "2.6.12" %? "__mm_do_fault" %:
             %( kernel_vr == "2.6.13-1.8273FC3smp" %? "do_page_fault" %: UNSUPPORTED %)
         %)) { /* ... */ }
     
     %( arch == "ia64" %?
         probe syscall.vliw = kernel.function("vliw_widget") {}
     %)
     

The following code adapts to the presence of a kernel CONFIG option.

     %( CONFIG_UTRACE == "y" %?
         probe process.syscall {}
     %)

5.9 Preprocessor macros

This feature lets scripts eliminate some types of repetition.

5.9.1 Local macros

The preprocessor also supports a simple macro facility.

Macros taking zero or more arguments are defined using the following construct:

     @define NAME %( BODY %)
     @define NAME(PARAM_1, PARAM_2, ...) %( BODY %)

Macro arguments are referred to in the body by prefixing the argument name with an @ symbol. Likewise, once defined, macros are invoked by prefixing the macro name with an @ symbol:

     @define foo %( x %)
     @define add(a,b) %( ((@a)+(@b)) %)
     
     @foo = @add(2,2)
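
The following complete script is a sketch of a macro with one parameter. The macro name square is hypothetical. Because the argument tokens are pasted in as written, the body parenthesizes each use of @x. With the argument 3 + 1 the invocation expands to ((3 + 1) * (3 + 1)), so the script should print 16.

     @define square(x) %( ((@x) * (@x)) %)

     probe begin {
        printf("%d\n", @square(3 + 1))
        exit()
     }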

Macro expansion is currently performed in a separate pass before conditional compilation. Therefore, both TRUE- and FALSE-tokens in conditional expressions will be macroexpanded regardless of how the condition is evaluated. This can sometimes lead to errors:

     // The following results in a conflict:
     %( CONFIG_UTRACE == "y" %?
         @define foo %( process.syscall %)
     %:
         @define foo %( **ERROR** %)
     %)
     
     // The following works properly as expected:
     @define foo %(
       %( CONFIG_UTRACE == "y" %? process.syscall %: **ERROR** %)
     %)

The first example is incorrect because both @defines are evaluated in a pass prior to the conditional being evaluated.

5.9.2 Library macros

Normally, a macro definition is local to the file it occurs in. Thus, defining a macro in a tapset does not make it available to the user of the tapset.

Publicly available library macros can be defined by including .stpm files on the tapset search path. These files may only contain @define constructs, which become visible across all tapsets and user scripts.
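
As a sketch, suppose a file with the hypothetical name mymacros.stpm is placed on the tapset search path and contains only the following definition:

     @define max(a, b) %( ((@a) > (@b) ? (@a) : (@b)) %)

A user script could then invoke the macro without defining it locally:

     probe begin {
        printf("%d\n", @max(3, 7))
        exit()
     }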