This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.
Hi -

> [...]
> - self->xxx means xxx is a thread-local variable

I'm unfond of the pointer syntax in the script language (see below),
but this particular case can be mapped easily in the parser to an
array index operation like "xxx[$pid]".

> - $xxx is shorthand for values to be substituted by runtime library
>   functions or probe variables, similar to Perl interpolation

We may need to consider a naming system that can be composed into
richer identifiers.  There are several types of variables to access:
- "macros" like "$timestamp", which map to snippets of code
- target-side variables: local (function parameters, locals), global
- probe-side special variables like your "$syscall_name"

> [...]
> probe syscall:exit("read")
>   read_times[$syscall_name]
> [...]

I am aware of no plausible run-time library function that can return
the name of the current system call.  Rather, I imagine this sort of
facility working by having a library of systemtap script fragments
that provide definitions for probe points or helper variables:

   probe syscall("read") = kernel:function("sys_read") {
      self->syscall_name = "read"
   }
and
   $pid = [[ in_interrupt () ? 0 : current->pid ]]  # possible embedded C

> [...]
> Will this still work if count isn't a int value but say an int *?
>   self->my_count = *count;
> Seems to - if jprobes is being used, it's just a straight pass-thru.

Passing through in this sense concerns me.  If the scripting
language's type system is to remain as minimal and implicit as
possible, then operations like pointer dereferences and especially
structure accesses need to be represented and analyzed.  (See more
below.)

> [...]
> - To set up the probes, this example loops over each syscall and
>   registers the single probe handler for each one. [...]
> It seems to me that we need a way to enable and disable
> probes as needed or 'just in time'.
> For example, here's a probe that
> we should be able to write:
>
> /* trace all functions called from open */
> probe syscall:entry("open")
> {
>     self->trace_all = 1;
>     enable(*:entry(*)); /* enable probes on _all_ functions */
> }

I don't know if this will be possible.  Among other reasons we
discussed yesterday, "all functions" in the kernel is far too wide a
net.  If instrumentation were to be inserted anew every time, imagine
the thousands of pages of kernel text being modified, when any process
runs "open".  Else if breakpoints were inserted en masse at startup
time, and enabled/disabled by having them each execute some predicate,
overall performance would still come to a crawl.

> [...] It should support the print() function from probe handlers,
> and it should also support queries from userspace applications
> such that they can retrieve data from the probe at any time [...]
> a simple protocol built on top of netlink seems to me to be the
> best fit. [...]

I wonder what sort of tool would want to extract data piecemeal like
this.  Are you imagining someone actually writing some user-level C
code to pull out data snapshots from a specific running probe?  I
wonder if this situation is likely to become common enough to warrant
a two-way API.

By the way, one reason I prototyped that /proc-based data snapshot
mechanism that way was in recognition of the problem of consistency.
It suspends the probes, takes a snapshot of all global variables
during the incoming open() syscall.  It then lets the probes run again
and streams the textual snapshot out during subsequent read()'s.  The
snapshot is thrown away at close().  If, as is likely, multiple pieces
of data need to be pulled out of the probes, it is important that
those pieces be consistent with each other: that they correspond to a
locked snapshot taken at the same instant.
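
The open()/read()/close() snapshot life cycle described above can be
sketched in plain userspace C.  This is only an illustration under
stated assumptions, not the prototype itself: the globals read_count
and write_count, the probes_suspended flag, and the snapshot_*
function names are all hypothetical, and a single-threaded flag stands
in for whatever real locking a kernel implementation would need.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical probe-side globals; real probes would update these
   from handler context. */
static long read_count;
static long write_count;
static int probes_suspended;   /* stand-in for a real suspend/lock */

/* Simulated probe handler: drops events while a snapshot is taken. */
static void probe_hit(int is_read)
{
    if (probes_suspended)
        return;
    if (is_read)
        read_count++;
    else
        write_count++;
}

struct snapshot {
    char text[128];   /* textual copy of all globals */
    size_t len;       /* number of valid bytes in text */
    size_t pos;       /* next byte to hand out */
};

/* open(): suspend probes, copy every global in one step, resume. */
static struct snapshot *snapshot_open(void)
{
    struct snapshot *s = malloc(sizeof *s);
    int n;

    if (!s)
        return NULL;
    probes_suspended = 1;
    n = snprintf(s->text, sizeof s->text,
                 "read_count=%ld\nwrite_count=%ld\n",
                 read_count, write_count);
    probes_suspended = 0;
    assert(n >= 0 && (size_t)n < sizeof s->text);
    s->len = (size_t)n;
    s->pos = 0;
    return s;
}

/* read(): stream the frozen text in chunks; probes keep running. */
static size_t snapshot_read(struct snapshot *s, char *buf, size_t cap)
{
    size_t left = s->len - s->pos;
    size_t n = left < cap ? left : cap;

    memcpy(buf, s->text + s->pos, n);
    s->pos += n;
    return n;
}

/* close(): the snapshot is thrown away. */
static void snapshot_close(struct snapshot *s)
{
    free(s);
}
```

Because every value is copied inside the single suspended window in
snapshot_open(), a reader that drains the text over several
snapshot_read() calls still sees values that belong to the same
instant, even if probe_hit() runs between those calls.
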
Being able to pull out just one variable at a time would make this
property achievable only if it involved long-term suspension of probe
data collection between the adjacent pull operations.

> [...] Notes: - the main problem this probe illustrates is that it's
> not yet clear how to access data represented by composite data
> types, or how to handle types like atomic_t which need to use an
> accessor function. The location and size of the struct members is
> known from dwarf2 info, but how do we seamlessly access and use it
> in the probe?

Indeed, this is one of the big open holes in the design.  It would be
great if someone came up with a notation and execution algorithm that
requires neither psychic abilities within the translator, nor any
excessive presence of C typing declarations within the script.

Maybe cleverly embedded C code could do the trick, as long as it is
hidden out of the way in installed script libraries.  These might be
presented to user scripts in a functional notation, like
     file$f_dentry$d_inode (filp)
that maps to C code such as
     ({ struct file* f = $filp; struct dentry *d;
        check (f) ? (d = f->f_dentry, (check (d) ? d->d_inode : 0)) : 0; })
But hand-writing a multitude of these is too much.  We need to think
of a way of expressing dwarf type/expression evaluation, without
having to be totally explicit.

> - What about looping over external lists e.g. starting with
> list_head?

That's much the same problem.

- FChE
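
As a rough illustration of the checked member-access chain discussed
in the message (the file$f_dentry$d_inode notation), here is a minimal
standalone C sketch.  Everything in it is an assumption made for the
example: the three structs are userspace stand-ins that model only the
members named above, check() merely rejects NULL (a kernel-side
version would have to validate the address far more carefully), and
the function name uses underscores because '$' is not portable in C
identifiers.

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel types named in the message;
   only the members used here are modeled. */
struct inode  { unsigned long i_ino; };
struct dentry { struct inode *d_inode; };
struct file   { struct dentry *f_dentry; };

/* Hypothetical validity test.  Here it only rejects NULL. */
static int check(const void *p)
{
    return p != NULL;
}

/* Plays the role of file$f_dentry$d_inode (filp): follow
   f->f_dentry->d_inode, returning NULL as soon as a link fails. */
static struct inode *file_f_dentry_d_inode(const struct file *f)
{
    const struct dentry *d;

    if (!check(f))
        return NULL;
    d = f->f_dentry;
    if (!check(d))
        return NULL;
    return d->d_inode;
}
```

One such helper is needed for every access path a script wants to
follow, which shows why hand-writing them does not scale and why
generating them from dwarf type information is attractive.
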