This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: Hashtable


Frank, David,

I appreciate your patience with me, my crazy ideas, and often very
naive questions.

On Tue, Jul 18, 2017 at 5:13 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
>
> arkady.miasnikov wrote:
>
>
>> I am trying to implement full bypass of the code generated by the
>> SystemTap in some special cases. [...]
>
> OK, interesting.
>
>
>> My goal is to be able to write code like this:
>> probe syscall.dup2
>> {   /* bypass */  /* arg: oldfd */
>>     printk(KERN_ALERT "tid=%d oldfd=%d\n", current->pid, l->l_oldfd);
>> // Unmanaged C code
>> }
>
> So like an embedded-C probe handler?

I am still working on it. I have changed the goal somewhat:

The syntax I am trying to implement is something like
cprobe syscall.dup2
%{
    printk("%d", STAP_ARG_oldfd);
%}

where "cprobe" is a new keyword.
This is a patch (not production code):
https://github.com/larytet/SystemTap/commit/aa3de76dc11a94dcd0456d493e381bc69bcddb16

>
>
>> [...]  When generating the code I add the relevant dwarf_tvar_get_
>> calls and initialize "local" variables in the context
>> structure. Because the probes I am targeting are very short and simple
>> and do not involve nested calls I shortcut lot of variables copying.
>
> How much more efficient can this get, beyond
>
>     function foo (l) %{
>         STAP_PRINTF ("tid=%d oldfd=%d\n", current->pid, STAP_ARG_l);
>     %}
>     probe syscall.dup2 { foo ($oldfd) }
>
> ?  Which parts of the latter can we elide?  Could we improve the
> translator so that this elision can be done for general systemtap
> scripts, not just your special case?
>

I have a very specific system test which I have to pass. The test is
a bash loop, "while [ 1 ]; do echo >> filename;done", running on at
least 4 cores. With empty probes (STAP_ALIBI, for example) the
performance impact of the 10 open/read/write/close + return probes is
~8%. The performance impact of a single probe is negligible. The
problems arise when many probes execute concurrently on multicore
machines.

Ten simple probes which call a function adding a value to a map have
a performance overhead in the 30-35% range. I replaced the maps with a
custom hashtable, which reduced the impact by about a third, to the
20-25% range. This is better, but my target is the 10%-15% range
(roughly double the bare minimum of 8%).
This is what a typical probe looks like:
https://github.com/larytet/lockfree_hashtable/blob/master/dup_probe.stp#L429

probe syscall.dup
{
   tid = tid()
   hashtable_u32_insert(tid, oldfd)
}

probe syscall.dup.return
{
   tid = tid()
   oldfd = hashtable_u32_remove(tid)
   write_event_to_shared_memory(tid, oldfd)
}
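
The insert-on-entry, remove-on-return pattern of the two probes above
can be sketched in plain C. This is only an illustration under stated
assumptions, not the code from the lockfree_hashtable repository: the
names tid_table_insert/tid_table_remove, the table size, the probe
limit and the hash are all made up for the example, it uses C11
<stdatomic.h> rather than kernel atomics, and it assumes that a given
key (tid) is only ever inserted and removed by its owning thread and
that tid 0 is never stored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TABLE_BITS 10u
#define TABLE_SIZE (1u << TABLE_BITS) /* power of two, so we can mask */
#define EMPTY_KEY  0u                 /* assumption: tid 0 is never stored */
#define MAX_PROBE  16u                /* bounded linear probing */

struct slot {
    _Atomic uint32_t key; /* EMPTY_KEY while the slot is free */
    uint32_t value;       /* touched only by the thread that owns the key */
};

static struct slot table[TABLE_SIZE];

/* Multiplicative hash; keeps the top TABLE_BITS bits of the product. */
static uint32_t hash_u32(uint32_t k)
{
    return (k * 2654435761u) >> (32u - TABLE_BITS);
}

/* Claim a free slot with compare-and-swap; no lock is taken. */
bool tid_table_insert(uint32_t key, uint32_t value)
{
    assert(key != EMPTY_KEY);
    uint32_t idx = hash_u32(key);
    for (uint32_t i = 0; i < MAX_PROBE; i++) {
        struct slot *s = &table[(idx + i) & (TABLE_SIZE - 1u)];
        uint32_t expected = EMPTY_KEY;
        if (atomic_compare_exchange_strong(&s->key, &expected, key)) {
            s->value = value;
            return true;
        }
    }
    return false; /* neighbourhood is full: caller drops the event */
}

/* Scan the same neighbourhood; free the slot once the value is read. */
bool tid_table_remove(uint32_t key, uint32_t *value)
{
    uint32_t idx = hash_u32(key);
    for (uint32_t i = 0; i < MAX_PROBE; i++) {
        struct slot *s = &table[(idx + i) & (TABLE_SIZE - 1u)];
        if (atomic_load(&s->key) == key) {
            *value = s->value;
            atomic_store(&s->key, EMPTY_KEY);
            return true;
        }
    }
    return false; /* no matching entry, e.g. the insert was dropped */
}
```

The lookup does not stop at an empty slot, so removing an entry needs
no tombstones. The price is that every miss scans the whole probe
window, which is acceptable only while the window stays small.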

I checked the C code generated by translate.cxx and the assembler
generated by GCC. The end result is very good considering the
non-trivial task of handling a script language. In my specific case it
can be improved. For example, I do not need a lot of stack memory and I
can remove all c->locals[c->nesting]. I cannot do it for all my
probes, but for system call probes I probably can. I can replace
dwarf_tvar_get() with inline functions and help GCC optimize out
temporary variables, I can drop atomic_read/atomic_write, and so on.
In a perfect world I would like to see a probe with 10 lines of C
and under 200 hot (in cache) opcodes.
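
To make the difference concrete, here is a rough sketch in C of the two
handler shapes. It does not reproduce what translate.cxx emits; only
the c->locals[c->nesting] layout is taken from the text above. The
types fake_regs and managed_context, the helper pack_event and both
handler names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the saved syscall arguments. */
struct fake_regs {
    int64_t arg0;
    int64_t arg1;
};

#define MAX_NESTING 4

/* Hypothetical context with one frame of locals per nesting level. */
struct managed_context {
    int nesting;
    struct {
        uint32_t l_tid;
        uint32_t l_oldfd;
    } locals[MAX_NESTING];
};

/* Pack tid and oldfd into a single 64-bit event word. */
static inline uint64_t pack_event(uint32_t tid, uint32_t oldfd)
{
    return ((uint64_t)tid << 32) | oldfd;
}

/* "Managed" shape: every value is first copied into the context frame. */
uint64_t managed_style_handler(struct managed_context *c,
                               const struct fake_regs *regs, uint32_t tid)
{
    assert(c != NULL && regs != NULL);
    assert(c->nesting >= 0 && c->nesting < MAX_NESTING);
    c->locals[c->nesting].l_tid = tid;
    c->locals[c->nesting].l_oldfd = (uint32_t)regs->arg0;
    return pack_event(c->locals[c->nesting].l_tid,
                      c->locals[c->nesting].l_oldfd);
}

/* Inline accessor: the compiler can keep the argument in a register. */
static inline uint32_t arg_oldfd(const struct fake_regs *regs)
{
    return (uint32_t)regs->arg0;
}

/* "Unmanaged" shape: no context frame, no intermediate copies. */
uint64_t direct_style_handler(const struct fake_regs *regs, uint32_t tid)
{
    assert(regs != NULL);
    return pack_event(tid, arg_oldfd(regs));
}
```

Both handlers compute the same event word. The second one simply gives
the compiler no stores to the context structure that it would have to
preserve.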

I have stepped through elaborate.cxx and specifically
semantic_pass(). It is very well written and I am very impressed. This
is a non-trivial parser implemented in C++ with high-level code
optimization. This is outstanding work. The code is very clean.
Unfortunately I do not see quick and dirty ways to improve the code
besides a special syntax for "unmanaged" probes in C, which brought me
to the "cprobe syscall.dup2" syntax.

I have a very specific task: roughly 100 probes which cover most of
the system calls. Most of the probes are very short. I have to run
smoothly and remain under the radar on multicore systems. Because the
system collects a lot of information across many different kernels, any
custom Linux driver would be a pain. SystemTap solves 95% of what I
need. I need just one small step forward.

>
> - FChE

