This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
[PATCH -tip v7 24/26] kprobes: Enlarge hash table to 4096 entries
- From: Masami Hiramatsu <masami dot hiramatsu dot pt at hitachi dot com>
- To: linux-kernel at vger dot kernel dot org, Ingo Molnar <mingo at kernel dot org>
- Cc: Ananth N Mavinakayanahalli <ananth at in dot ibm dot com>, Sandeepa Prabhu <sandeepa dot prabhu at linaro dot org>, Frederic Weisbecker <fweisbec at gmail dot com>, x86 at kernel dot org, Steven Rostedt <rostedt at goodmis dot org>, fche at redhat dot com, mingo at redhat dot com, systemtap at sourceware dot org, "H. Peter Anvin" <hpa at zytor dot com>, Thomas Gleixner <tglx at linutronix dot de>
- Date: Thu, 27 Feb 2014 16:34:14 +0900
- Subject: [PATCH -tip v7 24/26] kprobes: Enlarge hash table to 4096 entries
- Authentication-results: sourceware.org; auth=none
- References: <20140227073315 dot 20992 dot 6174 dot stgit at ltc230 dot yrl dot intra dot hitachi dot co dot jp>
Currently kprobes expects to be used with fewer
than 100 probe points, so its hash table has only
64 entries. That is too few to handle several
thousand probes.
Enlarge the table to 4096 entries, which consumes
just 32KB (on a 64-bit arch), for better scalability.
Without this patch, enabling 17787 probes takes
more than 2 hours! (9428 sec, with 1-minute
intervals for each 2000 probes enabled)
Enabling trace events: start at 1392782584
0 1392782585 a2mp_chan_alloc_skb_cb_38556
1 1392782585 a2mp_chan_close_cb_38555
....
17785 1392792008 lookup_vport_34987
17786 1392792010 loop_add_23485
17787 1392792012 loop_attr_do_show_autoclear_23464
I profiled it and saw that more than 90% of the
cycles were consumed in get_kprobe.
Samples: 18K of event 'cycles', Event count (approx.): 37759714934
+ 95.90% [k] get_kprobe
+ 0.76% [k] ftrace_lookup_ip
+ 0.54% [k] kprobe_trace_func
Also, more than 60% of the executed instructions
were in get_kprobe.
Samples: 17K of event 'instructions', Event count (approx.): 1321391290
+ 65.48% [k] get_kprobe
+ 4.07% [k] kprobe_trace_func
+ 2.93% [k] optimized_callback
Annotating get_kprobe also shows that the hlist
chains are too long and traversing them takes
most of the time.
| struct hlist_head *head;
| struct kprobe *p;
|
| head = &kprobe_table[hash_ptr(addr, KPROBE_HASH_BITS)];
| hlist_for_each_entry_rcu(p, head, hlist) {
86.33 | mov (%rax),%rax
11.24 | test %rax,%rax
| jne 60
| if (p->addr == addr)
| return p;
| }
With this fix, enabling 20,000 probes takes just
40 min (2303 sec, with 1-minute intervals for
each 2000 probes enabled)
Enabling trace events: start at 1392794306
0 1392794307 a2mp_chan_alloc_skb_cb_38556
1 1392794307 a2mp_chan_close_cb_38555
....
19997 1392796603 nfs4_negotiate_security_12119
19998 1392796603 nfs4_open_confirm_done_11767
19999 1392796603 nfs4_open_confirm_prepare_11779
It also reduced the cycles spent in get_kprobe (with 20,000 probes).
Samples: 5K of event 'cycles', Event count (approx.): 4540269674
+ 68.77% [k] get_kprobe
+ 8.56% [k] ftrace_lookup_ip
+ 3.04% [k] kprobe_trace_func
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---
kernel/kprobes.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index abdede5..302ff42 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -54,7 +54,7 @@
#include <asm/errno.h>
#include <asm/uaccess.h>
-#define KPROBE_HASH_BITS 6
+#define KPROBE_HASH_BITS 12
#define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)