This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
[Fwd: Optimize global stap variable for further performance improvement(~8%)]
- From: Li Guanglei <guanglei at cn dot ibm dot com>
- To: "systemtap at sourceware dot org" <systemtap at sourceware dot org>
- Date: Thu, 03 Aug 2006 22:33:48 +0800
- Subject: [Fwd: Optimize global stap variable for further performance improvement(~8%)]
- Organization: IBM CSTL
Hi,
Below is a forwarded mail about improving LKET's performance. For the
test I used a multi-threaded app (8 threads) that calls getsid() in a
loop, running on a 4-way ppc64 box (8 logical CPUs).
The test data shows that we need some additional optimization for
read-only global variables (or those that are written only in probe
begin/end). I searched the mailing list and found a topic about "global
constant":
http://sources.redhat.com/ml/systemtap/2006-q1/msg00487.html
So it seems to me there are two options:
<1> introduce a "const" type, as suggested by Mark McLoughlin
<2> if the translator finds that a global variable is written only in
probe begin/end, it elides the rw_lock for that variable.
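To make option <2> concrete, here is a minimal sketch of the decision
the translator would have to make. It is only an illustration: the
struct, its fields and the function name are hypothetical and do not
come from the systemtap sources, and it assumes that probe begin has
finished before any other handler reads the variable.

```c
#include <stdbool.h>

/* Hypothetical summary of where a script writes one global variable. */
struct global_usage {
    bool written_in_begin_or_end;   /* assigned in probe begin / probe end */
    bool written_in_other_probes;   /* assigned in any other probe handler */
};

/* Option <2>: the lock can be dropped when no ordinary probe handler
 * writes the variable, so handlers only ever read it. */
static bool can_elide_lock(const struct global_usage *u)
{
    return !u->written_in_other_probes;
}
```

With this rule a HookID that is set once in probe begin would qualify,
while a counter updated in a syscall probe would keep its lock.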
Any comments?
- Guanglei
-------- Original Message --------
Subject: Optimize global stap variable for further performance
improvement(~8%)
Date: Thu, 03 Aug 2006 17:14:08 +0800
From: Li Guanglei <guanglei@cn.ibm.com>
Organization: IBM CSTL
To: Jose Santos <jrs@us.ibm.com>
CC: Jian Gui <guijian@cn.ibm.com>, Xue Peng Li <xuepengl@cn.ibm.com>
Hi,
The current HookID/GroupID values are defined as stap global variables,
and for each one a variable with the same name prefixed with "_" is
defined with the same value for use by embedded C code, e.g.:
global
GROUP_SYSCALL,
HOOKID_SYSCALL_ENTRY, HOOKID_SYSCALL_RETURN,
...
%{
/* used in embedded C code */
/* Group ID Definitions */
int _GROUP_SYSCALL = 2;
int _HOOKID_SYSCALL_ENTRY = 1;
int _HOOKID_SYSCALL_RETURN = 2;
...
%}
The translator assigns each global variable a rw_lock. Although these
IDs are written only in "probe begin", every probe handler still has to
call "read_trylock":
while (!read_trylock(&global_HOOKID_SCSI_IOENTRY_lock) &&
       (++numtrylock < MAXTRYLOCK))
    ndelay(TRYLOCKDELAY);
Although read locks do not contend with each other, my test shows that
removing this read lock gives an improvement of ~8%.
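One plausible reason an uncontended read lock still costs something is
that taking it is an atomic read-modify-write on a lock word shared by
all CPUs. The user-space sketch below is a toy model of that, not the
kernel's rwlock: toy_rwlock, toy_read_trylock, toy_read_unlock and
read_global_locked are hypothetical names, the MAXTRYLOCK value is made
up, and the delay between attempts is left out.

```c
#include <stdatomic.h>

#define MAXTRYLOCK 1000   /* hypothetical value, for illustration only */

/* Toy lock word: >= 0 is the number of readers, -1 means a writer holds it. */
typedef struct { atomic_int state; } toy_rwlock;

/* One attempt to take a read lock. Returns 1 on success, 0 if a writer
 * holds the lock or another thread changed the word first. */
static int toy_read_trylock(toy_rwlock *l)
{
    int seen = atomic_load(&l->state);
    if (seen < 0)
        return 0;
    /* Atomic update of the shared word: every reader pays for this. */
    return atomic_compare_exchange_strong(&l->state, &seen, seen + 1);
}

static void toy_read_unlock(toy_rwlock *l)
{
    atomic_fetch_sub(&l->state, 1);
}

/* Bounded retry loop in the shape of the generated code above.
 * Returns 1 and stores the value in *out, or 0 if it gave up. */
static int read_global_locked(toy_rwlock *l, const int *global, int *out)
{
    unsigned numtrylock = 0;

    while (!toy_read_trylock(l) && (++numtrylock < MAXTRYLOCK))
        ;   /* the generated code calls ndelay(TRYLOCKDELAY) here */
    if (numtrylock >= MAXTRYLOCK)
        return 0;
    *out = *global;
    toy_read_unlock(l);
    return 1;
}
```

Each read costs two atomic updates of the same word (lock and unlock),
and with 8 threads that word keeps moving between CPU caches. A plain
constant needs neither.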
Here is the testing data:
===== With data transfer =====
Original LKET:
4254
Modified LKET without using global HookID/GroupID stap variable:
3930
~7.62% improvement
===== Without data transfer =====
Original LKET:
3699
Modified LKET without using global HookID/GroupID stap variable:
3332
~9.9% improvement
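The percentages above can be reproduced with the small helper below. It
is a sketch that assumes a lower number is better and that the
improvement is taken relative to the original LKET figure; the function
name is hypothetical.

```c
/* Relative improvement of `modified` over `original`, in percent,
 * for a measurement where a lower number is better (assumption). */
static double improvement_percent(double original, double modified)
{
    return (original - modified) / original * 100.0;
}
```

For the data above this is (4254 - 3930) / 4254 for the run with data
transfer and (3699 - 3332) / 3699 for the run without it.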
So we should start eliminating all the global HookID/GroupID stap
variables.
- Guanglei