This is the mail archive of the
binutils@sources.redhat.com
mailing list for the binutils project.
Re: [PATCH] Use hashtab.[ch] hashtable for IA-64 loc_hash_table
Hi Jakub,
>>>>> On Thu, 11 Sep 2003 19:59:02 +0200, Jakub Jelinek <jakub@redhat.com> said:
Jakub> On Wed, Sep 10, 2003 at 10:29:41AM -0700, David Mosberger wrote:
>> Each histogram sample counts as 1.00056m seconds
>>  % time   self   cumul    calls  self/call  tot/call  name
>>   17.90   0.88    0.88     170k      5.21u     5.60u  get_dyn_sym_info
>>    9.69   0.48    1.36    1.05M       457n      494n  vfprintf
>>    7.24   0.36    1.72    1.79M       200n      201n  sec_merge_hash_lookup
>>    5.98   0.30    2.01    5.93M      49.7n     49.7n  _IO_str_overflow
>> get_dyn_sym_info() is ia64-specific and looks like it's doing a linear
>> list-traversal. Seems like a fairly obvious candidate for
>> optimization.
>> The time spent in vfprintf and _IO_str_overflow is also interesting:
>> those come from get_local_sym_hash(), which does:
>> sprintf (addr_name, "%x:%lx", ..)
>> Surely we can do better than that... ;-)
Jakub> The following patch gets rid of that sprintf (and generally,
Jakub> using string keys when the keys are really pair of two
Jakub> (typically small) integers).
Cool!
Jakub> It compiled, no make check regressions and built glibc.
Jakub> Unfortunately, I don't have access to sufficiently idle
Jakub> system to be able to do any useful benchmarking.
Happy to help with that: it shaved another 2 seconds out of the
"minimum kernel rebuild"!
$ time make CROSS_COMPILE=/opt/gcc-pre3.4/bin/ vmlinux
make[1]: `arch/ia64/kernel/asm-offsets.s' is up to date.
CHK include/linux/compile.h
CPP arch/ia64/kernel/gate.lds.s
AS arch/ia64/kernel/gate.o
GATE arch/ia64/kernel/gate.so
AS arch/ia64/kernel/gate-data.o
GATE arch/ia64/kernel/gate-syms.o
LD arch/ia64/kernel/built-in.o
CC kernel/configs.o
LD kernel/built-in.o
GEN .version
CHK include/linux/compile.h
UPD include/linux/compile.h
CC init/version.o
LD init/built-in.o
LD .tmp_vmlinux1
KSYM .tmp_kallsyms1.S
AS .tmp_kallsyms1.o
LD .tmp_vmlinux2
KSYM .tmp_kallsyms2.S
AS .tmp_kallsyms2.o
LD vmlinux
real 0m23.148s
user 0m16.874s
sys 0m1.879s
The link time still dominates, but it's getting to the point where I
need to figure out why gate.so and gate-syms.o get rebuilt each time...
In terms of the linker's execution profile, here is where things stand
now (this is still with hash.c:DEFAULT_SIZE set to 65521):
 % time   self   cumul    calls  self/call  tot/call  name
  25.34   0.87    0.87     124k      7.05u     7.15u  get_dyn_sym_info
  10.29   0.36    1.23    1.75M       203n      204n  sec_merge_hash_lookup
   5.22   0.18    1.41    3.90M      46.2n     46.2n  ld:__umoddi3
   4.32   0.15    1.56        -          -         -  elf64_ia64_relocate_section
   4.00   0.14    1.70     524k       263n      271n  elf64_ia64_global_dyn_sym_thunk
The call-graph info for get_dyn_sym_info() isn't terribly interesting,
because most of the time is spent in get_dyn_sym_info() itself (again,
I suspect this is due to the linear list traversal).
The call-graph info for sec_merge_hash_lookup() is more useful:
 % time   self    children   called        name
          0.149   840u        736k         sec_merge_add
          0.206   1.16m      1.02M         _bfd_merged_section_offset
  11.2    0.355   2.00m      1.75M         sec_merge_hash_lookup
          0.117   0.00       1.91M/1.98M       memcmp
          2.00m   0.00       80.9k/80.9k       sec_merge_hash_newfunc
          82.6m   0.00       1.79M/3.90M       ld:__umoddi3
So sec_merge_hash_lookup() burns 0.355 seconds on its own, and
another 0.117 seconds through calls to memcmp(). The other calls are
relatively cheap (2 msec for sec_merge_hash_newfunc and 82.6 msec for
ld:__umoddi3).
--david